Basic concepts
Abel Serrano Juste edited this page Apr 13, 2018 · 12 revisions
Here we explain the basic concepts we use in WikiChron, so that you can understand what exactly we mean by certain terms, what the data we work with looks like, and what assumptions we have made.
- Page. Any page of the wiki. Pages can be content pages (articles), but also User pages, Talk pages, Help pages, etc. For our research, we are currently analyzing a subset of pages from the most relevant MediaWiki namespaces. More on this later.
- Article. An article, as defined by MediaWiki, is a content page. More technically, it is any page that belongs to the (Main) namespace of a wiki. However, Wikia sometimes says "pages" when it means articles (for instance, in the "pages" counter at the top right of any Wikia wiki).
- Edition. This is the most atomic unit of change in a wiki. It records the text changed, the date and time when it was made, the author (either anonymous or registered) and the page changed. An edition is made by exactly one user on exactly one page.
- Wiki dump. A record of all the edits made in a wiki. WikiChron uses a processed CSV version of this dump to generate the plots.
- User. There are two types and, if none is specified, we mean both. A registered user has a user account with an account name and a user page; an anonymous user is identified by an IP address. Treating each IP address as one user adds noise, but we consider that not aggregating the edits made from the same IP would be more misleading and less informative. Furthermore, an anonymous user can edit from different IPs, or can become a registered user at some point and, even once registered, still edit anonymously; however, in WikiChron we don't attempt these kinds of identity merging. Read about a study about anonymous editors in Wikipedia here.
- Active users. We use MediaWiki's definition, which states that an active user is any user who has performed an action (edit) on any page during the last 30 days.
- Article:Talk and User:Talk pages. These are discussion pages that wiki users use to coordinate and communicate with each other. Previous research has focused on them because they help to measure coordination within a community. This is why we have created some specific metrics to show edits on these pages.
- Monthly and cumulative metrics. For many metrics we are interested in having both the value on a per-month basis (Monthly) and the sum of the values for the current month and all the previous months (Cumulative).
- Ratio metric. A metric that is the quotient of two other metrics.
- Calendar dates. Time expressed in natural calendar dates, e.g. Jan 2010, Feb 2010.
- Months from birth. Time expressed as a discrete number of months counted from the date when the wiki was created.
- Contribution. As of today, whenever we say contribution we are referring to an edition. However, we don't rule out using a more complex measure in the future, such as the number of bytes, words, editions, etc.
- We are generating the dumps from the page history of the following namespaces: (-2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 110, 111, 500, 501, 502, 503, 1200, 1201, 1202, 2000, 2001, 2002). All these namespaces are automatically available to any wiki created in Wikia, but there can be pages in other namespaces, and wiki admins can install extensions that add more pages to the total number of pages. This is why the number of pages we analyze and the total number of pages shown in Special:Statistics may differ.
- Bot edits. In WikiChron, we have removed all the bot editions in order to drop artificial noise. To do this, we need the user ids of those bots. This data should be in the wikis.json file. To retrieve the bot ids, we query for all the users in the bot or bot-global user groups.
- New users and user number. Since we are using a dump of wiki editions, for a user to be counted in WikiChron stats she has to have made at least one edition during the whole life of the wiki. Hence, registered users with no editions will not be counted.
- Anonymous users. We always count anonymous editions, but we don't do any identity merging beyond assuming that anonymous users with the same IP are the same user. Treating every IP as one anonymous editor has been done in other studies, such as this work by Aaron Halfaker. As a result, the actual number of users and of editions per user can differ slightly from the values we report.
- As of today, for the distribution of work metrics we use the cumulative data up to any given date. However, we know that this approach can lead to very inflexible values as the wiki grows, so we are exploring better time ranges for these metrics.
- We have set a minimum number of users per metric that a wiki must have in order to calculate the metrics. You can find those values here: https://github.com/Grasia/WikiChron/blob/master/lib/metrics/stats.py#L17
- For more information, there is a wiki page with a detailed description of the Metrics about distribution of participation.
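To make the definitions above more concrete, here is a minimal sketch of how monthly and cumulative edition counts could be derived once bot editions are dropped and every IP address is treated as one anonymous user. It is not WikiChron's actual code: the record layout, the sample data and the names `monthly_and_cumulative` and `users_with_editions` are assumptions made for illustration only.

```python
from collections import Counter
from datetime import datetime
from itertools import accumulate

# Hypothetical edition records: (timestamp, contributor).
# The contributor is a user id for registered users and the IP address
# for anonymous users, so each IP counts as one anonymous user.
editions = [
    (datetime(2011, 12, 9), "101"),
    (datetime(2011, 12, 20), "10.0.0.1"),
    (datetime(2012, 1, 3), "900"),       # made by a bot
    (datetime(2012, 1, 15), "101"),
    (datetime(2012, 1, 28), "10.0.0.1"),
    (datetime(2012, 2, 2), "102"),
]
bot_ids = {"900"}  # assumed to be collected from the bot user groups


def monthly_and_cumulative(editions, bot_ids):
    """Return (monthly, cumulative) edition counts keyed by (year, month).

    Bot editions are ignored. Months without editions do not appear.
    """
    human = [(ts, who) for ts, who in editions if who not in bot_ids]
    monthly = Counter((ts.year, ts.month) for ts, _ in human)
    months = sorted(monthly)
    running = accumulate(monthly[m] for m in months)
    return dict(monthly), dict(zip(months, running))


def users_with_editions(editions, bot_ids):
    """Count distinct non-bot contributors with at least one edition."""
    return len({who for _, who in editions if who not in bot_ids})


monthly, cumulative = monthly_and_cumulative(editions, bot_ids)
total_users = users_with_editions(editions, bot_ids)
```

A ratio metric would then be the quotient of two such values, for example `cumulative[m] / total_users` for a given month `m`.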
There are two ways to display the time axis (x axis) of the time series graphs:
- Calendar dates. This option plots the data at the date it was generated, e.g. the axis will show the dates Dec 2011, Jan 2012, Feb 2012 and so on. This option is especially useful for single-wiki analysis.
- Months from birth. This option sets the time axis to a count of natural numbers starting from 1, where 1 is the month when the wiki was born and the following numbers are the offset in months relative to that month. This option is more useful for multi-wiki analysis; in this case, the count starts with the birth month of the oldest wiki (the one with the earliest birth date). Note that we do not take into account that the birth month may have fewer days and is therefore expected to show lower values than the other months. For instance, if foowiki was born on 9 December 2011, month 1 refers to the interval from 9 to 31 December, which includes only 23 days, while the next month, month 2, includes all 31 days of January, the next one the 29 days of February 2012, and so on.
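The mapping from a calendar date to a "months from birth" index can be sketched as follows. This is only an illustration of the counting rule described above, not WikiChron's implementation; the function names `months_from_birth` and `days_in_birth_month` are hypothetical.

```python
import calendar
from datetime import date


def months_from_birth(day, birth):
    """Return the 1-based month index of `day`, where 1 is the birth month.

    Only calendar months are compared, so the birth month counts as
    month 1 even when the wiki was created late in that month.
    """
    return (day.year - birth.year) * 12 + (day.month - birth.month) + 1


def days_in_birth_month(birth):
    """Return how many days of the birth month are covered by month 1."""
    last_day = calendar.monthrange(birth.year, birth.month)[1]
    return last_day - birth.day + 1


foowiki_birth = date(2011, 12, 9)
first = months_from_birth(date(2011, 12, 31), foowiki_birth)
second = months_from_birth(date(2012, 1, 1), foowiki_birth)
third = months_from_birth(date(2012, 2, 15), foowiki_birth)
covered = days_in_birth_month(foowiki_birth)
```

With the foowiki example, `covered` is the shortened first month, which is why values for month 1 tend to be lower than those of the following months.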