Slide 1

Slide 1 text

Time Lag Analysis of Adding Scholarly References to English Wikipedia
How rapidly are they added to and how fresh are they?
Jiro Kikkawa, Masao Takaku, Fuyuki Yoshikane ({jiro, masao, fuyuki}@slis.tsukuba.ac.jp)
University of Tsukuba, Japan
iConference 2023: Normality, Virtuality, Physicality, Inclusivity
Paper: https://doi.org/10.1007/978-3-031-28032-0_33
Slide: https://speakerdeck.com/corgies/iconference2023

Slide 2

Slide 2 text

Background
• Mass digitization of scholarly communication
– Various kinds of communities and people, including non-traditional readers, such as researchers and specialists, can utilize scholarly documents.
– Wikipedia offers numerous references and access to scholarly documents, and as of 2015 it was one of the largest referrers of Crossref DOIs.
• Scholarly references on Wikipedia
– complement and improve the quality of Wikipedia content.
Figure 1. Example of the scholarly reference on English Wikipedia. (Screenshot of "Library and information science - Wikipedia", https://en.wikipedia.org/wiki/Library_and_information_science, showing the in-text citation "Chua & Yang (2008) [10]" and the matching reference-list entry: Chua, Alton Y.K.; Yang, Christopher C. (November 2008). "The shift towards multi-disciplinarity in information science". Journal of the American Society for Information Science and Technology. 59 (13): 2156–2170. doi:10.1002/asi.20929.)

Slide 3

Slide 3 text

Purpose
• Scholarly references complement and improve the quality of Wikipedia content.
– Scholarly references on Wikipedia articles should be added as soon as possible. Moreover, the quantity and freshness of scholarly references are crucial to cover the latest academic knowledge.
– However, little is known about them, such as how rapidly they are added and how fresh they are.
– In this study, we conduct a time lag analysis regarding the editors and edits for adding scholarly references to Wikipedia to answer the following RQs.
RQ1. How does the number of Wikipedia articles with scholarly references grow over time?
RQ2. How long is the time lag between the publishing date of each scholarly article and the addition of the corresponding scholarly reference to Wikipedia articles?
RQ3. How long is the time lag between the creation date of each Wikipedia article and the date of the first scholarly reference added to that article?

Slide 4

Slide 4 text

Purpose
• In this study, we conduct a time lag analysis regarding the editors and edits for adding scholarly references to Wikipedia to answer RQ1, RQ2, and RQ3.
The contributions of this study
1. We clarified the long-term changes in the use of scholarly articles in the online encyclopedia community.
2. We attempted to identify the factors behind these changes in the online encyclopedia community.
• If the time lag mentioned in RQ2 and RQ3 decreased over time, the factors causing this were investigated.

Slide 5

Slide 5 text

Related Works
• Analysis of scholarly references on Wikipedia
• The shift from quantity to quality in the Wikipedia community
• Analysis of the freshness of the references in scholarly articles

Slide 6

Slide 6 text

Analysis of scholarly references on Wikipedia
Previous studies focused on the scholarly document itself
• Most previous studies have focused on the scholarly document itself, and little is known about the editors and their contributions to adding scholarly references to Wikipedia.
1. whether the scholarly articles published in high-impact factor journals tend to be more referenced on Wikipedia [Nielsen, 2007; Teplitskiy, 2016]
2. whether the scholarly articles published in open access journals tend to be more referenced on Wikipedia [Teplitskiy, 2016; Lin and Fenner, 2014; Pooladian and Borrego, 2017]
3. whether the references on Wikipedia are usable as a data source for research evaluations [Kousha and Thelwall, 2017]
4. investigations regarding the characteristics of Wikipedia articles with scholarly references [Pooladian and Borrego, 2017]
5. investigations regarding the references focused on specific identifiers (e.g., DOI, arXiv, ISSN, and ISBN) [Kikkawa, 2016; Kikkawa, 2020b; Halfaker and Taraborelli, 2019] or research fields [Thelwall, 2016; Pooladian and Borrego, 2017]
6. investigations regarding the editors and their edits for adding scholarly references to Wikipedia [Kikkawa, 2020a; Kikkawa, 2021b]

Slide 7

Slide 7 text

Analysis of scholarly references on Wikipedia
• We proposed methods to identify the first appearances of scholarly references on Wikipedia using paper titles and their identifiers.
– We built a dataset of the first appearances of scholarly references on English Wikipedia articles as of 1st March 2017. Next, we evaluated the precision of detecting the first appearance, which was 93.3% overall and exceeded 90% in 20 out of 22 research fields [Kikkawa, 2020a; Kikkawa, 2022].
– In addition, we published an updated version of the dataset of the first appearances of scholarly references on English Wikipedia articles as of 1st October 2021 [Kikkawa, 2021a; Kikkawa, 2022].
– Using this updated dataset, we conduct a time lag analysis of the scholarly references added to Wikipedia.

Slide 8

Slide 8 text

Materials and Methods
• Definition of the terms
• Dataset
• Analysis Methods

Slide 9

Slide 9 text

Definition of the terms
1. Scholarly reference
• The reference added to Wikipedia articles by which a certain paper and its research field are uniquely identifiable.
• We did not distinguish the roles of references, such as being used as evidence for a certain part of the content of the Wikipedia article, just mentioning a paper, or being listed in further reading.
2. First appearance of the scholarly reference
• The oldest scholarly reference added to each Wikipedia article.
• If multiple references corresponding to the same paper in the same article were found, the oldest one was treated as the first appearance.
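The rule for the first appearance can be illustrated with a minimal Python sketch. The record layout (article title, DOI, timestamp), the sample values, and the function name `first_appearances` are all hypothetical and chosen only for illustration; the published dataset may be organized differently.

```python
from datetime import datetime

# Hypothetical records: (article title, DOI of the paper, timestamp of the edit
# that added the reference). None of these values come from the real dataset.
records = [
    ("Article A", "10.1234/example.1", "2016-08-06 16:05:57"),
    ("Article A", "10.1234/example.1", "2018-01-15 09:30:00"),  # same paper, later edit
    ("Article B", "10.1234/example.1", "2019-03-02 12:00:00"),  # same paper, other article
]


def first_appearances(rows):
    """Keep only the oldest reference for each (article, paper) pair."""
    oldest = {}
    for article, doi, timestamp in rows:
        added_at = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
        key = (article, doi)
        if key not in oldest or added_at < oldest[key]:
            oldest[key] = added_at
    return oldest


result = first_appearances(records)
for (article, doi), added_at in sorted(result.items()):
    print(article, doi, added_at.isoformat())
```

The later duplicate in "Article A" is discarded, while the same paper in "Article B" is kept as a separate first appearance.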

Slide 10

Slide 10 text

Dataset
• Dataset of first appearances of the scholarly references on English Wikipedia articles as of 1st October 2021 [Kikkawa, 2022]
- The first appearances of scholarly references and their research fields were identified using Crossref DOIs and Essential Science Indicators categories
- 1,474,347 scholarly references appearing in 313,240 English Wikipedia articles in the main namespace
• Each editor is classified as follows:
- User editor: human editors among the registered editors
- Bot editor: non-human editors among the registered editors
- IP editor: non-registered editors

Slide 11

Slide 11 text

Analysis Methods
Basic statistics of Wikipedia articles with scholarly references
RQ1. How does the number of Wikipedia articles with scholarly references grow over time?
• We investigated the number of created Wikipedia articles containing scholarly references by editor type and their time-series transitions.
Time lag between publishing each scholarly article and adding the corresponding reference to the Wikipedia article
RQ2. How long is the time lag between the publishing date of each scholarly article and the addition of the corresponding scholarly reference to Wikipedia articles?
• We calculated the time lag between the publication of each scholarly article and the addition of the corresponding reference to the Wikipedia articles.
- e.g., when the timestamp of the first appearance of a scholarly reference is “2016-08-06 16:05:57 UTC” and the published year of the paper is 2015, the time lag is one year (= 2016 - 2015).
- We removed, as errors, cases in which the published year was empty or the time lag was less than zero.
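The RQ2 calculation can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `year_lag` and the use of `None` for an empty published year are hypothetical, and the timestamp format is taken from the example on the slide.

```python
from datetime import datetime


def year_lag(first_appearance, published_year):
    """Return the time lag in calendar years, or None if the case is removed."""
    if published_year is None:
        return None  # empty published year: removed as an error
    added_year = datetime.strptime(first_appearance, "%Y-%m-%d %H:%M:%S UTC").year
    lag = added_year - published_year
    if lag < 0:
        return None  # reference appears before publication: removed as an error
    return lag


print(year_lag("2016-08-06 16:05:57 UTC", 2015))  # the example from the slide
print(year_lag("2016-08-06 16:05:57 UTC", None))  # removed: empty published year
print(year_lag("2016-08-06 16:05:57 UTC", 2017))  # removed: negative lag
```

Because only the published year is used, the lag is a difference of calendar years rather than an exact duration.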

Slide 12

Slide 12 text

Analysis Methods
Time lag between the creation date of each Wikipedia article and the date of adding the first scholarly reference to the corresponding article
RQ3. How long is the time lag between the creation date of each Wikipedia article and the date of the first scholarly reference added to that article?
Step 1
• We set the target as the first scholarly reference on each Wikipedia article.
• We kept only the oldest reference in order to clarify how long each article remained without scholarly references and how that period changed over time.
Step 2
• We calculated the time lags between the creation date of each Wikipedia article and the date of adding the first reference to the article.
• e.g., if the creation date of the Wikipedia article is “2001-11-22 16:37:56 UTC” and the date of adding the first reference to the article is “2016-08-06 16:05:57 UTC,” the time lag is 5370.98 days (converted from 464,052,481 seconds).
Step 3
• We analyzed the characteristics and transitions of the time lag by comparing groups of Wikipedia articles by creation year.
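The Step 2 example can be reproduced with Python's standard `datetime` module. The function name `creation_to_first_reference_lag` is hypothetical; the two timestamps are the ones quoted on the slide.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S UTC"
SECONDS_PER_DAY = 86_400


def creation_to_first_reference_lag(created, first_reference_added):
    """Return the lag as (whole seconds, fractional days)."""
    delta = (datetime.strptime(first_reference_added, TIMESTAMP_FORMAT)
             - datetime.strptime(created, TIMESTAMP_FORMAT))
    seconds = int(delta.total_seconds())
    return seconds, seconds / SECONDS_PER_DAY


seconds, days = creation_to_first_reference_lag(
    "2001-11-22 16:37:56 UTC", "2016-08-06 16:05:57 UTC"
)
print(seconds, round(days, 2))  # 464052481 5370.98
```

Both timestamps are in UTC, so subtracting the parsed values directly gives the elapsed time without any time-zone conversion.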

Slide 13

Slide 13 text

Results and Discussion

Slide 14

Slide 14 text

Basic statistics of Wikipedia articles with scholarly references
RQ1. How does the number of Wikipedia articles with scholarly references grow over time?
Table 2. Number of created Wikipedia articles containing scholarly references by editor types for every 2 years (n=313,240)
Years | Total | User editors | Bot editors | IP editors
2001-2002 | 14,951 | 12,280 (82.13 %) | 0 (0.00 %) | 2,671 (17.87 %)
2003-2004 | 34,633 | 25,913 (74.82 %) | 111 (0.32 %) | 8,609 (24.86 %)
2005-2006 | 53,211 | 46,054 (86.55 %) | 174 (0.33 %) | 6,983 (13.12 %)
2007-2008 | 52,395 | 39,592 (75.56 %) | 12,782 (24.40 %) | 21 (0.04 %)
2009-2010 | 30,439 | 28,241 (92.78 %) | 2,103 (6.91 %) | 95 (0.31 %)
2011-2012 | 23,954 | 23,635 (98.67 %) | 167 (0.70 %) | 152 (0.63 %)
2013-2014 | 22,920 | 22,491 (98.13 %) | 261 (1.14 %) | 168 (0.73 %)
2015-2016 | 21,677 | 21,298 (98.25 %) | 214 (0.99 %) | 165 (0.76 %)
2017-2018 | 28,222 | 23,283 (82.50 %) | 4,810 (17.04 %) | 129 (0.46 %)
2019-2020 | 22,151 | 21,926 (98.98 %) | 6 (0.03 %) | 219 (0.99 %)
2021 | 8,687 | 8,632 (99.37 %) | 0 (0.00 %) | 55 (0.63 %)
Overall | 313,240 | 273,345 (87.26 %) | 20,628 (6.59 %) | 19,267 (6.15 %)
• The total number of articles created peaked at 53,211 in 2005-2006; from 2009-2010 onward, approximately 20,000-30,000 articles were consistently created every 2 years.

Slide 15

Slide 15 text

Basic statistics of Wikipedia articles with scholarly references
RQ1. How does the number of Wikipedia articles with scholarly references grow over time?
• Most articles were created by User editors, accounting for 87.26 % of the total (Table 2).
• The percentage for the Bot editors was low, at 6.59 %.
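As a quick arithmetic check, the overall shares can be recomputed from the counts in the "Overall" row of Table 2. The variable names below are illustrative only.

```python
# Counts taken from the "Overall" row of Table 2.
counts = {
    "User editors": 273_345,
    "Bot editors": 20_628,
    "IP editors": 19_267,
}

total = sum(counts.values())
shares = {editor: round(100 * n / total, 2) for editor, n in counts.items()}

print(total)
for editor, share in shares.items():
    print(f"{editor}: {share:.2f} %")
```

The three counts sum to the 313,240 articles reported for the dataset, and the rounded shares match the percentages quoted in the table.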

Slide 16

Slide 16 text

Time lag between publishing each scholarly article and adding the corresponding reference to the Wikipedia article
RQ2. How long is the time lag between the publishing date of each scholarly article and the addition of the corresponding scholarly reference to Wikipedia articles?
Table 3. Results regarding the time lag between publishing scholarly articles and adding the corresponding references to Wikipedia articles every 2 years (n=1,458,546). Max, Median, Mode, Mean, and SD are the time lag in years.
Years | # of the references added to Wikipedia articles | Max | Median | Mode | Mean | SD
2001-2002 | 607 | 131 | 18.0 | 0 | 25.97 | 27.09
2003-2004 | 3,818 | 164 | 11.0 | 0 | 21.52 | 26.32
2005-2006 | 35,416 | 174 | 6.0 | 0 | 13.79 | 19.06
2007-2008 | 211,750 | 206 | 6.0 | 5 | 10.31 | 14.13
2009-2010 | 135,900 | 207 | 7.0 | 1 | 12.14 | 17.25
2011-2012 | 147,498 | 209 | 7.0 | 0 | 12.51 | 17.50
2013-2014 | 157,427 | 196 | 7.0 | 0 | 12.72 | 17.38
2015-2016 | 185,958 | 207 | 6.0 | 0 | 11.97 | 16.72
2017-2018 | 221,565 | 201 | 7.0 | 0 | 12.33 | 16.90
2019-2020 | 258,928 | 205 | 7.0 | 0 | 13.07 | 17.51
2021 | 99,679 | 204 | 7.0 | 0 | 12.92 | 17.53
• “Years” refers to when the scholarly references were added to Wikipedia articles, e.g., 211,750 references were added to Wikipedia articles during 2007-2008.
• The maximum values have been consistently near 200 since 2007-2008.

Slide 17

Slide 17 text

Time lag between publishing each scholarly article and adding the corresponding reference to the Wikipedia article
RQ2. How long is the time lag between the publishing date of each scholarly article and the addition of the corresponding scholarly reference to Wikipedia articles?
• The median, mean, and standard deviation values in Table 3 were stable near 6.0-7.0, 10-13, and 14-17, respectively, after 2005-2006.
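The per-period statistics reported in Table 3 could be computed along the following lines. This is a sketch under assumptions: the input rows are made-up values, the function names are hypothetical, and the sample standard deviation (`statistics.stdev`) is an assumption because the slides do not say which variant was used.

```python
import statistics
from collections import defaultdict


def period_label(year, last_year=2021):
    """Map a year to its 2-year period, e.g. 2007 and 2008 -> '2007-2008'."""
    start = year if year % 2 == 1 else year - 1
    end = min(start + 1, last_year)  # the final period covers 2021 only
    return str(start) if start == end else f"{start}-{end}"


def summarize(rows):
    """rows: iterable of (year the reference was added, time lag in years)."""
    groups = defaultdict(list)
    for year_added, lag in rows:
        groups[period_label(year_added)].append(lag)
    return {
        period: {
            "n": len(lags),
            "max": max(lags),
            "median": statistics.median(lags),
            "mode": statistics.mode(lags),
            "mean": statistics.mean(lags),
            "sd": statistics.stdev(lags),  # assumption: sample SD
        }
        for period, lags in groups.items()
    }


# Made-up lags for illustration only; these are not values from the dataset.
rows = [(2007, 5), (2007, 5), (2008, 0), (2008, 12), (2021, 0), (2021, 0), (2021, 7)]
summary = summarize(rows)
for period, stats in sorted(summary.items()):
    print(period, stats)
```

Grouping by the year in which the reference was added matches the meaning of "Years" in Table 3.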

Slide 18

Slide 18 text

Time lag between publishing each scholarly article and adding the corresponding reference to the Wikipedia article
RQ2. How long is the time lag between the publishing date of each scholarly article and the addition of the corresponding scholarly reference to Wikipedia articles?
• The mode values in Table 3 were either 0 or 1, except for 2007-2008.
• The mode value was 5 in 2007-2008 because two papers [1, 2] published in 2002-2003 were added to 1,722 and 1,212 Wikipedia articles, respectively, by the Bot editor ProteinBoxBot in this period.
• ProteinBoxBot creates articles related to human genes.
1. Mammalian Gene Collection (MGC) Program Team: Generation and initial analysis of more than 15,000 full-length human and mouse cDNA sequences. PNAS 99(26), 16899–16903 (2002).
2. Ota, T., Suzuki, Y., Nishikawa, T., et al.: Complete sequencing and characterization of 21,243 full-length human cDNAs. Nature Genetics 36(1), 40–45 (2003).

Slide 19

Slide 19 text

Time lag between the creation date of each Wikipedia article and the date of adding the first scholarly reference to the corresponding article
RQ3. How long is the time lag between the creation date of each Wikipedia article and the date of the first scholarly reference added to that article?
Figure 2. Distribution of the time lag between creating the Wikipedia articles and adding the first scholarly references for every 2 years. (Stacked bar chart, 0% to 100%, with the following legend.)
A. 0 days and at the same time
B. 0 days but not at the same time
C. less than 1 month
D. equal to or more than 1 month but less than 6 months
E. equal to or more than 6 months but less than 1 year
F. equal to or more than 1 year but less than 3 years
G. equal to or more than 3 years but less than 5 years
H. equal to or more than 5 years
• Figure 2 presents the distribution of the time lag between creating Wikipedia articles and adding the first scholarly references for every 2 years.
• Regarding the group of “0 days and at the same time,” the percentage increased sharply from 2005–2006 to 2007–2008 (from 9.05% to 36.00%).
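The grouping into categories A to H can be sketched as a simple threshold lookup. The exact boundaries are not given on the slides, so the ones below are assumptions made for illustration: "at the same time" is read as a lag of exactly 0 seconds, a month as 30 days, six months as 180 days, and a year as 365 days. The function name `categorize` is hypothetical.

```python
SECONDS_PER_DAY = 86_400

# (exclusive upper bound in seconds, label). Boundaries are assumptions:
# 30-day months, 180 days for 6 months, 365-day years.
BUCKETS = [
    (1, "A. 0 days and at the same time"),
    (SECONDS_PER_DAY, "B. 0 days but not at the same time"),
    (30 * SECONDS_PER_DAY, "C. less than 1 month"),
    (180 * SECONDS_PER_DAY, "D. equal to or more than 1 month but less than 6 months"),
    (365 * SECONDS_PER_DAY, "E. equal to or more than 6 months but less than 1 year"),
    (3 * 365 * SECONDS_PER_DAY, "F. equal to or more than 1 year but less than 3 years"),
    (5 * 365 * SECONDS_PER_DAY, "G. equal to or more than 3 years but less than 5 years"),
]
LAST_BUCKET = "H. equal to or more than 5 years"


def categorize(lag_seconds):
    """Return the Figure 2 category for a non-negative lag in seconds."""
    if lag_seconds < 0:
        raise ValueError("the first reference cannot precede article creation")
    for upper_bound, label in BUCKETS:
        if lag_seconds < upper_bound:
            return label
    return LAST_BUCKET


print(categorize(0))            # reference added in the creating edit
print(categorize(3_600))        # one hour after creation
print(categorize(464_052_481))  # the Step 2 example (about 5,371 days)
```

Counting the categories per creation-year group and dividing by the group size would give the percentages plotted in a stacked chart of this kind.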

Slide 20

Slide 20 text

Time lag between the creation date of each Wikipedia article and the date of adding the first scholarly reference to the corresponding article
RQ3. How long is the time lag between the creation date of each Wikipedia article and the date of the first scholarly reference added to that article?
• Regarding the group of “0 days and at the same time” in Figure 2, the percentage increased sharply from 2005–2006 to 2007–2008 (from 9.05% to 36.00%).
• In 2005, a hoax stating that a certain journalist had been a suspect in the assassinations of the president of the USA was added to a Wikipedia article, which became a social problem.
• In 2006, Jimmy Wales declared that the Wikipedia community would shift its focus from the quantity to the quality of its contents.
• The increase observed here could be seen as a response to this movement.

Slide 21

Slide 21 text

Time lag between the creation date of each Wikipedia article and the date of adding the first scholarly reference to the corresponding article
RQ3. How long is the time lag between the creation date of each Wikipedia article and the date of the first scholarly reference added to that article?
• The percentage of “0 days and at the same time” in Figure 2 gradually increased over the years, except for 2009-2010, when it was 30.50%. In particular, it exceeded 50% and 60% in 2013–2014 and 2017–2018, respectively (values annotated in the figure: 55.04% and 61.60%).

Slide 22

Slide 22 text

Conclusion
• We conducted a time lag analysis of adding scholarly references to the English Wikipedia as of October 2021.
• We detected no tendency for recently created Wikipedia articles to cite fresher references, because the time lag between the publication of scholarly articles and the addition of the corresponding references to Wikipedia articles was generally constant over the years.

Slide 23

Slide 23 text

Conclusion
• The time lag between creating Wikipedia articles and adding the first scholarly references tended to decrease over time.
- The percentage of cases where scholarly references were added at the same time as Wikipedia articles were created increased over the years, particularly since the period 2007-2008.
- This trend was regarded as a response to the policy changes in the Wikipedia community that was adopted by various editors, rather than the result of massive activities conducted by a small number of editors.

Slide 24

Slide 24 text

References
• Halfaker and Taraborelli, 2019. Halfaker, A. and Taraborelli, D. (2019). Research:Scholarly article citations in Wikipedia - Meta. https://meta.wikimedia.org/wiki/Research:Scholarly_article_citations_in_Wikipedia
• Kikkawa, 2016. Kikkawa, J., Takaku, M., and Yoshikane, F. (2016). DOI Links on Wikipedia: Analyses of English, Japanese, and Chinese Wikipedias. In Proceedings of the 18th International Conference on Asia-Pacific Digital Libraries (ICADL 2016), pages 369–380. https://doi.org/10.1007/978-3-319-49304-6_40
• Kikkawa, 2020a. Kikkawa, J., Takaku, M., and Yoshikane, F. (2020a). A Method to Identify the Edits Adding Bibliographic References to Wikipedia. Journal of Japan Society of Information and Knowledge, 30(3):370–389. (in Japanese, English abstract available). https://doi.org/10.2964/jsik_2020_033
• Kikkawa, 2020b. Kikkawa, J., Takaku, M., and Yoshikane, F. (2020b). Analyses of Wikipedia Editors Adding Bibliographic References based on DOI Links. Journal of Japan Society of Information and Knowledge, 30(1):21–41. (in Japanese, English abstract available). https://doi.org/10.2964/jsik_2020_004
• Kikkawa, 2021a. Kikkawa, J., Takaku, M., and Yoshikane, F. (2021a). Dataset of first appearances of the scholarly bibliographic references on English Wikipedia articles as of 1 March 2017 and as of 1 October 2021. Zenodo. https://doi.org/10.5281/zenodo.5595573
• Kikkawa, 2021b. Kikkawa, J., Takaku, M., and Yoshikane, F. (2021b). Time-series Analyses of the Editors and Their Edits for Adding Bibliographic References on Wikipedia. Journal of Japan Society of Information and Knowledge, 31(1):3–19. (in Japanese, English abstract available). https://doi.org/10.2964/jsik_2020_037
• Kikkawa, 2022. Kikkawa, J., Takaku, M., and Yoshikane, F. (2022). Dataset of first appearances of the scholarly bibliographic references on Wikipedia articles. Scientific Data, 9:article no. 85, pp. 1–11. https://doi.org/10.1038/s41597-022-01190-z

Slide 25

Slide 25 text

References
• Kousha and Thelwall, 2017. Kousha, K. and Thelwall, M. (2017). Are Wikipedia citations important evidence of the impact of scholarly articles and books? Journal of the Association for Information Science and Technology, 68(3):762–779. https://doi.org/10.1002/asi.23694
• Lin and Fenner, 2014. Lin, J. and Fenner, M. (2014). An analysis of Wikipedia references across PLOS publications. figshare. https://doi.org/10.6084/m9.figshare.1048991.v3
• Nielsen, 2007. Nielsen, F. Å. (2007). Scientific citations in Wikipedia. First Monday, 12(8). https://doi.org/10.5210/fm.v12i8.1997
• Pooladian and Borrego, 2017. Pooladian, A. and Borrego, Á. (2017). Methodological issues in measuring citations in Wikipedia: a case study in Library and Information Science. Scientometrics, 113(1):455–464. https://doi.org/10.1007/s11192-017-2474-z
• Teplitskiy, 2016. Teplitskiy, M., Lu, G., and Duede, E. (2016). Amplifying the impact of open access: Wikipedia and the diffusion of science. Journal of the Association for Information Science and Technology, 68(9):2116–2127. https://doi.org/10.1002/asi.23687
• Thelwall, 2016. Thelwall, M. (2016). Does Astronomy research become too dated for the public? Wikipedia citations to Astronomy and Astrophysics journal articles 1996-2014. El Profesional de la Información, 25(6):893–900. https://doi.org/10.3145/epi.2016.nov.06
