Slide 1

Slide 1 text

Stats of bioRxiv (2021.05.07)
Related Work: https://speakerdeck.com/2hz9qeedd/stats-of-arxiv-2020

Slide 2

Slide 2 text

Summary
- Survey of papers published in bioRxiv
  - By discipline, neuroscience is growing remarkably.
  - More than 40% of submitted manuscripts may eventually be published in journals.
    - For those that were published, it took about half a year from the first submission to bioRxiv to journal publication, and the difference between fields was small.

Slide 3

Slide 3 text

Specification
- bioRxiv
  - All items available through the API as of April 17, 2021 were collected.
- Total data: 117,293 items*
  - Fields collected: title, abstract, author, field, DOI, etc.
  - Period: November 7, 2013 - April 17, 2021
  - Cited references were also obtained using Semantic Scholar.
  - If a journal article DOI has been assigned:
    - The journal name, publication date, etc. were collected separately using CrossRef's API.

* Collected independently
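
The collection step on this slide can be sketched roughly as follows. This is a minimal sketch, not the author's script. It assumes the public bioRxiv `details` endpoint of the form `https://api.biorxiv.org/details/biorxiv/<start>/<end>/<cursor>`, returning JSON with a `collection` list; the helper names `build_url`, `fetch_json`, and `collect` are hypothetical. The fetcher is injected so the paging logic can be exercised without network access.

```python
import json
from typing import Callable, Dict, Iterator, List
from urllib.request import urlopen

# Assumed base URL of the bioRxiv "details" endpoint (not stated in the slides).
BASE = "https://api.biorxiv.org/details/biorxiv"


def build_url(start: str, end: str, cursor: int) -> str:
    """Build a page URL for preprints posted between start and end (YYYY-MM-DD)."""
    return f"{BASE}/{start}/{end}/{cursor}"


def fetch_json(url: str) -> Dict:
    """Fetch one page over HTTP and decode it (hypothetical helper)."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def collect(start: str, end: str,
            fetch: Callable[[str], Dict] = fetch_json) -> Iterator[Dict]:
    """Yield records page by page until the API returns an empty collection."""
    cursor = 0
    while True:
        page: List[Dict] = fetch(build_url(start, end, cursor)).get("collection", [])
        if not page:
            break
        yield from page
        cursor += len(page)  # assumption: the cursor is the offset of the next record
```

Calling `list(collect("2013-11-07", "2021-04-17"))` would walk the whole period covered by the slides, issuing one request per page.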

Slide 4

Slide 4 text

bioRxiv Official Stats
- https://api.biorxiv.org/reporting/home

Slide 5

Slide 5 text

Sample of bioRxiv

Slide 6

Slide 6 text

bioRxiv categories (27 + None = 28)

Slide 7

Slide 7 text

Number of records by category
[Figure: number of records per category; highlighted series: has Journal DOI]
* Counts are not cumulative.

Slide 8

Slide 8 text

Research award information in DOI metadata
[Figure: number of papers with a Journal DOI, with award information, and with award information containing "Japan"]

Slide 9

Slide 9 text

Percentage of papers with Journal DOI by field
- Calculations are based on submissions from October 2016 to the end of September 2020, a period that spans five calendar years.
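
The per-field percentage on this slide is a simple ratio. A minimal sketch, assuming each record is reduced to a `(field, journal_doi)` pair where `journal_doi` is `None` when no journal version is known; the function name `published_share` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def published_share(records: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, float]:
    """Return, per field, the percentage of preprints that have a journal DOI.

    Each record is (field, journal_doi); journal_doi is None (or empty)
    when the preprint has no known journal version.
    """
    total: Counter = Counter()
    published: Counter = Counter()
    for field, journal_doi in records:
        total[field] += 1
        if journal_doi:
            published[field] += 1
    return {field: 100.0 * published[field] / total[field] for field in total}
```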

Slide 10

Slide 10 text

Time from submission to publication with Journal DOI
- Calculations are based on the period from 2013 to the end of September 2020, which spans eight calendar years.
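
The time to publication can be computed from two dates per paper. The sketch below is an illustration only: the slides do not say which summary statistic was used, so the median is an assumption, and the name `median_days_to_publication` is hypothetical.

```python
from datetime import date
from statistics import median
from typing import Dict, Iterable, List, Tuple


def median_days_to_publication(
    records: Iterable[Tuple[str, date, date]]
) -> Dict[str, float]:
    """Median days from first bioRxiv posting to journal publication, per field.

    Each record is (field, posted_on, published_on). The median is an
    assumed choice of statistic, not one confirmed by the slides.
    """
    by_field: Dict[str, List[int]] = {}
    for field, posted, published in records:
        by_field.setdefault(field, []).append((published - posted).days)
    return {field: median(days) for field, days in by_field.items()}
```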

Slide 11

Slide 11 text

Number of citations per field
- Calculations are based on submissions from October 2016 to the end of September 2020, a period that spans five calendar years.

Slide 12

Slide 12 text

Highly Cited Papers
- The most-cited papers are concentrated in a few fields.
  - Neuroscience, genetics, and ecology appear to be the most frequently cited fields.
  - Within the scope of this survey, the maximum number of citations is less than 1,000, which is an order of magnitude lower than the 10,000 citations seen in the field of information science.

Citation counts for submissions from October 2016 to the end of September 2020

 #  DOI                        Date        Category                       Title                                                  Citations
 1  10.1101/080333             2016-10-12  Neuroscience                   Genetic, transcriptome, proteomic and epidemiologi...  741
 2  10.1101/099192             2017-01-09  Genetics                       Watching the clock for 25 years in FlyClockbase: V...  587
 3  10.1101/203943             2017-10-16  Neuroscience                   Degeneracy in hippocampal physiology and plasticit...  573
 4  10.1101/535005             2019-02-01  Ecology                        GIFT – A Global Inventory of Floras and Traits for...  562
 5  10.1101/310763             2018-04-30  Epidemiology                   MicroCOSM: a model of social and structural driver...  558
 6  10.1101/2020.03.23.003384  2020-03-23  Genetics                       Rat models of human diseases and related phenotype...  487
 7  10.1101/2020.03.23.003392  2020-03-23  Genetics                       Rat models of human diseases and related phenotype...  487
 8  10.1101/833988             2019-11-07  Neuroscience                   Arc Regulates Transcription of Genes for Plasticit...  485
 9  10.1101/425488             2018-09-24  Ecology                        Complex responses of global insect pests to climat...  453
10  10.1101/503334             2018-12-26  Ecology                        Data paper: FoRAGE (Functional Responses from Arou...  445
11  10.1101/142760             2017-05-28  Bioinformatics                 Opportunities and obstacles for deep learning in b...  432
12  10.1101/307652             2018-04-28  Neuroscience                   Mapping molecular datasets back to the brain regio...  412
13  10.1101/405688             2018-08-31  Animal Behavior and Cognition  The evolution of infanticide by females in mammals     406
14  10.1101/152264             2017-06-22  Bioinformatics                 Informatics for Cancer Immunotherapy                   404
15  10.1101/2020.07.14.202085  2020-07-14  Neuroscience                   10 years of EPOC: A scoping review of Emotiv’s por...  398

Slide 13

Slide 13 text

Frequency by number of citations (2016-2020)
- The shape appears to be similar to a power-law distribution.
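
One rough way to check this visual impression is to fit a straight line to the frequencies on log-log axes, since a power law appears as a line there. The sketch below is an assumption-laden diagnostic, not the author's method and not a rigorous power-law test; the function name `loglog_slope` is hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def loglog_slope(citations: Iterable[int]) -> float:
    """Least-squares slope of log(frequency) against log(citation count).

    Papers with zero citations are skipped because log(0) is undefined.
    A clearly negative, roughly constant slope is consistent with (but
    does not prove) a power-law shape.
    """
    freq = Counter(c for c in citations if c > 0)
    xs = [math.log(k) for k in freq]
    ys = [math.log(n) for n in freq.values()]
    if len(xs) < 2:
        raise ValueError("need at least two distinct citation counts")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

For synthetic data where the frequency falls as the inverse square of the citation count, the fitted slope comes out close to -2.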

Slide 14

Slide 14 text

Frequency by number of citations (by field), 2016-2020

Slide 15

Slide 15 text

Frequency by number of citations (by field), 2016-2020

Slide 16

Slide 16 text

Frequency by number of citations (by field), 2016-2020

Slide 17

Slide 17 text

Difference from bioRxiv official data
[Figure: number of posts per month (own collection) and deviation from official values, 2013-11 to 2020-12]
- Independently collected data tends to be approximately 0.30% lower than the official values.
- From 2020 onward, many months show almost no error.
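
The deviation quoted on this slide is a signed relative difference. A minimal sketch of that arithmetic, assuming monthly counts are compared one by one; the function name `percent_deviation` and the example numbers are illustrative, not values from the slides.

```python
def percent_deviation(own: int, official: int) -> float:
    """Signed deviation of the own count from the official count, in percent.

    Negative values mean the independently collected count is lower.
    """
    if official == 0:
        raise ValueError("official count must be non-zero")
    return 100.0 * (own - official) / official
```

For example, an own count of 997 against an official count of 1,000 is three papers short, which is the scale of the roughly 0.30% shortfall reported above.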

Slide 18

Slide 18 text

Share of COVID-19-related papers
[Figure: new papers, cumulative papers, and COVID-19-related papers per month, 2014-01 to 2021-01]
- Papers listed as related to COVID-19 account for a fairly small percentage of the total.

Slide 19

Slide 19 text

Countries/Regions and Number of Submissions
- Only the first author's email address is used for each manuscript.
- gmail and hotmail addresses are classified as Unknown.
- .com, .edu, and .org addresses are classified by the country code of the domain administrator.

[Table: Region and Count. Regions in listed order: US, Unknown, UK, Germany, China, France, Canada, Australia, Japan, Switzerland, Netherlands, India, Spain, Sweden, Italy, Israel, Brazil, Denmark, Belgium, Norway, Korea, Finland, Austria, Singapore, Portugal, New Zealand, Poland, Mexico, Taiwan, Argentina]
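
The classification rule on this slide can be sketched as a small lookup. This is a hedged illustration rather than the author's code: the tables `CC_TLD` and `GENERIC_ADMIN` are tiny hypothetical samples, the `example.*` domains are placeholders, and treating unresolved generic domains as Unknown is an assumption.

```python
from typing import Dict

# Free-mail providers carry no affiliation, so they map to "Unknown".
FREE_MAIL = {"gmail.com", "hotmail.com"}

# Illustrative subset of country-code TLDs (not the author's full table).
CC_TLD: Dict[str, str] = {
    "uk": "UK", "de": "Germany", "cn": "China", "jp": "Japan", "fr": "France",
}

# Hypothetical lookup for generic-TLD domains: domain -> administrator's country.
GENERIC_ADMIN: Dict[str, str] = {"example.edu": "US", "example.org": "Switzerland"}


def classify(email: str) -> str:
    """Map a first-author email address to a country/region label."""
    domain = email.rsplit("@", 1)[-1].strip().lower()
    if domain in FREE_MAIL:
        return "Unknown"
    tld = domain.rsplit(".", 1)[-1]
    if tld in CC_TLD:
        return CC_TLD[tld]
    if tld in {"com", "edu", "org"}:
        # Match the registered domain, e.g. "lab.example.edu" -> "example.edu".
        registered = ".".join(domain.split(".")[-2:])
        # Assumption: generic domains missing from the lookup count as Unknown.
        return GENERIC_ADMIN.get(registered, "Unknown")
    return "Unknown"
```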

Slide 20

Slide 20 text

Countries/Regions and Number of Posts
[Table: posts by year (Y) and month (M). Columns: Total, US, Unknown, UK, Germany, China, France, Canada, Australia, Japan, Switzerland, Netherlands, India, Spain, Sweden, Italy, Israel, Brazil, Denmark, Belgium, Norway, Korea, Finland, Austria, Singapore, Portugal]
- Only the first author's email address is used for each manuscript.
- gmail and hotmail addresses are classified as Unknown.
- .com, .edu, and .org addresses are classified by the country code of the domain administrator.

Slide 21

Slide 21 text

Field distribution

Slide 22

Slide 22 text

Field distribution

Slide 23

Slide 23 text

Field distribution

Slide 24

Slide 24 text

Field distribution

Slide 25

Slide 25 text

Field distribution

Slide 26

Slide 26 text

Field distribution

Slide 27

Slide 27 text

Field distribution

Slide 28

Slide 28 text

Field distribution

Slide 29

Slide 29 text

Field distribution

Slide 30

Slide 30 text

Field distribution

Slide 31

Slide 31 text

Field distribution

Slide 32

Slide 32 text

Field distribution

Slide 33

Slide 33 text

Field distribution

Slide 34

Slide 34 text

Field distribution

Slide 35

Slide 35 text

Field distribution

Slide 36

Slide 36 text

Field distribution
[Figure: field composition ratios compressed into two dimensions by multidimensional scaling (MDS)]
- The field compositions of China and India are similar.
- Unknown, China and India, Italy, and Japan differ in composition from the other countries and regions.
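
Multidimensional scaling takes a matrix of pairwise dissimilarities as input. The sketch below shows only that preparation step: it converts per-region field counts into composition ratios and computes pairwise distances. The Euclidean metric is an assumption (the slides do not name one), the embedding into two dimensions is not shown, and the names `composition` and `distance_matrix` are hypothetical.

```python
import math
from typing import Dict, List


def composition(counts: Dict[str, int], fields: List[str]) -> List[float]:
    """Convert per-field counts into ratios that sum to 1."""
    total = sum(counts.get(f, 0) for f in fields)
    if total == 0:
        raise ValueError("region has no submissions in the given fields")
    return [counts.get(f, 0) / total for f in fields]


def distance_matrix(
    regions: Dict[str, Dict[str, int]], fields: List[str]
) -> Dict[str, Dict[str, float]]:
    """Pairwise Euclidean distance between regional field compositions.

    Euclidean distance is an assumed metric; the result is the kind of
    dissimilarity matrix an MDS routine would take as input.
    """
    ratios = {r: composition(c, fields) for r, c in regions.items()}
    return {a: {b: math.dist(ratios[a], ratios[b]) for b in ratios} for a in ratios}
```

Two regions with the same field proportions get a distance of zero regardless of their total submission counts, which is why regions with similar compositions end up close together after scaling.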