Stats of arXiv (2020)

FSCjJh3NeB
February 08, 2020

We collected the papers archived on arXiv:
about 1.6 million,
submitted between April 1986 and January 2020.

The number of Mathematics and Computer Science papers is increasing quickly.

About 40% of papers had a DOI (arXiv submissions from 2014 to 2018).
- Astrophysics: 75%
- Math & CS: 20%
- Physics: 60%

In addition,
the time between the arXiv submission date and the DOI publication date (arXiv submissions from 2000 to 2017)
is as follows:
- Astrophysics: 3 months
- CS: 7 months
- Math: 11 months
- Physics: 6 months

Transcript

  1. Summary

    - A survey of papers submitted to the arXiv.
      - In terms of disciplines, the growth of information science has been remarkable since 2017.
      - More than 40% of the submitted papers have DOIs.
        - It is possible that more than 40% of the papers are eventually accepted for publication.
        - However, there is a large bias between fields: only about 20% of informatics submissions have DOIs.
        - The interval to publication varies greatly by field, but it is typically about 6 months from arXiv registration, and about a year or less even in the slowest fields.
  2. Data desc.

    - arXiv
      - Collect all items that can be collected as of Jan 21, 2020 through the API.
    - Total number of records: 1,622,763.
      - Items: Title, Abstract, Author, Field, DOI, etc.
      - Period: Apr 25, 1986* - Jan 17, 2020
        - Also obtain cited references through Semantic Scholar.
        - If a DOI is assigned, the journal name, publication date, etc. are also collected separately using CrossRef's API.

    * The arXiv started in 1991, but some of the submission dates are earlier than that.
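The deck does not say which arXiv endpoint was used. As a minimal sketch, assuming the public query API's Atom output, the items listed above could be pulled out of a feed as follows. The sample XML is hand-written for illustration (not real API output), and `parse_entries` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespaces for Atom and for arXiv's extension elements.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}

# Hand-written sample shaped like an arXiv Atom entry (assumption, not real data).
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:arxiv="http://arxiv.org/schemas/atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <published>2015-02-11T00:00:00Z</published>
    <title>An Example Title</title>
    <summary>An example abstract.</summary>
    <author><name>A. Author</name></author>
    <arxiv:doi>10.0000/example</arxiv:doi>
    <arxiv:primary_category term="cs.LG"/>
    <category term="cs.LG"/>
    <category term="stat.ML"/>
  </entry>
</feed>"""


def parse_entries(feed_xml):
    """Return one dict per entry: title, abstract, authors, fields, DOI."""
    root = ET.fromstring(feed_xml)
    records = []
    for entry in root.findall("atom:entry", NS):
        doi = entry.find("arxiv:doi", NS)  # absent when no DOI is registered
        records.append({
            "id": entry.findtext("atom:id", default="", namespaces=NS),
            "published": entry.findtext("atom:published", default="", namespaces=NS),
            "title": entry.findtext("atom:title", default="", namespaces=NS),
            "abstract": entry.findtext("atom:summary", default="", namespaces=NS),
            "authors": [a.findtext("atom:name", default="", namespaces=NS)
                        for a in entry.findall("atom:author", NS)],
            "categories": [c.get("term") for c in entry.findall("atom:category", NS)],
            "doi": doi.text if doi is not None else None,
        })
    return records


records = parse_entries(SAMPLE_FEED)
```

Records with a non-empty `doi` are the ones that would then be looked up in CrossRef for journal name and publication date.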
  3. Number of recorded data

    [Figure labels: Field, has/hasn't DOI]

    * It's not cumulative. Only the first field is counted.
  4. 12 Fields of these slides

    Meta Class  Description
    astro-ph    Astrophysics
    cond-mat    Material
    cs          Computer Science
    econ        Economics
    hep         High-energy Physics
    math        Math
    nlin        Non-Linear
    nucl        Nuclear
    physics     Physics
    q-bio       Biology
    q-fin       Finance
    stat        Statistics

    Restructuring of arXiv's 153 fields into 12 categories. For details, see the Appendix.
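The restructuring can be sketched as a small lookup. This is a minimal sketch, not the author's code: the special cases (`gr-qc`, `math-ph`, `quant-ph`, `eess.*`, `hep-*`, `nucl-*`) are read off the appendix table, while the function name `meta_class` and the `|` separator between fields (as seen in the citation table) are assumptions.

```python
# Classes whose meta class is not the prefix before the dot (from the appendix).
SPECIAL_CASES = {
    "gr-qc": "astro-ph",
    "math-ph": "math",
    "quant-ph": "physics",
}
# Whole archives that the appendix files under another meta class.
PREFIX_OVERRIDES = {"eess": "cs"}


def meta_class(categories, sep="|"):
    """Map a field string such as 'cs.CL|cs.LG|stat.ML' to one of 12 meta classes.

    Only the first listed field is used, matching the deck's counting rule.
    """
    first = categories.split(sep)[0].strip()
    if first in SPECIAL_CASES:
        return SPECIAL_CASES[first]
    prefix = first.split(".")[0]
    # hep-ex, hep-lat, hep-ph, hep-th -> hep; nucl-ex, nucl-th -> nucl
    for stem in ("hep", "nucl"):
        if prefix.startswith(stem + "-"):
            return stem
    return PREFIX_OVERRIDES.get(prefix, prefix)
```

For example, `meta_class("cs.CL|cs.LG|stat.ML")` yields `cs`, and `meta_class("gr-qc")` yields `astro-ph`.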
  5. Research grants/awards in DOI information

    Number of papers:
    - with DOI
    - with Award information
    - with Award information including "Japan" in the list
  6. Rate of DOI granting per field

    - Calculations are based on five years of submissions from 2014 to 2018.

    Only a small percentage of papers in the mathematics and computer science fields are published in journals.
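The rate itself is a per-field share. A minimal sketch, assuming each record carries a meta class, a submission year, and a DOI that is `None` when absent (the record layout and the name `doi_rate_by_field` are hypothetical):

```python
from collections import defaultdict


def doi_rate_by_field(papers, first_year=2014, last_year=2018):
    """Share of papers with a DOI per field, restricted to a submission window."""
    totals = defaultdict(int)
    with_doi = defaultdict(int)
    for paper in papers:
        if first_year <= paper["year"] <= last_year:
            totals[paper["field"]] += 1
            if paper["doi"]:
                with_doi[paper["field"]] += 1
    return {field: with_doi[field] / totals[field] for field in totals}


# Made-up toy records, only to show the shape of the input.
toy = [
    {"field": "astro-ph", "year": 2015, "doi": "10.0000/a"},
    {"field": "astro-ph", "year": 2016, "doi": None},
    {"field": "cs", "year": 2014, "doi": None},
    {"field": "cs", "year": 2013, "doi": "10.0000/b"},  # outside the window
]
rates = doi_rate_by_field(toy)
```

Restricting the window to 2018 and earlier leaves recent submissions out, which have had less time to reach a journal.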
  7. Time from submission to publication with DOI

    - Calculations are based on 18 years of submissions between 2000 and 2017.

    The time between preprint submission and journal publication is long in the field of mathematics.
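The deck does not state which statistic was used for the interval. A minimal sketch, assuming whole calendar months and a per-field median (both are assumptions, as are the names `months_between` and `median_lag_by_field`):

```python
from collections import defaultdict
from datetime import date
from statistics import median


def months_between(submitted, published):
    """Whole calendar months from arXiv submission to DOI publication date."""
    return (published.year - submitted.year) * 12 + (published.month - submitted.month)


def median_lag_by_field(records):
    """records: iterable of (field, submitted_date, published_date) tuples."""
    lags = defaultdict(list)
    for field, submitted, published in records:
        lags[field].append(months_between(submitted, published))
    return {field: median(values) for field, values in lags.items()}


# Made-up toy records, only to show the calculation.
toy = [
    ("cs", date(2015, 1, 10), date(2015, 8, 1)),
    ("cs", date(2015, 1, 10), date(2015, 6, 1)),
    ("cs", date(2016, 3, 1), date(2017, 3, 1)),
]
lag = median_lag_by_field(toy)
```

A paper published before it was posted to arXiv gives a negative lag here; whether such records were kept is not stated in the deck.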
  8. Top 5 DOI recipients by field (1986-2020)

    ctg       title  count
    astro-ph  The Astrophysical Journal  66168
    astro-ph  Monthly Notices of the Royal Astronomical Society  46747
    astro-ph  Physical Review D  34640
    astro-ph  Astronomy & Astrophysics  29896
    astro-ph  Journal of Cosmology and Astroparticle Physics  9880
    cond-mat  Physical Review B  74769
    cond-mat  Physical Review Letters  34033
    cond-mat  Physical Review E  20297
    cond-mat  Physical Review A  11216
    cond-mat  Applied Physics Letters  6801
    cs        Electronic Proceedings in Theoretical Computer Science  3983
    cs        IEEE Transactions on Signal Processing  1143
    cs        IEEE Transactions on Information Theory  1060
    cs        Logical Methods in Computer Science  583
    cs        IEEE Transactions on Wireless Communications  488
    hep       Physical Review D  65614
    hep       Journal of High Energy Physics  33701
    hep       Physics Letters B  27796
    hep       Nuclear Physics B  14027
    hep       Physical Review Letters  10340
    math      Journal of Mathematical Physics  7674
    math      Communications in Mathematical Physics  6842
    math      Journal of Physics A: Mathematical and Theoretical  6132
    math      Journal of Statistical Physics  2962
    math      Journal of High Energy Physics  2757
    nlin      Physical Review E  4411
    nlin      Physical Review Letters  1942
    nlin      Journal of Physics A: Mathematical and Theoretical  940
    nlin      Journal of Physics A: Mathematical and General  815
    nlin      Physics Letters A  805
    nucl      Physical Review C  14503
    nucl      Physical Review D  4835
    nucl      Nuclear Physics A  4555
    nucl      Physics Letters B  4094
    nucl      Physical Review Letters  3130
    physics   Physical Review A  27012
    physics   Physical Review Letters  14807
    physics   Physical Review E  8539
    physics   Physical Review B  5882
    physics   New Journal of Physics  3941
    q-bio     Physical Review E  1836
    q-bio     Physical Review Letters  411
    q-bio     PLoS ONE  317
    q-bio     The Journal of Chemical Physics  269
    q-bio     PLoS Computational Biology  231
    q-fin     Physica A: Statistical Mechanics and its Applications  647
    q-fin     Physical Review E  119
    stat      The Annals of Statistics  1385
    stat      The Annals of Applied Statistics  897
    stat      Bernoulli  524
    stat      Statistical Science  411
    stat      IEEE Transactions on Signal Processing  208
  9. Number of citations per field

    - Calculations are based on five years of submissions from 2014 to 2018.

    The field of computer science has a high number of citations.
  10. Number of citations per field (Top 15)

    #   aid           date     category                     title  cite
    1   1502.03167v3  2015-02  cs.LG                        Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift  9999
    2   1409.4842v1   2014-09  cs.CV                        Going Deeper with Convolutions  9998
    3   1201.0490v4   2012-01  cs.LG|cs.MS                  Scikit-learn: Machine Learning in Python  9997
    4   1310.4546v1   2013-10  cs.CL|cs.LG|stat.ML          Distributed Representations of Words and Phrases and their Compositionality  9997
    5   1409.1556v6   2014-09  cs.CV                        Very Deep Convolutional Networks for Large-Scale Image Recognition  9996
    6   1412.6980v9   2014-12  cs.LG                        Adam: A Method for Stochastic Optimization  9996
    7   1512.03385v1  2015-12  cs.CV                        Deep Residual Learning for Image Recognition  9996
    8   1409.0575v3   2014-09  cs.CV|I.4.8; I.5.2           ImageNet Large Scale Visual Recognition Challenge  9994
    9   1506.01497v3  2015-06  cs.CV                        Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks  9994
    10  1301.3781v3   2013-01  cs.CL                        Efficient Estimation of Word Representations in Vector Space  8977
    11  1408.5093v1   2014-06  cs.CV|cs.LG|cs.NE            Caffe: Convolutional Architecture for Fast Feature Embedding  8977
    12  1409.0473v7   2014-09  cs.CL|cs.LG|cs.NE|stat.ML    Neural Machine Translation by Jointly Learning to Align and Translate  8727
    13  1406.5823v1   2014-06  stat.CO                      Fitting Linear Mixed-Effects Models using lme4  8708
    14  1311.2524v5   2013-11  cs.CV                        Rich feature hierarchies for accurate object detection and semantic segmentation  8145
    15  1505.04597v1  2015-05  cs.CV                        U-Net: Convolutional Networks for Biomedical Image Segmentation  7797

    Highly cited papers are biased toward information science fields.
  11. Percentage of email addresses detected by category

    - Approximately 75% of papers include a contact email address.

    Linking regions to papers based on email addresses:
    - If there are multiple email addresses, use only the first one.
    - Gmail and Hotmail are classified as unknown.
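The linking rule can be sketched as follows. This is a minimal sketch under stated assumptions: the deck does not describe how a domain is turned into a region, so the top-level-domain table below (including `.edu` counted as US) is hypothetical, as are the regular expression and the name `region_of`.

```python
import re

# Simple pattern for an email address; captures the domain part.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)")

# Free-mail providers carry no regional information (rule from the slide).
FREE_MAIL = {"gmail.com", "hotmail.com"}

# Hypothetical mapping from top-level domain to region (assumption).
TLD_TO_REGION = {
    "edu": "US", "jp": "Japan", "cn": "China", "fr": "France",
    "de": "Germany", "uk": "UK", "it": "Italy",
}


def region_of(text):
    """Return the region of the first email address in text.

    None means no address was detected; 'unknown' means an address was
    found but could not be tied to a region.
    """
    match = EMAIL_RE.search(text)  # only the first address is used
    if match is None:
        return None
    domain = match.group(1).lower()
    if domain in FREE_MAIL:
        return "unknown"
    tld = domain.rsplit(".", 1)[-1]
    return TLD_TO_REGION.get(tld, "unknown")
```

For example, `region_of("Contact: taro@example.ac.jp, hanako@gmail.com")` gives `Japan`, because only the first address counts.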
  12. US

    - First or second place in all fields
    - Computer Science is the field with the largest number
      - Artificial Intelligence (cs.LG, stat.ML, cs.AI) dominates in number

    [Table: Distribution of rankings in 153 fields. Columns: Rank, Count, Pct, Pct; last rows: Over, None. Values not recovered.]
    Annotations: Ranked 1st in 97 of 153 fields; Fields with zero publications; Fields below 10th place.

    [Table: High ranking and number fields. Columns: category, cnt, rank. Values not recovered.]
    Categories: cs.LG, stat.ML, cs.CV, cs.CL, cs.AI, math.OC, cond-mat.mtrl-sci, cs.SY, cs.RO, astro-ph.GA, eess.SP, cs.CR, cond-mat.mes-hall, astro-ph.HE, cs.IT
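  13. Japan

    Ratio of the number of papers, taking the U.S. as 1.

    [Table: Rank, Count, Pct, Pct; last rows: Over, None. Values not recovered.]

    [Table: category, cnt, rank. Values not recovered.]
    Categories: nlin.CG, cond-mat.supr-con, math.GT, hep-lat, q-bio.CB, cond-mat.str-el, cond-mat.stat-mech, nucl-th, math.AC, math.AT, math.OA, math.SG, math.GN, cond-mat.other, cs.MM, q-fin.ST, q-bio.MN, cond-mat.mtrl-sci, math.AG, cond-mat.mes-hall, math.DG, math.RT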
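  14. China

    Ratio of the number of papers, taking the U.S. as 1.

    [Table: Rank, Count, Pct, Pct; last rows: Over, None. Values not recovered.]

    [Table: category, cnt, rank. Values not recovered.]
    Categories: hep-ph, nucl-th, cond-mat.supr-con, cond-mat.quant-gas, cs.MM, cs.CV, quant-ph, cond-mat.mtrl-sci, cond-mat.mes-hall, cs.IT, math.IT, eess.SP, physics.optics, math.AP, gr-qc, eess.IV, cond-mat.str-el, physics.app-ph, hep-ex, cs.NI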
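  15. France

    Ratio of the number of papers, taking the U.S. as 1.

    [Table: Rank, Count, Pct, Pct; last rows: Over, None. Values not recovered.]

    [Table: category, cnt, rank. Values not recovered.]
    Categories: q-fin.CP, cs.SC, math.PR, math-ph, math.MP, math.NT, math.ST, stat.TH, cond-mat.dis-nn, math.SP, physics.class-ph, physics.geo-ph, cs.FL, physics.ao-ph, q-bio.TO, math.HO, cs.MM, q-fin.GN, q-bio.CB, q-bio.SC, q-bio.OT, nlin.CG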
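  16. Germany

    Ratio of the number of papers, taking the U.S. as 1.

    [Table: Rank, Count, Pct, Pct; last rows: Over, None. Values not recovered.]

    [Table: category, cnt, rank. Values not recovered.]
    Categories: cs.MS, physics.atm-clus, math.NA, cond-mat.soft, physics.chem-ph, physics.ins-det, physics.atom-ph, cs.LO, hep-lat, physics.bio-ph, cs.DM, math.AT, cs.CE, physics.acc-ph, math.KT, cs.FL, cs.DL, q-bio.CB, physics.pop-ph, cs.GL, nlin.CG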
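  17. UK

    Ratio of the number of papers, taking the U.S. as 1.

    [Table: Rank, Count, Pct, Pct; last rows: Over, None. Values not recovered.]

    [Table: category, cnt, rank. Values not recovered.]
    Categories: q-fin.RM, q-bio.CB, q-fin.MF, q-fin.TR, q-fin.PM, q-fin.PR, q-bio.OT, astro-ph.GA, astro-ph.CO, astro-ph.SR, astro-ph.EP, physics.flu-dyn, stat.ME, stat.AP, cs.CY, math.GR, cs.MA, q-bio.NC, stat.CO, q-bio.QM, math.AT, q-bio.PE, econ.EM, cs.ET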
  18. Italy

    Ratio of the number of papers, taking the U.S. as 1.

    [Table: Rank, Count, Pct, Pct; last rows: Over, None. Values not recovered.]

    [Table: category, cnt, rank. Values not recovered.]
    Categories: nlin.CG, cs.DL, physics.ed-ph, physics.pop-ph, cs.GL, nlin.AO, physics.class-ph, physics.space-ph, q-bio.CB, q-bio.OT, q-fin.PR, stat.OT, astro-ph.HE, math-ph, math.AC, math.AP, math.CV, math.GN, math.HO, math.MP, physics.hist-ph, physics.ins-det, physics.soc-ph, q-fin.GN, q-fin.MF, q-fin.RM, q-fin.ST
  19. Appendix: Correspondence between 153 fields and 12 categories

    Meta Class  Class               Description
    astro-ph    astro-ph            Astrophysics
    astro-ph    astro-ph.CO         Cosmology and Nongalactic Astrophysics
    astro-ph    astro-ph.EP         Earth and Planetary Astrophysics
    astro-ph    astro-ph.GA         Astrophysics of Galaxies
    astro-ph    astro-ph.HE         High Energy Astrophysical Phenomena
    astro-ph    astro-ph.IM         Instrumentation and Methods for Astrophysics
    astro-ph    astro-ph.SR         Solar and Stellar Astrophysics
    astro-ph    gr-qc               General Relativity and Quantum Cosmology
    cond-mat    cond-mat.dis-nn     Disordered Systems and Neural Networks
    cond-mat    cond-mat.mes-hall   Mesoscale and Nanoscale Physics
    cond-mat    cond-mat.mtrl-sci   Materials Science
    cond-mat    cond-mat.other      Other Condensed Matter
    cond-mat    cond-mat.quant-gas  Quantum Gases
    cond-mat    cond-mat.soft       Soft Condensed Matter
    cond-mat    cond-mat.stat-mech  Statistical Mechanics
    cond-mat    cond-mat.str-el     Strongly Correlated Electrons
    cond-mat    cond-mat.supr-con   Superconductivity
  20. Appendix: Correspondence between 153 fields and 12 categories

    Meta Class  Class               Description
    cs          cs.AI               Artificial Intelligence
    cs          cs.AR               Hardware Architecture
    cs          cs.CC               Computational Complexity
    cs          cs.CE               Computational Engineering, Finance, and Science
    cs          cs.CG               Computational Geometry
    cs          cs.CL               Computation and Language
    cs          cs.CR               Cryptography and Security
    cs          cs.CV               Computer Vision and Pattern Recognition
    cs          cs.CY               Computers and Society
    cs          cs.DB               Databases
    cs          cs.DC               Distributed, Parallel, and Cluster Computing
    cs          cs.DL               Digital Libraries
    cs          cs.DM               Discrete Mathematics
    cs          cs.DS               Data Structures and Algorithms
    cs          cs.ET               Emerging Technologies
    cs          cs.FL               Formal Languages and Automata Theory
    cs          cs.GL               General Literature
    cs          cs.GR               Graphics
    cs          cs.GT               Computer Science and Game Theory
    cs          cs.HC               Human-Computer Interaction
    cs          cs.IR               Information Retrieval
  21. Appendix: Correspondence between 153 fields and 12 categories

    Meta Class  Class               Description
    cs          cs.IT               Information Theory
    cs          cs.LG               Learning
    cs          cs.LO               Logic in Computer Science
    cs          cs.MA               Multiagent Systems
    cs          cs.MM               Multimedia
    cs          cs.MS               Mathematical Software
    cs          cs.NA               Numerical Analysis
    cs          cs.NE               Neural and Evolutionary Computing
    cs          cs.NI               Networking and Internet Architecture
    cs          cs.OH               Other Computer Science
    cs          cs.OS               Operating Systems
    cs          cs.PF               Performance
    cs          cs.PL               Programming Languages
    cs          cs.RO               Robotics
    cs          cs.SC               Symbolic Computation
    cs          cs.SD               Sound
    cs          cs.SE               Software Engineering
    cs          cs.SI               Social and Information Networks
    cs          cs.SY               Systems and Control
    cs          eess.AS             Audio and Speech Processing
    cs          eess.IV             Image and Video Processing
    cs          eess.SP             Signal Processing
  22. Appendix: Correspondence between 153 fields and 12 categories

    Meta Class  Class               Description
    econ        econ.EM             Econometrics
    hep         hep-ex              High Energy Physics - Experiment
    hep         hep-lat             High Energy Physics - Lattice
    hep         hep-ph              High Energy Physics - Phenomenology
    hep         hep-th              High Energy Physics - Theory
  23. Appendix: Correspondence between 153 fields and 12 categories

    Meta Class  Class               Description
    math        math-ph             Mathematical Physics
    math        math.AC             Commutative Algebra
    math        math.AG             Algebraic Geometry
    math        math.AP             Analysis of PDEs
    math        math.AT             Algebraic Topology
    math        math.CA             Classical Analysis and ODEs
    math        math.CO             Combinatorics
    math        math.CT             Category Theory
    math        math.CV             Complex Variables
    math        math.DG             Differential Geometry
    math        math.DS             Dynamical Systems
    math        math.FA             Functional Analysis
    math        math.GM             General Mathematics
    math        math.GN             General Topology
    math        math.GR             Group Theory
    math        math.GT             Geometric Topology
    math        math.HO             History and Overview
    math        math.IT             Information Theory
    math        math.KT             K-Theory and Homology
    math        math.LO             Logic
  24. Appendix: Correspondence between 153 fields and 12 categories

    Meta Class  Class               Description
    math        math.MG             Metric Geometry
    math        math.MP             Mathematical Physics
    math        math.NA             Numerical Analysis
    math        math.NT             Number Theory
    math        math.OA             Operator Algebras
    math        math.OC             Optimization and Control
    math        math.PR             Probability
    math        math.QA             Quantum Algebra
    math        math.RA             Rings and Algebras
    math        math.RT             Representation Theory
    math        math.SG             Symplectic Geometry
    math        math.SP             Spectral Theory
    math        math.ST             Statistics Theory
  25. Appendix: Correspondence between 153 fields and 12 categories

    Meta Class  Class               Description
    nlin        nlin.AO             Adaptation and Self-Organizing Systems
    nlin        nlin.CD             Chaotic Dynamics
    nlin        nlin.CG             Cellular Automata and Lattice Gases
    nlin        nlin.PS             Pattern Formation and Solitons
    nlin        nlin.SI             Exactly Solvable and Integrable Systems
    nucl        nucl-ex             Nuclear Experiment
    nucl        nucl-th             Nuclear Theory
  26. Appendix: Correspondence between 153 fields and 12 categories

    Meta Class  Class               Description
    physics     physics.acc-ph      Accelerator Physics
    physics     physics.ao-ph       Atmospheric and Oceanic Physics
    physics     physics.app-ph      Applied Physics
    physics     physics.atm-clus    Atomic and Molecular Clusters
    physics     physics.atom-ph     Atomic Physics
    physics     physics.bio-ph      Biological Physics
    physics     physics.chem-ph     Chemical Physics
    physics     physics.class-ph    Classical Physics
    physics     physics.comp-ph     Computational Physics
    physics     physics.data-an     Data Analysis, Statistics and Probability
    physics     physics.ed-ph       Physics Education
    physics     physics.flu-dyn     Fluid Dynamics
    physics     physics.gen-ph      General Physics
    physics     physics.geo-ph      Geophysics
    physics     physics.hist-ph     History and Philosophy of Physics
    physics     physics.ins-det     Instrumentation and Detectors
    physics     physics.med-ph      Medical Physics
    physics     physics.optics      Optics
    physics     physics.plasm-ph    Plasma Physics
    physics     physics.pop-ph      Popular Physics
    physics     physics.soc-ph      Physics and Society
    physics     physics.space-ph    Space Physics
    physics     quant-ph            Quantum Physics
  27. Appendix: Correspondence between 153 fields and 12 categories

    Meta Class  Class               Description
    q-bio       q-bio.BM            Biomolecules
    q-bio       q-bio.CB            Cell Behavior
    q-bio       q-bio.GN            Genomics
    q-bio       q-bio.MN            Molecular Networks
    q-bio       q-bio.NC            Neurons and Cognition
    q-bio       q-bio.OT            Other Quantitative Biology
    q-bio       q-bio.PE            Populations and Evolution
    q-bio       q-bio.QM            Quantitative Methods
    q-bio       q-bio.SC            Subcellular Processes
    q-bio       q-bio.TO            Tissues and Organs
    q-fin       q-fin.CP            Computational Finance
    q-fin       q-fin.EC            Economics
    q-fin       q-fin.GN            General Finance
    q-fin       q-fin.MF            Mathematical Finance
    q-fin       q-fin.PM            Portfolio Management
    q-fin       q-fin.PR            Pricing of Securities
    q-fin       q-fin.RM            Risk Management
    q-fin       q-fin.ST            Statistical Finance
    q-fin       q-fin.TR            Trading and Market Microstructure
  28. Appendix: Correspondence between 153 fields and 12 categories

    Meta Class  Class               Description
    stat        stat.AP             Applications
    stat        stat.CO             Computation
    stat        stat.ME             Methodology
    stat        stat.ML             Machine Learning
    stat        stat.OT             Other Statistics
    stat        stat.TH             Statistics Theory