CSIndexbr: A Brazilian Computer Science Index (Talk at CIn/UFPE)

ASERG, DCC, UFMG

February 25, 2019

Transcript

  1. Scholarly Communication Tools
    Kramer, Bianca; Bosman, Jeroen (2015): 101 Innovations in Scholarly Communication - the Changing Research Workflow. https://doi.org/10.6084/m9.figshare.1286826.v1
  2. CSIndexbr is an experimental scholarly communication tool, with two goals:
    - Discovery, e.g. What are the "best" Brazilian papers in my area?
    - Assessment, e.g. What are the "best" Brazilian CS depts in my area?
  3. Index of papers published by Brazilian CS professors in selected conferences and journals in the last five years (2014-today)
    - Transparent
    - Open
    - but "unofficial"
  4. Primary data source: DBLP (dblp.org)
    - High-quality metadata about CS papers
    - Covers all relevant CS venues
    - Open-license
    - Very reliable API
  5. Research Areas (21)
    1. Software Engineering
    2. Programming Languages
    3. Human-Computer Interaction
    4. Computer Networks
    5. Distributed Systems
    6. Computer Architecture & HPC
    7. Hardware Design
    8. Databases & Inf. Systems
    9. Web & Information Retrieval
    10. Data Mining & Mach. Learning
    11. Artificial Intelligence
    12. Algorithms & Complexity
    13. Formal Methods & Logic
    14. Operational Research
    15. Security & Cryptography
    16. Computer Vision
    17. Computer Graphics
    18. Robotics
    19. CS Education
    20. Bioinformatics
    21. Computer Science (General)
  6. Conferences (178)
    - 15 conferences / area (max)
    - Only full, main-track papers (10 pages)
    - Short, tool, workshop etc. papers are not indexed
    - Criteria:
      - submitted > 100 papers
      - acceptance < 30%
      - h5-index > 20
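The three thresholds on this slide can be read as a simple predicate. The sketch below is illustrative only: the `Conference` record, its field names, and the sample values are hypothetical and not taken from CSIndexbr's code, and it ignores both the per-area exceptions and the cap of 15 conferences per area.

```python
from dataclasses import dataclass


@dataclass
class Conference:
    # Hypothetical record; field names are assumptions for illustration.
    name: str
    submitted: int          # number of submitted papers
    acceptance_rate: float  # fraction in [0, 1]
    h5_index: int


def meets_conference_criteria(conf: Conference) -> bool:
    """Apply the three thresholds listed on the slide."""
    return (
        conf.submitted > 100
        and conf.acceptance_rate < 0.30
        and conf.h5_index > 20
    )


# Made-up sample values, not real venue statistics.
candidates = [
    Conference("ConfA", submitted=250, acceptance_rate=0.22, h5_index=35),
    Conference("ConfB", submitted=80, acceptance_rate=0.25, h5_index=40),
    Conference("ConfC", submitted=300, acceptance_rate=0.41, h5_index=28),
]
selected = [c.name for c in candidates if meets_conference_criteria(c)]
print(selected)
```

Here only `ConfA` passes: `ConfB` has too few submissions and `ConfC` has too high an acceptance rate.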
  7. Exceptions:
    - Many areas: full papers < 10 pages
    - Computer Networks: 18 confs
    - Algorithms & Complexity: accept. ~ 40%
    - etc.
  8. Top-Conferences
    - 3 top-conferences / area (max)
    - submitted > 180 papers
    - h5-index > 30
  9. Journals (175)
    - 15 journals / area (max)
    - Criteria:
      - Indexed by JCR
      - h5-index > 25
  10. Top-Journals
    - 3 top-journals / area (max)
    - Criteria:
      - ACM Transactions or IEEE Transactions (or similar)
  11. Multi-Journals
    - Publish papers in more than one area
    - Require manual division of the papers
    - 6 multi-journals:
      - Softw., Pract. Exper. (SE, PL, DS)
      - Journal of Systems and Software (SE, DS)
      - TACO (PL, Arch)
      - Sci. Comput. Program. (PL, SE, Formal)
      - Concurrency and Computation (DS, Arch)
      - Theor. Comput. Sci. (Algorith, Formal)
  12. Department Rankings
    - 1.0: paper in top-conference or top-journal
    - 0.40: paper in journals
    - 0.33: paper in
      - conferences
      - magazines
      - journals with short papers
      - mega-journals
      - journals with normalized-h5-index < 0.2
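The weights on this slide can be turned into a small scoring function. This is a minimal sketch, not the project's code: the category labels (`top`, `journal`, `other`) are hypothetical names, and summing the per-paper weights to get a department's score is an assumption, since the slide lists the weights but not the aggregation.

```python
# Weights from the slide; the category labels are hypothetical names.
WEIGHTS = {
    "top": 1.0,       # top-conference or top-journal
    "journal": 0.40,  # regular journal
    "other": 0.33,    # conference, magazine, short-paper journal,
                      # mega-journal, or normalized h5-index < 0.2
}


def department_score(paper_categories):
    """Sum the per-paper weights (summation is an assumption)."""
    return sum(WEIGHTS[category] for category in paper_categories)


# Made-up example: one top paper, two journal papers, one conference paper.
papers = ["top", "journal", "journal", "other"]
score = round(department_score(papers), 2)
print(score)
```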
  13. (2) Journals vs Conferences
    - 178 conferences, 175 journals
    - Papers
      - conferences: 676
      - journals: 1,943
    - ~3 times more papers in journals
  14. 43% of the papers are from 12 journals (7%)
    - All are commercial journals
    - 11 journals from Elsevier
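The rounded figures on the two preceding slides can be checked with a few lines of arithmetic. All four inputs come from the slides; only the variable names are introduced here.

```python
conference_papers = 676
journal_papers = 1943
journals_indexed = 175
top_journals = 12

# "~3 times more papers in journals"
ratio = journal_papers / conference_papers
# "12 journals (7%)": share of the indexed journals
top_share = top_journals / journals_indexed

print(f"journal/conference paper ratio: {ratio:.2f}")
print(f"share of indexed journals: {top_share:.1%}")
```

The ratio comes out just under 3 and the share just under 7%, consistent with the rounded values on the slides.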
  15. arXiv popularity (worldwide): 23%
    Popularity of arXiv.org within Computer Science. Charles Sutton and Linan Gong, https://arxiv.org/pdf/1710.05225.pdf
  16. CrossRef Citations
    - Crossref is an official DOI registration agency
    - They maintain a database of citations
      - used by ACM DL, IEEE DL, Dimensions etc.
    - Has a public API (unlike Google Scholar)
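As an illustration of what the public API offers, the sketch below reads a citation count from a Crossref REST API "work" record, which reports it in the `is-referenced-by-count` field. Whether CSIndexbr uses this exact endpoint is an assumption; the DOI and the sample payload are made up, and the network call is defined but not executed.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

CROSSREF_WORKS = "https://api.crossref.org/works/"


def works_url(doi: str) -> str:
    """Build the Crossref REST API URL for a single DOI."""
    return CROSSREF_WORKS + quote(doi, safe="/")


def citation_count(payload: dict) -> int:
    """Read the citation count from a Crossref 'work' response."""
    return payload["message"]["is-referenced-by-count"]


def fetch_citation_count(doi: str) -> int:
    """Query Crossref over the network (not called in this sketch)."""
    with urlopen(works_url(doi), timeout=30) as resp:
        return citation_count(json.load(resp))


# Trimmed, made-up payload with the same shape as a real response.
sample = json.loads(
    '{"status": "ok", "message": {"is-referenced-by-count": 12}}'
)
url = works_url("10.1000/example-doi")  # hypothetical DOI
count = citation_count(sample)
print(url, count)
```

Keeping the parsing separate from the HTTP call makes the parsing testable offline, which matters for a batch job that takes 90 minutes to update all citations.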
  17. CrossRef vs Google Scholar
    "Google Scholar unique citations have, on average, a much lower scientific impact than citations also found by WoS/Scopus".
    Google Scholar, Web of Science, and Scopus: a systematic comparison of citations in 252 subject categories, J. of Informetrics, 2018. https://arxiv.org/pdf/1808.05053.pdf
  18. Deprecated feature: "Trending Papers"
    - Page with the most cited papers (in all areas)
    - Reasons for deprecation:
      - Areas with more papers have more citations
      - Some kinds of papers attract more citations (systematic literature reviews, surveys, generic tools etc.)
  19. How to Contribute
    - Indicating a missing Brazilian professor
    - Indicating a missing (and relevant) conference/journal
    - Indicating a missing paper:
      - But first check that it is a full paper
      - Also check whether you have multiple DBLP pages
      - In case of multiple DBLP pages:
        - Mail "[email protected]" asking for the pages to be merged
  20. Constraints and Requirements
    - Team: 2 developers (MT and Klerisson, UFU)
    - Zero budget
    - High availability
    - No (or very few) production bugs
    - Easy-to-use UI
    - Fast
  21. Backend Module (Python)
    - Regenerates all data at each execution
    - Data format: csv
    - Executed on my laptop and pushed to GitHub
    - 3 hours to download authors' papers from DBLP
    - 20 minutes to regenerate all data
    - 90 minutes to update citations
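A backend of this kind rebuilds its CSV outputs from scratch on every run. The following is a minimal sketch of that last step under stated assumptions: the column names, department names, and scores are all made up, and the real pipeline's file layout is not described in the talk.

```python
import csv
import io


def write_scores_csv(scores: dict, out) -> None:
    """Write department scores, highest first, as CSV rows."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["department", "score"])  # hypothetical header
    for dept, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        writer.writerow([dept, f"{score:.2f}"])


# Made-up departments and scores; a real run would rebuild these
# from the downloaded DBLP data on every execution.
scores = {"DeptA": 3.73, "DeptB": 5.40, "DeptC": 1.33}
buffer = io.StringIO()
write_scores_csv(scores, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Writing to a file-like object keeps the sketch free of side effects; passing an open file instead of `io.StringIO` would produce a file that a static frontend could load directly.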
  22. Frontend Module (HTML and JavaScript)
    - Plots the csv files (using the Google Charts library)
    - Almost no computation in the browser (performance)
    - Hosted on GitHub Pages (0-budget & high availability)
  23. Backlog
    1. Internal improvements, scripts, refactorings etc.
    2. Update conferences and journals statistics (2018)
    3. Integration with CNPq (link to Lattes and Bolsa PQ)
    4. Integration with Unpaywall (links to preprints)
    5. "Global" depts ranking (all areas)
    6. Adjust papers' scores by number of authors
    7. Adjust dept's scores by number of professors
    8. Extend data collection to more than 5 years
    9. Other countries (2020?)