CSIndexbr: A Brazilian Computer Science Index (Talk at CIn/UFPE)

ASERG, DCC, UFMG

February 25, 2019

Transcript

  1. CSIndexbr: A Brazilian Computer Science Index. Marco Tulio Valente, ASERG/DCC/UFMG. csindexbr.org, @csindexbr
  2. Scholarly Communication Tools
    Kramer, Bianca; Bosman, Jeroen (2015): 101 Innovations in Scholarly Communication - the Changing Research Workflow. https://doi.org/10.6084/m9.figshare.1286826.v1
  3. CSIndexbr is an experimental scholarly communication tool, with two goals:
    - Discovery, e.g. What are the "best" Brazilian papers in my area?
    - Assessment, e.g. What are the "best" Brazilian CS depts in my area?
  4. Index of papers published by Brazilian CS professors in selected conferences and journals in the last five years (2014-today)
    - Transparent
    - Open
    - but "unofficial"
  5. Primary data source: DBLP (dblp.org)
    - High-quality metadata about CS papers
    - Covers all relevant CS venues
    - Open-license
    - Very reliable API
  6. Key decision: organization by research areas

  7. Research Areas (21)
    1. Software Engineering
    2. Programming Languages
    3. Human-Computer Interaction
    4. Computer Networks
    5. Distributed Systems
    6. Computer Architecture & HPC
    7. Hardware Design
    8. Databases & Inf. Systems
    9. Web & Information Retrieval
    10. Data Mining & Mach. Learning
    11. Artificial Intelligence
    12. Algorithms & Complexity
    13. Formal Methods & Logic
    14. Operational Research
    15. Security & Cryptography
    16. Computer Vision
    17. Computer Graphics
    18. Robotics
    19. CS Education
    20. Bioinformatics
    21. Computer Science (General)
  8. Brazilian CS Professors (1,071)
    - Exactly 800 professors with indexed papers
  9. Key contribution: Curated dataset of conferences and journals

  10. Conferences (178)
    - 15 conferences / area (max)
    - Only full, main-track papers (10 pages)
    - Short, tool, workshop, etc. papers are not indexed
    - Criteria:
      - submitted > 100 papers
      - acceptance < 30%
      - h5-index > 20
  11. Exceptions:
    - Many areas: full papers < 10 pages
    - Computer Networks: 18 confs
    - Algorithms & Complexity: accept. ~ 40%
    - etc.
  12. Exceptions are highlighted in yellow in Stats (C)

  13. Top-Conferences ( )
    - 3 top-conferences / area (max)
    - submitted > 180 papers
    - h5-index > 30
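
The selection criteria on the conference slides can be read as simple threshold checks. The following is a minimal sketch in Python, assuming a hypothetical `Conference` record and made-up sample venues. Only the thresholds come from the slides; the names, fields, and data are illustrative and are not taken from the CSIndexbr code base. The slides do not say whether a top-conference must also pass the base criteria, so the two checks are kept independent here.

```python
from dataclasses import dataclass


@dataclass
class Conference:
    """Hypothetical record; field names are assumptions, not CSIndexbr's schema."""
    name: str
    submitted: int          # papers submitted to the main track
    acceptance_rate: float  # fraction in [0, 1]
    h5_index: int


def meets_conference_criteria(conf: Conference) -> bool:
    """Base criteria from the slides: > 100 submissions, < 30% acceptance, h5-index > 20."""
    return conf.submitted > 100 and conf.acceptance_rate < 0.30 and conf.h5_index > 20


def meets_top_conference_criteria(conf: Conference) -> bool:
    """Top-conference criteria from the slides: > 180 submissions, h5-index > 30."""
    return conf.submitted > 180 and conf.h5_index > 30


# Made-up venues, for illustration only.
venues = [
    Conference("ConfA", submitted=250, acceptance_rate=0.22, h5_index=41),
    Conference("ConfB", submitted=120, acceptance_rate=0.28, h5_index=24),
    Conference("ConfC", submitted=90, acceptance_rate=0.35, h5_index=15),
]

indexed = [v.name for v in venues if meets_conference_criteria(v)]
top = [v.name for v in venues if meets_top_conference_criteria(v)]
print(indexed)  # ['ConfA', 'ConfB']
print(top)      # ['ConfA']
```

The per-area exceptions listed on the "Exceptions" slide (for example, Algorithms & Complexity with an acceptance rate around 40%) would need explicit overrides, which this sketch does not model.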
  14. Journals (175)
    - 15 journals / area (max)
    - Criteria:
      - Indexed by JCR
      - h5-index > 25
  15. Top-Journals
    - 3 top-journals / area (max)
    - Criteria:
      - ACM Transactions or IEEE Transactions (or similar)
  16. Multi-Journals
    - Publish papers in more than one area
    - Require manual division of the papers
    - 6 multi-journals:
      - Softw., Pract. Exper. (SE, PL, DS)
      - Journal of Systems and Software (SE, DS)
      - TACO (PL, Arch)
      - Sci. Comput. Program. (PL, SE, Formal)
      - Concurrency and Computation (DS, Arch)
      - Theor. Comput. Sci. (Algorith, Formal)
  17. Goal 1: Discovery

  18. Papers / Conference [always in the last 5 yrs]

  19. Papers / Journal

  20. Papers (Conferences & Journals)

  21. Professors with Papers (in a Research Area)

  22. Author Pages

  23. Goal 2: Assessment

  24. Department Rankings
    - 1.0: paper in top-conference or top-journal
    - 0.40: paper in journals
    - 0.33: paper in
      - conference
      - magazines
      - journals with short papers
      - mega-journals
      - journals with normalized-h5-index < 0.2
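
The weights on the "Department Rankings" slide suggest a department's score is a weighted count of its papers. The sketch below is written under that assumption: the slide gives only the weights, not the aggregation, so the plain sum and the category labels `top`, `journal`, and `other` are hypothetical.

```python
# Weights from the "Department Rankings" slide; the keys are hypothetical labels.
WEIGHTS = {
    "top": 1.0,       # paper in a top-conference or top-journal
    "journal": 0.40,  # paper in a regular journal
    "other": 0.33,    # conference, magazine, short-paper journal, mega-journal,
                      # or journal with normalized-h5-index < 0.2
}


def department_score(paper_categories):
    """Sum the weight of each paper (assumed aggregation, not confirmed by the slides)."""
    return sum(WEIGHTS[category] for category in paper_categories)


# Made-up department with six papers in one research area.
papers = ["top", "top", "journal", "other", "other", "other"]
score = department_score(papers)
print(f"{score:.2f}")  # 3.39
```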
  25. Dept Rankings: per Research Area

  26. More details: FAQ https://csindexbr.org/faq.html

  27. Beyond rankings: a repository for scientometrics studies on Brazilian scientific production in CS
  28. Source code and data are public on GitHub https://github.com/aserg-ufmg/CSIndex

  29. Documentation (in progress) https://github.com/aserg-ufmg/CSIndex

  30. (1) h-index correlates with size

  31. (2) Journals vs Conferences
    - 178 conferences, 175 journals
    - Papers
      - conferences: 676
      - journals: 1,943
    - ~3 times more papers in journals
  32. 42% of the papers are from 12 journals (7%)

  33. 42% of the papers are from 12 journals (7%)
    - All are commercial journals
  34. 43% of the papers are from 12 journals (7%)
    - All are commercial journals
    - 11 journals from Elsevier
  35. How common is this pattern (journals = 3 * conf)?

  36. Turing Awards (20): conferences vs journals. Source: DBLP (February 2019)
  37. Other features: arXiv links & citations

  38. Links to arXiv preprints (if available)

  39. Only 5% of papers have preprints on arXiv

  40. arXiv popularity (worldwide): 23%
    Popularity of arXiv.org within Computer Science. Charles Sutton and Linan Gong, https://arxiv.org/pdf/1710.05225.pdf
  41. Another feature: citations

  42. CrossRef Citations
    - Crossref is an official DOI registration agency
    - They maintain a database of citations
      - used by ACM DL, IEEE DL, Dimensions etc.
    - Has a public API (unlike Google Scholar)
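
As an illustration of the public API mentioned on this slide, the sketch below reads a paper's citation count from Crossref's REST endpoint `https://api.crossref.org/works/{doi}`, whose JSON response carries the count in `message.is-referenced-by-count`. The slides do not show how CSIndexbr itself queries Crossref, so the function names, the placeholder DOI, and the sample response are assumptions made for this example.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

CROSSREF_WORKS = "https://api.crossref.org/works/"


def works_url(doi: str) -> str:
    """Build the REST URL for one DOI (percent-encoded, slashes kept)."""
    return CROSSREF_WORKS + quote(doi, safe="/")


def citation_count(payload: dict) -> int:
    """Extract the citation count from a decoded /works/{doi} response."""
    return int(payload["message"]["is-referenced-by-count"])


def fetch_citation_count(doi: str) -> int:
    """Query Crossref over the network; defined but not called in this sketch."""
    with urlopen(works_url(doi), timeout=30) as response:
        return citation_count(json.load(response))


# Trimmed, made-up response in the shape Crossref returns; the DOI is a placeholder.
sample = {
    "status": "ok",
    "message": {"DOI": "10.1000/example", "is-referenced-by-count": 17},
}
print(works_url("10.1000/example"))  # https://api.crossref.org/works/10.1000/example
print(citation_count(sample))        # 17
```

Keeping the parsing separate from the network call makes it possible to check the extraction logic offline, which matters for a batch job that refreshes citations for every indexed paper.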
  43. CrossRef vs Google Scholar
    "Google Scholar unique citations have, on average, a much lower scientific impact than citations also found by WoS/Scopus".
    Google Scholar, Web of Science, and Scopus: a systematic comparison of citations in 252 subject categories, J. of Informetrics, 2018. https://arxiv.org/pdf/1808.05053.pdf
  44. Deprecated feature: "Trending Papers"
    - Page with most cited papers (in all areas)
    - Reason for deprecation:
      - Areas with more papers have more citations
      - Some papers attract more citations (systematic literature reviews, surveys, generic tools, etc.)
  45. How to Contribute

  46. How to Contribute
    - Indicating a missing Brazilian professor
    - Indicating a missing (and relevant) conference/journal
    - Indicating a missing paper:
      - But first check whether it is a full paper
      - Also check that you don't have multiple DBLP pages
      - In case of multiple DBLP pages: mail "dblp@dagstuhl.de" asking for the merge
  47. Google Forms (for submitting issues) https://goo.gl/forms/kz3F1fZIKtubWYiu1 or from csindexbr.org

  48. Implementation

  49. Constraints and Requirements
    - Team: 2 developers (MT and Klerisson, UFU)
    - Zero budget
    - High availability
    - No (or very few) production bugs
    - Easy-to-use UI
    - Fast
  50. Backend Module (Python)
    - Regenerates all data at each execution
    - Data format: csv
    - Executed on my laptop and pushed to GitHub
    - 3 hours to download authors' papers from DBLP
    - 20 minutes to regenerate all data
    - 90 minutes to update citations
  51. Frontend Module (HTML and JavaScript)
    - Plots the csv files (using Google Charts library)
    - Almost no computation in the browser (performance)
    - Hosted on GitHub Pages (0-budget & high availability)
  52. Future Work

  53. Backlog
    1. Internal improvements, scripts, refactorings etc.
    2. Update conferences and journals statistics (2018)
    3. Integration with CNPq (link to Lattes and Bolsa PQ)
    4. Integration with Unpaywall (links to preprints)
    5. "Global" depts ranking (all areas)
    6. Adjust papers' scores by number of authors
    7. Adjust dept's scores by number of professors
    8. Extend data collection to more than 5 years
    9. Other countries (2020?)
  54. Thanks. csindexbr.org, @csindexbr