CSIndexbr: A Brazilian Computer Science Index (Talk at CIn/UFPE)

ASERG, DCC, UFMG

February 25, 2019

Transcript

  1. csindexbr.org, @csindexbr
    Marco Tulio Valente, ASERG/DCC/UFMG
    CSIndexbr:
    A Brazilian Computer Science Index

  2. Scholarly Communication Tools
    Kramer, Bianca; Bosman, Jeroen (2015): 101 Innovations in Scholarly Communication -
    the Changing Research Workflow. https://doi.org/10.6084/m9.figshare.1286826.v1

  3. CSIndexbr is an experimental scholarly
    communication tool, with two goals:
    - Discovery
    e.g. What are the "best" Brazilian papers in my area?
    - Assessment
    e.g. What are the "best" Brazilian CS depts in my area?

  4. Index of papers published by Brazilian CS
    professors in selected conferences and journals
    in the last five years (2014-today)
    - Transparent
    - Open
    - but "unofficial"

  5. Primary data source: DBLP (dblp.org)
    - High-quality metadata about CS papers
    - Covers all relevant CS venues
    - Open-license
    - Very reliable API

  6. Key decision:
    organization by research areas

  7. Research Areas (21)
    1. Software Engineering
    2. Programming Languages
    3. Human-Computer Interaction
    4. Computer Networks
    5. Distributed Systems
    6. Computer Architecture & HPC
    7. Hardware Design
    8. Databases & Inf. Systems
    9. Web & Information Retrieval
    10. Data Mining & Mach. Learning
    11. Artificial Intelligence
    12. Algorithms & Complexity
    13. Formal Methods & Logic
    14. Operational Research
    15. Security & Cryptography
    16. Computer Vision
    17. Computer Graphics
    18. Robotics
    19. CS Education
    20. Bioinformatics
    21. Computer Science (General)

  8. Brazilian CS Professors (1,071)
    - Exactly 800 professors with indexed papers

  9. Key contribution:
    Curated dataset of conferences and journals

  10. Conferences (178)
    - 15 conferences / area (max)
    - Only full, main-track papers (10 pages)
      - Short, tool, workshop, etc. papers are not indexed
    - Criteria:
      - more than 100 submitted papers
      - acceptance rate < 30%
      - h5-index > 20
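
The three conference criteria above can be read as a simple filter. The sketch below is only an illustration of that reading: the `Conference` class, its field names, and the sample venues are hypothetical, and the deck does not say which edition's submission numbers are used.

```python
from dataclasses import dataclass


@dataclass
class Conference:
    name: str         # hypothetical sample names, not CSIndexbr data
    submissions: int  # papers submitted (edition is an assumption)
    accepted: int     # papers accepted in the same edition
    h5_index: int     # h5-index of the venue

    @property
    def acceptance_rate(self) -> float:
        return self.accepted / self.submissions


def is_eligible(conf: Conference) -> bool:
    """Apply the three thresholds listed on the slide."""
    return (
        conf.submissions > 100
        and conf.acceptance_rate < 0.30
        and conf.h5_index > 20
    )


candidates = [
    Conference("ConfA", submissions=400, accepted=90, h5_index=45),
    Conference("ConfB", submissions=80, accepted=20, h5_index=25),
    Conference("ConfC", submissions=250, accepted=100, h5_index=30),
]
eligible = [c.name for c in candidates if is_eligible(c)]
print(eligible)
```

ConfB fails the submission threshold and ConfC fails the acceptance-rate threshold, so only ConfA passes. The exceptions listed on the next slide would need to be handled outside such a filter.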

  11. Exceptions:
    - Many areas: full papers < 10 pages
    - Computer Networks: 18 confs
    - Algorithms & Complexity: accept. ~ 40%
    - etc

  12. Exceptions are highlighted in yellow in Stats (C)

  13. Top-Conferences ( )
    - 3 top-conferences / area (max)
    - submitted > 180 papers
    - h5-index > 30

  14. Journals (175)
    - 15 journals / area (max)
    - Criteria:
      - Indexed by JCR
      - h5-index > 25

  15. Top-Journals
    - 3 top-journals / area (max)
    - Criteria:
      - ACM Transactions or IEEE Transactions (or similar)

  16. Multi-Journals
    - Publish papers in more than one area
    - Require manual division of the papers
    - 6 multi-journals:
      - Softw. Pract. Exper. (SE, PL, DS)
      - Journal of Systems and Software (SE, DS)
      - TACO (PL, Arch)
      - Sci. Comput. Program. (PL, SE, Formal)
      - Concurrency and Computation (DS, Arch)
      - Theor. Comput. Sci. (Algorithms, Formal)

  17. Goal 1: Discovery

  18. Papers / Conference [always in the last 5 yrs]

  19. Papers / Journal

  20. Papers (Conferences & Journals)

  21. Professors with Papers (in a Research Area)

  22. Author Pages

  23. Goal 2: Assessment

  24. Department Rankings
    - 1.0: paper in top-conference or top-journal
    - 0.40: paper in journals
    - 0.33: paper in
      - conferences
      - magazines
      - journals with short papers
      - mega-journals
      - journals with normalized-h5-index < 0.2
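
The weights above amount to a weighted count of papers per department. A minimal sketch of that computation follows, assuming each paper has already been assigned to one department and one of three categories; the category labels, department names, and function name are hypothetical and not taken from the CSIndexbr code.

```python
from collections import defaultdict

# Weights from the slide; the category labels are illustrative.
WEIGHTS = {
    "top": 1.0,       # top-conference or top-journal
    "journal": 0.40,  # other indexed journals
    "other": 0.33,    # conferences, magazines, mega-journals, etc.
}


def department_scores(papers):
    """Sum the weight of every (department, category) pair."""
    scores = defaultdict(float)
    for department, category in papers:
        scores[department] += WEIGHTS[category]
    return dict(scores)


# Hypothetical input: one tuple per indexed paper.
papers = [
    ("DeptX", "top"),
    ("DeptX", "journal"),
    ("DeptX", "other"),
    ("DeptY", "other"),
    ("DeptY", "other"),
]
scores = department_scores(papers)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

No normalization by number of authors or professors is applied here, which matches the backlog slide listing both adjustments as future work.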

  25. Dept Rankings: per Research Area

  26. More details: FAQ
    https://csindexbr.org/faq.html

  27. Beyond rankings: a repository for scientometrics
    studies on Brazilian scientific production in CS

  28. Source code and data are public on GitHub
    https://github.com/aserg-ufmg/CSIndex

  29. Documentation (in progress)
    https://github.com/aserg-ufmg/CSIndex

  30. (1) h-index correlates with size

  31. (2) Journals vs Conferences
    - 178 conferences, 175 journals
    - Papers
      - conferences: 676
      - journals: 1,943
    - ~3 times more papers in journals

  32. 42% of the papers are from 12 journals (7%)

  33. 42% of the papers are from 12 journals (7%)
    All are commercial journals

  34. 42% of the papers are from 12 journals (7%)
    All are commercial journals
    11 journals from Elsevier
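
The rounded figures on the last few slides can be checked with a few lines of arithmetic. The sketch assumes that "7%" means 12 out of the 175 indexed journals; the variable names are illustrative.

```python
conference_papers = 676
journal_papers = 1943
journals_indexed = 175
concentrated_journals = 12

# Journal papers per conference paper ("~3 times more papers in journals").
ratio = journal_papers / conference_papers

# Share of the indexed journals that the 12 journals represent.
journal_share = concentrated_journals / journals_indexed

print(f"journal/conference paper ratio: {ratio:.2f}")  # 2.87
print(f"share of indexed journals: {journal_share:.1%}")  # 6.9%
```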

  35. How common is this pattern (journals = 3 * conf)?

  36. Turing Awards (20): conferences vs journals
    Source: DBLP (February 2019)

  37. Other features:
    arXiv links & citations

  38. Links to arXiv preprints (if available)

  39. Only 5% of papers have preprints on arXiv

  40. arXiv popularity (worldwide): 23%
    Popularity of arXiv.org within Computer Science. Charles Sutton and Linan Gong,
    https://arxiv.org/pdf/1710.05225.pdf

  41. Another feature: citations

  42. Crossref Citations
    - Crossref is an official DOI registration agency
    - They maintain a database of citations
      - used by ACM DL, IEEE DL, Dimensions etc
    - Has a public API (unlike Google Scholar)
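
Crossref's public REST API serves a work's metadata at `https://api.crossref.org/works/{DOI}`, and the returned JSON includes a citation count in the `is-referenced-by-count` field. The sketch below shows one way to read that count. Whether CSIndexbr uses this exact endpoint is an assumption, and the function names are illustrative.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

CROSSREF_WORKS = "https://api.crossref.org/works/"


def works_url(doi: str) -> str:
    """Build the metadata URL for a DOI, keeping its '/' separators."""
    return CROSSREF_WORKS + quote(doi, safe="/")


def citation_count(payload: dict) -> int:
    """Extract the citation count from a decoded Crossref response."""
    return int(payload["message"].get("is-referenced-by-count", 0))


def fetch_citation_count(doi: str, timeout: float = 10.0) -> int:
    """Query Crossref over the network (not called in this sketch)."""
    with urlopen(works_url(doi), timeout=timeout) as response:
        return citation_count(json.load(response))


# Offline example with a hand-written payload in the response shape.
sample = {"status": "ok", "message": {"is-referenced-by-count": 7}}
print(citation_count(sample))
```

Separating URL building and parsing from the network call keeps the parsing logic checkable without contacting the service.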

  43. Crossref vs Google Scholar
    "Google Scholar unique citations have, on
    average, a much lower scientific impact than
    citations also found by WoS/Scopus".
    Google Scholar, Web of Science, and Scopus: a systematic comparison of citations
    in 252 subject categories, J. of Informetrics, 2018.
    https://arxiv.org/pdf/1808.05053.pdf

  44. Deprecated feature: "Trending Papers"
    - Page with most cited papers (in all areas)
    - Reason for deprecation:
      - Areas with more papers have more citations
      - Some papers attract more citations (systematic
        literature reviews, surveys, generic tools etc)

  45. How to Contribute

  46. How to Contribute
    - Indicating a missing Brazilian professor
    - Indicating a missing (and relevant) conference/journal
    - Indicating a missing paper:
      - But first check that it is a full paper
      - Also check that you don't have multiple DBLP pages
    - In case of multiple DBLP pages:
      - Mail "[email protected]" asking for the merge

  47. Google Forms (for submitting issues)
    https://goo.gl/forms/kz3F1fZIKtubWYiu1
    or from csindexbr.org

  48. Implementation

  49. Constraints and Requirements
    - Team: 2 developers (MT and Klerisson, UFU)
    - Zero budget
    - High availability
    - No (or very few) production bugs
    - Easy-to-use UI
    - Fast

  50. Backend Module (Python)
    - Regenerates all data at each execution
    - Data format: CSV
    - Executed on my laptop and pushed to GitHub
    - 3 hours to download authors' papers from DBLP
    - 20 minutes to regenerate all data
    - 90 minutes to update citations
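
A backend that rewrites its CSV output from scratch on every run can be sketched with Python's standard `csv` module. The column names, the sort order, and the sample records below are assumptions made for illustration; they are not the actual CSIndexbr file layout.

```python
import csv
import io


def regenerate_csv(papers, out):
    """Write all paper records to `out`, replacing any earlier content."""
    writer = csv.writer(out)
    # Hypothetical header; the real files may use other columns.
    writer.writerow(["year", "venue", "title", "authors"])
    ordered = sorted(papers, key=lambda p: (-p["year"], p["venue"], p["title"]))
    for paper in ordered:
        writer.writerow(
            [paper["year"], paper["venue"], paper["title"], "; ".join(paper["authors"])]
        )


# Hypothetical records standing in for data downloaded from DBLP.
papers = [
    {"year": 2017, "venue": "ConfA", "title": "Paper One", "authors": ["A. Silva"]},
    {"year": 2018, "venue": "JournalB", "title": "Paper Two", "authors": ["B. Souza", "C. Lima"]},
]

buffer = io.StringIO()  # stands in for a file opened with mode "w"
regenerate_csv(papers, buffer)
print(buffer.getvalue())
```

Rewriting whole files keeps each run independent of the previous one, and a static frontend only has to fetch and plot the result.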

  51. Frontend Module (HTML and JavaScript)
    - Plots the CSV files (using the Google Charts library)
    - Almost no computation in the browser (performance)
    - Hosted on GitHub Pages (zero budget & high availability)

  52. Future Work

  53. Backlog
    1. Internal improvements, scripts, refactorings etc
    2. Update conferences and journals statistics (2018)
    3. Integration with CNPq (link to Lattes and Bolsa PQ)
    4. Integration with Unpaywall (links to preprints)
    5. "Global" depts ranking (all areas)
    6. Adjust papers' scores by number of authors
    7. Adjust dept's scores by number of professors
    8. Extend data collection to more than 5 years
    9. Other countries (2020?)

  54. csindexbr.org
    @csindexbr
    Thanks