2000 • Exchange Student in Malaysia • 2002-2009 • CLARAONLINE, INC. • ICT Hosting Company, nowadays called Cloud system supplier • 2009-2015 • Institute of Innovation Research, HITOTSUBASHI UNIVERSITY • 2015-2017 • Science for RE-Designing Science, Technology and Innovation Policy Center, National Graduate Institute for Policy Studies (GRIPS) / NISTEP / Hitotsubashi UNIVERSITY/MANAGEMENT INNOVATION CENTER • 2018-2019 • EHESS Paris – CEAFJP/Michelin Research Fellow • OECD Expert Advisory Group: Digital Science and Innovation Policy and Governance (DSIP) and STI Policy Monitoring and Analysis (REITER) project • 2019- • TDB Center for Advanced Empirical Research on Enterprise and Economy, Faculty of Economics, Hitotsubashi University
to SQL of WoS - Pick up sufficient data to analyze - Which tag is needed? - Title? - Names Count? (as known as scientific ordering) - Publisher? - Fund information? - Name? First name? Last Name? - ID?
*.zip to *.gz format • Take 3 mins for 8.16GB • Decrypt data from *.gz to *.zip format • Take 30 mins for 20GB • Decrypt data from *.zip to *.xml format • Take 1.2 hours for 10GB • Parsing xml data into (my)sql format • Take 5.5 hours for 40.5GB • Binding separated *.sql format data into one single file for each year • Take 35 mins • Importing *sql format data into MySQL Server • Take 120 hours for 49.8GB ZIP - > GZ GZ -> ZIP ZIP -> XML XML -> SQL SQL -> SQL Server Accessin g from SQL client Finally you could get the data! Stata?R? Analyze Submissi on Revise Publis h
XML data to SQL format. • Using generic_paser.py which distributed in Github. • https://github.com/titipata/wos_pars er • Takes 5.5 hours to parse XML data in 40.5GB into SQL format. • Then import to SQL server, it takes 1.5-3 days with 2 million entries per year.