Nikoleta
August 30, 2017
Arcas: Using Python to access open research literature
EuroScipy 2017
Transcript
Arcas: Using Python to access open research literature @NikoletaGlyn
The illustrated guide to a Ph.D. Matt Might http://matt.might.net/articles/phd-school-in-pictures/
ARTICLE JOURNAL REVIEW PUBLISHED
Sustainable Software
0.5min + 100 × 1.5min + 10 × 0.5min = 155.5min ⇒ 2h and 35.5min
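The total on this slide can be checked with a few lines of Python. This is only a sketch of the arithmetic: the slide does not say what each term measures, so the terms are copied as-is and the helper name `split_minutes` is made up for illustration.

```python
def split_minutes(total_minutes):
    """Split a duration in minutes into whole hours and leftover minutes."""
    hours, minutes = divmod(total_minutes, 60)
    return int(hours), minutes


# Terms copied from the slide, in minutes; their meaning is not stated there.
terms = [0.5, 100 * 1.5, 10 * 0.5]
total = sum(terms)
hours, minutes = split_minutes(total)

print(f"{total}min = {hours}h and {minutes}min")
```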
API
QUERY http://export.arxiv.org/api/query?search_query=ti:Sustainable%20Software
15min + 1min + 50min = 66min ⇒ 1h and 6min
QUERY
http://export.arxiv.org/api/query?search_query=ti:Sustainable%20Software
http://api.plos.org/search?q=title:Sustainable%20Software&rows=100
http://www.nature.com/opensearch/request?queryType=cql&query=dc.title%20adj%20SustainableSoftware&maximumRecords=100
...
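Query URLs like the arXiv and PLOS ones on this slide can be assembled by hand with the standard library. The sketch below only builds the strings and does not contact either service; the function names `arxiv_query_url` and `plos_query_url` are illustrative and not part of any library.

```python
from urllib.parse import quote


def arxiv_query_url(title):
    """Build an arXiv title-search URL in the form shown on the slide."""
    return "http://export.arxiv.org/api/query?search_query=ti:" + quote(title)


def plos_query_url(title, rows=100):
    """Build a PLOS title-search URL in the form shown on the slide."""
    return f"http://api.plos.org/search?q=title:{quote(title)}&rows={rows}"


title = "Sustainable Software"
print(arxiv_query_url(title))
print(plos_query_url(title))
```

Each service uses its own field name for the title and its own parameter for the number of results, which is why every URL has to be written separately.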
[Diagram: API1 to API6, each with its own Query and its own XML response]
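To make the "Query, then XML" step concrete, here is a rough sketch of parsing one such response by hand. It assumes an Atom feed, which is the format the arXiv API returns; the sample feed below is invented for illustration and is not real arXiv output, and `parse_entries` is a hypothetical helper. Other APIs return different schemas, so each would need its own version of this function.

```python
import xml.etree.ElementTree as ET

# Namespace prefix used by Atom feeds.
ATOM = "{http://www.w3.org/2005/Atom}"

# Invented sample feed, for illustration only.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>An example title</title>
    <author><name>A. Author</name></author>
    <author><name>B. Author</name></author>
    <published>2013-01-01T00:00:00Z</published>
  </entry>
</feed>"""


def parse_entries(xml_text):
    """Extract title, authors and publication year from each Atom entry."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.iter(ATOM + "entry"):
        authors = [a.findtext(ATOM + "name") for a in entry.findall(ATOM + "author")]
        entries.append({
            "title": entry.findtext(ATOM + "title"),
            "authors": authors,
            # The year is the first four characters of the ISO timestamp.
            "year": int(entry.findtext(ATOM + "published")[:4]),
        })
    return entries


entries = parse_entries(SAMPLE_FEED)
print(entries)
```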
[Diagram: ARCAS in front of API1 to API6, each with its own Query and its own XML response]
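The idea in this diagram, one entry point in front of several APIs, can be sketched as a small common interface. This is not how Arcas is implemented; the classes `Source`, `ArxivSource` and `PlosSource` and the function `search_urls` are hypothetical, and only the URL formats are taken from the slides.

```python
from abc import ABC, abstractmethod
from urllib.parse import quote


class Source(ABC):
    """Hypothetical common interface; not Arcas's actual class hierarchy."""

    @abstractmethod
    def search_url(self, title, records):
        """Return the title-search URL for this source."""


class ArxivSource(Source):
    def search_url(self, title, records):
        return ("http://export.arxiv.org/api/query?search_query=ti:"
                f"{quote(title)}&max_results={records}")


class PlosSource(Source):
    def search_url(self, title, records):
        return f"http://api.plos.org/search?q=title:{quote(title)}&rows={records}"


def search_urls(sources, title, records=1):
    """Ask every source for its URL using the same call."""
    return {type(s).__name__: s.search_url(title, records) for s in sources}


urls = search_urls([ArxivSource(), PlosSource()], "Sustainable Software")
for name, url in urls.items():
    print(name, url)
```

The caller writes one loop and never needs to know which parameter names a given service expects.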
$ pip install arcas
>>> import arcas
>>> api = arcas.Arxiv()
>>> parameters = api.parameters_fix(
...     title='sustainable software', records=1, start=1)
>>> url = api.create_url_search(parameters)
>>> request = api.make_request(url)
>>> root = api.get_root(request)
>>> raw_article = api.parse(root)
>>> article = api.to_dataframe(raw_article[0])
>>> api.export(article, "result.json")
{"key": {"0": "Ahern2013"},
 "unique_key": {"0": "698d27415f69258ef122f46b184a77e0"},
 "title": {"0": "VisIt: Experiences with Sustainable Software"},
 "author": {"0": "Sean Ahern", "1": "Eric Brugger"},
 "abstract": {"0": " The success of the VisIt visualization..."},
 "date": {"0": 2013},
 "journal": {"0": "arXiv"},
 "provenance": {"0": "arXiv"}}
>>> for p in [arcas.Arxiv, arcas.Nature, arcas.Ieee, arcas.Plos]:
...     api = p()
...     parameters = api.parameters_fix(
...         title='sustainable software', records=1, start=1)
...     url = api.create_url_search(parameters)
...     request = api.make_request(url)
...     root = api.get_root(request)
...     raw_article = api.parse(root)
...     try:
...         for art in raw_article:
...             article = api.to_dataframe(art)
...             api.export(article, "result_from_{}.json".format(
...                 api.__class__.__name__))
...     except TypeError:
...         pass
15min + 5min = 20min
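An exported record such as the result.json shown earlier can be read back with the standard library alone. The sketch below embeds that record as a string (abstract truncated exactly as on the slide) and assumes the layout seen there: each column name maps to a dictionary of row index to value. The helper `column` is illustrative and not part of Arcas.

```python
import json

# Record copied from the slide; the abstract is truncated there as well.
RESULT_JSON = """{"key":{"0":"Ahern2013"},
"unique_key":{"0":"698d27415f69258ef122f46b184a77e0"},
"title":{"0":"VisIt: Experiences with Sustainable Software"},
"author":{"0":"Sean Ahern","1":"Eric Brugger"},
"abstract":{"0":" The success of the VisIt visualization..."},
"date":{"0":2013},
"journal":{"0":"arXiv"},
"provenance":{"0":"arXiv"}}"""


def column(record, name):
    """Return the values of one column, ordered by integer row index."""
    cells = record[name]
    return [cells[k] for k in sorted(cells, key=int)]


record = json.loads(RESULT_JSON)
title = column(record, "title")[0]
authors = column(record, "author")
year = column(record, "date")[0]

print(f"{title} ({year}) by {', '.join(authors)}")
```

Fields such as `date` and `provenance` are the ones a per-year or per-source count would be built from.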
[Figure: Articles per Year (N=87). x-axis: year (2000 to 2018); y-axis: number of records.]
[Figure: number of records per year by provenance (IEEE, arXiv, PLOS). x-axis: year (2000 to 2016); y-axis: number of records.]
Birgit Penzenstadler
Arcas: tools.py, doc/, arcas.readthedocs.io/, ieee, nature, arxiv, . . .
Tests: test ieee, test nature, test arxiv, . . .
$ arcas_scrape --version
Arcas 0.0.3
$ arcas_scrape -p arxiv -t "Sustainable Software" -r 1
http://export.arxiv.org/api/query?search_query=ti:Sustainable Software&max_results=1&start=1
@NikoletaGlyn https://github.com/ArcasProject/Arcas https://nikoleta-v3.github.io