Nikoleta
August 30, 2017
Arcas: Using Python to access open research literature
EuroScipy 2017
Transcript
Arcas: Using Python to access open research literature @NikoletaGlyn
The illustrated guide to a Ph.D. Matt Might http://matt.might.net/articles/phd-school-in-pictures/
ARTICLE JOURNAL REVIEW PUBLISHED
Sustainable Software
0.5min + 100 × 1.5min + 10 × 0.5min = 155.5min ⇒ 2h and 35.5min
API
QUERY http://export.arxiv.org/api/query?search_query=ti:Sustainable%20Software
15min + 1min + 50min = 66min ⇒ 1h and 6min
QUERY
http://export.arxiv.org/api/query?search_query=ti:Sustainable%20Software
http://api.plos.org/search?q=title:Sustainable%20Software&rows=100
http://www.nature.com/opensearch/request?queryType=cql&query=dc.title%20adj%20SustainableSoftware&maximumRecords=100
...
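The queries above show that the same title search has to be spelled differently for each API. A minimal sketch of that idea, using only the arXiv and PLOS URL patterns shown on the slide; the function name `build_queries` and its `rows` argument are hypothetical and not part of Arcas:

```python
from urllib.parse import quote


def build_queries(title, rows=100):
    """Return one search URL per API for the same title query.

    The URL patterns are copied from the slide; this helper is only
    an illustration and is not how Arcas builds its URLs.
    """
    # Percent-encode the title: a space becomes %20.
    encoded = quote(title)
    return {
        "arxiv": "http://export.arxiv.org/api/query?search_query=ti:" + encoded,
        "plos": "http://api.plos.org/search?q=title:{}&rows={}".format(encoded, rows),
    }


queries = build_queries("Sustainable Software")
for name, url in queries.items():
    print(name, url)
```

Each additional API would need its own entry with its own field names and paging parameters, which is the repetition the talk sets out to remove.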
API1 Query XML | API2 Query XML | API3 Query XML | API4 Query XML | API5 Query XML | API6 Query XML
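Every API in the diagram answers a query with XML that then has to be parsed. As a rough sketch of that step for an Atom feed (the format the arXiv API returns), the snippet below parses a small hand-written sample with the standard library. The sample is not a real API response (its title and authors are taken from the result shown later in the deck), and `parse_entries` is a hypothetical helper, not an Arcas function:

```python
import xml.etree.ElementTree as ET

# Namespace used by Atom feeds.
ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Hand-written illustrative sample, NOT real output from any API.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>VisIt: Experiences with Sustainable Software</title>
    <author><name>Sean Ahern</name></author>
    <author><name>Eric Brugger</name></author>
  </entry>
</feed>"""


def parse_entries(xml_text):
    """Extract the title and author names of every <entry> in an Atom feed."""
    root = ET.fromstring(xml_text)
    articles = []
    for entry in root.findall("atom:entry", ATOM):
        articles.append({
            "title": entry.findtext("atom:title", namespaces=ATOM),
            "author": [
                author.findtext("atom:name", namespaces=ATOM)
                for author in entry.findall("atom:author", ATOM)
            ],
        })
    return articles


articles = parse_entries(SAMPLE)
print(articles)
```

Other APIs use different element names and namespaces, so a parser like this would have to be rewritten for each one.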
ARCAS: API1 Query XML | API2 Query XML | API3 Query XML | API4 Query XML | API5 Query XML | API6 Query XML
$ pip install arcas
>>> import arcas
>>> api = arcas.Arxiv()
>>> parameters = api.parameters_fix(
...     title='sustainable software', records=1, start=1)
>>> url = api.create_url_search(parameters)
>>> request = api.make_request(url)
>>> root = api.get_root(request)
>>> raw_article = api.parse(root)
>>> article = api.to_dataframe(raw_article[0])
>>> api.export(article, "result.json")
{"key":{"0":"Ahern2013"},
 "unique_key":{"0":"698d27415f69258ef122f46b184a77e0"},
 "title":{"0":"VisIt: Experiences with Sustainable Software"},
 "author":{"0":"Sean Ahern","1":"Eric Brugger"},
 "abstract":{"0":" The success of the VisIt visualization..."},
 "date":{"0":2013},
 "journal":{"0":"arXiv"},
 "provenance":{"0":"arXiv"}}
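The exported file maps each column name to a dictionary keyed by row index, so a column with several values (here `author`) has more keys than the others. A small sketch of reading such a record back with the standard library, assuming the file has exactly the shape shown above; the `column` helper is hypothetical and the abstract is left truncated as on the slide:

```python
import json

# The record from the slide, embedded as a string so the example runs offline.
RESULT = """
{"key":{"0":"Ahern2013"},
 "unique_key":{"0":"698d27415f69258ef122f46b184a77e0"},
 "title":{"0":"VisIt: Experiences with Sustainable Software"},
 "author":{"0":"Sean Ahern","1":"Eric Brugger"},
 "abstract":{"0":" The success of the VisIt visualization..."},
 "date":{"0":2013},
 "journal":{"0":"arXiv"},
 "provenance":{"0":"arXiv"}}
"""


def column(record, name):
    """Return the values of one column, ordered by their integer row index."""
    values = record[name]
    return [values[index] for index in sorted(values, key=int)]


record = json.loads(RESULT)
title = column(record, "title")[0]
authors = column(record, "author")
year = column(record, "date")[0]
print(title, authors, year)
```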
>>> for p in [arcas.Arxiv, arcas.Nature, arcas.Ieee, arcas.Plos]:
...     api = p()
...     parameters = api.parameters_fix(
...         title='sustainable software', records=1, start=1)
...     url = api.create_url_search(parameters)
...     request = api.make_request(url)
...     root = api.get_root(request)
...     raw_article = api.parse(root)
...     try:
...         for art in raw_article:
...             article = api.to_dataframe(art)
...             api.export(article, "result_from_{}.json".format(
...                 api.__class__.__name__))
...     except TypeError:
...         pass
15min + 5min = 20min
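The three time estimates in the deck can be checked with a few lines of arithmetic. The sketch below only recomputes the sums as written on the slides; the labels `first`, `second` and `third` are placeholders, since the transcript does not name what each term stands for:

```python
def as_hours_minutes(total_minutes):
    """Split a duration in minutes into whole hours and remaining minutes."""
    hours, minutes = divmod(total_minutes, 60)
    return int(hours), minutes


# The sums exactly as they appear on the slides (all values in minutes).
estimates = {
    "first": 0.5 + 100 * 1.5 + 10 * 0.5,
    "second": 15 + 1 + 50,
    "third": 15 + 5,
}

for label, total in estimates.items():
    hours, minutes = as_hours_minutes(total)
    print("{}: {} min = {} h {} min".format(label, total, hours, minutes))
```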
[Figure: Articles per Year (N=87). x-axis: year; y-axis: number of records]
[Figure: number of records per year by provenance (IEEE, arXiv, PLOS). x-axis: year; y-axis: number of records]
Birgit Penzenstadler
Arcas tools.py doc/ arcas.readthedocs.io/ ieee nature arxiv . . .
test ieee test nature test arxiv . . .
$ arcas_scrape --version
Arcas 0.0.3
$ arcas_scrape -p arxiv -t "Sustainable Software" -r 1
http://export.arxiv.org/api/query?search_query=ti:Sustainable Software&max_results=1&start=1
@NikoletaGlyn https://github.com/ArcasProject/Arcas https://nikoleta-v3.github.io