Finding images in book page images
Eric Larson
February 07, 2012
Programming
Code4Lib 2012 Lightning Talk Presentation
Transcript
Finding images in book page images
Eric Larson
University of Wisconsin-Madison Libraries
Warning: Hobbyist code here. I’m certain there are better ways to do this.
curl
imagemagick
Processing steps
1. Desaturate the image
2. Boost contrast
3. Convert image to 1 pixel wide x image height
4. Sharpen the image
5. Super-duper grayscale conversion
6. Produce the text color list
7. Look for continuous “black” blocks
convert
convert -colorspace Gray
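As a rough illustration of what the desaturate step does to each pixel, here is a minimal pure-Python sketch. The names `to_gray` and `desaturate` are hypothetical, and the Rec. 601 luma weights are an assumption; ImageMagick's own gray conversion may use a different formula depending on version and settings.

```python
def to_gray(pixel):
    """Map an (r, g, b) tuple of 0-255 ints to a single 0-255 gray level."""
    r, g, b = pixel
    # Rec. 601 luma weights: an assumption, not necessarily ImageMagick's formula.
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def desaturate(image):
    """Convert a 2-D list of RGB tuples into a 2-D list of gray levels."""
    return [[to_gray(pixel) for pixel in row] for row in image]


# A tiny 2x2 "page": white paper, dark ink, a reddish pixel, pure black.
page = [
    [(255, 255, 255), (10, 10, 10)],
    [(200, 30, 30), (0, 0, 0)],
]
gray_page = desaturate(page)
```

After this step every pixel carries one brightness value, which is all the later steps need.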
convert \
  -contrast -contrast \
  -contrast -contrast \
  -contrast -contrast \
  -contrast -contrast \
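To see why stacking the same contrast flag several times is useful, the sketch below applies a simple contrast boost repeatedly to a gray level. This is a stand-in, not ImageMagick's `-contrast` algorithm: the linear stretch around mid-gray, the `gain` value, and the function names are all assumptions made for illustration. The eight passes mirror the eight `-contrast` flags on the slide.

```python
def boost_contrast(level, gain=1.5):
    """Push a 0-255 gray level away from mid-gray (128), clamped to 0-255."""
    stretched = 128 + (level - 128) * gain
    return max(0, min(255, round(stretched)))


def boost_repeatedly(level, passes=8):
    """Apply boost_contrast several times in a row."""
    for _ in range(passes):
        level = boost_contrast(level)
    return level


# Slightly dark and slightly light grays end up at the extremes.
dark = boost_repeatedly(100)
light = boost_repeatedly(160)
```

Each pass widens the gap between darker and lighter values, so after enough passes most pixels sit near pure black or pure white.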
Convert image to 1px x height
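One way to think about squashing the page to a single pixel column is that each row is replaced by a summary of its brightness. The sketch below uses a plain row average; that is an assumption, since ImageMagick's resize applies its own filter, and `collapse_to_column` is a hypothetical name.

```python
def collapse_to_column(gray_image):
    """Average each row of a 2-D list of gray levels into one value per row."""
    return [round(sum(row) / len(row)) for row in gray_image]


gray_image = [
    [255, 255, 255, 255],  # blank row
    [0, 0, 0, 255],        # mostly dark row
    [10, 20, 30, 40],      # dark row
]
column = collapse_to_column(gray_image)
```

The result is one value per row of the page, so the rest of the work becomes a scan down a single list.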
Sharpen the image
Heavy-handed grayscale conversion: make most grays black, keep whites white
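A minimal sketch of that heavy-handed conversion, assuming a simple cutoff: anything not close to white is treated as black. The `white_cutoff` value of 230 and the function name are hypothetical choices for illustration, not values taken from the talk.

```python
def to_black_or_white(column, white_cutoff=230):
    """Label each gray level 'white' if it is near white, otherwise 'black'."""
    return ["white" if level >= white_cutoff else "black" for level in column]


labels = to_black_or_white([255, 240, 200, 128, 30, 0])
```

A high cutoff is deliberate here: mid-grays count as "black", so rows that contain any substantial ink stand out from blank paper.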
convert to txt
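ImageMagick can write an image as a text listing with one line per pixel. The sketch below parses such a listing into a top-to-bottom list of hex colors. The sample text is an assumed layout; the exact columns vary between ImageMagick versions, so the regular expression only relies on the leading `x,y:` coordinates and a six-digit hex color. `parse_txt_column` is a hypothetical helper name.

```python
import re

# Matches lines such as "0,1: (0,0,0)  #000000  black" (assumed layout).
PIXEL_LINE = re.compile(r"^(\d+),(\d+):.*?#([0-9A-Fa-f]{6})")


def parse_txt_column(text):
    """Return the hex color of each pixel in a 1-pixel-wide listing, by row."""
    colors = {}
    for line in text.splitlines():
        match = PIXEL_LINE.match(line)
        if match is None:
            continue  # skip the header comment and any blank lines
        y = int(match.group(2))
        colors[y] = "#" + match.group(3).upper()
    return [colors[y] for y in sorted(colors)]


sample = """# ImageMagick pixel enumeration: 1,4,255,gray
0,0: (255,255,255)  #FFFFFF  white
0,1: (0,0,0)  #000000  black
0,2: (0,0,0)  #000000  black
0,3: (255,255,255)  #FFFFFF  white
"""
hex_colors = parse_txt_column(sample)
```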
Look for long, continuous blocks of “black”
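The search for long black blocks can be sketched as a single pass over the column that records every run of consecutive "black" entries at least `min_length` rows tall. The function name and the threshold of 3 are hypothetical, and reading long runs as candidate illustrations (with short runs being lines of text) is an interpretation of the slide, not something it states.

```python
def find_black_runs(column, min_length=3):
    """Return (start, end) row indexes of black runs at least min_length long."""
    runs = []
    start = None
    for y, color in enumerate(column):
        if color == "black":
            if start is None:
                start = y  # a new run begins here
        elif start is not None:
            if y - start >= min_length:
                runs.append((start, y - 1))
            start = None
    # Close a run that reaches the bottom of the page.
    if start is not None and len(column) - start >= min_length:
        runs.append((start, len(column) - 1))
    return runs


column = ["white", "black", "white", "black", "black", "black",
          "black", "white", "black", "black", "black"]
runs = find_black_runs(column)
```

On a real page the threshold would presumably be much larger, tuned so that a line of text is too short to qualify.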
github.com/ewlarson/picturepages
Don Quixote
# (168/169) 99% Accurate
http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
Paradise Lost
# (54/54) 100% Accurate
http://openlibrary.org/books/OL14022842M/Paradise_Lost
Around the World in Eighty Days
# (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
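The accuracy figures follow from the counts on the slides, rounded to whole percentages. The short check below assumes each pair means (correct, total); the slides do not spell that out.

```python
# (correct, total) pairs as shown on the slides; the reading is an assumption.
results = {
    "Don Quixote": (168, 169),
    "Paradise Lost": (54, 54),
    "Around the World in Eighty Days": (60, 62),
}


def percent(correct, total):
    """Return the whole-number percentage of correct out of total."""
    return round(100 * correct / total)


accuracy = {title: percent(*counts) for title, counts in results.items()}
```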
Wanna help do this better? Contact me.
[email protected]