Finding images in book page images
Eric Larson
February 07, 2012
Programming
Code4Lib 2012 Lightning Talk Presentation
Transcript
Finding images in book page images
Eric Larson
University of Wisconsin-Madison Libraries
Warning: Hobbyist code here. I'm certain there are better ways to do this.
curl
imagemagick
Processing steps
1. Desaturate the image
2. Boost contrast
3. Convert image to 1 pixel wide x image height
4. Sharpen the image
5. Super-duper grayscale conversion
6. Produce the text color list
7. Look for continuous "black" blocks
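Steps 1 through 6 can be sketched as a single ImageMagick invocation. This is a rough sketch, not the code from the talk: the helper name `build_convert_command`, the `-sharpen 0x1` and `-threshold 50%` settings, and the forced `-resize` geometry are assumptions. Only `-colorspace Gray` and the eight repeated `-contrast` flags appear on the slides. The function only builds the argument list; it does not run anything.

```python
from typing import List


def build_convert_command(src: str, height: int, contrast_passes: int = 8) -> List[str]:
    """Assemble an ImageMagick `convert` argument list for steps 1-6.

    `src` is the page image path and `height` its height in pixels.
    Both parameters, and this helper itself, are illustrative only.
    """
    cmd = ["convert", src]
    cmd += ["-colorspace", "Gray"]          # 1. desaturate (flag shown on the slide)
    cmd += ["-contrast"] * contrast_passes  # 2. boost contrast (slide repeats it 8 times)
    cmd += ["-resize", f"1x{height}!"]      # 3. squash to 1 px wide; "!" ignores aspect ratio
    cmd += ["-sharpen", "0x1"]              # 4. sharpen (radius/sigma are assumed values)
    cmd += ["-threshold", "50%"]            # 5. assumed stand-in for the heavy-handed conversion
    cmd += ["txt:-"]                        # 6. write the pixel list as text to stdout
    return cmd


if __name__ == "__main__":
    print(" ".join(build_convert_command("page.jpg", 500)))
```

If ImageMagick is installed, the returned list could be handed to `subprocess.run(cmd, capture_output=True, text=True)` to collect the text output.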
convert
convert -colorspace Gray
convert \
  -contrast -contrast \
  -contrast -contrast \
  -contrast -contrast \
  -contrast -contrast \
Convert image to 1px x height
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites are white
convert to txt
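The text output can then be read back as a column of gray levels. The sketch below assumes each pixel line of ImageMagick's `txt:` format carries a hex color token such as `#FFFFFF` and that comment lines start with `#`; the exact layout varies between ImageMagick versions, so treat the parsing rule and the helper name `parse_pixel_column` as assumptions rather than the talk's method.

```python
import re
from typing import List

# First two hex digits after "#" give the high byte of the first channel.
_HEX = re.compile(r"#([0-9A-Fa-f]{2})")


def parse_pixel_column(txt: str) -> List[int]:
    """Return one 0-255 gray level per pixel line, in the order listed."""
    levels = []
    for line in txt.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the enumeration header
        match = _HEX.search(line)
        if match:
            levels.append(int(match.group(1), 16))
    return levels


if __name__ == "__main__":
    # Hand-written sample in the assumed format, not real tool output.
    sample = (
        "# ImageMagick pixel enumeration: 1,3,255,gray\n"
        "0,0: (255,255,255)  #FFFFFF  white\n"
        "0,1: (0,0,0)  #000000  black\n"
        "0,2: (255,255,255)  #FFFFFF  white\n"
    )
    print(parse_pixel_column(sample))
```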
Look for long, continuous blocks of “black”
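The block search amounts to finding runs in the 1-pixel column. Presumably lines of text leave short dark stretches broken up by line spacing, while an illustration leaves one long unbroken dark band. Here is a minimal sketch under that reading; the function name `find_black_runs`, the `black_max` cutoff, and the `min_length` of 50 rows are all assumed values, not figures from the talk.

```python
from typing import List, Tuple


def find_black_runs(
    levels: List[int], black_max: int = 0, min_length: int = 50
) -> List[Tuple[int, int]]:
    """Return (first_row, last_row) for each run of "black" rows.

    A row counts as black when its gray level is <= black_max.
    Runs shorter than min_length rows are ignored.
    """
    runs = []
    start = None  # row index where the current black run began
    for y, level in enumerate(levels):
        if level <= black_max:
            if start is None:
                start = y
        else:
            if start is not None and y - start >= min_length:
                runs.append((start, y - 1))
            start = None
    # Close a run that reaches the bottom of the page.
    if start is not None and len(levels) - start >= min_length:
        runs.append((start, len(levels) - 1))
    return runs


if __name__ == "__main__":
    column = [255] * 10 + [0] * 60 + [255] * 5 + [0] * 3 + [255] * 2 + [0] * 55
    print(find_black_runs(column))
```

A page with at least one returned run would be flagged as likely containing an image; tuning `min_length` against known pages is left open here.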
github.com/ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate
http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
Paradise Lost # (54/54) 100% Accurate
http://openlibrary.org/books/OL14022842M/Paradise_Lost
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
Wanna help do this better? Contact me.
[email protected]