Finding images in book page images
Eric Larson
February 07, 2012
Code4Lib 2012 Lightning Talk Presentation
Transcript
Finding images in book page images
Eric Larson
University of Wisconsin-Madison Libraries
Warning: Hobbyist code here. I’m certain there are better ways to do this.
curl
imagemagick
Processing steps
1. Desaturate the image
2. Boost contrast
3. Convert image to 1 pixel wide x image height
4. Sharpen the image
5. Super-duper grayscale conversion
6. Produce the text color list
7. Look for continuous “black” blocks
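The heart of the steps above is squeezing each page down to a single column of black or white values. Here is a rough pure-Python sketch of that idea, not the talk's actual ImageMagick pipeline: it assumes the page is already loaded as rows of (r, g, b) tuples, skips the contrast and sharpen steps, and uses a hypothetical cutoff of 200 as a stand-in for the heavy-handed grayscale conversion.

```python
def collapse_to_column(pixels, threshold=200):
    """Reduce a page to one black/white value per row.

    pixels: list of rows, each row a list of (r, g, b) tuples (0-255).
    threshold: hypothetical cutoff; row averages below it become black (0),
               everything else becomes white (255).
    """
    column = []
    for row in pixels:
        # Desaturate: use the plain mean of the three channels as the gray value.
        grays = [(r + g + b) / 3 for r, g, b in row]
        # Squeeze the row to a single pixel by averaging across its width.
        row_mean = sum(grays) / len(grays)
        # Crude two-level conversion: dark-ish rows go black, the rest go white.
        column.append(0 if row_mean < threshold else 255)
    return column


if __name__ == "__main__":
    white = (255, 255, 255)
    dark = (50, 50, 50)
    black = (0, 0, 0)
    page = [
        [white] * 10,            # blank margin row
        [dark] * 10,             # row crossing a dark illustration
        [white] * 9 + [black],   # row with a little bit of text
    ]
    print(collapse_to_column(page))
```

Rows that cross an illustration tend to be dark across most of their width, so they average out dark; rows of text are mostly paper and average out light.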
convert
convert -colorspace Gray
convert \
  -contrast -contrast \
  -contrast -contrast \
  -contrast -contrast \
  -contrast -contrast \
Convert image to 1px x height
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites are white
convert to txt
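Once the one-pixel-wide image is dumped as text, the listing has to be read back into a list of values. The sketch below is one way that might look in Python. The sample listing is an assumption modeled on ImageMagick's txt: pixel enumeration with 8-bit colors (the exact layout varies by version), and parse_pixel_listing is a hypothetical helper, not code from the talk.

```python
import re

# Matches lines such as "0,12: (0,0,0)  #000000  black".
# Assumes 8-bit output, i.e. a six-digit hex color; other depths are skipped.
PIXEL_LINE = re.compile(r"^(\d+),(\d+):.*?#([0-9A-Fa-f]{6})\b")


def parse_pixel_listing(text):
    """Return one gray value (0-255) per row, ordered top to bottom."""
    rows = []
    for line in text.splitlines():
        match = PIXEL_LINE.match(line.strip())
        if not match:
            continue  # header comment, blank line, or unexpected format
        y = int(match.group(2))
        # The image is grayscale, so the red channel alone is enough.
        gray = int(match.group(3)[:2], 16)
        rows.append((y, gray))
    rows.sort()
    return [gray for _, gray in rows]


# Assumed sample of a 1 x 4 pixel listing.
SAMPLE = """# ImageMagick pixel enumeration: 1,4,255,gray
0,0: (255,255,255)  #FFFFFF  white
0,1: (0,0,0)  #000000  black
0,2: (0,0,0)  #000000  black
0,3: (255,255,255)  #FFFFFF  white
"""

if __name__ == "__main__":
    print(parse_pixel_listing(SAMPLE))
```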
Look for long, continuous blocks of “black”
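Finding the long black blocks is a run-length scan down the column. A minimal sketch in Python, assuming the column is already a list with 0 for black and 255 for white; the min_length cutoff is a hypothetical parameter you would tune per scan resolution, not a value from the talk.

```python
def find_black_runs(column, min_length=100, black=0):
    """Return (start_row, end_row) pairs for runs of black at least min_length long."""
    runs = []
    start = None  # row where the current black run began, if any
    for y, value in enumerate(column):
        if value == black:
            if start is None:
                start = y
        elif start is not None:
            # The run ended on the previous row; keep it only if long enough.
            if y - start >= min_length:
                runs.append((start, y - 1))
            start = None
    # Handle a run that reaches the bottom of the page.
    if start is not None and len(column) - start >= min_length:
        runs.append((start, len(column) - 1))
    return runs


if __name__ == "__main__":
    # Short runs stand in for lines of text, long runs for illustrations.
    column = [255] * 5 + [0] * 3 + [255] * 2 + [0] * 10 + [255] * 4 + [0] * 6
    print(find_black_runs(column, min_length=6))
```

Lines of text produce short black runs separated by white line spacing, so a large enough min_length filters them out and leaves only picture-sized blocks.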
github.com/ewlarson/picturepages
Don Quixote
# (168/169) 99% Accurate
http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
Paradise Lost
# (54/54) 100% Accurate
http://openlibrary.org/books/OL14022842M/Paradise_Lost
Around the World in Eighty Days
# (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
Wanna help do this better? Contact me.
[email protected]