Finding images in book page images
Eric Larson
February 07, 2012
Code4Lib 2012 Lightning Talk Presentation
Transcript
Finding images in book page images
Eric Larson
University of Wisconsin-Madison Libraries
Warning: Hobbyist code here. I’m certain there are better ways to do this.
curl
imagemagick
Processing steps
1. Desaturate the image
2. Boost contrast
3. Convert image to 1 pixel wide x image height
4. Sharpen the image
5. Super-duper grayscale conversion
6. Produce the text color list
7. Look for continuous “black” blocks
convert
convert -colorspace Gray
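A rough illustration of what desaturating does to each pixel. This is only a sketch: it assumes Rec. 601 luma weights, which may not match what ImageMagick applies for -colorspace Gray in a given version, and `to_gray` is a hypothetical helper, not part of the talk's code.

```python
def to_gray(pixel):
    """Return an approximate 0-255 gray level for an (r, g, b) pixel.

    Assumption: Rec. 601 luma weights, not necessarily ImageMagick's own.
    """
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


# A made-up row of pixels: white paper, black ink, a reddish illustration.
row = [(255, 255, 255), (0, 0, 0), (200, 30, 30)]
gray_row = [to_gray(p) for p in row]
```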
convert \
  -contrast -contrast \
  -contrast -contrast \
  -contrast -contrast \
  -contrast -contrast \
Convert image to 1px x height
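One way to picture this step: every row of the page gets squeezed down to a single value. The sketch below assumes a plain average per row, which only approximates what an ImageMagick resize does (its resampling filter is more involved). `collapse_to_column` and the tiny sample page are made up for illustration.

```python
def collapse_to_column(gray_rows):
    """Average each row of gray values, giving one value per image row."""
    return [sum(row) // len(row) for row in gray_rows]


# A made-up 4-pixel-wide "page" with three rows of 0-255 gray values.
page = [
    [255, 255, 255, 255],  # blank margin row
    [255, 0, 0, 255],      # row crossing something dark
    [250, 240, 250, 240],  # faint row
]
column = collapse_to_column(page)
```

The result has one entry per row, so a tall page becomes a single strip that is cheap to scan from top to bottom.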
Sharpen the image
Heavy-handed grayscale conversion
=> make most grays black
=> whites are white
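A minimal sketch of that idea as a hard threshold. The cutoff of 230 is an assumption picked for illustration; the slides don't say which value or which ImageMagick options were used, and `heavy_threshold` is a hypothetical name.

```python
def heavy_threshold(gray_values, white_cutoff=230):
    """Map values below white_cutoff to 0 (black) and the rest to 255 (white).

    Assumption: white_cutoff=230 is an illustrative value, not from the talk.
    """
    return [255 if v >= white_cutoff else 0 for v in gray_values]


# Only the near-white values survive as white; every other gray goes black.
levels = heavy_threshold([255, 240, 229, 128, 0])
```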
convert to txt
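ImageMagick's txt: output lists one pixel per line after a "# ImageMagick pixel enumeration" header. Here is a sketch of reading it in Python. The sample text is hand-written to resemble that format, the exact layout of each line varies between ImageMagick versions, and `parse_txt` is a hypothetical helper.

```python
import re

# Matches lines shaped like "0,3: (255)  #FFFFFF  gray(255)" and captures
# x, y and the first channel value inside the parentheses.
PIXEL_LINE = re.compile(r"^(\d+),(\d+):\s*\(\s*([0-9.]+)")

# Hand-written sample for a 1 pixel wide, 4 pixel tall image (an assumption).
SAMPLE = "\n".join([
    "# ImageMagick pixel enumeration: 1,4,255,gray",
    "0,0: (255)  #FFFFFF  gray(255)",
    "0,1: (0)  #000000  gray(0)",
    "0,2: (0)  #000000  gray(0)",
    "0,3: (255)  #FFFFFF  gray(255)",
])


def parse_txt(text):
    """Return a list of (y, first_channel_value) pairs, in file order."""
    pixels = []
    for line in text.splitlines():
        match = PIXEL_LINE.match(line)
        if match:  # the '#' header line and blank lines are skipped
            _x, y, value = match.groups()
            pixels.append((int(y), float(value)))
    return pixels


pixels = parse_txt(SAMPLE)
```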
Look for long, continuous blocks of “black”
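A sketch of the run-finding step on a column of 0 (black) and 255 (white) values. Presumably lines of text show up as short black runs and pictures as long ones; the `min_length` value here is an assumption for the toy data, and `black_runs` is a hypothetical helper rather than the code from the talk.

```python
def black_runs(column, min_length):
    """Return (start, end) index pairs, end exclusive, for each run of
    0 values that is at least min_length long."""
    runs = []
    start = None
    for i, value in enumerate(column):
        if value == 0:
            if start is None:
                start = i  # a new black run begins here
        elif start is not None:
            if i - start >= min_length:
                runs.append((start, i))
            start = None
    # Handle a run that reaches the bottom of the column.
    if start is not None and len(column) - start >= min_length:
        runs.append((start, len(column)))
    return runs


# Toy column: a short run (rows 1-2), a long run (rows 4-8), a single row.
column = [255, 0, 0, 255, 0, 0, 0, 0, 0, 255, 0]
found = black_runs(column, min_length=4)
print(found)
```

With this toy data only the five-row run is reported; the shorter runs fall below `min_length` and are ignored.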
github.com/ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate
http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
Paradise Lost # (54/54) 100% Accurate
http://openlibrary.org/books/OL14022842M/Paradise_Lost
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
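The percentages on the three result slides can be reproduced from the fractions. The slides don't define the fractions, so this sketch assumes each one is correct results over total and that the percentage is rounded to the nearest whole number.

```python
# Fractions copied from the result slides: (numerator, denominator).
results = {
    "Don Quixote": (168, 169),
    "Paradise Lost": (54, 54),
    "Around the World in Eighty Days": (60, 62),
}

# Assumption: accuracy = numerator / denominator, rounded to a whole percent.
accuracy = {title: round(100 * hit / total)
            for title, (hit, total) in results.items()}

for title, percent in accuracy.items():
    print(f"{title}: {percent}%")
```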
Wanna help do this better? Contact me.
[email protected]