Finding images in book page images
Eric Larson
February 07, 2012
Code4Lib 2012 Lightning Talk Presentation
Transcript
Finding images in book page images
Eric Larson
University of Wisconsin-Madison Libraries
Warning: Hobbyist code here. I’m certain there are better ways to do this.
curl
imagemagick
Processing steps
1. Desaturate the image
2. Boost contrast
3. Convert image to 1 pixel wide x image height
4. Sharpen the image
5. Super-duper grayscale conversion
6. Produce the text color list
7. Look for continuous “black” blocks
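The steps above can be sketched as a single ImageMagick call assembled from Python. This is a rough sketch, not the talk's actual code: `build_convert_command` and `run_convert` are hypothetical helper names, and the `-sharpen 0x1` and `-threshold 50%` values are assumptions standing in for the sharpening and "super-duper grayscale" steps, whose real settings the slides do not give.

```python
import subprocess


def build_convert_command(src, height, contrast_passes=8):
    """Build an ImageMagick `convert` argument list for the listed steps.

    The -sharpen and -threshold values are guesses, not the talk's settings.
    """
    cmd = ["convert", src]
    cmd += ["-colorspace", "Gray"]          # 1. desaturate
    cmd += ["-contrast"] * contrast_passes  # 2. boost contrast (flag repeated)
    cmd += ["-resize", f"1x{height}!"]      # 3. squash to 1 px wide, keep height
    cmd += ["-sharpen", "0x1"]              # 4. sharpen (assumed radius x sigma)
    cmd += ["-threshold", "50%"]            # 5. force each pixel to black or white (assumed)
    cmd += ["txt:-"]                        # 6. write the pixel list as text to stdout
    return cmd


def run_convert(src, height):
    """Run the command and return its text output (requires ImageMagick)."""
    result = subprocess.run(build_convert_command(src, height),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Keeping the command as a list makes it easy to inspect or tweak individual steps before running it, for example `build_convert_command("page.jpg", 500)`.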
convert
convert -colorspace Gray
convert \
  -contrast -contrast \
  -contrast -contrast \
  -contrast -contrast \
  -contrast -contrast \
Convert image to 1px x height
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites are white
convert to txt
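A minimal sketch of reading that text output back, assuming ImageMagick's `txt:` pixel enumeration with 8-bit colors and no alpha channel (the exact line layout varies between ImageMagick versions). `parse_pixel_rows` is a hypothetical helper, and the sample text is made up for illustration.

```python
import re

# Matches lines such as "0,12: (0,0,0)  #000000  black"
PIXEL_LINE = re.compile(r"^(\d+),(\d+):.*?#([0-9A-Fa-f]{6})")


def parse_pixel_rows(txt_output):
    """Return one boolean per image row, True where the pixel is pure black."""
    rows = {}
    for line in txt_output.splitlines():
        match = PIXEL_LINE.match(line)
        if match is None:
            continue  # skips the "# ImageMagick pixel enumeration" header
        y = int(match.group(2))
        rows[y] = match.group(3) == "000000"
    return [rows[y] for y in sorted(rows)]


# Made-up sample of a 1 x 4 pixel image
sample = """# ImageMagick pixel enumeration: 1,4,255,gray
0,0: (255,255,255)  #FFFFFF  white
0,1: (0,0,0)  #000000  black
0,2: (0,0,0)  #000000  black
0,3: (255,255,255)  #FFFFFF  white
"""
print(parse_pixel_rows(sample))  # [False, True, True, False]
```

Because the image is only one pixel wide, each row contributes a single value, so the result is a simple top-to-bottom list.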
Look for long, continuous blocks of “black”
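One way to look for those blocks is a single pass that tracks where each run of black rows starts. This is a sketch under assumptions: `find_black_runs` and `min_length` are hypothetical names, the input is a top-to-bottom list of booleans (True for black), and the slides do not say what minimum length the original code uses.

```python
def find_black_runs(is_black, min_length):
    """Return (start_row, end_row) pairs for runs of True at least min_length long."""
    runs = []
    start = None
    for y, black in enumerate(is_black):
        if black and start is None:
            start = y  # a new run begins
        elif not black and start is not None:
            if y - start >= min_length:
                runs.append((start, y - 1))
            start = None  # the run ended
    # A run may extend to the bottom edge of the page
    if start is not None and len(is_black) - start >= min_length:
        runs.append((start, len(is_black) - 1))
    return runs


column = [False, True, False, True, True, True, True, False, True, True, True]
print(find_black_runs(column, min_length=3))  # [(3, 6), (8, 10)]
```

Short runs, such as the single black row at index 1, are dropped, while longer runs are kept as candidate image regions.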
github.com/ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate
http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
Paradise Lost # (54/54) 100% Accurate
http://openlibrary.org/books/OL14022842M/Paradise_Lost
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
Wanna help do this better? Contact me.
[email protected]