Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Finding images in book page images
Search
Eric Larson
February 07, 2012
Programming
800
4
Share
Finding images in book page images
Code4Lib 2012 Lighting Talk Presentation
Eric Larson
February 07, 2012
More Decks by Eric Larson
See All by Eric Larson
Put Your Money Where The Mouse is: Tools and Techniques for Making Informed Design Decisions
ewlarson
0
39
Usability Testing Primo
ewlarson
0
95
Ruby Tips
ewlarson
0
120
9 Keys to Great UX
ewlarson
0
91
Other Decks in Programming
See All in Programming
セグメントとターゲットを意識するプロポーザルの書き方 〜採択の鍵は、誰に刺すかを見極めるマーケティング戦略にある〜
m3m0r7
PRO
0
500
Kubernetes上でAgentを動かすための最新動向と押さえるべき概念まとめ
sotamaki0421
3
490
KagglerがMixSeekを触ってみた
morim
0
380
Exploring RuboCop with MCP
koic
0
280
t *testing.T は どこからやってくるの?
otakakot
0
120
Getting more out of Maven
mlvandijk
0
110
Offline should be the norm: building local-first apps with CRDTs & Kotlin Multiplatform
renaudmathieu
0
200
Coding at the Speed of Thought: The New Era of Symfony Docker
dunglas
0
4.9k
事業会社でのセキュリティ長期インターンについて
masachikaura
0
250
Going Multiplatform with Your Android App (Android Makers 2026)
zsmb
2
400
「効かない!」依存性注入(DI)を活用したAPI Platformのエラーハンドリング奮闘記
mkmk884
0
320
Codex CLIのSubagentsによる並列API実装 / Parallel API Implementation with Codex CLI Subagents
takatty
2
890
Featured
See All Featured
Large-scale JavaScript Application Architecture
addyosmani
515
110k
[RailsConf 2023 Opening Keynote] The Magic of Rails
eileencodes
31
10k
The Spectacular Lies of Maps
axbom
PRO
1
690
The World Runs on Bad Software
bkeepers
PRO
72
12k
New Earth Scene 8
popppiees
3
2.1k
JAMstack: Web Apps at Ludicrous Speed - All Things Open 2022
reverentgeek
1
420
The Curse of the Amulet
leimatthew05
1
11k
Done Done
chrislema
186
16k
Test your architecture with Archunit
thirion
1
2.2k
Scaling GitHub
holman
464
140k
The Myth of the Modular Monolith - Day 2 Keynote - Rails World 2024
eileencodes
27
3.4k
Color Theory Basics | Prateek | Gurzu
gurzu
0
290
Transcript
Finding images in book page images Eric Larson University of
Wisconsin-Madison Libraries
Warning Hobbyist code here. I’m certain there are better ways
to do this.
None
None
None
None
None
curl
None
imagemagick
Processing steps 1. Desaturate the image 2. Boost contrast 3.
Convert image to 1pixel wide x image height 4. Sharpen the image 5. Super-duper grayscale conversion 6. Produce the text color list 7. Look for continuous “black” blocks
None
convert
convert -colorspace Gray
None
convert \ -contrast -contrast \ -contrast -contrast \ -contrast -contrast
\ -contrast -contrast \
None
Convert image to 1px x height
None
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites
are white
convert to txt
None
Look for long, continuous blocks of “black”
None
None
None
github.com ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
None
None
Paradise Lost # (54/54) 100% Accurate http://openlibrary.org/books/OL14022842M/Paradise_Lost
None
None
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
None
None
Wanna help do this better? Contact me.
[email protected]