Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Finding images in book page images
Search
Eric Larson
February 07, 2012
Programming
800
4
Share
Embed
Copy iframe code
Copy JS code
Copy link
Start on current slide
Finding images in book page images
Code4Lib 2012 Lighting Talk Presentation
Eric Larson
February 07, 2012
More Decks by Eric Larson
See All by Eric Larson
Put Your Money Where The Mouse is: Tools and Techniques for Making Informed Design Decisions
ewlarson
0
40
Usability Testing Primo
ewlarson
0
97
Ruby Tips
ewlarson
0
120
9 Keys to Great UX
ewlarson
0
92
Other Decks in Programming
See All in Programming
Strategic Design in the Frontend: Moduliths & Micro Frontends @DDDEurope
manfredsteyer
PRO
0
110
Java × distroless で 軽量なコンテナイメージを / Java on Distroless
contour_gara
0
550
CSC307 Lecture 17
javiergs
PRO
0
320
Creating Composable Callables in Contemporary C++
rollbear
0
150
Dataformのリポジトリを立ち上げるときにまずやること / dataform-day0-2026
snhryt
0
170
AI 時代のソフトウェア設計の学び方
masuda220
PRO
29
13k
技術的負債解消で開発者の未来を開く- AIの力でコード刷新
kmd2kmd
0
100
AIで効率化できた業務・日常
ochtum
0
140
jQueryをバージョンアップする前に使いたいjQuery Migrate
matsuo_atsushi
0
560
Language Server 使ってる? 〜VSCode と Zed の場合〜 / Are you using a Language Server? ~For VS Code and Zed~
handlename
0
790
メソッドのジェネリクスでGoの夢は広がるか? / Kyoto.go #65
utgwkk
3
820
Hunting Vulnerabilities in Symfony with LLMs
vinceamstoutz
0
550
Featured
See All Featured
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
12
1.7k
Pawsitive SEO: Lessons from My Dog (and Many Mistakes) on Thriving as a Consultant in the Age of AI
davidcarrasco
0
160
How People are Using Generative and Agentic AI to Supercharge Their Products, Projects, Services and Value Streams Today
helenjbeal
1
220
WENDY [Excerpt]
tessaabrams
11
38k
Principles of Awesome APIs and How to Build Them.
keavy
128
18k
Become a Pro
speakerdeck
PRO
31
6k
Building the Perfect Custom Keyboard
takai
2
800
Marketing Yourself as an Engineer | Alaka | Gurzu
gurzu
0
240
Statistics for Hackers
jakevdp
799
230k
Ten Tips & Tricks for a 🌱 transition
stuffmc
0
140
A Soul's Torment
seathinner
6
3k
Into the Great Unknown - MozCon
thekraken
41
2.6k
Transcript
Finding images in book page images Eric Larson University of
Wisconsin-Madison Libraries
Warning Hobbyist code here. I’m certain there are better ways
to do this.
None
None
None
None
None
curl
None
imagemagick
Processing steps 1. Desaturate the image 2. Boost contrast 3.
Convert image to 1pixel wide x image height 4. Sharpen the image 5. Super-duper grayscale conversion 6. Produce the text color list 7. Look for continuous “black” blocks
None
convert
convert -colorspace Gray
None
convert \ -contrast -contrast \ -contrast -contrast \ -contrast -contrast
\ -contrast -contrast \
None
Convert image to 1px x height
None
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites
are white
convert to txt
None
Look for long, continuous blocks of “black”
None
None
None
github.com ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
None
None
Paradise Lost # (54/54) 100% Accurate http://openlibrary.org/books/OL14022842M/Paradise_Lost
None
None
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
None
None
Wanna help do this better? Contact me.
[email protected]