Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Finding images in book page images
Search
Eric Larson
February 07, 2012
Programming
4
790
Finding images in book page images
Code4Lib 2012 Lighting Talk Presentation
Eric Larson
February 07, 2012
Tweet
Share
More Decks by Eric Larson
See All by Eric Larson
Put Your Money Where The Mouse is: Tools and Techniques for Making Informed Design Decisions
ewlarson
0
37
Usability Testing Primo
ewlarson
0
94
Ruby Tips
ewlarson
0
110
9 Keys to Great UX
ewlarson
0
90
Other Decks in Programming
See All in Programming
AWS re:Invent 2025参加 直前 Seattle-Tacoma Airport(SEA)におけるハードウェア紛失インシデントLT
tetutetu214
2
120
CSC307 Lecture 05
javiergs
PRO
0
500
Best-Practices-for-Cortex-Analyst-and-AI-Agent
ryotaroikeda
1
110
Raku Raku Notion 20260128
hareyakayuruyaka
0
370
CSC307 Lecture 06
javiergs
PRO
0
690
それ、本当に安全? ファイルアップロードで見落としがちなセキュリティリスクと対策
penpeen
7
4k
AI巻き込み型コードレビューのススメ
nealle
2
1.4k
CSC307 Lecture 08
javiergs
PRO
0
670
生成AIを使ったコードレビューで定性的に品質カバー
chiilog
1
280
AIで開発はどれくらい加速したのか?AIエージェントによるコード生成を、現場の評価と研究開発の評価の両面からdeep diveしてみる
daisuketakeda
1
2.5k
日本だけで解禁されているアプリ起動の方法
ryunakayama
0
270
Smart Handoff/Pickup ガイド - Claude Code セッション管理
yukiigarashi
0
150
Featured
See All Featured
The Illustrated Children's Guide to Kubernetes
chrisshort
51
51k
XXLCSS - How to scale CSS and keep your sanity
sugarenia
249
1.3M
Embracing the Ebb and Flow
colly
88
5k
Statistics for Hackers
jakevdp
799
230k
Claude Code のすすめ
schroneko
67
210k
Context Engineering - Making Every Token Count
addyosmani
9
670
End of SEO as We Know It (SMX Advanced Version)
ipullrank
3
3.9k
<Decoding/> the Language of Devs - We Love SEO 2024
nikkihalliwell
1
130
The SEO Collaboration Effect
kristinabergwall1
0
360
Unsuck your backbone
ammeep
671
58k
Agile Leadership in an Agile Organization
kimpetersen
PRO
0
86
Leveraging LLMs for student feedback in introductory data science courses - posit::conf(2025)
minecr
0
160
Transcript
Finding images in book page images Eric Larson University of
Wisconsin-Madison Libraries
Warning Hobbyist code here. I’m certain there are better ways
to do this.
None
None
None
None
None
curl
None
imagemagick
Processing steps 1. Desaturate the image 2. Boost contrast 3.
Convert image to 1pixel wide x image height 4. Sharpen the image 5. Super-duper grayscale conversion 6. Produce the text color list 7. Look for continuous “black” blocks
None
convert
convert -colorspace Gray
None
convert \ -contrast -contrast \ -contrast -contrast \ -contrast -contrast
\ -contrast -contrast \
None
Convert image to 1px x height
None
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites
are white
convert to txt
None
Look for long, continuous blocks of “black”
None
None
None
github.com ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
None
None
Paradise Lost # (54/54) 100% Accurate http://openlibrary.org/books/OL14022842M/Paradise_Lost
None
None
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
None
None
Wanna help do this better? Contact me.
[email protected]