Finding images in book page images
Eric Larson
February 07, 2012
Code4Lib 2012 Lightning Talk Presentation
Transcript
Finding images in book page images
Eric Larson
University of Wisconsin-Madison Libraries
Warning: hobbyist code here. I’m certain there are better ways to do this.
curl
ImageMagick
Processing steps
1. Desaturate the image
2. Boost contrast
3. Convert image to 1 pixel wide x image height
4. Sharpen the image
5. Super-duper grayscale conversion
6. Produce the text color list
7. Look for continuous “black” blocks
convert
convert -colorspace Gray
convert \
  -contrast -contrast \
  -contrast -contrast \
  -contrast -contrast \
  -contrast -contrast \
Convert image to 1px x height
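The idea behind this step can be sketched in plain Python. This is only an illustration, not the code from the talk: it assumes that squeezing a page to one pixel wide behaves roughly like averaging each row of grayscale values, and `collapse_rows` and the tiny `page` grid are hypothetical names and data.

```python
from typing import List, Sequence


def collapse_rows(grid: Sequence[Sequence[int]]) -> List[int]:
    """Average each row of a grayscale grid (0-255) into one value.

    The result is a single column: one value per row of the page.
    """
    column = []
    for row in grid:
        if not row:
            raise ValueError("every row needs at least one pixel")
        column.append(round(sum(row) / len(row)))
    return column


# Hypothetical 4-pixel-wide "page", one list per row.
page = [
    [255, 255, 255, 255],  # blank margin
    [255, 40, 250, 255],   # line of text: mostly white, a little ink
    [30, 20, 26, 36],      # picture: dark across the full width
    [28, 22, 30, 40],      # picture continues
    [255, 255, 255, 255],  # blank margin
]

column = collapse_rows(page)
print(column)
```

Rows that are dark across the whole width stay dark after averaging, while a line of text only dims the row slightly.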
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites are white
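A minimal sketch of that kind of hard cutoff, in Python rather than ImageMagick. The function name `to_black_white` and the cutoff of 230 are assumptions chosen for illustration; the slides do not state the actual value used.

```python
from typing import List, Sequence


def to_black_white(values: Sequence[int], white_cutoff: int = 230) -> List[int]:
    """Map near-white values to 255 and everything else to 0.

    white_cutoff is a hypothetical setting: any gray darker than it
    is forced to black, so only very light pixels stay white.
    """
    return [255 if value >= white_cutoff else 0 for value in values]


# One value per row of a page that was already squeezed to 1 pixel wide.
column = [255, 200, 28, 30, 255]
print(to_black_white(column))
```

With a cutoff this high, even a light gray row of text turns black, which is fine as long as the later step only cares about long runs.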
convert to txt
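ImageMagick can write an image as a text listing of pixels. The sketch below reads such a listing back into a list of values, one per row. The sample text is a hypothetical listing, and the exact line layout varies between ImageMagick versions, so the regular expression is an assumption that may need adjusting; `parse_pixel_column` is an illustrative name.

```python
import re
from typing import List

# Matches lines such as "0,3: (255)  #FFFFFF  gray(255)" and captures
# x, y, and the first channel value inside the parentheses.
PIXEL_LINE = re.compile(r"^(\d+),(\d+):\s*\(\s*(\d+(?:\.\d+)?)")


def parse_pixel_column(txt: str) -> List[float]:
    """Return the first-channel value of every x == 0 pixel, ordered by y."""
    values = {}
    for line in txt.splitlines():
        match = PIXEL_LINE.match(line.strip())
        if match is None:
            continue  # header, comment, or a layout this sketch does not know
        x, y, level = match.groups()
        if x != "0":
            continue  # only the single column matters here
        values[int(y)] = float(level)
    return [values[y] for y in sorted(values)]


# Hypothetical listing for a 1 x 4 pixel image.
sample = """# ImageMagick pixel enumeration: 1,4,255,gray
0,0: (255)  #FFFFFF  gray(255)
0,1: (0)  #000000  gray(0)
0,2: (0)  #000000  gray(0)
0,3: (255)  #FFFFFF  gray(255)
"""

print(parse_pixel_column(sample))
```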
Look for long, continuous blocks of “black”
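One way to look for those blocks is a single pass that tracks where each run of black starts. This is a sketch under assumptions, not the picturepages implementation: `find_black_runs` and `min_length` are hypothetical names, and the right minimum length would depend on the page height.

```python
from typing import List, Sequence, Tuple


def find_black_runs(
    values: Sequence[int], min_length: int, black: int = 0
) -> List[Tuple[int, int]]:
    """Return (first_row, last_row) for each run of black at least min_length long."""
    runs = []
    start = None  # row index where the current black run began, if any
    for index, value in enumerate(values):
        if value == black:
            if start is None:
                start = index
        elif start is not None:
            if index - start >= min_length:
                runs.append((start, index - 1))
            start = None
    # A run that reaches the bottom of the page has no closing white row.
    if start is not None and len(values) - start >= min_length:
        runs.append((start, len(values) - 1))
    return runs


# Hypothetical column: a short text line, then two taller dark blocks.
column = [255, 0, 0, 255, 0, 0, 0, 0, 0, 255, 0, 0, 0, 0]
print(find_black_runs(column, min_length=4))
```

Short runs, such as single lines of text, fall below `min_length` and are ignored; only the taller blocks are reported as candidate images.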
github.com/ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate
http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
Paradise Lost # (54/54) 100% Accurate
http://openlibrary.org/books/OL14022842M/Paradise_Lost
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
Wanna help do this better? Contact me.
[email protected]