Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Finding images in book page images
Search
Eric Larson
February 07, 2012
Programming
4
800
Finding images in book page images
Code4Lib 2012 Lighting Talk Presentation
Eric Larson
February 07, 2012
Tweet
Share
More Decks by Eric Larson
See All by Eric Larson
Put Your Money Where The Mouse is: Tools and Techniques for Making Informed Design Decisions
ewlarson
0
39
Usability Testing Primo
ewlarson
0
95
Ruby Tips
ewlarson
0
110
9 Keys to Great UX
ewlarson
0
91
Other Decks in Programming
See All in Programming
ロボットのための工場に灯りは要らない
watany
11
3k
CS教育のDX AIによる育成の効率化
niftycorp
PRO
0
150
S3ストレージクラスの「見える」「ある」「使える」は全部違う ─ 体験から見た、仕様の深淵を覗く
ya_ma23
0
810
条件判定に名前、つけてますか? #phperkaigi #c
77web
2
500
[PHPerKaigi 2026]PHPerKaigi2025の企画CodeGolfが最高すぎて社内で内製して半年運営して得た内製と運営の知見
ikezoemakoto
0
220
CSC307 Lecture 14
javiergs
PRO
0
480
Angular-Apps smarter machen mit Gen AI: Lokal und offlinefähig - Hands-on Workshop!
christianliebel
PRO
0
130
DevinとClaude Code、SREの現場で使い倒してみた件
karia
1
1.1k
PHPで TLSのプロトコルを実装してみる
higaki_program
0
310
Linux Kernelの1文字のミスで 権限昇格ができた話
rqda
0
2k
API Platformを活用したPHPによる本格的なWeb API開発 / api-platform-book-intro
ttskch
1
150
ふつうの Rubyist、ちいさなデバイス、大きな一年
bash0c7
0
1.1k
Featured
See All Featured
How Software Deployment tools have changed in the past 20 years
geshan
0
33k
Bioeconomy Workshop: Dr. Julius Ecuru, Opportunities for a Bioeconomy in West Africa
akademiya2063
PRO
1
75
Paper Plane (Part 1)
katiecoart
PRO
0
5.7k
Balancing Empowerment & Direction
lara
5
950
Between Models and Reality
mayunak
2
240
Accessibility Awareness
sabderemane
0
83
What does AI have to do with Human Rights?
axbom
PRO
1
2k
The State of eCommerce SEO: How to Win in Today's Products SERPs - #SEOweek
aleyda
2
9.9k
Design of three-dimensional binary manipulators for pick-and-place task avoiding obstacles (IECON2024)
konakalab
0
380
Easily Structure & Communicate Ideas using Wireframe
afnizarnur
194
17k
No one is an island. Learnings from fostering a developers community.
thoeni
21
3.6k
Ten Tips & Tricks for a 🌱 transition
stuffmc
0
90
Transcript
Finding images in book page images Eric Larson University of
Wisconsin-Madison Libraries
Warning Hobbyist code here. I’m certain there are better ways
to do this.
None
None
None
None
None
curl
None
imagemagick
Processing steps 1. Desaturate the image 2. Boost contrast 3.
Convert image to 1pixel wide x image height 4. Sharpen the image 5. Super-duper grayscale conversion 6. Produce the text color list 7. Look for continuous “black” blocks
None
convert
convert -colorspace Gray
None
convert \ -contrast -contrast \ -contrast -contrast \ -contrast -contrast
\ -contrast -contrast \
None
Convert image to 1px x height
None
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites
are white
convert to txt
None
Look for long, continuous blocks of “black”
None
None
None
github.com ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
None
None
Paradise Lost # (54/54) 100% Accurate http://openlibrary.org/books/OL14022842M/Paradise_Lost
None
None
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
None
None
Wanna help do this better? Contact me.
[email protected]