Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Finding images in book page images
Search
Eric Larson
February 07, 2012
Programming
4
790
Finding images in book page images
Code4Lib 2012 Lighting Talk Presentation
Eric Larson
February 07, 2012
Tweet
Share
More Decks by Eric Larson
See All by Eric Larson
Put Your Money Where The Mouse is: Tools and Techniques for Making Informed Design Decisions
ewlarson
0
33
Usability Testing Primo
ewlarson
0
90
Ruby Tips
ewlarson
0
110
9 Keys to Great UX
ewlarson
0
88
Other Decks in Programming
See All in Programming
Building, Deploying, and Monitoring Ruby Web Applications with Falcon (Kaigi on Rails 2025)
ioquatix
4
2.5k
CSC305 Lecture 09
javiergs
PRO
0
300
CSC509 Lecture 06
javiergs
PRO
0
260
pnpm に provenance のダウングレード を検出する PR を出してみた
ryo_manba
1
130
組込みだけじゃない!TinyGo で始める無料クラウド開発入門
otakakot
2
350
Go言語の特性を活かした公式MCP SDKの設計
hond0413
1
410
釣り地図SNSにおける有料機能の実装
nokonoko1203
0
190
タスクの特性や不確実性に応じた最適な作業スタイルの選択(ペアプロ・モブプロ・ソロプロ)と実践 / Optimal Work Style Selection: Pair, Mob, or Solo Programming.
honyanya
3
190
Le côté obscur des IA génératives
pascallemerrer
0
150
はじめてのDSPy - 言語モデルを『プロンプト』ではなく『プログラミング』するための仕組み
masahiro_nishimi
3
6.4k
PHPに関数型の魂を宿す〜PHP 8.5 で実現する堅牢なコードとは〜 #phpcon_hiroshima / phpcon-hiroshima-2025
shogogg
1
320
あなたとKaigi on Rails / Kaigi on Rails + You
shimoju
0
170
Featured
See All Featured
What's in a price? How to price your products and services
michaelherold
246
12k
Imperfection Machines: The Place of Print at Facebook
scottboms
269
13k
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
285
14k
Designing Dashboards & Data Visualisations in Web Apps
destraynor
231
53k
Fireside Chat
paigeccino
40
3.7k
The Straight Up "How To Draw Better" Workshop
denniskardys
238
140k
Keith and Marios Guide to Fast Websites
keithpitt
411
23k
Building Applications with DynamoDB
mza
96
6.7k
The Invisible Side of Design
smashingmag
302
51k
Gamification - CAS2011
davidbonilla
81
5.5k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
333
22k
Code Reviewing Like a Champion
maltzj
526
40k
Transcript
Finding images in book page images Eric Larson University of
Wisconsin-Madison Libraries
Warning Hobbyist code here. I’m certain there are better ways
to do this.
None
None
None
None
None
curl
None
imagemagick
Processing steps 1. Desaturate the image 2. Boost contrast 3.
Convert image to 1pixel wide x image height 4. Sharpen the image 5. Super-duper grayscale conversion 6. Produce the text color list 7. Look for continuous “black” blocks
None
convert
convert -colorspace Gray
None
convert \ -contrast -contrast \ -contrast -contrast \ -contrast -contrast
\ -contrast -contrast \
None
Convert image to 1px x height
None
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites
are white
convert to txt
None
Look for long, continuous blocks of “black”
None
None
None
github.com ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
None
None
Paradise Lost # (54/54) 100% Accurate http://openlibrary.org/books/OL14022842M/Paradise_Lost
None
None
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
None
None
Wanna help do this better? Contact me.
[email protected]