Lock in $30 Savings on PRO—Offer Ends Soon! ⏳
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Finding images in book page images
Search
Eric Larson
February 07, 2012
Programming
4
790
Finding images in book page images
Code4Lib 2012 Lighting Talk Presentation
Eric Larson
February 07, 2012
Tweet
Share
More Decks by Eric Larson
See All by Eric Larson
Put Your Money Where The Mouse is: Tools and Techniques for Making Informed Design Decisions
ewlarson
0
35
Usability Testing Primo
ewlarson
0
93
Ruby Tips
ewlarson
0
110
9 Keys to Great UX
ewlarson
0
90
Other Decks in Programming
See All in Programming
宅宅自以為的浪漫:跟 AI 一起為自己辦的研討會寫一個售票系統
eddie
0
510
dotfiles 式年遷宮 令和最新版
masawada
1
800
0→1 フロントエンド開発 Tips🚀 #レバテックMeetup
bengo4com
0
110
re:Invent 2025 のイケてるサービスを紹介する
maroon1st
0
140
ハイパーメディア駆動アプリケーションとIslandアーキテクチャ: htmxによるWebアプリケーション開発と動的UIの局所的適用
nowaki28
0
440
SwiftUIで本格音ゲー実装してみた
hypebeans
0
460
Claude Codeの「Compacting Conversation」を体感50%減! CLAUDE.md + 8 Skills で挑むコンテキスト管理術
kmurahama
1
590
Navigating Dependency Injection with Metro
l2hyunwoo
1
160
C-Shared Buildで突破するAI Agent バックテストの壁
po3rin
0
410
JETLS.jl ─ A New Language Server for Julia
abap34
2
430
「コードは上から下へ読むのが一番」と思った時に、思い出してほしい話
panda728
PRO
39
26k
Cell-Based Architecture
larchanjo
0
140
Featured
See All Featured
SEO in 2025: How to Prepare for the Future of Search
ipullrank
3
3.3k
VelocityConf: Rendering Performance Case Studies
addyosmani
333
24k
Embracing the Ebb and Flow
colly
88
4.9k
Why Mistakes Are the Best Teachers: Turning Failure into a Pathway for Growth
auna
0
25
GraphQLの誤解/rethinking-graphql
sonatard
73
11k
What's in a price? How to price your products and services
michaelherold
246
13k
Chasing Engaging Ingredients in Design
codingconduct
0
77
Art, The Web, and Tiny UX
lynnandtonic
304
21k
A designer walks into a library…
pauljervisheath
210
24k
The Curious Case for Waylosing
cassininazir
0
190
How To Stay Up To Date on Web Technology
chriscoyier
791
250k
The SEO Collaboration Effect
kristinabergwall1
0
300
Transcript
Finding images in book page images Eric Larson University of
Wisconsin-Madison Libraries
Warning Hobbyist code here. I’m certain there are better ways
to do this.
None
None
None
None
None
curl
None
imagemagick
Processing steps 1. Desaturate the image 2. Boost contrast 3.
Convert image to 1pixel wide x image height 4. Sharpen the image 5. Super-duper grayscale conversion 6. Produce the text color list 7. Look for continuous “black” blocks
None
convert
convert -colorspace Gray
None
convert \ -contrast -contrast \ -contrast -contrast \ -contrast -contrast
\ -contrast -contrast \
None
Convert image to 1px x height
None
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites
are white
convert to txt
None
Look for long, continuous blocks of “black”
None
None
None
github.com ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
None
None
Paradise Lost # (54/54) 100% Accurate http://openlibrary.org/books/OL14022842M/Paradise_Lost
None
None
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
None
None
Wanna help do this better? Contact me.
[email protected]