Finding images in book page images
Eric Larson
February 07, 2012
Code4Lib 2012 Lightning Talk Presentation
Transcript
Finding images in book page images
Eric Larson
University of Wisconsin-Madison Libraries
Warning: Hobbyist code here. I’m certain there are better ways to do this.
curl
imagemagick
Processing steps
1. Desaturate the image
2. Boost contrast
3. Convert image to 1 pixel wide x image height
4. Sharpen the image
5. Super-duper grayscale conversion
6. Produce the text color list
7. Look for continuous “black” blocks
convert
convert -colorspace Gray
convert \
  -contrast -contrast \
  -contrast -contrast \
  -contrast -contrast \
  -contrast -contrast \
Convert image to 1px x height
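The idea behind this step can be sketched in plain Python. This is only an analogy, not the talk's code: the slides do this with ImageMagick, whose resize applies its own resampling filter, while the hypothetical `collapse_rows` helper below simply averages each row of an in-memory grid of grayscale values (0 is black, 255 is white).

```python
def collapse_rows(pixels):
    """Reduce a 2D grid of grayscale values (0-255) to one value per row.

    Assumption: a plain row average stands in for ImageMagick's filtered
    resize to a 1-pixel-wide image.
    """
    column = []
    for row in pixels:
        if not row:
            raise ValueError("every row needs at least one pixel")
        column.append(round(sum(row) / len(row)))
    return column


# Made-up sample data, three rows of a tiny "page".
page = [
    [255, 255, 255, 255],  # blank margin
    [255, 0, 255, 255],    # line of text: mostly paper, a little ink
    [10, 20, 0, 10],       # illustration: mostly ink
]
print(collapse_rows(page))  # [255, 191, 10]
```

Rows that cross an illustration end up much darker than rows of text, which is what the later steps rely on.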
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites are white
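A minimal Python sketch of this kind of two-level conversion follows. The `to_black_and_white` name and the cutoff of 230 are assumptions chosen for illustration; the slides do not say which ImageMagick options or threshold were used.

```python
def to_black_and_white(column, white_cutoff=230):
    """Map grayscale values (0-255) to "black" or "white".

    Only values at or above white_cutoff stay white, so most grays
    are pushed to black. The cutoff is a hypothetical value.
    """
    return ["white" if value >= white_cutoff else "black" for value in column]


print(to_black_and_white([255, 191, 10]))  # ['white', 'black', 'black']
```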
convert to txt
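ImageMagick's `txt:` output lists one pixel per line with its coordinates and color. The sketch below parses such a listing into a top-to-bottom list of values. The sample text is illustrative rather than taken from the talk, the exact layout differs between ImageMagick versions, and `parse_txt` is a hypothetical helper that assumes a 1-pixel-wide image.

```python
import re

# Matches lines such as "0,1: (0,0,0)  #000000  black" and captures
# x, y, and the first two hex digits of the color (the first channel).
PIXEL_LINE = re.compile(r"^(\d+),(\d+):.*?#([0-9A-Fa-f]{2})")


def parse_txt(dump):
    """Return first-channel values (0-255) ordered by y coordinate."""
    values = {}
    for line in dump.splitlines():
        match = PIXEL_LINE.match(line)
        if match is None:
            continue  # header or blank line
        _x, y, hex_byte = match.groups()
        values[int(y)] = int(hex_byte, 16)
    return [values[y] for y in sorted(values)]


# Illustrative sample in the general shape of a txt: listing.
SAMPLE = """# ImageMagick pixel enumeration: 1,3,255,srgb
0,0: (255,255,255)  #FFFFFF  white
0,1: (0,0,0)  #000000  black
0,2: (0,0,0)  #000000  black
"""
print(parse_txt(SAMPLE))  # [255, 0, 0]
```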
Look for long, continuous blocks of “black”
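One way to express this search is a single pass that tracks where each run of "black" starts. This is a sketch under assumptions, not the code from the talk: `find_black_runs` and the `min_length` of 3 are hypothetical, and a real page would need a much larger minimum so that ordinary lines of text are not reported.

```python
def find_black_runs(colors, min_length=3):
    """Return (start, end) index pairs for runs of "black" entries
    that are at least min_length long. Indices are inclusive."""
    runs = []
    start = None
    for index, color in enumerate(colors):
        if color == "black":
            if start is None:
                start = index  # a new run begins here
        elif start is not None:
            if index - start >= min_length:
                runs.append((start, index - 1))
            start = None
    # Close a run that reaches the bottom of the column.
    if start is not None and len(colors) - start >= min_length:
        runs.append((start, len(colors) - 1))
    return runs


column = ["white", "black", "white", "black", "black", "black", "black", "white"]
print(find_black_runs(column))  # [(3, 6)]
```

The single black entry at index 1 is ignored as too short, while the four-entry block is reported as a candidate image region.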
github.com/ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
Paradise Lost # (54/54) 100% Accurate http://openlibrary.org/books/OL14022842M/Paradise_Lost
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
Wanna help do this better? Contact me.
[email protected]