$30 off During Our Annual Pro Sale. View Details »
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Finding images in book page images
Search
Eric Larson
February 07, 2012
Programming
4
790
Finding images in book page images
Code4Lib 2012 Lighting Talk Presentation
Eric Larson
February 07, 2012
Tweet
Share
More Decks by Eric Larson
See All by Eric Larson
Put Your Money Where The Mouse is: Tools and Techniques for Making Informed Design Decisions
ewlarson
0
35
Usability Testing Primo
ewlarson
0
93
Ruby Tips
ewlarson
0
110
9 Keys to Great UX
ewlarson
0
90
Other Decks in Programming
See All in Programming
エディターってAIで操作できるんだぜ
kis9a
0
630
Media Capture and Streams: W3C仕様と現場での知見
nowaki28
0
130
モデル駆動設計をやってみよう Modeling Forum2025ワークショップ/Let’s Try Model-Driven Design
haru860
0
230
WebRTC と Rust と8K 60fps
tnoho
2
1.9k
ローターアクトEクラブ アメリカンナイト:川端 柚菜 氏(Japan O.K. ローターアクトEクラブ 会長):2720 Japan O.K. ロータリーEクラブ2025年12月1日卓話
2720japanoke
0
400
251126 TestState APIってなんだっけ?Step Functionsテストどう変わる?
east_takumi
0
290
TVerのWeb内製化 - 開発スピードと品質を両立させるまでの道のり
techtver
PRO
3
1.4k
AIコードレビューがチームの"文脈"を 読めるようになるまで
marutaku
0
300
著者と進める!『AIと個人開発したくなったらまずCursorで要件定義だ!』
yasunacoffee
0
110
俺流レスポンシブコーディング 2025
tak_dcxi
13
7.5k
非同期処理の迷宮を抜ける: 初学者がつまづく構造的な原因
pd1xx
1
420
【レイトレ合宿11】kagayaki_v4
runningoutrate
0
220
Featured
See All Featured
Producing Creativity
orderedlist
PRO
348
40k
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
12
1.3k
Build The Right Thing And Hit Your Dates
maggiecrowley
38
3k
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
508
140k
Improving Core Web Vitals using Speculation Rules API
sergeychernyshev
21
1.3k
Fashionably flexible responsive web design (full day workshop)
malarkey
407
66k
It's Worth the Effort
3n
187
29k
Put a Button on it: Removing Barriers to Going Fast.
kastner
60
4.1k
For a Future-Friendly Web
brad_frost
180
10k
The Success of Rails: Ensuring Growth for the Next 100 Years
eileencodes
46
7.8k
Principles of Awesome APIs and How to Build Them.
keavy
127
17k
Visualization
eitanlees
150
16k
Transcript
Finding images in book page images Eric Larson University of
Wisconsin-Madison Libraries
Warning Hobbyist code here. I’m certain there are better ways
to do this.
None
None
None
None
None
curl
None
imagemagick
Processing steps 1. Desaturate the image 2. Boost contrast 3.
Convert image to 1pixel wide x image height 4. Sharpen the image 5. Super-duper grayscale conversion 6. Produce the text color list 7. Look for continuous “black” blocks
None
convert
convert -colorspace Gray
None
convert \ -contrast -contrast \ -contrast -contrast \ -contrast -contrast
\ -contrast -contrast \
None
Convert image to 1px x height
None
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites
are white
convert to txt
None
Look for long, continuous blocks of “black”
None
None
None
github.com ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
None
None
Paradise Lost # (54/54) 100% Accurate http://openlibrary.org/books/OL14022842M/Paradise_Lost
None
None
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
None
None
Wanna help do this better? Contact me.
[email protected]