Finding images in book page images
Eric Larson
February 07, 2012
Code4Lib 2012 Lightning Talk Presentation
Transcript
Finding images in book page images
Eric Larson, University of Wisconsin-Madison Libraries
Warning: hobbyist code here. I’m certain there are better ways to do this.
curl
imagemagick
Processing steps
1. Desaturate the image
2. Boost contrast
3. Convert image to 1 pixel wide x image height
4. Sharpen the image
5. Super-duper grayscale conversion
6. Produce the text color list
7. Look for continuous “black” blocks
convert
convert -colorspace Gray
convert \
  -contrast -contrast \
  -contrast -contrast \
  -contrast -contrast \
  -contrast -contrast \
Convert image to 1px x height
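One way to picture this step: every row of the page collapses to a single value, so a row crossed by a dark illustration stays dark, while a row that is mostly white paper averages out much lighter. The sketch below is a pure-Python stand-in, not the ImageMagick call used in the talk. `collapse_to_column` is a hypothetical helper, and a plain row mean is assumed in place of whatever resize filter ImageMagick applies.

```python
def collapse_to_column(rows):
    """Reduce a grayscale image (list of rows, values 0-255) to one value per row."""
    return [round(sum(row) / len(row)) for row in rows]


# Tiny made-up "page": 4 rows, 4 pixels wide.
page = [
    [255, 255, 255, 255],  # blank margin
    [255, 0, 255, 255],    # row with a thin stroke of text
    [0, 0, 0, 0],          # row inside an illustration
    [10, 20, 0, 10],       # row inside an illustration
]

column = collapse_to_column(page)
print(column)
```

The result is a list with one brightness value per row of the page, which is all the later steps need.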
Sharpen the image
Heavy-handed grayscale conversion
=> make most grays black
=> whites are white
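A rough sketch of that idea in pure Python, assuming the page has already been reduced to a list of 0-255 brightness values. The function name `heavy_handed_bw` and the cutoff of 230 are made up for illustration; the talk does not say which ImageMagick options or levels were actually used.

```python
def heavy_handed_bw(column, white_cutoff=230):
    """Keep near-white values white (255); push everything darker to black (0).

    white_cutoff is a hypothetical tuning value, not taken from the talk.
    """
    return [255 if value >= white_cutoff else 0 for value in column]


result = heavy_handed_bw([255, 240, 128, 10, 0])
print(result)
```

Setting the cutoff close to white is what makes the conversion "heavy-handed": mid grays from an illustration end up black, and only clean paper survives as white.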
convert to txt
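ImageMagick's `txt:` output lists one pixel per line, with coordinates and a color. A hedged sketch of reading it back in Python follows. The sample text is hand-written in the general shape of that output, and the exact layout differs between ImageMagick versions and bit depths, so treat the pattern as an assumption to check against real output. `parse_txt_pixels` is a hypothetical helper.

```python
import re

# Assumed line shape: "x,y: (...)  #RRGGBB  name". Header lines start with "#".
PIXEL_LINE = re.compile(r"^(\d+),(\d+):.*?#([0-9A-Fa-f]{6})\b")


def parse_txt_pixels(text):
    """Return (y, brightness) pairs from txt-style pixel lines."""
    pixels = []
    for line in text.splitlines():
        match = PIXEL_LINE.match(line)
        if match is None:
            continue  # header comment, or a line in a layout this sketch does not handle
        _x, y, hex_color = match.groups()
        # For a gray pixel all three channels are equal, so the red byte is enough.
        pixels.append((int(y), int(hex_color[:2], 16)))
    return pixels


sample = """# ImageMagick pixel enumeration: 1,3,255,gray
0,0: (255)  #FFFFFF  gray(255)
0,1: (0)  #000000  gray(0)
0,2: (0)  #000000  gray(0)
"""

pixels = parse_txt_pixels(sample)
print(pixels)
```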
Look for long, continuous blocks of “black”
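A minimal sketch of the run search, assuming the column has already been reduced to pure black (0) and white (255) values. `find_black_runs` and `min_length` are hypothetical names; on a real page scan the minimum length would be far larger than the toy value used here, and the talk does not state what threshold was used.

```python
def find_black_runs(column, min_length=3, black=0):
    """Return (start_row, end_row) for each run of black at least min_length long."""
    runs = []
    start = None  # row where the current black run began, if any
    for y, value in enumerate(column):
        if value == black:
            if start is None:
                start = y
        elif start is not None:
            if y - start >= min_length:
                runs.append((start, y - 1))
            start = None
    # A run that reaches the bottom of the page is closed here.
    if start is not None and len(column) - start >= min_length:
        runs.append((start, len(column) - 1))
    return runs


column = [255, 0, 255, 0, 0, 0, 0, 255, 0, 0, 0]
runs = find_black_runs(column)
print(runs)
```

Short runs, like the single black row at index 1, are treated as text or noise and dropped; only the long blocks are reported as likely images.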
github.com/ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
Paradise Lost # (54/54) 100% Accurate http://openlibrary.org/books/OL14022842M/Paradise_Lost
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
Wanna help do this better? Contact me.
[email protected]