Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Finding images in book page images
Search
Eric Larson
February 07, 2012
Programming
4
780
Finding images in book page images
Code4Lib 2012 Lighting Talk Presentation
Eric Larson
February 07, 2012
Tweet
Share
More Decks by Eric Larson
See All by Eric Larson
Put Your Money Where The Mouse is: Tools and Techniques for Making Informed Design Decisions
ewlarson
0
33
Usability Testing Primo
ewlarson
0
89
Ruby Tips
ewlarson
0
110
9 Keys to Great UX
ewlarson
0
87
Other Decks in Programming
See All in Programming
A full stack side project webapp all in Kotlin (KotlinConf 2025)
dankim
0
100
設計やレビューに悩んでいるPHPerに贈る、クリーンなオブジェクト設計の指針たち
panda_program
6
1.9k
PicoRuby on Rails
makicamel
2
120
XP, Testing and ninja testing
m_seki
3
230
プロダクト志向ってなんなんだろうね
righttouch
PRO
0
180
CursorはMCPを使った方が良いぞ
taigakono
1
220
『自分のデータだけ見せたい!』を叶える──Laravel × Casbin で複雑権限をスッキリ解きほぐす 25 分
akitotsukahara
2
620
エンジニア向け採用ピッチ資料
inusan
0
180
deno-redisの紹介とJSRパッケージの運用について (toranoana.deno #21)
uki00a
0
180
初学者でも今すぐできる、Claude Codeの生産性を10倍上げるTips
s4yuba
14
9.7k
Flutterで備える!Accessibility Nutrition Labels完全ガイド
yuukiw00w
0
150
5つのアンチパターンから学ぶLT設計
narihara
1
160
Featured
See All Featured
Designing Dashboards & Data Visualisations in Web Apps
destraynor
231
53k
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
10
940
We Have a Design System, Now What?
morganepeng
53
7.7k
How GitHub (no longer) Works
holman
314
140k
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
PRO
181
53k
Learning to Love Humans: Emotional Interface Design
aarron
273
40k
Into the Great Unknown - MozCon
thekraken
39
1.9k
Code Reviewing Like a Champion
maltzj
524
40k
Mobile First: as difficult as doing things right
swwweet
223
9.7k
ReactJS: Keep Simple. Everything can be a component!
pedronauck
667
120k
Side Projects
sachag
455
42k
Intergalactic Javascript Robots from Outer Space
tanoku
271
27k
Transcript
Finding images in book page images Eric Larson University of
Wisconsin-Madison Libraries
Warning Hobbyist code here. I’m certain there are better ways
to do this.
None
None
None
None
None
curl
None
imagemagick
Processing steps 1. Desaturate the image 2. Boost contrast 3.
Convert image to 1pixel wide x image height 4. Sharpen the image 5. Super-duper grayscale conversion 6. Produce the text color list 7. Look for continuous “black” blocks
None
convert
convert -colorspace Gray
None
convert \ -contrast -contrast \ -contrast -contrast \ -contrast -contrast
\ -contrast -contrast \
None
Convert image to 1px x height
None
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites
are white
convert to txt
None
Look for long, continuous blocks of “black”
None
None
None
github.com ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
None
None
Paradise Lost # (54/54) 100% Accurate http://openlibrary.org/books/OL14022842M/Paradise_Lost
None
None
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
None
None
Wanna help do this better? Contact me.
[email protected]