Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Finding images in book page images
Search
Eric Larson
February 07, 2012
Programming
4
780
Finding images in book page images
Code4Lib 2012 Lighting Talk Presentation
Eric Larson
February 07, 2012
Tweet
Share
More Decks by Eric Larson
See All by Eric Larson
Put Your Money Where The Mouse is: Tools and Techniques for Making Informed Design Decisions
ewlarson
0
33
Usability Testing Primo
ewlarson
0
90
Ruby Tips
ewlarson
0
110
9 Keys to Great UX
ewlarson
0
87
Other Decks in Programming
See All in Programming
猫と暮らす Google Nest Cam生活🐈 / WebRTC with Google Nest Cam
yutailang0119
0
120
PHPで始める振る舞い駆動開発(Behaviour-Driven Development)
ohmori_yusuke
2
390
設計やレビューに悩んでいるPHPerに贈る、クリーンなオブジェクト設計の指針たち
panda_program
6
2.1k
The Modern View Layer Rails Deserves: A Vision For 2025 And Beyond @ RailsConf 2025, Philadelphia, PA
marcoroth
1
170
Startups on Rails in Past, Present and Future–Irina Nazarova, RailsConf 2025
irinanazarova
0
110
XP, Testing and ninja testing
m_seki
3
250
Composerが「依存解決」のためにどんな工夫をしているか #phpcon
o0h
PRO
1
260
NPOでのDevinの活用
codeforeveryone
0
840
初学者でも今すぐできる、Claude Codeの生産性を10倍上げるTips
s4yuba
16
11k
#kanrk08 / 公開版 PicoRubyとマイコンでの自作トレーニング計測装置を用いたワークアウトの理想と現実
bash0c7
1
770
『自分のデータだけ見せたい!』を叶える──Laravel × Casbin で複雑権限をスッキリ解きほぐす 25 分
akitotsukahara
2
640
Flutterで備える!Accessibility Nutrition Labels完全ガイド
yuukiw00w
0
160
Featured
See All Featured
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
229
22k
I Don’t Have Time: Getting Over the Fear to Launch Your Podcast
jcasabona
32
2.4k
The Language of Interfaces
destraynor
158
25k
How GitHub (no longer) Works
holman
314
140k
Making Projects Easy
brettharned
116
6.3k
The Myth of the Modular Monolith - Day 2 Keynote - Rails World 2024
eileencodes
26
2.9k
Why You Should Never Use an ORM
jnunemaker
PRO
58
9.4k
The Success of Rails: Ensuring Growth for the Next 100 Years
eileencodes
45
7.5k
Bootstrapping a Software Product
garrettdimon
PRO
307
110k
Designing for humans not robots
tammielis
253
25k
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
31
2.4k
A better future with KSS
kneath
238
17k
Transcript
Finding images in book page images Eric Larson University of
Wisconsin-Madison Libraries
Warning Hobbyist code here. I’m certain there are better ways
to do this.
None
None
None
None
None
curl
None
imagemagick
Processing steps 1. Desaturate the image 2. Boost contrast 3.
Convert image to 1pixel wide x image height 4. Sharpen the image 5. Super-duper grayscale conversion 6. Produce the text color list 7. Look for continuous “black” blocks
None
convert
convert -colorspace Gray
None
convert \ -contrast -contrast \ -contrast -contrast \ -contrast -contrast
\ -contrast -contrast \
None
Convert image to 1px x height
None
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites
are white
convert to txt
None
Look for long, continuous blocks of “black”
None
None
None
github.com ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
None
None
Paradise Lost # (54/54) 100% Accurate http://openlibrary.org/books/OL14022842M/Paradise_Lost
None
None
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
None
None
Wanna help do this better? Contact me.
[email protected]