Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Finding images in book page images
Search
Eric Larson
February 07, 2012
Programming
4
790
Finding images in book page images
Code4Lib 2012 Lighting Talk Presentation
Eric Larson
February 07, 2012
Tweet
Share
More Decks by Eric Larson
See All by Eric Larson
Put Your Money Where The Mouse is: Tools and Techniques for Making Informed Design Decisions
ewlarson
0
33
Usability Testing Primo
ewlarson
0
90
Ruby Tips
ewlarson
0
110
9 Keys to Great UX
ewlarson
0
88
Other Decks in Programming
See All in Programming
AI時代のドメイン駆動設計-DDD実践におけるAI活用のあり方 / ddd-in-ai-era
minodriven
25
9.7k
Trem on Rails - Prompt Engineering com Ruby
elainenaomi
1
100
UbieのAIパートナーを支えるコンテキストエンジニアリング実践
syucream
2
800
旅行プランAIエージェント開発の裏側
ippo012
1
600
さようなら Date。 ようこそTemporal! 3年間先行利用して得られた知見の共有
8beeeaaat
0
250
Rancher と Terraform
fufuhu
2
200
モバイルアプリからWebへの横展開を加速した話_Claude_Code_実践術.pdf
kazuyasakamoto
0
300
Testing Trophyは叫ばない
toms74209200
0
230
為你自己學 Python - 冷知識篇
eddie
1
330
Claude Codeで実装以外の開発フロー、どこまで自動化できるか?失敗と成功
ndadayo
4
1.8k
Google I/O recap web編 大分Web祭り2025
kponda
0
2.9k
Introducing ReActionView: A new ActionView-compatible ERB Engine @ Rails World 2025, Amsterdam
marcoroth
0
310
Featured
See All Featured
JavaScript: Past, Present, and Future - NDC Porto 2020
reverentgeek
51
5.6k
Six Lessons from altMBA
skipperchong
28
4k
[RailsConf 2023] Rails as a piece of cake
palkan
56
5.8k
Optimising Largest Contentful Paint
csswizardry
37
3.4k
Unsuck your backbone
ammeep
671
58k
A better future with KSS
kneath
239
17k
Why Our Code Smells
bkeepers
PRO
339
57k
Visualizing Your Data: Incorporating Mongo into Loggly Infrastructure
mongodb
48
9.7k
How to Ace a Technical Interview
jacobian
279
23k
Stop Working from a Prison Cell
hatefulcrawdad
271
21k
Optimizing for Happiness
mojombo
379
70k
Documentation Writing (for coders)
carmenintech
73
5k
Transcript
Finding images in book page images Eric Larson University of
Wisconsin-Madison Libraries
Warning Hobbyist code here. I’m certain there are better ways
to do this.
None
None
None
None
None
curl
None
imagemagick
Processing steps 1. Desaturate the image 2. Boost contrast 3.
Convert image to 1pixel wide x image height 4. Sharpen the image 5. Super-duper grayscale conversion 6. Produce the text color list 7. Look for continuous “black” blocks
None
convert
convert -colorspace Gray
None
convert \ -contrast -contrast \ -contrast -contrast \ -contrast -contrast
\ -contrast -contrast \
None
Convert image to 1px x height
None
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites
are white
convert to txt
None
Look for long, continuous blocks of “black”
None
None
None
github.com ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
None
None
Paradise Lost # (54/54) 100% Accurate http://openlibrary.org/books/OL14022842M/Paradise_Lost
None
None
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
None
None
Wanna help do this better? Contact me.
[email protected]