Finding images in book page images
Eric Larson
February 07, 2012
Programming
Code4Lib 2012 Lightning Talk presentation
Transcript
Finding images in book page images
Eric Larson, University of Wisconsin-Madison Libraries
Warning: hobbyist code here. I'm certain there are better ways to do this.
curl
ImageMagick
Processing steps
1. Desaturate the image
2. Boost contrast
3. Convert image to 1 pixel wide x image height
4. Sharpen the image
5. Super-duper grayscale conversion
6. Produce the text color list
7. Look for continuous "black" blocks
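Steps 1 to 6 map naturally onto a single ImageMagick `convert` call. The sketch below assembles that call from Python. It is an illustration under assumptions, not the code in the talk's repository: `build_convert_args`, `page_height` and `color_list` are hypothetical names, the eight `-contrast` passes come from the slide, and the `-sharpen 0x1` and `-threshold 90%` values are guesses standing in for steps 4 and 5.

```python
import subprocess


def page_height(src: str) -> int:
    """Return the image height in pixels using ImageMagick's `identify`."""
    out = subprocess.run(
        ["identify", "-format", "%h", src],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.strip())


def build_convert_args(src: str, height: int, threshold_pct: int = 90) -> list[str]:
    """Hypothetical helper: one `convert` invocation covering processing steps 1-6."""
    args = ["convert", src]
    args += ["-colorspace", "Gray"]              # 1. desaturate
    args += ["-contrast"] * 8                    # 2. boost contrast (8 passes, as on the slide)
    args += ["-resize", f"1x{height}!"]          # 3. squash to 1 px wide; "!" ignores aspect ratio
    args += ["-sharpen", "0x1"]                  # 4. sharpen (radius x sigma values are assumed)
    args += ["-threshold", f"{threshold_pct}%"]  # 5. assumed stand-in for the "super-duper" step
    args.append("txt:-")                         # 6. write the per-pixel color list to stdout
    return args


def color_list(src: str) -> str:
    """Run the whole pipeline on one page image (requires ImageMagick on PATH)."""
    args = build_convert_args(src, page_height(src))
    return subprocess.run(args, capture_output=True, text=True, check=True).stdout
```

Passing an argument list (rather than a shell string) means the `!` in the resize geometry needs no shell escaping. On ImageMagick 7 the preferred entry point is `magick` rather than `convert`.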
convert
convert -colorspace Gray
convert \
  -contrast -contrast \
  -contrast -contrast \
  -contrast -contrast \
  -contrast -contrast \
Convert image to 1px x height
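Squashing the page to one pixel wide turns every row into a single brightness value. The pure-Python sketch below approximates that with a plain row mean; `squash_rows` is a hypothetical name, and ImageMagick's `-resize` actually applies a weighted filter, so treat this as a rough model of the effect rather than what `convert` computes exactly.

```python
def squash_rows(pixels: list[list[int]]) -> list[int]:
    """Collapse each row of a grayscale image (0 = black, 255 = white) to its mean.

    A plain mean is a simplification of ImageMagick's filtered resize.
    """
    return [round(sum(row) / len(row)) for row in pixels]


page = [
    [255, 255, 255, 0],    # a line of text: mostly white with a thin black stroke
    [0, 20, 40, 0],        # part of an illustration: mostly dark
    [255, 255, 255, 255],  # blank gap between lines
]
print(squash_rows(page))  # → [191, 15, 255]
```

Text rows land on a light gray, illustration rows on a dark gray, and the gaps between lines of text stay white.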
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites stay white
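The "heavy-handed" conversion can be modeled as a harsh threshold on the per-row values: anything not close to white becomes black. In the sketch below `harsh_threshold` and its `white_cutoff` of 250 are assumptions chosen for illustration, not values taken from the talk.

```python
def harsh_threshold(values: list[int], white_cutoff: int = 250) -> list[int]:
    """Keep near-white rows white (255); push every other gray to black (0)."""
    return [255 if v >= white_cutoff else 0 for v in values]


print(harsh_threshold([191, 15, 255, 252, 249]))  # → [0, 0, 255, 255, 0]
```

After this step a line of text and a slice of an illustration both come out black. What presumably separates them is how many consecutive rows stay black: text lines are short black runs broken up by white inter-line gaps.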
convert to txt
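ImageMagick's `txt:` output lists one pixel per line, starting with `x,y:` and the channel values in parentheses, after a `#` header comment. Because the image is already pure black and white at this point, a row can be treated as black when its first channel is 0. The exact line layout differs between ImageMagick versions, so the regular expression below is an assumption, and `black_rows` is a hypothetical name.

```python
import re

# Matches the coordinates and first channel value of a txt: pixel line, e.g.
#   "0,12: (0,0,0)  #000000  black"   or   "0,12: (255)  #FFFFFF  gray(255)"
PIXEL_LINE = re.compile(r"^\s*(\d+),(\d+):\s*\(\s*([0-9.]+)")


def black_rows(txt_output: str) -> list[bool]:
    """Return, in row order, whether each pixel of a 1-px-wide image is black."""
    rows: dict[int, bool] = {}
    for line in txt_output.splitlines():
        if line.startswith("#"):
            continue  # skip the "# ImageMagick pixel enumeration" header
        m = PIXEL_LINE.match(line)
        if m:
            y = int(m.group(2))
            rows[y] = float(m.group(3)) == 0.0  # 0 means black at any bit depth
    return [rows[y] for y in sorted(rows)]


sample = """# ImageMagick pixel enumeration: 1,3,255,gray
0,0: (255)  #FFFFFF  gray(255)
0,1: (0)  #000000  gray(0)
0,2: (0,0,0)  #000000  black
"""
print(black_rows(sample))  # → [False, True, True]
```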
Look for long, continuous blocks of “black”
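Finding "long, continuous blocks of black" is a run-length scan over the per-row flags. The sketch below reports each black run at least `min_length` rows long as a `(start, end)` pair with `end` exclusive. The function name and the idea of a fixed minimum length are assumptions; a real tool might scale the minimum with page height.

```python
def long_black_runs(rows: list[bool], min_length: int) -> list[tuple[int, int]]:
    """Return (start, end) row ranges, end exclusive, of black runs >= min_length."""
    runs = []
    start = None
    for y, is_black in enumerate(rows):
        if is_black and start is None:
            start = y                      # a black run begins
        elif not is_black and start is not None:
            if y - start >= min_length:    # run just ended; keep it if long enough
                runs.append((start, y))
            start = None
    if start is not None and len(rows) - start >= min_length:
        runs.append((start, len(rows)))    # run extends to the bottom of the page
    return runs


rows = [False] * 2 + [True] * 3 + [False] + [True] * 10 + [False] * 2 + [True] * 6
print(long_black_runs(rows, min_length=5))  # → [(6, 16), (18, 24)]
```

The three-row run at rows 2 to 4 behaves like a line of text and is ignored; the two longer runs are candidate image regions.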
github.com/ewlarson/picturepages
Don Quixote (168/169): 99% accurate
http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
Paradise Lost (54/54): 100% accurate
http://openlibrary.org/books/OL14022842M/Paradise_Lost
Around the World in Eighty Days (60/62): 97% accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
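Each result slide gives a count as a fraction; the slides do not say whether it counts pages or images. As a quick arithmetic check, the rounded percentages follow from those fractions. The pooled figure on the right is derived here and is not reported in the talk.

```latex
\frac{168}{169} \approx 0.994 \;(99\%), \qquad
\frac{54}{54} = 1.000 \;(100\%), \qquad
\frac{60}{62} \approx 0.968 \;(97\%), \qquad
\frac{168 + 54 + 60}{169 + 54 + 62} = \frac{282}{285} \approx 0.989
```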
Wanna help do this better? Contact me.