Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Finding images in book page images
Search
Eric Larson
February 07, 2012
Programming
800
4
Share
Finding images in book page images
Code4Lib 2012 Lighting Talk Presentation
Eric Larson
February 07, 2012
More Decks by Eric Larson
See All by Eric Larson
Put Your Money Where The Mouse is: Tools and Techniques for Making Informed Design Decisions
ewlarson
0
40
Usability Testing Primo
ewlarson
0
95
Ruby Tips
ewlarson
0
120
9 Keys to Great UX
ewlarson
0
92
Other Decks in Programming
See All in Programming
🦞OpenClaw works with AWS
licux
1
330
PHPer、Cloudflare に引っ越す
suguruooki
1
130
GitHubCopilotCLIをはじめよう.pdf
htkym
0
320
ソフトウェア設計の結合バランス #phperkaigi
kajitack
0
490
Kubernetesを使わない環境にもCloud Nativeなデプロイを実現する / Enabling Cloud Native deployments without the complexity of Kubernetes
linyows
1
110
書き換えて学ぶTemporal #fukts
pirosikick
2
340
「OSSがあるなら自作するな」は AI時代も正しいか ── Build vs Adopt の新しい判断基準
kumorn5s
0
240
Back to the roots of date
jinroq
0
670
My daily life on Ruby
a_matsuda
2
180
의존성 주입과 모듈화
fornewid
0
160
2026-04-15 Spring IO - I Can See Clearly Now
jonatan_ivanov
1
170
when storing skills in S3 file
watany
3
980
Featured
See All Featured
The Curious Case for Waylosing
cassininazir
0
340
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
32
3k
Dominate Local Search Results - an insider guide to GBP, reviews, and Local SEO
greggifford
PRO
0
160
Leo the Paperboy
mayatellez
7
1.7k
A Modern Web Designer's Workflow
chriscoyier
698
190k
The Art of Delivering Value - GDevCon NA Keynote
reverentgeek
16
1.9k
How to Think Like a Performance Engineer
csswizardry
28
2.6k
WENDY [Excerpt]
tessaabrams
10
37k
Pawsitive SEO: Lessons from My Dog (and Many Mistakes) on Thriving as a Consultant in the Age of AI
davidcarrasco
0
130
技術選定の審美眼(2025年版) / Understanding the Spiral of Technologies 2025 edition
twada
PRO
118
110k
Leading Effective Engineering Teams in the AI Era
addyosmani
9
1.9k
Lightning talk: Run Django tests with GitHub Actions
sabderemane
0
180
Transcript
Finding images in book page images Eric Larson University of
Wisconsin-Madison Libraries
Warning Hobbyist code here. I’m certain there are better ways
to do this.
None
None
None
None
None
curl
None
imagemagick
Processing steps 1. Desaturate the image 2. Boost contrast 3.
Convert image to 1pixel wide x image height 4. Sharpen the image 5. Super-duper grayscale conversion 6. Produce the text color list 7. Look for continuous “black” blocks
None
convert
convert -colorspace Gray
None
convert \ -contrast -contrast \ -contrast -contrast \ -contrast -contrast
\ -contrast -contrast \
None
Convert image to 1px x height
None
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites
are white
convert to txt
None
Look for long, continuous blocks of “black”
None
None
None
github.com ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
None
None
Paradise Lost # (54/54) 100% Accurate http://openlibrary.org/books/OL14022842M/Paradise_Lost
None
None
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
None
None
Wanna help do this better? Contact me.
[email protected]