Finding images in book page images
Eric Larson
February 07, 2012
Code4Lib 2012 Lightning Talk Presentation
Transcript
Finding images in book page images
Eric Larson
University of Wisconsin-Madison Libraries
Warning: Hobbyist code here. I’m certain there are better ways to do this.
curl
imagemagick
Processing steps
1. Desaturate the image
2. Boost contrast
3. Convert image to 1 pixel wide x image height
4. Sharpen the image
5. Super-duper grayscale conversion
6. Produce the text color list
7. Look for continuous “black” blocks
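As a rough illustration, steps 1 to 6 could be chained into a single ImageMagick call. The sketch below only builds the argument list and does not run anything. The function name `build_convert_args`, the sharpen setting `0x1`, and the `50%` threshold are assumptions made for illustration, not the settings used in picturepages; the eight `-contrast` passes follow the slide. The options themselves (`-colorspace`, `-contrast`, `-resize`, `-sharpen`, `-threshold`, and the `txt:` output format) are standard ImageMagick.

```python
def build_convert_args(src, height, contrast_passes=8, threshold="50%"):
    """Build an argv list for ImageMagick's `convert` (hypothetical settings)."""
    args = ["convert", src]
    args += ["-colorspace", "Gray"]          # 1. desaturate
    args += ["-contrast"] * contrast_passes  # 2. boost contrast, repeatedly
    args += ["-resize", f"1x{height}!"]      # 3. "!" forces exactly 1px x height
    args += ["-sharpen", "0x1"]              # 4. sharpen (assumed setting)
    args += ["-threshold", threshold]        # 5. harsh black/white split (assumed)
    args.append("txt:-")                     # 6. pixel list as text on stdout
    return args


if __name__ == "__main__":
    # "page.jpg" is a placeholder file name.
    print(" ".join(build_convert_args("page.jpg", 1200)))
```

The resulting list could be handed to `subprocess.run` on a machine with ImageMagick installed; step 7 then works on the text output.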
convert
convert -colorspace Gray
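To make the desaturation step concrete, here is a minimal sketch of converting one RGB pixel to a gray value. It uses the common Rec. 601 luma weights, which is an assumption; ImageMagick's `-colorspace Gray` may use different weights depending on version. The function name `to_gray` is hypothetical.

```python
def to_gray(rgb):
    """Convert an (r, g, b) tuple of 0-255 ints to a single 0-255 gray value."""
    r, g, b = rgb
    # Rec. 601 luma weights (assumed; not necessarily ImageMagick's choice).
    return round(0.299 * r + 0.587 * g + 0.114 * b)


if __name__ == "__main__":
    print(to_gray((255, 255, 255)), to_gray((0, 0, 0)))
```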
convert \
  -contrast -contrast \
  -contrast -contrast \
  -contrast -contrast \
  -contrast -contrast \
Convert image to 1px x height
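Squashing the page to a single pixel column roughly amounts to summarising each row by one gray value. A minimal sketch, assuming a plain per-row mean (ImageMagick's resize applies a resampling filter, so its numbers will differ); `collapse_to_column` is a hypothetical name:

```python
def collapse_to_column(gray_rows):
    """Reduce a 2D list of gray values (rows of equal width) to one value per row."""
    return [round(sum(row) / len(row)) for row in gray_rows]


if __name__ == "__main__":
    page = [[0, 0], [255, 255], [100, 200]]
    print(collapse_to_column(page))
```

Rows crossed by a picture tend to stay dark across their full width, so they remain dark after averaging, while rows of text are mostly white paper.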
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites are white
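One way to read this step is as a threshold with a very high cutoff, so that only near-white values survive as white. The cutoff of 230 and the name `harsh_threshold` are assumptions for illustration; the slide does not state the actual value.

```python
def harsh_threshold(column, white_cutoff=230):
    """Map each 0-255 gray value to 255 (white) or 0 (black)."""
    # Anything below the cutoff, including mid and light grays, becomes black.
    return [255 if value >= white_cutoff else 0 for value in column]


if __name__ == "__main__":
    print(harsh_threshold([0, 128, 229, 230, 255]))
```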
convert to txt
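ImageMagick's `txt:` output lists one pixel per line, with coordinates and a hex color. A minimal parsing sketch follows. The sample text is made up for illustration and assumes 8-bit output; the exact layout varies between ImageMagick versions, so the regular expression may need adjusting. `parse_txt_column` is a hypothetical name.

```python
import re

# Matches lines such as "0,1: (0,0,0)  #000000  black" (8-bit output assumed).
PIXEL_LINE = re.compile(r"^(\d+),(\d+):.*#([0-9A-Fa-f]{6})\b")


def parse_txt_column(txt):
    """Return gray values (0-255), ordered top to bottom, from txt: output."""
    values = {}
    for line in txt.splitlines():
        match = PIXEL_LINE.match(line)
        if not match:
            continue  # skips the "# ImageMagick pixel enumeration" header
        y = int(match.group(2))
        # For a gray image all channels are equal, so the red byte is enough.
        values[y] = int(match.group(3)[:2], 16)
    return [values[y] for y in sorted(values)]


# Hypothetical sample of a 1x3 image.
SAMPLE = """# ImageMagick pixel enumeration: 1,3,255,gray
0,0: (255,255,255)  #FFFFFF  white
0,1: (0,0,0)  #000000  black
0,2: (255,255,255)  #FFFFFF  white
"""

if __name__ == "__main__":
    print(parse_txt_column(SAMPLE))
```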
Look for long, continuous blocks of “black”
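The final step is a run-length scan over the one-pixel column: text lines leave short black runs, while a picture leaves a long one. A minimal sketch, where `find_black_runs` and the `min_length` value are assumptions rather than the picturepages implementation:

```python
def find_black_runs(column, min_length=3, black=0):
    """Return (start_row, length) for each run of black at least min_length long."""
    runs = []
    start = None
    for y, value in enumerate(column):
        if value == black:
            if start is None:
                start = y  # a new run begins here
        elif start is not None:
            if y - start >= min_length:
                runs.append((start, y - start))
            start = None
    # Handle a run that reaches the bottom of the page.
    if start is not None and len(column) - start >= min_length:
        runs.append((start, len(column) - start))
    return runs


if __name__ == "__main__":
    column = [255, 0, 0, 255, 0, 0, 0, 0, 255, 0, 0, 0]
    print(find_black_runs(column))
```

In practice `min_length` would be set well above the height of a line of text, for example as a fraction of the page height.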
github.com ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
Paradise Lost # (54/54) 100% Accurate http://openlibrary.org/books/OL14022842M/Paradise_Lost
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
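The quoted percentages follow from the counts on the slides, assuming each pair means correct results out of total and that the figure is rounded to the nearest whole percent. The helper name `accuracy_percent` is hypothetical.

```python
def accuracy_percent(correct, total):
    """Accuracy as a whole-number percentage."""
    return round(100 * correct / total)


# Counts as given on the slides.
RESULTS = {
    "Don Quixote": (168, 169),
    "Paradise Lost": (54, 54),
    "Around the World in Eighty Days": (60, 62),
}

if __name__ == "__main__":
    for title, (correct, total) in RESULTS.items():
        print(f"{title}: {accuracy_percent(correct, total)}%")
```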
Wanna help do this better? Contact me.
[email protected]