Finding images in book page images
Eric Larson
February 07, 2012
Code4Lib 2012 Lightning Talk Presentation
Transcript
Finding images in book page images
Eric Larson, University of Wisconsin-Madison Libraries
Warning: hobbyist code here. I’m certain there are better ways to do this.
curl
ImageMagick
Processing steps
1. Desaturate the image
2. Boost contrast
3. Convert image to 1 pixel wide x image height
4. Sharpen the image
5. Super-duper grayscale conversion
6. Produce the text color list
7. Look for continuous “black” blocks
convert
convert input.png -colorspace Gray output.png   # input.png / output.png are placeholder file names
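To make the desaturation step concrete without ImageMagick, here is a minimal Python sketch that treats a page as rows of (r, g, b) tuples. It assumes Rec. 601 luma weights. ImageMagick's `-colorspace Gray` may weight the channels differently depending on version, so treat the numbers as illustrative.

```python
# Assumption: Rec. 601 luma weights. ImageMagick's -colorspace Gray may use
# a different formula depending on version.
def to_gray(pixel):
    """Convert an (r, g, b) tuple with 0-255 channels to a 0-255 gray level."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def desaturate(image):
    """Desaturate an image given as a list of rows of (r, g, b) tuples."""
    return [[to_gray(p) for p in row] for row in image]


# A tiny 2x2 "page": white paper, off-white paper, black ink, red ink.
page = [[(255, 255, 255), (250, 240, 230)],
        [(20, 20, 20), (200, 30, 30)]]
print(desaturate(page))
```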
convert input.png \
  -contrast -contrast \
  -contrast -contrast \
  -contrast -contrast \
  -contrast -contrast \
  output.png   # input.png / output.png are placeholder file names
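Stacking `-contrast` eight times pushes grays steadily toward black or white. The exact curve ImageMagick's `-contrast` applies isn't shown on the slides. The sketch below therefore swaps in a plain linear stretch around mid-gray (the `factor=1.5` is a hypothetical value) to show the effect of repeating the operation.

```python
# Stand-in for ImageMagick's -contrast, NOT its actual formula:
# a linear stretch away from mid-gray, clamped to the 0-255 range.
def boost_contrast(value, factor=1.5):
    """Push a 0-255 gray level away from mid-gray (128) and clamp it."""
    stretched = 128 + (value - 128) * factor
    return max(0, min(255, round(stretched)))


def repeat_contrast(value, times=8):
    """Apply boost_contrast repeatedly, like stacking -contrast flags."""
    for _ in range(times):
        value = boost_contrast(value)
    return value


# Light grays saturate to white, dark grays to black, mid-gray stays put.
print([repeat_contrast(v) for v in (200, 128, 60)])
```

Repeating a modest stretch, rather than applying one huge one, mirrors the slide's approach of stacking the same flag.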
Convert image to 1px wide x image height
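Shrinking the page to a single pixel column leaves one value per row, roughly the average darkness of that row. Below is a minimal sketch assuming a plain box average. In ImageMagick this would presumably be a resize or scale to a `1xHEIGHT!` geometry, though the slides don't show which option was used.

```python
# Assumption: each row is collapsed with a simple box average.
def collapse_rows(gray_image):
    """Average each row of a grayscale image (rows of 0-255 ints) down to
    one value, giving a 1-pixel-wide column as tall as the page."""
    return [round(sum(row) / len(row)) for row in gray_image]


gray_page = [
    [255, 255, 255, 255],  # blank paper
    [0, 0, 0, 255],        # mostly ink
    [0, 0, 0, 0],          # solid ink
]
print(collapse_rows(gray_page))
```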
Sharpen the image
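Sharpening the column exaggerates the edges between light and dark rows. The sketch below is a simple 1-D unsharp-mask stand-in: it subtracts a 3-row blur and adds the difference back. It is not ImageMagick's `-sharpen` operator, which uses a Gaussian kernel.

```python
# Stand-in for -sharpen: a 1-D unsharp mask with a 3-row box blur.
def sharpen_column(column, amount=1.0):
    """Sharpen a column of 0-255 values by adding back the difference
    between each value and the mean of it and its neighbours.
    Edge rows reuse themselves as the missing neighbour."""
    n = len(column)
    out = []
    for i, v in enumerate(column):
        above = column[max(i - 1, 0)]
        below = column[min(i + 1, n - 1)]
        blurred = (above + v + below) / 3
        sharpened = v + amount * (v - blurred)
        out.append(max(0, min(255, round(sharpened))))
    return out


# The light/dark boundary between rows 1 and 2 gets pushed apart.
print(sharpen_column([200, 200, 100, 100]))
```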
Heavy-handed grayscale conversion: make most grays black, while whites stay white
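The heavy-handed grayscale step amounts to a threshold: anything not close to white becomes black. The cutoff below (`white_cutoff=230`) is a hypothetical value. On the command line, ImageMagick's `-threshold` option is one way to get a similar result.

```python
# Assumption: a single hard cutoff near white; the value 230 is hypothetical.
def heavy_threshold(column, white_cutoff=230):
    """Map near-white values to white (255) and every other gray to black (0)."""
    return [255 if v >= white_cutoff else 0 for v in column]


print(heavy_threshold([255, 240, 229, 128, 10]))
```

Setting the cutoff close to white is what makes "most grays black": only rows that are nearly blank paper survive as white.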
Convert to txt (the text color list)
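Writing the column in ImageMagick's `txt:` format (for example `convert column.png txt:-`) lists one line per pixel with its coordinates and color. The parser below assumes lines shaped like `0,12: (0,0,0)  #000000  black`. The exact layout differs between ImageMagick versions, so the regular expression is an assumption.

```python
import re

# Assumed pixel-line layout: "x,y: (channels)  #HEX  name".
PIXEL_LINE = re.compile(r"^(\d+),(\d+):.*?(#[0-9A-Fa-f]+)")


def parse_txt(text):
    """Return a list of booleans, ordered by row (y), that is True where
    the pixel's hex color is pure black."""
    rows = {}
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # skip the "# ImageMagick pixel enumeration" header
        m = PIXEL_LINE.match(line)
        if m:
            y = int(m.group(2))
            rows[y] = int(m.group(3)[1:], 16) == 0
    return [rows[y] for y in sorted(rows)]


sample = """# ImageMagick pixel enumeration: 1,4,255,srgb
0,0: (255,255,255)  #FFFFFF  white
0,1: (0,0,0)  #000000  black
0,2: (0,0,0)  #000000  black
0,3: (255,255,255)  #FFFFFF  white"""
print(parse_txt(sample))
```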
Look for long, continuous blocks of “black”
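Once the column is reduced to black/white rows, finding images comes down to scanning for long runs of black. Lines of text produce short runs separated by white gaps, while a picture fills a tall stretch of the page. In the sketch below, `min_length=50` rows is a hypothetical cutoff that would need tuning to the scan resolution.

```python
# Assumption: an "image" is any run of consecutive black rows at least
# min_length tall; 50 is a hypothetical value.
def find_black_blocks(is_black, min_length=50):
    """Return (start, end) row ranges, end exclusive, for runs of
    consecutive black rows at least min_length long."""
    blocks = []
    start = None
    for y, black in enumerate(is_black):
        if black and start is None:
            start = y  # a run begins
        elif not black and start is not None:
            if y - start >= min_length:
                blocks.append((start, y))
            start = None  # the run ended
    # A run that reaches the bottom of the page.
    if start is not None and len(is_black) - start >= min_length:
        blocks.append((start, len(is_black)))
    return blocks


column = ([False] * 10 + [True] * 60 + [False] * 5
          + [True] * 3 + [False] * 2 + [True] * 55)
print(find_black_blocks(column))
```

The 3-row run, like a line of text, is ignored. The two tall runs are reported as candidate images.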
github.com ewlarson/picturepages
Don Quixote (168/169): 99% accurate
http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
Paradise Lost (54/54): 100% accurate
http://openlibrary.org/books/OL14022842M/Paradise_Lost
Around the World in Eighty Days (60/62): 97% accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
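The accuracy figures are the correct/total ratios rounded to whole percentages. The transcript doesn't say exactly what each count measures, so the sketch below only redoes the arithmetic.

```python
def accuracy(correct, total):
    """Percentage of correct results, rounded to a whole number."""
    return round(100 * correct / total)


results = {
    "Don Quixote": (168, 169),
    "Paradise Lost": (54, 54),
    "Around the World in Eighty Days": (60, 62),
}
for title, (correct, total) in results.items():
    print(f"{title}: {correct}/{total} = {accuracy(correct, total)}%")
```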
Wanna help do this better? Contact me.
[email protected]