Lock in $30 Savings on PRO—Offer Ends Soon! ⏳
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Finding images in book page images
Search
Eric Larson
February 07, 2012
Programming
4
750
Finding images in book page images
Code4Lib 2012 Lighting Talk Presentation
Eric Larson
February 07, 2012
Tweet
Share
More Decks by Eric Larson
See All by Eric Larson
Put Your Money Where The Mouse is: Tools and Techniques for Making Informed Design Decisions
ewlarson
0
29
Usability Testing Primo
ewlarson
0
83
Ruby Tips
ewlarson
0
110
9 Keys to Great UX
ewlarson
0
85
Other Decks in Programming
See All in Programming
AI時代におけるSRE、 あるいはエンジニアの生存戦略
pyama86
6
1.2k
DevTools extensions で 独自の DevTool を開発する | FlutterKaigi 2024
kokiyoshida
0
170
3 Effective Rules for Using Signals in Angular
manfredsteyer
PRO
0
150
EMになってからチームの成果を最大化するために取り組んだこと/ Maximize team performance as EM
nashiusagi
0
110
Gestaltung digitaler Lösungen – Produktions- oder Designprozess?
techstories
0
100
Kaigi on Rails 2024 〜運営の裏側〜
krpk1900
1
300
CSC509 Lecture 12
javiergs
PRO
0
160
社内活動の取り組み紹介 ~ スリーシェイクでこんな取り組みしてます ~
bells17
0
270
Figma Dev Modeで変わる!Flutterの開発体験
watanave
0
3.3k
RubyLSPのマルチバイト文字対応
notfounds
0
120
Make Impossible States Impossibleを 意識してReactのPropsを設計しよう
ikumatadokoro
0
310
Modular Monolith Monorepo ~シンプルさを保ちながらmonorepoのメリットを最大化する~
yuisakamoto
10
3k
Featured
See All Featured
I Don’t Have Time: Getting Over the Fear to Launch Your Podcast
jcasabona
28
2k
Designing on Purpose - Digital PM Summit 2013
jponch
115
7k
CSS Pre-Processors: Stylus, Less & Sass
bermonpainter
356
29k
Automating Front-end Workflow
addyosmani
1366
200k
Optimizing for Happiness
mojombo
376
70k
Building a Modern Day E-commerce SEO Strategy
aleyda
38
6.9k
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
131
33k
5 minutes of I Can Smell Your CMS
philhawksworth
202
19k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
329
21k
Reflections from 52 weeks, 52 projects
jeffersonlam
346
20k
Designing Dashboards & Data Visualisations in Web Apps
destraynor
229
52k
Being A Developer After 40
akosma
87
590k
Transcript
Finding images in book page images Eric Larson University of
Wisconsin-Madison Libraries
Warning Hobbyist code here. I’m certain there are better ways
to do this.
None
None
None
None
None
curl
None
imagemagick
Processing steps 1. Desaturate the image 2. Boost contrast 3.
Convert image to 1pixel wide x image height 4. Sharpen the image 5. Super-duper grayscale conversion 6. Produce the text color list 7. Look for continuous “black” blocks
None
convert
convert -colorspace Gray
None
convert \ -contrast -contrast \ -contrast -contrast \ -contrast -contrast
\ -contrast -contrast \
None
Convert image to 1px x height
None
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites
are white
convert to txt
None
Look for long, continuous blocks of “black”
None
None
None
github.com ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
None
None
Paradise Lost # (54/54) 100% Accurate http://openlibrary.org/books/OL14022842M/Paradise_Lost
None
None
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
None
None
Wanna help do this better? Contact me.
[email protected]