Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Finding images in book page images
Search
Eric Larson
February 07, 2012
Programming
4
790
Finding images in book page images
Code4Lib 2012 Lighting Talk Presentation
Eric Larson
February 07, 2012
Tweet
Share
More Decks by Eric Larson
See All by Eric Larson
Put Your Money Where The Mouse is: Tools and Techniques for Making Informed Design Decisions
ewlarson
0
33
Usability Testing Primo
ewlarson
0
90
Ruby Tips
ewlarson
0
110
9 Keys to Great UX
ewlarson
0
88
Other Decks in Programming
See All in Programming
GraphQL×Railsアプリのデータベース負荷分散 - 月間3,000万人利用サービスを無停止で
koxya
1
1.2k
iOSエンジニア向けの英語学習アプリを作る!
yukawashouhei
0
190
CSC509 Lecture 02
javiergs
PRO
0
410
大規模アプリのDIフレームワーク刷新戦略 ~過去最大規模の並行開発を止めずにアプリ全体に導入するまで~
mot_techtalk
0
410
The Past, Present, and Future of Enterprise Java
ivargrimstad
0
220
なぜGoのジェネリクスはこの形なのか? Featherweight Goが明かす設計の核心
ryotaros
7
1k
Cursorハンズオン実践!
eltociear
1
360
CSC305 Lecture 01
javiergs
PRO
1
400
登壇は dynamic! な営みである / speech is dynamic
da1chi
0
180
バッチ処理を「状態の記録」から「事実の記録」へ
panda728
PRO
0
120
After go func(): Goroutines Through a Beginner’s Eye
97vaibhav
0
240
Breaking Up with Big ViewModels — Without Breaking Your Architecture (droidcon Berlin 2025)
steliosf
PRO
1
350
Featured
See All Featured
Stop Working from a Prison Cell
hatefulcrawdad
271
21k
Music & Morning Musume
bryan
46
6.8k
Designing for humans not robots
tammielis
254
26k
Practical Tips for Bootstrapping Information Extraction Pipelines
honnibal
PRO
23
1.5k
The Psychology of Web Performance [Beyond Tellerrand 2023]
tammyeverts
49
3.1k
[RailsConf 2023 Opening Keynote] The Magic of Rails
eileencodes
31
9.7k
Measuring & Analyzing Core Web Vitals
bluesmoon
9
610
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
507
140k
A designer walks into a library…
pauljervisheath
209
24k
Imperfection Machines: The Place of Print at Facebook
scottboms
269
13k
Refactoring Trust on Your Teams (GOTO; Chicago 2020)
rmw
35
3.2k
The Success of Rails: Ensuring Growth for the Next 100 Years
eileencodes
46
7.6k
Transcript
Finding images in book page images Eric Larson University of
Wisconsin-Madison Libraries
Warning Hobbyist code here. I’m certain there are better ways
to do this.
None
None
None
None
None
curl
None
imagemagick
Processing steps 1. Desaturate the image 2. Boost contrast 3.
Convert image to 1pixel wide x image height 4. Sharpen the image 5. Super-duper grayscale conversion 6. Produce the text color list 7. Look for continuous “black” blocks
None
convert
convert -colorspace Gray
None
convert \ -contrast -contrast \ -contrast -contrast \ -contrast -contrast
\ -contrast -contrast \
None
Convert image to 1px x height
None
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites
are white
convert to txt
None
Look for long, continuous blocks of “black”
None
None
None
github.com ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
None
None
Paradise Lost # (54/54) 100% Accurate http://openlibrary.org/books/OL14022842M/Paradise_Lost
None
None
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
None
None
Wanna help do this better? Contact me.
[email protected]