Finding images in book page images
Eric Larson
February 07, 2012
Code4Lib 2012 Lightning Talk Presentation
Transcript
Finding images in book page images
Eric Larson, University of Wisconsin-Madison Libraries
Warning: hobbyist code here. I’m certain there are better ways to do this.
curl
ImageMagick
Processing steps
1. Desaturate the image
2. Boost contrast
3. Convert the image to 1 pixel wide by the image’s height
4. Sharpen the image
5. Super-duper grayscale conversion
6. Produce the text color list
7. Look for continuous “black” blocks
convert
convert -colorspace Gray
convert \
  -contrast -contrast \
  -contrast -contrast \
  -contrast -contrast \
  -contrast -contrast \
Convert image to 1px x height
Sharpen the image
Heavy-handed grayscale conversion: make most grays black; whites stay white
Convert to txt
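As a rough sketch only (this is not the code in the picturepages repository), steps 1 through 6 can be chained into one ImageMagick invocation that ends in the `txt:` pixel listing. Several pieces are assumptions chosen for illustration: the function name `page_profile`, the use of `-scale` for the 1-pixel squash, the `-sharpen 0x1` radius, and a `-threshold 90%` standing in for the “super-duper” grayscale conversion.

```shell
#!/bin/sh
# page_profile IMAGE: hypothetical sketch of steps 1-6 for one page image.
# Writes ImageMagick "txt:" output (one line per pixel row) to stdout.
page_profile() {
  img="$1"
  # Original page height in pixels, so the squashed image keeps every row.
  height=$(identify -format '%h' "$img") || return 1
  # Steps: 1 desaturate, 2 boost contrast, 3 squash to 1px wide,
  # 4 sharpen (assumed radius), 5 threshold (assumed 90% stand-in), 6 text list.
  convert "$img" \
    -colorspace Gray \
    -contrast -contrast -contrast -contrast \
    -contrast -contrast -contrast -contrast \
    -scale "1x${height}!" \
    -sharpen 0x1 \
    -threshold 90% \
    txt:-
}
```

`-scale` box-averages each row, so every output pixel is the mean brightness of one page row. Because the threshold sends anything at or below 90% brightness to black, only nearly blank rows stay white. The threshold value is a guess and would need tuning per scan.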
Look for long, continuous blocks of “black”
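Step 7 can be sketched with awk over the `txt:` listing. The script walks the rows top to bottom and reports every unbroken run of pure black rows that reaches a minimum length. Text lines presumably produce short black runs separated by white gaps, while illustrations produce long ones. The function name `black_runs` and the default minimum of 50 rows are hypothetical. A row counts as black when its hex color is all zeros, which works whether ImageMagick prints `#000000` or a 16-bit `#0000`.

```shell
#!/bin/sh
# black_runs [MIN]: read ImageMagick "txt:" output for a 1-pixel-wide image
# on stdin; print "start,end" (inclusive rows) for each run of black rows
# at least MIN rows long. MIN (default 50) is an assumed tuning value.
black_runs() {
  awk -v min="${1:-50}" '
    BEGIN { start = -1 }                   # -1 means "not inside a run"
    /^#/ { next }                          # skip the pixel-enumeration header
    {
      split($1, xy, "[,:]")                # "0,123:" -> xy[2] = 123 (the row)
      y = xy[2] + 0
      if (!match($0, /#[0-9A-Fa-f]+/)) next
      hex = substr($0, RSTART, RLENGTH)    # e.g. #000000 or #FFFFFF
      if (hex ~ /^#0+$/) {                 # all-zero color: a black row
        if (start < 0) start = y
        last = y
      } else if (start >= 0) {             # a non-black row ends the run
        if (last - start + 1 >= min) print start "," last
        start = -1
      }
    }
    END { if (start >= 0 && last - start + 1 >= min) print start "," last }
  '
}
```

For example, piping a page’s `txt:-` output into `black_runs 50` would print one `start,end` line per candidate illustration. A page that prints nothing would be treated as text only.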
github.com/ewlarson/picturepages
Don Quixote (168/169): 99% accurate
http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
Paradise Lost (54/54): 100% accurate
http://openlibrary.org/books/OL14022842M/Paradise_Lost
Around the World in Eighty Days (60/62): 97% accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
Wanna help do this better? Contact me.