Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Finding images in book page images
Search
Sponsored
·
SiteGround - Reliable hosting with speed, security, and support you can count on.
→
Eric Larson
February 07, 2012
Programming
800
4
Share
Finding images in book page images
Code4Lib 2012 Lighting Talk Presentation
Eric Larson
February 07, 2012
More Decks by Eric Larson
See All by Eric Larson
Put Your Money Where The Mouse is: Tools and Techniques for Making Informed Design Decisions
ewlarson
0
40
Usability Testing Primo
ewlarson
0
97
Ruby Tips
ewlarson
0
120
9 Keys to Great UX
ewlarson
0
92
Other Decks in Programming
See All in Programming
Augmenting AI with the Power of Jakarta EE
ivargrimstad
0
380
SPMマルチモジュールで テストカバレッジを取得する技法
yosshi4486
0
130
ふつうのFeature Flag実践入門
irof
7
3.3k
軽量Java基盤の設計 DIコンテナに頼らない、長期保守と1秒起動の実現 JJUG CCC 2026 Spring
macha64
0
200
AI駆動開発で崩れていくコードベースを立て直す
kyoko_nr_nr
1
400
Inspired By RubyKaigi (EN)
atzzcokek
0
440
Old Dog, New Tricks: The Java 25 Reinvention - JNation
bazlur_rahman
0
140
ユニットテストの先へ:テスト技法で要求・仕様を整理するJava開発実践 / Beyond_Unit_Testing_Practical_Java_Development_Techniques_for_Organizing_Requirements_and_Specifications
shimashima35
0
310
JJUG CCC 2026 Spring: JSpecify で実現する Kotlin フレンドリーな Java API 設計
ternbusty
1
110
技術記事、AIに書かせるか、自分で書くか? 〜それでも私が自分の手で書く理由〜 / #QiitaConference
jnchito
2
1.2k
Why Laravel apps break—Mastering the fundamentals to keep them maintainable
kentaroutakeda
1
320
生成AI時代にこそ効くGo | Why Go Works in the Age of Generative AI
mom0tomo
8
3k
Featured
See All Featured
Building Flexible Design Systems
yeseniaperezcruz
330
40k
CSS Pre-Processors: Stylus, Less & Sass
bermonpainter
360
30k
How to build an LLM SEO readiness audit: a practical framework
nmsamuel
1
760
Prompt Engineering for Job Search
mfonobong
0
320
Beyond borders and beyond the search box: How to win the global "messy middle" with AI-driven SEO
davidcarrasco
3
150
For a Future-Friendly Web
brad_frost
183
10k
Producing Creativity
orderedlist
PRO
348
40k
Easily Structure & Communicate Ideas using Wireframe
afnizarnur
194
17k
Building Better People: How to give real-time feedback that sticks.
wjessup
370
20k
エンジニアに許された特別な時間の終わり
watany
107
240k
Connecting the Dots Between Site Speed, User Experience & Your Business [WebExpo 2025]
tammyeverts
11
930
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
141
35k
Transcript
Finding images in book page images Eric Larson University of
Wisconsin-Madison Libraries
Warning Hobbyist code here. I’m certain there are better ways
to do this.
None
None
None
None
None
curl
None
imagemagick
Processing steps 1. Desaturate the image 2. Boost contrast 3.
Convert image to 1pixel wide x image height 4. Sharpen the image 5. Super-duper grayscale conversion 6. Produce the text color list 7. Look for continuous “black” blocks
None
convert
convert -colorspace Gray
None
convert \ -contrast -contrast \ -contrast -contrast \ -contrast -contrast
\ -contrast -contrast \
None
Convert image to 1px x height
None
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites
are white
convert to txt
None
Look for long, continuous blocks of “black”
None
None
None
github.com ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
None
None
Paradise Lost # (54/54) 100% Accurate http://openlibrary.org/books/OL14022842M/Paradise_Lost
None
None
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
None
None
Wanna help do this better? Contact me.
[email protected]