Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Finding images in book page images
Search
Sponsored
·
SiteGround - Reliable hosting with speed, security, and support you can count on.
→
Eric Larson
February 07, 2012
Programming
4
790
Finding images in book page images
Code4Lib 2012 Lighting Talk Presentation
Eric Larson
February 07, 2012
Tweet
Share
More Decks by Eric Larson
See All by Eric Larson
Put Your Money Where The Mouse is: Tools and Techniques for Making Informed Design Decisions
ewlarson
0
37
Usability Testing Primo
ewlarson
0
94
Ruby Tips
ewlarson
0
110
9 Keys to Great UX
ewlarson
0
90
Other Decks in Programming
See All in Programming
AIによる開発の民主化を支える コンテキスト管理のこれまでとこれから
mulyu
3
380
Vibe Coding - AI 驅動的軟體開發
mickyp100
0
180
Fluid Templating in TYPO3 14
s2b
0
130
AIによるイベントストーミング図からのコード生成 / AI-powered code generation from Event Storming diagrams
nrslib
2
1.9k
AI巻き込み型コードレビューのススメ
nealle
2
440
ノイジーネイバー問題を解決する 公平なキューイング
occhi
0
110
疑似コードによるプロンプト記述、どのくらい正確に実行される?
kokuyouwind
0
390
Unicodeどうしてる? PHPから見たUnicode対応と他言語での対応についてのお伺い
youkidearitai
PRO
1
2.6k
Claude Codeと2つの巻き戻し戦略 / Two Rewind Strategies with Claude Code
fruitriin
0
140
AIフル活用時代だからこそ学んでおきたい働き方の心得
shinoyu
0
140
FOSDEM 2026: STUNMESH-go: Building P2P WireGuard Mesh Without Self-Hosted Infrastructure
tjjh89017
0
170
2026年 エンジニアリング自己学習法
yumechi
0
140
Featured
See All Featured
Reality Check: Gamification 10 Years Later
codingconduct
0
2k
The Director’s Chair: Orchestrating AI for Truly Effective Learning
tmiket
1
97
More Than Pixels: Becoming A User Experience Designer
marktimemedia
3
320
Writing Fast Ruby
sferik
630
62k
Getting science done with accelerated Python computing platforms
jacobtomlinson
2
120
Believing is Seeing
oripsolob
1
56
[RailsConf 2023] Rails as a piece of cake
palkan
59
6.3k
A Tale of Four Properties
chriscoyier
162
24k
Imperfection Machines: The Place of Print at Facebook
scottboms
269
14k
My Coaching Mixtape
mlcsv
0
48
Building a Modern Day E-commerce SEO Strategy
aleyda
45
8.7k
JavaScript: Past, Present, and Future - NDC Porto 2020
reverentgeek
52
5.8k
Transcript
Finding images in book page images Eric Larson University of
Wisconsin-Madison Libraries
Warning Hobbyist code here. I’m certain there are better ways
to do this.
None
None
None
None
None
curl
None
imagemagick
Processing steps 1. Desaturate the image 2. Boost contrast 3.
Convert image to 1pixel wide x image height 4. Sharpen the image 5. Super-duper grayscale conversion 6. Produce the text color list 7. Look for continuous “black” blocks
None
convert
convert -colorspace Gray
None
convert \ -contrast -contrast \ -contrast -contrast \ -contrast -contrast
\ -contrast -contrast \
None
Convert image to 1px x height
None
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites
are white
convert to txt
None
Look for long, continuous blocks of “black”
None
None
None
github.com ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
None
None
Paradise Lost # (54/54) 100% Accurate http://openlibrary.org/books/OL14022842M/Paradise_Lost
None
None
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
None
None
Wanna help do this better? Contact me.
[email protected]