Finding images in book page images
Eric Larson
February 07, 2012
Code4Lib 2012 Lightning Talk Presentation
Transcript
Finding images in book page images
Eric Larson, University of Wisconsin-Madison Libraries
Warning: Hobbyist code here. I’m certain there are better ways to do this.
curl
ImageMagick
Processing steps
1. Desaturate the image
2. Boost contrast
3. Convert image to 1 pixel wide x image height
4. Sharpen the image
5. Super-duper grayscale conversion
6. Produce the text color list
7. Look for continuous “black” blocks
convert
convert -colorspace Gray
convert \
  -contrast -contrast \
  -contrast -contrast \
  -contrast -contrast \
  -contrast -contrast \
Convert image to 1px x height
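To illustrate what squeezing a page down to one pixel wide achieves, here is a minimal pure-Python sketch. It assumes a grayscale image held as a list of rows of 0-255 values and simply averages each row; ImageMagick's own resize filters work differently, and `collapse_rows` and the sample `page` are hypothetical names and data, not taken from the talk's code.

```python
def collapse_rows(pixels):
    """Average each row of a grayscale image into a single value.

    `pixels` is a list of rows, each a non-empty list of ints in 0..255.
    The result is a 1-pixel-wide column with one value per row.
    Integer division is used, so averages are rounded down.
    """
    return [sum(row) // len(row) for row in pixels]


# Hand-made sample data (an assumption, not a real scan).
page = [
    [255, 255, 255, 255],  # blank margin: stays white
    [255, 0, 255, 0],      # a line of text: ink and paper mix to mid-gray
    [0, 0, 0, 0],          # a row crossing a dark illustration: stays black
]

column = collapse_rows(page)
print(column)
```

Rows that cross an illustration tend to stay dark after collapsing, while text rows wash out toward gray, which is what the later thresholding step relies on.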
Sharpen the image
Heavy-handed grayscale conversion: make most grays black; whites stay white
convert to txt
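ImageMagick's `txt:` output lists one pixel per line, roughly as `x,y: (channels)  #HEX  name`, after a `#` header line. The exact layout varies between ImageMagick versions, so the following is a sketch under that assumption. The `sample` text is hand-written for illustration rather than real output, and `parse_pixel_lines` is a hypothetical helper name.

```python
import re

# Matches lines such as "0,1: (0,0,0)  #000000  black".
# Assumption: coordinates come first and a hex color follows the channel tuple.
PIXEL_LINE = re.compile(r"^(\d+),(\d+):\s*\([^)]*\)\s+#([0-9A-Fa-f]+)")


def parse_pixel_lines(text):
    """Return a list of (y, is_black) tuples, one per pixel line."""
    rows = []
    for line in text.splitlines():
        match = PIXEL_LINE.match(line)
        if match is None:
            continue  # skips the header line and anything unrecognized
        y = int(match.group(2))
        hex_color = match.group(3)
        # A pixel counts as black only if every hex digit is zero.
        rows.append((y, set(hex_color) == {"0"}))
    return rows


# Hand-written sample in the assumed format (not captured from a real run).
sample = """# ImageMagick pixel enumeration: 1,4,255,srgb
0,0: (255,255,255)  #FFFFFF  white
0,1: (0,0,0)  #000000  black
0,2: (0,0,0)  #000000  black
0,3: (255,255,255)  #FFFFFF  white
"""

print(parse_pixel_lines(sample))
```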
Look for long, continuous blocks of “black”
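One way to look for long, continuous blocks is a single pass that records where each run of black pixels starts and ends. This is a minimal sketch, not the talk's actual code: `find_black_runs`, the `min_length` cutoff, and the sample column are all assumptions made for illustration.

```python
def find_black_runs(is_black, min_length):
    """Return (start, end) index pairs for runs of True values.

    `is_black` is a sequence of booleans, one per pixel row from top to
    bottom. Only runs at least `min_length` rows long are reported, and
    `end` is inclusive.
    """
    runs = []
    start = None
    for i, black in enumerate(is_black):
        if black and start is None:
            start = i  # a new run begins here
        elif not black and start is not None:
            if i - start >= min_length:
                runs.append((start, i - 1))
            start = None  # the run has ended
    # Handle a run that reaches the bottom of the page.
    if start is not None and len(is_black) - start >= min_length:
        runs.append((start, len(is_black) - 1))
    return runs


# Hypothetical column: a short dark patch (rows 3-4) and a long one (rows 7-12).
column = [False] * 3 + [True] * 2 + [False] * 2 + [True] * 6

print(find_black_runs(column, min_length=5))
```

With a cutoff of 5 rows, the short patch is ignored and only the long block is reported as a candidate image region. The right cutoff would depend on scan resolution and would need tuning against real pages.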
github.com/ewlarson/picturepages
Don Quixote
# (168/169) 99% Accurate
http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
Paradise Lost
# (54/54) 100% Accurate
http://openlibrary.org/books/OL14022842M/Paradise_Lost
Around the World in Eighty Days
# (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
Wanna help do this better? Contact me.
[email protected]