Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Finding images in book page images
Search
Eric Larson
February 07, 2012
Programming
4
790
Finding images in book page images
Code4Lib 2012 Lighting Talk Presentation
Eric Larson
February 07, 2012
Tweet
Share
More Decks by Eric Larson
See All by Eric Larson
Put Your Money Where The Mouse is: Tools and Techniques for Making Informed Design Decisions
ewlarson
0
33
Usability Testing Primo
ewlarson
0
90
Ruby Tips
ewlarson
0
110
9 Keys to Great UX
ewlarson
0
87
Other Decks in Programming
See All in Programming
Understanding Kotlin Multiplatform
l2hyunwoo
0
250
Infer入門
riru
4
1.4k
オホーツクでコミュニティを立ち上げた理由―地方出身プログラマの挑戦 / TechRAMEN 2025 Conference
lemonade_37
2
460
No Install CMS戦略 〜 5年先を見据えたフロントエンド開発を考える / no_install_cms
rdlabo
0
480
パスタの技術
yusukebe
1
360
QA x AIエコシステム段階構築作戦
osu
0
270
Vibe coding コードレビュー
kinopeee
0
430
ライブ配信サービスの インフラのジレンマ -マルチクラウドに至ったワケ-
mirrativ
1
170
コーディングは技術者(エンジニア)の嗜みでして / Learning the System Development Mindset from Rock Lady
mackey0225
2
430
中級グラフィックス入門~効率的なメッシュレット描画~
projectasura
4
2.6k
可変性を制する設計: 構造と振る舞いから考える概念モデリングとその実装
a_suenami
10
1.7k
それ CLI フレームワークがなくてもできるよ / Building CLI Tools Without Frameworks
orgachem
PRO
17
3.8k
Featured
See All Featured
VelocityConf: Rendering Performance Case Studies
addyosmani
332
24k
The Psychology of Web Performance [Beyond Tellerrand 2023]
tammyeverts
49
3k
The Language of Interfaces
destraynor
158
25k
Balancing Empowerment & Direction
lara
1
540
Agile that works and the tools we love
rasmusluckow
329
21k
How STYLIGHT went responsive
nonsquared
100
5.7k
The Cost Of JavaScript in 2023
addyosmani
51
8.8k
Faster Mobile Websites
deanohume
308
31k
Done Done
chrislema
185
16k
Building Flexible Design Systems
yeseniaperezcruz
328
39k
Refactoring Trust on Your Teams (GOTO; Chicago 2020)
rmw
34
3.1k
ピンチをチャンスに:未来をつくるプロダクトロードマップ #pmconf2020
aki_iinuma
126
53k
Transcript
Finding images in book page images Eric Larson University of
Wisconsin-Madison Libraries
Warning Hobbyist code here. I’m certain there are better ways
to do this.
None
None
None
None
None
curl
None
imagemagick
Processing steps 1. Desaturate the image 2. Boost contrast 3.
Convert image to 1pixel wide x image height 4. Sharpen the image 5. Super-duper grayscale conversion 6. Produce the text color list 7. Look for continuous “black” blocks
None
convert
convert -colorspace Gray
None
convert \ -contrast -contrast \ -contrast -contrast \ -contrast -contrast
\ -contrast -contrast \
None
Convert image to 1px x height
None
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites
are white
convert to txt
None
Look for long, continuous blocks of “black”
None
None
None
github.com ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
None
None
Paradise Lost # (54/54) 100% Accurate http://openlibrary.org/books/OL14022842M/Paradise_Lost
None
None
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
None
None
Wanna help do this better? Contact me.
[email protected]