Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Finding images in book page images
Search
Eric Larson
February 07, 2012
Programming
4
780
Finding images in book page images
Code4Lib 2012 Lighting Talk Presentation
Eric Larson
February 07, 2012
Tweet
Share
More Decks by Eric Larson
See All by Eric Larson
Put Your Money Where The Mouse is: Tools and Techniques for Making Informed Design Decisions
ewlarson
0
33
Usability Testing Primo
ewlarson
0
89
Ruby Tips
ewlarson
0
110
9 Keys to Great UX
ewlarson
0
87
Other Decks in Programming
See All in Programming
Select API from Kotlin Coroutine
jmatsu
1
190
AIエージェントはこう育てる - GitHub Copilot Agentとチームの共進化サイクル
koboriakira
0
460
Is Xcode slowly dying out in 2025?
uetyo
1
210
Railsアプリケーションと パフォーマンスチューニング ー 秒間5万リクエストの モバイルオーダーシステムを支える事例 ー Rubyセミナー 大阪
falcon8823
4
1k
VS Code Update for GitHub Copilot
74th
1
470
たった 1 枚の PHP ファイルで実装する MCP サーバ / MCP Server with Vanilla PHP
okashoi
1
210
Result型で“失敗”を型にするPHPコードの書き方
kajitack
4
520
プロダクト志向ってなんなんだろうね
righttouch
PRO
0
170
なぜ「共通化」を考え、失敗を繰り返すのか
rinchoku
1
590
Benchmark
sysong
0
270
「Cursor/Devin全社導入の理想と現実」のその後
saitoryc
0
440
システム成長を止めない!本番無停止テーブル移行の全貌
sakawe_ee
1
150
Featured
See All Featured
jQuery: Nuts, Bolts and Bling
dougneiner
63
7.8k
ピンチをチャンスに:未来をつくるプロダクトロードマップ #pmconf2020
aki_iinuma
124
52k
A Modern Web Designer's Workflow
chriscoyier
694
190k
Intergalactic Javascript Robots from Outer Space
tanoku
271
27k
The Cost Of JavaScript in 2023
addyosmani
51
8.5k
Designing for Performance
lara
609
69k
Why You Should Never Use an ORM
jnunemaker
PRO
58
9.4k
Building an army of robots
kneath
306
45k
Why Our Code Smells
bkeepers
PRO
337
57k
GraphQLとの向き合い方2022年版
quramy
49
14k
Building Flexible Design Systems
yeseniaperezcruz
328
39k
Git: the NoSQL Database
bkeepers
PRO
430
65k
Transcript
Finding images in book page images Eric Larson University of
Wisconsin-Madison Libraries
Warning Hobbyist code here. I’m certain there are better ways
to do this.
None
None
None
None
None
curl
None
imagemagick
Processing steps 1. Desaturate the image 2. Boost contrast 3.
Convert image to 1pixel wide x image height 4. Sharpen the image 5. Super-duper grayscale conversion 6. Produce the text color list 7. Look for continuous “black” blocks
None
convert
convert -colorspace Gray
None
convert \ -contrast -contrast \ -contrast -contrast \ -contrast -contrast
\ -contrast -contrast \
None
Convert image to 1px x height
None
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites
are white
convert to txt
None
Look for long, continuous blocks of “black”
None
None
None
github.com ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
None
None
Paradise Lost # (54/54) 100% Accurate http://openlibrary.org/books/OL14022842M/Paradise_Lost
None
None
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
None
None
Wanna help do this better? Contact me.
[email protected]