Lock in $30 Savings on PRO—Offer Ends Soon! ⏳
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Finding images in book page images
Search
Eric Larson
February 07, 2012
Programming
4
790
Finding images in book page images
Code4Lib 2012 Lighting Talk Presentation
Eric Larson
February 07, 2012
Tweet
Share
More Decks by Eric Larson
See All by Eric Larson
Put Your Money Where The Mouse is: Tools and Techniques for Making Informed Design Decisions
ewlarson
0
35
Usability Testing Primo
ewlarson
0
93
Ruby Tips
ewlarson
0
110
9 Keys to Great UX
ewlarson
0
90
Other Decks in Programming
See All in Programming
AIの誤りが許されない業務システムにおいて“信頼されるAI” を目指す / building-trusted-ai-systems
yuya4
6
3.8k
Giselleで作るAI QAアシスタント 〜 Pull Requestレビューに継続的QAを
codenote
0
250
ゲームの物理 剛体編
fadis
0
360
tparseでgo testの出力を見やすくする
utgwkk
2
260
ELYZA_Findy AI Engineering Summit登壇資料_AIコーディング時代に「ちゃんと」やること_toB LLMプロダクト開発舞台裏_20251216
elyza
2
390
ZOZOにおけるAI活用の現在 ~モバイルアプリ開発でのAI活用状況と事例~
zozotech
PRO
9
5.8k
バックエンドエンジニアによる Amebaブログ K8s 基盤への CronJobの導入・運用経験
sunabig
0
170
Cap'n Webについて
yusukebe
0
140
AIエージェントを活かすPM術 AI駆動開発の現場から
gyuta
0
440
MAP, Jigsaw, Code Golf 振り返り会 by 関東Kaggler会|Jigsaw 15th Solution
hasibirok0
0
260
SwiftUIで本格音ゲー実装してみた
hypebeans
0
450
AIコーディングエージェント(Gemini)
kondai24
0
250
Featured
See All Featured
Testing 201, or: Great Expectations
jmmastey
46
7.8k
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
286
14k
WCS-LA-2024
lcolladotor
0
380
Digital Ethics as a Driver of Design Innovation
axbom
PRO
0
130
What does AI have to do with Human Rights?
axbom
PRO
0
1.9k
Into the Great Unknown - MozCon
thekraken
40
2.2k
CSS Pre-Processors: Stylus, Less & Sass
bermonpainter
359
30k
Max Prin - Stacking Signals: How International SEO Comes Together (And Falls Apart)
techseoconnect
PRO
0
47
Rails Girls Zürich Keynote
gr2m
95
14k
Money Talks: Using Revenue to Get Sh*t Done
nikkihalliwell
0
120
Believing is Seeing
oripsolob
0
11
The SEO Collaboration Effect
kristinabergwall1
0
300
Transcript
Finding images in book page images Eric Larson University of
Wisconsin-Madison Libraries
Warning Hobbyist code here. I’m certain there are better ways
to do this.
None
None
None
None
None
curl
None
imagemagick
Processing steps 1. Desaturate the image 2. Boost contrast 3.
Convert image to 1pixel wide x image height 4. Sharpen the image 5. Super-duper grayscale conversion 6. Produce the text color list 7. Look for continuous “black” blocks
None
convert
convert -colorspace Gray
None
convert \ -contrast -contrast \ -contrast -contrast \ -contrast -contrast
\ -contrast -contrast \
None
Convert image to 1px x height
None
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites
are white
convert to txt
None
Look for long, continuous blocks of “black”
None
None
None
github.com ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
None
None
Paradise Lost # (54/54) 100% Accurate http://openlibrary.org/books/OL14022842M/Paradise_Lost
None
None
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
None
None
Wanna help do this better? Contact me.
[email protected]