Finding images in book page images
Eric Larson
February 07, 2012
Code4Lib 2012 Lightning Talk Presentation
Transcript
Finding images in book page images
Eric Larson
University of Wisconsin-Madison Libraries
Warning: Hobbyist code here. I’m certain there are better ways to do this.
curl
imagemagick
Processing steps
1. Desaturate the image
2. Boost contrast
3. Convert image to 1 pixel wide x image height
4. Sharpen the image
5. Super-duper grayscale conversion
6. Produce the text color list
7. Look for continuous “black” blocks
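The processing steps above could be strung together as a single ImageMagick `convert` invocation. The sketch below only assembles the argument list, so it can be inspected before anything runs. The function name `build_convert_args`, the `-sharpen 0x1` setting, and the use of `-threshold 80%` for the heavy-handed grayscale step are assumptions for illustration, not values taken from the slides. The eight `-contrast` flags match the number visible on the contrast slide.

```python
def build_convert_args(src, height, contrast_passes=8,
                       sharpen="0x1", threshold="80%"):
    """Assemble a hypothetical `convert` command for the steps in the talk.

    `sharpen` and `threshold` are assumed values; tune them per book.
    """
    args = ["convert", src]
    args += ["-colorspace", "Gray"]          # 1. desaturate
    args += ["-contrast"] * contrast_passes  # 2. boost contrast repeatedly
    args += ["-resize", f"1x{height}!"]      # 3. squash to 1px wide, keep height
    args += ["-sharpen", sharpen]            # 4. sharpen (assumed setting)
    args += ["-threshold", threshold]        # 5. most grays become black (assumed)
    args.append("txt:-")                     # 6. pixel list as text on stdout
    return args
```

Passing the returned list to `subprocess.run(args, capture_output=True, text=True)` would execute it, provided ImageMagick's `convert` is on the PATH. Step 7 then works on the text output.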
convert
convert -colorspace Gray
convert \
  -contrast -contrast \
  -contrast -contrast \
  -contrast -contrast \
  -contrast -contrast \
Convert image to 1px x height
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites are white
convert to txt
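ImageMagick's `txt:` output lists one pixel per line after a `#` header line. A minimal sketch of reading that list for a 1-pixel-wide image follows. The exact line layout varies between ImageMagick versions, so the regular expression and the "contains `#000000` or `black`" test are assumptions, and `parse_column` is a hypothetical helper name, not code from the picturepages repository.

```python
import re

# Matches lines such as "0,12: (0)  #000000  gray(0)"; the header starts with "#".
PIXEL_LINE = re.compile(r"^(\d+),(\d+):\s*(.*)$")


def parse_column(txt):
    """Return one boolean per row of a 1px-wide image: True where black."""
    rows = {}
    for line in txt.splitlines():
        match = PIXEL_LINE.match(line.strip())
        if not match:
            continue  # header line or blank line
        y = int(match.group(2))
        rest = match.group(3).lower()
        # Assumed black test; adjust to the layout your ImageMagick emits.
        rows[y] = "#000000" in rest or "black" in rest
    return [rows[y] for y in sorted(rows)]
```

Keying on the row number rather than line order keeps the result stable even if the lines arrive out of sequence.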
Look for long, continuous blocks of “black”
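One way to look for long, continuous blocks is a single pass over the column that tracks where each black run starts. This is a sketch under assumptions: the column is taken to be a list of booleans (True for black), and `black_runs` and `min_length` are hypothetical names. The talk does not state how long a run must be to count as an image, so that cutoff is left to the caller.

```python
def black_runs(column, min_length):
    """Return (first_row, last_row) for each black run of at least min_length rows."""
    runs = []
    start = None  # row where the current black run began, if any
    for y, is_black in enumerate(column):
        if is_black and start is None:
            start = y
        elif not is_black and start is not None:
            if y - start >= min_length:
                runs.append((start, y - 1))
            start = None
    # A run that reaches the bottom of the page is closed here.
    if start is not None and len(column) - start >= min_length:
        runs.append((start, len(column) - 1))
    return runs
```

A page whose column yields at least one run would then be flagged as likely containing an image; short runs, such as lines of text, fall below the cutoff.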
github.com/ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate
http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
Paradise Lost # (54/54) 100% Accurate
http://openlibrary.org/books/OL14022842M/Paradise_Lost
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
Wanna help do this better? Contact me.
[email protected]