Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Finding images in book page images
Search
Eric Larson
February 07, 2012
Programming
4
760
Finding images in book page images
Code4Lib 2012 Lighting Talk Presentation
Eric Larson
February 07, 2012
Tweet
Share
More Decks by Eric Larson
See All by Eric Larson
Put Your Money Where The Mouse is: Tools and Techniques for Making Informed Design Decisions
ewlarson
0
30
Usability Testing Primo
ewlarson
0
84
Ruby Tips
ewlarson
0
110
9 Keys to Great UX
ewlarson
0
86
Other Decks in Programming
See All in Programming
XStateを用いた堅牢なReact Components設計~複雑なClient Stateをシンプルに~ @React Tokyo ミートアップ #2
kfurusho
1
870
[JAWS-UG横浜 #80] うわっ…今年のServerless アップデート、少なすぎ…?
maroon1st
1
180
個人アプリを2年ぶりにアプデしたから褒めて / I just updated my personal app, praise me!
lovee
0
340
SwiftUIで単方向アーキテクチャを導入して得られた成果
takuyaosawa
0
270
Linux && Docker 研修/Linux && Docker training
forrep
24
4.5k
一休.com のログイン体験を支える技術 〜Web Components x Vue.js 活用事例と最適化について〜
atsumim
0
320
JavaScriptツール群「UnJS」を5分で一気に駆け巡る!
k1tikurisu
9
1.8k
sappoRo.R #12 初心者セッション
kosugitti
0
240
苦しいTiDBへの移行を乗り越えて快適な運用を目指す
leveragestech
0
340
最近のVS Codeで気になるニュース 2025/01
74th
1
260
CNCF Project の作者が考えている OSS の運営
utam0k
6
710
動作確認やテストで漏れがちな観点3選
starfish719
6
1k
Featured
See All Featured
Reflections from 52 weeks, 52 projects
jeffersonlam
348
20k
The Art of Delivering Value - GDevCon NA Keynote
reverentgeek
10
1.3k
Speed Design
sergeychernyshev
26
790
The Psychology of Web Performance [Beyond Tellerrand 2023]
tammyeverts
45
2.3k
Documentation Writing (for coders)
carmenintech
67
4.6k
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
132
33k
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
29
2.2k
Rebuilding a faster, lazier Slack
samanthasiow
80
8.8k
Build your cross-platform service in a week with App Engine
jlugia
229
18k
The Invisible Side of Design
smashingmag
299
50k
How GitHub (no longer) Works
holman
313
140k
Optimizing for Happiness
mojombo
376
70k
Transcript
Finding images in book page images Eric Larson University of
Wisconsin-Madison Libraries
Warning Hobbyist code here. I’m certain there are better ways
to do this.
None
None
None
None
None
curl
None
imagemagick
Processing steps 1. Desaturate the image 2. Boost contrast 3.
Convert image to 1pixel wide x image height 4. Sharpen the image 5. Super-duper grayscale conversion 6. Produce the text color list 7. Look for continuous “black” blocks
None
convert
convert -colorspace Gray
None
convert \ -contrast -contrast \ -contrast -contrast \ -contrast -contrast
\ -contrast -contrast \
None
Convert image to 1px x height
None
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites
are white
convert to txt
None
Look for long, continuous blocks of “black”
None
None
None
github.com ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
None
None
Paradise Lost # (54/54) 100% Accurate http://openlibrary.org/books/OL14022842M/Paradise_Lost
None
None
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
None
None
Wanna help do this better? Contact me.
[email protected]