Finding images in book page images
Eric Larson
February 07, 2012
Code4Lib 2012 Lightning Talk Presentation
Transcript
Finding images in book page images
Eric Larson
University of Wisconsin-Madison Libraries
Warning: Hobbyist code here. I’m certain there are better ways to do this.
curl
imagemagick
Processing steps
1. Desaturate the image
2. Boost contrast
3. Convert image to 1 pixel wide x image height
4. Sharpen the image
5. Super-duper grayscale conversion
6. Produce the text color list
7. Look for continuous “black” blocks
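As a rough illustration of steps 1 through 6, the sketch below assembles a single ImageMagick `convert` command as an argument list. Only `-colorspace Gray` and the eight repeated `-contrast` flags come from the slides. The `-resize`, `-sharpen`, and `-threshold` values, the function name `build_convert_command`, and the file name `page.jpg` are assumptions made for illustration, not the settings used in the talk.

```python
def build_convert_command(src, height, contrast_passes=8, threshold_percent=90):
    """Build an ImageMagick `convert` argument list for steps 1-6.

    Hypothetical helper: the resize, sharpen and threshold settings are
    guesses at what the slides describe, not the talk's actual values.
    """
    cmd = ["convert", src]
    # Step 1: desaturate (this flag appears on the slides).
    cmd += ["-colorspace", "Gray"]
    # Step 2: boost contrast by repeating -contrast (the slides show eight).
    cmd += ["-contrast"] * contrast_passes
    # Step 3: squash to 1 pixel wide; "!" ignores the aspect ratio (assumed flag).
    cmd += ["-resize", f"1x{height}!"]
    # Step 4: sharpen (assumed radius x sigma).
    cmd += ["-sharpen", "0x1"]
    # Step 5: heavy-handed black/white split (assumed threshold).
    cmd += ["-threshold", f"{threshold_percent}%"]
    # Step 6: write a text listing of pixel colors to standard output.
    cmd.append("txt:-")
    return cmd


if __name__ == "__main__":
    # Print the command rather than running it, so no ImageMagick is needed.
    print(" ".join(build_convert_command("page.jpg", 500)))
```

Building the command as a list keeps it ready for `subprocess.run` without shell quoting, and the number of contrast passes and the threshold stay easy to tune per book.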
convert
convert -colorspace Gray
convert \
  -contrast -contrast \
  -contrast -contrast \
  -contrast -contrast \
  -contrast -contrast \
Convert image to 1px x height
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites are white
convert to txt
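ImageMagick's `txt:` output lists one pixel per line, with coordinates followed by the color in several notations. A minimal sketch of reading that listing for a 1-pixel-wide image might look like the following. The exact line layout varies between ImageMagick versions, so the sample text, the function name `parse_txt_column`, and the `black_below` cutoff are assumptions for illustration.

```python
import re

# Matches lines such as "0,12: (0,0,0)  #000000  black" and captures
# x, y and the first two hex digits of the color (the gray level).
PIXEL_LINE = re.compile(r"^(\d+),(\d+):.*?#([0-9A-Fa-f]{2})")


def parse_txt_column(txt, black_below=128):
    """Return one boolean per image row: True where the pixel is dark."""
    rows = {}
    for line in txt.splitlines():
        match = PIXEL_LINE.match(line.strip())
        if not match:
            continue  # skip the "# ImageMagick pixel enumeration" header and blanks
        x, y = int(match.group(1)), int(match.group(2))
        level = int(match.group(3), 16)
        if x == 0:  # the image is assumed to be a single column
            rows[y] = level < black_below
    return [rows[y] for y in sorted(rows)]


# Hypothetical sample in the style of ImageMagick's txt: output.
SAMPLE = """# ImageMagick pixel enumeration: 1,4,255,gray
0,0: (255,255,255)  #FFFFFF  white
0,1: (0,0,0)  #000000  black
0,2: (0,0,0)  #000000  black
0,3: (255,255,255)  #FFFFFF  white
"""

if __name__ == "__main__":
    print(parse_txt_column(SAMPLE))
```

Reading only the hex field sidesteps the differences in how versions print the parenthesized channel values and color names.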
Look for long, continuous blocks of “black”
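One way to read this step: lines of text tend to leave short dark runs separated by white line spacing, while an illustration tends to leave one long dark run in the 1-pixel column. The sketch below finds runs of consecutive dark rows at least `min_length` rows long. The function name `find_black_runs` and the minimum length are hypothetical choices, not values taken from the talk.

```python
def find_black_runs(is_black, min_length):
    """Return (start_row, length) for each run of dark rows >= min_length."""
    runs = []
    start = None  # row index where the current dark run began, if any
    for i, dark in enumerate(is_black):
        if dark and start is None:
            start = i
        elif not dark and start is not None:
            if i - start >= min_length:
                runs.append((start, i - start))
            start = None
    # Close a run that reaches the bottom edge of the page.
    if start is not None and len(is_black) - start >= min_length:
        runs.append((start, len(is_black) - start))
    return runs


if __name__ == "__main__":
    # Hypothetical column: a 3-row text line, a gap, then a 40-row dark block.
    column = [False] * 5 + [True] * 3 + [False] * 4 + [True] * 40 + [False] * 2
    print(find_black_runs(column, min_length=20))
```

A page would then be flagged as containing an image when the returned list is non-empty. The right `min_length` depends on scan resolution and typeface size, so it likely needs tuning per collection.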
github.com/ewlarson/picturepages
Don Quixote
# (168/169) 99% Accurate
http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
Paradise Lost
# (54/54) 100% Accurate
http://openlibrary.org/books/OL14022842M/Paradise_Lost
Around the World in Eighty Days
# (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
Wanna help do this better? Contact me.
[email protected]