Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Finding images in book page images
Search
Eric Larson
February 07, 2012
Programming
4
680
Finding images in book page images
Code4Lib 2012 Lighting Talk Presentation
Eric Larson
February 07, 2012
Tweet
Share
More Decks by Eric Larson
See All by Eric Larson
Put Your Money Where The Mouse is: Tools and Techniques for Making Informed Design Decisions
ewlarson
0
23
Usability Testing Primo
ewlarson
0
77
Ruby Tips
ewlarson
0
100
9 Keys to Great UX
ewlarson
0
75
Other Decks in Programming
See All in Programming
incrementalモデルの理解を深める
ikkimiyazaki
2
630
[スクリプト] Swiftの型推論を学ぼう
omochi
0
100
Some Quick Ideas To Improve Your Tests ( #jassttokyo )
teyamagu
PRO
2
2.3k
ファイル先頭の use の意味、説明できますか? 〜PHP の namespace と autoloading の関係を正しく理解しよう〜 / namespace and autoloading in php
okashoi
2
470
PHP 8.3で追加されたjson_validate()を徹底的に深掘りしてみよう
mashirou1234
1
720
LLMチャットボットのアプリケーション設計Tips
os1ma
4
650
マイ隙間家具OSSたちのご紹介
karupanerura
2
150
Deep Dive 大規模システムアーキテクチャ/開発組織エンジニアリング / Deep Dive Large-Scale System Architecture, Development Organization Engineering
nrslib
15
2.9k
PHP8の機能を使って堅牢にコードを書く
fendo181
6
2.6k
Compiling Python to WebAssembly with py2wasm
syrusakbary
0
130
実践!RDRAを活用した既存システムの仕様変更 / Specification Changes in Existing Systems Utilizing RDRA
imamotohikaru
0
1.8k
とにかくHTTP3をライトニングに話す / Anyway, I'll talk to Lightning about HTTP3.
seike460
PRO
0
120
Featured
See All Featured
Happy Clients
brianwarren
91
6.3k
4 Signs Your Business is Dying
shpigford
174
21k
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
272
12k
Principles of Awesome APIs and How to Build Them.
keavy
119
16k
Design by the Numbers
sachag
274
18k
How to Create Impact in a Changing Tech Landscape [PerfNow 2023]
tammyeverts
12
1.4k
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
111
35k
Building Applications with DynamoDB
mza
88
5.6k
Into the Great Unknown - MozCon
thekraken
10
830
Creating an realtime collaboration tool: Agile Flush - .NET Oxford
marcduiker
11
1.4k
Building a Scalable Design System with Sketch
lauravandoore
455
32k
Optimizing for Happiness
mojombo
369
69k
Transcript
Finding images in book page images Eric Larson University of
Wisconsin-Madison Libraries
Warning Hobbyist code here. I’m certain there are better ways
to do this.
None
None
None
None
None
curl
None
imagemagick
Processing steps 1. Desaturate the image 2. Boost contrast 3.
Convert image to 1pixel wide x image height 4. Sharpen the image 5. Super-duper grayscale conversion 6. Produce the text color list 7. Look for continuous “black” blocks
None
convert
convert -colorspace Gray
None
convert \ -contrast -contrast \ -contrast -contrast \ -contrast -contrast
\ -contrast -contrast \
None
Convert image to 1px x height
None
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites
are white
convert to txt
None
Look for long, continuous blocks of “black”
None
None
None
github.com ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
None
None
Paradise Lost # (54/54) 100% Accurate http://openlibrary.org/books/OL14022842M/Paradise_Lost
None
None
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
None
None
Wanna help do this better? Contact me.
[email protected]