Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Finding images in book page images
Search
Eric Larson
February 07, 2012
Programming
4
790
Finding images in book page images
Code4Lib 2012 Lighting Talk Presentation
Eric Larson
February 07, 2012
Tweet
Share
More Decks by Eric Larson
See All by Eric Larson
Put Your Money Where The Mouse is: Tools and Techniques for Making Informed Design Decisions
ewlarson
0
33
Usability Testing Primo
ewlarson
0
90
Ruby Tips
ewlarson
0
110
9 Keys to Great UX
ewlarson
0
88
Other Decks in Programming
See All in Programming
アルテニア コンサル/ITエンジニア向け 採用ピッチ資料
altenir
0
110
ProxyによるWindow間RPC機構の構築
syumai
3
1.2k
複雑なドメインに挑む.pdf
yukisakai1225
5
1.2k
アセットのコンパイルについて
ojun9
0
130
Testing Trophyは叫ばない
toms74209200
0
890
GitHubとGitLabとAWS CodePipelineでCI/CDを組み比べてみた
satoshi256kbyte
4
240
CloudflareのChat Agent Starter Kitで簡単!AIチャットボット構築
syumai
2
500
プロパティベーステストによるUIテスト: LLMによるプロパティ定義生成でエッジケースを捉える
tetta_pdnt
0
3.3k
Tool Catalog Agent for Bedrock AgentCore Gateway
licux
7
2.5k
パッケージ設計の黒魔術/Kyoto.go#63
lufia
3
440
テストカバレッジ100%を10年続けて得られた学びと品質
mottyzzz
2
610
機能追加とリーダー業務の類似性
rinchoku
2
1.3k
Featured
See All Featured
The Art of Programming - Codeland 2020
erikaheidi
56
13k
XXLCSS - How to scale CSS and keep your sanity
sugarenia
248
1.3M
Building a Modern Day E-commerce SEO Strategy
aleyda
43
7.6k
The MySQL Ecosystem @ GitHub 2015
samlambert
251
13k
How to train your dragon (web standard)
notwaldorf
96
6.2k
Automating Front-end Workflow
addyosmani
1370
200k
Statistics for Hackers
jakevdp
799
220k
GitHub's CSS Performance
jonrohan
1032
460k
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
139
34k
Rebuilding a faster, lazier Slack
samanthasiow
83
9.2k
CSS Pre-Processors: Stylus, Less & Sass
bermonpainter
358
30k
Intergalactic Javascript Robots from Outer Space
tanoku
272
27k
Transcript
Finding images in book page images Eric Larson University of
Wisconsin-Madison Libraries
Warning Hobbyist code here. I’m certain there are better ways
to do this.
None
None
None
None
None
curl
None
imagemagick
Processing steps 1. Desaturate the image 2. Boost contrast 3.
Convert image to 1pixel wide x image height 4. Sharpen the image 5. Super-duper grayscale conversion 6. Produce the text color list 7. Look for continuous “black” blocks
None
convert
convert -colorspace Gray
None
convert \ -contrast -contrast \ -contrast -contrast \ -contrast -contrast
\ -contrast -contrast \
None
Convert image to 1px x height
None
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites
are white
convert to txt
None
Look for long, continuous blocks of “black”
None
None
None
github.com ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
None
None
Paradise Lost # (54/54) 100% Accurate http://openlibrary.org/books/OL14022842M/Paradise_Lost
None
None
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
None
None
Wanna help do this better? Contact me.
[email protected]