Lock in $30 Savings on PRO—Offer Ends Soon! ⏳
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Finding images in book page images
Search
Eric Larson
February 07, 2012
Programming
4
790
Finding images in book page images
Code4Lib 2012 Lighting Talk Presentation
Eric Larson
February 07, 2012
Tweet
Share
More Decks by Eric Larson
See All by Eric Larson
Put Your Money Where The Mouse is: Tools and Techniques for Making Informed Design Decisions
ewlarson
0
35
Usability Testing Primo
ewlarson
0
94
Ruby Tips
ewlarson
0
110
9 Keys to Great UX
ewlarson
0
90
Other Decks in Programming
See All in Programming
안드로이드 9년차 개발자, 프론트엔드 주니어로 커리어 리셋하기
maryang
1
130
Pythonではじめるオープンデータ分析〜書籍の紹介と書籍で紹介しきれなかった事例の紹介〜
welliving
2
570
メルカリのリーダビリティチームが取り組む、AI時代のスケーラブルな品質文化
cloverrose
2
370
Giselleで作るAI QAアシスタント 〜 Pull Requestレビューに継続的QAを
codenote
0
290
AI時代を生き抜く 新卒エンジニアの生きる道
coconala_engineer
1
420
SwiftUIで本格音ゲー実装してみた
hypebeans
0
490
生成AI時代を勝ち抜くエンジニア組織マネジメント
coconala_engineer
0
1k
クラウドに依存しないS3を使った開発術
simesaba80
0
160
gunshi
kazupon
1
120
ZJIT: The Ruby 4 JIT Compiler / Ruby Release 30th Anniversary Party
k0kubun
0
270
tparseでgo testの出力を見やすくする
utgwkk
2
280
Denoのセキュリティに関する仕組みの紹介 (toranoana.deno #23)
uki00a
0
160
Featured
See All Featured
Speed Design
sergeychernyshev
33
1.4k
Rebuilding a faster, lazier Slack
samanthasiow
85
9.3k
Improving Core Web Vitals using Speculation Rules API
sergeychernyshev
21
1.3k
A Soul's Torment
seathinner
1
2k
What does AI have to do with Human Rights?
axbom
PRO
0
1.9k
The Art of Delivering Value - GDevCon NA Keynote
reverentgeek
16
1.8k
Amusing Abliteration
ianozsvald
0
69
The AI Revolution Will Not Be Monopolized: How open-source beats economies of scale, even for LLMs
inesmontani
PRO
2
2.8k
Hiding What from Whom? A Critical Review of the History of Programming languages for Music
tomoyanonymous
0
300
Marketing Yourself as an Engineer | Alaka | Gurzu
gurzu
0
90
Exploring the Power of Turbo Streams & Action Cable | RailsConf2023
kevinliebholz
37
6.2k
Game over? The fight for quality and originality in the time of robots
wayneb77
1
66
Transcript
Finding images in book page images Eric Larson University of
Wisconsin-Madison Libraries
Warning Hobbyist code here. I’m certain there are better ways
to do this.
None
None
None
None
None
curl
None
imagemagick
Processing steps 1. Desaturate the image 2. Boost contrast 3.
Convert image to 1pixel wide x image height 4. Sharpen the image 5. Super-duper grayscale conversion 6. Produce the text color list 7. Look for continuous “black” blocks
None
convert
convert -colorspace Gray
None
convert \ -contrast -contrast \ -contrast -contrast \ -contrast -contrast
\ -contrast -contrast \
None
Convert image to 1px x height
None
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites
are white
convert to txt
None
Look for long, continuous blocks of “black”
None
None
None
github.com ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
None
None
Paradise Lost # (54/54) 100% Accurate http://openlibrary.org/books/OL14022842M/Paradise_Lost
None
None
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
None
None
Wanna help do this better? Contact me.
[email protected]