Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Finding images in book page images
Search
Eric Larson
February 07, 2012
Programming
4
790
Finding images in book page images
Code4Lib 2012 Lighting Talk Presentation
Eric Larson
February 07, 2012
Tweet
Share
More Decks by Eric Larson
See All by Eric Larson
Put Your Money Where The Mouse is: Tools and Techniques for Making Informed Design Decisions
ewlarson
0
38
Usability Testing Primo
ewlarson
0
94
Ruby Tips
ewlarson
0
110
9 Keys to Great UX
ewlarson
0
90
Other Decks in Programming
See All in Programming
CSC307 Lecture 13
javiergs
PRO
0
310
AI活用のコスパを最大化する方法
ochtum
0
120
米国のサイバーセキュリティタイムラインと見る Goの暗号パッケージの進化
tomtwinkle
2
430
Go 1.26でのsliceのメモリアロケーション最適化 / Go 1.26 リリースパーティ #go126party
mazrean
1
350
LangChain4jとは一味違うLangChain4j-CDI
kazumura
1
150
AI時代のソフトウェア開発でも「人が仕様を書く」から始めよう-医療IT現場での実践とこれから
koukimiura
0
130
The Ralph Wiggum Loop: First Principles of Autonomous Development
sembayui
0
3.7k
The Past, Present, and Future of Enterprise Java
ivargrimstad
0
230
「やめとこ」がなくなった — 1月にZennを始めて22本書いた AI共創開発のリアル
atani14
0
350
encoding/json/v2のUnmarshalはこう変わった:内部実装で見る設計改善
kurakura0916
0
310
Go1.26 go fixをプロダクトに適用して困ったこと
kurakura0916
0
330
Codex の「自走力」を高める
yorifuji
0
600
Featured
See All Featured
Claude Code のすすめ
schroneko
67
220k
Code Review Best Practice
trishagee
74
20k
We Have a Design System, Now What?
morganepeng
55
8k
WCS-LA-2024
lcolladotor
0
470
Stop Working from a Prison Cell
hatefulcrawdad
274
21k
ReactJS: Keep Simple. Everything can be a component!
pedronauck
666
130k
Why Mistakes Are the Best Teachers: Turning Failure into a Pathway for Growth
auna
0
74
The Cost Of JavaScript in 2023
addyosmani
55
9.7k
Max Prin - Stacking Signals: How International SEO Comes Together (And Falls Apart)
techseoconnect
PRO
0
110
Creating an realtime collaboration tool: Agile Flush - .NET Oxford
marcduiker
35
2.4k
How to audit for AI Accessibility on your Front & Back End
davetheseo
0
210
Information Architects: The Missing Link in Design Systems
soysaucechin
0
810
Transcript
Finding images in book page images Eric Larson University of
Wisconsin-Madison Libraries
Warning Hobbyist code here. I’m certain there are better ways
to do this.
None
None
None
None
None
curl
None
imagemagick
Processing steps 1. Desaturate the image 2. Boost contrast 3.
Convert image to 1pixel wide x image height 4. Sharpen the image 5. Super-duper grayscale conversion 6. Produce the text color list 7. Look for continuous “black” blocks
None
convert
convert -colorspace Gray
None
convert \ -contrast -contrast \ -contrast -contrast \ -contrast -contrast
\ -contrast -contrast \
None
Convert image to 1px x height
None
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites
are white
convert to txt
None
Look for long, continuous blocks of “black”
None
None
None
github.com ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
None
None
Paradise Lost # (54/54) 100% Accurate http://openlibrary.org/books/OL14022842M/Paradise_Lost
None
None
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
None
None
Wanna help do this better? Contact me.
[email protected]