Finding images in book page images
Eric Larson
February 07, 2012
Code4Lib 2012 Lightning Talk Presentation
Transcript
Finding images in book page images
Eric Larson
University of Wisconsin-Madison Libraries
Warning: Hobbyist code here. I’m certain there are better ways to do this.
curl
imagemagick
Processing steps
1. Desaturate the image
2. Boost contrast
3. Convert image to 1 pixel wide x image height
4. Sharpen the image
5. Super-duper grayscale conversion
6. Produce the text color list
7. Look for continuous “black” blocks
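As a rough sketch, steps 1 through 6 could be chained into a single ImageMagick `convert` call. The talk only shows `-colorspace Gray` and the repeated `-contrast` flags, so everything else here is an assumption: the function name `build_convert_command`, the forced `1x<height>!` resize, the `-sharpen 0x1` radius, and the `-threshold` value are illustrative guesses, not the author's settings. Step 7 happens outside ImageMagick, on the text output.

```python
def build_convert_command(src, height, contrast_passes=8, threshold="80%"):
    """Build an ImageMagick `convert` argument list for steps 1-6.

    Hypothetical helper: contrast_passes, the sharpen radius and the
    threshold are assumed values, not taken from the talk.
    """
    cmd = ["convert", src]
    cmd += ["-colorspace", "Gray"]          # 1. desaturate
    cmd += ["-contrast"] * contrast_passes  # 2. boost contrast repeatedly
    cmd += ["-resize", f"1x{height}!"]      # 3. squash to 1 px wide, ignoring aspect ratio
    cmd += ["-sharpen", "0x1"]              # 4. sharpen (assumed radius/sigma)
    cmd += ["-threshold", threshold]        # 5. force every pixel to black or white
    cmd += ["txt:-"]                        # 6. write the text color list to stdout
    return cmd
```

The list can be handed to `subprocess.run(cmd, capture_output=True, text=True)` on a machine with ImageMagick installed; building it as a list avoids shell quoting problems with file names.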
convert
convert -colorspace Gray
convert \
  -contrast -contrast \
  -contrast -contrast \
  -contrast -contrast \
  -contrast -contrast \
Convert image to 1px x height
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites are white
convert to txt
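ImageMagick's `txt:` output lists one pixel per line, with coordinates and a hex color. A minimal sketch of reading that listing for a one-pixel-wide image follows. The exact line layout varies between ImageMagick versions, so this hypothetical `parse_pixel_rows` helper only relies on the leading `x,y:` coordinates and the first six hex digits, and it assumes the image has already been reduced to pure black and white.

```python
import re

# Matches lines such as "0,1: (0)  #000000  gray(0)"; the header comment
# line starts with "#" and is skipped because it has no leading coordinates.
PIXEL_LINE = re.compile(r"^(\d+),(\d+):.*?#([0-9A-Fa-f]{6})")


def parse_pixel_rows(txt_output):
    """Return one boolean per pixel row, True where the pixel is black.

    Assumes a 1-pixel-wide image, so line order equals top-to-bottom row order.
    """
    rows = []
    for line in txt_output.splitlines():
        match = PIXEL_LINE.match(line)
        if match is None:
            continue
        rows.append(match.group(3) == "000000")
    return rows
```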
Look for long, continuous blocks of “black”
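The idea, as far as the slides suggest, is that lines of text leave short dark bands separated by white gaps, while an illustration leaves one tall dark band. A minimal sketch of that scan over a list of booleans (one per pixel row, True meaning black) is shown here. The function name `find_black_runs` and the `min_length` cutoff are assumptions; the talk does not say how long a block must be to count.

```python
def find_black_runs(rows, min_length):
    """Return (start_row, length) for each run of black rows >= min_length."""
    runs = []
    start = None  # row index where the current black run began, if any
    for index, is_black in enumerate(rows):
        if is_black and start is None:
            start = index
        elif not is_black and start is not None:
            if index - start >= min_length:
                runs.append((start, index - start))
            start = None
    # Close a run that extends to the bottom edge of the page.
    if start is not None and len(rows) - start >= min_length:
        runs.append((start, len(rows) - start))
    return runs
```

A page would then be flagged as containing an image when the returned list is non-empty; the right `min_length` likely depends on scan resolution and would need tuning per collection.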
github.com/ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate
http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
Paradise Lost # (54/54) 100% Accurate
http://openlibrary.org/books/OL14022842M/Paradise_Lost
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
Wanna help do this better? Contact me.