Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Finding images in book page images
Search
Eric Larson
February 07, 2012
Programming
4
780
Finding images in book page images
Code4Lib 2012 Lighting Talk Presentation
Eric Larson
February 07, 2012
Tweet
Share
More Decks by Eric Larson
See All by Eric Larson
Put Your Money Where The Mouse is: Tools and Techniques for Making Informed Design Decisions
ewlarson
0
33
Usability Testing Primo
ewlarson
0
90
Ruby Tips
ewlarson
0
110
9 Keys to Great UX
ewlarson
0
87
Other Decks in Programming
See All in Programming
AIと”コードの評価関数”を共有する / Share the "code evaluation function" with AI
euglena1215
1
180
たった 1 枚の PHP ファイルで実装する MCP サーバ / MCP Server with Vanilla PHP
okashoi
1
300
システム成長を止めない!本番無停止テーブル移行の全貌
sakawe_ee
1
360
MDN Web Docs に日本語翻訳でコントリビュートしたくなる
ohmori_yusuke
1
130
AI Agent 時代のソフトウェア開発を支える AWS Cloud Development Kit (CDK)
konokenj
6
800
RailsGirls IZUMO スポンサーLT
16bitidol
0
200
Claude Code + Container Use と Cursor で作る ローカル並列開発環境のススメ / ccc local dev
kaelaela
12
7k
Git Sync を超える!OSS で実現する CDK Pull 型デプロイ / Deploying CDK with PipeCD in Pull-style
tkikuc
4
350
GPUを計算資源として使おう!
primenumber
1
250
LT 2025-06-30: プロダクトエンジニアの役割
yamamotok
0
870
Model Pollution
hschwentner
1
160
生成AI時代のコンポーネントライブラリの作り方
touyou
1
290
Featured
See All Featured
Build your cross-platform service in a week with App Engine
jlugia
231
18k
The Straight Up "How To Draw Better" Workshop
denniskardys
235
140k
CSS Pre-Processors: Stylus, Less & Sass
bermonpainter
357
30k
Raft: Consensus for Rubyists
vanstee
140
7k
Docker and Python
trallard
45
3.5k
Designing Experiences People Love
moore
142
24k
KATA
mclloyd
30
14k
How to Ace a Technical Interview
jacobian
278
23k
Helping Users Find Their Own Way: Creating Modern Search Experiences
danielanewman
29
2.7k
Fashionably flexible responsive web design (full day workshop)
malarkey
407
66k
GraphQLの誤解/rethinking-graphql
sonatard
71
11k
Evolution of real-time – Irina Nazarova, EuRuKo, 2024
irinanazarova
8
830
Transcript
Finding images in book page images Eric Larson University of
Wisconsin-Madison Libraries
Warning Hobbyist code here. I’m certain there are better ways
to do this.
None
None
None
None
None
curl
None
imagemagick
Processing steps 1. Desaturate the image 2. Boost contrast 3.
Convert image to 1pixel wide x image height 4. Sharpen the image 5. Super-duper grayscale conversion 6. Produce the text color list 7. Look for continuous “black” blocks
None
convert
convert -colorspace Gray
None
convert \ -contrast -contrast \ -contrast -contrast \ -contrast -contrast
\ -contrast -contrast \
None
Convert image to 1px x height
None
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites
are white
convert to txt
None
Look for long, continuous blocks of “black”
None
None
None
github.com ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
None
None
Paradise Lost # (54/54) 100% Accurate http://openlibrary.org/books/OL14022842M/Paradise_Lost
None
None
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
None
None
Wanna help do this better? Contact me.
[email protected]