Upgrade to PRO for Only $50/Year—Limited-Time Offer! 🔥
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Finding images in book page images
Search
Eric Larson
February 07, 2012
Programming
4
790
Finding images in book page images
Code4Lib 2012 Lighting Talk Presentation
Eric Larson
February 07, 2012
Tweet
Share
More Decks by Eric Larson
See All by Eric Larson
Put Your Money Where The Mouse is: Tools and Techniques for Making Informed Design Decisions
ewlarson
0
35
Usability Testing Primo
ewlarson
0
93
Ruby Tips
ewlarson
0
110
9 Keys to Great UX
ewlarson
0
90
Other Decks in Programming
See All in Programming
AIエンジニアリングのご紹介 / Introduction to AI Engineering
rkaga
2
1k
関数の挙動書き換える
takatofukui
4
770
dnx で実行できるコマンド、作ってみました
tomohisa
0
130
[SF Ruby Conf 2025] Rails X
palkan
0
420
Micro Frontendsで築いた 共通基盤と運用の試行錯誤 / Building a Shared Platform with Micro Frontends: Operational Learnings
kyntk
1
1.9k
手軽に積ん読を増やすには?/読みたい本と付き合うには?
o0h
PRO
1
140
なあ兄弟、 余白の意味を考えてから UI実装してくれ!
ktcryomm
10
11k
関数実行の裏側では何が起きているのか?
minop1205
1
440
CloudNative Days Winter 2025: 一週間で作る低レイヤコンテナランタイム
ternbusty
7
1.9k
無秩序からの脱却 / Emergence from chaos
nrslib
2
12k
CSC305 Lecture 17
javiergs
PRO
0
270
Microservices Platforms: When Team Topologies Meets Microservices Patterns
cer
PRO
1
900
Featured
See All Featured
Making Projects Easy
brettharned
120
6.5k
GraphQLとの向き合い方2022年版
quramy
49
14k
XXLCSS - How to scale CSS and keep your sanity
sugarenia
249
1.3M
Refactoring Trust on Your Teams (GOTO; Chicago 2020)
rmw
35
3.3k
Being A Developer After 40
akosma
91
590k
RailsConf 2023
tenderlove
30
1.3k
Evolution of real-time – Irina Nazarova, EuRuKo, 2024
irinanazarova
9
1.1k
Dealing with People You Can't Stand - Big Design 2015
cassininazir
367
27k
Easily Structure & Communicate Ideas using Wireframe
afnizarnur
194
17k
Raft: Consensus for Rubyists
vanstee
140
7.2k
Designing for Performance
lara
610
69k
Building Better People: How to give real-time feedback that sticks.
wjessup
370
20k
Transcript
Finding images in book page images Eric Larson University of
Wisconsin-Madison Libraries
Warning Hobbyist code here. I’m certain there are better ways
to do this.
None
None
None
None
None
curl
None
imagemagick
Processing steps 1. Desaturate the image 2. Boost contrast 3.
Convert image to 1pixel wide x image height 4. Sharpen the image 5. Super-duper grayscale conversion 6. Produce the text color list 7. Look for continuous “black” blocks
None
convert
convert -colorspace Gray
None
convert \ -contrast -contrast \ -contrast -contrast \ -contrast -contrast
\ -contrast -contrast \
None
Convert image to 1px x height
None
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites
are white
convert to txt
None
Look for long, continuous blocks of “black”
None
None
None
github.com ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
None
None
Paradise Lost # (54/54) 100% Accurate http://openlibrary.org/books/OL14022842M/Paradise_Lost
None
None
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
None
None
Wanna help do this better? Contact me.
[email protected]