Upgrade to PRO for Only $50/Year—Limited-Time Offer! 🔥
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Finding images in book page images
Search
Eric Larson
February 07, 2012
Programming
4
790
Finding images in book page images
Code4Lib 2012 Lighting Talk Presentation
Eric Larson
February 07, 2012
Tweet
Share
More Decks by Eric Larson
See All by Eric Larson
Put Your Money Where The Mouse is: Tools and Techniques for Making Informed Design Decisions
ewlarson
0
35
Usability Testing Primo
ewlarson
0
93
Ruby Tips
ewlarson
0
110
9 Keys to Great UX
ewlarson
0
90
Other Decks in Programming
See All in Programming
AI 駆動開発ライフサイクル(AI-DLC):ソフトウェアエンジニアリングの再構築 / AI-DLC Introduction
kanamasa
11
3.6k
大規模Cloud Native環境におけるFalcoの運用
owlinux1000
0
190
0→1 フロントエンド開発 Tips🚀 #レバテックMeetup
bengo4com
0
340
公共交通オープンデータ × モバイルUX 複雑な運行情報を 『直感』に変換する技術
tinykitten
PRO
0
160
AtCoder Conference 2025「LLM時代のAHC」
imjk
2
570
ゆくKotlin くるRust
exoego
1
150
実は歴史的なアップデートだと思う AWS Interconnect - multicloud
maroon1st
0
250
Giselleで作るAI QAアシスタント 〜 Pull Requestレビューに継続的QAを
codenote
0
280
認証・認可の基本を学ぼう後編
kouyuume
0
250
Cap'n Webについて
yusukebe
0
150
ローカルLLMを⽤いてコード補完を⾏う VSCode拡張機能を作ってみた
nearme_tech
PRO
0
150
C-Shared Buildで突破するAI Agent バックテストの壁
po3rin
0
410
Featured
See All Featured
XXLCSS - How to scale CSS and keep your sanity
sugarenia
249
1.3M
The Impact of AI in SEO - AI Overviews June 2024 Edition
aleyda
5
680
jQuery: Nuts, Bolts and Bling
dougneiner
65
8.3k
Exploring the Power of Turbo Streams & Action Cable | RailsConf2023
kevinliebholz
37
6.2k
Leadership Guide Workshop - DevTernity 2021
reverentgeek
0
160
Conquering PDFs: document understanding beyond plain text
inesmontani
PRO
4
2.1k
brightonSEO & MeasureFest 2025 - Christian Goodrich - Winning strategies for Black Friday CRO & PPC
cargoodrich
2
64
No one is an island. Learnings from fostering a developers community.
thoeni
21
3.6k
Writing Fast Ruby
sferik
630
62k
AI: The stuff that nobody shows you
jnunemaker
PRO
1
12
Applied NLP in the Age of Generative AI
inesmontani
PRO
3
2k
The AI Revolution Will Not Be Monopolized: How open-source beats economies of scale, even for LLMs
inesmontani
PRO
2
2.7k
Transcript
Finding images in book page images Eric Larson University of
Wisconsin-Madison Libraries
Warning Hobbyist code here. I’m certain there are better ways
to do this.
None
None
None
None
None
curl
None
imagemagick
Processing steps 1. Desaturate the image 2. Boost contrast 3.
Convert image to 1pixel wide x image height 4. Sharpen the image 5. Super-duper grayscale conversion 6. Produce the text color list 7. Look for continuous “black” blocks
None
convert
convert -colorspace Gray
None
convert \ -contrast -contrast \ -contrast -contrast \ -contrast -contrast
\ -contrast -contrast \
None
Convert image to 1px x height
None
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites
are white
convert to txt
None
Look for long, continuous blocks of “black”
None
None
None
github.com ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
None
None
Paradise Lost # (54/54) 100% Accurate http://openlibrary.org/books/OL14022842M/Paradise_Lost
None
None
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
None
None
Wanna help do this better? Contact me.
[email protected]