Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Finding images in book page images
Search
Eric Larson
February 07, 2012
Programming
4
780
Finding images in book page images
Code4Lib 2012 Lighting Talk Presentation
Eric Larson
February 07, 2012
Tweet
Share
More Decks by Eric Larson
See All by Eric Larson
Put Your Money Where The Mouse is: Tools and Techniques for Making Informed Design Decisions
ewlarson
0
33
Usability Testing Primo
ewlarson
0
90
Ruby Tips
ewlarson
0
110
9 Keys to Great UX
ewlarson
0
87
Other Decks in Programming
See All in Programming
猫と暮らす Google Nest Cam生活🐈 / WebRTC with Google Nest Cam
yutailang0119
0
120
チームのテスト力を総合的に鍛えて品質、スピード、レジリエンスを共立させる/Testing approach that improves quality, speed, and resilience
goyoki
5
880
A2A プロトコルを試してみる
azukiazusa1
2
1.4k
脱Riverpod?fqueryで考える、TanStack Queryライクなアーキテクチャの可能性
ostk0069
0
140
チームで開発し事業を加速するための"良い"設計の考え方 @ サポーターズCoLab 2025-07-08
agatan
1
420
iOS 26にアップデートすると実機でのHot Reloadができない?
umigishiaoi
0
130
Startups on Rails in Past, Present and Future–Irina Nazarova, RailsConf 2025
irinanazarova
0
110
「テストは愚直&&網羅的に書くほどよい」という誤解 / Test Smarter, Not Harder
munetoshi
0
170
VS Code Update for GitHub Copilot
74th
2
650
Agentic Coding: The Future of Software Development with Agents
mitsuhiko
0
100
#QiitaBash MCPのセキュリティ
ryosukedtomita
1
1.3k
AIと”コードの評価関数”を共有する / Share the "code evaluation function" with AI
euglena1215
1
170
Featured
See All Featured
Practical Orchestrator
shlominoach
189
11k
Helping Users Find Their Own Way: Creating Modern Search Experiences
danielanewman
29
2.7k
Building Better People: How to give real-time feedback that sticks.
wjessup
367
19k
Put a Button on it: Removing Barriers to Going Fast.
kastner
60
3.9k
Designing for humans not robots
tammielis
253
25k
Measuring & Analyzing Core Web Vitals
bluesmoon
7
510
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
10
950
Stop Working from a Prison Cell
hatefulcrawdad
271
21k
Refactoring Trust on Your Teams (GOTO; Chicago 2020)
rmw
34
3.1k
JavaScript: Past, Present, and Future - NDC Porto 2020
reverentgeek
50
5.5k
Java REST API Framework Comparison - PWX 2021
mraible
31
8.7k
The World Runs on Bad Software
bkeepers
PRO
69
11k
Transcript
Finding images in book page images Eric Larson University of
Wisconsin-Madison Libraries
Warning Hobbyist code here. I’m certain there are better ways
to do this.
None
None
None
None
None
curl
None
imagemagick
Processing steps 1. Desaturate the image 2. Boost contrast 3.
Convert image to 1pixel wide x image height 4. Sharpen the image 5. Super-duper grayscale conversion 6. Produce the text color list 7. Look for continuous “black” blocks
None
convert
convert -colorspace Gray
None
convert \ -contrast -contrast \ -contrast -contrast \ -contrast -contrast
\ -contrast -contrast \
None
Convert image to 1px x height
None
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites
are white
convert to txt
None
Look for long, continuous blocks of “black”
None
None
None
github.com ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
None
None
Paradise Lost # (54/54) 100% Accurate http://openlibrary.org/books/OL14022842M/Paradise_Lost
None
None
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
None
None
Wanna help do this better? Contact me.
[email protected]