Finding images in book page images
Eric Larson
February 07, 2012
Code4Lib 2012 Lightning Talk Presentation
Transcript
Finding images in book page images
Eric Larson
University of Wisconsin-Madison Libraries
Warning: Hobbyist code here. I’m certain there are better ways to do this.
curl
imagemagick
Processing steps
1. Desaturate the image
2. Boost contrast
3. Convert image to 1 pixel wide x image height
4. Sharpen the image
5. Super-duper grayscale conversion
6. Produce the text color list
7. Look for continuous “black” blocks
convert
convert -colorspace Gray
convert \
  -contrast -contrast \
  -contrast -contrast \
  -contrast -contrast \
  -contrast -contrast \
Convert image to 1px x height
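The idea behind squeezing the page down to a single column can be sketched in plain Python. This is only an approximation under stated assumptions: the image is taken to be a list of rows of 0-255 gray values, and each row is reduced with a simple mean, whereas ImageMagick's resize applies its own filter. The function name `collapse_to_column` is hypothetical and not taken from the talk's code.

```python
def collapse_to_column(gray_rows):
    """Reduce each row of a grayscale image to one average brightness value.

    gray_rows is assumed to be a list of non-empty rows, each a list of
    integers in the range 0-255. The result has one value per row, which
    mimics a 1-pixel-wide image of the same height.
    """
    return [round(sum(row) / len(row)) for row in gray_rows]


# Tiny made-up "page": a white row, a dark row, and a mixed row.
page = [
    [255, 255, 255],
    [0, 0, 30],
    [200, 100, 0],
]
print(collapse_to_column(page))  # [255, 10, 100]
```

Rows that are mostly text or whitespace stay light after averaging, while rows that cross a picture tend to come out darker.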
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites are white
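A minimal sketch of that kind of heavy-handed conversion, assuming the 1-pixel-wide column is already available as a list of 0-255 gray values. The cutoff of 230 is an illustrative guess, not a value from the talk, and `to_black_and_white` is a hypothetical name.

```python
def to_black_and_white(column, white_cutoff=230):
    """Push every gray value to pure black (0) or pure white (255).

    Values at or above white_cutoff stay white; everything else,
    including most mid-grays, becomes black. The default cutoff is
    an assumption chosen only for illustration.
    """
    return [255 if value >= white_cutoff else 0 for value in column]


print(to_black_and_white([255, 240, 180, 90, 10]))  # [255, 255, 0, 0, 0]
```

A high cutoff is what makes the conversion heavy-handed: only near-white rows survive as white.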
convert to txt
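ImageMagick can write an image as a text listing with one line per pixel. The sketch below shows one way such a listing might be read back in Python. The sample text is made up for illustration, the exact columns differ between ImageMagick versions, and `parse_pixel_rows` is a hypothetical helper rather than part of the talk's code.

```python
import re

# Hypothetical sample of a "txt:" pixel enumeration for a 1-pixel-wide
# image. Real output may differ in its header and colour columns.
SAMPLE = """\
# ImageMagick pixel enumeration: 1,4,255,srgb
0,0: (255,255,255)  #FFFFFF  white
0,1: (0,0,0)  #000000  black
0,2: (0,0,0)  #000000  black
0,3: (255,255,255)  #FFFFFF  white
"""

# Matches "x,y: (channels)  #HEX" and captures x, y, and the hex digits.
PIXEL_LINE = re.compile(r"^(\d+),(\d+):\s*\([^)]*\)\s+#([0-9A-Fa-f]+)")


def parse_pixel_rows(text):
    """Return one boolean per pixel line: True where the hex colour is all zeros."""
    rows = []
    for line in text.splitlines():
        match = PIXEL_LINE.match(line)
        if match is None:
            continue  # header or unrecognised line
        rows.append(set(match.group(3)) == {"0"})
    return rows


print(parse_pixel_rows(SAMPLE))  # [False, True, True, False]
```

The parser assumes the lines arrive in top-to-bottom order, which holds for the sample above.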
Look for long, continuous blocks of “black”
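One way to look for such blocks is a single pass over the column that tracks where each black run starts. This is a sketch under assumptions: the column is a list of booleans (True for a black row), and the `min_length` threshold is a hypothetical parameter, not a value from the talk.

```python
def find_black_runs(is_black, min_length):
    """Return (start_row, end_row) pairs for runs of True of at least min_length rows."""
    runs = []
    start = None  # row where the current black run began, if any
    for row, black in enumerate(is_black):
        if black and start is None:
            start = row
        elif not black and start is not None:
            if row - start >= min_length:
                runs.append((start, row - 1))
            start = None
    # Close a run that reaches the bottom of the page.
    if start is not None and len(is_black) - start >= min_length:
        runs.append((start, len(is_black) - 1))
    return runs


column = [False, True, True, False, True, True, True, True, True, False]
print(find_black_runs(column, min_length=4))  # [(4, 8)]
```

Short runs, such as a single line of heavy type, fall below the threshold and are ignored, so only tall dark regions are reported as candidate images.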
github.com/ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
Paradise Lost # (54/54) 100% Accurate http://openlibrary.org/books/OL14022842M/Paradise_Lost
Around the World in Eighty Days # (60/62) 97% Accurate http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
Wanna help do this better? Contact me.
[email protected]