Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Finding images in book page images
Search
Eric Larson
February 07, 2012
Programming
4
790
Finding images in book page images
Code4Lib 2012 Lighting Talk Presentation
Eric Larson
February 07, 2012
Tweet
Share
More Decks by Eric Larson
See All by Eric Larson
Put Your Money Where The Mouse is: Tools and Techniques for Making Informed Design Decisions
ewlarson
0
37
Usability Testing Primo
ewlarson
0
94
Ruby Tips
ewlarson
0
110
9 Keys to Great UX
ewlarson
0
90
Other Decks in Programming
See All in Programming
CSC307 Lecture 10
javiergs
PRO
1
660
コマンドとリード間の連携に対する脅威分析フレームワーク
pandayumi
1
460
AI によるインシデント初動調査の自動化を行う AI インシデントコマンダーを作った話
azukiazusa1
1
740
AIによる高速開発をどう制御するか? ガードレール設置で開発速度と品質を両立させたチームの事例
tonkotsuboy_com
7
2.4k
Data-Centric Kaggle
isax1015
2
780
Package Management Learnings from Homebrew
mikemcquaid
0
230
AIエージェント、”どう作るか”で差は出るか? / AI Agents: Does the "How" Make a Difference?
rkaga
4
2k
それ、本当に安全? ファイルアップロードで見落としがちなセキュリティリスクと対策
penpeen
7
3.9k
「ブロックテーマでは再現できない」は本当か?
inc2734
0
1k
CSC307 Lecture 01
javiergs
PRO
0
690
ノイジーネイバー問題を解決する 公平なキューイング
occhi
0
110
20260127_試行錯誤の結晶を1冊に。著者が解説 先輩データサイエンティストからの指南書 / author's_commentary_ds_instructions_guide
nash_efp
1
980
Featured
See All Featured
A Modern Web Designer's Workflow
chriscoyier
698
190k
Design in an AI World
tapps
0
140
The Mindset for Success: Future Career Progression
greggifford
PRO
0
240
Designing Dashboards & Data Visualisations in Web Apps
destraynor
231
54k
The Straight Up "How To Draw Better" Workshop
denniskardys
239
140k
KATA
mclloyd
PRO
34
15k
The State of eCommerce SEO: How to Win in Today's Products SERPs - #SEOweek
aleyda
2
9.6k
The MySQL Ecosystem @ GitHub 2015
samlambert
251
13k
The Organizational Zoo: Understanding Human Behavior Agility Through Metaphoric Constructive Conversations (based on the works of Arthur Shelley, Ph.D)
kimpetersen
PRO
0
240
Chrome DevTools: State of the Union 2024 - Debugging React & Beyond
addyosmani
10
1.1k
I Don’t Have Time: Getting Over the Fear to Launch Your Podcast
jcasabona
34
2.6k
Discover your Explorer Soul
emna__ayadi
2
1.1k
Transcript
Finding images in book page images Eric Larson University of
Wisconsin-Madison Libraries
Warning Hobbyist code here. I’m certain there are better ways
to do this.
None
None
None
None
None
curl
None
imagemagick
Processing steps 1. Desaturate the image 2. Boost contrast 3.
Convert image to 1pixel wide x image height 4. Sharpen the image 5. Super-duper grayscale conversion 6. Produce the text color list 7. Look for continuous “black” blocks
None
convert
convert -colorspace Gray
None
convert \ -contrast -contrast \ -contrast -contrast \ -contrast -contrast
\ -contrast -contrast \
None
Convert image to 1px x height
None
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites
are white
convert to txt
None
Look for long, continuous blocks of “black”
None
None
None
github.com ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
None
None
Paradise Lost # (54/54) 100% Accurate http://openlibrary.org/books/OL14022842M/Paradise_Lost
None
None
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
None
None
Wanna help do this better? Contact me.
[email protected]