Finding images in book page images
Eric Larson
February 07, 2012
Code4Lib 2012 Lightning Talk presentation
Finding images in book page images
Eric Larson, University of Wisconsin-Madison Libraries
Warning: hobbyist code here. I’m certain there are better ways to do this.
Tools: curl and ImageMagick
Processing steps
1. Desaturate the image
2. Boost contrast
3. Convert the image to 1 pixel wide x image height
4. Sharpen the image
5. Super-duper grayscale conversion
6. Produce the text color list
7. Look for continuous “black” blocks
Desaturate the image with ImageMagick's convert:
convert page.jpg -colorspace Gray page-gray.jpg   # file names are placeholders
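With `-colorspace Gray`, each RGB pixel collapses to a single intensity. The Python sketch below shows the same idea on a list of `(r, g, b)` tuples in 0..255. It uses Rec. 601 luma weights as an assumption; ImageMagick's own default channel weighting may differ by version.

```python
def desaturate(pixels):
    """Map (r, g, b) tuples in 0..255 to gray levels in 0..255.

    Assumes Rec. 601 luma weights; ImageMagick's -colorspace Gray may
    weight the channels differently depending on version and settings.
    """
    return [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in pixels]


print(desaturate([(255, 255, 255), (0, 0, 0), (255, 0, 0)]))  # → [255, 0, 76]
```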
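Boost contrast by stacking -contrast flags, each of which increases contrast a little:
convert page-gray.jpg \
  -contrast -contrast \
  -contrast -contrast \
  -contrast -contrast \
  -contrast -contrast \
  page-contrast.jpg   # file names are placeholders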
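Stacking eight `-contrast` flags pushes dark tones darker and light tones lighter until the page is close to pure black and white. The sketch below imitates that with a smoothstep S-curve applied `passes` times. The curve is an assumption chosen for illustration, not ImageMagick's actual `-contrast` formula.

```python
def s_curve(level):
    """One contrast pass: smoothstep on a 0..255 gray level.

    Fixed points are 0, 127.5 and 255; values below mid-gray move down,
    values above it move up.
    """
    x = level / 255.0
    return 255.0 * x * x * (3.0 - 2.0 * x)


def boost_contrast(levels, passes=8):
    """Apply the S-curve `passes` times, mirroring eight stacked -contrast flags."""
    out = [float(v) for v in levels]
    for _ in range(passes):
        out = [s_curve(v) for v in out]
    return [round(v) for v in out]


print(boost_contrast([40, 100, 160, 220]))  # → [0, 0, 255, 255]
```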
Convert image to 1px x height
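Shrinking the page to one pixel wide replaces each row with roughly its average tone. A blank row stays white, a row of type becomes a gray whose depth depends on how much ink it carries, and a row crossing an illustration usually stays dark. Below is a minimal sketch of that averaging, assuming the page is already a list of gray rows. In ImageMagick, a resize geometry ending in `!` (such as `1x2000!` for a 2000-pixel-tall page) forces the exact size. The resize filter the talk used is not shown.

```python
def collapse_rows(gray_rows):
    """Average each row of 0..255 gray levels down to one value (1 px wide)."""
    return [sum(row) / len(row) for row in gray_rows]


rows = [
    [255] * 8,                              # blank row between lines
    [0, 255, 255, 0, 255, 255, 255, 255],   # sparse row of "type"
    [30] * 8,                               # row crossing a dark illustration
]
print(collapse_rows(rows))  # → [255.0, 191.25, 30.0]
```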
Sharpen the image
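Sharpening the one-pixel column steepens the edges between dark and light stretches. This should help keep neighbouring lines of type separated by white rows. The sketch uses a simple 3-tap `[-1, 3, -1]` kernel as a stand-in for ImageMagick's Gaussian-based `-sharpen`, and it repeats the edge value at the top and bottom.

```python
def sharpen_column(column):
    """Sharpen a 1-D list of gray levels with a [-1, 3, -1] kernel, clamped to 0..255."""
    n = len(column)
    out = []
    for i, v in enumerate(column):
        above = column[max(i - 1, 0)]       # repeat the edge value at the top
        below = column[min(i + 1, n - 1)]   # ... and at the bottom
        out.append(min(max(3 * v - above - below, 0), 255))
    return out


# Two light-gray text rows between white rows get noticeably darker.
print(sharpen_column([255, 200, 200, 255]))  # → [255, 145, 145, 255]
```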
Heavy-handed grayscale conversion: make most grays black, whites are white
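Putting the threshold close to white reproduces the rule "make most grays black, whites are white": only rows that are already nearly white stay white. The cutoff of 240 is a hypothetical value, not one taken from the talk.

```python
def to_black_and_white(column, white_cutoff=240):
    """Map each gray level to 0 (black) unless it is already near white.

    `white_cutoff` is a hypothetical threshold, not a value from the talk.
    """
    return [255 if v >= white_cutoff else 0 for v in column]


print(to_black_and_white([255, 250, 191.25, 145, 30]))  # → [255, 255, 0, 0, 0]
```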
Convert to ImageMagick's txt format to produce the text color list
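ImageMagick's `txt:` output format (for example `convert column.png txt:-`, where `column.png` is a placeholder) prints one line per pixel with its coordinates and color. That output is the text color list from step 6. The parser below is a sketch: it reads the row number and hex color from each line and treats an all-zero hex value as black. The sample dump is illustrative, because the exact columns vary between ImageMagick versions. The all-zero test also assumes the hex value carries no alpha channel.

```python
import re

# Matches e.g. "0,12: (0,0,0)  #000000  black" -> row 12, hex "000000".
PIXEL_LINE = re.compile(r"^\s*(\d+),(\d+):.*?#([0-9A-Fa-f]+)")


def parse_txt(dump):
    """Return (row, is_black) pairs from ImageMagick txt: output."""
    result = []
    for line in dump.splitlines():
        if line.startswith("#"):  # header: "# ImageMagick pixel enumeration: ..."
            continue
        m = PIXEL_LINE.match(line)
        if m:
            _x, y, hexcolor = m.groups()
            # All-zero hex means black (assumes no alpha digits are present).
            result.append((int(y), set(hexcolor) == {"0"}))
    return result


sample = """# ImageMagick pixel enumeration: 1,3,255,srgb
0,0: (255,255,255)  #FFFFFF  white
0,1: (0,0,0)  #000000  black
0,2: (0,0,0)  #000000  black"""
print(parse_txt(sample))  # → [(0, False), (1, True), (2, True)]
```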
Look for long, continuous blocks of “black”
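Presumably, lines of body text turn into short black runs separated by white interline gaps, while an illustration produces one long unbroken run, so a minimum run length can separate the two. Below is a sketch of the scan. `min_run` is a hypothetical parameter that should exceed the height of one line of type at the scan's resolution.

```python
def find_black_runs(is_black, min_run=20):
    """Return (first_row, last_row) for runs of True at least `min_run` long.

    `min_run` is hypothetical: it should exceed the height of one text line
    so that ordinary lines of type are skipped and only illustrations remain.
    """
    runs, start = [], None
    for row, black in enumerate(list(is_black) + [False]):  # sentinel ends a trailing run
        if black and start is None:
            start = row
        elif not black and start is not None:
            if row - start >= min_run:
                runs.append((start, row - 1))
            start = None
    return runs


# A 3-row text line, a gap, then a 25-row illustration.
column = [False] * 5 + [True] * 3 + [False] * 4 + [True] * 25 + [False] * 2
print(find_black_runs(column))  # → [(12, 36)]
```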
github.com/ewlarson/picturepages
Don Quixote (168/169): 99% accurate
http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
Paradise Lost (54/54): 100% accurate
http://openlibrary.org/books/OL14022842M/Paradise_Lost
Around the World in Eighty Days (60/62): 97% accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
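The accuracy figures are the parenthesized ratios rounded to whole percentages. Below is a quick check in Python. The pooled figure for all three books is derived here and is not reported in the talk.

```python
results = {
    "Don Quixote": (168, 169),
    "Paradise Lost": (54, 54),
    "Around the World in Eighty Days": (60, 62),
}


def percent(correct, total):
    """Accuracy as a percentage, rounded to one decimal place."""
    return round(100 * correct / total, 1)


for title, (correct, total) in results.items():
    print(f"{title}: {correct}/{total} = {percent(correct, total)}%")

# Derived, not from the talk: pool the counts across all three books.
all_correct = sum(c for c, _ in results.values())
all_total = sum(t for _, t in results.values())
print(f"All three books: {all_correct}/{all_total} = {percent(all_correct, all_total)}%")
```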
Wanna help do this better? Contact me.
[email protected]