Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Finding images in book page images
Search
Eric Larson
February 07, 2012
Programming
4
790
Finding images in book page images
Code4Lib 2012 Lighting Talk Presentation
Eric Larson
February 07, 2012
Tweet
Share
More Decks by Eric Larson
See All by Eric Larson
Put Your Money Where The Mouse is: Tools and Techniques for Making Informed Design Decisions
ewlarson
0
37
Usability Testing Primo
ewlarson
0
94
Ruby Tips
ewlarson
0
110
9 Keys to Great UX
ewlarson
0
90
Other Decks in Programming
See All in Programming
2年のAppleウォレットパス開発の振り返り
muno92
PRO
0
180
コントリビューターによるDenoのすゝめ / Deno Recommendations by a Contributor
petamoriken
0
160
なぜSQLはAIぽく見えるのか/why does SQL look AI like
florets1
0
250
Grafana:建立系統全知視角的捷徑
blueswen
0
290
ThorVG Viewer In VS Code
nors
0
690
AIの誤りが許されない業務システムにおいて“信頼されるAI” を目指す / building-trusted-ai-systems
yuya4
7
4.4k
Findy AI+の開発、運用におけるMCP活用事例
starfish719
0
2.2k
20251212 AI 時代的 Legacy Code 營救術 2025 WebConf
mouson
0
250
KIKI_MBSD Cybersecurity Challenges 2025
ikema
0
150
AtCoder Conference 2025
shindannin
0
950
AI Agent Tool のためのバックエンドアーキテクチャを考える #encraft
izumin5210
6
1.6k
はじめてのカスタムエージェント【GitHub Copilot Agent Mode編】
satoshi256kbyte
0
170
Featured
See All Featured
The AI Search Optimization Roadmap by Aleyda Solis
aleyda
1
5.1k
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
508
140k
Lightning talk: Run Django tests with GitHub Actions
sabderemane
0
99
Ecommerce SEO: The Keys for Success Now & Beyond - #SERPConf2024
aleyda
1
1.8k
The Curse of the Amulet
leimatthew05
0
7.3k
HU Berlin: Industrial-Strength Natural Language Processing with spaCy and Prodigy
inesmontani
PRO
0
130
技術選定の審美眼(2025年版) / Understanding the Spiral of Technologies 2025 edition
twada
PRO
115
100k
Optimizing for Happiness
mojombo
379
70k
Effective software design: The role of men in debugging patriarchy in IT @ Voxxed Days AMS
baasie
0
200
Testing 201, or: Great Expectations
jmmastey
46
7.9k
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
231
22k
Become a Pro
speakerdeck
PRO
31
5.8k
Transcript
Finding images in book page images Eric Larson University of
Wisconsin-Madison Libraries
Warning Hobbyist code here. I’m certain there are better ways
to do this.
None
None
None
None
None
curl
None
imagemagick
Processing steps 1. Desaturate the image 2. Boost contrast 3.
Convert image to 1pixel wide x image height 4. Sharpen the image 5. Super-duper grayscale conversion 6. Produce the text color list 7. Look for continuous “black” blocks
None
convert
convert -colorspace Gray
None
convert \ -contrast -contrast \ -contrast -contrast \ -contrast -contrast
\ -contrast -contrast \
None
Convert image to 1px x height
None
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites
are white
convert to txt
None
Look for long, continuous blocks of “black”
None
None
None
github.com ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
None
None
Paradise Lost # (54/54) 100% Accurate http://openlibrary.org/books/OL14022842M/Paradise_Lost
None
None
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
None
None
Wanna help do this better? Contact me.
[email protected]