Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Finding images in book page images
Search
Eric Larson
February 07, 2012
Programming
4
790
Finding images in book page images
Code4Lib 2012 Lighting Talk Presentation
Eric Larson
February 07, 2012
Tweet
Share
More Decks by Eric Larson
See All by Eric Larson
Put Your Money Where The Mouse is: Tools and Techniques for Making Informed Design Decisions
ewlarson
0
33
Usability Testing Primo
ewlarson
0
90
Ruby Tips
ewlarson
0
110
9 Keys to Great UX
ewlarson
0
88
Other Decks in Programming
See All in Programming
Laravel Boost 超入門
fire_arlo
2
190
Processing Gem ベースの、2D レトロゲームエンジンの開発
tokujiros
2
120
Honoアップデート 2025年夏
yusukebe
1
910
もうちょっといいRubyプロファイラを作りたい (2025)
osyoyu
0
240
複雑なドメインに挑む.pdf
yukisakai1225
5
960
奥深くて厄介な「改行」と仲良くなる20分
oguemon
1
380
旅行プランAIエージェント開発の裏側
ippo012
2
810
TDD 実践ミニトーク
contour_gara
1
280
Claude Codeで実装以外の開発フロー、どこまで自動化できるか?失敗と成功
ndadayo
4
1.9k
フロントエンドのmonorepo化と責務分離のリアーキテクト
kajitack
2
160
TanStack DB ~状態管理の新しい考え方~
bmthd
2
450
ECS初心者の仲間 – TUIツール「e1s」の紹介
keidarcy
0
150
Featured
See All Featured
[Rails World 2023 - Day 1 Closing Keynote] - The Magic of Rails
eileencodes
36
2.5k
The Cost Of JavaScript in 2023
addyosmani
53
8.9k
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
31
2.5k
Git: the NoSQL Database
bkeepers
PRO
431
66k
Speed Design
sergeychernyshev
32
1.1k
Practical Tips for Bootstrapping Information Extraction Pipelines
honnibal
PRO
23
1.4k
Creating an realtime collaboration tool: Agile Flush - .NET Oxford
marcduiker
31
2.2k
How To Stay Up To Date on Web Technology
chriscoyier
790
250k
Reflections from 52 weeks, 52 projects
jeffersonlam
351
21k
Designing for humans not robots
tammielis
253
25k
Scaling GitHub
holman
463
140k
Designing Dashboards & Data Visualisations in Web Apps
destraynor
231
53k
Transcript
Finding images in book page images Eric Larson University of
Wisconsin-Madison Libraries
Warning Hobbyist code here. I’m certain there are better ways
to do this.
None
None
None
None
None
curl
None
imagemagick
Processing steps 1. Desaturate the image 2. Boost contrast 3.
Convert image to 1pixel wide x image height 4. Sharpen the image 5. Super-duper grayscale conversion 6. Produce the text color list 7. Look for continuous “black” blocks
None
convert
convert -colorspace Gray
None
convert \ -contrast -contrast \ -contrast -contrast \ -contrast -contrast
\ -contrast -contrast \
None
Convert image to 1px x height
None
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites
are white
convert to txt
None
Look for long, continuous blocks of “black”
None
None
None
github.com ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
None
None
Paradise Lost # (54/54) 100% Accurate http://openlibrary.org/books/OL14022842M/Paradise_Lost
None
None
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
None
None
Wanna help do this better? Contact me.
[email protected]