Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Finding images in book page images
Search
Eric Larson
February 07, 2012
Programming
4
790
Finding images in book page images
Code4Lib 2012 Lighting Talk Presentation
Eric Larson
February 07, 2012
Tweet
Share
More Decks by Eric Larson
See All by Eric Larson
Put Your Money Where The Mouse is: Tools and Techniques for Making Informed Design Decisions
ewlarson
0
37
Usability Testing Primo
ewlarson
0
94
Ruby Tips
ewlarson
0
110
9 Keys to Great UX
ewlarson
0
90
Other Decks in Programming
See All in Programming
AIで開発はどれくらい加速したのか?AIエージェントによるコード生成を、現場の評価と研究開発の評価の両面からdeep diveしてみる
daisuketakeda
1
2.5k
フロントエンド開発の勘所 -複数事業を経験して見えた判断軸の違い-
heimusu
7
2.8k
MDN Web Docs に日本語翻訳でコントリビュート
ohmori_yusuke
0
650
The Past, Present, and Future of Enterprise Java
ivargrimstad
0
610
SourceGeneratorのススメ
htkym
0
200
AIによる開発の民主化を支える コンテキスト管理のこれまでとこれから
mulyu
3
470
生成AIを活用したソフトウェア開発ライフサイクル変革の現在値
hiroyukimori
PRO
0
100
24時間止められないシステムを守る-医療ITにおけるランサムウェア対策の実際
koukimiura
1
120
Claude Codeと2つの巻き戻し戦略 / Two Rewind Strategies with Claude Code
fruitriin
0
140
CSC307 Lecture 06
javiergs
PRO
0
690
16年目のピクシブ百科事典を支える最新の技術基盤 / The Modern Tech Stack Powering Pixiv Encyclopedia in its 16th Year
ahuglajbclajep
5
1k
疑似コードによるプロンプト記述、どのくらい正確に実行される?
kokuyouwind
0
390
Featured
See All Featured
Rails Girls Zürich Keynote
gr2m
96
14k
We Are The Robots
honzajavorek
0
170
コードの90%をAIが書く世界で何が待っているのか / What awaits us in a world where 90% of the code is written by AI
rkaga
60
42k
Statistics for Hackers
jakevdp
799
230k
Pawsitive SEO: Lessons from My Dog (and Many Mistakes) on Thriving as a Consultant in the Age of AI
davidcarrasco
0
67
DevOps and Value Stream Thinking: Enabling flow, efficiency and business value
helenjbeal
1
110
Accessibility Awareness
sabderemane
0
56
Navigating Team Friction
lara
192
16k
Mobile First: as difficult as doing things right
swwweet
225
10k
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
31
2.7k
Lightning talk: Run Django tests with GitHub Actions
sabderemane
0
120
jQuery: Nuts, Bolts and Bling
dougneiner
65
8.4k
Transcript
Finding images in book page images Eric Larson University of
Wisconsin-Madison Libraries
Warning Hobbyist code here. I’m certain there are better ways
to do this.
None
None
None
None
None
curl
None
imagemagick
Processing steps 1. Desaturate the image 2. Boost contrast 3.
Convert image to 1pixel wide x image height 4. Sharpen the image 5. Super-duper grayscale conversion 6. Produce the text color list 7. Look for continuous “black” blocks
None
convert
convert -colorspace Gray
None
convert \ -contrast -contrast \ -contrast -contrast \ -contrast -contrast
\ -contrast -contrast \
None
Convert image to 1px x height
None
Sharpen the image
Heavy-handed grayscale conversion => make most grays black => whites
are white
convert to txt
None
Look for long, continuous blocks of “black”
None
None
None
github.com ewlarson/picturepages
Don Quixote # (168/169) 99% Accurate http://openlibrary.org/books/OL24150024M/The_history_of_Don_Quixote
None
None
Paradise Lost # (54/54) 100% Accurate http://openlibrary.org/books/OL14022842M/Paradise_Lost
None
None
Around the World in Eighty Days # (60/62) 97% Accurate
http://openlibrary.org/books/OL7050533M/Around_the_world_in_eighty_days
None
None
Wanna help do this better? Contact me.
[email protected]