
Scanning is the new spinning - Digitization & Open Knowledge initiatives around you

The #ServantsOfKnowledge initiative was created to digitise rare, out-of-copyright books, magazines, periodicals, science literature and more. It has helped us bring a wide variety of Indic literature to the public, especially in Kannada. Our decade-long effort to make research on art, history, culture & literature possible for the common reader, through our initiatives at Sanchaya (https://sanchaya.org), has been lifted by this new initiative. With limited resources, we have digitised more than 20K books in 14 languages over the last 2+ years, and the number is growing. A growing corpus of language content can lead to better linguistic projects, and that is where we need more collaboration.

Talk highlights:

Current status of the digitization
Technical Challenges & Open Tech
Support from Community
Future plans
Source code/Reference: https://archive.org/details/ServantsOfKnowledge

Details: https://indiafoss.net/2022/cfp/submissions/scanning-is-the-new-spinning-digitization-open-knowledge-initiatives-around

Video Recording: https://www.youtube.com/watch?v=EGj5i1jfsGA&list=PLOGilj110olwC-o9Uzj8F0p3OlmLFw6K2&index=14

omshivaprakash

July 24, 2022

Transcript

  1. SCANNING IS THE NEW SPINNING - DIGITIZATION & OPEN KNOWLEDGE INITIATIVES AROUND YOU

    ಓಂಶಿವಪ್ರಕಾಶ್ - @Omshivaprakash
    Scanning is the new Spinning - Digitization & Open Knowledge initiatives Around you by Omshivaprakash is licensed under CC BY 4.0
  2. “Scanning is the new spinning…” - Carl Malamud - public.resource.org

    ServantsOfKnowledge Initiative for digitizing Kannada & other Indic languages by Omshivaprakash is licensed under CC BY 4.0
  3. ServantsOfKnowledge Initiative for digitizing Kannada & other Indic languages by Omshivaprakash is licensed under CC BY 4.0
  4. OUR DIGITIZATION WORKS
  5. RESULT - IT’S NOW SEARCHABLE & ACCESSIBLE

    * We are revisiting our literature
    * Learning about our authors
    * Learning about publishers from various regions
    * Learning about the lexicon, print history
    * And never-ending new possibilities have opened up
  6. A QUICK HISTORY OF SOK

    Social connect -> Building a Community project -> Sourcing books via partnerships -> Metadata -> Scanning -> Collaboration
    Carl Malamud: American technologist, author, and public domain advocate, known for his foundation Public.Resource.Org (https://en.wikipedia.org/wiki/Carl_Malamud)
    {{cc-by-2.0}} {{flickrreview|Nishkid64|18:22, 27 October 2007 (UTC)}}
  7. POWER HORSE BEHIND THE PROJECT

    ➤ TTScribe by Internet Archive & INTERNET ARCHIVES BACKEND TEAM
    OUR SCAN ASSISTANTS
  8. OUR PARTNERSHIPS GREW WITH THE SUPPORT OF ORGS AND GENERAL PUBLIC

    ➤ We have been scanning books at IASc Bengaluru since March 2019
    ➤ 2+ years
    ➤ 20K books & growing
    ➤ More than a dozen associations, including authors, authors' families and publishers
    ➤ 20+ magazines archived (Kasturi 50 years corpus)
    ➤ More than 60% local language books
    ➤ Scanned > 2.5 million pages
    ➤ Kannada, English, Tamil, Malayalam, Konkani, Sindhi, Urdu and more…
    SERVANTS OF KNOWLEDGE INITIATIVE https://archive.org/details/ServantsOfKnowledge
  9. As we speak, we scan 1 million pages a month @ NLSIU, Bengaluru

    To help the visually challenged & researchers
  10. DIGITIZED CONTENT ACROSS FOLLOWING CATEGORIES

    ➤ Science
    ➤ History
    ➤ Arts
    ➤ Literature
    ➤ Classical Text
    ➤ Newspapers
    ➤ Magazines
    ➤ Dictionaries
    digital.sanchaya.net
  11. Scanning is the new Spinning - Digitization & Open Knowledge

    initiatives Around you Omshivaprakash is licensed under CC BY 4.0
  12. WHY I PURSUED THIS PROJECT FOR OPEN RESOURCES

    ➤ No verifiable resources on Internet for my language
    ➤ Lack of Open content
    ➤ Lack of understanding about licensing & copyrights
    ➤ Need for standardisation
    ➤ Need for collaboration
    This limited my Wiki contributions at one stage… And I started chasing references through digitization projects
  13. HOW WE ARE SOURCING BOOKS

    ➤ Individual contributions
    ➤ Collaborate with Authors/Publishers to release content under Creative Commons
    ➤ Source public domain content via libraries/universities
    ➤ Requesting weeded out books from libraries
  14. METADATA MANAGEMENT

    ➤ Ensure book info is added in both source language & English (Author, publisher, Title)
    ➤ Add the Copyright/Licensing / any contributions from the community members
    OCR & SEARCHABLE INDIC CONTENT
    Thanks to Sushant from IndiaKanoon https://github.com/sushant354/egazette
  15. CHALLENGES & NEXT STEPS

    ➤ Make use of the reference material largely for Indic wikis - Kannada to start with
    ➤ Contribute the metadata to Wikidata
    ➤ Finding PD books
    ➤ Finding Copyright holders
    ➤ Convincing them to release under CC BY-SA
    ➤ Build revival fonts
    ➤ Testing & training the Tesseract OCR engine to empower data harnessing for local languages
    ➤ Expansion in terms of local tech, collaborations
    Find the local generous partners to fund the open knowledge initiatives
  16. Servants Of Knowledge is currently operating 3 scanning centres

    * Bengaluru - National Law School of India University Library & Jayanagar
    * Mangaluru - World Konkani Centre
    * Chennai - Roja Muthiah Research Library
    We were here
  17. MY OTHER AFFILIATIONS: SANCHAYA.ORG & SANCHIFOUNDATION.ORG

    ➤ We built projects to create content that preserves art, history & culture through audio & visual content. The intent was to make a rich, reusable archive for educational purposes.
    ➤ Released them under the Creative Commons CC BY-SA license on YouTube, Portal & Internet Archive (350+ hrs of visual content)
    ➤ Created partnerships with repertories such as Ninasam, Kamat Foundation etc.
    https://techfiz.com
  18. About myself

    OMSHIVAPRAKASH - WIKIPEDIAN, MOZILLAN, CC CONTRIBUTOR, L10N, STARTUPS, FLOSS, LANGUAGE TECHNOLOGY - KANNADA, DIGITAL ARCHIVIST, OPEN DATA ETC.
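
The metadata practice described on slide 14 (book details recorded in both the source language and English, plus licensing and community contributions) can be sketched as a small completeness check. This is a minimal Python sketch under stated assumptions, not the project's actual tooling: the `BookRecord` class, its field names, the language codes and the sample values are all hypothetical, and it does not reflect the Internet Archive metadata schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Fields that slide 14 says should exist in both the source language and English.
REQUIRED_BILINGUAL = ("title", "author", "publisher")


@dataclass
class BookRecord:
    """Hypothetical record; each bilingual field maps a language code to a value."""
    title: Dict[str, str]
    author: Dict[str, str]
    publisher: Dict[str, str]
    source_language: str                 # assumed code, e.g. "kn" for Kannada
    license: str = ""                    # e.g. "Public Domain" or "CC BY-SA 4.0"
    contributors: List[str] = field(default_factory=list)


def missing_fields(record: BookRecord) -> List[str]:
    """List the entries still needed before the record counts as complete."""
    problems = []
    # dict.fromkeys removes the duplicate when the source language is English.
    languages = list(dict.fromkeys((record.source_language, "en")))
    for name in REQUIRED_BILINGUAL:
        values = getattr(record, name)
        for lang in languages:
            if not values.get(lang, "").strip():
                problems.append(f"{name}[{lang}]")
    if not record.license.strip():
        problems.append("license")
    return problems


# Made-up sample: the author is missing in Kannada and no license is recorded.
sample = BookRecord(
    title={"kn": "ಉದಾಹರಣೆ ಪುಸ್ತಕ", "en": "Example Book"},
    author={"en": "Example Author"},
    publisher={"kn": "ಉದಾಹರಣೆ ಪ್ರಕಾಶನ", "en": "Example Publisher"},
    source_language="kn",
)
print(missing_fields(sample))  # ['author[kn]', 'license']
```

A check of this kind could run before upload, so that volunteers see which entries are still missing in either language. Where to enforce it, and which fields are mandatory, are choices for the project and are not stated in the talk.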