
Conserving Linguistic Heritage the FOSS way...

A presentation on our work at http://vachana.sanchaya.net to digitize Vachana literature and build a linguistic research tool for Kannada. Presented at Swatantra 2014, the Fifth International Free Software Conference, Kerala, on 19 December 2014 (remote/recorded).

Event Page: http://icfoss.org/fs2014/program_details.html#Wikipedia/Wikimedia

YouTube link to the video: http://youtu.be/DS8o6Jn5ToI

omshivaprakash

December 19, 2014

Transcript

  1. Hello! I am Omshivaprakash. I'm a Bengaluru-based Wikimedian and a FOSS contributor. I'm here to share my experience helping reuse and conserve the linguistic heritage of Kannada the FOSS way!
  2. "We need to be able to do research on Vachana Sahitya. We should be able to search Vachanas on the net. We need data to understand Sahitya much better."
    - Sri OL Nagabhushana Swamy - Sri Vasudendra
  3. Challenges
    ▣ ANSI data available on the GoK website
    ▣ GoK website not intuitive
    ▣ 15 large volumes of printed books, plus others
    ▣ No real tool to analyze the data at one's fingertips
    ▣ Heated discussions on public forums needed concordance and numerical data to debate literature
    Researchers wanted authentic data to reach consensus through research… but how?
  4. Digitize in Unicode
    The idea was to get our hands on the digitized data in a reusable format and in Unicode.
  5. Scrape
    We found that the data was available in digital format on the GoK website http://vachanasahitya.gov.in, but in ANSI format. We pulled the data with wget, wrote a Python script to systematically extract it, and converted the text to Unicode. ALL IN FLAT FILES.
    Getting to work on data
    But... it was not really enough. How does anyone take all the text in files and do research? We proposed to push this to a database and provide simple GUI tools to search the text and look at the results.
  6. More challenges...
    Technical difficulties: Providing the end results to a large number of people. Helping them understand how to use tools such as MySQL Workbench, SQLite Manager, etc.
    Awareness: Text input methods, SQL syntax, OS compatibility.
    Expanding scope: What about other research requirements? How many queries can we write and keep sharing with linguists who are not computer-savvy?
  7. An opportunity to build something for a language that is close to our hearts, with a few like-minded people, over a cup of coffee, during weekends, whenever we had some time to scribble through the needs of our people… IT WAS FUN...
  8. To unearth the wealth of literature
    ▣ by reading and searching through 21 thousand Vachanas
    ▣ written by 250 Vachanakaaras
    ▣ Research at your fingertips via concordance and quick visualizations
    ▣ Building a corpus of 2 lakh+ (200,000+) unique words
    ▣ Building biodata of all male and female Vachanakaaras
    ▣ Enabling a crowd-sourced review solution
    ▣ Opening up new possibilities for linguistic research across other literary works of Kannada
  9. FOSS
    All because of the FOSS tools around us and its philosophy that we believed in...
  10. Rails, Nginx, Passenger, Memcached, MySQL, Python, GitLab, WordPress & more...
    Only the server cost to keep it running.
    Localized and being adapted for other projects too...
    It is being reviewed for contribution to Wikisource and Wikipedia.
  11. Moving forward
    Bring more literary works online.
    Standardize a research platform for the language.
    Create a timeline for centuries of heritage.
  12. How are we planning to do this?
    Collaboration: Enable community collaboration to build research documents around our literary heritage.
    Engage: Engage students and others to work together on our code to build robust and futuristic tools for all types of literary works (text, poems, Old Kannada, etc.).
    Evolve: Evolve over time, adopting lessons from mistakes, reviews, and feedback.
    Consult with communities: We would like to consult and learn from multiple language communities, because Vachana Sahitya has been translated into more than 15 languages.
    Keep tweaking: We keep tweaking the tool to make it robust enough to serve as a platform for our upcoming projects.
    Reaching goals: We are determined to reach our goal of building a unified search tool with a timeline for centuries of Kannada literature the FOSS way...
  13. We are on social media: FB/Twitter/Google+
    Embed us on WordPress via a plugin.
    We will be on mobile soon…
    We are opening up APIs to reuse the data or build tools around Kannada literature.
    Adding English and other translated works too...
    There is a lot more to share, so keep in touch!!!
  14. Thanks! Any questions?
    You can find me at:
    Kn/En Wiki: User:Omshivaprakash
    Project Page: http://vachana.sanchaya.net
    Main Project: http://kannada.sanchaya.net
    @omshivaprakash | @vachanasanchaya
  15. Credits
    Special thanks to all the people who made and released these awesome resources for free:
    ▣ Team photo by Amit Mrugvadhe
    ▣ To my team for having made this possible
    ▣ Minicons by Webalys
    ▣ Presentation template by SlidesCarnival
    ▣ Photographs by Unsplash
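
As a rough illustration of slides 5 and 8 (pushing the converted text into a database, searching it, and deriving a concordance and unique-word counts), here is a minimal Python sketch. It is not the project's actual code: the deck names Rails and MySQL for the site, while this sketch uses Python's built-in `sqlite3` only to stay self-contained. The table name `vachanas`, the function names, and the author labels are assumptions made for illustration, and the sample lines are placeholder Kannada phrases, not real Vachana text.

```python
import sqlite3
from collections import Counter

# Placeholder (author, text) pairs standing in for the converted flat files.
# These are NOT real vachanas; they are generic Kannada phrases.
SAMPLE = [
    ("author_a", "ಕನ್ನಡ ಸಾಹಿತ್ಯ ನಮ್ಮ ಪರಂಪರೆ"),
    ("author_b", "ಸಾಹಿತ್ಯ ಸಂಶೋಧನೆ ಸುಲಭ ಆಗಲಿ"),
    ("author_a", "ನಮ್ಮ ಭಾಷೆ ಕನ್ನಡ"),
]
KEYWORD = "ಸಾಹಿತ್ಯ"  # hypothetical search term


def build_db(records):
    """Load (author, body) pairs into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE vachanas (id INTEGER PRIMARY KEY, author TEXT, body TEXT)"
    )
    conn.executemany("INSERT INTO vachanas (author, body) VALUES (?, ?)", records)
    conn.commit()
    return conn


def search(conn, term):
    """Return rows whose body contains the term (parameterized LIKE query)."""
    cur = conn.execute(
        "SELECT id, author, body FROM vachanas WHERE body LIKE ? ORDER BY id",
        (f"%{term}%",),
    )
    return cur.fetchall()


def concordance(conn, keyword, width=1):
    """Keyword-in-context: (id, left context, keyword, right context) rows."""
    rows = []
    for vid, body in conn.execute("SELECT id, body FROM vachanas ORDER BY id"):
        words = body.split()  # naive whitespace tokenization
        for i, word in enumerate(words):
            if word == keyword:
                left = " ".join(words[max(0, i - width):i])
                right = " ".join(words[i + 1:i + 1 + width])
                rows.append((vid, left, word, right))
    return rows


def word_frequencies(conn):
    """Count every whitespace-separated word across all stored texts."""
    counts = Counter()
    for (body,) in conn.execute("SELECT body FROM vachanas"):
        counts.update(body.split())
    return counts


if __name__ == "__main__":
    conn = build_db(SAMPLE)
    for row in search(conn, KEYWORD):
        print(row)
    for row in concordance(conn, KEYWORD):
        print(row)
    print(len(word_frequencies(conn)), "unique words")
```

Whitespace splitting is the simplest possible tokenizer; a real corpus would likely also need punctuation stripping and Unicode normalization before counting unique words.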