Loves walking and likes to participate in marathons. Bird watching is my hobby (PS: Sarvam's models are named after birds because of me). I gave a talk right here, at the KeyValue office, in 2023.
with a passion for open source software. Machine Learning Engineer: focusing on deep learning and the fast.ai framework. Community & Travel: engaged in Malayalam computing and has explored 10 Indian cities. Speaker & Volunteer: presented at PyCon India and contributes to AI4Bharat initiatives.
At my previous company, I was benchmarking various ASR providers. Malayalam ASR Benchmarking: I already knew how to build a Python package, having learned nbdev (made by Jeremy Howard, Hamel Husain, Wasim Lorgat and others) during the fast.ai course in 2022. Frustration led me to publish whisper_normalizer as a Python package while working on the Malayalam ASR benchmarking project and giving talks on OpenAI Whisper and its amazing fine-tuning capabilities.
Reference = "… Kochi FOSS today. I am presenting a talk today, at Key Value Systems along with Andrew from Hoppscotch and Renjith from Wikidata."
ASR output = "I am at Kochi Foods today. I am presenting a talk today at Key Value Systems along with Andrew from Hope's Coach and Ranjit from Wikidata."
WER without normalization = 0.2
CER without normalization = 0.08759
WER with normalization = 0.2
CER with normalization = 0.067164
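WER and CER are both edit distances divided by the length of the reference: WER counts word substitutions, deletions and insertions over the number of reference words, while CER does the same over characters. The minimal pure-Python sketch below shows that computation. `edit_distance`, `wer` and `cer` are illustrative names, and this is not the script that produced the numbers above (libraries such as jiwer are the usual choice in practice). It also shows why normalization matters: a difference in case alone inflates CER.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences.

    Substitutions, insertions and deletions each cost 1. Works on lists of
    words (for WER) or on strings, i.e. sequences of characters (for CER).
    """
    # prev[j] = distance between the first i-1 items of ref and the first j items of hyp
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h from the hypothesis
                prev[j - 1] + (r != h),  # substitute (free when the items match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance / reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    ref, hyp = "Kochi FOSS today", "Kochi Foods today"
    print(wer(ref, hyp))                  # 1 substituted word out of 3
    print(cer(ref, hyp))                  # 0.25: case differences count as errors
    print(cer(ref.lower(), hyp.lower()))  # 0.125 after lowercasing both sides
```

Because the sketch splits on whitespace, punctuation stays attached to words ("today," versus "today"). That is exactly the kind of difference a normalizer is meant to remove before scoring.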
OpenAI open-sourced Whisper on September 21, 2022, releasing the inference code and pre-trained model weights. The Whisper normalizer is a text normalization tool and algorithm used in OpenAI's Whisper automatic speech recognition (ASR) system. Its main purpose is to standardize transcribed text so that formatting differences, such as punctuation, capitalization, or whitespace, do not unfairly penalize evaluation metrics like Word Error Rate (WER) and Character Error Rate (CER). Normalization makes transcriptions easier to compare by ensuring that only genuine transcription errors are counted, not superficial formatting differences. Explaining EnglishTextNormalizer and BasicTextNormalizer.
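As I understand Whisper's source, BasicTextNormalizer lowercases the text, drops spans in square/angle brackets and parentheses, replaces marks, symbols and punctuation with spaces, and collapses whitespace. EnglishTextNormalizer goes further and also standardizes numbers, British/American spellings, contractions and filler words such as "um" and "uh". The sketch below re-implements only the basic steps in plain Python so the behaviour is visible. `basic_normalize` is an illustrative name, not the whisper_normalizer API, and the final `.strip()` is an addition of this sketch.

```python
import re
import unicodedata


def remove_symbols(s: str) -> str:
    """Replace marks (M*), symbols (S*) and punctuation (P*) with spaces, after NFKC."""
    return "".join(
        " " if unicodedata.category(c)[0] in "MSP" else c
        for c in unicodedata.normalize("NFKC", s)
    )


def basic_normalize(s: str) -> str:
    """Illustrative sketch of the BasicTextNormalizer steps (not the package's code)."""
    s = s.lower()
    s = re.sub(r"[<\[][^>\]]*[>\]]", "", s)  # drop spans like [MUSIC] or <noise>
    s = re.sub(r"\(([^)]+?)\)", "", s)       # drop parenthesised spans like (laughs)
    s = remove_symbols(s)                     # punctuation such as , . ! becomes spaces
    s = re.sub(r"\s+", " ", s)               # collapse runs of whitespace
    return s.strip()                          # trimming is an addition in this sketch


if __name__ == "__main__":
    print(basic_normalize("Hello, World! [MUSIC] (laughs)"))  # hello world
    # The comma in the reference no longer counts as a difference:
    print(basic_normalize("a talk today, at Key Value Systems")
          == basic_normalize("a talk today at Key Value Systems"))  # True
```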
agents, and speech in general, exploded from 2023 and 2024 onwards. We are seeing increasingly better Speech-to-Text, Text-to-Speech and Speech-to-Speech models. Thanks to Google SEO, my Python package shows up when you google "whisper normalizer" or ask Perplexity. Perplexity AI: "What is whisper normalizer and how to use it"
Sometimes stuff you build for fun can be humbling. I worked on Malayalam Speech to Text for the last year, and as a byproduct I got this unexpected… YouTube: whisper_normalizer hits 500K+ downloads
Twitter / X: Thank you @jeremyphoward, @HamelHusain and @wasimlorgat for creating nbdev. Many thanks to… Jeremy Howard on Twitter: Nice job! If this is a competition, then you're…
noticed a big bug: whisper_normalizer's BasicTextNormalizer is removing vowel signs.
2. Kavya and I tweet: we informed the community, through a blog post Kavya wrote and through tweets, that the normalizers used in Meta's ASR paper, AssemblyAI, OpenAI, etc. are wrong.
3. Kavya published a paper, "What is lost in Normalization? Exploring Pitfalls in Multilingual ASR Model Evaluations", at EMNLP.
4. Both of us are working to fix the issues. We fixed the problem using normalizers written by Anoop Kunchukuttan and AI4Bharat, and there are now normalizers such as MalayalamNormalizer and HindiNormalizer for 9 Indian languages.
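Item 1 comes down to Unicode general categories. In Malayalam, vowel signs such as ാ (U+0D3E) and the anusvara ം (U+0D02) are combining marks (category Mc), as is the virama (Mn). A cleaning step that replaces every mark, symbol and punctuation character with a space therefore erases them and splits words apart. The sketch below assumes Whisper-style cleaning that works this way and shows that a wrong vowel sign then becomes invisible to WER. The `remove_symbols_keep_marks` variant is a hypothetical mitigation for illustration only; it is not the MalayalamNormalizer or AI4Bharat code.

```python
import unicodedata


def remove_symbols(s: str) -> str:
    # Whisper-style basic cleaning: any character whose Unicode category starts
    # with M (marks), S (symbols) or P (punctuation) becomes a space. Malayalam
    # vowel signs and the anusvara are marks (Mc/Mn), so they are wiped out too.
    return "".join(
        " " if unicodedata.category(c)[0] in "MSP" else c
        for c in unicodedata.normalize("NFKC", s)
    )


def remove_symbols_keep_marks(s: str) -> str:
    # Hypothetical mitigation for illustration: replace only symbols and
    # punctuation, so combining marks (vowel signs, virama) are preserved.
    return "".join(
        " " if unicodedata.category(c)[0] in "SP" else c
        for c in unicodedata.normalize("NFKC", s)
    )


def squeeze(s: str) -> str:
    # Collapse whitespace runs and trim the ends.
    return " ".join(s.split())


word = "\u0d2e\u0d32\u0d2f\u0d3e\u0d33\u0d02"      # മലയാളം (reference)
misspelt = "\u0d2e\u0d32\u0d2f\u0d3f\u0d33\u0d02"  # vowel sign AA swapped for vowel sign I

if __name__ == "__main__":
    for c in word:
        print(f"U+{ord(c):04X} {unicodedata.category(c)} {unicodedata.name(c)}")
    # Buggy cleaning: both strings collapse to the same two fragments,
    # so the vowel-sign error disappears before WER/CER is computed.
    print(squeeze(remove_symbols(word)) == squeeze(remove_symbols(misspelt)))  # True
    # Keeping marks: the error stays visible to the metric.
    print(squeeze(remove_symbols_keep_marks(word))
          == squeeze(remove_symbols_keep_marks(misspelt)))  # False
```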
@Meta @AssemblyAI not caring about this? Not going to tag famous audio folks there. It's good to at least acknowledge these issues. https://t.co/CCckAAHuwG - Kurian Benoy (@kurianbenoy2) May 8, 2024
This is a big bug. I was suspicious of this event and did benchmarking just to see if the numbers held up. Now my benchmarking was also wrong, since it depended on this normalization. https://t.co/ML4V1qUB15 - Kurian Benoy… Kavya Manohar on Twitter / X: Loud Rant: I came to know that the surprisingly low WER in #whisper ASR for Malayalam reported in the @huggingface fine-tuning event last year was just because the evaluation script removed all the vowel signs before computing WER!!! And the…
into this, but it looks like another story of great initial results falling victim to evaluation issues. Kavya writes: During the Whisper fine-tuning event hosted by Hugging Face in December 2022, researchers and practitioners worldwi…
- Fix Bugs Quickly: prioritize issues reported by users, for reliability.
- Add Features: implement enhancements based on community feedback, like MalayalamNormalizer and updates to the English normalizer.
- Update Dependencies: keep libraries current to ensure compatibility and security.
- Monitor Usage: track downloads and feedback to inform future development.
whisper_normali…: In March last year, I created a Python package called whisper_normalizer with nbdev. I realized it was not possible to use the normalization… LinkedIn: Release v0.1.0 · kurianbenoy/whisper_normalizer | Kurian Benoy. Weekend Release Alert! Just shipped whisper_normalizer v0.1.0. What's new: 1. Support for converting Arabic numbers to Indic script in…
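The release note is truncated, so exactly what v0.1.0 converts is not visible here. One plausible reading of "converting Arabic numbers to Indic script" is mapping the digits 0 to 9 onto a script's own digit characters, which Unicode lays out contiguously from each script's DIGIT ZERO. This matters for scoring because "2024" in a hypothesis and "൨൦൨൪" in a reference would otherwise count as a word error. The sketch assumes that reading. `to_indic_digits` and the `DIGIT_ZERO` table are hypothetical, not whisper_normalizer's API.

```python
# Unicode encodes each script's digits contiguously, starting at its DIGIT ZERO.
DIGIT_ZERO = {
    "devanagari": 0x0966,  # used for Hindi, Marathi, ...
    "bengali": 0x09E6,
    "tamil": 0x0BE6,
    "malayalam": 0x0D66,
}


def to_indic_digits(text: str, script: str) -> str:
    """Hypothetical helper: map ASCII digits 0-9 to the given script's digits."""
    zero = DIGIT_ZERO[script]
    table = str.maketrans({str(d): chr(zero + d) for d in range(10)})
    return text.translate(table)


if __name__ == "__main__":
    # Only the digits change; letters and punctuation pass through untouched.
    print(to_indic_digits("2024", "malayalam") == "\u0d68\u0d66\u0d68\u0d6a")  # True
    print(ascii(to_indic_digits("WER 0.2", "devanagari")))  # 'WER \u0966.\u0968'
```

Applying the same conversion to both reference and hypothesis before computing WER keeps digit-script choices from being scored as transcription errors.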