

apidays
December 31, 2024

apidays Paris 2024 - API Roulette - Betting on the Right Provider in the GenAI Gold Rush - Jean-Louis Quéguiner, Gladia

API Roulette: Betting on the Right Provider in the GenAI Gold Rush
Jean-Louis Quéguiner, CEO at Gladia

apidays Paris 2024 - The Future API Stack for Mass Innovation
December 3 - 5, 2024

------

Check out our conferences at https://www.apidays.global/

Do you want to sponsor or talk at one of our conferences?
https://apidays.typeform.com/to/ILJeAaV8

Learn more on APIscene, the global media made by the community for the community:
https://www.apiscene.io

Explore the API ecosystem with the API Landscape:
https://apilandscape.apiscene.io/


Transcript

  1. Who I am
     • CEO @Gladia (Speech-to-text)
     • VP Data, AI and Quantum @OVHcloud
     • CTO @Auchan Retail International
     • Business Angel (40+ companies)
     • Projects:
       ◦ AI / ML:
         ▪ Predictive maintenance for 500K servers
         ▪ Anti-Fraud & Security
         ▪ Real-Time Bidding
         ▪ Audio // NLP // Video // Translation // LLM
       ◦ Other:
         ▪ Big Data (a lot)
         ▪ Quantum Computing (a lot)
     • What I love to share: Science // Hardware
  2. Powering 100K+ developers around the globe, developing a robust AI
     platform designed to enhance sales performance and call-center
     operations (CX).
     • Accuracy made for the modern Enterprise: ~1% word-error rate
     • Performance tailored for high-velocity teams: ~1 min for a 1-hour
       audio file
     • Pricing in line with business plans: avg. 30% less than the main
       providers (Google STT, AWS Transcribe)
  3. Latency & Accuracy
     ✓ Gladia’s real-time and multilingual transcription boasts an
       industry-leading latency of under 300 milliseconds.
     ✓ We provide the same high level of accuracy in real-time as in batch
       processing, regardless of language.
  4. Use case scoping
     1. Is the project internal or external? (e.g. a phone medical booking
        service)
     2. Define your real goal: I want {WHO} to do {WHAT} in {ENVIRONMENT}
        using {DEVICE} from {WHEN_TO_WHEN} with an acceptable error of
        {ERROR_RATE} based on {CRITERIA}. If we have an error then we will
        do {WHAT}.
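The definition sentence above can be captured as a small data structure so each use case is scoped explicitly before any provider is evaluated. This is a minimal sketch; the class name, fields, and example values are illustrative, not part of the talk.

```python
from dataclasses import dataclass

@dataclass
class UseCaseDefinition:
    """One scoped use case, mirroring the 'definition sentence' template."""
    who: str            # target user population
    what: str           # the task to accomplish
    environment: str    # real-world context (noise, network, ...)
    device: str         # capture / playback hardware
    when_to_when: str   # expected usage window
    error_rate: float   # acceptable error, e.g. 0.05 for 5%
    criteria: str       # metric used to measure the error
    on_error: str       # fallback action when the error budget is exceeded

    def sentence(self) -> str:
        """Render the scoping sentence from the slide."""
        return (
            f"I want {self.who} to do {self.what} in {self.environment} "
            f"using {self.device} from {self.when_to_when} with an "
            f"acceptable error of {self.error_rate:.0%} based on "
            f"{self.criteria}. If we have an error then we will "
            f"{self.on_error}."
        )

# Hypothetical example values for the phone medical booking case:
booking = UseCaseDefinition(
    who="elderly patients",
    what="book a medical appointment by phone",
    environment="home landline with background noise",
    device="analog phone (8 kHz audio)",
    when_to_when="8am to 8pm",
    error_rate=0.05,
    criteria="word error rate on the booked-slot fields",
    on_error="route the call to a human operator",
)
print(booking.sentence())
```

Filling every field forces the WHO/WHAT/ENVIRONMENT/DEVICE questions of the following slides to be answered up front.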
  5. Qualification - WHO - summary
     The WHO question will raise the limits of the target platform and
     surface potential biases in the inference:
     • Color of skin
     • Location / origins
     • Adult / child
     • Male / female
     • With accent / no accent
     • Specialist / non-specialist
  6. Qualification - WHAT - summary
     The WHAT defines the scope of the usage, meaning your quality metric,
     and hence your build-vs-buy strategy and your provider evaluation
     criteria. Examples:
     • “to book a call”: we will need to ?
     • “to diagnose cancer with an image”: we will need to ?
  7. Qualification - ENVIRONMENT
     An AI can act perfectly in your lab / prod environment but not in the
     user context. Always check these 4 elements:
     • Quality
     • Format
     • Encoding and compression
     • Capture biases
  8. Qualification - DEVICE
     The DEVICE:
     • determines the mode of failure and capture biases,
     • defines whether your users are able to handle streaming,
     • prepares you to handle connection issues (maybe store locally to
       potentially perform batch processing), and
     • sets expected latencies.
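The "store locally, then batch" fallback for connection issues can be sketched as a small buffer in front of the provider call. The class and the `send_chunk` interface here are assumptions for illustration, not any provider's actual API.

```python
import collections

class BufferedTranscriber:
    """Stream audio chunks to a provider; when the connection fails,
    buffer locally and flush later as a batch (illustrative sketch)."""

    def __init__(self, send_chunk):
        self.send_chunk = send_chunk        # callable: chunk -> None, may raise
        self.backlog = collections.deque()  # chunks waiting for the network

    def push(self, chunk: bytes) -> None:
        """Queue a chunk and opportunistically try to drain the backlog."""
        self.backlog.append(chunk)
        self.flush()

    def flush(self) -> int:
        """Drain as much of the backlog as the network allows.
        Returns the number of chunks successfully sent."""
        sent = 0
        while self.backlog:
            try:
                self.send_chunk(self.backlog[0])
            except ConnectionError:
                break  # network still down: keep buffering locally
            self.backlog.popleft()
            sent += 1
        return sent
```

A chunk is only removed from the backlog after the send succeeds, so nothing is lost while the device is offline; the remaining backlog is effectively the batch to replay once connectivity returns.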
  9. Qualification - WHEN_TO_WHEN
     The WHEN_TO_WHEN determines the hourly pattern of usage. Depending on
     the provider it can affect:
     • latency:
       ◦ peak hours: 8am => 5pm in a given cloud region
       ◦ likely to land you in a long queue
       ◦ high dispersion around latency
     • quality: some providers silently run lower-quality models during
       peak hours
  10. Qualification - WHEN_TO_WHEN
      The WHEN_TO_WHEN determines the hourly pattern of usage. Depending on
      the provider it can also affect:
      • cost and availability:
        ◦ Providing an hourly usage pattern forecast can save you up to 30%
          in cost/negotiation.
        ◦ Having committed forecasted demand over the year can save you up
          to 50% on the cloud bill.
        ◦ If building yourself => an auto-scaling strategy (saves up to
          50%; not compatible with committed forecasted demand).
        ◦ Make sure to select a provider/region that has capacity in your
          time zone and region (if latency is a subject).
        ◦ You can play with region hosting if you don’t care about latency
          (night in a US region is day in Europe -> more capacity // better
          negotiation possible).
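The savings figures above can be made concrete with a back-of-the-envelope comparison. The list price and monthly volume below are hypothetical placeholders; only the 30% and 50% discount ceilings come from the slide.

```python
# Hypothetical numbers: real rates come from your provider negotiation.
ON_DEMAND_RATE = 0.024      # $ per audio-minute, assumed list price
FORECAST_DISCOUNT = 0.30    # up to 30% off for sharing an hourly forecast
COMMITTED_DISCOUNT = 0.50   # up to 50% off for committed yearly demand

monthly_minutes = 200_000   # assumed monthly transcription volume

on_demand = monthly_minutes * ON_DEMAND_RATE
with_forecast = on_demand * (1 - FORECAST_DISCOUNT)
with_commit = on_demand * (1 - COMMITTED_DISCOUNT)

print(f"on-demand:          ${on_demand:,.0f}/mo")
print(f"with usage forecast: ${with_forecast:,.0f}/mo")
print(f"with yearly commit:  ${with_commit:,.0f}/mo")
```

At these assumed numbers the spread between list price and a committed contract is a factor of two, which is why the hourly usage pattern is worth modelling before signing anything.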
  11. Qualification - ERROR_RATE / CRITERIA
      The ERROR_RATE and CRITERIA determine your evaluation criteria for
      the provider's product.
      /!\ You are not evaluating a provider, you are evaluating a result in
      your product. This is a common mistake companies make: don’t trust
      benchmarks, always evaluate the end goal, meaning evaluate on the
      data described in the sentence we defined before, with:
      • the appropriate WHO
      • the appropriate WHAT
      • the appropriate WHEN_TO_WHEN
      • the appropriate ENVIRONMENT
      • the appropriate DEVICE
      Repeat the process multiple times in an automated way: reliability is
      often a big problem for AI API providers, so put the provider under
      pressure. Always add a failover provider.
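The "repeat in an automated way, under pressure, with a failover" advice can be sketched as a small harness: run your own labelled samples through a provider several times, count reliability failures as errors, and wrap the primary provider with a backup. The `provider(audio) -> transcript` callable interface is an assumption for illustration.

```python
import statistics
import time

def evaluate(provider, samples, runs=5):
    """Repeatedly run your OWN labelled samples through a provider and
    collect error rate and tail latency. Exceptions (timeouts, outages)
    are counted as errors, so flaky providers score badly."""
    errors, latencies = [], []
    for _ in range(runs):
        for audio, reference in samples:
            t0 = time.perf_counter()
            try:
                hypothesis = provider(audio)
            except Exception:
                errors.append(1.0)  # reliability failure counts as an error
                continue
            latencies.append(time.perf_counter() - t0)
            errors.append(0.0 if hypothesis == reference else 1.0)
    p95 = None
    if latencies:
        p95 = sorted(latencies)[int(0.95 * (len(latencies) - 1))]
    return {"error_rate": statistics.mean(errors), "p95_latency": p95}

def with_failover(primary, backup):
    """Wrap a primary provider so any failure falls through to a backup."""
    def call(audio):
        try:
            return primary(audio)
        except Exception:
            return backup(audio)
    return call
```

A strict string comparison stands in for the real metric here; in practice you would plug in the CRITERIA from your definition sentence (e.g. word error rate on the fields that matter).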
  12. Conclusion
      • Measure everything (meaning you have a baseline ;-) ): don’t trust
        benchmarks.
      • Consider yourself as an independent provider (BUILD) just like the
        others (BUY): only the end users are right.
      • Consider the full pipeline when choosing the provider: you CAN SAVE
        A LOT at each step and GAIN QUALITY.
      • Use FMECA for impartial decision making.
      • Always take a step back on the “Definition Sentence” (WHO, WHAT,
        WHEN, etc.).
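FMECA ranks failure modes impartially by scoring each one for Severity, Occurrence, and Detection (each on a 1-10 scale) and sorting by the Risk Priority Number, RPN = S × O × D. The failure modes below echo the talk's examples; the scores themselves are illustrative, not measurements.

```python
# Minimal FMECA-style ranking. Each failure mode gets Severity (S),
# Occurrence (O) and Detection (D) scores from 1 to 10; the Risk
# Priority Number RPN = S * O * D drives an impartial ordering.
failure_modes = [
    # (failure mode,                          S,  O, D)
    ("provider silently degrades model",      7,  5, 8),
    ("peak-hour latency spikes",              6,  7, 4),
    ("accent bias on target population",      9,  4, 6),
    ("provider outage, no failover",         10,  3, 2),
]

ranked = sorted(failure_modes, key=lambda m: m[1] * m[2] * m[3],
                reverse=True)
for name, s, o, d in ranked:
    print(f"RPN {s * o * d:3d}  {name}")
```

Note how the hard-to-detect failure (a provider silently swapping models) outranks the dramatic but easily detected outage: that asymmetry is exactly what FMECA surfaces and gut-feel ranking misses.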
  13. Sharing the “I want low latency” story: how one of our customers
      died because of the wrong metric…