Slide 1

Slide 1 text

API Roulette: Betting on the Right Provider in the GenAI Gold Rush

Slide 2

Slide 2 text

Who I am
● CEO @Gladia (Speech-to-text)
● VP Data, AI and Quantum @OVHcloud
● CTO @Auchan Retail International
● Business Angel (40+ companies)
● Projects:
  ○ AI / ML:
    ■ Predictive maintenance for 500K servers
    ■ Anti-Fraud & Security
    ■ Real-Time Bidding
    ■ Audio // NLP // Video // Translation // LLM
  ○ Other:
    ■ Big Data (a lot)
    ■ Quantum Computing (a lot)
● What I love to share: Science // Hardware

Slide 3

Slide 3 text

Powering 100K+ developers around the globe with a robust AI platform designed to enhance sales performance and call-center operations (CX)
● Accuracy made for the modern Enterprise: ~1% word-error rate
● Performance tailored for high-velocity teams: ~1 min to process a 1-hour audio file
● Pricing in line with business plans: avg. 30% less than the main providers (Google STT, AWS Transcribe)
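The word-error rate (WER) claimed above is a metric you can reproduce yourself rather than take from a vendor benchmark. A minimal sketch: WER is the word-level Levenshtein edit distance between a reference transcript and the provider's hypothesis, divided by the reference length.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed as a word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("book a call tomorrow", "book the call tomorrow"))  # 0.25
```

Production evaluations typically add text normalization (casing, punctuation, numerals) before scoring; this sketch omits that step.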

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

Latency & Accuracy
✓ Gladia’s real-time and multilingual transcription boasts an industry-leading latency of under 300 milliseconds.
✓ We provide the same high level of accuracy in real-time as in batch processing, regardless of language.

Slide 6

Slide 6 text

Use case scoping
1. Is the project internal or external? Example: a phone-based medical booking service.
2. Define your real goal: I want {WHO} to do {WHAT} in {ENVIRONMENT} using {DEVICE} from {WHEN_TO_WHEN} with an acceptable error of {ERROR_RATE} based on {CRITERIA}. If we have an error then we will do {WHAT}.
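The scoping sentence above can be made concrete as a structured record, so that every blank must be filled in before evaluation starts. A minimal sketch (field names and the example values are illustrative, not part of any real schema):

```python
from dataclasses import dataclass

@dataclass
class UseCaseScope:
    """One field per blank in the scoping sentence; names are illustrative."""
    who: str
    what: str
    environment: str
    device: str
    when_to_when: str
    error_rate: str
    criteria: str
    on_error: str

    def sentence(self) -> str:
        return (f"I want {self.who} to do {self.what} in {self.environment} "
                f"using {self.device} from {self.when_to_when} with an "
                f"acceptable error of {self.error_rate} based on "
                f"{self.criteria}. If we have an error then we will "
                f"do {self.on_error}.")

# Example: the phone medical booking use case from the slide
scope = UseCaseScope(
    who="elderly patients",
    what="book a medical appointment by phone",
    environment="noisy home landline",
    device="any phone",
    when_to_when="8am to 8pm",
    error_rate="2%",
    criteria="booking completion rate",
    on_error="route to a human agent",
)
print(scope.sentence())
```

Because every field is required, an unscoped project fails loudly at construction time instead of silently during provider evaluation.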

Slide 7

Slide 7 text

Qualification - WHO - summary
The WHO question sets the limits of the target platform and surfaces potential biases in the inference:
● Skin color
● Location / origins
● Adult / child
● Male / female
● With accent / no accent
● Specialist / non-specialist

Slide 8

Slide 8 text

Qualification - WHAT - summary
The WHAT defines the scope of usage, and therefore your quality metric, your build-vs-buy strategy, and your provider evaluation criteria. Examples:
● “to book a call”: we will need to ?
● “to diagnose cancer with image”: we will need to ?

Slide 9

Slide 9 text

Qualification - ENVIRONMENT
The ENVIRONMENT: an AI can perform perfectly in your lab or production environment yet fail in the user’s context. Always consider these 4 elements:
● Quality
● Format
● Encoding and compression
● Capture biases

Slide 10

Slide 10 text

Qualification - DEVICE
The DEVICE:
● determines the failure modes and capture biases,
● defines whether your users can handle streaming,
● prepares you to handle connection issues (e.g., store locally to potentially perform a batch request later), and
● sets the expected latencies.
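The "store locally, then batch" fallback mentioned above can be sketched as a thin wrapper around the provider call. This is an illustrative pattern, not a real SDK: `send_streaming` stands in for whatever streaming call your provider exposes.

```python
import queue

class ResilientCapture:
    """Stream audio chunks to a provider; on connection failure, buffer
    chunks locally so they can be re-sent later as one batch request.
    `send_streaming` is a hypothetical provider callable, not a real SDK."""

    def __init__(self, send_streaming):
        self.send_streaming = send_streaming
        self.local_buffer = queue.Queue()     # survives network drops

    def push(self, chunk: bytes) -> None:
        try:
            self.send_streaming(chunk)
        except (ConnectionError, TimeoutError):
            self.local_buffer.put(chunk)      # keep for batch fallback

    def drain_for_batch(self) -> bytes:
        """Concatenate buffered chunks for a one-shot batch request."""
        out = []
        while not self.local_buffer.empty():
            out.append(self.local_buffer.get())
        return b"".join(out)
```

A real implementation would persist the buffer to disk (a phone app can be killed at any moment) and deduplicate chunks that were sent but never acknowledged; the sketch keeps only the control flow.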

Slide 11

Slide 11 text

Qualification - WHEN_TO_WHEN
The WHEN_TO_WHEN determines the hourly pattern of usage. Depending on the provider, it can affect:
● latency:
  ○ peak hours (e.g., 8am => 5pm in a given cloud region)
  ○ likely to land you in a long queue
  ○ high dispersion around latency
● quality: some providers silently run lower-quality models during peak hours
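Peak-hour queueing and latency dispersion only show up if you probe the provider across the day and look beyond the mean. A minimal sketch that groups probe measurements by hour and reports median and p95 (the sample data is simulated, not a real provider measurement):

```python
import statistics
from collections import defaultdict

def latency_report(samples):
    """samples: list of (hour, latency_ms) probe results.
    Returns {hour: (median_ms, p95_ms)} so peak-hour queueing
    and dispersion become visible."""
    by_hour = defaultdict(list)
    for hour, ms in samples:
        by_hour[hour].append(ms)
    report = {}
    for hour, vals in sorted(by_hour.items()):
        vals.sort()
        p95 = vals[int(0.95 * (len(vals) - 1))]   # nearest-rank percentile
        report[hour] = (statistics.median(vals), p95)
    return report

# Simulated probes: a quiet night hour vs a congested peak hour
samples = [(3, ms) for ms in (210, 220, 230, 240, 250)] + \
          [(10, ms) for ms in (300, 450, 900, 1200, 2500)]
for hour, (med, p95) in latency_report(samples).items():
    print(f"{hour:02d}h  median={med}ms  p95={p95}ms")
```

Note how the peak hour's median (900 ms) already hides a p95 of 1200 ms and a worst case of 2500 ms: this dispersion, not the average, is what your users feel.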

Slide 12

Slide 12 text

Qualification - WHEN_TO_WHEN
The WHEN_TO_WHEN determines the hourly pattern of usage. Depending on the provider, it can also affect:
● cost and availability:
  ○ Providing an hourly usage-pattern forecast can save you up to 30% in cost negotiations.
  ○ Committing to forecasted demand over the year can save you up to 50% on the cloud bill.
  ○ If building yourself, use an auto-scaling strategy (saves up to 50% - not compatible with committed forecasted demand).
  ○ Make sure to select a provider/region that has capacity in your time zone (if latency is a concern).
  ○ You can play with hosting regions if you don’t care about latency (night in a US region is daytime in Europe -> more capacity // better negotiation possible).

Slide 13

Slide 13 text

Qualification - WHEN_TO_WHEN

Slide 14

Slide 14 text

Qualification - ERROR_RATE / CRITERIA
The ERROR_RATE and CRITERIA determine how you evaluate a provider’s product.
/!\ You are not evaluating a provider; you are evaluating a result in your product. This is a common mistake companies make: don’t trust benchmarks. Always evaluate the end goal, meaning evaluate on data matching the sentence we defined before, with:
● The appropriate WHO
● The appropriate WHAT
● The appropriate WHEN_TO_WHEN
● The appropriate ENVIRONMENT
● The appropriate DEVICE
Repeat the process multiple times in an automated way: reliability is often a big problem for AI API providers, so put the provider under pressure. Always add a failover provider.
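The automated, repeated evaluation with a failover order can be sketched as a small harness. The provider callables and the scoring function here are hypothetical stubs; in practice each would wrap a real API client and your own CRITERIA metric.

```python
def evaluate_with_failover(providers, test_cases, score):
    """providers: dict name -> callable(input) -> output (hypothetical stubs).
    Runs every test case against every provider, records exceptions as
    reliability failures, and derives a failover order: most reliable
    first, then highest quality."""
    results = {}
    for name, call in providers.items():
        scores, failures = [], 0
        for case in test_cases:
            try:
                scores.append(score(call(case["input"]), case["expected"]))
            except Exception:
                failures += 1     # reliability matters as much as quality
        results[name] = {
            "mean_score": sum(scores) / len(scores) if scores else 0.0,
            "failure_rate": failures / len(test_cases),
        }
    order = sorted(results, key=lambda n: (results[n]["failure_rate"],
                                           -results[n]["mean_score"]))
    return results, order

# Illustrative stubs: one stable provider, one that always times out
def flaky(text):
    raise TimeoutError("provider timed out")

providers = {"stable": str.lower, "flaky": flaky}
cases = [{"input": "HELLO", "expected": "hello"},
         {"input": "WORLD", "expected": "world"}]
exact_match = lambda out, exp: 1.0 if out == exp else 0.0
stats, order = evaluate_with_failover(providers, cases, exact_match)
print(order)
```

Run the harness on a schedule (including peak hours) rather than once: a single pass cannot reveal the reliability problems the slide warns about.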

Slide 15

Slide 15 text

Conclusion
● Measure everything (meaning you have a baseline ;-) ): don’t trust benchmarks.
● Consider yourself an independent provider (BUILD) just like the others (BUY): only the end users are right.
● Consider the full pipeline when choosing a provider: you CAN SAVE A LOT on each step and GAIN QUALITY.
● Use FMECA for impartial decision making.
● Always take a step back and revisit the “Definition Sentence” (WHO, WHAT, WHEN, etc.).
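FMECA, mentioned above for impartial decision making, ranks failure modes by a Risk Priority Number: severity × occurrence × detection difficulty, each rated 1-10. A minimal sketch (the failure modes and ratings below are illustrative examples, not data from the talk):

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """FMECA Risk Priority Number: each factor rated 1 (best) to 10 (worst);
    higher RPN = address first."""
    for v in (severity, occurrence, detection):
        assert 1 <= v <= 10, "ratings must be on a 1-10 scale"
    return severity * occurrence * detection

failure_modes = [
    # (mode, severity, occurrence, detection) - ratings are illustrative
    ("provider outage during peak hours", 8, 4, 3),
    ("silent quality degradation",        7, 5, 9),
    ("latency spike above 1 second",      5, 6, 4),
]
ranked = sorted(failure_modes, key=lambda m: rpn(*m[1:]), reverse=True)
for mode, s, o, d in ranked:
    print(f"RPN={rpn(s, o, d):3d}  {mode}")
```

Note how "silent quality degradation" tops the ranking despite moderate severity and occurrence: it is nearly undetectable (detection = 9), which is exactly why the talk insists on continuous measurement against your own baseline.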

Slide 16

Slide 16 text

Sharing the “I want low latency” story How one of our customers died because of the wrong metric…

Slide 17

Slide 17 text

Thank you @gladiaio @jilijeanlouis https://gladia.io