Slide 1

Slide 1 text

Creating APIs That Data Scientists Will Love
Email newsletter

Slide 2

Slide 2 text

“To succeed in AI, first master APIs.” - Ryan Day

Slide 3

Slide 3 text

Go deeper on today’s topic with my book
Coming from O'Reilly Publishing in March 2025

Slide 4

Slide 4 text

To get today’s slides
Tip Sheet: APIs for AI & Data Science - sign up today

Slide 5

Slide 5 text

Walking in the shoes of a data scientist

Slide 6

Slide 6 text

“Before you criticize a man, walk a mile in his shoes. That way, when you do criticize him, you'll be a mile away and have his shoes.” - Steve Martin
Image: Marianna Diamos, Los Angeles Times, CC BY 4.0 via Wikimedia Commons

Slide 7

Slide 7 text

Jobs Data Scientists Do
Source: Anaconda State of Data Science Report

Slide 8

Slide 8 text

(Python) Tools Data Scientists Use

Slide 9

Slide 9 text

Creating APIs that data scientists will love

Slide 10

Slide 10 text

Our demonstration project

Slide 11

Slide 11 text

Tips to Make APIs for Data Scientists
1 - Provide an SDK for your API.
2 - Add standard external identifiers.
3 - Enforce data type definitions.
4 - Provide a method for bulk downloads.
5 - Support querying by last changed date.

Slide 12

Slide 12 text

Tip 1 - Provide a Software Development Kit
SDKs: just give me the data

Slide 13

Slide 13 text

Solid Baseline - REST API, API keys, SDKs

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

SDK: the user’s perspective
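From the user's side, an SDK turns an HTTP request into a single function call that returns data. The sketch below shows roughly what that could look like. The `ApiClient` class, the `get_player` method, the `/v0/players/{id}` path, and the example base URL are all hypothetical names chosen for illustration; they are not taken from the slides, and the real SDK may differ.

```python
import json
from typing import Callable
from urllib.request import urlopen


def _http_get(url: str) -> str:
    """Default transport: fetch a URL and return the body as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


class ApiClient:
    """Hypothetical SDK client; names and paths are illustrative only."""

    def __init__(self, base_url: str, fetch: Callable[[str], str] = _http_get):
        self.base_url = base_url.rstrip("/")
        self._fetch = fetch  # injectable so the client can be exercised offline

    def get_player(self, player_id: int) -> dict:
        """Return one player record as a plain dict."""
        url = f"{self.base_url}/v0/players/{player_id}"
        return json.loads(self._fetch(url))


# Offline demonstration with a stand-in transport instead of a real server.
def fake_fetch(url: str) -> str:
    return json.dumps({"player_id": 1001, "requested_url": url})


client = ApiClient("https://api.example.com/", fetch=fake_fetch)
player = client.get_player(1001)
```

The data scientist only sees `client.get_player(1001)`; URL construction and JSON parsing stay inside the SDK.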

Slide 16

Slide 16 text

Tip 2 - Add standard external identifiers
Extra reference data to join other datasets.

Slide 17

Slide 17 text

No content

Slide 18

Slide 18 text

Swagger UI documentation

Slide 19

Slide 19 text

Internal Identifier
Get Player endpoint in Swagger UI documentation

Slide 20

Slide 20 text

External Identifier
Standard external identifier: GSIS_ID

Slide 21

Slide 21 text

Streamlit app - enriched with third-party data

Slide 22

Slide 22 text

Joining to Industry Data - nfl_data_py
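A shared external identifier is what makes this join possible. The sketch below uses only the standard library and joins two small in-memory datasets on a `gsis_id` field. All records, field names other than the GSIS ID itself, and the `join_on_external_id` helper are hypothetical and exist only to illustrate the idea.

```python
# Hypothetical records from the API; each carries the external gsis_id.
api_players = [
    {"player_id": 1001, "gsis_id": "00-0000001", "last_name": "Example"},
    {"player_id": 1002, "gsis_id": "00-0000002", "last_name": "Sample"},
]

# Hypothetical third-party rows keyed by the same identifier.
industry_rows = [
    {"gsis_id": "00-0000001", "position": "QB"},
    {"gsis_id": "00-0000003", "position": "WR"},
]


def join_on_external_id(left, right, key="gsis_id"):
    """Inner-join two lists of dicts on a shared identifier."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


enriched = join_on_external_id(api_players, industry_rows)
```

Without the external identifier, the only key available would be the API's internal `player_id`, which the third-party dataset has no way of knowing.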

Slide 23

Slide 23 text

Tip 3 - Enforce data type definitions in your API.
Give structures your users can count on

Slide 24

Slide 24 text

OpenAPI specification file

Slide 25

Slide 25 text

OpenAPI specification file - paths

Slide 26

Slide 26 text

OpenAPI specification file - schemas

Slide 27

Slide 27 text

Pydantic - enforcing definitions

Slide 28

Slide 28 text

Tip 4 - Provide a method for bulk downloads.
The data, the whole data, and nothing but the data.

Slide 29

Slide 29 text

3 Reasons Data Scientists Want Bulk Data
1. Exploratory data analysis (EDA)
2. Training data for an ML model
3. Initial load of a data pipeline
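All three uses start the same way: load the entire dataset in one pass instead of paging through an endpoint. A minimal sketch, assuming the bulk export is a CSV file; the slides do not state the format, and the column names and the `load_bulk_players` helper are hypothetical.

```python
import csv
import io

# Stand-in for the contents of a downloaded bulk file (hypothetical columns).
bulk_csv = (
    "player_id,gsis_id,last_name\n"
    "1001,00-0000001,Example\n"
    "1002,00-0000002,Sample\n"
)


def load_bulk_players(csv_text: str) -> list[dict]:
    """Parse a whole bulk CSV export in one pass, converting ids to int."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{**row, "player_id": int(row["player_id"])} for row in reader]


players = load_bulk_players(bulk_csv)
```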

Slide 30

Slide 30 text

Initial Load - Airflow

Slide 31

Slide 31 text

Tip 5 - Support querying by last changed date.
Just give me the deltas.

Slide 32

Slide 32 text

Last Changed Date - User’s perspective
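The idea behind a last-changed-date query can be sketched as a simple filter: the caller passes a cutoff date and gets back only the records modified on or after it. The field name `last_changed_date`, the `changed_since` helper, and the sample records are hypothetical; a real API would apply this filter on the server from a query parameter.

```python
from datetime import date

# Hypothetical records; each carries the date it was last modified.
records = [
    {"player_id": 1001, "last_changed_date": date(2024, 4, 1)},
    {"player_id": 1002, "last_changed_date": date(2024, 4, 18)},
    {"player_id": 1003, "last_changed_date": date(2024, 4, 20)},
]


def changed_since(rows, minimum_last_changed_date):
    """Return rows changed on or after the given date (inclusive)."""
    return [
        row for row in rows
        if row["last_changed_date"] >= minimum_last_changed_date
    ]


deltas = changed_since(records, date(2024, 4, 18))
```

A pipeline that remembers the date of its last successful run can pass that date as the cutoff and skip everything that has not changed.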

Slide 33

Slide 33 text

Incremental Load - Airflow

Slide 34

Slide 34 text

Airflow code

Slide 35

Slide 35 text

Tips to Make APIs for Data Scientists
1 - Provide an SDK for your API.
2 - Add standard external identifiers.
3 - Enforce data type definitions.
4 - Provide a method for bulk downloads.
5 - Support querying by last changed date.

Slide 36

Slide 36 text

Get more tips and tricks
Tip Sheet: APIs for AI & Data Science - sign up today

Slide 37

Slide 37 text

Extra Slides

Slide 38

Slide 38 text

Source: Microsoft Team Data Science Process

Slide 39

Slide 39 text

Extra Tip - Return data in JSON format
A common format that is easy to consume

Slide 40

Slide 40 text

Players endpoint in Swagger UI documentation

Slide 41

Slide 41 text

JSON data returned
