Slide 1

Slide 1 text

Simplifying Data Analysis & Visualization with Developer Tools & AI
Nitya Narasimhan, PhD
Senior AI Advocate, Microsoft
@nitya | #in/nityan

Slide 2

Slide 2 text

Data Science Day 2024 | Nitya Narasimhan, PhD
What We’ll Cover Today – Fork The Repo To Follow Along!
https://aka.ms/workshops/python-data-analysis
01 | Set up a consistent and reusable dev environment using GitHub Codespaces | Exercise: Instantiate the Codespaces-Jupyter template & launch it
02 | Explore Jupyter notebooks for data science & machine learning examples | Exercise: Validate ability to run Jupyter notebooks without added effort
03 | Add GitHub Copilot extension. Explore use to create notebooks and explain examples | Exercise: Create notebooks, learn Python data structures & visualization
04 | Use a Visual Studio Code Data Science profile & extensions in your devcontainer | Exercise: Complete the VS Code datasci tutorial, explore Data Wrangler
05 | Explore open datasets (curated & shared by the ML community) to start exploration | Exercise: Load & explore dataset from Hugging Face, Kaggle, Azure
06 | Understand principles of responsible AI and use toolbox to train & debug your model | Exercise: Explore text or tabular data & model from Hugging Face
07 | Explore LLM-based data visualization with Microsoft LIDA for intuition, suggestions | Exercise: Use natural language to get goals, visualizations & refine
08 | Make the paradigm shift from ML Ops to LLM Ops (predictive to generative AI apps) | Exercise: Explore the Azure AI Studio (UI & SDK) capabilities
09 | Customize & extend the template to suit your learning needs and share feedback! | Exercise: Pick a different open dataset and try these steps yourself
10 | Related resources for self-guided learners to continue their journey. Thank you! Q&A | Exercise: See #14DaysOfDataScience posts on Developer Tools

Slide 3

Slide 3 text

The Motivation – Why should I learn Data Analysis?
Data analysis – drives the ML models – that power AI algorithms
Image Credit | Microsoft Learn
must-have skill for a data scientist
good-to-have skill for an AI developer
trends show a shift left in the application lifecycle, giving developers more responsibility in earlier stages of the workflow
it's just fun to explore data and gain insights

Slide 4

Slide 4 text

The Challenge – What stops me from learning it?
KNOWLEDGE GAP: I know what I don’t know, but I can plan my journey. Use Developer Tools.
INTUITION GAP: I don’t know what I don’t know, so how do I even start? Use AI Assistance.

Slide 5

Slide 5 text

The Mindset – How can I cultivate my curiosity to learn?
Tired: I lack data science & Python expertise. I can’t do this!
Wired: I want to learn how to do I have dev tools & AI assistance. Where do I start?

Slide 6

Slide 6 text

The Approach – How can I practice goal-oriented learning?
Make it FRICTIONLESS: Help me get set up and productive quickly ..
Keep it FOCUSED: Make progress towards goal without distractions
Make it FRIENDLY: Make it reproducible by others for collaboration
“hey - it works on my machine..”
“let me google this – should be quick”
“I don’t remember – let me explain it”

Slide 7

Slide 7 text

Hands-on Workshop
Let’s dive in

Slide 8

Slide 8 text

Challenge 01 | Make It Frictionless – I need a consistent dev environment with easy setup

Slide 9

Slide 9 text

A | Start Development with a suitable “Codespaces” Template
Exercise 01
See: https://aka.ms/workshops/python-data-analysis
This repo extends the codespaces-jupyter template from GitHub.
You’ll find updated exercises in the `data-science-day-2024` branch.
Uncheck the “Copy the main branch only” option before you fork the repo, so that all branches are copied.

Slide 10

Slide 10 text

B | Fork the template & launch it with GitHub Codespaces (cloud)
Exercise 01
See: https://aka.ms/workshops/python-data-analysis
You can now launch a GitHub Codespaces instance directly from the branch in the browser.

Slide 11

Slide 11 text

B | Clone the template, and launch it with Docker Desktop (local)
Exercise 01
See: https://aka.ms/workshops/python-data-analysis
Or you can clone it to your local device and open it in Visual Studio Code ..
If you have the Dev Containers extension installed and Docker Desktop running, you should see this ..

Slide 12

Slide 12 text

C | Get a pre-built dev environment that works the same for everyone
Exercise 01
See: https://aka.ms/workshops/python-data-analysis
Either way, you have a Visual Studio Code environment that is set up with a pre-built dev container with all dependencies installed for you – with no added effort.

Slide 13

Slide 13 text

D | Learn How Dev Containers Work (Cloud vs. Local)
Exercise 01
See: https://aka.ms/workshops/python-data-analysis
The container runs a Visual Studio Code server that you can connect to from a Visual Studio Code client (browser or local) – with the repository being visible to both.
Configuration as code: a devcontainer.json file defines the environment and is version controlled like any other file.
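To make "configuration as code" concrete, here is a minimal sketch that builds a dev container definition in Python and prints it as JSON. The name, image tag, extension IDs and post-create command are illustrative assumptions, not the contents of the workshop repo's actual devcontainer.json.

```python
import json

# Hypothetical dev container definition. Every value below is an
# illustrative assumption; check the repo's own .devcontainer folder
# for the real settings.
devcontainer = {
    "name": "data-analysis-workshop",
    "image": "mcr.microsoft.com/devcontainers/universal:2",
    "customizations": {
        "vscode": {
            # VS Code extensions installed automatically in the container.
            "extensions": ["ms-python.python", "ms-toolsai.jupyter"]
        }
    },
    # Command run once after the container is created.
    "postCreateCommand": "pip install -r requirements.txt",
}


def render(config: dict) -> str:
    """Serialize the definition as the JSON text a devcontainer.json would hold."""
    return json.dumps(config, indent=2)


devcontainer_text = render(devcontainer)
print(devcontainer_text)
```

Because the definition is plain text, it can be committed, reviewed and diffed like any other file in the repository.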

Slide 14

Slide 14 text

Challenge 02 | Make It Friendly – I need a reproducible environment for easy collaboration

Slide 15

Slide 15 text

A | “matplotlib” is the most popular library for 2D data visualizations
Exercise 02
See: https://aka.ms/workshops/python-data-analysis
Open the matplotlib example notebook & select a kernel from the existing Python environment.

Slide 16

Slide 16 text

B | Learn to run, edit, extend – Jupyter Notebooks in this environment
Exercise 02
See: https://aka.ms/workshops/python-data-analysis
Run All – executes all code cells in the selected Python 3.10.13 env. No added effort in setup.
Add “markdown” cells to document the code
Modify code or data to explore ideas in an interactive way
Share the notebook with collaborators for contributions and debugging

Slide 17

Slide 17 text

C | Run the “pandas+matplotlib” example and intuit how it works
Exercise 02
See: https://aka.ms/workshops/python-data-analysis
Open the population example notebook and run it as before.
Note how pandas works by creating a df (data frame) from structured data (CSV).
It then uses matplotlib to create a plot using data from 2 separate columns of that table.
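The pattern described above can be sketched in a few lines. This is not the workshop's population notebook: the CSV text, column names and values are made up for illustration, and the data is read from an in-memory string so the example is self-contained.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend for scripts; omit this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up CSV content standing in for a real population file.
CSV_TEXT = """year,population_millions
2000,10.0
2010,12.5
2020,15.0
"""

# pandas turns the structured CSV data into a data frame (df).
df = pd.read_csv(io.StringIO(CSV_TEXT))

# matplotlib plots one column against another column of the same table.
fig, ax = plt.subplots()
ax.plot(df["year"], df["population_millions"], marker="o")
ax.set_xlabel("year")
ax.set_ylabel("population (millions)")
ax.set_title("Population over time (made-up values)")

# Render to an in-memory PNG instead of writing a file.
png_buffer = io.BytesIO()
fig.savefig(png_buffer, format="png")
print(df)
```

Swapping `io.StringIO(CSV_TEXT)` for a file path and changing the two column names is usually all it takes to try the same plot on a different dataset.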

Slide 18

Slide 18 text

D | Now copy the code cell, change data – and see if intuition holds up
Exercise 02
See: https://aka.ms/workshops/python-data-analysis
Let’s try a cricket data set (IPL 2022) shared on Kaggle by a fan.
The good news is that I can modify code and experiment inline to learn by doing.
The bad news is that I don’t understand what the code does – or if there are better options I could use.

Slide 19

Slide 19 text

Challenge 03 | Keep it Focused – I want to progress in my goal without getting distracted

Slide 20

Slide 20 text

A | Install the GitHub Copilot Extension – activate inline AI assistance
Exercise 03
See: https://aka.ms/workshops/python-data-analysis
Add the extension to devcontainer.json if you want it installed by default in that env.
GitHub Copilot is a paid offering with a free trial to explore it.
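As a rough sketch of what "add the extension to devcontainer.json" means, the helper below adds an extension ID to the `customizations.vscode.extensions` list of a dev container definition held as a Python dict. The `add_extension` helper and the `base` configuration are hypothetical; `GitHub.copilot` is assumed to be the Marketplace ID of the Copilot extension, so verify it in the Extensions view before relying on it.

```python
import copy

# Assumed Marketplace identifier for the GitHub Copilot extension.
COPILOT_EXTENSION_ID = "GitHub.copilot"


def add_extension(config: dict, extension_id: str) -> dict:
    """Return a copy of the config with extension_id listed for VS Code."""
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    extensions = (
        updated.setdefault("customizations", {})
        .setdefault("vscode", {})
        .setdefault("extensions", [])
    )
    if extension_id not in extensions:  # avoid duplicate entries
        extensions.append(extension_id)
    return updated


# Hypothetical starting configuration.
base = {
    "name": "workshop",
    "customizations": {"vscode": {"extensions": ["ms-toolsai.jupyter"]}},
}
with_copilot = add_extension(base, COPILOT_EXTENSION_ID)
print(with_copilot["customizations"]["vscode"]["extensions"])
```

In practice most people edit the JSON file by hand; note that real devcontainer.json files may contain comments, which the standard `json` module would not parse.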

Slide 21

Slide 21 text

B | Use GitHub Copilot in Chat mode – create notebooks to learn
Exercise 03
See: https://aka.ms/workshops/python-data-analysis
Inline mode sets the context to that specific file.
Chat mode sets the context to the workspace, with richer options.
Use chat mode to create a notebook to learn pandas usage.

Slide 22

Slide 22 text

C | Use GitHub Copilot inline – ask for explainers or fix errors in context
Exercise 03
See: https://aka.ms/workshops/python-data-analysis
The Copilot-created notebook has errors but is a good starter for learning phases.
Ask Copilot chat to explain the bug. See how it references the file.
Ask Copilot inline to fix the bug. See how it gives you a choice.

Slide 23

Slide 23 text

E | Explore GitHub Copilot suggestions – fill knowledge & intuition gaps
Exercise 03
See: https://aka.ms/workshops/python-data-analysis
Use suggested next prompts to fill in knowledge gaps – without losing focus.
Stop if the knowledge gap is filled. Pivot from asking to doing.
Instead of “googling” and falling into a rabbit hole of search results, use Copilot as a contextual question-answer system that keeps you inside the development environment – and ties responses to code references.
Build intuition by trying suggestions and learning to make connections between code and outcomes (success or failure).

Slide 24

Slide 24 text

F | Explore GitHub Copilot for goal-oriented tasks – prompt engineering
Exercise 03
See: https://aka.ms/workshops/python-data-analysis
Open a new code cell and write a prompt to get your task done.
Refine the prompt interactively to move closer to the desired goal.
Use suggestions to refactor code, goals. Add markdown to recall insights later.

Slide 25

Slide 25 text

Challenge 04 | Make It Friendly – I need a reproducible environment for easy collaboration

Slide 26

Slide 26 text

A | Use Visual Studio Code Profiles – Customize editor for productivity
Exercise 04
See: https://aka.ms/workshops/python-data-analysis
Start with the Data Science Profile to get popular extensions.

Slide 27

Slide 27 text

B | Activate Data Wrangler Extension – view & edit for data cleaning
Exercise 04
See: https://aka.ms/workshops/python-data-analysis

Slide 28

Slide 28 text

C | Editing in Data Wrangler Extension – operations auto-generate code
Exercise 04
See: https://aka.ms/workshops/python-data-analysis

Slide 29

Slide 29 text

Challenge 05 | Make It Frictionless – I have the tools & environment. How about the data?

Slide 30

Slide 30 text

A | Explore Kaggle Datasets – use community notebooks for inspiration
Exercise 05
See: https://aka.ms/workshops/python-data-analysis
IPL 2022 Dataset | Open Data Commons License | downloaded Oct 2023
Example EDA on Kaggle.
Find a dataset in a domain of interest (ideas for insights)
Find EDA examples from the community to get intuition on how to explore data
Find ML model examples based on that dataset, to learn new libraries (sklearn)
https://www.kaggle.com/code/coolboyraghu/ipl-score-prediction

Slide 31

Slide 31 text

B | Explore Hugging Face – curated datasets for Deep Learning models
Exercise 05
See: https://huggingface.co/docs/datasets/index
Find datasets for new tasks and explore new libraries and tutorials.

Slide 32

Slide 32 text

C | Explore Azure Open Datasets – curated datasets for many domains
Exercise 05
See: https://learn.microsoft.com/azure/open-datasets/
Expand your understanding from community-curated datasets to a big data mindset.

Slide 33

Slide 33 text

Challenge 06 | Make It Friendly – Debug models and decision-making for responsible AI

Slide 34

Slide 34 text

A | Explore Responsible AI Toolbox – one notebook example at a time!
Exercise 06
See: https://responsibleaitoolbox.ai/
Model Debugging | Decision-Making
Let's imagine that the diabetes progression scores predicted by the model are used to determine medical insurance rates. If the score is greater than 120, there is a higher rate. Patient 43's model score of 268.08 results in this increased rate, and they want to know how they should change their health to get a lower rate prediction from the model (leading to lower insurance price).
The What-If counterfactuals component shows how slightly different feature values affect model predictions. This can be used to solve Patient 43's problem.
https://github.com/microsoft/responsible-ai-toolbox/tree/main/notebooks/responsibleaidashboard/tabular
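The what-if idea can be illustrated with a toy sketch: change one feature, re-run the prediction, and stop at the first value that no longer exceeds the threshold. This is not the Responsible AI Toolbox API or its diabetes model. The linear "model", its weights, the feature names and the patient values are all invented; only the threshold of 120 comes from the scenario above.

```python
from typing import Dict, Iterable, Optional

# Invented stand-in for a trained regression model (assumption).
WEIGHTS = {"bmi": 5.0, "blood_pressure": 0.5}
INTERCEPT = -60.0
RATE_THRESHOLD = 120.0  # scores greater than this get the higher rate


def predict(features: Dict[str, float]) -> float:
    """Score a patient with the toy linear model."""
    return INTERCEPT + sum(WEIGHTS[name] * value for name, value in features.items())


def what_if(features: Dict[str, float], name: str, new_value: float) -> float:
    """Predict again with one feature changed, leaving the original untouched."""
    changed = {**features, name: new_value}
    return predict(changed)


def find_counterfactual(
    features: Dict[str, float], name: str, candidates: Iterable[float]
) -> Optional[float]:
    """Return the first candidate value whose score is not above the threshold."""
    for value in candidates:
        if what_if(features, name, value) <= RATE_THRESHOLD:
            return value
    return None


patient = {"bmi": 34.0, "blood_pressure": 110.0}     # hypothetical patient
candidates = [34.0 - step for step in range(1, 15)]  # bmi from 33.0 down to 20.0

print(predict(patient))                                 # 165.0
print(find_counterfactual(patient, "bmi", candidates))  # 25.0
```

A real counterfactual component searches over several features at once and respects which features can plausibly change; the sketch only shows the basic loop of perturbing an input and re-scoring it.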

Slide 35

Slide 35 text

Challenge 07 | Keep it Focused – I need to build my intuition but I don’t know where to start

Slide 36

Slide 36 text

Explore Project LIDA – Visualize Data using LLM & Natural Language Prompts
Exercise 07
See: https://aka.ms/lida/org
LIDA is a library for generating data visualizations and data-faithful infographics. LIDA is grammar agnostic (will work with any programming language and visualization libraries e.g. matplotlib, seaborn, altair, d3 etc) and works with multiple large language model providers (OpenAI, Azure OpenAI, PaLM, Cohere, Hugging Face).
Research Paper: https://arxiv.org/abs/2303.02927

Slide 37

Slide 37 text

A | Add Your OpenAI Key – Codespaces Secret vs. Local Env Variable
Exercise 07
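Both options end up in the same place: a Codespaces secret is exposed to the running codespace as an environment variable, and a local setup exports the variable in the shell. A minimal sketch of reading it follows, assuming the variable is named `OPENAI_API_KEY` (the name the OpenAI Python client conventionally reads). The helper names are hypothetical and the demo uses a fake key.

```python
import os
from typing import Mapping, Optional


def get_openai_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Read the API key from the environment, failing early if it is missing."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set. Add it as a Codespaces secret "
            "or export it as a local environment variable."
        )
    return key


def mask(key: str) -> str:
    """Show only a short prefix so the key never lands in notebook output."""
    return key[:3] + "***"


# Demo with a fake mapping instead of the real environment.
demo_env = {"OPENAI_API_KEY": "  sk-example  "}
print(mask(get_openai_key(demo_env)))
```

Keeping the key out of the notebook itself matters because notebooks are shared and committed; reading it from the environment means the same notebook runs in Codespaces and locally without edits.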

Slide 38

Slide 38 text

B | Ask LIDA to generate a summary of the dataset
Exercise 07
Using older screenshots (vs. live demo) given issues in OpenAI today

Slide 39

Slide 39 text

C | Generate Goals for me from the data – build intuition on what, how
Exercise 07

Slide 40

Slide 40 text

D | Generate goals for me – but make them customized to my persona
Exercise 07

Slide 41

Slide 41 text

E | Show me different ways to visualize the data – for the same goal
Exercise 07

Slide 42

Slide 42 text

F | Ask questions of the data in natural language – and get visualizations
Exercise 07
GitHub Copilot can do this too .. But here you can vary the parameters and customize the base prompt programmatically.
It’s open source so you can do more if needed.

Slide 43

Slide 43 text

G | Prompt Engineering works in user queries
Exercise 07
Provides flexibility for trial-and-error experiments to build intuition.

Slide 44

Slide 44 text

H | Get Explanations For Decisions – Understand why this visualization
Exercise 07
lida/components/viz/vizexplainer.py
system_prompt = """
You are a helpful assistant highly skilled in providing helpful, structured explanations of visualization of the plot(data: pd.DataFrame) method in the provided code. You divide the code into sections and provide a description of each section and an explanation. The first section should be named "accessibility" and describe the physical appearance of the chart (colors, chart type etc), the goal of the chart, as well the main insights from the chart.
You can explain code across the following 3 dimensions:
1. accessibility: the physical appearance of the chart (colors, chart type etc), the goal of the chart, as well the main insights from the chart.
2. transformation: This should describe the section of the code that applies any kind of data transformation (filtering, aggregation, grouping, null value handling etc)
3. visualization: step by step description of the code that creates or modifies the presented visualization.
"""

Slide 45

Slide 45 text

I | Get Recommendations for Visualizations relevant to your dataset
Exercise 07

Slide 46

Slide 46 text

Challenge 08 | Make the Paradigm Shift – From ML Ops to LLM Ops and Generative AI

Slide 47

Slide 47 text

Closing the Loop ..
Data analysis – drives the ML models – that power AI algorithms
Image Credit | Microsoft Learn
must-have skill for a data scientist
good-to-have skill for an AI developer
trends show a shift left in the application lifecycle, giving developers more responsibility in earlier stages of the workflow
it's just fun to explore data and gain insights

Slide 48

Slide 48 text

ML Ops - App lifecycle for developing Predictive AI
Image Credit | Microsoft Learn

Slide 49

Slide 49 text

LLM Ops - App lifecycle for developing Generative AI
Image Credit | Microsoft Learn

Slide 50

Slide 50 text

Azure AI Week: https://aka.ms/ai-studio/intelligent-apps
Image Credit | Microsoft Learn

Slide 51

Slide 51 text

Challenge 04 | Make It Friendly – I need a reproducible environment for easy collaboration

Slide 52

Slide 52 text

The Approach – How can I practice goal-oriented learning?
Make it FRICTIONLESS: Help me get set up and productive quickly ..
Keep it FOCUSED: Make progress towards goal without distractions
Make it FRIENDLY: Make it reproducible by others for collaboration
Tools: Dev Containers, GitHub Codespaces, GitHub Copilot, Microsoft LIDA, Jupyter Notebooks, VS Code Profiles

Slide 53

Slide 53 text

1 | Introduction – Data Analysis Challenges & Goals
2 | GitHub Codespaces – Reusable environments
3 | Visual Studio Code – Productivity extensions
4 | GitHub Copilot – AI-assisted learning
5 | Open Datasets – Community-inspired exploration
6 | Responsible AI – Model debugging for fairness
7 | Project LIDA – AI-assisted intuition & visualization
8 | Azure AI Studio – Paradigm shift to LLM Ops
9 | Summary – Questions & Next Steps

Slide 54

Slide 54 text

#14DaysOfDataScience
Week 2: Developer Tools
16/3 | GitHub Codespaces
17/3 | Visual Studio Code
18/3 | GitHub Copilot
19/3 | Open Datasets
20/3 | Responsible AI
21/3 | Project LIDA
22/3 | Azure AI Studio
Browse The Posts: https://aka.ms/2024/data-science-recipes

Slide 55

Slide 55 text

Data Science Collection
1. Data Science Foundations
2. Cloud Skills Challenge
3. Responsible AI
4. Data Science Curriculum
5. Data Science Handbook
6. Hugging Face Datasets
7. Kaggle Online Courses
Will be updated regularly
Bookmark Collection: https://aka.ms/2024-datasci-collection