Data analysis and visualization skills are increasingly important in the new age of Large Language Models and generative AI. But how does a developer without years of Python or data science experience skill up rapidly on the tools and best practices needed to achieve project goals? This is where the right developer tooling, with a little AI assistance, can help.
In this talk, we'll go from identifying an open dataset to analyzing it for insights and visualizing relevant outcomes, all in 25 minutes - with just a GitHub account and an OpenAI endpoint.
Along the way, we'll introduce you to a series of developer tools that make your journey easier:
- Open Dataset: to "analyze" - from Kaggle, Hugging Face, or Azure
- Data Wrangler: to "sanitize" data - extension for Visual Studio Code
- Jupyter Notebook: to "record" process - for transferable learning
- GitHub Codespaces: to "pre-build" environment - for consistent reuse
- GitHub Copilot: to "explain/fix" code - for focused learning with AI help
- Microsoft LIDA: to "suggest" visualization goals and "build" charts - for building your intuition with AI help (see the sketch below)
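
To make the LIDA step concrete, here is a minimal sketch of how one might ask LIDA to summarize a dataset, suggest goals, and build a chart. It assumes `lida` is installed (`pip install lida`), an `OPENAI_API_KEY` environment variable points at your OpenAI endpoint, and `cars.csv` is a placeholder for whatever open dataset you picked; field names such as `goal.question` and `charts[0].code` follow the LIDA docs as I recall them and may differ across versions.

```python
# Minimal sketch: summarize a dataset with LIDA, suggest analysis goals,
# and generate a chart for the first goal. "cars.csv" is a placeholder.
from lida import Manager, TextGenerationConfig, llm

lida = Manager(text_gen=llm("openai"))  # reads OPENAI_API_KEY from the environment
config = TextGenerationConfig(n=1, temperature=0.2, use_cache=True)

summary = lida.summarize("cars.csv", textgen_config=config)  # describe the data
goals = lida.goals(summary, n=3, textgen_config=config)      # suggest visualization goals

for goal in goals:
    print(goal.question)  # questions the data could answer

charts = lida.visualize(summary=summary, goal=goals[0],
                        textgen_config=config, library="matplotlib")
print(charts[0].code)  # inspect the plotting code LIDA generated
```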
The talk comes with an associated repo that you can fork, then swap in your own dataset to extend or experiment with on your own later. By the end of the talk, you should have a sense of how to go from discovering a dataset to getting visual insights from it, using existing tools with a little AI assistance.