PandasAI is All You Need: Experience Interactive Data Analysis

I presented at PyCon APAC 2025.
URL: https://pretalx.com/pycon-apac-2025/talk/VAFRRL/
Contents:
This session introduces PandasAI, a Python library that leverages large language models to streamline data tasks, from processing and cleaning to visualization and feature creation, through conversational interfaces.

You will learn how PandasAI simplifies workflows by allowing you to query your data and generate analyses without diving deep into complicated code. We will explore real-world examples, discuss best practices, and address potential challenges. By the end of this session, you will have a clear understanding of how to apply conversational data analysis to your projects, making your data work more intuitive and efficient.

Are you ready to experience a paradigm shift in data analysis brought by generative AI? Instead of writing complex analytical code, imagine interacting with your data in plain natural language.

negi111111

March 05, 2025

Transcript

  1. PandasAI is All You Need: Experience Interactive Data Analysis
    Ryosuke Tanno, Yo Nakamura
    NTT Communications Corporation
    2025/03/02 Ateneo de Manila University @ MANILA
  2. Today's Goal
    • Know what PandasAI is
      ◦ Search, analysis, and visualization using natural language (Pandas + ChatGPT)
    • Understand the pandasai library
    • Understand interactive data analysis
    • Understand LLM use cases in data analysis
    © NTT Communications Corporation All Rights Reserved. Call for Sponsors / When & Where is PyCon APAC 2025? / PyCon APAC 2015 Theme
  3. Agenda
    • Introduction & Context
    • PandasAI Overview & Key Features
    • Tips, Best Practices
    • Wrap-up
  4. Agenda
    • Introduction & Context
    • PandasAI Overview & Key Features
    • Tips, Best Practices
    • Wrap-up
  5. What is Data Science?
    • Data science is a field combining statistical methods, computational techniques, and domain expertise.
    • It involves visualizing data, applying statistical analysis and machine learning, and communicating insights effectively.
    • Collaboration with domain experts is important for solving real-world problems.
    [Figure: diagram labeled Statistical, Human, Computational]
    Source: Science and data science, https://www.pnas.org/content/114/33/8689
  6. Overall process of data analysis
    Cross-Industry Standard Process for Data Mining (CRISP-DM): a methodology for data mining
    1. Understanding the business
       a. Setting analysis objectives and goals
    2. Understanding the data
       a. Understanding the meaning of each item
    3. Data preparation
       a. Data cleansing and basic tabulation
    4. Model creation
       a. Algorithm selection and model creation
    5. Evaluation
       a. Model evaluation and evaluation of results against objectives
    6. Deployment
       a. Designing and proposing measures
  7. Common Challenges in Data Analysis
    • Limited Technical Skills
      ◦ Difficulty using SQL or advanced analytical tools
      ◦ Dependence on technical experts
    • Time Constraints
      ◦ Urgent need for fast insights
      ◦ Slow turnaround with traditional analytical methods
    • Communication Barriers
      ◦ Gap between data teams and business decision-makers
      ◦ Challenges translating business questions into analytical queries
  8. Suddenly... "You are being transferred to the data analysis department."
    BACKGROUND SETTING: "You are a field representative for a data analytics consulting firm. You have been assigned as an analyst to a data analysis project in the retail industry. You need to work with the CEO of the client. You have one hour to answer the following questions from the client."
    Boss: "Sorry for the short notice, but a client has asked us to answer the following questions. I hear you are a Python specialist. If you can use Python, that means you can do analysis as well, right? Thanks in advance!"
    Subject: I want to apply the know-how of well-performing stores to other stores.
    • Which store has the largest sales?
    • Which stores have large fluctuations in sales?
    • Which stores have the highest growth rates?
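For comparison, the three client questions can be answered by hand in plain pandas. The sketch below is only an illustration: the table, its column names (`store`, `month`, `sales`), and the numbers are hypothetical, and "fluctuation" and "growth" are defined in one of several possible ways (coefficient of variation, and first-to-last-month change).

```python
import pandas as pd

# Hypothetical monthly sales per store (illustrative numbers only).
sales = pd.DataFrame({
    "store": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "month": [1, 2, 3, 1, 2, 3, 1, 2, 3],
    "sales": [100, 110, 120, 300, 100, 320, 50, 75, 100],
})

# Sort by month first so that first()/last() mean earliest/latest month.
by_store = sales.sort_values("month").groupby("store")["sales"]

# Q1: which store has the largest total sales?
largest_total = by_store.sum().idxmax()

# Q2: which store fluctuates most? Here: coefficient of variation (std / mean).
most_volatile = (by_store.std() / by_store.mean()).idxmax()

# Q3: which store grew fastest? Here: (last month - first month) / first month.
growth = (by_store.last() - by_store.first()) / by_store.first()
fastest_growing = growth.idxmax()

print(largest_total, most_volatile, fastest_growing)
```

Each question needs a modeling decision (what counts as fluctuation or growth) before any code is written, which is exactly the part a natural language tool has to guess or ask about.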
  9. Architecture using an LLM for data analysis infrastructure
    Source: https://mindfulgeek.substack.com/p/enable-safe-chat-with-your-databases
    Data Engineer: "This is the architecture of the data analysis infrastructure. I built it so that you can analyze data interactively using an LLM even if you don't have analytical skills. Don't worry!"
  10. What are the benefits of interactive data analysis using natural language?
    • Problem 1: SQL and other skills are required to access the database and explore the data.
    • Problem 2: Data analysis skills are required to derive the necessary results from the data.
    Well... with LLMs, that's possible! -> PandasAI
  11. Why Natural Language Analysis Matters
    Natural language analysis helps with:
    • Speed: Quick responses to business queries
    • Empowerment: Let non-technical teams run their own analyses
    • Better Communication: Reduce misunderstandings between data teams and decision-makers
  12. It means you can do this. Pretty convenient, right?
    • Japanese: 売上が最も大きいのはどの店ですか? (Which store has the largest sales?)
    • English: Which store has the largest sales?
    • Tagalog: Aling tindahan ang may pinakamalaking benta? (Which store has the largest sales?)
  13. Agenda
    • Introduction & Context
    • PandasAI Overview & Key Features
    • Tips, Best Practices
    • Wrap-up
  14. What is Pandas?
    • Efficient data manipulation on tabular data structures
      ◦ If the data can be handled as CSV or Excel, the first step is:
            df: pd.DataFrame = pd.read_csv("hoge.csv")
      ◦ Fast and efficient data manipulation via DataFrame objects
    • Plotting: Matplotlib is used internally, so scatter plots, bar charts, box plots, etc. can be created with just a .plot()
    • Intuitive data extraction, including selection and filtering of specific rows and columns
            age_sex = titanic[["Age", "Sex"]]
    Source: https://pandas.pydata.org/docs/getting_started/index.html#getting-started
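As a minimal sketch of the selection and filtering mentioned on this slide, the following uses a tiny made-up stand-in for the Titanic table (the rows and the `Name` column are invented for illustration; only `Age` and `Sex` appear on the slide).

```python
import pandas as pd

# Small stand-in for the Titanic table (values are made up).
titanic = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cid", "Dee"],
    "Age": [22, 38, 4, 61],
    "Sex": ["female", "male", "male", "female"],
})

# Select specific columns.
age_sex = titanic[["Age", "Sex"]]

# Filter rows with a boolean condition.
adults = titanic[titanic["Age"] >= 18]

# Aggregate per group.
mean_age_by_sex = titanic.groupby("Sex")["Age"].mean()

print(adults)
print(mean_age_by_sex)
```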
  15. What is PandasAI?
    • Capability to search, analyze, and visualize using natural language (Pandas + ChatGPT)
      ◦ Can extract information from a DataFrame and draw graphs using natural language
      ◦ The library is open source and free to use; using it with OpenAI requires an API key.
    Source: https://pandas.pydata.org/docs/getting_started/index.html#getting-started
    The library wraps the whole process: it uses a generative AI model to understand and interpret natural language queries, translates them into Python code or SQL queries, and then uses that code to manipulate the data and return the results to the user.
  16. Summary of PandasAI Features
    • Natural language queries: ask questions about data in natural language
    • Data visualization: visualize data by creating graphs and charts
    • Data cleansing: clean the dataset by addressing missing values
    • Feature generation: improve data quality through feature generation
    • Data connectors: CSV, XLSX, PostgreSQL, MySQL, BigQuery, Databricks, Snowflake, etc.
    Source: https://pandas-ai.com/
  17. Three main components to understand PandasAI
    • SmartDataframe
      ◦ For single-table interactions
    • SmartDatalake
      ◦ For multi-table or multi-source data
    • Agent
      ◦ Manages the conversation state and context across multiple queries
    Source: https://mindfulgeek.substack.com/p/enable-safe-chat-with-your-databases
    I'll explain each in turn.
    [Figure label: Pandas DataFrame]
  18. SmartDataframe
    • For operating on a single DataFrame
      ◦ Acts as a bridge between your DataFrame and the underlying language model, allowing for intuitive interactions
    • Data retrieval

        import pandas as pd
        from pandasai import SmartDataframe

        sales_by_country = pd.DataFrame({
            "country": ["United States", "United Kingdom", "France", "Germany", "Italy",
                        "Spain", "Canada", "Australia", "Japan", "China"],
            "sales": [5000, 3200, 2900, 4100, 2300, 2100, 2500, 2600, 4500, 7000],
        })

        # Wrap the DataFrame so it can be queried in natural language.
        df = SmartDataframe(sales_by_country)
        df.chat('Which are the top 5 countries by sales?')
        # Output: China, United States, Japan, Germany, United Kingdom
  19. SmartDatalake
    • For working with multiple DataFrames
    • Suited to more complex scenarios, such as combining multiple DataFrames
    • Allows you to manage and query multiple DataFrames simultaneously

        import pandas as pd
        from pandasai import SmartDatalake

        employees_data = {
            'EmployeeID': [1, 2, 3, 4, 5],
            'Name': ['John', 'Emma', 'Liam', 'Olivia', 'William'],
            'Department': ['HR', 'Sales', 'IT', 'Marketing', 'Finance'],
        }
        salaries_data = {
            'EmployeeID': [1, 2, 3, 4, 5],
            'Salary': [5000, 6000, 4500, 7000, 5500],
        }
        employees_df = pd.DataFrame(employees_data)
        salaries_df = pd.DataFrame(salaries_data)

        # Pass both tables; the question below needs columns from each of them.
        lake = SmartDatalake([employees_df, salaries_df])
        lake.chat("Who gets paid the most?")
        # Output: Olivia gets paid the most

    The library will join the data tables.
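To make "the library will join the data tables" concrete, here is a hand-written pandas equivalent of the question above. It is a sketch of the kind of code an LLM might generate for this question, not what PandasAI actually emits; the join key and join type are assumptions read off the example data.

```python
import pandas as pd

employees_df = pd.DataFrame({
    "EmployeeID": [1, 2, 3, 4, 5],
    "Name": ["John", "Emma", "Liam", "Olivia", "William"],
    "Department": ["HR", "Sales", "IT", "Marketing", "Finance"],
})
salaries_df = pd.DataFrame({
    "EmployeeID": [1, 2, 3, 4, 5],
    "Salary": [5000, 6000, 4500, 7000, 5500],
})

# Join the two tables on their shared key.
merged = employees_df.merge(salaries_df, on="EmployeeID", how="inner")

# Look up the name on the row with the highest salary.
top_earner = merged.loc[merged["Salary"].idxmax(), "Name"]
print(top_earner)
```

Writing the join out by hand is also a useful way to check an answer returned by the conversational interface.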
  20. Agent
    • Agents track the state of the conversation
      ◦ Capable of handling multiple turns of conversation
    • Clarification questions
      ◦ If the agent does not have enough information to answer the query, it can ask clarifying questions
    • Explanation
      ◦ The agent can explain why it gave the answer it did
    • Rephrase question
      ◦ If you want to get a more accurate answer, ask the LLM side to rephrase the question

        import pandas as pd
        from pandasai import Agent

        sales_by_country = pd.DataFrame({
            "country": ["United States", "United Kingdom", "France", "Germany", "Italy",
                        "Spain", "Canada", "Australia", "Japan", "China"],
            "sales": [5000, 3200, 2900, 4100, 2300, 2100, 2500, 2600, 4500, 7000],
            "deals_opened": [142, 80, 70, 90, 60, 50, 40, 30, 110, 120],
            "deals_closed": [120, 70, 60, 80, 50, 40, 30, 20, 100, 110],
        })

        agent = Agent(sales_by_country)
        agent.chat('Which are the top 5 countries by sales?')
        # Output: China, United States, Japan, Germany, United Kingdom

        # The follow-up relies on the context of the previous question.
        agent.chat('And which one has the most deals?')
        # Output: United States has the most deals
  21. Python code generation process by PandasAI
    PandasAI uses a generative AI model to understand and interpret natural language queries, translate them into Python or SQL queries, manipulate the data using that code, and return the results to the user. The internal execution process is as follows.
    • The generation process consists of 5 main steps (7 steps to be exact)
    • PromptGeneration -> CodeGenerator -> CodeExecution -> ResultValidation -> ResultParsing
      ◦ CacheLookup: check whether the result is already cached
      ◦ PromptGeneration: generate the prompt
      ◦ CodeGenerator: generate code from the prompt
      ◦ CachePopulation: cache the generated code
      ◦ CodeExecution: execute the code
      ◦ ResultValidation: verify the execution result
      ◦ ResultParsing: parse the result
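The seven steps can be pictured with a toy pipeline. This is a simplified sketch under stated assumptions, not PandasAI's internals: `run_pipeline` and `fake_llm` are hypothetical names, the "model" is a stub that always returns the same snippet, and the restricted `__builtins__` dictionary is only a gesture toward sandboxing, not a real security boundary.

```python
from typing import Callable, Dict


def run_pipeline(question: str, data: dict,
                 llm: Callable[[str], str], cache: Dict[str, str]) -> str:
    # 1. CacheLookup: reuse code generated earlier for the same question.
    code = cache.get(question)
    if code is None:
        # 2. PromptGeneration: describe the data and the question.
        prompt = (f"Columns: {sorted(data)}\nQuestion: {question}\n"
                  "Return Python that sets `result`.")
        # 3. CodeGenerator: ask the model for code.
        code = llm(prompt)
        # 4. CachePopulation: remember the generated code.
        cache[question] = code
    # 5. CodeExecution: run the code with only a few builtins exposed.
    namespace = {"data": data,
                 "__builtins__": {"max": max, "sum": sum, "zip": zip}}
    exec(code, namespace)
    # 6. ResultValidation: the code must have produced a result.
    if "result" not in namespace:
        raise ValueError("generated code did not set `result`")
    # 7. ResultParsing: convert the result into the final answer.
    return str(namespace["result"])


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model: always returns the same snippet.
    return "result = max(zip(data['sales'], data['store']))[1]"


cache: Dict[str, str] = {}
data = {"store": ["A", "B", "C"], "sales": [330, 720, 225]}
answer = run_pipeline("Which store has the largest sales?", data, fake_llm, cache)
print(answer)
```

Asking the same question again skips steps 2 to 4 because the generated code is served from the cache, which is why the slide counts five main steps but seven in total.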
  22. Agenda
    • Introduction & Context
    • PandasAI Overview & Key Features
    • Tips, Best Practices
    • Wrap-up
  23. Useful tips to know
    Settings that are useful for production deployments (see "Advanced usage" in the official manual for details)
    • How to suppress logs
      ◦ Used as is, the library produces a large amount of log output.
      ◦ Unnecessary logs should be eliminated when deploying to a production environment.
      ◦ The library itself cannot suppress logging, so the log level has to be forced down from outside.
    • Whitelisting
      ◦ What if malicious or vulnerable code is generated?
    • Control of output (determinism)
      ◦ Should the LLM output be consistent, or vary and be creative each time?
    • Not sending data to the LLM (enforce_privacy)
      ◦ Send only column names, not data
      ◦ Otherwise, the head of the data is sent randomly shuffled and anonymized.

        from logging import WARNING, getLogger

        # Hide the data details that appear at the INFO level.
        getLogger("pandasai").setLevel(WARNING)
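The one-liner above works because of how Python's standard `logging` hierarchy behaves: a level set on a parent logger is inherited by child loggers that have no level of their own. The sketch below demonstrates this with the standard library only. The logger name `pandasai` comes from the slide; the child name `pandasai.pipelines` is a hypothetical example, and whether the library sets its own levels or handlers on child loggers is not checked here.

```python
import logging

# Raise the threshold on the parent logger named on the slide.
parent = logging.getLogger("pandasai")
parent.setLevel(logging.WARNING)

# A child logger (hypothetical name) with no level of its own
# inherits the effective level from its nearest configured ancestor.
child = logging.getLogger("pandasai.pipelines")

info_enabled = child.isEnabledFor(logging.INFO)        # INFO is now filtered out
warning_enabled = child.isEnabledFor(logging.WARNING)  # WARNING still passes

print(info_enabled, warning_enabled)
```

If a library sets an explicit level on a child logger, the parent setting no longer applies to that child, and the child has to be adjusted by name as well.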
  24. What if malicious code is generated?
    • For example, code might be generated that extracts environment variables or calls an API internally
      ◦ There are likely many libraries you do not want generated code to use (e.g., the os module)
    • By default, PandasAI can only execute code that uses whitelisted modules
      ◦ This prevents malicious code from being executed on the server or locally
    • However, custom modules can be added to the whitelist
      ◦ Config field: custom_whitelisted_dependencies: List[str] = Field(default_factory=list)

        from pandasai import SmartDataframe

        # Allow an additional module in generated code.
        df = SmartDataframe("data.csv", config={
            "custom_whitelisted_dependencies": ["any_module"],
        })
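The idea of a module whitelist can be illustrated with a small static check over generated code. This is a sketch of the general technique, not PandasAI's implementation: `ALLOWED_MODULES` and `find_disallowed_imports` are hypothetical names, and the list of allowed modules is made up.

```python
import ast

# Hypothetical whitelist of modules that generated code may import.
ALLOWED_MODULES = {"pandas", "numpy", "matplotlib"}


def find_disallowed_imports(code: str, allowed=ALLOWED_MODULES) -> set:
    """Return top-level module names imported by `code` that are not allowed."""
    found = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split(".")[0])
    return found - set(allowed)


generated = "import os\nimport pandas as pd\nprint(os.environ)"

# With the default whitelist, the `os` import is flagged.
blocked = find_disallowed_imports(generated)

# Adding a module to the whitelist (analogous to a custom dependency) clears it.
blocked_after_custom = find_disallowed_imports(generated, ALLOWED_MODULES | {"os"})

print(blocked, blocked_after_custom)
```

A check on import statements alone does not catch dynamic imports such as `__import__`, so it is best treated as one layer of defense rather than a complete sandbox.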
  25. Output Control (Determinism)
    We want LLM output to be consistent ⇔ We want it to be creative and change each time
    • Ensuring reproducibility
      ◦ Development, debugging, testing
    • Consistency
      ◦ Customer support
    • Role of temperature=0
      ◦ Controls output randomness
      ◦ Higher -> response diversity and creativity increase
      ◦ Lower -> the model is more predictable and conservative
    • Fixed seed value
      ◦ *Not yet available in AzureOpenAI

        from pandasai.llm import AzureOpenAI  # import path assumed; not shown on the slide

        # User input prompt; card_config comes from the surrounding application (not shown).
        prompt: str = card_config.get("prompt")

        llm_versions = {
            "gpt-4o": {"deployment_name": "gpt-4o", "api_version": "2023-05-15"},
        }

        # The slide wraps this call in a try block whose handler is not shown.
        llm = AzureOpenAI(
            deployment_name=llm_versions["gpt-4o"]["deployment_name"],
            api_version=llm_versions["gpt-4o"]["api_version"],
            temperature=0,
            seed=26,
        )
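The effect of temperature and a fixed seed can be seen in a toy next-token sampler. This is a sketch of the general sampling idea only, with made-up scores and a hypothetical `sample_token` helper; it does not reproduce how any particular LLM service implements these parameters.

```python
import math
import random


def sample_token(logits: dict, temperature: float, rng: random.Random) -> str:
    """Pick one token from `logits` using temperature-scaled softmax sampling."""
    if temperature == 0:
        # Temperature 0 is treated as greedy decoding: always the top score.
        return max(logits, key=logits.get)
    scaled = {tok: score / temperature for tok, score in logits.items()}
    top = max(scaled.values())
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(value - top) for value in scaled.values()]
    return rng.choices(list(scaled), weights=weights, k=1)[0]


# Made-up scores for three candidate tokens.
logits = {"sum": 2.0, "mean": 1.5, "median": 0.5}

# Temperature 0: the same token every time, regardless of the random state.
greedy = [sample_token(logits, 0, random.Random()) for _ in range(5)]

# Temperature 1 with a fixed seed: random, but reproducible across runs.
rng_a, rng_b = random.Random(26), random.Random(26)
run_a = [sample_token(logits, 1.0, rng_a) for _ in range(5)]
run_b = [sample_token(logits, 1.0, rng_b) for _ in range(5)]

print(greedy, run_a, run_b)
```

In this toy model temperature 0 removes randomness entirely, while a fixed seed keeps randomness but makes it repeatable, which is why the slide lists them as separate controls.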
  26. Agenda
    • Introduction & Context
    • PandasAI Overview & Key Features
    • Tips, Best Practices
    • Wrap-up
  27. Today's goal (summary)
    • Know what PandasAI is
      ◦ Search, analysis, and visualization using natural language (Pandas + ChatGPT)
    • Understand the pandasai library
    • Understand interactive data analysis
    • Understand LLM use cases in data analysis
  28. References and useful links
    • Official Pandas Documentation
      ◦ The getting started guide is careful and well illustrated.
    • Official PandasAI Documentation, Official PandasAI Repository
      ◦ For libraries, go to the manual first.
    • Japanese LLM Summary Information
      ◦ A very comprehensive compilation of LLM information by colleagues in our group
      ◦ Recommended if you want to know more about Japanese LLMs.
    • Enable safe Chat with your Databases: with Malloy and PandasAI
      ◦ https://mindfulgeek.substack.com/p/enable-safe-chat-with-your-databases