Ggplot in Python: Diamond Pricing

_themessier

April 01, 2016

Transcript

  1. Sarah Masud: What makes a diamond costly? An Introduction to ggplots in Python
  2. ABOUT ME:
     • Associate Software Engineer at Red Hat
     • Graduated from Jamia Millia Islamia, New Delhi in 2016
     • Mentor: IEEE WIE Project Stand (Data Science Track)
     • Review Committee: GHC India (Data Science Track)
     • Volunteer: Women Who Code, Lean In India
     Links:
     • GitHub: https://github.com/sara-02
     • Blog: https://themessier.wordpress.com
     • LinkedIn: https://www.linkedin.com/in/sarahmasud
  3. OUTLINE:
     • Different visualization tools in Python
     • What are ggplots?
     • How do ggplots work?
     • Diamond dataset and how to make sense of it
       ◦ Price vs length, breadth, height
       ◦ Price vs carat
       ◦ Price vs carat when color and cut are taken into consideration
     • What do we learn from visualizations? (observations and conclusions)
  4. DIFFERENT TOOLS
     • Matplotlib
     • Ggplots
     • Seaborn
     • Kde
     • Plotly
     A description of what each does will be provided.
  5. WHAT IS ggplot? Created by Hadley Wickham, ggplot provides an easy interface for generating state-of-the-art visualizations. Originally written for R, its success led to it being made available for Python as well.
     COMPONENTS OF ggplot:
     • ggplot API: used to implement the plots.
     • Data: supplied as data frames, as in pandas.
     • Aesthetics: how the axes and theme look.
     • Layer: what information is annotated on top of the basic plot.
  6. HOW GGPLOT WORKS
     1. ggplot is invoked.
     2. A blank coordinate system with labeled axes is put up.
     3. The points are plotted.
     4. The axes are redefined and cropped.
     5. The line is drawn as another layer on top of the points.
  7. WHAT DO THE COLUMNS CONTAIN:
     • Carat: weight of the diamond (1 carat = 0.2 g)
     • Cut: quality of the cut
     • Color: color of the diamond (J is worst, D is best)
     • Clarity: a measure of how clear the diamond is
     • Cert: the level of certification granted
     • x: length in mm
     • y: breadth in mm
     • z: height in mm
     • Measurement: volume in terms of x*y*z
     • Table: width of the top of the diamond relative to its widest point
     • Depth: numerically (2*z)/(x+y)
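
The derived columns in the list above follow directly from the raw ones. A minimal sketch of the three formulas, with function names that are hypothetical and chosen only for illustration:

```python
def carat_to_grams(carat: float) -> float:
    """Convert carat weight to grams (1 carat = 0.2 g)."""
    return carat * 0.2


def measurement(x: float, y: float, z: float) -> float:
    """Volume proxy from length, breadth and height in mm: x*y*z."""
    return x * y * z


def depth(x: float, y: float, z: float) -> float:
    """Height relative to the mean of length and breadth: (2*z)/(x+y)."""
    return (2 * z) / (x + y)


# Example with made-up dimensions (not a row from the dataset):
# a 4.0 x 4.0 x 2.5 mm stone has measurement 40.0 and depth 0.625.
```

Because depth is a ratio of the dimensions, it carries no information about overall size, unlike carat or measurement.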
  8. PRICE EVALUATION: Diamonds are expensive! Let us try to map what factors make them so.
  9. PRICE VS BREADTH OBSERVATIONS: The broader the diamond, the higher the price. But there are many overlaps. What could be a possible reason?
  10. PRICE VS DEPTH OBSERVATIONS: The plot is almost vertical, so depth does not appear to affect the price. We can safely do without it.
  11. PRICE VS CARATS OBSERVATIONS: Again we see that the higher the carat value, the higher the price, but there are some exceptions.
  12. PRICE vs CARAT (DIG DEEPER AND DEEPER) SOMETHING AMISS!

  13. UNDERSTANDING THE BEHAVIOUR OBSERVATIONS: At any given carat value, the cut and color also contribute to the cost.
  14. THE FACETS FEATURE EXAMPLE

  15. HYPOTHESIS VALIDATION (sample example)
      • Assumption: Carat affects the price. Result: Yes, it does, but there are exceptions.
      • Assumption: The better the cut quality, the higher the price. Result: Yes, for a given carat value, cut quality changes the price.
  16. SOURCE OF RAW DATA: https://github.com/SolomonMg/diamonds-data
      BLOGPOSTS:
      1. https://themessier.wordpress.com/2015/06/17/ggplot-in-python-part-1
      2. https://themessier.wordpress.com/2015/06/17/ggplot-in-python-part-4/
      3. https://themessier.wordpress.com/2015/06/17/ggplot-in-python-part-5/
  17. THANK YOU