_themessier
April 01, 2016

# Ggplot in Python: Diamond Pricing

## Transcript

1. ### Sarah Masud: What makes a diamond costly? An Introduction to ggplots in Python
2. ### ABOUT ME

- Associate Software Engineer at Red Hat
- Graduated from Jamia Millia Islamia, New Delhi in 2016
- Mentor: IEEE WIE Project Stand (Data Science Track)
- Review Committee: GHC India (Data Science Track)
- Volunteer: Women Who Code, Lean In India

Links:

- GitHub: https://github.com/sara-02
- Blog: https://themessier.wordpress.com
- LinkedIn: https://www.linkedin.com/in/sarahmasud
3. ### OUTLINE

- Different visualization tools in Python
- What are ggplots?
- How do ggplots work?
- Diamond dataset and how to make sense of it
  - Price vs length, breadth, height
  - Price vs carat
  - Price vs carat when color and cut are taken into consideration
- What do we learn from visualizations? (observations and conclusions)
4. ### DIFFERENT TOOLS

- Matplotlib
- Ggplots
- Seaborn
- Kde
- Plotly

A description of what each does will be provided.
5. ### WHAT IS ggplot?

Created by H. Wickham, ggplot provides an easy interface for generating state-of-the-art visualizations. Originally written for R, its success led to it being made available for Python as well.

COMPONENTS OF ggplot:

- ggplot API: used to implement the plots.
- Data: supplied as DataFrames, as in pandas.
- Aesthetics: how the axes and theme look.
- Layer: what information is annotated on top of the basic plot.
6. ### HOW GGPLOT WORKS

   1. ggplot is invoked.
   2. A blank coordinate system with labeled axes is put up.
   3. The points are plotted.
   4. The axes are redefined and cropped.
   5. The line is drawn as another layer on top of the points.
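The layering steps can be illustrated with a toy model. This is only a sketch of the idea, not the real ggplot API: the `Plot` and `Layer` classes and the `describe` method are hypothetical names invented here. The only thing it borrows from ggplot is that layers are stacked onto a base plot with `+`, in the order they should be drawn.

```python
class Layer:
    """One drawing step, such as points or a line (hypothetical class)."""

    def __init__(self, name):
        self.name = name


class Plot:
    """Toy stand-in for a ggplot object: data, axis mapping, ordered layers."""

    def __init__(self, data, mapping, layers=()):
        self.data = data
        self.mapping = mapping
        self.layers = tuple(layers)

    def __add__(self, layer):
        # Adding a layer returns a new plot; the original stays unchanged.
        return Plot(self.data, self.mapping, self.layers + (layer,))

    def describe(self):
        # The blank, labeled coordinate system comes first, then each layer.
        steps = ["blank axes for x=%s, y=%s" % (self.mapping["x"], self.mapping["y"])]
        steps += ["draw %s" % layer.name for layer in self.layers]
        return steps


# The plot is "invoked" with data and a mapping, then layers are added.
base = Plot(data=[], mapping={"x": "carat", "y": "price"})
plot = base + Layer("points") + Layer("line")
print(plot.describe())
```

Because each `+` returns a new object, the same base plot can be reused with different layers, which is roughly how a points-only plot and a points-plus-line plot can share one definition.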
7. ### WHAT DO THE COLUMNS CONTAIN?

- Carat: weight of the diamond (1 carat = 0.2 g)
- Cut: quality of the cut
- Color: color of the diamond (J is worst, D is best)
- Clarity: a measure of how clear the diamond is
- Cert: the level of certification granted
- x: length in mm
- y: breadth in mm
- z: height in mm
- Measurement: volume, computed as x*y*z
- Table: width of the top of the diamond relative to its widest point
- Depth: numerically (2*z)/(x+y)
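The derived columns follow directly from the definitions above. The sketch below applies those three formulas to a single diamond; the function name `derived_columns` and the sample dimensions are made up for illustration and are not taken from the dataset.

```python
GRAMS_PER_CARAT = 0.2  # from the column description: 1 carat = 0.2 g


def derived_columns(carat, x, y, z):
    """Return weight in grams, volume and depth for one diamond.

    x, y, z are length, breadth and height in mm.
    """
    return {
        "grams": carat * GRAMS_PER_CARAT,
        "measurement": x * y * z,      # volume as x*y*z
        "depth": (2 * z) / (x + y),    # depth as (2*z)/(x+y)
    }


# Hypothetical diamond, used only to exercise the formulas.
example = derived_columns(carat=0.5, x=4.0, y=4.0, z=2.5)
print(example)
```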
8. ### PRICE EVALUATION

Diamonds are expensive! Let us try to map out which factors make them so.
9. ### PRICE VS BREADTH

OBSERVATIONS: The broader the diamond, the higher the price. But there are many overlaps. What could be a possible reason?
10. ### PRICE VS DEPTH

OBSERVATIONS: The plot is almost vertical, so depth does not appear to affect the price. We can safely drop it.
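One way to back up a visual impression like this is to compute a correlation coefficient between each candidate column and the price. The sketch below implements the Pearson coefficient by hand; this is a swapped-in check, not something the slides themselves do. The `toy_*` lists are invented values chosen so the contrast is obvious, not rows from the diamonds dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    spread = sqrt(sum(d * d for d in dx) * sum(d * d for d in dy))
    if spread == 0:
        raise ValueError("a constant column has no defined correlation")
    return sum(a * b for a, b in zip(dx, dy)) / spread


# Made-up values for illustration only.
toy_depth = [0.61, 0.62, 0.61, 0.62]
toy_carat = [0.3, 0.3, 1.0, 1.0]
toy_price = [500, 500, 3000, 3000]

r_depth = pearson(toy_depth, toy_price)  # close to 0: no linear relation
r_carat = pearson(toy_carat, toy_price)  # close to 1: strong linear relation
print(round(r_depth, 3), round(r_carat, 3))
```

A coefficient near zero only rules out a linear relationship, so it complements the plot and does not replace it.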
11. ### PRICE VS CARATS

OBSERVATIONS: Again we see that the higher the carat value, the higher the price, but there are some exceptions.

13. ### UNDERSTANDING THE BEHAVIOUR

OBSERVATIONS: At any given carat value, the cut and color also contribute to the cost.

15. ### HYPOTHESIS VALIDATION (sample example)

- Assumption: Carat affects the price. Result: Yes, it does, but there are exceptions.
- Assumption: The better the cut quality, the higher the price. Result: Yes, for a given carat value, cut quality changes the price.
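The second assumption can be checked numerically by holding carat roughly fixed and comparing the average price per cut. The sketch below is one possible way to do that in plain Python. The function name `mean_price_by_cut`, the `tolerance` parameter, the cut labels and the rows are all assumptions made for illustration, not values from the diamonds dataset.

```python
from collections import defaultdict
from statistics import mean


def mean_price_by_cut(rows, carat, tolerance=0.05):
    """Average price per cut for diamonds within `tolerance` of `carat`."""
    prices = defaultdict(list)
    for row in rows:
        if abs(row["carat"] - carat) <= tolerance:
            prices[row["cut"]].append(row["price"])
    return {cut: mean(values) for cut, values in prices.items()}


# Made-up rows for illustration only; the cut labels are hypothetical.
toy_rows = [
    {"carat": 1.00, "cut": "Fair", "price": 3000},
    {"carat": 1.02, "cut": "Fair", "price": 3200},
    {"carat": 1.00, "cut": "Ideal", "price": 5000},
    {"carat": 1.01, "cut": "Ideal", "price": 5400},
    {"carat": 0.30, "cut": "Ideal", "price": 600},  # excluded: different carat
]

by_cut = mean_price_by_cut(toy_rows, carat=1.0)
print(by_cut)
```

Filtering on carat first matters: without it, the small 0.30-carat diamond would drag down the average for its cut and hide the effect being tested.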
16. ### SOURCE OF RAW DATA

https://github.com/SolomonMg/diamonds-data

BLOGPOSTS:

   1. https://themessier.wordpress.com/2015/06/17/ggplot-in-python-part-1
   2. https://themessier.wordpress.com/2015/06/17/ggplot-in-python-part-4/
   3. https://themessier.wordpress.com/2015/06/17/ggplot-in-python-part-5/