Leticia Portella
February 19, 2019

# Introduction to Jupyter Notebooks & Data Analytics with Kaggle


## Transcript

3. ### Why Kaggle?

Kaggle is a place where you can find a lot of datasets. It already has most of the tools you'll need for a basic analysis installed, and it is a good place to see other people's code and to build a portfolio.

9. ### Notebooks are a place where you can write code, show graphs, and document your methodology and findings, all in a single place

14. ### Jupyter Shortcuts

- Ctrl + Enter = Run cell
- ESC + B = New cell below
- ESC + dd = Delete cell

16. ### Reading a document

If you check the first cell, it tells you that the data files are available in `../input/`. So we can read a file with a pandas function, passing the path of the file:

```python
import pandas as pd

df = pd.read_csv('../input/train.csv')
```
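Outside Kaggle the `../input/` folder won't exist. As a minimal sketch, assuming a few hypothetical rows in the shape of the Titanic data (the values are made up), `read_csv` can also read from an in-memory file-like object, which is handy for trying things out:

```python
import io

import pandas as pd

# Hypothetical sample rows, made up for illustration
csv_text = """PassengerId,Survived,Pclass,Age
1,0,3,22
2,1,1,38
3,1,3,26
4,1,1,35
5,0,3,35
6,0,3,
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

Missing fields, like the last passenger's age, are read as `NaN`.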
17. ### Dataframes

Dataframes are similar to the tables you find in Excel: rows are identified by numbers and columns have names. You can check the first 5 rows of a dataframe to see its basic structure:

```python
df.head()
```
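A small sketch on made-up data showing that `head()` returns five rows by default, and that `head(n)` and its counterpart `tail(n)` take the number of rows as an argument:

```python
import pandas as pd

# Hypothetical data: ten made-up passengers
df = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 3, 3, 1, 3, 3, 2],
    "Age": [22, 38, 26, 35, 35, None, 54, 2, 27, 14],
})

print(df.head())    # first 5 rows (the default)
print(df.head(3))   # head(n) returns the first n rows
print(df.tail(2))   # tail(n) returns the last n rows
```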

21. ### Dataframes

You can check the shape of a dataframe to get an idea of how many rows and columns it has:

```python
df.shape
```
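`shape` is an attribute, not a method, so it is written without parentheses. A minimal sketch with a hypothetical five-row dataframe, unpacking the `(rows, columns)` tuple:

```python
import pandas as pd

# Hypothetical data, made up for illustration
df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0],
    "Pclass": [3, 1, 3, 1, 3],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0],
})

n_rows, n_cols = df.shape   # shape is a (rows, columns) tuple
print(n_rows, n_cols)       # 5 3
```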
22. ### Dataframes

You can check the main statistical characteristics of the numerical columns of a dataframe:

```python
df.describe()
```
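A sketch on hypothetical data showing which columns `describe()` keeps: by default, text columns are skipped and missing values are left out of the statistics (the `Age` count below is 3, not 4):

```python
import pandas as pd

# Hypothetical data, made up for illustration
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cid", "Dee"],   # text column: skipped by describe()
    "Age": [20.0, 30.0, 40.0, None],       # NaN is excluded from the statistics
    "Fare": [7.25, 71.28, 8.05, 53.10],
})

stats = df.describe()
print(stats)
```

Passing `include='all'` to `describe()` also summarizes the non-numeric columns.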

24. ### Series

You can select a single column of the dataframe to work with. A column of a dataframe is called a Series and has some special properties:

```python
df['Age']
```
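A minimal sketch on made-up data of the difference between selecting with one pair of brackets (a Series) and with a list of column names (a DataFrame):

```python
import pandas as pd

# Hypothetical data, made up for illustration
df = pd.DataFrame({"Age": [22.0, 38.0, 26.0], "Pclass": [3, 1, 3]})

ages = df["Age"]          # single brackets -> Series
age_frame = df[["Age"]]   # double brackets (a list of names) -> DataFrame

print(type(ages).__name__)       # Series
print(type(age_frame).__name__)  # DataFrame
print(ages.name)                 # Age: a Series keeps its column name
```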
25. ### Series

You can also check the statistical characteristics of a Series:

```python
df['Age'].describe()
```
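The result of `describe()` on a Series is itself a Series indexed by statistic name, so single values can be read out of it. A sketch with hypothetical ages; individual methods such as `mean()` and `median()` give the same numbers directly:

```python
import pandas as pd

# Hypothetical ages; the last one is missing
ages = pd.Series([22.0, 38.0, 26.0, 35.0, None], name="Age")

summary = ages.describe()        # a Series indexed by statistic name
print(summary["mean"])           # 30.25
print(ages.mean(), ages.median())
```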
26. ### Series

You can filter a Series to see which passengers are older than 10. This returns a Series of True and False values:

```python
df['Age'] > 10
```
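A sketch on hypothetical ages showing the True/False Series the comparison produces. Two details worth knowing: missing ages compare as `False`, and summing the result counts the `True` values:

```python
import pandas as pd

# Hypothetical ages; one is missing
ages = pd.Series([22.0, 8.0, 26.0, None, 4.0], name="Age")

older_than_10 = ages > 10      # boolean Series aligned with the original index
print(older_than_10.tolist())  # [True, False, True, False, False]
print(older_than_10.sum())     # 2: True counts as 1 when summed
```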
27. ### Series

Series also have functions that help you quickly plot them. We can, for instance, check the histogram of ages:

```python
df['Age'].plot.hist()
```
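A histogram counts how many values fall into each bin. As a plot-free sketch (a swapped-in technique, not what the slide uses), `pd.cut` with hypothetical bin edges produces those counts as numbers:

```python
import pandas as pd

# Hypothetical ages; the missing one is dropped from the counts
ages = pd.Series([2.0, 8.0, 22.0, 26.0, 35.0, 38.0, 54.0, None], name="Age")

# Hypothetical bin edges: (0, 20], (20, 40], (40, 60]
bins = [0, 20, 40, 60]
counts = pd.cut(ages, bins=bins).value_counts().sort_index()
print(counts)
```

`plot.hist()` also accepts a `bins` argument (10 by default) if the default doesn't suit the data.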

30. ### Series

We can count how many passengers were in each class:

```python
df['Pclass'].value_counts()
```
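A sketch with a hypothetical `Pclass` column: `value_counts()` sorts by frequency, and `normalize=True` returns proportions instead of counts:

```python
import pandas as pd

# Hypothetical passenger classes, made up for illustration
pclass = pd.Series([3, 1, 3, 1, 3, 3, 2, 3, 2, 1], name="Pclass")

counts = pclass.value_counts()                # most to least frequent
shares = pclass.value_counts(normalize=True)  # proportions instead of counts
print(counts)
print(shares)
```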
31. ### Series

Since the result of `value_counts` is also a Series, we can store it in a variable and use it to plot a pie chart :)

```python
passengers_per_class = df['Pclass'].value_counts()
passengers_per_class.plot.pie()
```
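Because `value_counts()` orders classes by frequency, the slices of a chart follow that order. A small sketch on made-up data showing that `sort_index()` puts the classes back in 1, 2, 3 order before plotting (the plotting call needs matplotlib, so it is left as a comment):

```python
import pandas as pd

# Hypothetical passenger classes, made up for illustration
pclass = pd.Series([3, 1, 3, 1, 3, 3, 2, 3, 2, 1], name="Pclass")

passengers_per_class = pclass.value_counts()
# sort_index orders the result by class instead of by frequency
ordered = passengers_per_class.sort_index()
print(ordered.index.tolist())   # [1, 2, 3]
print(ordered.tolist())         # [3, 2, 5]
# ordered.plot.pie() would draw one slice per class (requires matplotlib)
```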
32. ### Exercise

Plot a bar plot with the number of people that survived and didn't survive (column `Survived`).
33. ### Series

Remember that we could filter a Series? We can use that to explore our variables. Let's see which class had the most survivors:

```python
survived = df['Survived'] > 0
filtered_df = df[survived]
passenger_per_class = filtered_df['Pclass'].value_counts()
passenger_per_class.plot.pie()
```
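Counting survivors favors larger classes. A sketch on hypothetical data comparing the slide's count with a survival rate per class, computed with `groupby` (an addition, not from the slides): because `Survived` holds 0 and 1, its mean within each class is the share of that class that survived. Here classes 1 and 3 have the same number of survivors but very different rates:

```python
import pandas as pd

# Hypothetical passengers; Survived is 1 for survivors and 0 otherwise
df = pd.DataFrame({
    "Pclass":   [1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3],
    "Survived": [1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0],
})

# Raw survivor counts, as on the slide
survivors_per_class = df[df["Survived"] == 1]["Pclass"].value_counts()

# Share of each class that survived: the mean of a 0/1 column is a proportion
survival_rate = df.groupby("Pclass")["Survived"].mean()
print(survivors_per_class)
print(survival_rate)
```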

35. ### Function

Now, if we pass an age to the function, it returns a label:
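The definition of `age_to_ageclass` is missing from this transcript. A minimal sketch of what such a function could look like; the cutoffs and labels below are hypothetical, chosen only for illustration:

```python
import math


def age_to_ageclass(age):
    """Hypothetical version: map an age in years to a label.

    The cutoffs and labels are assumptions for illustration only.
    """
    if age is None or (isinstance(age, float) and math.isnan(age)):
        return "unknown"   # pandas stores missing ages as NaN
    if age < 13:
        return "child"
    if age < 18:
        return "teenager"
    if age < 60:
        return "adult"
    return "senior"


print(age_to_ageclass(8))     # child
print(age_to_ageclass(35.0))  # adult
```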
36. ### Series

We can create a new column (`Ageclass`) using the `Age` column and this function :)

```python
df["Ageclass"] = df["Age"].apply(age_to_ageclass)
```
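A self-contained sketch of the same step on made-up data, with a hypothetical version of `age_to_ageclass` (cutoffs assumed). `apply` calls the function once per value; `pd.cut` is shown as a vectorized alternative that gives the same labels for known ages, while missing ages stay `NaN`:

```python
import pandas as pd


def age_to_ageclass(age):
    # Hypothetical cutoffs and labels, assumed for illustration
    if pd.isna(age):
        return "unknown"
    if age < 18:
        return "child"
    if age < 60:
        return "adult"
    return "senior"


df = pd.DataFrame({"Age": [4.0, 22.0, None, 71.0]})

# apply calls the function once per value and returns a new Series
df["Ageclass"] = df["Age"].apply(age_to_ageclass)

# Vectorized alternative: bins [0, 18), [18, 60), [60, 120)
df["Ageclass_cut"] = pd.cut(df["Age"], bins=[0, 18, 60, 120], right=False,
                            labels=["child", "adult", "senior"])
print(df)
```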
37. ### Exercise

Now that we have age classes, we can check which age class survived the most, the same way we did with `Pclass` :)
38. ### Dataframes

We can group by two columns and count the rows in each group.
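The grouping code itself isn't in this transcript. A minimal sketch on hypothetical data, assuming the two columns are `Pclass` and `Survived`: `size()` counts the rows in each combination, and `unstack()` turns the result into a table:

```python
import pandas as pd

# Hypothetical passengers, made up for illustration
df = pd.DataFrame({
    "Pclass":   [1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3],
    "Survived": [1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0],
})

# size() counts the rows in every (Pclass, Survived) combination
counts = df.groupby(["Pclass", "Survived"]).size()
print(counts)

# unstack() turns the inner level (Survived) into columns: one row per class
table = counts.unstack(fill_value=0)
print(table)
```

`pd.crosstab(df['Pclass'], df['Survived'])` produces the same counts in one call.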