Slide 34
preprocess.py: the three ways to pre-process data
Code styles: Pandas, Python, SQL Query
Preprocessing functions: Filter, Replace, De-duplicate / Be unique, Delete/Drop
Filter
Pandas:
dataframe.where() / dataframe.query()
dataframe.groupby()
dataframe[["", "", ""]]
dataframe.loc[]
dataframe.iloc[]
Python:
if-else + for + .append()
[[v1, v2, v3] for value in values]
SQL Query:
SELECT * FROM Customers WHERE CustomerID=1;
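The three filtering styles can be compared on one small dataset. This is a minimal sketch, not the contents of preprocess.py: the sample rows, the `Name` and `Age` columns, and the in-memory SQLite table are assumptions made for illustration.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
rows = [(1, "Ana", 30), (2, "Ben", 17), (3, "Cy", 45)]
columns = ["CustomerID", "Name", "Age"]

# 1) Pandas: a boolean mask with .loc, or the equivalent .query() expression.
df = pd.DataFrame(rows, columns=columns)
adults_loc = df.loc[df["Age"] >= 18, ["CustomerID", "Name"]]
adults_query = df.query("Age >= 18")[["CustomerID", "Name"]]

# 2) Plain Python: a list comprehension with a condition.
adults_py = [[cid, name] for cid, name, age in rows if age >= 18]

# 3) SQL: the same filter as a WHERE clause on an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER, Name TEXT, Age INTEGER)")
conn.executemany("INSERT INTO Customers VALUES (?, ?, ?)", rows)
adults_sql = conn.execute(
    "SELECT CustomerID, Name FROM Customers WHERE Age >= 18 ORDER BY CustomerID"
).fetchall()
conn.close()
```

All three variants keep the same two customers; only the container differs (DataFrame, list of lists, list of tuples).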
Replace
Pandas:
dataframe.fillna()
dic = {"key1": value1, "key2": value2, …}
dataframe['column1'] = dataframe['column1'].replace(dic)
Python:
dic = {"key1": value1, "key2": value2, …}
[[dic.get(v, v) for v in value] for value in values]
SQL Query:
SELECT REPLACE('XYZ FGH XYZ', 'X', 'm');
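A short sketch of the replace operations in all three styles follows. The `mapping` dictionary, the sample values, and the "unknown" fill value are hypothetical; the SQL statement runs on SQLite, which is an assumption about the database in use.

```python
import sqlite3

import pandas as pd

# Hypothetical data: short codes to be replaced by readable labels.
mapping = {"A": "excellent", "B": "good"}
values = [["A", "B"], ["C", None]]

# 1) Pandas: fill missing values first, then map codes through the dictionary.
df = pd.DataFrame(values, columns=["column1", "column2"])
df = df.fillna("unknown")
df["column1"] = df["column1"].replace(mapping)
df["column2"] = df["column2"].replace(mapping)

# 2) Plain Python: dict.get(v, v) leaves any value without a mapping unchanged.
replaced_py = [[mapping.get(v, v) for v in value] for value in values]

# 3) SQL: REPLACE substitutes every occurrence of a substring.
conn = sqlite3.connect(":memory:")
replaced_sql = conn.execute("SELECT REPLACE('XYZ FGH XYZ', 'X', 'm')").fetchone()[0]
conn.close()
```

Assigning the result back to the column, rather than calling `replace(..., inplace=True)` on a column selection, avoids relying on chained in-place modification, which recent pandas versions warn about.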
De-duplicate / Be unique
Pandas:
dataframe.duplicated() / dataframe.drop_duplicates()
dataframe['column1'].unique()
(output: array([v1, v2, v3]))
Python:
set(list)
list({v1, v2, v2, …})
list({value[0] for value in values})
SQL Query:
SELECT DISTINCT column FROM table1;
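The de-duplication entries can be sketched as below. The sample rows and column names are assumptions; `sorted()` is added around the set only because Python sets carry no guaranteed order.

```python
import sqlite3

import pandas as pd

# Hypothetical rows: the third row repeats the first one exactly.
values = [["a", 1], ["b", 2], ["a", 1], ["a", 3]]

# 1) Pandas
df = pd.DataFrame(values, columns=["column1", "column2"])
dup_mask = df.duplicated()            # True where a row repeats an earlier row
deduped = df.drop_duplicates()        # keeps the first occurrence of each row
unique_col = df["column1"].unique()   # distinct values, in order of appearance

# 2) Plain Python: a set comprehension removes duplicates; sort for a stable order.
unique_py = sorted({value[0] for value in values})

# 3) SQL: SELECT DISTINCT on an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (column1 TEXT, column2 INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", values)
unique_sql = [
    row[0]
    for row in conn.execute("SELECT DISTINCT column1 FROM table1 ORDER BY column1")
]
conn.close()
```

Note the difference in scope: `drop_duplicates()` compares whole rows, while `unique()`, the set, and `SELECT DISTINCT column1` look at a single column.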
Delete/Drop
Pandas:
dataframe.dropna()
dataframe.drop()
dataframe.drop(index=index_list)
Python:
if-else + for + .append()
[[v1, v2, v3] for value in values]
SQL Query:
DELETE FROM table_name WHERE condition;
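As a rough illustration of the delete/drop entries, the sketch below removes rows with a missing value in each style. The sample rows, the `id`/`name`/`age` columns, and the `age IS NULL` condition are hypothetical stand-ins for the slide's generic `condition`.

```python
import sqlite3

import pandas as pd

# Hypothetical rows; the second one has a missing age.
rows = [(1, "Ana", 30.0), (2, "Ben", None), (3, "Cy", 45.0)]

# 1) Pandas
df = pd.DataFrame(rows, columns=["id", "name", "age"])
no_missing = df.dropna()                 # drop rows containing any missing value
no_age = df.drop(columns=["age"])        # drop a whole column
index_list = [0, 2]
remaining = df.drop(index=index_list)    # drop rows by index label

# 2) Plain Python: loop, test a condition, and append the rows to keep.
kept_py = []
for row in rows:
    if None not in row:
        kept_py.append(row)

# 3) SQL: DELETE ... WHERE on an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER, name TEXT, age REAL)")
conn.executemany("INSERT INTO table_name VALUES (?, ?, ?)", rows)
conn.execute("DELETE FROM table_name WHERE age IS NULL")
remaining_sql = conn.execute("SELECT id FROM table_name ORDER BY id").fetchall()
conn.close()
```

The pandas calls return new DataFrames and leave `df` unchanged, whereas the SQL `DELETE` modifies the table in place.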