Slide 1

Slide 1 text

Serenata de Amor’s data science
Jessica Temporal

Slide 2

Slide 2 text


Slide 3

Slide 3 text

The Team

Slide 4

Slide 4 text

Rosie

Slide 5

Slide 5 text

So far
● 5 implemented classifiers
● ~3k suspicious reimbursements found
● 629 reports made
● 216 congresspeople reported

Slide 6

Slide 6 text

How do we get the data?
● Scraping
● APIs
  ○ ReceitaWS
  ○ Camara

Slide 7

Slide 7 text

I have the data, what do I do now?
● Develop a hypothesis
● Test it out
● Implement a classifier
● Report

Slide 8

Slide 8 text

The irregular companies classifier

Slide 9

Slide 9 text

import data
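
A minimal sketch of the import step, assuming the reimbursements arrive as a CSV file. The inline sample and the column names (`document_id`, `cnpj_cpf`, `issue_date`, `total_net_value`) are hypothetical stand-ins, not the project's confirmed schema.

```python
import io

import pandas as pd

# Hypothetical sample standing in for a reimbursements CSV file;
# the column names are assumptions made for this illustration.
SAMPLE_CSV = io.StringIO(
    "document_id,cnpj_cpf,issue_date,total_net_value\n"
    "1,00012345000199,2016-03-10,120.50\n"
    "2,11222333000181,2016-04-02,89.90\n"
)

# Read identifiers as strings so leading zeros are not lost.
data = pd.read_csv(SAMPLE_CSV, dtype={"document_id": str, "cnpj_cpf": str})

# Peek at the first rows to check that the load worked.
print(data.head(5))
```

Reading identifiers as text matters because a company number that starts with zeros would otherwise be parsed as an integer and lose them.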

Slide 10

Slide 10 text

data.format()
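
One way the formatting step might look in pandas, assuming it means normalizing identifiers and parsing dates. The slide does not say which transformations were applied, so both steps and the column names here are assumptions.

```python
import pandas as pd

# Hypothetical raw data: one identifier is punctuated, one is not.
data = pd.DataFrame({
    "cnpj_cpf": ["00.012.345/0001-99", "11222333000181"],
    "issue_date": ["2016-03-10", "2016-04-02"],
})

# Keep only digits so identifiers from different sources compare equal.
data["cnpj_cpf"] = data["cnpj_cpf"].str.replace(r"\D", "", regex=True)

# Parse date strings into datetime values so they can be compared later.
data["issue_date"] = pd.to_datetime(data["issue_date"])
```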

Slide 11

Slide 11 text

data.head(5)

Slide 12

Slide 12 text

pd.merge
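
A sketch of how `pd.merge` could join reimbursements to company records. The two tables, their column names and the `situation` values are invented for illustration; only `pd.merge` itself comes from the slide.

```python
import pandas as pd

# Hypothetical reimbursements, keyed by the supplier's identifier.
reimbursements = pd.DataFrame({
    "document_id": ["1", "2", "3"],
    "cnpj_cpf": ["00012345000199", "11222333000181", "99999999000100"],
})

# Hypothetical company records, e.g. as collected from a registry API.
companies = pd.DataFrame({
    "cnpj": ["00012345000199", "11222333000181"],
    "situation": ["BAIXADA", "ATIVA"],
})

# A left join keeps every reimbursement, even when no company matches.
merged = pd.merge(
    reimbursements,
    companies,
    how="left",
    left_on="cnpj_cpf",
    right_on="cnpj",
)
```

With `how="left"`, the third reimbursement stays in the result with a missing `situation`, which makes unmatched suppliers easy to spot.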

Slide 13

Slide 13 text

data.query()
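
A small example of filtering with `DataFrame.query`. The data, the column names and the threshold are hypothetical; the point is only to show the expression syntax.

```python
import pandas as pd

# Hypothetical merged data: one row per reimbursement.
data = pd.DataFrame({
    "document_id": ["1", "2", "3"],
    "situation": ["BAIXADA", "ATIVA", "INAPTA"],
    "total_net_value": [120.50, 89.90, 300.00],
})

# Local variables are referenced inside the expression with "@".
min_value = 100
subset = data.query("situation != 'ATIVA' and total_net_value > @min_value")
```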

Slide 14

Slide 14 text

Classifier Hypothesis

Slide 15

Slide 15 text

Fixtures
● Sample data

Slide 16

Slide 16 text

Tests
● Test first, code later
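
A sketch of how a fixture and a test can fit together, using `unittest` and an in-memory sample. The function `flag_irregular`, the column names and the expected values are hypothetical; in a test-first workflow the test class would be written before the function body.

```python
import unittest

import pandas as pd


def flag_irregular(data):
    """Hypothetical function under test: True where the company is not active."""
    return (data["situation"] != "ATIVA").tolist()


class TestFlagIrregular(unittest.TestCase):
    def setUp(self):
        # Fixture: a small hand-written sample with known expected answers.
        self.fixture = pd.DataFrame({
            "situation": ["ATIVA", "BAIXADA", "INAPTA"],
            "expected": [False, True, True],
        })

    def test_matches_expected_column(self):
        result = flag_irregular(self.fixture)
        self.assertEqual(result, self.fixture["expected"].tolist())
```

Saved in a module, this can be run with `python -m unittest`. Keeping the expected answers next to the sample data makes it clear what each fixture row is meant to exercise.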

Slide 17

Slide 17 text

Show me the code
● The classifier
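
An illustrative rule-based classifier, not the project's actual code. It assumes a hypothesis along these lines: a reimbursement is suspicious when the supplier's registration status was already irregular before the expense date. The class name, the column names and the status values are assumptions made for this sketch.

```python
import pandas as pd


class IrregularCompaniesSketch:
    """Illustrative only; not the project's actual classifier."""

    # Assumed status labels for companies that are not in regular standing.
    IRREGULAR_STATUSES = ["BAIXADA", "NULA", "SUSPENSA", "INAPTA"]

    def fit(self, X):
        # Rule-based: there is nothing to learn from the data.
        return self

    def predict(self, X):
        # Suspicious when the status is irregular AND it changed before
        # the expense was issued.
        irregular = X["situation"].isin(self.IRREGULAR_STATUSES)
        changed_before_expense = X["situation_date"] < X["issue_date"]
        return (irregular & changed_before_expense).to_numpy()


# Hypothetical sample: one row per reimbursement.
data = pd.DataFrame({
    "situation": ["BAIXADA", "ATIVA", "INAPTA"],
    "situation_date": pd.to_datetime(["2015-01-01", "2010-05-20", "2017-01-01"]),
    "issue_date": pd.to_datetime(["2016-03-10", "2016-04-02", "2016-06-15"]),
})

suspicious = IrregularCompaniesSketch().fit(data).predict(data)
```

Only the first row is flagged: the second company is active, and the third became irregular after the expense was issued.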

Slide 18

Slide 18 text

What about Rosie?
● import Classifier

Slide 19

Slide 19 text

5222

Slide 20

Slide 20 text

It’s me, Jessica
github.com/datasciencebr
@jesstemporal
apoia.se/serenata