Slide 1

Rosie

Slide 2

So far
● 5 implemented classifiers
● ~3k suspicious reimbursements found
● 629 reports made
● 216 congresspeople reported

Slide 3

How do we get the data?
● Scraping
● APIs
  ○ ReceitaWS
  ○ Camara
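The ReceitaWS item refers to looking up company registration data by CNPJ. Below is a minimal sketch of such a lookup. The endpoint pattern and the response keys (`cnpj`, `situacao`, `data_situacao`) are assumptions to check against the current ReceitaWS documentation. The helper names are hypothetical. Parsing is kept separate from the network call so it can be tested offline.

```python
import json
from urllib.request import urlopen

# Assumed endpoint pattern for the public ReceitaWS API (verify before use).
RECEITAWS_URL = "https://www.receitaws.com.br/v1/cnpj/{cnpj}"


def normalize_cnpj(cnpj: str) -> str:
    """Keep only the digits of a CNPJ such as '12.345.678/0001-95'."""
    return "".join(ch for ch in cnpj if ch.isdigit())


def company_url(cnpj: str) -> str:
    """Build the lookup URL for one company."""
    return RECEITAWS_URL.format(cnpj=normalize_cnpj(cnpj))


def parse_company(payload: str) -> dict:
    """Pick the fields a classifier might need from a JSON response.

    The keys 'cnpj', 'situacao' and 'data_situacao' are assumptions
    about the response format.
    """
    data = json.loads(payload)
    return {
        "cnpj": normalize_cnpj(data.get("cnpj", "")),
        "situation": data.get("situacao"),
        "situation_date": data.get("data_situacao"),
    }


def fetch_company(cnpj: str, timeout: float = 10.0) -> dict:
    """Download and parse one company record (performs a network call)."""
    with urlopen(company_url(cnpj), timeout=timeout) as response:
        return parse_company(response.read().decode("utf-8"))
```

For example, `parse_company('{"cnpj": "12.345.678/0001-95", "situacao": "BAIXADA"}')` works on a hand-written payload without touching the network. The CNPJ here is made up.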

Slide 4

I have the data. What do I do now?
● Develop a hypothesis
● Test it out
● Implement a classifier
● Report

Slide 5

Jupyter Notebooks

Slide 6

GitHub

Slide 7

Server

Slide 8

The irregular companies classifier

Slide 9

import data
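Below is a minimal sketch of loading reimbursement data with pandas. The CSV content and column names (`cnpj_cpf`, `issue_date`, `net_value`) are hypothetical stand-ins, not the project's actual schema. Reading the CNPJ column as a string matters because many CNPJs start with zero, and a numeric column would drop those digits.

```python
import io

import pandas as pd

# Hypothetical sample standing in for a reimbursements CSV file.
CSV = """cnpj_cpf,congressperson_name,issue_date,net_value
02345678000190,Deputy A,2016-03-10,120.50
12345678000195,Deputy B,2016-04-02,89.90
"""


def load_reimbursements(source) -> pd.DataFrame:
    """Read reimbursements, keeping CNPJs as strings so leading zeros survive."""
    return pd.read_csv(
        source,
        dtype={"cnpj_cpf": str},
        parse_dates=["issue_date"],
    )


data = load_reimbursements(io.StringIO(CSV))
print(data.dtypes)
```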

Slide 10

data.format()
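pandas has no `DataFrame.format()` method, so the slide title is presumably shorthand for checking what the data looks like. Below is a minimal sketch of that check using real pandas attributes (`shape`, `dtypes`, `isna`). The frame and the helper `describe_format` are hypothetical.

```python
import pandas as pd

# Small hypothetical frame; in practice this would be the loaded dataset.
data = pd.DataFrame(
    {
        "cnpj_cpf": ["02345678000190", "12345678000195", None],
        "net_value": [120.50, 89.90, 15.00],
    }
)


def describe_format(df: pd.DataFrame) -> dict:
    """Summarize the shape, column types and missing values of a frame."""
    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        "dtypes": df.dtypes.astype(str).to_dict(),
        "missing": df.isna().sum().to_dict(),
    }


summary = describe_format(data)
print(summary)
```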

Slide 11

data.head(5)
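`head(5)` returns the first five rows in their current order. When hunting for suspicious expenses, `nlargest` can give a more telling first look by returning the five highest values of a column. The frame below is hypothetical.

```python
import pandas as pd

# Hypothetical reimbursements frame with eight rows.
data = pd.DataFrame(
    {
        "document_id": list(range(1, 9)),
        "net_value": [12.0, 250.0, 8.5, 40.0, 1300.0, 75.0, 19.9, 600.0],
    }
)

first_rows = data.head(5)  # the first five rows, in file order
largest = data.nlargest(5, "net_value")  # the five highest net values
print(first_rows)
print(largest)
```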

Slide 12

pd.merge
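For the irregular companies classifier, each reimbursement needs the registration data of the company that issued it. Below is a sketch of that join with `pd.merge`. It assumes the reimbursements carry the CNPJ in `cnpj_cpf` and the company table carries it in `cnpj`; both tables are hypothetical.

```python
import pandas as pd

reimbursements = pd.DataFrame(
    {
        "document_id": [1, 2, 3],
        "cnpj_cpf": ["02345678000190", "12345678000195", "99999999000199"],
        "issue_date": pd.to_datetime(["2016-03-10", "2016-04-02", "2016-05-20"]),
    }
)
companies = pd.DataFrame(
    {
        "cnpj": ["02345678000190", "12345678000195"],
        "situation": ["ATIVA", "BAIXADA"],
        "situation_date": pd.to_datetime(["2005-01-01", "2015-02-01"]),
    }
)

# Left join: keep every reimbursement and attach company data where the CNPJ
# matches. indicator=True adds a '_merge' column that marks unmatched rows.
merged = pd.merge(
    reimbursements,
    companies,
    how="left",
    left_on="cnpj_cpf",
    right_on="cnpj",
    indicator=True,
)
print(merged[["document_id", "situation", "_merge"]])
```

A left join keeps reimbursements whose company was not found. Those `left_only` rows point to lookups that still need to be done, rather than silently disappearing as they would with an inner join.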

Slide 13

data.query()
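Once companies are merged in, `data.query()` can express the hypothesis directly. A reimbursement is suspicious if the company's status is irregular and the expense was issued after that status took effect. The status list and column names below are assumptions for illustration.

```python
import pandas as pd

# Hypothetical merged frame (reimbursements already joined with companies).
merged = pd.DataFrame(
    {
        "document_id": [1, 2, 3],
        "situation": ["ATIVA", "BAIXADA", "SUSPENSA"],
        "issue_date": pd.to_datetime(["2016-03-10", "2016-04-02", "2014-01-15"]),
        "situation_date": pd.to_datetime(["2005-01-01", "2015-02-01", "2015-06-01"]),
    }
)

# Assumed list of registration statuses treated as irregular.
IRREGULAR = ["BAIXADA", "NULA", "SUSPENSA", "INAPTA"]

# '@' lets query() reference a Python variable; the python engine is used
# so the datetime column comparison is evaluated by pandas itself.
suspicious = merged.query(
    "situation in @IRREGULAR and issue_date > situation_date",
    engine="python",
)
print(suspicious)
```

Document 3 has an irregular status, but the expense predates the suspension, so only document 2 is kept.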

Slide 14

Fixtures
● Sample data
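A fixture here means a small, hand-picked sample of the data that lives next to the tests. It should cover each case the classifier must handle. Below is a sketch with three rows: a regular company, an irregular one, and one with missing registration data. The file name `companies_sample.csv` and the columns are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical fixture: one row per case the classifier must handle.
FIXTURE_CSV = """cnpj_cpf,situation,situation_date,issue_date
02345678000190,ATIVA,2005-01-01,2016-03-10
12345678000195,BAIXADA,2015-02-01,2016-04-02
33345678000101,,,2016-05-20
"""


def write_fixture(directory: Path) -> Path:
    """Save the sample data as a CSV file, as a test suite might ship it."""
    path = directory / "companies_sample.csv"
    path.write_text(FIXTURE_CSV, encoding="utf-8")
    return path


def load_fixture(path: Path) -> pd.DataFrame:
    """Load the fixture with the same options used for the real data."""
    return pd.read_csv(
        path,
        dtype={"cnpj_cpf": str},
        parse_dates=["situation_date", "issue_date"],
    )


with tempfile.TemporaryDirectory() as tmp:
    sample = load_fixture(write_fixture(Path(tmp)))
print(sample)
```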

Slide 15

Tests
● Test first, code later
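"Test first, code later" means writing the tests that describe the expected behaviour before the function exists, then writing just enough code to make them pass. Below is a minimal `unittest` sketch of that cycle. The function `is_irregular` and the status list are hypothetical.

```python
import unittest

IRREGULAR_STATUSES = {"BAIXADA", "NULA", "SUSPENSA", "INAPTA"}  # assumption


def is_irregular(situation):
    """Written after the tests below: True for an irregular registration status."""
    return situation in IRREGULAR_STATUSES


class IsIrregularTest(unittest.TestCase):
    # These tests describe the expected behaviour before any code exists.
    def test_active_company_is_regular(self):
        self.assertFalse(is_irregular("ATIVA"))

    def test_closed_company_is_irregular(self):
        self.assertTrue(is_irregular("BAIXADA"))

    def test_missing_status_is_regular(self):
        self.assertFalse(is_irregular(None))
```

Saved as a test module, this runs with `python -m unittest`. The tests fail until `is_irregular` is implemented, which is the point of writing them first.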

Slide 16

Show me the code
● The classifier
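Below is a hypothetical sketch of what an irregular companies classifier might look like; it is not Rosie's actual source. It assumes a scikit-learn-like `fit`/`predict` interface and the outlier-detection convention of returning `-1` for suspicious rows and `1` for regular ones. It also assumes the merged columns `situation`, `situation_date` and `issue_date`.

```python
import numpy as np
import pandas as pd


class IrregularCompaniesClassifier:
    """Sketch: flag expenses issued after the supplier's status became irregular."""

    IRREGULAR_STATUSES = ("BAIXADA", "NULA", "SUSPENSA", "INAPTA")  # assumption

    def fit(self, X, y=None):
        # Rule-based: nothing to learn; kept for a scikit-learn-like interface.
        return self

    def predict(self, X: pd.DataFrame) -> np.ndarray:
        issue = pd.to_datetime(X["issue_date"], errors="coerce")
        since = pd.to_datetime(X["situation_date"], errors="coerce")
        irregular = X["situation"].isin(self.IRREGULAR_STATUSES)
        # Comparisons involving NaT are False, so missing dates are not flagged.
        suspicious = irregular & (issue > since)
        # -1 marks a suspicious row, 1 a regular one.
        return np.where(suspicious, -1, 1)


data = pd.DataFrame(
    {
        "situation": ["ATIVA", "BAIXADA", "SUSPENSA", None],
        "situation_date": ["2005-01-01", "2015-02-01", "2015-06-01", None],
        "issue_date": ["2016-03-10", "2016-04-02", "2014-01-15", "2016-05-20"],
    }
)
predictions = IrregularCompaniesClassifier().fit(data).predict(data)
print(predictions)
```

Keeping the `fit`/`predict` shape, even for a pure rule, lets rule-based and learned classifiers be run the same way.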

Slide 17

What about Rosie?
● import Classifier
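"import Classifier" suggests each classifier is a separate, importable unit that Rosie runs over the whole dataset. Below is a sketch of that plug-in pattern, assuming every classifier exposes `fit`/`predict` and returns `-1` for suspicious rows. `ClosedCompanyClassifier`, `CLASSIFIERS` and `run_classifiers` are hypothetical names, not Rosie's API.

```python
import numpy as np
import pandas as pd


class ClosedCompanyClassifier:
    """Hypothetical stand-in for a classifier imported from its own module."""

    def fit(self, X, y=None):
        return self

    def predict(self, X):
        return np.where(X["situation"] == "BAIXADA", -1, 1)


# Hypothetical registry: in a real project each entry would come from an
# import statement rather than being defined in the same file.
CLASSIFIERS = {
    "closed_company": ClosedCompanyClassifier,
}


def run_classifiers(data: pd.DataFrame) -> pd.DataFrame:
    """Run every registered classifier and add one boolean column per classifier."""
    result = data.copy()
    for name, classifier_class in CLASSIFIERS.items():
        prediction = classifier_class().fit(data).predict(data)
        result[name] = prediction == -1  # True where the row looks suspicious
    # A row is suspicious overall if any classifier flagged it.
    result["suspicious"] = result[list(CLASSIFIERS)].any(axis=1)
    return result


data = pd.DataFrame({"document_id": [1, 2], "situation": ["ATIVA", "BAIXADA"]})
report = run_classifiers(data)
print(report)
```

Adding a new hypothesis then only means importing one more class and registering it; the loop stays the same.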

Slide 18

5222

Slide 19

github.com/datasciencebr
@jesstemporal
apoia.se/serenata