Slide 1

Slide 1 text

Rosie

Slide 2

Slide 2 text


Slide 3

Slide 3 text

So far
● 5 implemented classifiers
● >9k suspicious reimbursements found
● ~1k reports made
● >200 congresspeople reported

Slide 4

Slide 4 text

How do we get the data?
● Scraping
● APIs
  ○ ReceitaWS
  ○ Camara

Slide 5

Slide 5 text

I have the data, what do I do now?
● Develop a hypothesis
● Test it out
● Implement a classifier
● Report

Slide 6

Slide 6 text

Jupyter Notebooks

Slide 7

Slide 7 text

GitHub

Slide 8

Slide 8 text

The irregular companies classifier

Slide 9

Slide 9 text

Ativa (active)

Slide 10

Slide 10 text

Nula (null), Inapta (unfit), Baixada (closed), Suspensa (suspended)

Slide 11

Slide 11 text

import data

Slide 12

Slide 12 text

data.format()

Slide 13

Slide 13 text

data.head(5)
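
A minimal sketch of the import, format, and inspect steps from slides 11 to 13, assuming pandas. The column names and sample rows are hypothetical stand-ins, not the project's real dataset.

```python
import io

import pandas as pd

# Hypothetical sample rows; the real reimbursement data has many more columns.
SAMPLE_CSV = """document_id,cnpj_cpf,issue_date,total_net_value
1,02989654001197,2016-03-10,120.50
2,00000000000191,2016-04-01,80.00
3,02989654001197,2016-05-22,45.90
"""

data = pd.read_csv(
    io.StringIO(SAMPLE_CSV),
    dtype={"cnpj_cpf": str},     # keep the leading zeros of the CNPJ
    parse_dates=["issue_date"],  # parse dates so they can be compared later
)

# Look at the first rows before doing anything else.
print(data.head(5))
```

Reading the CNPJ as a string matters here: parsed as an integer, an identifier such as 00000000000191 would lose its leading zeros and no longer match the company records.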

Slide 14

Slide 14 text

pd.merge
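
One way the merge step could look: joining reimbursements to company records on the CNPJ. This is a sketch under assumptions; both tables and all column names (`cnpj_cpf`, `cnpj`, `situation`, `situation_date`) are hypothetical.

```python
import pandas as pd

# Hypothetical reimbursements, one row per receipt.
reimbursements = pd.DataFrame({
    "document_id": [1, 2, 3],
    "cnpj_cpf": ["11111111000111", "22222222000122", "33333333000133"],
    "issue_date": pd.to_datetime(["2016-03-10", "2016-04-01", "2016-05-22"]),
})

# Hypothetical company records with their registration status.
companies = pd.DataFrame({
    "cnpj": ["11111111000111", "22222222000122"],
    "situation": ["ATIVA", "BAIXADA"],
    "situation_date": pd.to_datetime(["2005-11-03", "2015-12-31"]),
})

# A left join keeps every reimbursement, even when no company record exists.
merged = pd.merge(
    reimbursements,
    companies,
    how="left",
    left_on="cnpj_cpf",
    right_on="cnpj",
)
print(merged[["document_id", "situation", "situation_date"]])
```

With `how="left"`, reimbursements without a matching company keep missing values in the company columns instead of being dropped.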

Slide 15

Slide 15 text

data.query()
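
A sketch of how the hypothesis might be expressed with `DataFrame.query`: keep expenses made at a company whose status was already irregular when the receipt was issued. The column names and rows are hypothetical.

```python
import pandas as pd

# Hypothetical merged data: each reimbursement with its company's status.
data = pd.DataFrame({
    "document_id": [1, 2, 3, 4],
    "situation": ["ATIVA", "BAIXADA", "INAPTA", "SUSPENSA"],
    "situation_date": pd.to_datetime(
        ["2005-11-03", "2015-12-31", "2017-01-15", "2014-06-30"]
    ),
    "issue_date": pd.to_datetime(
        ["2016-03-10", "2016-04-01", "2016-05-22", "2016-02-02"]
    ),
})

irregular_statuses = ["NULA", "INAPTA", "BAIXADA", "SUSPENSA"]

# "@name" lets the query string refer to a Python variable.
suspicious = data.query(
    "situation in @irregular_statuses and situation_date < issue_date"
)
print(suspicious["document_id"].tolist())
```

Document 3 is not selected in this sample because the company only became irregular after the receipt was issued.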

Slide 16

Slide 16 text

Classifier Validated Hypothesis

Slide 17

Slide 17 text

Fixtures
● Sample data
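
A fixture can be as small as a CSV with one row per case the classifier has to handle. The sketch below writes such a file to a temporary directory and loads it back; the file name, columns, and rows are all hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical fixture: active company, closed before the expense,
# and declared unfit only after the expense.
FIXTURE_CSV = """cnpj,situation,situation_date,issue_date
11111111000111,ATIVA,2005-11-03,2016-03-10
22222222000122,BAIXADA,2015-12-31,2016-04-01
33333333000133,INAPTA,2017-01-15,2016-05-22
"""

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "irregular_companies.csv")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(FIXTURE_CSV)

    # Load the fixture the same way the real data would be loaded.
    fixture = pd.read_csv(
        path,
        dtype={"cnpj": str},
        parse_dates=["situation_date", "issue_date"],
    )

print(fixture)
```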

Slide 18

Slide 18 text

Tests
● Test first, code later
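
A sketch of the test-first idea using the standard `unittest` module. The function `is_irregular` and its signature are hypothetical; in a test-first workflow the three tests would be written before it and fail until it exists.

```python
import unittest
from datetime import date

IRREGULAR_STATUSES = {"NULA", "INAPTA", "BAIXADA", "SUSPENSA"}


def is_irregular(situation, situation_date, issue_date):
    """Hypothetical rule: irregular status that predates the expense."""
    return situation in IRREGULAR_STATUSES and situation_date < issue_date


class TestIsIrregular(unittest.TestCase):
    def test_active_company_is_not_flagged(self):
        self.assertFalse(
            is_irregular("ATIVA", date(2005, 11, 3), date(2016, 3, 10))
        )

    def test_company_closed_before_expense_is_flagged(self):
        self.assertTrue(
            is_irregular("BAIXADA", date(2015, 12, 31), date(2016, 4, 1))
        )

    def test_company_unfit_after_expense_is_not_flagged(self):
        self.assertFalse(
            is_irregular("INAPTA", date(2017, 1, 15), date(2016, 5, 22))
        )


# Run the tests in-process so the example works in a script or a notebook.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestIsIrregular)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```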

Slide 19

Slide 19 text

Show me the code
● The classifier
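
A minimal sketch of what an irregular companies classifier could look like, assuming a scikit-learn-style `fit`/`predict` interface and the hypothetical columns `situation`, `situation_date`, and `issue_date`. It illustrates the idea and is not the project's actual code.

```python
import pandas as pd


class IrregularCompaniesSketch:
    """Flags expenses made at companies that were already irregular."""

    IRREGULAR_STATUSES = ("BAIXADA", "NULA", "SUSPENSA", "INAPTA")

    def fit(self, X):
        # Nothing to learn: the rule is fixed.
        return self

    def predict(self, X):
        # True where the status changed before the receipt was issued...
        changed_before_expense = X["situation_date"] < X["issue_date"]
        # ...and the status is one of the irregular ones.
        irregular = X["situation"].isin(self.IRREGULAR_STATUSES)
        return (changed_before_expense & irregular).to_numpy()


sample = pd.DataFrame({
    "situation": ["ATIVA", "BAIXADA", "INAPTA"],
    "situation_date": pd.to_datetime(
        ["2005-11-03", "2015-12-31", "2017-01-15"]
    ),
    "issue_date": pd.to_datetime(
        ["2016-03-10", "2016-04-01", "2016-05-22"]
    ),
})

predictions = IrregularCompaniesSketch().fit(sample).predict(sample)
print(predictions)
```

Rows with a missing status or date come out as not suspicious, since comparisons against missing values evaluate to false.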

Slide 20

Slide 20 text

What about Rosie?
● import Classifier
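
A sketch of how a runner like Rosie might apply an imported classifier: each registered classifier adds one boolean column to the dataset. The `run_classifiers` helper, the registry dictionary, and the column names are hypothetical and do not describe Rosie's real internals.

```python
import pandas as pd


class IrregularCompaniesSketch:
    """Hypothetical classifier, reduced to the rule itself."""

    def predict(self, X):
        irregular = X["situation"].isin(["BAIXADA", "NULA", "SUSPENSA", "INAPTA"])
        return ((X["situation_date"] < X["issue_date"]) & irregular).to_numpy()


def run_classifiers(data, classifiers):
    """Return a copy of data with one boolean column per classifier."""
    flagged = data.copy()
    for column, classifier in classifiers.items():
        flagged[column] = classifier.predict(data)
    return flagged


data = pd.DataFrame({
    "document_id": [1, 2],
    "situation": ["ATIVA", "BAIXADA"],
    "situation_date": pd.to_datetime(["2005-11-03", "2015-12-31"]),
    "issue_date": pd.to_datetime(["2016-03-10", "2016-04-01"]),
})

# Adding a new classifier means adding one entry to this mapping.
results = run_classifiers(data, {"irregular_company": IrregularCompaniesSketch()})
print(results[["document_id", "irregular_company"]])
```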

Slide 21

Slide 21 text

5222

Slide 22

Slide 22 text

/okfn-brasil @jesstemporal #serenataDeAmor serenata.ai Thank you!