WHAT IS SERENATA?
● The main goal: use artificial intelligence for social control of public administration
● We learned to work with data science using open data (CSVs that list reimbursements)
● Multidisciplinary team: scientists, programmers, marketers, and journalists
● Open source: more than 700 members in the Telegram group
HOW DID WE GET HERE?
● We ran a crowdfunding campaign to pay for 3 months of development
● Data science projects usually take 6 months to a year. What can we do in 3 months?
● Techniques: hypothesis-driven development and timeboxing
HDD: HYPOTHESES
● Hypothesis-Driven Development (HDD)
● Survey the hypotheses that might solve a problem
● Multidisciplinary team as a way to expand knowledge
TIMEBOXING
● List the hypotheses to explore
● Assign a time window to developing each one; if it doesn't work out within that window, switch to another hypothesis
● Come back to earlier hypotheses as time goes by
DEVELOPED HYPOTHESES
● We studied the available dataset and, from it, defined some hypotheses to test:
○ Non-standard prices on food
○ Traveled distance and spending
○ Invalid tax identification number
○ Monthly maximums (taxi, fuel, ...)
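As an illustration of how one of these hypotheses can become a concrete check, here is a minimal sketch for the "invalid tax identification number" case. It assumes the identifier is a 14-digit numeric Brazilian CNPJ, which the slides do not state, and uses the standard modulo-11 check digits. The function names are hypothetical and are not taken from the project's code.

```python
def cnpj_check_digit(digits):
    """Return the modulo-11 check digit for the 12 or 13 leading digits."""
    # Weights cycle through 2..9, starting from the rightmost digit.
    weights = [2 + (i % 8) for i in range(len(digits))][::-1]
    total = sum(d * w for d, w in zip(digits, weights))
    remainder = total % 11
    return 0 if remainder < 2 else 11 - remainder


def is_valid_cnpj(raw):
    """Check a CNPJ given with or without punctuation (assumed numeric format)."""
    digits = [int(c) for c in raw if c in "0123456789"]
    # Reject wrong lengths and repeated-digit strings such as 00000000000000,
    # which satisfy the arithmetic but are not real identifiers.
    if len(digits) != 14 or len(set(digits)) == 1:
        return False
    first = cnpj_check_digit(digits[:12])
    second = cnpj_check_digit(digits[:12] + [first])
    return digits[12:] == [first, second]


# Commonly used sample number; the second call has its last digit altered.
print(is_valid_cnpj("11.222.333/0001-81"))
print(is_valid_cnpj("11.222.333/0001-82"))
```

A check like this only shows that a number is malformed. Whether a well-formed number belongs to a real, active company is a separate question that needs an external registry.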
DEVELOPMENT CYCLE
● Jupyter notebook with initial analysis
● Script for parsing the entire database
● Training an initial model
● Retraining after a time period
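The train-then-retrain part of this cycle can be sketched with a deliberately simple stand-in model. The slides do not say which model was used, so the z-score rule below, the class name PriceOutlierModel, and the sample values are all assumptions made for illustration, applied to the "non-standard prices on food" hypothesis.

```python
from statistics import mean, pstdev


class PriceOutlierModel:
    """Toy model: flags values far above the mean seen at training time."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold  # how many standard deviations count as suspicious
        self.mean_ = None
        self.std_ = None

    def fit(self, values):
        # "Training" here is just estimating the mean and spread of past prices.
        self.mean_ = mean(values)
        self.std_ = pstdev(values)
        return self

    def is_suspicious(self, value):
        if self.std_ == 0:
            return value > self.mean_
        return (value - self.mean_) / self.std_ > self.threshold


# Initial training on made-up meal prices.
history = [20, 25, 30, 22, 28, 24, 26, 27, 23, 25]
model = PriceOutlierModel().fit(history)
print(model.is_suspicious(300), model.is_suspicious(27))

# Retraining after a time period: refit on the old data plus the new records.
new_records = [26, 24, 29, 31]
model.fit(history + new_records)
```

Refitting on the full history keeps the example short. A real pipeline would also need to decide whether reimbursements already flagged as suspicious should be left out of the retraining data.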