Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Predicting irregularities in public bidding: an...
Search
Thiago Marzagão
May 28, 2017
Research
0
3.3k
Predicting irregularities in public bidding: an application of neural networks
Thiago Marzagão
May 28, 2017
Tweet
Share
More Decks by Thiago Marzagão
See All by Thiago Marzagão
Aula inagural na ENAP
thiagomarzagao
0
1k
SICSS presentation
thiagomarzagao
0
970
antitrust uses and misuses (in the age of Big Data)
thiagomarzagao
1
1.9k
mineração de dados
thiagomarzagao
0
2.6k
mineração de dados no governo
thiagomarzagao
1
3.2k
Using AI to fight corruption in the Brazilian government
thiagomarzagao
0
290
Uso de Técnicas de Mineração de Dados no Monitoramento dos Gastos Públicos e no Combate à Corrupção
thiagomarzagao
0
3.1k
Mineração de Dados no Governo Federal
thiagomarzagao
0
130
Classificação Automatizada de Produtos e Serviços Licitados
thiagomarzagao
0
88
Other Decks in Research
See All in Research
EarthSynth: Generating Informative Earth Observation with Diffusion Models
satai
3
110
Adaptive fusion of multi-modal remote sensing data for optimal sub-field crop yield prediction
satai
3
220
Generative Models 2025
takahashihiroshi
21
12k
Creation and environmental applications of 15-year daily inundation and vegetation maps for Siberia by integrating satellite and meteorological datasets
satai
3
130
ストレス計測方法の確立に向けたマルチモーダルデータの活用
yurikomium
0
710
大規模な2値整数計画問題に対する 効率的な重み付き局所探索法
mickey_kubo
1
270
NLP2025参加報告会 LT資料
hargon24
1
320
Type Theory as a Formal Basis of Natural Language Semantics
daikimatsuoka
1
240
EarthMarker: A Visual Prompting Multimodal Large Language Model for Remote Sensing
satai
3
350
「エージェントって何?」から「実際の開発現場で役立つ考え方やベストプラクティス」まで
mickey_kubo
0
120
ノンパラメトリック分布表現を用いた位置尤度場周辺化によるRTK-GNSSの整数アンビギュイティ推定
aoki_nosse
0
330
数理最適化と機械学習の融合
mickey_kubo
15
8.9k
Featured
See All Featured
Documentation Writing (for coders)
carmenintech
72
4.9k
ReactJS: Keep Simple. Everything can be a component!
pedronauck
667
120k
Embracing the Ebb and Flow
colly
86
4.7k
How GitHub (no longer) Works
holman
314
140k
Agile that works and the tools we love
rasmusluckow
329
21k
Measuring & Analyzing Core Web Vitals
bluesmoon
7
510
[RailsConf 2023 Opening Keynote] The Magic of Rails
eileencodes
29
9.6k
Scaling GitHub
holman
460
140k
KATA
mclloyd
30
14k
10 Git Anti Patterns You Should be Aware of
lemiorhan
PRO
656
60k
The Power of CSS Pseudo Elements
geoffreycrofte
77
5.9k
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
138
34k
Transcript
Predicting irregularities in public bidding: an application of neural networks
Observatory of Public Spending
Government contractor doesn’t pay employees Default epidemy in the federal
government: 4 companies went bankrupt Construction company abandons 3 projects Observatory of Public Spending
Observatory of Public Spending what if we could predict which
contractors will become headaches?
Observatory of Public Spending
Observatory of Public Spending impossible to do manually ~25k new
contracts every year
Observatory of Public Spending
Observatory of Public Spending data + neural networks = predictions
Observatory of Public Spending data: - n = 10186 -
9442 (~93%) not problem - 744 (~ 7%) problem - 2011-2016
Observatory of Public Spending data: - Y: has the company
been punished before?
Observatory of Public Spending data: - X: a total of
183 attributes, like: - # of employees - average salary of employees - # of auctions it participated - donated $ to politicians? - …
Observatory of Public Spending neural networks: - two approaches: -
(“traditional”) neural network - deep neural network
Observatory of Public Spending TNN: - 2 hidden layers -
can’t handle 183 attributes - hence must use PCA first
Observatory of Public Spending TNN: - PCA - selected 24
continuous variables based on covariance matrix - PCA reduced 24 variables to 9 components (~70% of variance; all components w/ eigenvalue > 1)
Observatory of Public Spending TNN: - 9 components + 21
binary vars. - 80% training - w/ oversampling - 20% testing - boosting (10 models)
Observatory of Public Spending DNN: - 3 hidden layers -
hundreds of neurons - can handle all 183 variables - can handle complex relationships between the variables
Observatory of Public Spending DNN: - all 183 variables (no
PCA) - no oversampling - 80% training - 20% testing - 5-fold cross-validation
Observatory of Public Spending
Observatory of Public Spending how can we evaluate performance? -
accuracy (% of correct predictions overall) - recall (% of problems predicted to be problems) - precision (% of predicted problems that are problems)
Observatory of Public Spending how can we evaluate performance? -
accuracy (% of correct predictions overall) - recall (% of problems predicted to be problems) - precision (% of predicted problems that are problems)
Observatory of Public Spending results: - TNN precision: 0.24 -
DNN precision: 0.79 - huge difference! extra computational cost of DNN is worth it
Observatory of Public Spending to do: - improve recall -
0.58 w/ TNN - 0.26 w/ DNN - change the law - must allow gov not to contract w/ high risk companies
Observatory of Public Spending Ting Sun
[email protected]
Leonardo Sales
[email protected]
Observatory of Public Spending @tmarzagao thiagomarzagao.com