Predicting irregularities in public bidding: an application of neural networks
Thiago Marzagão
May 28, 2017
Transcript
Predicting irregularities in public bidding: an application of neural networks
Observatory of Public Spending
- Government contractor doesn’t pay employees
- Default epidemic in the federal government: 4 companies went bankrupt
- Construction company abandons 3 projects
what if we could predict which contractors will become headaches?
impossible to do manually: ~25k new contracts every year
data + neural networks = predictions
data:
- n = 10186
- 9442 (~93%) not problem
- 744 (~7%) problem
- 2011-2016
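The class counts above can be checked with a few lines of arithmetic. The sketch below uses only the numbers on the slide; the "always predict not problem" baseline is a hypothetical comparison point added here for illustration, not something the deck reports.

```python
# Counts as reported on the slide.
n_total = 10186
n_not_problem = 9442
n_problem = 744

# The two classes should cover every case.
assert n_not_problem + n_problem == n_total

share_not_problem = n_not_problem / n_total
share_problem = n_problem / n_total

# Hypothetical baseline: a classifier that always answers "not problem"
# is correct exactly as often as the majority class occurs.
baseline_accuracy = share_not_problem

print(f"not problem: {share_not_problem:.1%}")
print(f"problem:     {share_problem:.1%}")
print(f"always-'not problem' accuracy: {baseline_accuracy:.1%}")
```

With classes this unbalanced, a model can score high on overall accuracy while catching no problem contractors at all, which is one reason to look beyond accuracy.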
data:
- Y: has the company been punished before?
data:
- X: a total of 183 attributes, like:
  - # of employees
  - average salary of employees
  - # of auctions it participated in
  - donated $ to politicians?
  - …
neural networks:
- two approaches:
  - (“traditional”) neural network (TNN)
  - deep neural network (DNN)
TNN:
- 2 hidden layers
- can’t handle 183 attributes
- hence must use PCA first
TNN:
- PCA
  - selected 24 continuous variables based on covariance matrix
  - PCA reduced 24 variables to 9 components (~70% of variance; all components w/ eigenvalue > 1)
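The PCA step above (keep every component with eigenvalue > 1) can be sketched as follows. This is a minimal illustration, not the authors' code: the data are synthetic, the function name `pca_kaiser` is hypothetical, and standardizing the variables first is an assumption (the slide only says "covariance matrix"; an eigenvalue cutoff of 1 is usually applied to standardized variables).

```python
import numpy as np

def pca_kaiser(X):
    """Run PCA on standardized columns and keep components with eigenvalue > 1."""
    # Assumption: standardize each column, so the covariance matrix of Z
    # equals the correlation matrix of X and the eigenvalues sum to the
    # number of variables.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    cov = np.cov(Z, rowvar=False)

    # eigh returns eigenvalues in ascending order; sort them descending.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    keep = eigvals > 1.0
    scores = Z @ eigvecs[:, keep]                     # component scores per row
    explained = eigvals[keep].sum() / eigvals.sum()   # share of variance kept
    return scores, eigvals, explained

# Synthetic stand-in for the 24 continuous variables (not the real data):
# 500 rows driven by 3 latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
loadings = rng.normal(size=(3, 24))
X = latent @ loadings + rng.normal(scale=0.5, size=(500, 24))

scores, eigvals, explained = pca_kaiser(X)
print("components kept:", scores.shape[1])
print("share of variance kept:", round(float(explained), 2))
```

On the real data the slide reports 9 components covering about 70% of the variance; the synthetic example will give different numbers.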
TNN:
- 9 components + 21 binary vars.
- 80% training
  - w/ oversampling
- 20% testing
- boosting (10 models)
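The 80/20 split with oversampling on the training side might look like the sketch below. The slide does not say which oversampling method was used, so this assumes plain random oversampling with replacement until the two classes are the same size; the function name `split_and_oversample` is hypothetical, and boosting is not shown.

```python
import random

def split_and_oversample(labels, train_share=0.8, seed=0):
    """Shuffle, split into train/test, then balance the training set only."""
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)

    cut = int(train_share * len(indices))
    train, test = indices[:cut], indices[cut:]

    minority = [i for i in train if labels[i] == 1]
    majority = [i for i in train if labels[i] == 0]

    # Assumption: resample minority rows with replacement until the classes
    # are equally frequent. The test set is left untouched.
    n_extra = max(0, len(majority) - len(minority))
    extra = rng.choices(minority, k=n_extra)

    balanced_train = train + extra
    rng.shuffle(balanced_train)
    return balanced_train, test

# Labels with the class counts from the slides: 1 = problem, 0 = not problem.
labels = [1] * 744 + [0] * 9442
train_idx, test_idx = split_and_oversample(labels)

n_pos = sum(labels[i] for i in train_idx)
print("training rows after oversampling:", len(train_idx))
print("problem rows in training:", n_pos)
print("test rows:", len(test_idx))
```

Oversampling only after the split keeps duplicated minority rows out of the test set, so the test metrics still reflect the original class balance.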
DNN:
- 3 hidden layers
- hundreds of neurons
- can handle all 183 variables
- can handle complex relationships between the variables
DNN:
- all 183 variables (no PCA)
- no oversampling
- 80% training
- 20% testing
- 5-fold cross-validation
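The 5-fold cross-validation step can be sketched as an index generator. The slides do not say how cross-validation and the 80/20 split were combined; this sketch assumes the folds are cut from the training portion, and both the function name `k_fold_indices` and the row count `n_train` are illustrative.

```python
import random

def k_fold_indices(n, k=5, seed=0):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)

    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]

    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation

# Hypothetical training-set size: 80% of 10186, rounded down.
n_train = 8148
splits = list(k_fold_indices(n_train))

for fold_no, (training, validation) in enumerate(splits, start=1):
    print(f"fold {fold_no}: {len(training)} training rows, {len(validation)} validation rows")
```

Each row serves as validation data exactly once, so every training row contributes to the performance estimate.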
how can we evaluate performance?
- accuracy (% of correct predictions overall)
- recall (% of problems predicted to be problems)
- precision (% of predicted problems that are problems)
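The three measures above can be computed directly from true and predicted labels. The sketch below uses a made-up toy example (10 contractors, 3 of them problems); the function name `evaluate` and the labels are illustrative only.

```python
def evaluate(y_true, y_pred):
    """Return accuracy, recall and precision for binary labels (1 = problem)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # problems caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # problems missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correctly cleared

    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }

# Toy data: 3 real problems, the model flags 2 contractors and is right once.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]

metrics = evaluate(y_true, y_pred)
print(metrics)
```

Here accuracy is 0.7 even though only one of three problems is caught (recall of one third) and only half of the flags are correct (precision 0.5), which shows why the three measures are reported separately.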
results:
- TNN precision: 0.24
- DNN precision: 0.79
- huge difference! extra computational cost of DNN is worth it
to do:
- improve recall
  - 0.58 w/ TNN
  - 0.26 w/ DNN
- change the law
  - must allow gov not to contract w/ high risk companies
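To see what the reported precision and recall mean in practice, the sketch below converts them into approximate counts on a test set. This is a back-of-the-envelope illustration, not a result from the deck: it assumes the 20% test set holds 20% of the 744 problem cases, and the function name `implied_counts` is hypothetical.

```python
# Figures reported on the slides.
n_problem = 744
test_share = 0.20
reported = {
    "TNN": {"precision": 0.24, "recall": 0.58},
    "DNN": {"precision": 0.79, "recall": 0.26},
}

# Assumption: the test split preserves the overall share of problem cases.
problems_in_test = n_problem * test_share

def implied_counts(precision, recall, positives):
    """Translate precision and recall into approximate case counts."""
    caught = recall * positives      # true positives
    flagged = caught / precision     # everything the model labels "problem"
    return {
        "flagged": flagged,
        "caught": caught,
        "false_alarms": flagged - caught,
        "missed": positives - caught,
    }

implied = {name: implied_counts(m["precision"], m["recall"], problems_in_test)
           for name, m in reported.items()}

for name, counts in implied.items():
    summary = ", ".join(f"{key}={value:.0f}" for key, value in counts.items())
    print(f"{name}: {summary}")
```

Under these assumptions the TNN would flag roughly 360 contractors to catch about 86 problems, while the DNN would flag roughly 49 to catch about 39. The DNN wastes far less effort on false alarms but misses more problem contractors, which is the recall gap listed under "to do".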
Ting Sun
Leonardo Sales
@tmarzagao
thiagomarzagao.com