Thiago Marzagão
May 28, 2017
Predicting irregularities in public bidding: an application of neural networks
Transcript
Observatory of Public Spending
- Government contractor doesn’t pay employees
- Default epidemic in the federal government: 4 companies went bankrupt
- Construction company abandons 3 projects
what if we could predict which contractors will become headaches?
impossible to do manually: ~25k new contracts every year
data + neural networks = predictions
data:
- n = 10186
  - 9442 (~93%) not problem
  - 744 (~7%) problem
- 2011-2016
- Y: has the company been punished before?
- X: a total of 183 attributes, like:
  - # of employees
  - average salary of employees
  - # of auctions it participated in
  - donated $ to politicians?
  - …
neural networks:
- two approaches:
  - (“traditional”) neural network (TNN)
  - deep neural network (DNN)
TNN:
- 2 hidden layers
- can’t handle 183 attributes
- hence must use PCA first
- PCA:
  - selected 24 continuous variables based on covariance matrix
  - PCA reduced 24 variables to 9 components (~70% of variance; all components w/ eigenvalue > 1)
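
The component-selection rule on this slide (keep every component whose eigenvalue exceeds 1, then report the share of variance retained) can be sketched as follows. This is only an illustration: the function name `kaiser_select` and the six eigenvalues are hypothetical, not the values from the actual analysis, and the sketch assumes the eigenvalues have already been computed by a PCA step.

```python
def kaiser_select(eigenvalues):
    """Keep components with eigenvalue > 1; return (count kept, variance share)."""
    ordered = sorted(eigenvalues, reverse=True)
    kept = [ev for ev in ordered if ev > 1.0]
    # Share of total variance explained by the retained components.
    explained = sum(kept) / sum(ordered)
    return len(kept), explained


# Hypothetical eigenvalues for 6 variables (illustrative only, not the deck's data).
eigenvalues = [2.5, 1.5, 1.1, 0.5, 0.3, 0.1]
n_kept, share = kaiser_select(eigenvalues)
print(n_kept, round(share, 2))
```

In the deck's case the same rule kept 9 of 24 components, covering roughly 70% of the variance.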
TNN:
- 9 components + 21 binary vars.
- 80% training
  - w/ oversampling
- 20% testing
- boosting (10 models)
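
A minimal sketch of an 80/20 split with random oversampling of the minority class, under the assumption that oversampling is applied to the training portion only (the slide lists it under training, but does not say which oversampling method was used). The function name `split_and_oversample` and the toy labels are hypothetical.

```python
import random


def split_and_oversample(labels, train_frac=0.8, seed=0):
    """Return (train_idx, test_idx); minority rows are duplicated in train only."""
    rng = random.Random(seed)
    idx = list(range(len(labels)))
    rng.shuffle(idx)
    cut = int(train_frac * len(idx))
    train_idx, test_idx = idx[:cut], idx[cut:]

    pos = [i for i in train_idx if labels[i] == 1]
    neg = [i for i in train_idx if labels[i] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)

    extra = []
    if minority:
        # Sample minority rows with replacement until both classes are equal in size.
        extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]

    balanced = train_idx + extra
    rng.shuffle(balanced)
    return balanced, test_idx


# Toy labels mirroring the ~7% problem rate (1 = problem); not the real data.
labels = [1] * 7 + [0] * 93
train, test = split_and_oversample(labels)
print(len(train), len(test))
```

Keeping the test rows out of the resampling step means the 20% test set still reflects the real class balance.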
DNN:
- 3 hidden layers
- hundreds of neurons
- can handle all 183 variables
- can handle complex relationships between the variables
- all 183 variables (no PCA)
- no oversampling
- 80% training
- 20% testing
- 5-fold cross-validation
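
The 5-fold cross-validation step can be sketched as an index splitter. The slide does not say how cross-validation was combined with the 80/20 split; this sketch assumes the folds are drawn from the 80% training portion. The helper name `k_fold_indices` is hypothetical, and model fitting is left out.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs; each row is in exactly one validation fold."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train_idx, val_idx


# Assumed size of the training portion: 80% of the 10186 contracts.
n_train = int(0.8 * 10186)
splits = list(k_fold_indices(n_train, k=5))
for train_idx, val_idx in splits:
    print(len(train_idx), len(val_idx))
```

Each pass would train on four folds and validate on the fifth, so every training row is used for validation exactly once.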
how can we evaluate performance?
- accuracy (% of correct predictions overall)
- recall (% of problems predicted to be problems)
- precision (% of predicted problems that are problems)
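
The three measures can be written out directly from their definitions. The sketch below also shows why accuracy alone is misleading with this class balance: a trivial model that never flags anyone, applied to the deck's counts (744 problem, 9442 not problem), scores about 93% accuracy while catching nothing. The function name `evaluate` is hypothetical, and reporting undefined ratios as 0.0 is a convention chosen here.

```python
def evaluate(y_true, y_pred):
    """Accuracy, recall and precision for binary labels (1 = problem)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Undefined ratios (no problems, or nothing flagged) are reported as 0.0.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, recall, precision


# Class counts from the data slide; the "never flag" predictions are a baseline.
y_true = [1] * 744 + [0] * 9442
y_pred = [0] * len(y_true)
accuracy, recall, precision = evaluate(y_true, y_pred)
print(round(accuracy, 3), recall, precision)  # → 0.927 0.0 0.0
```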
results:
- TNN precision: 0.24
- DNN precision: 0.79
- huge difference! extra computational cost of DNN is worth it
to do:
- improve recall
  - 0.58 w/ TNN
  - 0.26 w/ DNN
- change the law
  - must allow gov not to contract w/ high-risk companies
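
One way to read the precision and recall figures together is to ask what each model would do per 100 contractors that truly turn out to be problems. This is a back-of-the-envelope sketch that takes the reported numbers at face value, even though the two models were trained and evaluated under different setups; the function name `per_100_problems` is hypothetical.

```python
def per_100_problems(precision, recall):
    """For 100 true problem contractors: (caught, total flagged, false alarms)."""
    caught = 100 * recall          # true positives
    flagged = caught / precision   # all contractors the model flags
    false_alarms = flagged - caught
    return caught, flagged, false_alarms


# Precision and recall as reported on the slides.
reported = {"TNN": (0.24, 0.58), "DNN": (0.79, 0.26)}
summary = {name: per_100_problems(p, r) for name, (p, r) in reported.items()}
for name, (caught, flagged, false_alarms) in summary.items():
    print(name, round(caught), round(flagged), round(false_alarms))
```

On these figures the TNN catches about 58 of every 100 problem contractors but raises roughly 184 false alarms to do so, while the DNN catches about 26 with roughly 7 false alarms.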
Ting Sun
Leonardo Sales
@tmarzagao thiagomarzagao.com