Predicting irregularities in public bidding: an application of neural networks

Slide 1

Slide 1 text

Predicting irregularities in public bidding: an application of neural networks Observatory of Public Spending

Slide 2

Slide 2 text

Government contractor doesn’t pay employees
Default epidemic in the federal government: 4 companies went bankrupt
Construction company abandons 3 projects

Slide 3

Slide 3 text

What if we could predict which contractors will become headaches?

Slide 4

Slide 4 text


Slide 5

Slide 5 text

impossible to do manually
~25k new contracts every year

Slide 6

Slide 6 text


Slide 7

Slide 7 text

data + neural networks = predictions

Slide 8

Slide 8 text

data:
- n = 10186
- 9442 (~93%) not problem
- 744 (~7%) problem
- 2011-2016

Slide 9

Slide 9 text

data:
- Y: has the company been punished before?

Slide 10

Slide 10 text

data:
- X: a total of 183 attributes, like:
  - # of employees
  - average salary of employees
  - # of auctions it participated in
  - donated $ to politicians?
  - …

Slide 11

Slide 11 text

neural networks:
- two approaches:
  - (“traditional”) neural network
  - deep neural network

Slide 12

Slide 12 text

TNN:
- 2 hidden layers
- can’t handle 183 attributes
- hence must use PCA first

Slide 13

Slide 13 text

TNN:
- PCA
  - selected 24 continuous variables based on covariance matrix
  - PCA reduced 24 variables to 9 components (~70% of variance; all components w/ eigenvalue > 1)
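The eigenvalue > 1 rule on this slide (often called the Kaiser criterion) can be sketched as below. This is only an illustration, not the authors' code: it assumes NumPy is available, the function name `pca_kaiser` is hypothetical, and the data is synthetic stand-in data rather than the 24 variables from the talk.

```python
import numpy as np

def pca_kaiser(X):
    """Project X onto the principal components whose eigenvalue exceeds 1."""
    # Standardize each column so the covariance matrix is the correlation matrix.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = np.cov(Z, rowvar=False)
    # eigh is meant for symmetric matrices; eigenvalues come back in ascending order.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0  # keep only components with eigenvalue > 1
    explained = eigvals[keep].sum() / eigvals.sum()
    return Z @ eigvecs[:, keep], eigvals, explained

# Synthetic stand-in data (an assumption): 500 rows, 24 correlated continuous variables.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 4))
X = latent @ rng.normal(size=(4, 24)) + rng.normal(size=(500, 24))

components, eigvals, explained = pca_kaiser(X)
print(components.shape, round(float(explained), 2))
```

Because the variables are standardized, the eigenvalues sum to the number of variables, so an eigenvalue above 1 means a component carries more variance than any single original variable.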

Slide 14

Slide 14 text

TNN:
- 9 components + 21 binary vars.
- 80% training
  - w/ oversampling
- 20% testing
- boosting (10 models)
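A minimal sketch of an 80/20 split with oversampling, assuming the oversampling is plain random duplication of minority-class rows and is applied to the training portion only (the slides do not say which oversampling method was used). The function name `split_and_oversample` is hypothetical; the class counts are the ones reported earlier in the deck.

```python
import random

def split_and_oversample(labels, train_frac=0.8, seed=0):
    """Return (train_idx, test_idx); the minority class is oversampled in train only."""
    rng = random.Random(seed)
    idx = list(range(len(labels)))
    rng.shuffle(idx)
    cut = int(train_frac * len(idx))
    train, test = idx[:cut], idx[cut:]

    pos = [i for i in train if labels[i] == 1]
    neg = [i for i in train if labels[i] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)

    # Draw minority rows with replacement until both classes have the same count.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = train + extra
    rng.shuffle(balanced)
    return balanced, test

# 1 = problem, 0 = not problem, using the class counts from the data slide.
labels = [1] * 744 + [0] * 9442
train_idx, test_idx = split_and_oversample(labels)
print(len(train_idx), len(test_idx))
```

The test rows are left untouched, so the held-out 20% keeps the original class balance.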

Slide 15

Slide 15 text

DNN:
- 3 hidden layers
- hundreds of neurons
- can handle all 183 variables
- can handle complex relationships between the variables

Slide 16

Slide 16 text

DNN:
- all 183 variables (no PCA)
- no oversampling
- 80% training
- 20% testing
- 5-fold cross-validation
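The 5-fold cross-validation step can be sketched as follows, assuming it is run on the 80% training portion (the slides do not specify this) and using plain shuffled folds. The helper `k_fold_indices` is hypothetical, and the model-fitting step is left out.

```python
import random

def k_fold_indices(n, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs; each row is used for validation exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, val

# 8148 rows is 80% of the 10186 contracts, rounded down.
splits = list(k_fold_indices(8148, k=5))
for train, val in splits:
    print(len(train), len(val))
```

With only about 7% positive cases, a stratified variant that keeps the class ratio equal across folds would be a reasonable alternative.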

Slide 17

Slide 17 text


Slide 18

Slide 18 text

how can we evaluate performance?
- accuracy (% of correct predictions overall)
- recall (% of actual problems that were predicted to be problems)
- precision (% of predicted problems that are actual problems)
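The three metrics can be computed from the four confusion counts, as in this sketch. The function names are hypothetical, and the example labels are built only from the class counts on the data slide, not from the real dataset.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives; 1 = problem, 0 = not problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def evaluate(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        # Recall and precision are undefined (None) when their denominator is zero.
        "recall": tp / (tp + fn) if tp + fn else None,
        "precision": tp / (tp + fp) if tp + fp else None,
    }

# A model that never flags anyone, scored against the reported class counts.
y_true = [1] * 744 + [0] * 9442
baseline = evaluate(y_true, [0] * len(y_true))
print(baseline)
```

A model that never flags any company scores about 93% accuracy here with zero recall, which is why accuracy alone says little on data this imbalanced.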

Slide 19

Slide 19 text

Slide 20

Slide 20 text

results:
- TNN precision: 0.24
- DNN precision: 0.79
- huge difference! extra computational cost of DNN is worth it

Slide 21

Slide 21 text

to do:
- improve recall
  - 0.58 w/ TNN
  - 0.26 w/ DNN
- change the law
  - must allow gov not to contract w/ high risk companies

Slide 22

Slide 22 text

Ting Sun [email protected]
Leonardo Sales [email protected]

Slide 23

Slide 23 text

@tmarzagao
thiagomarzagao.com