Telling the future with mathematics: Probability and crime prediction

Summer School of Science, Požega, Croatia, 2019-08-04.

If you've ever watched late-night TV, you've probably seen fortune tellers claiming they can tell the future if you call them, usually with the help of some magic instruments. While those claims are far-fetched, mathematicians can actually use probability to quantify how likely an outcome is. Furthermore, we can use our knowledge of probability to help us understand when and where crime will (potentially) happen so police departments can work pre-emptively. By combining mathematical models with data from social media platforms such as Twitter, mathematics can help police departments battle crime before it even happens. In this swapshop, we'll discuss how this application of mathematics works and we'll try to build some simplified models for prediction.

Errata: There have been some concerns regarding predictive policing, including concerns about discrimination and differing opinions on its role in combatting discrimination, some of them voiced by the Human Rights Data Analysis Group. While working on this swapshop, I was not aware of such issues, nor was it my intent to promote an inherently biased tool: I was working on the (wrong) assumption that the data collected is perfectly unbiased. However, I did aim to show in the task handouts how humans are inherently biased when intuitively interpreting data. I am sorry if this came across wrong, and obviously I should have put more thought into the human rights aspect of this topic.

Mario Borna Mjertan

August 04, 2019

Transcript

  1. Quick question... How many of you are from Croatia? (or nearby)

    Image credit: Nick Savchenko, Wikimedia Commons, CC-BY-SA
  2. Here’s what we’ll be doing.

    • Basic probability (or: how can maths predict anything?)
    • Text mining (or: what can your DMs tell us?)
    • Spatio-temporal generalized additive models (or: how can we put this all together to tell the future?)
    • Building our own small, localised models (or: DIY N3MBERS)
  3. For instance...

    • There are 52 cards in a deck. There are 13 hearts.
    • We draw one card. What’s the probability that it’s a heart?
    • nw = 13 (favourable outcomes)
    • np = 52 (possible outcomes)
    • p = nw / np = 13/52 = 0.25 = 25% = ¼
  4. For instance...

    • There are 52 cards in a deck. There are 13 hearts.
    • We draw four cards without putting any back. What’s the probability that they’re all hearts?
    • nw = 13*12*11*10
    • np = 52*51*50*49
    • p = nw / np ≈ 0.00264 ≈ 0.26%
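
The two card calculations above follow the same pattern: at each draw, divide the number of favourable cards still in the deck by the number of cards still in the deck, then multiply the results. A minimal sketch of that arithmetic, assuming cards are drawn without replacement (the function name `p_all_favourable` is made up for this illustration):

```python
from fractions import Fraction


def p_all_favourable(favourable, total, draws):
    """Probability that every one of `draws` cards is a favourable one.

    Assumes drawing without replacement: after each draw there is one
    fewer favourable card and one fewer card overall.
    """
    p = Fraction(1)
    for i in range(draws):
        p *= Fraction(favourable - i, total - i)
    return p


p_one = p_all_favourable(13, 52, 1)   # one draw
p_four = p_all_favourable(13, 52, 4)  # four draws in a row

print(p_one, float(p_one))
print(p_four, float(p_four))
```

Using `Fraction` keeps the result exact, so the rounding only happens when we convert to a decimal at the end.
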
  5. Try it out yourselves.

    • A phone number is made out of:
      • a three-digit network code
      • a six-digit user number
      How many nine-digit phone numbers can be made if the network code is made up of the digits 0, 7, 8 or 9, while the user number is made up of the digits 0 through 9?
    • A railroad network consists of 30 stations. How many different one-way tickets can that network sell?
    • We throw a six-sided die three times. What are the chances that we get a sum of 13?
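
Once you have tried the tasks by hand, one way to check your answers is to let a computer count. The sketch below rests on assumptions the slide leaves open: digits may repeat within a phone number, and a one-way ticket is an ordered pair of two different stations (A to B is a different ticket from B to A).

```python
from fractions import Fraction
from itertools import permutations, product

# Task 1: three network-code digits from {0, 7, 8, 9}, then six user digits
# from {0, ..., 9}. Assumption: digits may repeat.
network_digits = "0789"
user_digits = "0123456789"
phone_numbers = len(network_digits) ** 3 * len(user_digits) ** 6

# Task 2: count ordered pairs (from, to) of different stations.
stations = range(30)
tickets = sum(1 for _ in permutations(stations, 2))

# Task 3: list all 6*6*6 outcomes of three throws and count those summing to 13.
rolls = list(product(range(1, 7), repeat=3))
hits = sum(1 for roll in rolls if sum(roll) == 13)
p_sum_13 = Fraction(hits, len(rolls))

print(phone_numbers, tickets, p_sum_13)
```

The dice task is the same "favourable over possible" idea as the card examples, only here the favourable outcomes are counted by listing every possibility.
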
  6. What do we need to take into account?

    Just plotting crime on a map won’t help us predict where crime will happen.
  7. What do we need to take into account?

    • Usually, we draw layers on top of a map:
      • The base layer
      • Land use
      • Demographics
      • Crime initiatives
      • Transport networks
      • Regeneration areas
      • Crime incidents
    • Also, we usually take into account the motives behind crime (like police officers do)
  8. A hotspot is a geographical area of higher-than-average crime. It is an area of crime concentration, relative to the distribution of crime across the whole region of interest (e.g. a city centre, census ward or tract, municipal district, county or state). Hotspots are often clusters of crime that can exist at different scales of interest.
  9. Crime types on the map: Murder, Fight, Drug abuse, Train accident, Speeding

    • Murder – around the river
    • Fights – near nightclubs, in the city centre
    • Drug abuse – parks, low population area
    • Train accidents – near the railroad
    • Speeding – long roads in a good state
  10. This helps us:

    • Know where crime hotspots for certain crime types are
    • Make decent arguments about why crime happens where it does
      • for instance, drug abuse happens in low population areas and parks because there’s less chance of getting reported to the police
      • fights happen around nightclubs because people get drunk and emotional etc.
    • Know where to deploy patrol cars efficiently
  11. But that’s not enough.

    We need to be confident that an area really is a crime hotspot.
  12. We also need:

    • A formal, mathematical way of finding out where the hotspots are
    • A confidence measurement
    • A way to do everything fast
  13. Continuous surface smoothing methods

    • We can visualise the distribution of crime and identify hotspots by creating a smooth continuous surface to represent the density or volume of crimes distributed across the study area.
      → Interpolation techniques include inverse distance weighting and kriging.
    • These techniques need a measured value (such as population or intensity) at each point; with crime data, we usually don’t have one, and we’re not trying to estimate a number of crimes.
    • What’s a more suitable method?
  14. Quartic kernel density estimation

    • The quartic kernel density estimation creates a smooth surface of the variation in the density of point events across an area.
    • The method is explained in these steps:
      • A fine grid is generated over the point distribution
      • A moving 3D function of a specified radius visits each cell and calculates weights for each point within the kernel’s radius
      • Points closer to the centre will receive a higher weight, and therefore contribute more to the cell’s total density value
      • Final grid cell values are calculated by summing the values of all kernel estimates for each location
    • We get a result which tells us something about the density or clustering of crime points at all locations in our study area.
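
The steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `quartic_kde`, the grid spacing, the radius and the five sample points are all invented, and the weight used is the standard 2D quartic (biweight) kernel, (3 / (π r²)) · (1 − d²/r²)² for points at distance d ≤ r from the cell and 0 otherwise.

```python
import math


def quartic_kde(points, xs, ys, radius):
    """Return grid[row][col]: the summed quartic kernel weights at cell (xs[col], ys[row])."""
    scale = 3.0 / (math.pi * radius ** 2)
    grid = []
    for y in ys:                      # visit every cell of the grid
        row = []
        for x in xs:
            total = 0.0
            for px, py in points:     # look at every crime point
                u2 = ((x - px) ** 2 + (y - py) ** 2) / radius ** 2
                if u2 <= 1.0:         # only points inside the kernel's radius count
                    total += (1.0 - u2) ** 2   # closer points get a higher weight
            row.append(scale * total)
        grid.append(row)
    return grid


# Made-up data: four incidents close together and one on its own.
incidents = [(2.0, 2.0), (2.2, 1.9), (1.8, 2.1), (2.1, 2.3), (8.0, 8.0)]
xs = [i * 0.5 for i in range(21)]     # grid coordinates 0.0, 0.5, ..., 10.0
ys = [i * 0.5 for i in range(21)]
density = quartic_kde(incidents, xs, ys, radius=1.5)

print(density[4][4])     # cell at (2, 2), inside the cluster
print(density[16][16])   # cell at (8, 8), next to the single incident
print(density[10][10])   # cell at (5, 5), far from every incident
```

The radius plays the role of the "specified radius" in the steps: a larger radius gives a smoother surface, a smaller one gives sharper peaks around individual incidents.
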
  15. K-means clustering

    • A general method for finding clusters of data
    • This can be useful here as well, for instance in our 2D mapping
    • However, actually implementing the algorithm here can be challenging
      • Finding the optimal clustering is NP-hard → solving it exactly is slow
    • It’s a method of vector quantization
    • Given an initial set of k means m1, ..., mk, the algorithm proceeds by alternating between two steps
      • Assign each observation to the cluster whose mean has the least squared Euclidean distance
      • Calculate the new means of the observations in the new clusters
    • The algorithm has converged when the assignments no longer change, but it isn’t guaranteed to find the optimum
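
The two alternating steps can be written down directly. The sketch below is a bare-bones version for 2D points, assuming the initial means are simply k randomly chosen observations (the slide does not say how to pick them); the names `kmeans` and `sq_dist` and the six sample points are invented for the illustration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    """Alternate between assigning points and recomputing means until nothing changes."""
    rng = random.Random(seed)
    means = rng.sample(points, k)     # assumption: start from k random observations
    assignment = None
    for _ in range(max_iter):
        # Step 1: assign each observation to the cluster with the nearest mean.
        new_assignment = [
            min(range(k), key=lambda j: sq_dist(p, means[j])) for p in points
        ]
        if new_assignment == assignment:
            break                     # converged: assignments no longer change
        assignment = new_assignment
        # Step 2: recompute each mean from the observations assigned to it.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:               # keep the old mean if a cluster ends up empty
                means[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return means, assignment


# Made-up incident locations forming two obvious groups.
incidents = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
means, labels = kmeans(incidents, k=2)
print(means)
print(labels)
```

Each pass through the loop is cheap; what is hard is guaranteeing the best possible clustering, which is why a different starting choice can sometimes lead to a different (worse) result.
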
  16. Let’s compare these two chats.

    A friendly chat between two gossipy friends. No signs of aggression.
  17. Let’s compare these two chats.

    Clear signs of aggression. Loudly voicing his/her displeasure. (This one is fake.)
  18. We could...

    • Collect and analyse text messages
    • Assign a score of “harmfulness” to them
    • Assign a category of potential crime
    • Alert services
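
To make the scoring and categorising steps concrete, here is a deliberately naive toy: it looks up each word of a message in a small hand-made word list and adds up weights. Everything in it is hypothetical (the word list, the weights, the category names and the function `score_message`); a real system would learn from data rather than use a fixed list, and simple keyword matching like this easily misreads jokes, quotes and context.

```python
import re

# Hypothetical, hand-made lexicon: word -> (weight, category of potential crime).
LEXICON = {
    "hate": (2, "harassment"),
    "hurt": (3, "violence"),
    "kill": (5, "violence"),
    "steal": (4, "theft"),
}


def score_message(text):
    """Return (harmfulness score, most likely category or None) for one message."""
    words = re.findall(r"[a-z']+", text.lower())
    score = 0
    per_category = {}
    for word in words:
        if word in LEXICON:
            weight, category = LEXICON[word]
            score += weight
            per_category[category] = per_category.get(category, 0) + weight
    # The category with the largest total weight wins; None if nothing matched.
    category = max(per_category, key=per_category.get) if per_category else None
    return score, category


print(score_message("See you at the cinema tonight!"))
print(score_message("I will hurt you, I hate you"))
```
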
  19. How does machine learning work?

    • A lot of complicated maths
      • Linear algebra, tensors etc. are really useful here!
    • There are some general tools that can help us
      • For example, TensorFlow is a great framework for building machine learning services
    • However, it comes at a cost
      • Expensive
      • Takes a lot of existing data to draw a meaningful conclusion
      • Takes a lot of time and computer resources
  20. I know you’re going what the...

    • Let’s break it down.
    • Spatial
      • We take into account the place where the crime could potentially occur
    • Temporal
      • We take into account the time when the crime could potentially occur
    • Additive
      • A method in statistics
      • Nonparametric regression method
      • It uses a 1D smoother to build a restricted class of nonparametric regression models
      • In translation, it takes some data, leaves out noise and random incidents and builds out ‘different classes’ of crime
    • Model
      • Basically, a way to use maths to predict real life
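
To give a feel for the "additive" part, the sketch below fits a model of the form y ≈ alpha + f1(x1) + f2(x2), where each f is produced by a 1D smoother. It is an illustration only, not the spatio-temporal model itself: the smoother (a running mean over neighbouring values), the fitting loop (known as backfitting), the function names and the synthetic data are all choices made for this example.

```python
import math
import random


def running_mean_smoother(x, r, window=15):
    """1D smoother: replace each r[i] by the mean of r over its neighbours in x-order."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    half = window // 2
    smooth = [0.0] * len(x)
    for rank, i in enumerate(order):
        neighbours = order[max(0, rank - half):rank + half + 1]
        smooth[i] = sum(r[j] for j in neighbours) / len(neighbours)
    return smooth


def fit_additive(columns, y, n_iter=20):
    """Backfitting: repeatedly smooth what the other components leave unexplained."""
    n = len(y)
    alpha = sum(y) / n
    f = [[0.0] * n for _ in columns]
    for _ in range(n_iter):
        for j, x in enumerate(columns):
            partial = [
                y[i] - alpha - sum(f[k][i] for k in range(len(columns)) if k != j)
                for i in range(n)
            ]
            fj = running_mean_smoother(x, partial)
            mean_fj = sum(fj) / n
            f[j] = [v - mean_fj for v in fj]   # centre each component around zero
    return alpha, f


# Synthetic data: a smooth effect of x1, a smooth effect of x2, plus random noise.
rng = random.Random(0)
n = 200
x1 = [rng.random() for _ in range(n)]
x2 = [rng.random() for _ in range(n)]
y = [math.sin(2 * math.pi * a) + 4 * (b - 0.5) ** 2 + rng.gauss(0, 0.1)
     for a, b in zip(x1, x2)]

alpha, f = fit_additive([x1, x2], y)
fitted = [alpha + f[0][i] + f[1][i] for i in range(n)]
mean_y = sum(y) / n
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # what the model misses
tss = sum((yi - mean_y) ** 2 for yi in y)                # total variation in y
print(rss / tss)
```

The smoother is what "leaves out noise": averaging over neighbours flattens random ups and downs while keeping the overall shape of each effect.
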
  21. However...

    • This is complicated
    • We’d really need a strong statistics background
    • We’ll be going for an intuitive understanding of how this works
    • I don’t want to torture you with university-level maths here
  22. We’re done, I guess.

    FEEL FREE TO CONTACT ME FOR ANY MATHS-RELATED QUESTIONS (ESPECIALLY IF YOU’D LIKE TO STUDY MATHS)
    EMAIL ME: [email protected]
    DM ME: @MARIOBORNAMJERTAN ON MESSENGER, @MBMJERTAN ON INSTAGRAM, 00385958455325 ON WHATSAPP