
Particles mining at CERN: Turning a mountain of data into a molehill

Talk by Ellie Dobson, Application Engineer at MathWorks, at the Data Science London meetup

Data Science London

January 11, 2014

Transcript

  1. Particle mining …making a mountain of data into a molehill

    Ellie Dobson, particle physicist and MathWorks Application Engineer
  2. Several thousand billion protons… at 99.9999991% of light speed… orbiting the ring 11 000 times/s. Colliding particles at the LHC: 1 collision every 25 ns
  3. New fundamental particles may be produced in the collision, and it is the job of a particle physicist to look for them
  4. [Slide background: a stream of raw binary data] Potential rate of incoming data is 1 PB per second
  5. 4 leptons with momenta above X GeV; invariant mass of lepton pairs above Y GeV; invariant mass of all leptons below Z GeV
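
The invariant-mass quantities in these criteria follow from the leptons' four-momenta. Below is a minimal sketch, assuming each lepton is described by transverse momentum, pseudorapidity and azimuthal angle and is treated as massless by default; the function names and this parametrisation are illustrative choices, not taken from the talk.

```python
import math

def four_momentum(pt, eta, phi, mass=0.0):
    """Return (E, px, py, pz) in GeV from transverse momentum, pseudorapidity and azimuth."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return (energy, px, py, pz)

def invariant_mass(particles):
    """Invariant mass of a group of four-momenta: sqrt(E^2 - |p|^2) of their sum."""
    energy = sum(p[0] for p in particles)
    px = sum(p[1] for p in particles)
    py = sum(p[2] for p in particles)
    pz = sum(p[3] for p in particles)
    m_squared = energy * energy - (px * px + py * py + pz * pz)
    # Rounding can push m^2 slightly below zero for collinear massless particles.
    return math.sqrt(max(m_squared, 0.0))

# Two made-up back-to-back leptons, each with 45 GeV of transverse momentum.
pair = [four_momentum(45.0, 0.0, 0.0), four_momentum(45.0, 0.0, math.pi)]
print(f"pair mass: {invariant_mass(pair):.1f} GeV")
```

The same `invariant_mass` call works for a lepton pair or for all four leptons, which is how one function can serve both the Y and Z criteria.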
  6. X, Y and Z are optimised from Monte Carlo simulation… 1. Matrix element calculation 2. Fragmentation 3. Hadronization 4. Detector simulation
  7. A prescription is built up to represent the search criteria that are applied to each collision: particle collision recorded; no noise bursts; data cleaning selection; 4 leptons > 15 GeV; invariant mass > 60 GeV; missing energy < 30 GeV; …
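
Such a prescription can be sketched as an ordered list of cuts applied to every recorded collision. The thresholds below are the ones on the slide; the event layout (a dictionary with `noise_burst`, `lepton_pt`, `inv_mass` and `missing_et` keys) is hypothetical, and reading "4 leptons" as "at least four" is an assumption.

```python
# Ordered selection cuts; each entry is (label, predicate on one event).
CUTS = [
    ("no noise bursts", lambda ev: not ev["noise_burst"]),
    # Assumption: "4 leptons > 15 GeV" is read as "at least four".
    ("4 leptons > 15 GeV", lambda ev: sum(pt > 15.0 for pt in ev["lepton_pt"]) >= 4),
    ("invariant mass > 60 GeV", lambda ev: ev["inv_mass"] > 60.0),
    ("missing energy < 30 GeV", lambda ev: ev["missing_et"] < 30.0),
]

def cutflow(events):
    """Apply the cuts in order; return the survivors and the event count after each cut."""
    counts = [("all recorded", len(events))]
    surviving = list(events)
    for label, passes in CUTS:
        surviving = [ev for ev in surviving if passes(ev)]
        counts.append((label, len(surviving)))
    return surviving, counts

# Two made-up events: the first passes everything, the second has too few hard leptons.
events = [
    {"noise_burst": False, "lepton_pt": [20.0, 18.0, 17.0, 16.0], "inv_mass": 125.0, "missing_et": 10.0},
    {"noise_burst": False, "lepton_pt": [20.0, 18.0, 17.0, 10.0], "inv_mass": 125.0, "missing_et": 10.0},
]
selected, counts = cutflow(events)
for label, n in counts:
    print(f"{label:>25}: {n}")
```

Keeping the per-cut counts makes it easy to see which criterion removes the most events when the thresholds are tuned.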
  8. [Slide background: a stream of raw binary data] Potential rate of incoming data is 1 PB per second
  9. The first line of defence is a hardware trigger clocking at 25 ns, which reduces the rate from 40 MHz to 100 kHz. Requirement: a particle collision with at least 1 lepton
  10. This is backed up by a farm of software triggers running in real time, which reduces the rate from 100 kHz to 200 Hz. Requirement: 4 leptons > 15 GeV
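
The rates quoted for the two trigger levels imply rejection factors of roughly 400 and 500, so about 1 collision in 200 000 is kept. The arithmetic can be sketched as follows; only the 25 ns spacing and the 100 kHz and 200 Hz output rates come from the slides, and the function name is made up for this example.

```python
# Rates quoted on the slides.
BUNCH_SPACING_S = 25e-9            # one collision every 25 ns
HARDWARE_TRIGGER_OUT_HZ = 100e3    # 100 kHz after the hardware trigger
SOFTWARE_TRIGGER_OUT_HZ = 200.0    # 200 Hz after the software trigger farm

def rejection_factors(input_rate_hz, stage_output_rates_hz):
    """Return the factor by which each successive stage reduces the event rate."""
    factors = []
    rate = input_rate_hz
    for output_rate in stage_output_rates_hz:
        factors.append(rate / output_rate)
        rate = output_rate
    return factors

collision_rate_hz = 1.0 / BUNCH_SPACING_S   # about 40 MHz
factors = rejection_factors(
    collision_rate_hz, [HARDWARE_TRIGGER_OUT_HZ, SOFTWARE_TRIGGER_OUT_HZ]
)
overall = collision_rate_hz / SOFTWARE_TRIGGER_OUT_HZ
print(f"hardware: 1 in {factors[0]:.0f}, software: 1 in {factors[1]:.0f}, "
      f"overall: 1 in {overall:.0f}")
```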
  11. The recorded data is sent to a worldwide computing grid, where a gigantic map reduce is performed on the data (~10 PB per year)
  12. Raw format: 000100010101….
      Bit bashing
      Hits format (Time stamp, Sensor ID): 1372964258, 3479685; 1372964258, 5328950; 1372964269, 54389
      Tracking and clustering
      ESD format: Collision ID 5374890
        Track container (angle, pointing to): Track 1: 50 GeV, v1; Track 2: 42 GeV, v2
        Cluster container (angle, some params): Shower 1: 45°, …; Shower 2: 3°, …
      Collision ID 5374890
  13. A particle has various likely characteristics: shower shapes, tracking, decay length… Machine learning is used to distinguish different particles from one another.
      Final data format: Collision ID 5374890
        Electron container (energy, angle): Electron 1: 50 GeV, 69°; Electron 2: 42 GeV, 89°; Electron 3: 42 GeV, 89°
        Muon container: Muon 2: 2 GeV, 24°
      Collision ID 5374891…
  14. The user runs parallel grid jobs on each event to select those with the right signatures. Data flow of ~9 GB/s. Data cleaning selection; invariant mass > 60 GeV; missing energy < 30 GeV
  15. For selected events, the topological information is condensed into a cleverly chosen variable… where the signal can be seen above the background. [Plot label: Signal]
  16. Comparing to theory. The ‘photo’ of the Higgs decay products is blurred by the detector. Luckily, we took plenty of shots. The distribution of images can be unfolded to regain the original distribution:
      Observed = R × Truth, where R is the resolution matrix
      Truth = R⁻¹ × Observed
      Inverting the matrix doesn’t converge! Need regularization, Monte Carlo training… or otherwise use smearing to verify your simulation is correct
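
The two options on this last slide can be sketched in a few lines: smearing applies R to a truth distribution, while a regularised unfolding replaces the direct inverse with a damped least-squares solve. The sketch below uses Tikhonov regularisation with an identity penalty, which is one simple choice and not necessarily the method used in the analysis; the 3-bin resolution matrix, the truth histogram and the strength `tau` are made-up illustration values, and the small linear solver assumes a non-singular system.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector; with R this is the smearing step."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def unfold_tikhonov(resolution, observed, tau):
    """Minimise |R t - o|^2 + tau |t|^2 by solving (R^T R + tau I) t = R^T o."""
    rows, n = len(resolution), len(resolution[0])
    transposed = [list(col) for col in zip(*resolution)]
    normal = [
        [sum(transposed[i][k] * resolution[k][j] for k in range(rows))
         + (tau if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    return solve(normal, matvec(transposed, observed))

# Made-up example: each bin leaks 20% of its content into each neighbouring bin.
resolution = [[0.8, 0.2, 0.0], [0.2, 0.6, 0.2], [0.0, 0.2, 0.8]]
truth = [100.0, 50.0, 10.0]
observed = matvec(resolution, truth)          # smearing: Observed = R x Truth
print(unfold_tikhonov(resolution, observed, tau=0.0))
print(unfold_tikhonov(resolution, observed, tau=0.1))
```

With `tau = 0` and noise-free input the solve reproduces the truth histogram; increasing `tau` damps the solution, trading some bias for stability once statistical fluctuations are present in the observed histogram.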