In a mostly functional language like OCaml, it is desirable to have each domain (our unit of parallelism) collect its own local garbage independently. Given that OCaml is commonly used for writing latency-sensitive code such as trading systems, UIs, and networked unikernels, it is also desirable to minimise the stop-the-world phases in the GC. Although these goals are obvious, the difficulty lies in achieving them in the presence of mutation. In this talk, we will present the overall design of the Multicore OCaml GC and take a deep dive into a few of the interesting techniques that make it work.