
Fighting CryptoCurrency Crime with Scala by Ben Simpson

Shannon
November 29, 2018

An overview of Scala methodologies we use to detect and prevent criminal activities within the BlockChain.

Transcript

  1. Preventing and Detecting Criminal Activity in Cryptocurrencies. We identify illicit activity in cryptocurrencies by providing actionable intelligence to cryptocurrency companies, financial institutions and government agencies.
  2. How we do this • What is the source of funds for this transaction?
  3. How we do this • What is the source of funds for this transaction? • Where are the funds being moved to?
  4. How we do this • What is the source of funds for this transaction? • Where are the funds being moved to? • Who controls the funds? (Labels shown in the figure: Wikileaks, Unlabelled, AlphaBay, Wikipedia, Mt. Gox)
  5. We build a large graph that sheds light on the activity of dark markets, scams, gambling... (Label shown in the figure: Satoshi Nakamoto) https://www.elliptic.co/data-visualizations
  6. Challenges

    Size of data set • 350M Ethereum transfers • 70M Ethereum addresses • 360M Bitcoin transfers • 460M Bitcoin addresses • Totals: ~1B nodes, ~3B edges

    Calculating Risk Scores • Using graph traversal algorithms on highly connected graphs • May need to traverse 10s of hops on the graph to calculate a score • Spikes of tens of thousands of risk calculations from our customers (fulfilled at 200/min)
  7. Why Scala? • Existing Java Codebase • ETL Tooling • Shared Knowledge and tools with Data Science Team
  8. Why Scala? • Existing Java Codebase • In-memory, high performance, parallel graph engine • Connected components algorithm on the Bitcoin address-graph projection (460M vertices, 741M edges)
  9. Why Scala? • Existing Java Codebase • Elliptic graph engine compared with Apache GraphFrames**. Sample Graph: twitter-2010 (42M vertices, 1.4B edges). Graph algo: connected components

    • Hardware: Elliptic graph engine: 32 CPUs, ~20GB RAM used*. Apache GraphFrames: x16 AWS r3.4xlarge, 256 CPU, 1.9TB RAM
    • Cost: Elliptic graph engine: 500 $/month (250GB RAM machine). Apache GraphFrames: 15,000 $/month, 5,000 $/month (3y upfront)
    • Runtime: Elliptic graph engine: ~14min. Apache GraphFrames: 4min (6min on GraphX)
    • Dynamic Graph (incremental): Elliptic graph engine: yes. Apache GraphFrames: No (issues/252)

    * RAM used on a machine with 250GB
    ** https://www.slideshare.net/databricks/webscale-graph-analytics-with-apache-spark-with-tim-hunter (2016)
  10. How Data Science team uses Scala? • We make use of parallel collections from the standard library
  11. How Data Science team uses Scala? • We make use of parallel collections from the standard library
  12. Machine Learning with a SMILE :-) • Building a training set for Machine Learning applications… the lazy way with Stream! Steps: filter by transactions of interest, map into features and labels, take a predefined number of training points
  13. Machine Learning with a SMILE :-) • Written in Java and Scala • Includes classical Machine Learning algorithms for Classification, Regression and more (Logistic Regression, Random Forest, SVM, Lasso/Ridge Regression, Dimensionality Reduction, Clustering, …) • Easy to use • Fast (chart of Running Time [s] for: get features and labels, train model, predict from trained model, get accuracy of trained model) https://haifengl.github.io/smile/index.html
  14. Take home • Scala ecosystem is more mature than you would expect • Scala allows Engineers and Data Scientists to speak the same language within a unified ecosystem • Choose your tools wisely: in-house, high-performance solutions can give better long-term returns • We love the parallel collections of the standard library!
  15. WE ARE HIRING! ◦ Full Stack Web Engineers ◦ Data Engineers ◦ Dev-Ops Engineers ◦ Data Scientists ◦ Internships - technical and non-technical are possible www.elliptic.co/careers
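
Item 12 of the transcript describes building training data lazily with `Stream`: filter by transactions of interest, map into features and labels, take a predefined number of training points. The slides do not show the code, so the following is only a minimal sketch of that pattern. The `Tx` and `Example` types, the synthetic data source and the "of interest" rule are all hypothetical and stand in for whatever the real pipeline uses. No SMILE calls appear here. Note that `Stream` is deprecated in favour of `LazyList` from Scala 2.13 onwards.

```scala
// Hypothetical transaction record; the field names are assumptions for illustration.
final case class Tx(id: Long, amount: Double, flagged: Boolean)

// Hypothetical training example: a feature vector plus a binary label.
final case class Example(features: Array[Double], label: Int)

object LazyTrainingSet {
  // Unbounded, lazily generated source of synthetic transactions.
  // A def (not a val) so the memoized prefix is not retained globally.
  def txs: Stream[Tx] =
    Stream.from(1).map(i => Tx(i.toLong, (i % 100) * 10.0, i % 7 == 0))

  // Assumed notion of a "transaction of interest": amount of at least 500.
  def ofInterest(tx: Tx): Boolean = tx.amount >= 500.0

  // Map a transaction into features and a label.
  def toExample(tx: Tx): Example =
    Example(Array(tx.amount, math.log1p(tx.amount)), if (tx.flagged) 1 else 0)

  // filter -> map -> take: only as many source elements are evaluated
  // as are needed to produce n training points.
  def trainingSet(n: Int): List[Example] =
    txs.filter(ofInterest).map(toExample).take(n).toList
}
```

Because the source is lazy, `LazyTrainingSet.trainingSet(1000)` terminates even though `txs` is unbounded. The same three steps on a strict collection would need the whole data set in memory before `take` runs.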
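
Items 8 and 9 of the transcript benchmark a connected components algorithm, but the slides do not show how the Elliptic graph engine implements it. As a rough illustration of what the algorithm computes, here is a small single-threaded union-find sketch. This is a swapped-in textbook technique, not the engine's parallel implementation. The `ConnectedComponents` object and its `components` method are hypothetical names, and vertices are assumed to be numbered `0 until n`.

```scala
object ConnectedComponents {
  // For each vertex 0 until n, returns the smallest vertex id in its component.
  def components(n: Int, edges: Seq[(Int, Int)]): Array[Int] = {
    // parent(v) == v means v is the root (representative) of its set.
    val parent = Array.range(0, n)

    // Find the root of v, shortening the path as we go (path halving).
    @annotation.tailrec
    def find(v: Int): Int =
      if (parent(v) == v) v
      else {
        parent(v) = parent(parent(v))
        find(parent(v))
      }

    // Merge the two sets joined by each edge; the smaller root id wins,
    // so every root is the minimum vertex id of its component.
    edges.foreach { case (a, b) =>
      val (ra, rb) = (find(a), find(b))
      if (ra != rb) parent(math.max(ra, rb)) = math.min(ra, rb)
    }

    Array.tabulate(n)(v => find(v))
  }
}
```

For example, `ConnectedComponents.components(6, Seq((0, 1), (1, 2), (3, 4)))` labels vertices 0, 1 and 2 with 0, vertices 3 and 4 with 3, and the isolated vertex 5 with 5.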