Fighting CryptoCurrency Crime with Scala by Ben Simpson

Shannon
November 29, 2018

An overview of Scala methodologies we use to detect and prevent criminal activities within the BlockChain.

Transcript

  1. Ben Simpson - Software Engineer
    Moreno Bonaventura - Data Scientist

  2. Preventing And Detecting Criminal Activity In Cryptocurrencies.
    We identify illicit activity in cryptocurrencies by providing actionable intelligence to
    cryptocurrency companies, financial institutions and government agencies.

  3. How we do this
    ● What is the source of funds for this transaction?

  4. How we do this
    ● Where are the funds being moved to?
    ● What is the source of funds for this transaction?

  5. How we do this
    [Diagram labels: Wikileaks, Unlabelled, AlphaBay, Wikipedia, Mt. Gox]
    ● Where are the funds being moved to?
    ● Who controls the funds?
    ● What is the source of funds for this transaction?

  6. We build a large graph that sheds light on the activity of dark markets, scams, gambling...
    [Graph label: Satoshi Nakamoto]
    https://www.elliptic.co/data-visualizations

  7. Case study - Ransomware
    https://www.theinquirer.net/inquirer/news/3064515/wannacry-attack-cost-cash-strapped-nhs-an-estimated-gbp92m

  8. Case study - Ransomware (continued)
    https://www.bbc.co.uk/news/world-us-canada-42407488

  9. Challenges
    Size of data set
    ● 350M Ethereum transfers
    ● 70M Ethereum addresses
    ● 360M Bitcoin transfers
    ● 460M Bitcoin addresses
    ● Totals: ~1B nodes, ~3B edges
    Calculating Risk Scores
    ● Using graph traversal algorithms on highly connected graphs
    ● May need to traverse 10s of hops on the graph to calculate a score
    ● Spikes of tens of thousands of risk calculations from our customers (fulfilled at 200/min)
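
The slide mentions graph traversal bounded by a number of hops but shows no code. As a rough illustration only, here is a minimal sketch of a hop-limited breadth-first search in Scala. The names `sources`, `flagged` and `hopsToNearestFlagged`, and the idea of using the hop distance to the nearest flagged address as a signal, are assumptions made for this example; the deck does not describe how the actual risk score is computed.

```scala
object HopLimitedTraversal {
  type Address = String

  /** Breadth-first search over "who funded this address" edges.
    * Returns the number of hops from `start` to the nearest flagged
    * address, or None if none is reachable within `maxHops`.
    * All parameter names are hypothetical.
    */
  def hopsToNearestFlagged(
      sources: Map[Address, Set[Address]], // address -> addresses that sent it funds
      flagged: Set[Address],               // addresses labelled as illicit
      start: Address,
      maxHops: Int
  ): Option[Int] = {
    @annotation.tailrec
    def loop(frontier: Set[Address], seen: Set[Address], hops: Int): Option[Int] =
      if (frontier.isEmpty) None                      // nothing left to explore
      else if (frontier.exists(flagged)) Some(hops)   // nearest flagged address found
      else if (hops == maxHops) None                  // hop budget exhausted
      else {
        // Expand one level, skipping addresses already visited (graphs have cycles).
        val next = frontier.flatMap(a => sources.getOrElse(a, Set.empty[Address])) -- seen
        loop(next, seen ++ next, hops + 1)
      }

    loop(Set(start), Set(start), 0)
  }
}
```

Expanding the search one level at a time keeps the hop count explicit, and the `seen` set stops the search from revisiting addresses on a highly connected graph.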

  10. Why Scala?
    Existing Java Codebase
    ETL Tooling
    Shared Knowledge and tools with Data Science Team

  11. Why Scala?
    Existing Java Codebase
    In-memory, high-performance, parallel graph engine
    Connected components algorithm on the Bitcoin address-graph projection
    (460M vertices, 741M edges)
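
The slide names the connected components algorithm without showing an implementation. For readers unfamiliar with it, here is a minimal single-threaded sketch using union-find on vertices numbered `0 until n`. This is a textbook technique chosen for illustration, not the in-house parallel engine from the talk; the `ConnectedComponents` object and its `components` method are hypothetical names.

```scala
object ConnectedComponents {
  /** Returns, for each vertex 0 until n, the representative of its component.
    * Two vertices are connected exactly when their representatives are equal.
    */
  def components(n: Int, edges: Seq[(Int, Int)]): Array[Int] = {
    val parent = Array.range(0, n) // every vertex starts as its own root
    val size = Array.fill(n)(1)    // component sizes, used for union by size

    // Find the root of x, halving the path on the way up.
    def find(x: Int): Int = {
      var v = x
      while (parent(v) != v) {
        parent(v) = parent(parent(v))
        v = parent(v)
      }
      v
    }

    // Merge the components of a and b, attaching the smaller under the larger.
    def union(a: Int, b: Int): Unit = {
      val ra = find(a)
      val rb = find(b)
      if (ra != rb) {
        if (size(ra) < size(rb)) { parent(ra) = rb; size(rb) += size(ra) }
        else { parent(rb) = ra; size(ra) += size(rb) }
      }
    }

    edges.foreach { case (a, b) => union(a, b) }
    Array.tabulate(n)(i => find(i))
  }
}
```

The arrays of primitive `Int`s keep memory use low per vertex, which matters at the graph sizes quoted on the slide, though a production engine would need far more than this sketch.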

  12. Why Scala?
    Existing Java Codebase
    One-machine vs. distributed

  13. Why Scala?
    Existing Java Codebase
    Sample Graph: twitter-2010 (42M vertices, 1.4B edges)
    Graph algo: connected components
                                   Elliptic graph engine             Apache GraphFrames**
    Hardware                       32 CPUs, ~20GB RAM used*          x16 AWS r3.4xlarge, 256 CPU, 1.9TB RAM
    Cost                           500 $/month (250GB RAM machine)   15,000 $/month, or 5,000 $/month (3y upfront)
    Runtime                        ~14min                            4min (6min on GraphX)
    Dynamic Graph (incremental)    yes                               No (issues/252)
    * RAM used on a machine with 250GB
    ** https://www.slideshare.net/databricks/webscale-graph-analytics-with-apache-spark-with-tim-hunter (2016)

  14. Why Scala?
    Existing Java Codebase
    ETL Tooling
    Shared Knowledge and tools with Data Science Team

  15. How does the Data Science team use Scala?
    • We make use of parallel collections from the standard library

  16. How does the Data Science team use Scala?
    • We make use of parallel collections from the standard library

  17. Machine Learning with a SMILE :-)
    • Building a training set for Machine Learning applications… the lazy way with Stream!
    [Code annotations: filter by transactions of interest; map into features and labels;
    take a predefined number of training points]
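
The code from this slide did not survive in the transcript; only its three annotations did. The following is a minimal sketch of what such a lazy pipeline could look like, assuming a hypothetical `Tx` record and a synthetic, unbounded stream of transactions. The field names, the "of interest" threshold and the label rule are all invented for illustration. `Stream` is used because the slide names it; on Scala 2.13 and later it is deprecated in favour of `LazyList`, which supports the same three calls.

```scala
// Hypothetical transaction record; field names are assumptions for this sketch.
final case class Tx(id: Long, amount: Double, flagged: Boolean)

object LazyTrainingSet {
  // Synthetic, unbounded source of transactions, generated lazily on demand.
  def transactions: Stream[Tx] =
    Stream.from(1).map(i => Tx(i.toLong, (i % 50) * 10.0, flagged = i % 7 == 0))

  /** Builds n training points as (features, label) pairs.
    * Only as many transactions as needed are ever generated.
    */
  def trainingSet(n: Int): Vector[(Array[Double], Int)] =
    transactions
      .filter(_.amount >= 100.0)                               // filter by transactions of interest
      .map(tx => (Array(tx.amount), if (tx.flagged) 1 else 0)) // map into features and labels
      .take(n)                                                 // take a predefined number of training points
      .toVector                                                // force evaluation of exactly n elements
}
```

Because every stage is lazy, `take(n)` bounds the work: the source stream is unbounded, yet `trainingSet(5)` terminates after producing five matching points.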

  18. Machine Learning with a SMILE :-)
    https://haifengl.github.io/smile/index.html
    • Written in Java and Scala
    • Includes classical Machine Learning algorithms for Classification, Regression and more (Logistic Regression,
    Random Forest, SVM, Lasso/Ridge Regression, Dimensionality Reduction, Clustering, …)
    • Easy to use
    • Fast
    [Code annotations: train model; get accuracy of trained model; predict from trained model;
    get features and labels]
    [Chart axis: Running Time [s]]

  19. Exploratory analysis with Vegas
    https://github.com/vegas-viz/Vegas

  20. Take home
    ● The Scala ecosystem is more mature than you would expect
    ● Scala allows Engineers and Data Scientists to speak the same language within a unified ecosystem
    ● Choose your tools wisely: in-house, high-performance solutions can give better long-term returns
    ● We love the parallel collections of the standard library!

  21. WE ARE HIRING!
    ○ Full Stack Web Engineers
    ○ Data Engineers
    ○ Dev-Ops Engineers
    ○ Data Scientists
    ○ Internships - technical and non-technical are possible
    www.elliptic.co/careers

  22. www.elliptic.co
