A brief intro to Neo4j using 'The Wire' as a data set.

A talk I gave to Belfast Ruby on the 4th June 2013.

Stephen McCullough

June 04, 2013

Transcript

  1. @belfastruby
    Neo4j - A Brief
    Introduction
    Tuesday, 4 June 13

  2. @swmcc
    LISTEN AT OWN RISK
    Stephen McCullough - www.swm.cc

  3. Open Source
    NOSQL - Datastore
    Graph Database
    WHAT IS
    NEO4J?
    Boring Generic
    Explanation
    ZZZZZZZZZZZZZZZZ

  4. SO WHAT
    CAN IT DO?
    Whiteboard Friendly
    Focuses on Relationships
    Can store billions of nodes
    Another Boring Generic
    Explanation
    ZZZZZZZZZZZZZZZZ

  5. WHAT YOU TALKIN...
    I’m a software engineer
    I’m not smart
    This talk is not in-depth
    If I can understand it you can!
    This talk might be wrong! ;)
    I show/teach by ‘doing’

  6. $ brew install neo4j
    $ neo4j start
    INSTALLING
    & RUNNING
    Read the f*cking
    README
    ZZZZZZZZZZZZZZ

  7. Generic Boring Ass
    Screen Shots
    ZZZZZZZZZZZZZZZZ

  8. WHAT IS A GRAPH DB?
    Composed of nodes (vertices) &
    relationships (edges)
    Completely different paradigm from SQL
    As stated previously it focuses on the
    relationship between values and the
    commonalities among a ‘set’ of values

  9. FOR EXAMPLE
    Homer Simpson (is friends with) Lenny &
    Carl.
    Lenny (is friends with) Carl

  10. NODES
    A node is a “vertex between edges that may
    hold data”. I call it “a wee box of data stores”.
    [name: Homer]
    [name: Lenny]
    [name: Carl]
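To make the node and relationship idea concrete, here is a minimal in-memory sketch in plain Ruby. It does not talk to Neo4j at all; the numeric ids, the `FRIENDS_WITH` type name and the `friends_of` helper are assumptions made up for illustration.

```ruby
# Nodes: small boxes of data, keyed by an id invented for this sketch.
NODES = {
  1 => { name: 'Homer' },
  2 => { name: 'Lenny' },
  3 => { name: 'Carl' }
}.freeze

# Relationships (edges): [start node id, type, end node id].
RELATIONSHIPS = [
  [1, :FRIENDS_WITH, 2],
  [1, :FRIENDS_WITH, 3],
  [2, :FRIENDS_WITH, 3]
].freeze

# Treat FRIENDS_WITH as undirected and return the sorted names of a node's friends.
def friends_of(id, nodes, relationships)
  relationships
    .select { |from, type, to| type == :FRIENDS_WITH && [from, to].include?(id) }
    .map { |from, _type, to| from == id ? to : from }
    .map { |friend_id| nodes[friend_id][:name] }
    .sort
end

puts friends_of(1, NODES, RELATIONSHIPS).inspect
puts friends_of(3, NODES, RELATIONSHIPS).inspect
```

The point of the sketch is that the relationship is stored as its own piece of data, rather than being implied by a foreign key as it would be in SQL.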

  11. WEB INTERFACE
    You get a powerful web interface that can add new ‘nodes’
    and lets you put data in.

  12. WEB INTERFACE
    You can put in a relationship between the ‘nodes’

  13. WEB INTERFACE
    You can also view the relationships between the nodes

  14. WEB INTERFACE
    You can also change the layout properties

  15. MORE
    POWER
    NEEDED
    You will require more power if
    you ever want to use this in a
    production environment.

  16. NEO4J VIA XXXX
    Rails - neography/neoid
    Ruby - Neography/Keymaker
    PHP/.net/python/node/perl
    http://www.neo4j.org/develop/drivers

  17. CYPHER
    Cypher is a graph query language supported
    by Neo4j.
    Based on pattern matching and a SQL-like
    syntax.
    ..... and I didn’t use it - much.

  18. GROOVY - GREMLIN
    I know this is a Ruby talk but I did this in my
    spare time and I wanted to “mess about”.
    As much as I hate to admit it - I am
    becoming a fan of the ‘jvm’.
    I feel it gives you a more intimate/natural
    access. Deal with it ;)

  19. GROOVY -
    GREMLIN
    Gremlin is a graph traversal DSL built on Groovy.
    It is a handy (for me anyway)
    way to get to grips with
    graph databases. Your
    mileage may vary.

  20. CONSOLE
    For more powerful uses you will need to use a console

  21. GREMLIN
    CONSOLE
    As you can see... It uses
    proper math terms. Which is
    no good to a thick fuck like
    me. It also isn’t the fastest on
    my Mac; however, I wanted to
    learn.

  22. Adding new boxes and relationships is easy
    GREMLIN CONSOLE

  23. THE WIRE
    Without doubt the best tv show ever

  24. THE WIRE
    The Law
    The Politicians
    The Street
    The Schools
    The Docks
    The Paper

  25. THE NODES - THE WIRE
    All have common attributes:
    Name
    First / Last Appearance
    Title
    Section

  26. JOE
    “PROP”
    STEWART
    name: 'Joe Stewart',
    alias: 'Prop Joe',
    first_appearance: '1.09',
    last_appearance: '5.04',
    section: 'The Street'

  27. RUSSELL
    “STRINGER”
    BELL
    name: 'Russell Bell',
    alias: 'Stringer',
    first_appearance: '1.01',
    last_appearance: '3.12',
    section: 'The Street'

  28. CEDRIC
    DANIELS
    name: 'Cedric Daniels',
    alias: 'Cedric',
    first_appearance: '1.01',
    last_appearance: '5.10',
    section: 'The Law'
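The three character slides already read like Ruby hashes, so they can be collected and queried directly. This is a rough sketch: the helper names are made up here, and `active_in_season?` assumes a character is around for every season between their first and last appearance, which the slides do not state.

```ruby
# Property hashes copied from the three character slides.
CHARACTERS = [
  { name: 'Joe Stewart', alias: 'Prop Joe', first_appearance: '1.09',
    last_appearance: '5.04', section: 'The Street' },
  { name: 'Russell Bell', alias: 'Stringer', first_appearance: '1.01',
    last_appearance: '3.12', section: 'The Street' },
  { name: 'Cedric Daniels', alias: 'Cedric', first_appearance: '1.01',
    last_appearance: '5.10', section: 'The Law' }
].freeze

# Split a 'season.episode' string such as '1.09' into two integers.
def season_and_episode(appearance)
  appearance.split('.').map { |part| Integer(part, 10) }
end

# Names of every character filed under the given section.
def names_in_section(section)
  CHARACTERS.select { |c| c[:section] == section }.map { |c| c[:name] }
end

# Assumption: a character is active in every season from first to last appearance.
def active_in_season?(character, season)
  first_season, = season_and_episode(character[:first_appearance])
  last_season, = season_and_episode(character[:last_appearance])
  (first_season..last_season).cover?(season)
end

puts names_in_section('The Street').inspect
puts CHARACTERS.select { |c| active_in_season?(c, 4) }.map { |c| c[:alias] }.inspect
```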

  29. GREMLIN - ADD
    Multiple Additions of the data set - (such as it is).

  30. RELATIONSHIPS
    Now to add in the relationships between the characters

  31. DASHBOARD
    You can see a quick overview of what happened

  32. RESTFUL API
    The RESTful API allows you to create nodes and run queries

  33. RESTFUL
    API
    Just like other NoSQL datastores, you
    can query the db via REST
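As a rough idea of what the REST calls look like from Ruby, the sketch below builds (but does not send) two JSON requests using only the standard library. It assumes the legacy `/db/data/node` and `/db/data/cypher` endpoints of the Neo4j 1.x/2.x era and the default port 7474; later server versions expose a different API. The helper names are made up, and the Cypher string is illustrative only, since Cypher syntax changed between releases.

```ruby
require 'json'
require 'net/http'

# Assumption: legacy REST paths from Neo4j 1.x/2.x; newer servers differ.
NODE_PATH   = '/db/data/node'
CYPHER_PATH = '/db/data/cypher'

# Build (but do not send) a POST request with a JSON body.
def json_post(path, payload)
  request = Net::HTTP::Post.new(path, { 'Content-Type' => 'application/json',
                                        'Accept' => 'application/json' })
  request.body = JSON.generate(payload)
  request
end

# Request that would create one node carrying the given properties.
def create_node_request(properties)
  json_post(NODE_PATH, properties)
end

# Request that would run a Cypher query with named parameters.
def cypher_request(query, params = {})
  json_post(CYPHER_PATH, { query: query, params: params })
end

# Send a request to a running server (not called in this sketch).
def send_to_neo4j(request, host: 'localhost', port: 7474)
  Net::HTTP.start(host, port) { |http| http.request(request) }
end

node = create_node_request({ name: 'Joe Stewart', alias: 'Prop Joe',
                             section: 'The Street' })
# Illustrative 1.x-style Cypher; adjust to the server version in use.
query = cypher_request(
  'START n=node(*) WHERE has(n.section) AND n.section = {section} RETURN n.name',
  { section: 'The Street' }
)

puts "POST #{node.path} #{node.body}"
puts "POST #{query.path} #{query.body}"
```

Calling `send_to_neo4j(node)` against a local server would perform the actual request; it is left out here so the sketch runs without a database.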

  34. REST + GREMLIN
    Neo4j comes with a REST + GREMLIN plugin.
    Used together, they give you a potent tool
    for working with Neo4j.

  35. THE NETWORK GRAPH

  36. I RAN OUT OF TIME
    I wanted to do more

  37. WHAT I WANTED TO DO
    Show the relationships between the
    characters on a personal and professional
    level.
    Create a database that showed (given one
    episode) who talked to whom and how it
    added to the story.
    Show that Clay Davis is in “everyone’s shit”.

  38. WHAT I WILL DO
    In my spare time, and reporting back on my
    blog (http://blog.swm.cc), I will continue on
    with this.
    Would be great to have an engine that
    allowed us to detail every scene of ‘the street’
    involving Bodie and see who he has
    interacted with.
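One possible shape for that "who has Bodie interacted with" engine is sketched below in plain Ruby. The interaction rows are made-up placeholders, not real scene data from the show or from the talk's data set, and the `interacted_with` helper is hypothetical.

```ruby
# Hypothetical sample rows: [episode, character, character].
# Placeholder values only; these are NOT real scenes from the show.
INTERACTIONS = [
  ['1.01', 'Bodie', 'Character A'],
  ['1.01', 'Character A', 'Character B'],
  ['1.02', 'Bodie', 'Character C'],
  ['1.02', 'Character C', 'Bodie']
].freeze

# Everyone the named character shares an interaction with,
# optionally limited to a single episode.
def interacted_with(name, interactions, episode: nil)
  interactions
    .select { |ep, a, b| [a, b].include?(name) && (episode.nil? || ep == episode) }
    .map { |_ep, a, b| a == name ? b : a }
    .uniq
    .sort
end

puts interacted_with('Bodie', INTERACTIONS).inspect
puts interacted_with('Bodie', INTERACTIONS, episode: '1.01').inspect
```

In a graph database each row would be a relationship between two character nodes with the episode stored as a property on it, so the same question becomes a one-hop traversal from the Bodie node.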

  39. END GAME
    Neo4j is a very interesting bit of kit.
    It can be used for more than building
    recommendation lists.
    Very powerful for ‘Big Data’.
    Distributed High Availability.

  40. NEO4J STRENGTHS
    Graph databases are great for unstructured
    data. Was hoping to go back and show
    adding in ‘rank’ to ‘the street’ and ‘the law’
    nodes but leaving it out of the rest.
    No constraints on data - it really is a ‘free for
    all’.
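The "rank on some nodes but not others" idea can be sketched with plain Ruby hashes, since nothing forces every node to carry the same keys. The `with_rank` helper, the rank value and the 'Example Reporter' node are placeholders invented for illustration, not data from the talk.

```ruby
# Sections whose nodes get a rank property in this sketch.
RANKED_SECTIONS = ['The Street', 'The Law'].freeze

# Return a copy of the node with :rank added, but only for ranked sections.
def with_rank(node, rank)
  return node unless RANKED_SECTIONS.include?(node[:section])

  node.merge(rank: rank)
end

daniels  = { name: 'Cedric Daniels', section: 'The Law' }
reporter = { name: 'Example Reporter', section: 'The Paper' } # hypothetical node

puts with_rank(daniels, 'example rank').inspect
puts with_rank(reporter, 'example rank').inspect
```

Both hashes remain valid nodes side by side; there is no schema to migrate when one group gains a property the others lack.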

  41. NEO4J STRENGTHS
    According to the documentation it can handle
    34.4 billion nodes and 34.4 billion
    relationships.
    Integrates with Lucene and has many
    extensions.
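The 34.4 billion figure lines up with a power of two. A quick check, assuming the documented limit of that era was 2^35 record ids each for nodes and relationships:

```ruby
# Assumption: the limit quoted in the documentation is 2**35 ids.
ID_LIMIT = 2**35

# Express a count in billions, rounded to one decimal place.
def in_billions(count)
  (count / 1_000_000_000.0).round(1)
end

puts ID_LIMIT              # 34359738368
puts in_billions(ID_LIMIT) # 34.4
```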

  42. NEO4J WEAKNESS
    If, like me, you aren’t used to modelling
    ‘graph data’, you might find it hard to get your
    head round the concept. To be honest I am
    not sure if I am there.

  43. WHAT I WOULD LIKE
    I would like someone in the community here
    today to take what I have done and
    create a ‘ruby’ extension to query the data.
    Everything will be available on my github
    account which you can get from my site
    (http://swm.cc)