A brief intro to Neo4j using 'The Wire' as a data set.

A talk I gave to Belfast Ruby on the 4th June 2013.

Stephen McCullough

June 04, 2013

Transcript

  1. @ sw m cc LISTEN AT OWN RISK Stephen McCullough

    - www.swm.cc Tuesday, 4 June 13
  2. WHAT IS NEO4J?

    Open source. A NoSQL datastore. A graph database. Boring generic explanation. ZZZZZZZZZZZZZZZZ
  3. SO WHAT CAN IT DO?

    Whiteboard friendly. Focuses on relationships. Can store billions of nodes. Another boring generic explanation. ZZZZZZZZZZZZZZZZ
  4. WHAT YOU TALKIN...

    I’m a software engineer. I’m not smart. This talk is not in-depth. If I can understand it, you can! This talk might be wrong! ;) I show/teach by ‘doing’.
  5. INSTALLING & RUNNING

    $ brew install neo4j
    $ neo4j start

    Read the f*cking README. ZZZZZZZZZZZZZZ
  6. WHAT IS A GRAPH DB?

    Composed of nodes (vertices) and relationships (edges). A completely different paradigm from SQL. As stated previously, it focuses on the relationships between values and on what a ‘set’ of values has in common.
  7. FOR EXAMPLE

    Homer Simpson (is friends with) Lenny & Carl. Lenny (is friends with) Carl.
  8. NODES

    A node is a “vertex between edges that may hold data”. I call it “a wee box of data stores”. [name: Homer] [name: Lenny] [name: Carl]
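The friends example and the three node boxes above can be sketched in plain Ruby, with no Neo4j involved. This is only an illustration of the nodes-plus-edges idea; the ids, the constant names and the `friends_of` helper are all made up for the sketch, and treating friendship as undirected is an assumption.

```ruby
# Nodes: each one is a small bag of properties, keyed by a made-up id.
NODES = {
  homer: { name: 'Homer' },
  lenny: { name: 'Lenny' },
  carl:  { name: 'Carl' }
}.freeze

# Relationships (edges): "is friends with", treated as undirected here.
FRIENDSHIPS = [
  %i[homer lenny],
  %i[homer carl],
  %i[lenny carl]
].freeze

# Names of everyone connected to the given node by a friendship edge.
def friends_of(id)
  FRIENDSHIPS
    .select { |pair| pair.include?(id) }
    .map { |pair| (pair - [id]).first }
    .map { |friend_id| NODES[friend_id][:name] }
    .sort
end

puts friends_of(:homer).join(', ') # prints "Carl, Lenny"
```

The data lives in the nodes, while the interesting questions are answered by walking the edges, which is the part a graph database is built to do for you.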
  9. WEB INTERFACE

    You get a powerful web interface that lets you add new ‘nodes’ and put data in them.
  10. MORE POWER NEEDED

    You will require more power if you ever want to use this in a production environment.
  11. CYPHER

    Cypher is a graph query language supported by Neo4j. It is based on pattern matching and has a SQL-like syntax. ..... and I didn’t use it much.
  12. GROOVY - GREMLIN

    I know this is a Ruby talk, but I did this in my spare time and I wanted to “mess about”. As much as I hate to admit it, I am becoming a fan of the JVM. I feel it gives you more intimate/natural access. Deal with it ;)
  13. GROOVY - GREMLIN

    Gremlin is a graph traversal DSL hosted in Groovy. It is a handy (for me anyway) way to get to grips with graph databases. Your mileage may vary.
  14. GREMLIN CONSOLE

    As you can see... It uses proper math terms. Which is no good to a thick fuck like me. It also isn’t the fastest on my Mac, but I wanted to learn.
  15. THE WIRE

    The Law, The Politicians, The Street, The Schools, The Docks, The Paper
  16. THE NODES - THE WIRE

    All have common attributes: Name, First / Last Appearance, Title, Section
  17. JOE “PROP” STEWART

    name: 'Joe Stewart', alias: 'Prop Joe', first_appearance: '1.09', last_appearance: '5.04', section: 'The Street'
  18. RUSSELL “STRINGER” BELL

    name: 'Russell Bell', alias: 'Stringer', first_appearance: '1.01', last_appearance: '3.12', section: 'The Street'
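The two character slides above are already close to Ruby hashes, so a small sketch can show how those property maps might be queried. The property values are copied from the slides; the `episode_key` and `active_in` helpers are hypothetical, and reading '1.09' as "season 1, episode 9" is an assumption about the notation.

```ruby
# Property maps copied from the two character slides.
CHARACTERS = [
  { name: 'Joe Stewart', alias: 'Prop Joe', first_appearance: '1.09',
    last_appearance: '5.04', section: 'The Street' },
  { name: 'Russell Bell', alias: 'Stringer', first_appearance: '1.01',
    last_appearance: '3.12', section: 'The Street' }
].freeze

# Split a "season.episode" string such as '1.09' into [1, 9] so that
# appearances compare numerically rather than as text.
def episode_key(code)
  code.split('.').map(&:to_i)
end

# Characters whose first/last appearance range covers the given episode.
def active_in(code)
  target = episode_key(code)
  CHARACTERS.select do |character|
    (episode_key(character[:first_appearance]) <=> target) <= 0 &&
      (target <=> episode_key(character[:last_appearance])) <= 0
  end
end

puts active_in('4.01').map { |character| character[:alias] }.join(', ')
```

In Neo4j these hashes would become node properties, and a range check like this would be expressed as a filter in the query language rather than in application code.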
  19. GREMLIN - ADD

    Multiple additions of the data set (such as it is).
  20. REST + GREMLIN

    Neo4j comes with a REST + Gremlin plugin. Used together, they make a potent tool for working with Neo4j.
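As a rough idea of what talking to the REST + Gremlin plugin from Ruby could look like, here is a sketch that only builds the HTTP request and does not send it. The endpoint path and the `script`/`params` JSON shape are assumptions based on the Gremlin plugin shipped with Neo4j 1.x, so check them against the documentation for your installed version. The `gremlin_request` helper and the example query are hypothetical.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Assumed endpoint of the Gremlin plugin on a default local install.
GREMLIN_URI = URI('http://localhost:7474/db/data/ext/GremlinPlugin/graphdb/execute_script')

# Build (but do not send) a POST carrying a Gremlin script and its parameters.
def gremlin_request(script, params = {})
  request = Net::HTTP::Post.new(GREMLIN_URI.path)
  request['Content-Type'] = 'application/json'
  request['Accept'] = 'application/json'
  request.body = JSON.generate({ script: script, params: params })
  request
end

# Hypothetical query: names of every node in one section of the show.
request = gremlin_request('g.V.filter{it.section == section}.name',
                          { section: 'The Street' })
puts request.body

# Sending it needs a running server, so it is left commented out:
# response = Net::HTTP.start(GREMLIN_URI.host, GREMLIN_URI.port) do |http|
#   http.request(request)
# end
# puts response.body
```

Passing values through `params` rather than interpolating them into the script string keeps the Gremlin text fixed and avoids quoting problems.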
  21. WHAT I WANTED TO DO

    Show the relationships between the characters on a personal and professional level. Create a database that showed (given one episode) who talked to whom and how it added to the story. Show that Clay Davis is in “everyone’s shit”.
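The "who talked to whom in one episode" goal can be sketched by putting the episode on the relationship itself. Everything in this example is hypothetical: the rows are invented placeholders and not transcribed from the show, and `conversations_in` and `contact_counts` are made-up helper names.

```ruby
# Hypothetical interaction edges: [speaker, listener, episode].
# The rows are invented placeholders, not transcribed from the show.
INTERACTIONS = [
  ['Stringer', 'Prop Joe',   '3.05'],
  ['Stringer', 'Clay Davis', '3.05'],
  ['Prop Joe', 'Clay Davis', '3.05'],
  ['Stringer', 'Bodie',      '3.01']
].freeze

# Who talked to whom in a single episode.
def conversations_in(episode)
  INTERACTIONS.select { |_, _, ep| ep == episode }
              .map { |from, to, _| [from, to] }
end

# Number of distinct people each character has talked to, across all episodes.
def contact_counts
  contacts = Hash.new { |hash, key| hash[key] = [] }
  INTERACTIONS.each do |from, to, _|
    contacts[from] |= [to]
    contacts[to]   |= [from]
  end
  contacts.transform_values(&:size)
end

conversations_in('3.05').each { |from, to| puts "#{from} -> #{to}" }
```

With real data loaded, sorting `contact_counts` by value would be one simple way to see which character is connected to the most people.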
  22. WHAT I WILL DO

    In my spare time, and reporting back on my blog (http://blog.swm.cc), I will continue with this. It would be great to have an engine that let us detail every scene of “the street” involving Bodie and see who he has interacted with.
  23. END GAME

    Neo4j is a very interesting bit of kit. It can be used for more than building recommendation lists. Very powerful for ‘Big Data’. Distributed. High availability.
  24. NEO4J STRENGTHS

    Graph databases are great for unstructured data. I was hoping to go back and show adding ‘rank’ to ‘the street’ and ‘the law’ nodes but leaving it off the rest. No constraints on data: it really is a ‘free for all’.
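The ‘rank’ idea above, where a property sits on some nodes and is simply absent from others, can be sketched with plain hashes. The names and rank values here are invented purely for illustration, and `ranked` and `rank_of` are hypothetical helpers.

```ruby
# Hypothetical nodes: names and 'rank' values are invented for illustration.
PEOPLE = [
  { name: 'Example Officer', section: 'The Law',    rank: 'Detective' },
  { name: 'Example Dealer',  section: 'The Street', rank: 'Lieutenant' },
  { name: 'Example Teacher', section: 'The Schools' } # no rank at all
].freeze

# Only the nodes that actually carry a rank property.
def ranked
  PEOPLE.select { |person| person.key?(:rank) }
end

# Rank for a named node, with a fallback when the property is absent.
def rank_of(name)
  person = PEOPLE.find { |candidate| candidate[:name] == name }
  person && person.fetch(:rank, 'n/a')
end

puts ranked.map { |person| person[:name] }.join(', ')
puts rank_of('Example Teacher')
```

Nothing forces every node to share one shape, so code reading the data has to cope with missing properties, as `rank_of` does with its fallback.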
  25. NEO4J STRENGTHS

    According to the documentation it can handle 34.4 billion nodes and 34.4 billion relationships. It integrates with Lucene and has many extensions.
  26. NEO4J WEAKNESS

    If you aren’t used to modelling ‘graph data’ (and I wasn’t), you might find it hard to get your head round the concept. To be honest, I am not sure I am there yet.
  27. WHAT I WOULD LIKE

    As a community, I would like someone here today to take what I have done and create a ‘Ruby’ extension to query the data. Everything will be available on my GitHub account, which you can get to from my site (http://swm.cc).