Graph Capabilities in the Elastic Stack

Elastic Co
February 18, 2016

If you've ever wanted to provide recommendations or identify the behavior of malicious actors, this is the session for you. Come learn how Elasticsearch can power graph exploration at scale by collecting signals such as clicks or purchases and identifying meaningful connections between subjects on the fly.

Transcript

  1. Graph capabilities in the Elastic Stack

    Mark Harwood, Software Engineer (@elasticmark)
    Steve Kearns, Sr. Director, Product Management (@skearns64)
  2. Data is not Flat (much like the world)

    "_source": {
      "created_at": "Tuesday Feb 16 02:10:52 +0000 2016",
      "text": "Snow can't keep me from #ElasticON!",
      "user": {
        "name": "Steve Kearns",
        "screen_name": "skearns64",
        "location": "Boston, MA"
      },
      "hashtags": [{ "text": "ElasticON" }],
      "lang": "en",
      "@timestamp": "2016-02-16T02:09:52.000Z"
    }
  3. Relationships live in our data

    • Direct: one document references multiple entities
    • Indirect: two or more documents share a reference

    Example "user" objects from three documents:
    "user": { "screen_name": "skearns64", "location": "Boston, MA" }
    "user": { "screen_name": "skearns64", "location": "Boston, MA" }
    "user": { "screen_name": "2muchsnow", "location": "Boston, MA" }
  4. Fraud Detection

    • Given credit card purchase histories…
      • Where did people with fraudulent purchases shop most often?
      • Which vendor is responsible for stolen credit card numbers?
    • Given car emissions data…
      • Which car manufacturer fails emissions tests most often?
      • At which shops?
  5. Identifying Relationships

    • Given Wikipedia…
      • What topics, entities, or locations are meaningfully related?
    • Given network traffic data…
      • What external IPs do machines on my network talk to?
  6. Recommendations

    • Given my purchase history…
      • What am I most likely to buy next?
    • Given Last.fm music preferences…
      • What music do people who like Mozart also like?
    • Given search and click data…
      • What results do people who searched for "mixer" tend to click on?
  7. "…There's no limit to how complicated things can get, on account of one thing always leading to another…"

    E.B. White, American essayist, columnist, poet and editor
  8. Theoretical Challenges with Graph Technology

    • Zipf's Law results in super-connected entities
    • Super-connected entities make graph exploration difficult
    • Graph exploration typically follows the "most frequent" connections
  9. Operational Challenges with Graph Technology

    • Where does data live naturally? In what structure?
    • Flexibility vs. complexity of the query language (Cypher, SPARQL)
    • Indexing speed, scale, query speed, near-real-time availability
  10. Our Advantage: Information Retrieval Techniques

    • When indexing data, we count and calculate key statistics
    • Using these statistics in new ways, we can bring relevance to relationships
    • Identify links/properties of an entity or group that differ from global averages
    • Aggregations enable efficient scale
  11. Guide Graph Exploration with Relevance

    • Follow links not by count, but by relevance
    • Don't skip super-connected entities; account for them!
    • Recognize that this won't work in all cases ☺
  12. Simple API that combines Search and Graph Techniques

    • Simple graph-walking API
    • Leverages the full Elasticsearch query language
    • Relevance- or count-based
    • Explore your existing indexes
    • Distributed query execution
    • Near-real-time data availability
  13. Simple API that combines Search and Graph Techniques

    GET /wikipedia/_graph/explore
    {
      "query": { "query_string": { "query": "Jack Johnson" } },
      "vertices": [{ "field": "artists.raw" }],
      "connections": {
        "vertices": [{ "field": "artists.raw" }]
      }
    }
  14. Random samples should hold no surprises

    • 17% of all people like "Forrest Gump"
    • In a random sample, about 17% will also like "Forrest Gump"

    Dull. But in non-random samples, something interesting happens.
  15. Non-random sample: people who liked "Talladega Nights"

    Find all people who liked movie #46970, and summarise how their movie tastes differ from everyone else.

    Fewer than 0.5% of all people like "Anchorman". In the set of "Talladega-likers", 20% of them like "Anchorman": a huge uplift in popularity from the norm!

    [Chart: % of Talladega fans who liked movie]
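Slide 2 shows a tweet as nested JSON. Elasticsearch addresses fields of inner objects by dotted paths such as `user.screen_name`, and an array of objects has its values merged under each path. The sketch applies a hypothetical helper, `flatten`, to a trimmed copy of that tweet to illustrate the mapping in plain Python; it is not Elasticsearch's own code.

```python
from typing import Any


def flatten(doc: dict) -> dict:
    """Collect every leaf value of a nested JSON-like dict under its dotted path.

    Lists are walked element by element, so an array of objects contributes
    all of its values to the same path.
    """
    fields: dict = {}

    def walk(value: Any, path: str) -> None:
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            for item in value:
                walk(item, path)
        else:
            fields.setdefault(path, []).append(value)

    walk(doc, "")
    return fields


# Trimmed copy of the tweet on slide 2.
tweet = {
    "text": "Snow can't keep me from #ElasticON!",
    "user": {"screen_name": "skearns64", "location": "Boston, MA"},
    "hashtags": [{"text": "ElasticON"}],
    "lang": "en",
}

for path, values in flatten(tweet).items():
    print(path, values)
```

Every leaf ends up under a flat key such as `user.location` or `hashtags.text`, which is the form in which these values become graph vertices.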
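Slide 3 separates direct relationships (one document names several entities, such as a user and a location) from indirect ones (different documents share a value, such as the location "Boston, MA"). The sketch derives both kinds from the three "user" objects on that slide in plain Python. `direct_links` and `indirect_links` are hypothetical helpers, not Elasticsearch APIs.

```python
from collections import defaultdict
from itertools import combinations

# The three "user" objects from slide 3, each wrapped in a minimal document.
docs = [
    {"id": 1, "user": {"screen_name": "skearns64", "location": "Boston, MA"}},
    {"id": 2, "user": {"screen_name": "skearns64", "location": "Boston, MA"}},
    {"id": 3, "user": {"screen_name": "2muchsnow", "location": "Boston, MA"}},
]


def direct_links(doc):
    """Direct: one document ties together the entities it references."""
    user = doc["user"]
    return (user["screen_name"], user["location"])


def indirect_links(docs, field):
    """Indirect: users whose documents share a value of `field` are linked."""
    users_by_value = defaultdict(set)
    for doc in docs:
        users_by_value[doc["user"][field]].add(doc["user"]["screen_name"])
    links = set()
    for value, users in users_by_value.items():
        # Every pair of distinct users sharing this value becomes one link.
        for a, b in combinations(sorted(users), 2):
            links.add((a, b, value))
    return links


print(sorted({direct_links(d) for d in docs}))
print(indirect_links(docs, "location"))
```

Neither user document mentions the other, yet the shared location links them; grouping by `screen_name` instead yields no indirect links, because each name belongs to a single user.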
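Slide 8 says Zipf's Law produces super-connected entities. Under Zipf's law the k-th most connected entity has weight proportional to 1/k^s. The sketch assumes s = 1 and 1,000 entities (both arbitrary choices) and reports how much of the graph the top ranks hold. Under these assumptions the single most connected entity carries roughly 13% of all connections, more than the bottom 500 entities combined, so a walk that always follows the most frequent link keeps landing on the same hubs.

```python
def zipf_shares(n_entities: int, s: float = 1.0) -> list:
    """Fraction of all connections held by each rank if weight(rank) = 1 / rank**s."""
    weights = [1.0 / rank ** s for rank in range(1, n_entities + 1)]
    total = sum(weights)
    return [w / total for w in weights]


shares = zipf_shares(1000)
print(f"most connected entity: {shares[0]:.1%} of all connections")
print(f"top 10 entities:       {sum(shares[:10]):.1%}")
print(f"bottom 500 entities:   {sum(shares[500:]):.1%}")
```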
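Slide 10 credits statistics gathered at index time. As a toy illustration (a hypothetical `TinyIndex` class, not Elasticsearch's data structures), the sketch keeps an inverted index recording which documents contain each term. Any term's document frequency is then available both for the whole index (the background) and for a chosen subset of documents (the foreground), and comparing the two shows where a group departs from the global average. The likes data is invented.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index (hypothetical) that keeps per-term document sets."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.num_docs = 0

    def add(self, doc_id, terms):
        """Index one document; repeated terms in a document count once."""
        self.num_docs += 1
        for term in set(terms):
            self.postings[term].add(doc_id)

    def doc_freq(self, term, within=None):
        """Number of documents containing `term`, optionally only among `within`."""
        ids = self.postings.get(term, set())
        return len(ids) if within is None else len(ids & within)


# Hypothetical likes for four users; each user is one document.
likes = {
    1: ["talladega", "anchorman", "forrest_gump"],
    2: ["talladega", "anchorman"],
    3: ["forrest_gump", "titanic"],
    4: ["forrest_gump"],
}
index = TinyIndex()
for user_id, movies in likes.items():
    index.add(user_id, movies)

fans = index.postings["talladega"]  # foreground set: users who liked "talladega"
for term in ["anchorman", "forrest_gump"]:
    foreground = index.doc_freq(term, fans) / len(fans)
    background = index.doc_freq(term) / index.num_docs
    print(f"{term}: {foreground:.0%} of fans vs {background:.0%} of everyone")
```

Here "anchorman" is over-represented among fans (100% vs 50%), while the more popular "forrest_gump" is under-represented (50% vs 75%).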
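Slide 11 proposes following links by relevance rather than by count. One concrete scoring rule is JLH, the default significance heuristic of Elasticsearch's `significant_terms` aggregation: the absolute change in probability between the foreground set and the background, multiplied by the relative change. The sketch reimplements that formula in Python with invented counts; whether the Graph API scores vertices exactly this way is an assumption, not something the slides state.

```python
def jlh(fg_count, fg_size, bg_count, bg_size):
    """JLH score: (fg% - bg%) * (fg% / bg%), or 0 if the term is not over-represented."""
    if 0 in (fg_count, fg_size, bg_count, bg_size):
        return 0.0
    fg = fg_count / fg_size
    bg = bg_count / bg_size
    if fg <= bg:
        return 0.0
    return (fg - bg) * (fg / bg)


# Invented counts: 100 fans of one movie (foreground) out of 10,000 users.
FG_SIZE, BG_SIZE = 100, 10_000
candidates = {
    # term: (likes among fans, likes among all users)
    "hub": (60, 6_000),   # super-connected: liked by 60% of everyone
    "niche": (20, 40),    # rare overall, common among fans
    "other": (5, 300),
}

by_count = sorted(candidates, key=lambda t: candidates[t][0], reverse=True)
by_relevance = sorted(
    candidates,
    key=lambda t: jlh(candidates[t][0], FG_SIZE, candidates[t][1], BG_SIZE),
    reverse=True,
)
print("by count:    ", by_count)
print("by relevance:", by_relevance)
```

The super-connected `hub` is not skipped: it is scored against its own background frequency and earns nothing, because fans like it no more than everyone else does.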
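The request on slide 13 can also be issued from code. The sketch builds the same body with Python's standard library and prepares an HTTP request for it without sending it. The host `http://localhost:9200` and the helper names `explore_body` and `explore_request` are assumptions. The explore endpoint's path has also changed across Elastic Stack releases, so check the Graph explore API reference for the version you run.

```python
import json
import urllib.request


def explore_body(query_text: str, field: str) -> dict:
    """Build the graph-explore body from slide 13 for any query and field."""
    return {
        "query": {"query_string": {"query": query_text}},  # selects seed documents
        "vertices": [{"field": field}],                     # terms to start from
        "connections": {"vertices": [{"field": field}]},    # terms to hop to
    }


def explore_request(host: str, index: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) the HTTP request for the explore call."""
    return urllib.request.Request(
        url=f"{host}/{index}/_graph/explore",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="GET",
    )


req = explore_request("http://localhost:9200", "wikipedia",
                      explore_body("Jack Johnson", "artists.raw"))
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# Against a running cluster: urllib.request.urlopen(req)
```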
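Slide 14's claim can be checked with a quick simulation. Assuming a population of 100,000 people in which exactly 17% like "Forrest Gump" (population size, sample size, and seed are arbitrary choices), a uniform random sample of 10,000 reproduces that rate to within sampling noise, typically a fraction of a percentage point.

```python
import random

POPULATION = 100_000
LIKERS = 17_000        # 17% of all people like "Forrest Gump" (the slide's figure)
SAMPLE_SIZE = 10_000

likes = [True] * LIKERS + [False] * (POPULATION - LIKERS)

rng = random.Random(42)                  # fixed seed so runs are repeatable
sample = rng.sample(likes, SAMPLE_SIZE)  # uniform sample without replacement
share = sum(sample) / SAMPLE_SIZE

print(f"whole population: {LIKERS / POPULATION:.1%}")
print(f"random sample:    {share:.1%}")
```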
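Slide 15's "huge uplift" can be put into numbers. Using the slide's figures, and treating "<0.5%" as exactly 0.5% (an upper bound on the background share), liking "Anchorman" is at least 40 times more common among "Talladega Nights" fans than among people in general. The helper name `uplift` is hypothetical; this ratio is commonly called lift.

```python
def uplift(foreground_share: float, background_share: float) -> float:
    """How many times more common something is in a subset than overall."""
    if background_share <= 0:
        raise ValueError("background share must be positive")
    return foreground_share / background_share


# Figures from slide 15. The slide says *fewer than* 0.5% of all people like
# "Anchorman", so 0.5% is an upper bound and the lift below is a lower bound.
fans_who_like_anchorman = 0.20
everyone_who_likes_anchorman_max = 0.005

lift = uplift(fans_who_like_anchorman, everyone_who_likes_anchorman_max)
gap = 100 * (fans_who_like_anchorman - everyone_who_likes_anchorman_max)
print(f"at least {lift:.0f}x more popular among Talladega fans")
print(f"at least {gap:.1f} percentage points higher")
```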