Slide 1

Slide 1 text

Graph capabilities in the Elastic Stack
Mark Harwood, Software Engineer, @elasticmark
Steve Kearns, Sr. Director, Product Management, @skearns64

Slide 2

Slide 2 text

The Origin Story

Slide 3

Slide 3 text

Data is not Flat: Much like the world
"_source": {
  "created_at": "Tuesday Feb 16 02:10:52 +0000 2016",
  "text": "Snow can't keep me from #ElasticON!",
  "user": {
    "name": "Steve Kearns",
    "screen_name": "skearns64",
    "location": "Boston, MA"
  },
  "hashtags": [{"text": "ElasticON"}],
  "lang": "en",
  "@timestamp": "2016-02-16T02:09:52.000Z"
}

Slide 4

Slide 4 text

Relationships live in our data
• Direct: one document references multiple entities
  "user": { "screen_name": "skearns64", "location": "Boston, MA" }
• Indirect: two or more documents share a reference
  "user": { "screen_name": "skearns64", "location": "Boston, MA" }
  "user": { "screen_name": "2muchsnow", "location": "Boston, MA" }
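The two kinds of relationship on this slide can be sketched in a few lines of Python. This is only an illustration: the sample documents are modelled on the slide's snippets, and the helper names `direct_links` and `indirect_links` are hypothetical, not part of any Elastic API.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample documents modelled on the slide's tweet snippets.
docs = [
    {"id": 1, "screen_name": "skearns64", "location": "Boston, MA"},
    {"id": 2, "screen_name": "2muchsnow", "location": "Boston, MA"},
]

def direct_links(doc, fields=("screen_name", "location")):
    """Direct: pairs of entities that appear together in one document."""
    entities = [(field, doc[field]) for field in fields if field in doc]
    return list(combinations(entities, 2))

def indirect_links(docs, shared_field="location", entity_field="screen_name"):
    """Indirect: pairs of entities from different documents that share a value."""
    by_value = defaultdict(list)
    for doc in docs:
        by_value[doc[shared_field]].append(doc[entity_field])
    links = []
    for value, names in by_value.items():
        # Deduplicate and sort so each pair is reported once, in a stable order.
        for a, b in combinations(sorted(set(names)), 2):
            links.append((a, b, value))
    return links

print(direct_links(docs[0]))
print(indirect_links(docs))
```

Here the first document links "skearns64" directly to "Boston, MA", while the two users are linked to each other only indirectly, through the location they share.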

Slide 5

Slide 5 text

Let’s visualize some relationships

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

What is Graph Technology Good for?

Slide 9

Slide 9 text

Fraud Detection
• Given credit card purchase histories…
• Where did people with fraudulent purchases shop most often?
• Which vendor is responsible for stolen credit card numbers?
• Given car emissions data…
• Which car manufacturer fails emissions tests most often?
• At which shops?

Slide 10

Slide 10 text

Identifying Relationships
• Given Wikipedia…
• What topics / entities / locations are meaningfully related?
• Given network traffic data…
• What external IPs do machines on my network talk to?

Slide 11

Slide 11 text

Recommendations
• Given my purchase history…
• What am I most likely to buy next?
• Given Last.FM music preferences…
• What music do people who like Mozart also like?
• Given search and click data…
• What results do people who searched for “mixer” tend to click on?

Slide 12

Slide 12 text

…There’s no limit to how complicated things can get, on account of one thing always leading to another…
E.B. White, American essayist, columnist, poet and editor

Slide 13

Slide 13 text

Theoretical Challenges with Graph Technology
• Zipf’s Law results in super-connected entities
• Super-connected entities make graph exploration difficult
• Graph exploration is typically done by “most frequent” connections

Slide 14

Slide 14 text

Operational Challenges with Graph Technology
• Where does data live naturally? In what structure?
• Flexibility vs. complexity of query language (Cypher, SPARQL)
• Indexing speed, scale, query speed, near-real-time availability

Slide 15

Slide 15 text

Information Retrieval to the Rescue

Slide 16

Slide 16 text

Our Advantage: Information Retrieval Techniques
• When indexing data, we count and calculate key statistics
• Using these statistics in new ways, we can bring relevance to relationships
• Identify links/properties of an entity or group that are different from global averages
• Aggregations enable efficient scale

Slide 17

Slide 17 text

We didn’t steal your credit card information
Steve Kearns

Slide 18

Slide 18 text

Guide Graph Exploration with Relevance
• Follow links not by count, but by relevance
• Don’t skip super-connected entities, account for them!
• Recognize that this won’t work in all cases ☺
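A rough sketch of the difference between following links by count and by relevance. The scoring used here is a plain foreground-rate over background-rate ratio, chosen only for illustration; it is an assumption and not necessarily the scoring the product uses. The entity names and all counts are invented.

```python
def rank_by_count(fg_counts):
    """Order connected entities by raw co-occurrence count."""
    return sorted(fg_counts, key=lambda term: fg_counts[term], reverse=True)

def rank_by_uplift(fg_counts, fg_total, bg_counts, bg_total):
    """Order connected entities by how much more common they are in the
    matching (foreground) set than in the whole index (background)."""
    def uplift(term):
        fg_rate = fg_counts[term] / fg_total
        bg_rate = bg_counts[term] / bg_total
        return fg_rate / bg_rate
    return sorted(fg_counts, key=uplift, reverse=True)

# Invented numbers: 1,000 matching documents out of 1,000,000 overall.
fg_total, bg_total = 1_000, 1_000_000
fg_counts = {"Popular Artist": 400, "Niche Artist": 200}
bg_counts = {"Popular Artist": 350_000, "Niche Artist": 2_000}

print(rank_by_count(fg_counts))
print(rank_by_uplift(fg_counts, fg_total, bg_counts, bg_total))
```

The super-connected "Popular Artist" wins on count, but it is barely more common in the foreground set than everywhere else. Ranking by uplift keeps it in the results while moving the more distinctive connection to the top.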

Slide 19

Slide 19 text

What have we built?

Slide 20

Slide 20 text

Simple API that combines Search and Graph Techniques
• Simple graph-walking API
• Leverages full Elasticsearch query language
• Relevance or count-based
• Explore your existing indexes
• Distributed query execution
• Near-real-time data availability

Slide 21

Slide 21 text

Simple API that combines Search and Graph Techniques
GET /wikipedia/_graph/explore
{
  "query": { "query_string": { "query": "Jack Johnson" } },
  "vertices": [{ "field": "artists.raw" }],
  "connections": { "vertices": [{ "field": "artists.raw" }] }
}
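A request body of the shape shown on this slide can be assembled programmatically. The sketch below only builds and serialises the JSON; it does not contact a cluster. The helper name `build_explore_request` is hypothetical, and the body layout is copied from the slide rather than from a released API reference, so treat it as an assumption.

```python
import json

def build_explore_request(query_text, field):
    """Build a graph-explore body shaped like the one on the slide:
    a seed query, the starting vertices, and one hop of connections."""
    return {
        "query": {"query_string": {"query": query_text}},
        "vertices": [{"field": field}],
        "connections": {"vertices": [{"field": field}]},
    }

body = build_explore_request("Jack Johnson", "artists.raw")
payload = json.dumps(body, indent=2)  # straight quotes, valid JSON
print(payload)
```

Generating the body with `json.dumps` also avoids the curly-quote problem that creeps in when JSON is pasted from presentation software.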

Slide 22

Slide 22 text

Simple UI to Explore Your Data in New Ways

Slide 23

Slide 23 text

Mark Harwood is a smart dude. He’ll explain the details
Steve Kearns

Slide 24

Slide 24 text

Random samples should hold no surprises
• 17% of all people like “Forrest Gump”
• In a random sample, 17% will also like “Forrest Gump”
Dull. But in non-random samples something interesting happens

Slide 25

Slide 25 text

Non-random sample: people who liked “Talladega Nights”
• Find all people who liked movie #46970
• Summarise how their movie tastes differ from everyone else
• <0.5% of all people like “Anchorman”
• In the set of “Talladega-likers”, 20% of them like “Anchorman”
• …a huge uplift in popularity from the norm!
[Chart: % of Talladega fans who liked movie]
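The size of that uplift can be worked out from the percentages on these two slides. In the sketch below the percentages are converted to counts per 1,000 people purely for the arithmetic; those counts are illustrative, and the function name `uplift` is hypothetical. Because the slide gives the background rate for “Anchorman” as less than 0.5%, the result is a lower bound.

```python
def uplift(fg_hits, fg_size, bg_hits, bg_size):
    """Ratio of the rate inside the sample to the rate across everyone.
    Written as one division of integers to avoid rounding noise."""
    return (fg_hits * bg_size) / (fg_size * bg_hits)

# Random sample: 17% like "Forrest Gump" both in the sample and overall.
print(uplift(170, 1000, 170, 1000))  # 1.0, no surprise

# Non-random sample: 20% of "Talladega-likers" like "Anchorman",
# against (at most) 0.5% of all people.
print(uplift(200, 1000, 5, 1000))    # 40.0, at least a 40x uplift
```

A ratio of 1.0 is the "dull" case from the random sample; anything far above 1.0 marks a taste that sets the sample apart from the norm.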