Traversing the Academic Graph

Graphs are a powerful abstraction in many highly connected domains. Social networks are an obvious example; less obvious ones include biology and security. Scientific publishing is another such domain: citation and co-authorship in research form a huge graph. I'll share my experience using just such a graph to power our academic search engine, http://scholr.ly. I'll focus on how we use a graph database to solve a number of problems, including search, disambiguation, and recommendations.

A rough script of the presentation can be found at http://mattluongo.com/post/presentation-at-datadaytexas.

Matt Luongo

March 30, 2013
Transcript

  1. A comment on query examples

    • Cypher
      • Declarative
      • SQL-like
      • Easy, smooth pattern matching
      • Neo4j only
    • Gremlin
      • DSL atop JVM languages like Groovy
      • Lower-level, but more powerful
      • Cross-database
  2. Similar Profiles

    // Rank authors who share coauthors with author 123, most shared first
    START author=node(123)
    MATCH author-[:wrote]->(work)<-[:wrote]-(coauthor)
          -[:wrote]->(other_work)
          <-[:wrote]-(second_coauthor)
    WITH author, second_coauthor, COUNT(coauthor) AS shared_coauthors
    ORDER BY shared_coauthors DESC
    RETURN second_coauthor
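The same two-hop pattern can be tried without a database. What follows is a minimal in-memory sketch, not the deck's implementation: the `WROTE` dictionary, the author names, and both function names are hypothetical. It counts one path per match, as the query's `COUNT(coauthor)` does, and it assumes the starting author should be left out of the results.

```python
from collections import Counter

# Hypothetical toy data: author -> set of works they wrote.
WROTE = {
    "alice": {"w1", "w2"},
    "bob": {"w1", "w3"},
    "carol": {"w2", "w3"},
    "dave": {"w3"},
    "erin": {"w4"},
}


def authors_of(work, wrote):
    """Return every author with a 'wrote' edge to the given work."""
    return {author for author, works in wrote.items() if work in works}


def similar_profiles(author, wrote):
    """Rank authors reached via author -> work <- coauthor -> other_work <- second_coauthor."""
    counts = Counter()
    for work in wrote[author]:
        for coauthor in authors_of(work, wrote) - {author}:
            # An edge is used at most once per path, so other_work != work.
            for other_work in wrote[coauthor] - {work}:
                for second in authors_of(other_work, wrote):
                    # Assumption: skip the starting author and the coauthor itself.
                    if second not in (author, coauthor):
                        counts[second] += 1
    # Highest path count first; the order among ties is not guaranteed.
    return [name for name, _ in counts.most_common()]


ranking = similar_profiles("alice", WROTE)
```

In this toy graph `dave` is reachable through both `bob` and `carol`, so he ranks first. Direct coauthors can appear in the ranking too, which the query's pattern also allows.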
  3. Entity Resolution

    How do we reconcile different data sources? What happens when people share names? How do we know who's who?
  4. E. Agichtein and L. Gravano. Snowball: Extracting relations from large plain-text collections. In Proceedings of the Fifth ACM International Conference on Digital Libraries, 2000.
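  8. Entity Resolution

    // Tally, per property, how often each value occurs across the clustered nodes
    votes = [:]
    g.v(clusterIds).out('clusters').map.each{ properties ->
        properties.each{
            votes[it.key] = (votes[it.key] ?: [:])
            votes[it.key][it.value] = (votes[it.key][it.value] ?: 0) + 1
        }
    }
    // Keep the most-voted value of each property
    // (max on a Map yields an entry; indexing a sorted Map with [0] would be a key lookup)
    newClusterProperties = votes.collectEntries{ prop, valueVotes ->
        [prop, valueVotes.max{ it.value }.key]
    }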
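The voting step is independent of the graph traversal, so it can be sketched on plain dictionaries. This is an illustrative sketch only: `merge_cluster_properties` and the sample records are hypothetical, and ties are broken by whichever value was seen first, which is an assumption rather than anything the slide specifies.

```python
from collections import Counter, defaultdict


def merge_cluster_properties(records):
    """Pick, for each property, the value that most records in the cluster agree on."""
    votes = defaultdict(Counter)
    for properties in records:
        for key, value in properties.items():
            votes[key][value] += 1
    # most_common(1) returns the top (value, count) pair; ties go to the first value seen.
    return {key: counter.most_common(1)[0][0] for key, counter in votes.items()}


# Hypothetical records believed to describe the same author.
records = [
    {"name": "E. Agichtein", "year": 2000},
    {"name": "Agichtein, E.", "year": 2000},
    {"name": "E. Agichtein"},
]

merged = merge_cluster_properties(records)
```

Records that lack a property simply cast no vote for it, so sparse sources do not dilute the result.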
  9. Search

    We'd like to show the expected publication results on the left. On the right, we want to show influencers based on the publication results.
  10. Search

    authorCounts = [:]
    coauthorCounts = [:]
    // Count authors of the matched publications, then everyone who wrote with them
    g.v(publicationIds).in('WROTE') \
        .groupCount(authorCounts).out('WROTE') \
        .in('WROTE').groupCount(coauthorCounts) \
        .iterate()
    // IMAGINE - poor man's "histogram"
    totalAuthorCounts = [:]
    return totalAuthorCounts.sort{-it.value}
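The slide leaves the combination of the two count maps to the imagination. One possible reading, sketched here on in-memory data, is to add them together. The summing rule, the function name `rank_influencers`, and the toy data are all assumptions and not taken from the talk.

```python
from collections import Counter


def rank_influencers(publication_ids, authors_by_work, works_by_author):
    """Count authors of the result set plus everyone sharing a work with them."""
    author_counts = Counter()
    coauthor_counts = Counter()
    for publication in publication_ids:
        for author in authors_by_work[publication]:        # in('WROTE')
            author_counts[author] += 1
            for work in works_by_author[author]:            # out('WROTE')
                for coauthor in authors_by_work[work]:      # in('WROTE')
                    # As in the traversal, the author is counted here as well.
                    coauthor_counts[coauthor] += 1
    # Assumption: the combined "histogram" is the plain sum of both counts.
    total = author_counts + coauthor_counts
    return total.most_common()


# Hypothetical toy data.
authors_by_work = {"p1": ["ann", "ben"], "p2": ["ann"], "p3": ["ben", "cat"]}
works_by_author = {"ann": ["p1", "p2"], "ben": ["p1", "p3"], "cat": ["p3"]}

influencers = rank_influencers(["p1", "p2"], authors_by_work, works_by_author)
```

A weighted sum, for example counting direct authorship more heavily than co-authorship, would fit the same structure by scaling one `Counter` before adding.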
  11. Upcoming Events

    April 15th: Austin Graph DB Meetup
    April 16th: Austin Neo4j Tutorial
    More details at meetup.com/graph-database-austin
  12. Bibliography

    Nicholas Menghini and Alex Fuller from the Noun Project: thanks for the icons!
    TinkerPop: thanks for the graphic!