Let Neo4j chat with Python, it's easy!

In this talk I am going to show some use cases that combine Neo4j with Python, from NLP analysis of academic publication topics to mobility analysis with Twitter data. Finally, I will present some tips for “quick-and-dirty” and advanced analysis, taking data from Neo4j directly into Pandas, the leading data analysis package for Pythonistas.

Fabio Lamanna

September 25, 2017

Transcript

  1. Let Neo4j chat with Python, it’s easy!

    FABIO LAMANNA, @fblamanna. Data Science Milano, Milan, 25 September 2017. larus-ba.it/neo4j, @AgileLARUS
  2. ABOUT ME

    Ph.D., Transportation Engineer. Freelance Civil Engineer working on Urban Mobility, Traffic Science and Data Analysis (mainly on NETWORKS). Two-year post-doc contract in Spain working on Twitter + Language Detection + Mobility. Now @ LARUS BUSINESS AUTOMATION, working on data projects with Python and Neo4j.
  3. ABOUT LARUS

    LARUS BUSINESS AUTOMATION
    • Founded in 2004
    • Headquartered in Venice, ITALY
    • Delivering services worldwide
    • Mission: “Bridging the gap between Business and IT”

    OUR SPECIALITIES
    • Consulting and Developing Solutions on the Latest Open-Source Technologies
    • Training and Coaching on Agile & Lean Methodologies
    • Custom Software Design and Development
    • Strong focus on Light-Weight Architectures and No-SQL Technologies
  4. LARUS HISTORY WITH NEO4J

    ITALY’S #1 OFFICIAL PARTNER SINCE 2014. DELIVERING NEO4J CONSULTING WORLDWIDE.
    58 58
  5. LARUS HISTORY WITH NEO4J

    2016 Neo4j JDBC Driver 2015 2011 First Spikes in Retail for Articles’ Clustering 2014 2017 Neo4j APOC, ETL, GraphQL, Spark
  6. ABOUT LARUS: CUSTOMER ACKNOWLEDGEMENT

    • What customers say about us: “Reliable”, “Competent”, “Enthusiast”
  7. ABOUT LARUS: COLLABORATION WITH THE UNIVERSITY OF VENICE

    • LARUS is actively involved in some research projects and collateral trainings on BIG-DATA and NO-SQL topics
    • Students interested in graph theory and databases have their pre / post degree internships at LARUS
    [:COLLABORATE_WITH]
  8. NEO4J WORKSHOPS

    • Turin, 6 October
    • Milan, 9 November
    • Rome, 5 December
    • Venice, 23 November and 14 December
  9. Neo4j + Python: Case 1

    Natural Language Processing in Digital Humanities
    Neo4j v 3.2.3
    Main Python Packages: NLTK, Pandas, Pattern, TextBlob
  10. Goals

    • Find and categorize Topics in Academia
    • Look up Publications, Journals etc. by Topic
    • Unveil collaboration patterns among researchers
    • Recommendations
  11. Neo4j + Python: Case 2

    Airport Mobility with Twitter Data
    Neo4j v 3.2.3
    Main Python Packages: Pandas, py2neo
  12. Goals

    • Using new data sources for mobility
    • Understanding the use of space within airport terminals
    • Finding patterns in data
  13. Dataset

    Twitter users "overlapping" the 25 busiest airports in Europe in the last three years. These are users who pass through an airport area at least once, emitting a tweet. Users are traced through consecutive locations, back and forth in time.
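Tracing a user through consecutive locations amounts to sorting that user's tweets by time and linking each one to the next. The following is only a minimal sketch of that idea: the record layout `(user_id, tweet_id, timestamp)`, the function name `next_pairs` and the sample values are all hypothetical, and the slides do not say which node types the talk's NEXT relationship actually connects.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout: (user_id, tweet_id, unix_timestamp)
Tweet = Tuple[str, str, int]


def next_pairs(tweets: List[Tweet]) -> List[Tuple[str, str]]:
    """Return (earlier_tweet_id, later_tweet_id) for each user's consecutive tweets."""
    by_user: Dict[str, List[Tweet]] = defaultdict(list)
    for tweet in tweets:
        by_user[tweet[0]].append(tweet)

    pairs: List[Tuple[str, str]] = []
    for user_id in sorted(by_user):
        # Order one user's tweets in time, then pair each with its successor
        ordered = sorted(by_user[user_id], key=lambda t: t[2])
        pairs.extend((a[1], b[1]) for a, b in zip(ordered, ordered[1:]))
    return pairs


# Made-up sample: user u1 tweets three times, user u2 only once
sample = [("u1", "t3", 300), ("u1", "t1", 100), ("u2", "t4", 50), ("u1", "t2", 200)]
pairs = next_pairs(sample)
print(pairs)
```

Each pair could then become one row of a relationships file; a user with a single tweet contributes no pair.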
  14. Data Management

    Twitter .json stream. Prepare files for the Neo4j import tool, needed because of the massive amount of data:

    ./bin/neo4j-import --into <destination folder> \
      --nodes:User users.csv \
      --nodes:Loc locations.csv \
      --nodes:Tweet tweets.csv \
      --nodes:Airport airport.csv \
      --relationships:VISITED rels-visited.csv \
      --relationships:EMITTED_IN rels-emitted_in.csv \
      --relationships:WRITES rels-writes.csv \
      --relationships:IS_WITHIN rels-is_within.csv \
      --relationships:NEXT rels-next.csv \
      --delimiter "|"
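The import tool reads CSV files whose header row marks the ID column of each node file and the start and end columns of each relationship file. As a rough sketch of how such pipe-delimited files might be written from Python, the snippet below uses only the standard `csv` module. The property names (`userId`, `lang`), the ID-space names and the sample rows are assumptions for illustration, not the talk's actual schema.

```python
import csv
import io


def write_pipe_csv(out, header, rows):
    """Write a header and rows using '|' as delimiter, matching --delimiter "|"."""
    writer = csv.writer(out, delimiter="|", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


# Node file: the ':ID' suffix marks the column holding the unique node id.
# 'userId' and 'lang' are hypothetical property names.
users = io.StringIO()
write_pipe_csv(users, ["userId:ID(User)", "lang"], [["u1", "it"], ["u2", "es"]])

# Relationship file: ':START_ID' and ':END_ID' refer to node ids.
# The relationship type itself comes from the --relationships:WRITES flag.
writes = io.StringIO()
write_pipe_csv(writes, [":START_ID(User)", ":END_ID(Tweet)"], [["u1", "t1"], ["u1", "t2"]])

print(users.getvalue())
print(writes.getvalue())
```

In-memory buffers stand in for real files here; replacing `io.StringIO()` with `open("users.csv", "w", newline="")` would write to disk instead.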
  15. py2neo

    • Python package that lets you interact with Neo4j directly from Python
    • Can replace (it’s easier!) the official Neo4j Python Driver
    • Support for pandas DataFrames
    • Interaction via Jupyter Notebook
  16. py2neo - How does it work?

    # Import modules
    from py2neo import Graph
    import pandas as pd

    # Initialize Graph, calling the instance to connect to our Neo4j database
    g = Graph()

    # Query for some data
    query = g.data("MATCH (m:Movie) RETURN m.title AS Title, m.releaseDate AS ReleaseDate")
  17. py2neo - How does it work?

    # Build a dataframe with the results
    df = pd.DataFrame(query)
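The step above works because the query result arrives as a list of dictionaries, one per returned row, keyed by the aliases in the `RETURN` clause, and `pd.DataFrame` accepts exactly that shape. The sketch below reproduces the step without a running database: `records` is a hypothetical stand-in for what `g.data(...)` would return, and the titles and dates are made up.

```python
import pandas as pd

# Hypothetical stand-in for the list returned by g.data(...):
# one dict per row, keyed by the RETURN aliases
records = [
    {"Title": "Movie A", "ReleaseDate": "1999-12-31"},
    {"Title": "Movie B", "ReleaseDate": "2000-01-01"},
]

# Each dict becomes a row, each key becomes a column
df = pd.DataFrame(records)
print(df)
```

To the best of my knowledge, later py2neo releases dropped `Graph.data` in favour of `graph.run(query).data()`, which yields the same list-of-dicts shape; treat that as something to check against the version you have installed.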
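  18. py2neo - How does it work?

    # Parse the release date into a 'Date' column
    # (this step is not shown on the slide; it assumes ReleaseDate holds parseable dates)
    df['Date'] = pd.to_datetime(df['ReleaseDate'])

    # Extract Month, Year and Weekday (Monday=0, Sunday=6) from Date
    df['Month'] = df['Date'].dt.month.astype(int)
    df['Year'] = df['Date'].dt.year.astype(int)
    df['Weekday'] = df['Date'].dt.dayofweek.astype(int)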
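A self-contained way to try the date extraction is sketched below. It assumes the release dates come out of Neo4j as ISO-formatted strings, which the slides do not confirm, and the sample rows are invented. The built-in `int` is used for the cast because the `np.int` alias no longer exists in recent NumPy releases.

```python
import pandas as pd

# Made-up sample rows; ISO date strings are an assumption about the stored format
df = pd.DataFrame(
    {
        "Title": ["Movie A", "Movie B", "Movie C"],
        "ReleaseDate": ["1999-12-31", "2000-01-01", "2017-09-25"],
    }
)

# Parse the strings into datetimes so the .dt accessor becomes available
df["Date"] = pd.to_datetime(df["ReleaseDate"])

# Month, Year and Weekday (Monday=0, Sunday=6)
df["Month"] = df["Date"].dt.month.astype(int)
df["Year"] = df["Date"].dt.year.astype(int)
df["Weekday"] = df["Date"].dt.dayofweek.astype(int)

print(df[["Title", "Month", "Year", "Weekday"]])
```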
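  19. py2neo - How does it work?

    import seaborn as sns

    # Plot the distribution of year of release
    # (distplot is deprecated since seaborn 0.11; histplot and displot are its replacements)
    sns.distplot(df['Year'], kde=True, rug=True, color="r")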
  20. py2neo - How does it work?

    In [11]: query = """
        ...: MATCH p=(:Set)<-[:IN_SET]-(s:Song)-[:PART_OF]->(:Concert)-[:IN]->(:Location)
        ...: WHERE s.name contains 'Ultraviolet'
        ...: RETURN p LIMIT 20
        ...: """

    In [12]: graph.data(query)
  21. py2neo - How does it work?

    In [5]: graph.data(query)
    Out[5]:
    [{u'p': (f327e15)<-[:IN_SET]-(d003ee8)-[:PART_OF]->(e991d64)-[:IN]->(fd6d1be)},
     {u'p': (f327e15)<-[:IN_SET]-(d003ee8)-[:PART_OF]->(a029d5f)-[:IN]->(e5fa8b3)},
     {u'p': (f327e15)<-[:IN_SET]-(d003ee8)-[:PART_OF]->(fe3a7f3)-[:IN]->(ddf9b43)},
     {u'p': (f327e15)<-[:IN_SET]-(d003ee8)-[:PART_OF]->(c37420f)-[:IN]->(f613b34)},
     {u'p': (f327e15)<-[:IN_SET]-(d003ee8)-[:PART_OF]->(ecd9021)-[:IN]->(ac7ccfe)},
     {u'p': (f327e15)<-[:IN_SET]-(d003ee8)-[:PART_OF]->(d68c490)-[:IN]->(ad2b199)},
     {u'p': (f327e15)<-[:IN_SET]-(d003ee8)-[:PART_OF]->(f60ce94)-[:IN]->(d80258d)},
     {u'p': (f327e15)<-[:IN_SET]-(d003ee8)-[:PART_OF]->(ab854b3)-[:IN]->(ceb2494)},
     {u'p': (f327e15)<-[:IN_SET]-(d003ee8)-[:PART_OF]->(e05e9a8)-[:IN]->(a204538)},
     …]
  22. py2neo - How does it work?

    In [9]: query = """
       ...: MATCH (set:Set)<-[:IN_SET]-(s:Song)-[:PART_OF]->(c:Concert)
       ...: WHERE s.name contains 'Ultraviolet'
       ...: RETURN s.name, c.id, set.number
       ...: """

    In [10]: DataFrame(graph.data(query))
    Out[10]:
           c.id                      s.name  set.number
    0  3bd6f83c  Ultraviolet (Light My Way)           5
    1  23d6f833  Ultraviolet (Light My Way)           5
    2  2bd6f836  Ultraviolet (Light My Way)           5
    3  3bd6f834  Ultraviolet (Light My Way)           5
    4  33d6f835  Ultraviolet (Light My Way)           5
    5  2bd6f832  Ultraviolet (Light My Way)           5
    6  3bd6f830  Ultraviolet (Light My Way)           5
    7  33d6f831  Ultraviolet (Light My Way)           5
    8  2bd6f82a  Ultraviolet (Light My Way)           5
    …
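When a query returns plain properties rather than whole paths, the column names are the literal `RETURN` expressions (`s.name`, `c.id`, `set.number`), which are awkward to use in pandas. One possible follow-up, sketched here with two rows copied from the output above, is to rename the columns before analysing them. The new column names are my own choice, not the talk's; aliasing with `AS` in the Cypher query would achieve the same thing.

```python
import pandas as pd

# Two rows shaped like the output above (values copied from the slide)
records = [
    {"s.name": "Ultraviolet (Light My Way)", "c.id": "3bd6f83c", "set.number": 5},
    {"s.name": "Ultraviolet (Light My Way)", "c.id": "23d6f833", "set.number": 5},
]
df = pd.DataFrame(records)

# Dotted names clash with attribute access, so give the columns plain names
# (hypothetical names chosen for this sketch)
df = df.rename(columns={"s.name": "song", "c.id": "concert_id", "set.number": "set_number"})

# Example analysis: number of distinct concerts per set number
concerts_per_set = df.groupby("set_number")["concert_id"].nunique()
print(concerts_per_set)
```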
  23. Conclusions

    • We need tools and packages for “quick-and-dirty” and advanced data analysis
    • Python extends the power of Neo4j and can easily interact with it (both through the official driver and through py2neo)