Slide 1

Slide 1 text

Let Neo4j chat with Python, it’s easy!
FABIO LAMANNA @fblamanna
Data Science Milano
Milan, 25 September 2017
larus-ba.it/neo4j @AgileLARUS

Slide 2

Slide 2 text

ABOUT ME
• Ph.D. Transportation Engineer
• Freelance Civil Engineer working on Urban Mobility, Traffic Science, Data Analysis (mainly on NETWORKS)
• Two-year post-doc contract in Spain working on Twitter + Language Detection + Mobility
• Now @ LARUS BUSINESS AUTOMATION, working on data projects with Python and Neo4j

Slide 3

Slide 3 text

ABOUT LARUS
LARUS BUSINESS AUTOMATION
• Founded in 2004
• Headquartered in Venice, ITALY
• Delivering services Worldwide
• Mission: “Bridging the gap between Business and IT”
OUR SPECIALITIES
• Consulting and Developing Solutions on the Latest Open-Source Technologies
• Training and Coaching on Agile & Lean Methodologies
• Custom Software Design and Development
• Strong focus on Light-Weight Architectures and No-SQL Technologies

Slide 4

Slide 4 text

LARUS HISTORY WITH NEO4J
ITALY’S #1 OFFICIAL PARTNER SINCE 2014
DELIVERING NEO4J CONSULTING WORLDWIDE

Slide 5

Slide 5 text

LARUS HISTORY WITH NEO4J
• 2011: First Spikes in Retail for Articles’ Clustering
• 2014
• 2015
• 2016: Neo4j JDBC Driver
• 2017: Neo4j APOC, ETL, GraphQL, Spark

Slide 6

Slide 6 text

ABOUT LARUS
OFFICIAL APOC MAINTAINER

Slide 7

Slide 7 text

ABOUT LARUS
CUSTOMER ACKNOWLEDGEMENT
• What customers say about us: “Reliable”, “Competent”, “Enthusiast”

Slide 8

Slide 8 text

ABOUT LARUS
COLLABORATION WITH THE UNIVERSITY OF VENICE
• LARUS is actively involved in some research projects and collateral trainings on BIG-DATA and NO-SQL topics
• Students interested in graph theory and databases have their pre / post degree internships at LARUS
[:COLLABORATE_WITH]

Slide 9

Slide 9 text

NEO4J SERVICES
• TRAINING
• CONSULTING
• SOFTWARE DEVELOPMENT

Slide 10

Slide 10 text

NEO4J WORKSHOPS
• Turin, 6 October
• Milan, 9 November
• Rome, 5 December
• Venice, 23 November and 14 December

Slide 11

Slide 11 text

Outline
• Developing data projects with Neo4j
• How to work seamlessly with Neo4j and Python

Slide 12

Slide 12 text

Working Schemes (1/2) RAW DATA

Slide 13

Slide 13 text

Working Schemes (2/2) RAW DATA

Slide 14

Slide 14 text

Neo4j + Python: Case 1
Natural Language Processing in Digital Humanities
Neo4j v 3.2.3
Main Python Packages: NLTK, Pandas, Pattern, TextBlob

Slide 15

Slide 15 text

Goals
• Find and categorize Topics in Academia
• Look up Publications, Journals etc. by Topic
• Unveil collaboration patterns among researchers
• Recommendations

Slide 16

Slide 16 text

Raw Data
• Data extraction from internal sources or API
• Usually big .csv files (or even .xls!)

Slide 17

Slide 17 text

Language Detection
TextBlob + Google API
TextBlob(string).detect_language()
https://github.com/sloria/textblob

Slide 18

Slide 18 text

Part-Of-Speech Tagging

Slide 19

Slide 19 text

Neo4j Data Ingestion
• Cleaned .csv files using LOAD CSV
• Neo4j Import Tool
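The cleaning step that precedes LOAD CSV might look like the sketch below. It uses only the Python standard library. The `id` and `title` columns, the file name `publications_clean.csv` and the `Publication` label are assumptions made for illustration, not details taken from the talk.

```python
import csv
import io

def clean_rows(rows, key="id"):
    """Strip whitespace, drop rows without a key, keep the first row per key."""
    seen = set()
    for row in rows:
        cleaned = {k.strip(): (v or "").strip() for k, v in row.items()}
        if not cleaned.get(key) or cleaned[key] in seen:
            continue
        seen.add(cleaned[key])
        yield cleaned

# Hypothetical raw export with stray spaces, a duplicate and a missing id.
raw = io.StringIO(
    "id,title\n"
    " 1 , Graph Theory \n"
    "1,Graph Theory\n"
    ",Untitled\n"
    "2,Network Science\n"
)
cleaned = list(clean_rows(csv.DictReader(raw)))

# Write the cleaned rows; in practice this would be a file in Neo4j's import directory.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["id", "title"], lineterminator="\n")
writer.writeheader()
writer.writerows(cleaned)

# Cypher that could load the cleaned file (file name and label are assumptions).
LOAD_QUERY = """
LOAD CSV WITH HEADERS FROM 'file:///publications_clean.csv' AS row
MERGE (p:Publication {id: row.id})
SET p.title = row.title
"""
```

Using MERGE on the identifier keeps the load idempotent, so re-running it on the same file should not create duplicate nodes.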

Slide 20

Slide 20 text

Neo4j Data Model

Slide 21

Slide 21 text

Neo4j Graph
Rivista (Journal), Pubblicazione (Publication), Argomento (Topic)

Slide 22

Slide 22 text

Neo4j Pattern Search

Slide 23

Slide 23 text

Neo4j + Python: Case 2
Airport Mobility with Twitter Data
Neo4j v 3.2.3
Main Python Packages: Pandas, py2neo

Slide 24

Slide 24 text

Goals
• Using new data sources for mobility
• The use of space within airport terminals
• Finding patterns in data

Slide 25

Slide 25 text

Dataset
• Twitter users "overlapping" the 25 busiest airports in Europe in the last three years.
• Users who passed through an airport area at least once while emitting a tweet.
• Tracing users through consecutive locations back and forth in time.
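Tracing a user through consecutive locations can be sketched as sorting that user's tweets by time and pairing each tweet with the next one. The field names (`user`, `timestamp`, `location`) and the choice to skip repeated tweets from the same place are assumptions for illustration; the slides do not show how the talk implemented this step.

```python
from collections import defaultdict

def consecutive_locations(tweets):
    """Return (user, from_location, to_location) triples in time order per user."""
    by_user = defaultdict(list)
    for t in tweets:
        by_user[t["user"]].append(t)

    steps = []
    for user in sorted(by_user):
        ordered = sorted(by_user[user], key=lambda t: t["timestamp"])
        # Pair each tweet with the one that follows it in time.
        for prev, curr in zip(ordered, ordered[1:]):
            if prev["location"] != curr["location"]:  # skip repeats of the same place
                steps.append((user, prev["location"], curr["location"]))
    return steps

# Hypothetical sample: u1 tweets twice at an airport, then once elsewhere.
tweets = [
    {"user": "u1", "timestamp": 3, "location": "Madrid"},
    {"user": "u1", "timestamp": 1, "location": "MXP airport"},
    {"user": "u1", "timestamp": 2, "location": "MXP airport"},
    {"user": "u2", "timestamp": 5, "location": "LHR airport"},
]
steps = consecutive_locations(tweets)
```

Each resulting triple corresponds to one hop between locations, which is the kind of pair a relationship between consecutive location nodes would store.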

Slide 26

Slide 26 text

Data Model

Slide 27

Slide 27 text

Data Management
• Twitter .json Stream
• Prepare files for Neo4j import tool due to massive amount of data

./bin/neo4j-import --into \
    --nodes:User users.csv \
    --nodes:Loc locations.csv \
    --nodes:Tweet tweets.csv \
    --nodes:Airport airport.csv \
    --relationships:VISITED rels-visited.csv \
    --relationships:EMITTED_IN rels-emitted_in.csv \
    --relationships:WRITES rels-writes.csv \
    --relationships:IS_WITHIN rels-is_within.csv \
    --relationships:NEXT rels-next.csv \
    --delimiter "|"
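Preparing the pipe-delimited files for the import tool could look roughly like this standard-library sketch. The tweet fields shown are a simplified, hypothetical subset of a Twitter record, and the column names are assumptions; only the `:ID`, `:START_ID` and `:END_ID` header markers come from the import tool's file format.

```python
import csv
import io
import json

# Hypothetical, simplified tweet records, one JSON object per line.
stream = io.StringIO(
    '{"id": "t1", "user": {"id": "u1", "screen_name": "alice"}, "text": "Boarding | gate 4"}\n'
    '{"id": "t2", "user": {"id": "u1", "screen_name": "alice"}, "text": "Landed"}\n'
)

users, tweets, writes = {}, [], []
for line in stream:
    tweet = json.loads(line)
    user = tweet["user"]
    users[user["id"]] = user["screen_name"]    # de-duplicate users by id
    tweets.append((tweet["id"], tweet["text"]))
    writes.append((user["id"], tweet["id"]))   # (User)-[:WRITES]->(Tweet)

def to_csv(header, rows):
    """Serialise rows as pipe-delimited text, quoting fields that contain the delimiter."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="|", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()

users_csv = to_csv(["userId:ID(User)", "name"], sorted(users.items()))
tweets_csv = to_csv(["tweetId:ID(Tweet)", "text"], tweets)
rels_writes_csv = to_csv([":START_ID(User)", ":END_ID(Tweet)"], writes)
```

Because the relationship type is given on the command line (`--relationships:WRITES`), the relationship file only needs the start and end identifiers.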

Slide 28

Slide 28 text

Pattern Detection

Slide 29

Slide 29 text

Pattern Detection

Slide 30

Slide 30 text

Bot Detection

Slide 31

Slide 31 text

Hamming Distance

Slide 32

Slide 32 text

Hamming Distance
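For reference, the Hamming distance between two equal-length sequences is the number of positions at which they differ. The sketch below shows only this standard definition; how the talk applied it to bot detection is not recoverable from the slide text, so the example inputs are purely illustrative.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires sequences of equal length")
    return sum(x != y for x, y in zip(a, b))

# Works on strings and on any other equal-length sequences.
d_text = hamming_distance("karolin", "kathrin")
d_bits = hamming_distance([1, 0, 1, 1], [1, 1, 1, 0])
```

A small distance between two accounts' encoded activity would suggest near-identical behaviour, which is one plausible reason to use this measure when looking for bots.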

Slide 33

Slide 33 text

py2neo
• Python package that lets you interact with Neo4j directly from Python
• Can replace the official Neo4j Python Driver (it’s easier!)
• Support for pandas DataFrames
• Interaction via Jupyter Notebook

Slide 34

Slide 34 text

py2neo - How does it work?
# Import modules
from py2neo import Graph
import pandas as pd

# Initialize Graph,
# calling the instance to connect to our Neo4j database
g = Graph()

# Query for some data
query = g.data("MATCH (m:Movie) RETURN m.title AS Title, m.releaseDate AS ReleaseDate")

Slide 35

Slide 35 text

py2neo - How does it work?
# Build a dataframe with the results
df = pd.DataFrame(query)

Slide 36

Slide 36 text

py2neo - How does it work?
# Extract Month, Year and Weekday (Monday=0, Sunday=6) from Date
# (df['Date'] must already be a datetime column, e.g. via pd.to_datetime)
df['Month'] = df['Date'].dt.month.astype(int)
df['Year'] = df['Date'].dt.year.astype(int)
df['Weekday'] = df['Date'].dt.dayofweek.astype(int)
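The same extraction can be checked on a single value without pandas. This is a minimal standard-library sketch, assuming dates arrive as ISO strings; `date.weekday()` uses the same Monday=0 convention as the pandas `dayofweek` accessor. The helper name `date_parts` is hypothetical.

```python
from datetime import date

def date_parts(iso_date):
    """Split an ISO date string into month, year and weekday (Monday=0, Sunday=6)."""
    d = date.fromisoformat(iso_date)
    return {"Month": d.month, "Year": d.year, "Weekday": d.weekday()}

# 25 September 2017, the date of this talk.
parts = date_parts("2017-09-25")
```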

Slide 37

Slide 37 text

py2neo - How does it work?
import seaborn as sns

# Plot the distribution of year of release
sns.distplot(df['Year'], kde=True, rug=True, color="r")

Slide 38

Slide 38 text

py2neo - How does it work?

Slide 39

Slide 39 text

py2neo - How does it work?
In [11]: query = """
    ...: MATCH p=(:Set)<-[:IN_SET]-(s:Song)-[:PART_OF]->(:Concert)-[:IN]->(:Location)
    ...: WHERE s.name contains 'Ultraviolet'
    ...: RETURN p LIMIT 20
    ...: """

In [12]: graph.data(query)

Slide 40

Slide 40 text

py2neo - How does it work?
In [5]: graph.data(query)
Out[5]:
[{u'p': (f327e15)<-[:IN_SET]-(d003ee8)-[:PART_OF]->(e991d64)-[:IN]->(fd6d1be)},
 {u'p': (f327e15)<-[:IN_SET]-(d003ee8)-[:PART_OF]->(a029d5f)-[:IN]->(e5fa8b3)},
 {u'p': (f327e15)<-[:IN_SET]-(d003ee8)-[:PART_OF]->(fe3a7f3)-[:IN]->(ddf9b43)},
 {u'p': (f327e15)<-[:IN_SET]-(d003ee8)-[:PART_OF]->(c37420f)-[:IN]->(f613b34)},
 {u'p': (f327e15)<-[:IN_SET]-(d003ee8)-[:PART_OF]->(ecd9021)-[:IN]->(ac7ccfe)},
 {u'p': (f327e15)<-[:IN_SET]-(d003ee8)-[:PART_OF]->(d68c490)-[:IN]->(ad2b199)},
 {u'p': (f327e15)<-[:IN_SET]-(d003ee8)-[:PART_OF]->(f60ce94)-[:IN]->(d80258d)},
 {u'p': (f327e15)<-[:IN_SET]-(d003ee8)-[:PART_OF]->(ab854b3)-[:IN]->(ceb2494)},
 {u'p': (f327e15)<-[:IN_SET]-(d003ee8)-[:PART_OF]->(e05e9a8)-[:IN]->(a204538)},
 …]

Slide 41

Slide 41 text

py2neo - How does it work?
In [9]: query = """
   ...: MATCH (set:Set)<-[:IN_SET]-(s:Song)-[:PART_OF]->(c:Concert)
   ...: WHERE s.name contains 'Ultraviolet'
   ...: RETURN s.name, c.id, set.number
   ...: """

In [10]: DataFrame(graph.data(query))
Out[10]:
       c.id                      s.name  set.number
0  3bd6f83c  Ultraviolet (Light My Way)           5
1  23d6f833  Ultraviolet (Light My Way)           5
2  2bd6f836  Ultraviolet (Light My Way)           5
3  3bd6f834  Ultraviolet (Light My Way)           5
4  33d6f835  Ultraviolet (Light My Way)           5
5  2bd6f832  Ultraviolet (Light My Way)           5
6  3bd6f830  Ultraviolet (Light My Way)           5
7  33d6f831  Ultraviolet (Light My Way)           5
8  2bd6f82a  Ultraviolet (Light My Way)           5
…

Slide 42

Slide 42 text

Conclusions
• We need tools and packages for both “quick-and-dirty” and advanced data analysis
• Python extends the power of Neo4j and can easily interact with it, both through the official driver and through py2neo
