This guide is meant to highlight some of the powers of Cypher, not to belittle/diminish SQL, which is a very powerful and the most widely used database query language.
For specific questions and use-cases specific languages are often a better fit, so pick the right tool for the job.
In the guide we will mostly look at Cypher, showing sometimes the SQL equivalent as comparison.
Declarative query language for relational databases (no relationships) that is being developed since the 70s to also allow less-technical users to interact with databases.
Originally for single table operations, JOINs later added, has been evolving since with support for geo, json, xml, analytics etc.
SELECT p.product_name, count(*) as cnt FROM customers c JOIN orders o ON (c.customer_id = o.customer_id) JOIN order_details od on (od.order_id = o.order_id) JOIN products p ON (od.product_id = p.product_id) WHERE p.product_name LIKE '%G%' GROUP BY p.product_name ORDER BY cnt DESC LIMIT 5;
product_name | cnt ------------------------+----- Guaraná Fantástica | 51 Gorgonzola Telino | 51 Gnocchi di nonna Alice | 50 Mozzarella di Giovanni | 38 Gumbär Gummibärchen | 32
The relational model is based on tables (relations) as the main building block. A fixed schema defines columns, their types and additional contraints and indexes. References to other tables are foreign keys, many-to-many connections as join/link-tables.
Declarative query language for graphs, based on graph pattern syntax (ascii-art). It focuses on readability and efficient syntax.
Here is an example. Later on you will be able to execute these examples by clicking on them and running them from the command-bar on top.
MATCH (a:Person)-[:ACTED_IN]->(m:Movie) WITH a, count(m) as movies WHERE movies > 2 MATCH (a)-[:DIRECTED]->(m) RETURN a.name, collect(m.title) as movies, count(*) as count ORDER BY count DESC LIMIT 5;
The property graph model is based on representing entities and their connections Its building blocks are nodes and relationships each of which can hold arbitrary properties. Graphs are often schema free or schema optional to allow for a more flexible evolution of the data model.
We pick two simple datasets, IMDB (Movies) and Northwind (Retail) as they don’t require you to learn a new domain and we can focus on the query langauge specifics.
Note
|
See how to import them below. |
To play along, you can run the :play movies
command and execute the create script on slide 2.
And also the :play northwind
command and execute all the load commands of the guide.
Even better, you can also click and run this box to import all the data we need:
link:https://raw.githubusercontent.com/neo4j-graph-examples/northwind/main/scripts/northwind.cypher[role=include]
link:https://raw.githubusercontent.com/neo4j-graph-examples/movies/main/scripts/movies.cypher[role=include]
-
Main Focus on patterns - Ascii-Art
-
Node:
() / (:Person) / (p:Person {name:'Emil'})
-
Relationship:
-- / --> / -[:RATED]-> / -[:RATED {stars:5}]->
-
(:Person {name:'Emil'})-[:RATED {stars:5}]->(:Movie {title:'The Matrix'})
-
Graph Patterns used for finding (
MATCH
), creating (CREATE
), updating (MERGE
), exploring data -
each dash would be one to two join in SQL, e.g. 6 joins here
(:Customer)-->(:Product)<--(:Customer)-->(:Product)
FROM customers c1 JOIN orders o1 ON (c1.customer_id = o1.customer_id) JOIN order_details od1 ON (od1.order_id = o1.order_id) JOIN products p1 ON (od1.product_id = p1.product_id) JOIN orders o2 ON (o2.customer_id = o1.customer_id) JOIN order_details od2 ON (od2.order_id = o2.order_id) JOIN products p2 ON (od2.product_id = p2.product_id) WHERE c1.customer_name = 'Emil' AND p1.product_name = 'Philz Coffee'
The following clauses may follow a MATCH
clause. They work with the
properties stored at the nodes and relationships found in the graph
matching that pattern.
match |
|
filter |
|
aggregate |
|
return |
|
order |
|
limit |
|
Or all together:
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
WHERE m.title CONTAINS 'Matrix'
WITH a, count(m) as movies WHERE movies > 2
RETURN a.name, movies
ORDER BY movies DESC LIMIT 5;
Follow a chain of relationships to an arbitrary depth and return the full path
:param product=>"Pavlova";
MATCH path = (p:Product {productName:$product})-[:PART_OF*]->(root:Category)
RETURN path
WITH RECURSIVE subcategories AS ( SELECT category_id, parent_category_id, category_name FROM categories WHERE category_id = 1 UNION SELECT category_id, parent_category_id, category_name FROM categories c INNER JOIN subcategories sc ON sc.category_id = c.parent_category_id ) SELECT * FROM subcategories;
Finding loops:
MATCH path = (p:Person)-[*1..5]-(p)
RETURN [n in nodes(path) | coalesce(n.name, n.title)] as path, length(path)
LIMIT 10
MATCH path = (p:Person)-[*3..10]-(p)
RETURN path
LIMIT 1
Find the shortest connection between two entities
:param name=>"Meg Ryan";
MATCH path = shortestPath(
(:Person {name:$name})-[:ACTED_IN*..10]-(:Person {name:'Kevin Bacon'}) )
RETURN path, length(path);
-
GROUP BY is superfluous, non-aggregated fields form grouping key
-
"in-between" aggregation with
WITH
, noHAVING
required -
any number of times
MATCH (m:Movie)<-[:ACTED_IN]-(a:Person)
// in-between agreggation, only passing on declared fields
WITH a, count(m) as movies
// with filter on the aggregated value
WHERE movies > 2
// next query part
MATCH (a)-[:DIRECTED]->(m)
// aggregation at the end by a.name
RETURN a.name, collect(m.title) as movies, count(*) as count
ORDER BY count DESC LIMIT 5;
-
Literal syntax for lists
[1,2,3]
andrange(1,100,2)
and -
maps
{answer: 42}
-
list slices, index access, list comprehensions, quantors
-
map keys, dot-access, map projections
// list comprehension
WITH [id IN range(1,100)
WHERE id%13 = 0 |
// literal map construction
{id:id, name:'Joe '+id, age:id%100}] as people
// list quantor predicate on list elements
WHERE ALL(p in people WHERE p.age % 13 = 0)
RETURN people
-
First class aggreation functions for creating real lists:
collect
-
very useful for creating nested documents
-
keeps ordering
-
-
Turn a list into rows of values with
UNWIND
,-
esp. helpful for data creation and updates based on lists of data
-
// turn list into 10k rows
UNWIND range(1,10000) as id
CREATE (p:Person {id:id, name:"Joe "+id, age:id%100})
// aggregate people into age groups, collect per age-group
RETURN p.age, count(*) as ageCount, collect(p.name)[0..5] as ageGroup
-
Nested documents like in a document database
-
Borrowed from the GraphQL query langauge
MATCH (m:Movie)
// map projection, with property access
RETURN m { .*,
// pattern comprehension
actors:
[(m)<-[r:ACTED_IN]-(a) |
// property access, nested expressions
a { .name, roles: r.roles,
movies: size([()<-[:ACTED_IN]-(a)|a]) }
][0..5], // list slice
// pattern comprehension with filter & expression
directors: [(m)<-[:DIRECTED]-(d) WHERE d.born < 1975 | d.name]
} as movieDocument
LIMIT 10
e.g. Graph Refactoring, partial updates, idempotent graph construction
:param rating=>({title:'The Matrix',name:'Emil Eifrem', stars:5})
MATCH (m:Movie {title:$rating.title})
// possibly write
MERGE (u:User {name:$rating.name})
// possibly write
MERGE (u)-[r:RATED]->(m)
// write
SET r.stars = $rating.stars
WITH *
MATCH (m)<-[:ACTED_IN]-(a:Person)
// read & return
RETURN u, m, r, collect(a)
Transactional batching for loads, updates, refactoring, deletes
:auto MATCH (o:Order) // imagine 100M
CALL { WITH o
MATCH (o)-[r:ORDERS]->()
WITH o, sum(r.count) as products, sum(toFloat(r.unitPrice)*r.quantity) as total
SET o.count = products, o.total = total
} IN TRANSACTIONS OF 100000 ROWS
Note
|
This is a new feature of Neo4j 4.4. You can check your installation version in the Neo4j Browser sidebar or by running: call dbms.components(); |
Load CSV, JSON and other datatypes from other data sources
WITH "https://data.neo4j.com/importing/ratings.csv" AS url
LOAD CSV WITH HEADERS FROM url AS row
MATCH (m:Movie {movieId:row.movieId})
MERGE (u:User {userId:row.userId})
ON CREATE SET u.name = row.name
MERGE (u)-[r:RATED]->(m)
SET r.rating = toFloat(row.rating)
SET r.timestamp = toInteger(row.timestamp);
Load data from a web API (StackOverflow) using apoc.load.json
WITH "https://api.stackexchange.com/2.2/questions?pagesize=2&order=desc&sort=creation&tagged=neo4j&site=stackoverflow&filter=!5-i6Zw8Y)4W7vpy91PMYsKM-k9yzEsSC1_Uxlf" AS url
// load the json data from the URL
CALL apoc.load.json(url) YIELD value
// turn element list into rows
UNWIND value.items AS item
// deconstruct row data
RETURN item.title, item.owner, item.creation_date, keys(item)
LIMIT 5;
The apoc utility library is already installed on Neo4j AuraDB and Neo4j Sandbox
In Neo4j Desktop you can install it from the database settings.
In other Neo4j installations you can copy the jar from the labs
into the plugins
folder.
For Docker you can use the --env NEO4J_LABS_PLUGINS=["apoc"]
environment variable.
You can check if apoc is installed by running:
call apoc.help();
Documentation for apoc.load.json
Spatial datatypes (polar and cartesian, 2d and 3d), distance, bounding box
Dijkstra, A* and other pathfinding
MATCH (loc:Location)-[:HOSTS]->(e:Event)
// events near me
WHERE point.distance(loc.coords, $myLocation) < 1000
RETURN e.title, e.rating, size((e)<-[:RSVPD]-()) as guests
ORDER BY e.rating DESC LIMIT 5
Datetime, Timezones, Durations
formatting, parsing, computation
with date('2021-06-01') as start, date('2021-06-30') as end
unwind [days IN range(0,duration.between(start,end).days) | start + duration({days:days})] as day
WITH * WHERE NOT EXISTS { (:Appointment {date:day}) }
RETURN day
Cypher is a really powerful, but still humane query language for Graphs and Data.
It has been adopted by many other graph databases already, including AWS Neptune, RedisGraph, AgensGraph, Memgraph.
It is the basis for GQL the ISO Graph Query language, so it’s a worthwhile investment to learn it.
There are a lot of resources
-
✓ graph patterns vs. joins
-
✓ idempotent graph construction
-
✓ combining reads and writes
-
✓ UNWIND / collect
-
✓ in-line batching for updates / refactorings
-
✓ data deconstruction / map / list support
-
✓ list comprehensions / quantors
-
✓ multi-step-aggregation - WITH (bonus, look ma, no group by)
-
✓ LOAD CSV / apoc.load.json
-
✓ deep tree hierarchies
-
✓ shortest path
-
✓ map projection / pattern comprehension
-
✓ spatial, distance
-
✓ datetime, durations
This is great, Michael! There is now actually 14 things in your list! You might want to change it to read, (Actually More) so you don't have keep changing it.