Skip to content

Instantly share code, notes, and snippets.

@jexp
Last active April 1, 2023 21:22
Show Gist options
  • Save jexp/585491b1443e4108d018cb440e3c64c0 to your computer and use it in GitHub Desktop.
Save jexp/585491b1443e4108d018cb440e3c64c0 to your computer and use it in GitHub Desktop.
10 Things that are easier with Cypher than SQL

Cypher vs. SQL

10 Things that are easier with Cypher than SQL

This guide is meant to highlight some of the powers of Cypher, not to belittle/diminish SQL, which is a very powerful and the most widely used database query language.

For specific questions and use-cases specific languages are often a better fit, so pick the right tool for the job.

In the guide we will mostly look at Cypher, showing sometimes the SQL equivalent as comparison.

SQL

Declarative query language for relational databases (no relationships) that is being developed since the 70s to also allow less-technical users to interact with databases.

Originally for single table operations, JOINs later added, has been evolving since with support for geo, json, xml, analytics etc.

SELECT p.product_name, count(*) as cnt
FROM customers c
JOIN orders o ON (c.customer_id = o.customer_id)
JOIN order_details od on (od.order_id = o.order_id)
JOIN products p ON (od.product_id = p.product_id)
WHERE p.product_name LIKE '%G%'
GROUP BY p.product_name
ORDER BY cnt DESC LIMIT 5;
      product_name      | cnt
------------------------+-----
 Guaraná Fantástica     |  51
 Gorgonzola Telino      |  51
 Gnocchi di nonna Alice |  50
 Mozzarella di Giovanni |  38
 Gumbär Gummibärchen    |  32

The relational model is based on tables (relations) as the main building block. A fixed schema defines columns, their types and additional contraints and indexes. References to other tables are foreign keys, many-to-many connections as join/link-tables.

Cypher

Declarative query language for graphs, based on graph pattern syntax (ascii-art). It focuses on readability and efficient syntax.

Here is an example. Later on you will be able to execute these examples by clicking on them and running them from the command-bar on top.

MATCH (a:Person)-[:ACTED_IN]->(m:Movie)

WITH a, count(m) as movies WHERE movies > 2

MATCH (a)-[:DIRECTED]->(m)

RETURN a.name, collect(m.title) as movies, count(*) as count
ORDER BY count DESC LIMIT 5;

The property graph model is based on representing entities and their connections Its building blocks are nodes and relationships each of which can hold arbitrary properties. Graphs are often schema free or schema optional to allow for a more flexible evolution of the data model.

Datasets

We pick two simple datasets, IMDB (Movies) and Northwind (Retail) as they don’t require you to learn a new domain and we can focus on the query langauge specifics.

Note
See how to import them below.

model model

To play along, you can run the :play movies command and execute the create script on slide 2. And also the :play northwind command and execute all the load commands of the guide.

Even better, you can also click and run this box to import all the data we need:

link:https://raw.githubusercontent.com/neo4j-graph-examples/northwind/main/scripts/northwind.cypher[role=include]
link:https://raw.githubusercontent.com/neo4j-graph-examples/movies/main/scripts/movies.cypher[role=include]

Graph Patterns vs. JOINS

  • Main Focus on patterns - Ascii-Art

  • Node: () / (:Person) / (p:Person {name:'Emil'})

  • Relationship: -- / --> / -[:RATED]-> / -[:RATED {stars:5}]->

  • (:Person {name:'Emil'})-[:RATED {stars:5}]->(:Movie {title:'The Matrix'})

  • Graph Patterns used for finding (MATCH), creating (CREATE), updating (MERGE), exploring data

  • each dash would be one to two join in SQL, e.g. 6 joins here
    (:Customer)-->(:Product)<--(:Customer)-->(:Product)

FROM customers c1
JOIN orders o1 ON (c1.customer_id = o1.customer_id)
JOIN order_details od1 ON (od1.order_id = o1.order_id)
JOIN products p1 ON (od1.product_id = p1.product_id)
JOIN orders o2 ON (o2.customer_id = o1.customer_id)
JOIN order_details od2 ON (od2.order_id = o2.order_id)
JOIN products p2 ON (od2.product_id = p2.product_id)
WHERE c1.customer_name = 'Emil'
  AND p1.product_name = 'Philz Coffee'

Other Clauses

The following clauses may follow a MATCH clause. They work with the properties stored at the nodes and relationships found in the graph matching that pattern.

match

MATCH (a:Person)-[:ACTED_IN]→(m:Movie)

filter

WHERE m.title CONTAINS 'Matrix'

aggregate

WITH a, count(m) as movies WHERE movies > 2

return

RETURN a.name, movies

order

ORDER BY movies DESC

limit

LIMIT 5;

Or all together:

MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
WHERE m.title CONTAINS 'Matrix'

WITH a, count(m) as movies WHERE movies > 2

RETURN a.name, movies
ORDER BY movies DESC LIMIT 5;

Path Patterns

Follow a chain of relationships to an arbitrary depth and return the full path

:param product=>"Pavlova";
MATCH path = (p:Product {productName:$product})-[:PART_OF*]->(root:Category)
RETURN path
Recursive SQL with Common Table Expression
WITH RECURSIVE subcategories AS (
    SELECT
        category_id,
        parent_category_id,
        category_name
    FROM
        categories
    WHERE
        category_id = 1
    UNION
        SELECT
            category_id,
            parent_category_id,
            category_name
        FROM
            categories c
        INNER JOIN subcategories sc
        ON sc.category_id = c.parent_category_id
) SELECT
    *
FROM
    subcategories;

Finding loops:

MATCH path = (p:Person)-[*1..5]-(p)
RETURN [n in nodes(path) | coalesce(n.name, n.title)] as path, length(path)
LIMIT 10
MATCH path = (p:Person)-[*3..10]-(p)
RETURN path
LIMIT 1

Shortest Path

Find the shortest connection between two entities

:param name=>"Meg Ryan";
MATCH path = shortestPath(
    (:Person {name:$name})-[:ACTED_IN*..10]-(:Person {name:'Kevin Bacon'}) )
RETURN path, length(path);

Aggregation without GROUP BY, Multi Part Statements with WITH

  • GROUP BY is superfluous, non-aggregated fields form grouping key

  • "in-between" aggregation with WITH, no HAVING required

  • any number of times

MATCH (m:Movie)<-[:ACTED_IN]-(a:Person)

// in-between agreggation, only passing on declared fields
WITH a, count(m) as movies
// with filter on the aggregated value
WHERE movies > 2

// next query part
MATCH (a)-[:DIRECTED]->(m)

// aggregation at the end by a.name
RETURN a.name, collect(m.title) as movies, count(*) as count
ORDER BY count DESC LIMIT 5;

Built in data structure support - Lists and Maps

  • Literal syntax for lists [1,2,3] and range(1,100,2) and

  • maps {answer: 42}

  • list slices, index access, list comprehensions, quantors

  • map keys, dot-access, map projections

// list comprehension
WITH [id IN range(1,100)
        WHERE id%13 = 0 |
        // literal map construction
        {id:id, name:'Joe '+id, age:id%100}] as people

// list quantor predicate on list elements
WHERE ALL(p in people WHERE p.age % 13 = 0)
RETURN people

UNWIND and Collect

  • First class aggreation functions for creating real lists: collect

    • very useful for creating nested documents

    • keeps ordering

  • Turn a list into rows of values with UNWIND,

    • esp. helpful for data creation and updates based on lists of data

// turn list into 10k rows
UNWIND range(1,10000) as id

CREATE (p:Person {id:id, name:"Joe "+id, age:id%100})

// aggregate people into age groups, collect per age-group
RETURN p.age, count(*) as ageCount, collect(p.name)[0..5] as ageGroup

Pattern Comprehensions and Map Projections

  • Nested documents like in a document database

  • Borrowed from the GraphQL query langauge

MATCH (m:Movie)

// map projection, with property access
RETURN m { .*,
    // pattern comprehension
    actors:
        [(m)<-[r:ACTED_IN]-(a) |
            // property access, nested expressions
            a { .name, roles: r.roles,
                movies: size([()<-[:ACTED_IN]-(a)|a]) }
        ][0..5], // list slice
    // pattern comprehension with filter & expression
    directors: [(m)<-[:DIRECTED]-(d) WHERE d.born < 1975 | d.name]
    } as movieDocument
LIMIT 10

Combining Reads and Writes

e.g. Graph Refactoring, partial updates, idempotent graph construction

:param rating=>({title:'The Matrix',name:'Emil Eifrem', stars:5})
MATCH (m:Movie {title:$rating.title})
// possibly write
MERGE (u:User {name:$rating.name})

// possibly write
MERGE (u)-[r:RATED]->(m)
// write
SET r.stars = $rating.stars
WITH *
MATCH (m)<-[:ACTED_IN]-(a:Person)
// read & return
RETURN u, m, r, collect(a)

Transactional Batching

Transactional batching for loads, updates, refactoring, deletes

:auto MATCH (o:Order) // imagine 100M
CALL { WITH o
    MATCH (o)-[r:ORDERS]->()
    WITH o, sum(r.count) as products, sum(toFloat(r.unitPrice)*r.quantity) as total
    SET o.count = products, o.total = total
} IN TRANSACTIONS OF 100000 ROWS
Note

This is a new feature of Neo4j 4.4. You can check your installation version in the Neo4j Browser sidebar or by running:

call dbms.components();

Built in Data Loading

Load CSV, JSON and other datatypes from other data sources

WITH "https://data.neo4j.com/importing/ratings.csv" AS url

LOAD CSV WITH HEADERS FROM url AS row

MATCH (m:Movie {movieId:row.movieId})
MERGE (u:User {userId:row.userId})
ON CREATE SET u.name = row.name

MERGE (u)-[r:RATED]->(m)
SET r.rating = toFloat(row.rating)
SET r.timestamp = toInteger(row.timestamp);

Load data from a web API (StackOverflow) using apoc.load.json

WITH "https://api.stackexchange.com/2.2/questions?pagesize=2&order=desc&sort=creation&tagged=neo4j&site=stackoverflow&filter=!5-i6Zw8Y)4W7vpy91PMYsKM-k9yzEsSC1_Uxlf" AS url

// load the json data from the URL
CALL apoc.load.json(url) YIELD value

// turn element list into rows
UNWIND value.items AS item
// deconstruct row data
RETURN item.title, item.owner, item.creation_date, keys(item)
LIMIT 5;

APOC Utility Library

The apoc utility library is already installed on Neo4j AuraDB and Neo4j Sandbox In Neo4j Desktop you can install it from the database settings. In other Neo4j installations you can copy the jar from the labs into the plugins folder. For Docker you can use the --env NEO4J_LABS_PLUGINS=["apoc"] environment variable.

You can check if apoc is installed by running:

call apoc.help();

Documentation for apoc.load.json

Spatial

Spatial datatypes (polar and cartesian, 2d and 3d), distance, bounding box

Dijkstra, A* and other pathfinding

MATCH (loc:Location)-[:HOSTS]->(e:Event)
// events near me
WHERE point.distance(loc.coords, $myLocation) < 1000
RETURN e.title, e.rating, size((e)<-[:RSVPD]-()) as guests
ORDER BY e.rating DESC LIMIT 5

Datetime

Datetime, Timezones, Durations

formatting, parsing, computation

with date('2021-06-01') as start, date('2021-06-30') as end
unwind [days IN range(0,duration.between(start,end).days) | start + duration({days:days})] as day
WITH * WHERE NOT EXISTS { (:Appointment {date:day}) }
RETURN day

Conclusion

Cypher is a really powerful, but still humane query language for Graphs and Data.

It has been adopted by many other graph databases already, including AWS Neptune, RedisGraph, AgensGraph, Memgraph.

It is the basis for GQL the ISO Graph Query language, so it’s a worthwhile investment to learn it.

There are a lot of resources

10 Things (actually 12)

  • ✓ graph patterns vs. joins

  • ✓ idempotent graph construction

  • ✓ combining reads and writes

  • ✓ UNWIND / collect

  • ✓ in-line batching for updates / refactorings

  • ✓ data deconstruction / map / list support

  • ✓ list comprehensions / quantors

  • ✓ multi-step-aggregation - WITH (bonus, look ma, no group by)

  • ✓ LOAD CSV / apoc.load.json

  • ✓ deep tree hierarchies

  • ✓ shortest path

  • ✓ map projection / pattern comprehension

  • ✓ spatial, distance

  • ✓ datetime, durations

@kraken-power
Copy link

This is great, Michael! There is now actually 14 things in your list! You might want to change it to read, (Actually More) so you don't have keep changing it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment