Explore the new openCypher custom functions and subquery support in Amazon Neptune

In this post, we describe some of the openCypher features that have been released as part of the 1.4.2.0 engine update to Amazon Neptune. Neptune is a fast, reliable, and fully managed graph database service for building and running applications with highly connected datasets, such as knowledge graphs, fraud graphs, identity graphs, and security graphs. Neptune provides developers with the choice of building their graph applications using three open graph query languages: openCypher, Apache TinkerPop Gremlin, and the World Wide Web Consortium’s (W3C) SPARQL 1.1. Neptune announced the general availability of the latest engine release to 1.4.2.0 on December 19, 2024. Starting with this release, you can benefit from a variety of new features and improvements, including custom functions and support for the CALL subquery, which we further discuss in this post. You can use the guide at the end of this post to try out the new features that are described.

Support for read-only CALL subqueries

For queries that require sub-query support, such as executing a specific openCypher query on a node-by-node basis, support for the CALL function was added. Prior to this, if you wanted to execute additional MATCH statements against a collection of data, it was necessary to split the code into multiple queries, passing the output of one query as the input to the next.

The following is an example of using the CALL functionality to run a second query that will be run for each result in stopover. The initial MATCH starts at the Austin Bergstrom International (AUS) airport and performs a single-hop traversal across the route edge to a connected stopover node. For each stopover, it then retrieves the first two airports connected to the stopover, ordered by the route distance property value in descending order.

MATCH (origin:airport {code:"AUS"})-[:route]->(stopover) 
CALL { 
  WITH stopover 
  MATCH (stopover)-[r:route]->(destination) 
  RETURN destination 
  ORDER BY r.dist DESC LIMIT 2 
} 
RETURN origin.desc, stopover.desc, destination.desc

Prior to support for CALL, the preceding query would not have been possible in openCypher, because Neptune didn’t support functionality to run a subquery on a per-object basis. For more information on how the CALL subquery works, how to write queries using it, and current limitations, see CALL subquery support in Neptune.

Support for Neptune openCypher custom functions

Neptune openCypher functions are additions to the Neptune openCypher specification implementation that support customer requirements such as string matching, and collection and map sorting. The following functions are available in Neptune Database version 1.4.2.0 and above, as well as Amazon Neptune Analytics.

textIndexOf(text :: STRING, lookup :: STRING, from = 0 :: INTEGER?, to = -1 :: INTEGER?) :: (INTEGER?)

This function returns the index of the first occurrence of lookup in the range of text starting at offset from (inclusive), through offset to (exclusive). If to is -1, the range continues to the end of text. Indexing is zero-based and is expressed in Unicode scalar values (non-surrogate code points). In the following example, we search for a specific expression ‘e’ in the value ‘Amazon Neptune’:

RETURN textIndexOf('Amazon Neptune', 'e') as result
{
 "results": [{
  "result": 8
 }]
}

collToSet(values :: LIST OF ANY?) :: (LIST? OF ANY?)

If you wanted to return a list containing only a unique set of values, you would need to combine COLLECT with DISTINCT to produce the results. For example, the following query produces a unique collection of names of airports that have connecting routes to airports located in the US:

MATCH (:airport {country: 'US'})-[:route]->(d:airport)
WITH COLLECT(d.desc) AS locations
RETURN collToSet(locations)

collSubtract(first :: LIST OF ANY?, second :: LIST OF ANY?) :: (LIST? OF ANY?)

If you want to return a list that contains only values that are present in one list and not another, you can use the collSubtract function. This function returns a new list containing all the unique elements from the first list, excluding elements from the second list. The order of the list is maintained. For example, the following query produces a unique collection of names of airports in the US that have connecting routes to airports located in France, but don’t also connect to airports in the UK:

MATCH (:airport {country: 'FR'})-[:route]→(d1:airport {country: 'US'})
MATCH (:airport {country: 'UK'})-[:route]→(d2:airport {country: 'US'})
WITH COLLECT(d1.desc) AS FR_Routes, COLLECT(d2.desc) AS UK_Routes
RETURN collSubtract(FR_Routes,UK_Routes)
{
  "results": [
    {
      "Routes": [
        "Cincinnati Northern Kentucky International Airport",
        "Indianapolis International Airport"
      ]
    }
  ]
}

collIntersection(first :: LIST? OF ANY?, second :: LIST? OF ANY?) :: (LIST? OF ANY?)

If you want to return a list that contains only the items that exist in two given lists, you can use collIntersection. This function returns a new unique list of items that are present in both of the given parameter lists. For example, the following query produces a unique collection of airport names that have routes originating from either London Heathrow (LHR) or Seattle-Tacoma International (SEA) airport:

MATCH (lhr:airport {code: 'LHR'})-[:route]->(d1)
MATCH (sea:airport {code: 'SEA'})-[:route]->(d2)
WITH collIntersection(COLLECT(d1.desc),COLLECT(d2.desc)) as airports_list
UNWIND airports_list as airports
RETURN airports

Sorting functions

Neptune sorting functions focus on improving readability and providing flexibility for use cases that involve sorting over complex data types such as single or multiple maps, as well as providing single or multiple sort keys. For each of the sorting functions, the default sorting order is ascending.

collSort(coll :: LIST OF ANY, config :: MAP?) :: (LIST? OF ANY?)

You can use the collSort function to sort a list of values. The function returns a new, sorted list, containing the original list elements. By default, it sorts the values in ascending order, but you can modify this behavior by providing a map configuration, such as in the following example. This query sorts the names in descending alphabetical order of the first 10 airports located in the US:

MATCH (a:airport {country: 'US'}) 
RETURN collSort(a.desc, { order: 'asc' }) as result

collSortMaps(coll :: LIST OF MAP, config :: MAP) :: (LIST? OF ANY?)

If you have a list of map objects, as opposed to a list of single data type values, you can use collSortMaps to sort the list of maps based on a map property. To do this, you must provide a configuration map that specifies the property name to sort, along with the sort direction. For example, the following configuration specifies the desc map property as the sort key, and the order of the sorting to be in ascending order:

{ key: 'desc', order: 'asc' }

The following query returns a collection of map objects based on the desc property and code property of all airports located in the US. It then sorts this collection by the code property in descending order, before outputting the top 10 results:

MATCH (a:airport {country: 'US'}) 
WITH collSortMaps(COLLECT({name: a.desc, code: a.code}), {key: 'code', order: 'desc'}) as Sorted_Airports
UNWIND Sorted_Airports as Airports
RETURN Airports.name, Airports.codeLIMIT 10

collSortMulti(coll :: LIST OF MAP?, configs = [] :: LIST OF MAP, limit = -1 :: INTEGER?, skip = 0 :: INTEGER?) :: (LIST? OF ANY?)

Extending the functionality of sorting lists of maps, collSortMulti enables you to sort on multiple map properties, as well as optionally providing limit and skip configurations. For example, the following configuration specifies that each map should be first sorted using the runways property in descending order, then by the desc property in ascending order (default), skipping the first 10 records, and limited to the next 20:

[{ key: 'runways', order: 'desc' }, { key: 'desc' }], 20, 10

The following query returns a collection of map objects using the preceding configuration:

MATCH (a:airport {country: 'US'})
WITH COLLECT({runways: a.runways, code: a.code, desc: a.desc}) as US_Airports
WITH collSortMulti(US_Airports,[{ key: 'runways', order: 'desc' }, { key: 'desc' }], 20, 10) AS Sorted_Airports
UNWIND Sorted_Airports AS Airports
RETURN Airports.runways, Airports.desc

collSortNodes(coll :: LIST OF NODE, config :: MAP) :: (LIST? OF NODE?)

Some use cases require returning a sorted collection of nodes as opposed to maps of node values. For this, you can use collSortNodes, which sorts an input list of nodes based on the specified configuration, similar to collSortMaps. The following configuration defines the sort key as runways and the sort order as descending:

{ key: 'runways', order: 'desc' }

This configuration is demonstrated in the following query, which returns the top 10 airports in the world ordered by the number of runways, with the airport with the largest number of runways first, and those with fewer airports last:

MATCH (a:airport)
WITH COLLECT(a) AS Airports
WITH collSortNodes(Airports, {key: 'runways', order: 'desc'}) AS Sorted_Airports
UNWIND Sorted_Airports as Airports
RETURN Airports
LIMIT 10

Bulk load data using Neptune Database or Neptune Analytics

The preceding queries use the publicly available air-routes dataset that can be ingested into your Neptune Database cluster or Neptune Analytics graph automatically using the %seed Workbench magic command. Alternatively, you can bulk load the data into either Neptune Database or Neptune Analytics from a Neptune notebook, using the following commands.

For Neptune Database, you can use the bulk load API:

%load --s s3://aws-neptune-customer-samples/airroutes/ --l <iam_role_arn> --f csv

For more information about setting up IAM roles, and examples of initiating a Neptune Database bulk load using curl, see Example: Loading Data into a Neptune DB Instance.

For Neptune Analytics, you can use the neptune.load command with CALL:

CALL neptune.load({
source: 's3://aws-neptune-customer-samples-us-east-1/airroutes/',
region: 'us-east-1',
format: 'CSV',
failOnError: true,
concurrency: 1
})

Note that you are responsible for any costs incurred while trying out these examples on your Neptune Database cluster or Neptune Analytics graph.

Conclusion

In this post, we described how Neptune has extended the openCypher graph query language to provide you with additional functionality that in some cases wasn’t previously available, meaning queries needed to be segregated between the graph and application code, creating a more complex, highly coupled solution architecture.

You can find a complete list of improvements and fixes in the release notes. The following are a few ways to get started with this release:

Create your first Neptune cluster as part of the AWS Free Tier
Upgrade your existing Neptune cluster to 1.4.2.0 or later to take advantage of the latest features
Use the open source graph-explorer application to quickly visualize and explore graphs on Neptune
Run the open source graph-notebook library on Jupyter or JupyterLab notebooks to interactively query and build graph applications on Neptune

Leave your questions in the comments section.

About the authors

Kevin Phillips is a Sr. Neptune Specialist Solutions Architect working in the UK at Amazon Web Services, having spent the last 4 years working with customers across EMEA to get started and accelerate with graphs. He has over 20 years of development and solutions architectural experience, which he uses to help support and guide customers.

AWS Database Blog