Introducing the Neo4j 3.0 Apache Spark Connector

Learn all about the new connector between Apache Spark and Neo4j 3.0 with hands-on examples working with GraphFrames, GraphX, Spark Shell, RDD and more.

We are proud to participate in this week's flurry of announcements around Apache Spark.

While we're cooperating with Databricks in other areas, such as the implementation of openCypher on Spark and as an industry partner of AMPLab, today I want to focus on the Neo4j Spark Connector.

One of the important features of Neo4j 3.0 is Bolt, the new binary protocol with accompanying official drivers for Java, JavaScript, .NET and Python. That prompted me to try implementing a connector for Apache Spark, and to see how fast I could transfer data from Neo4j to Spark and back again.

The implementation was really straightforward. All the interaction with Neo4j is as simple as sending parameterized Cypher statements to the graph database to read, create and update nodes and relationships.
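
For illustration, here is roughly what that interaction looks like from Scala with the official Java driver. This is a minimal sketch, assuming a local instance at bolt://localhost:7687 with the credentials neo4j/neo4j (adjust to your setup):

import org.neo4j.driver.v1._

// open a Bolt connection and run a parameterized Cypher statement
val driver = GraphDatabase.driver("bolt://localhost:7687", AuthTokens.basic("neo4j", "neo4j"))
val session = driver.session()
val result = session.run(
  "MATCH (p:Person) WHERE p.age > {age} RETURN p.name AS name",
  Values.parameters("age", Int.box(42)))
while (result.hasNext) println(result.next().get("name").asString())
session.close()
driver.close()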

So I started with implementing a Resilient Distributed Dataset (RDD) and then added the other Spark features, including GraphFrames, so that the connector now supports:

  • RDD
  • DataFrame
  • GraphX Graph
  • GraphFrames

You can find more detailed information about its usage here; this is only a quick overview of how to get started.

I presume you already have Apache Spark installed. Then download, install and start Neo4j 3.0.
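
With the tarball distribution, that can look roughly like this. This is a sketch assuming the Linux/macOS community edition; adjust the version and paths to your download:

# unpack and start a local Neo4j 3.0 instance
tar xzf neo4j-community-3.0.0-unix.tar.gz
cd neo4j-community-3.0.0
bin/neo4j start
# the first visit to http://localhost:7474 asks you to set a password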

To have some data to play with, we quickly create one million :Person nodes, and later also relationships, all in about a minute.

UNWIND range(1, 1000000) AS x CREATE (:Person {id: x, name: 'name' + x, age: x % 100})
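
The relationship-creation statement isn't part of this excerpt; here is a hedged sketch of what it could look like, assuming one random :KNOWS relationship per person (the :KNOWS type is an assumption, illustrative only):

// give each of the one million people one random :KNOWS relationship
UNWIND range(1, 1000000) AS x
MATCH (n) WHERE id(n) = x
MATCH (m) WHERE id(m) = toInt(rand() * 1000000)
CREATE (n)-[:KNOWS]->(m)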

Now we can start both spark-shell with our connector and GraphFrames as packages.
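
For example (the exact package coordinates depend on the release you pick up from Spark Packages; the ones below reflect the 1.0 release candidate and should be treated as an assumption):

$SPARK_HOME/bin/spark-shell --packages neo4j-contrib:neo4j-spark-connector:1.0.0-RC1,graphframes:graphframes:0.1.0-spark1.6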

To start using it, we'll run a quick RDD and GraphX demo and then look at GraphFrames.

import org.neo4j.spark._

// statement to fetch nodes with id less than given value
val query = "cypher runtime=compiled MATCH (n) where id(n) < {maxId} return id(n)"
val params = Seq("maxId" -> 100000)

Neo4jRowRDD(sc, query, params).count
// res0: Long = 100000
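
The GraphX demo itself isn't included in this excerpt. Here is a hedged sketch of what loading a graph looks like, assuming the connector's Neo4jGraph helper and the :KNOWS relationships between :Person nodes created above (names follow the connector's README as I recall it; treat them as illustrative):

import org.neo4j.spark._

// load the (:Person)-[:KNOWS]->(:Person) subgraph as a GraphX graph
val g = Neo4jGraph.loadGraph(sc, "Person", Seq("KNOWS"), "Person")
g.vertices.count
g.edges.count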

The connector, like our official drivers, is licensed under the Apache License 2.0. The source code is available on GitHub, and the connector and its releases are also listed on Spark Packages.

I would love to get some feedback on the things you liked (and didn't) and what worked (or didn't). That's what the release candidate versions are meant for, so please go ahead and raise GitHub issues.
