Recently, I have been exploring the possibility of using a graph database at work. The best way to find out whether this technology works is to try it out. In this post, I will set up a Neo4j server using a Docker image and play around with it.

What’s Neo4j

Neo4j is a highly scalable, robust native graph database.

Neo4j uses Cypher, its declarative query language, to query data in the database. The community has also created an extensive range of drivers for other programming languages.

There are other graph databases available. You can view the list here.

As I’m very new to graph databases, I don’t yet know which technical aspects I should be concerned about. I have chosen Neo4j for the time being as it is open source, regularly updated and offers a commercial license.

Set Up Neo4j with a Docker Image

If you have not installed Docker for Mac (or other OS), you can download the installer on this page.

Pull the latest docker image from Docker Hub:

docker pull neo4j

Then, start a Neo4j container:

docker run --publish=7474:7474 --publish=7687:7687 --env=NEO4J_AUTH=none --volume=$HOME/neo4j/data:/data --volume=$HOME/neo4j/import:/var/lib/neo4j/import neo4j

Now, we can access Neo4j through the web browser at: http://localhost:7474.

We have bound the volume ~/neo4j/data to the container’s data directory so that the database persists outside the container. We have also bound an import volume so that we can import files at a later stage.

Since we are only testing the technology, we disable authentication on the server by passing --env=NEO4J_AUTH=none to docker run. You should never do this when running the container in production.
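If you prefer to start the container from a script, here is a minimal Python sketch that assembles an equivalent `docker run` argument list, which you could then pass to `subprocess.run`. The helper name `neo4j_docker_args` and its defaults are hypothetical; the `/data` and `/var/lib/neo4j/import` container paths are assumed to match the official image’s data and import directories.

```python
import os
import shlex


def neo4j_docker_args(home=None, disable_auth=True, image="neo4j"):
    """Build a `docker run` argument list for a local Neo4j test container.

    Hypothetical helper: the flags mirror the docker run command in this post.
    """
    home = home or os.path.expanduser("~")
    args = [
        "docker", "run",
        "--publish=7474:7474",  # HTTP port for the Neo4j Browser
        "--publish=7687:7687",  # Bolt port used by the drivers
        f"--volume={home}/neo4j/data:/data",  # persist the database on the host
        f"--volume={home}/neo4j/import:/var/lib/neo4j/import",  # files for LOAD CSV
    ]
    if disable_auth:
        # Only for local experiments; never disable auth in production
        args.append("--env=NEO4J_AUTH=none")
    args.append(image)
    return args


if __name__ == "__main__":
    # Print the command; pass the list to subprocess.run(...) to actually start it
    print(shlex.join(neo4j_docker_args()))
```

Keeping the arguments as a list (rather than one shell string) avoids quoting problems if `$HOME` contains spaces.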

Using Neo4j

Once you open http://localhost:7474, you will be greeted by the dashboard. The dashboard shows a stream of results for the queries you run in the editor window.

Website Recommendation Engine

In this post, I will build a website recommendation engine. This engine will recommend new websites to a user based on what they have read in the past.

Preparing Data

Before loading anything into Neo4j, I need some data. I used the Feedly API to search for websites and populate the data. You can find the snippet here.

This is what the extracted data looks like:

Data that we extracted to test the Graph Database

Fortunately, we can load CSV data into Neo4j using the Cypher clause LOAD CSV. I will not go through the details here; if you need more information, you can read the guide here.

Let’s copy the file into the import directory that we mounted into the Neo4j container:

cp feedly.csv ~/neo4j/import/

Let’s check that Neo4j can read our file. In the editor window, run:

// check first few raw lines
LOAD CSV WITH HEADERS FROM "file:///feedly.csv" AS line
RETURN line
LIMIT 5;

You should get results similar to these:

Imported result
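The same sanity check can also be done outside Neo4j before copying the file. Below is a minimal Python sketch using the standard `csv` module; the function names are hypothetical, and the expected column names (`website`, `subscribers`, `tags`) are taken from the Cypher import used later in this post.

```python
import csv
import io
from itertools import islice


def preview_rows(fileobj, limit=5):
    """Return the first `limit` rows as dicts keyed by the header row,
    much like LOAD CSV WITH HEADERS ... RETURN line LIMIT 5."""
    return list(islice(csv.DictReader(fileobj), limit))


def missing_columns(fileobj, required=("website", "subscribers", "tags")):
    """Return the required column names absent from the header row.
    The defaults are the columns referenced by the Cypher import in this post."""
    header = next(csv.reader(fileobj), [])
    return [col for col in required if col not in header]


if __name__ == "__main__":
    # Made-up rows for illustration only; open ~/neo4j/import/feedly.csv
    # with newline="" to check the real file instead.
    sample = io.StringIO(
        "website,subscribers,tags\n"
        "https://example.com/a,1200,food|apple\n"
    )
    print(missing_columns(sample))
    sample.seek(0)
    print(preview_rows(sample))
```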

Create Our Property Graph

Now it’s time to create some nodes!

In our recommendation engine, we have two labels: Website and Tag. Each Website is tagged with one or more Tags.

First, we create uniqueness constraints for Website and Tag:

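// Add uniqueness constraints (Neo4j 4.4+ syntax; older versions use CREATE CONSTRAINT ON ... ASSERT ... IS UNIQUE)
CREATE CONSTRAINT website_name IF NOT EXISTS FOR (website:Website) REQUIRE website.name IS UNIQUE;
CREATE CONSTRAINT tag_name IF NOT EXISTS FOR (tag:Tag) REQUIRE tag.name IS UNIQUE;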
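To make the effect of these constraints concrete, here is a small in-memory Python analogy (not Neo4j code, and the class is hypothetical): with a uniqueness constraint on `name`, a second CREATE with the same name is rejected, while MERGE returns the node that already exists. In Neo4j the constraint is also backed by an index, which keeps MERGE lookups on `name` fast.

```python
class UniqueNodeStore:
    """In-memory analogy of nodes with a uniqueness constraint on `name` per label."""

    def __init__(self):
        self._nodes = {}  # (label, name) -> properties dict

    def create(self, label, name, **props):
        key = (label, name)
        if key in self._nodes:
            # Neo4j would reject this write with a constraint violation
            raise ValueError(f"{label} with name {name!r} already exists")
        self._nodes[key] = {"name": name, **props}
        return self._nodes[key]

    def merge(self, label, name):
        # Get-or-create: return the existing node, or create it exactly once
        key = (label, name)
        if key not in self._nodes:
            self._nodes[key] = {"name": name}
        return self._nodes[key]

    def count(self, label):
        return sum(1 for (lbl, _) in self._nodes if lbl == label)
```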

Then, we load the data into Neo4j and create the relationships:

LOAD CSV WITH HEADERS FROM "file:///feedly.csv" AS line
// Create (or reuse) one Website node per row, keyed on its unique name only,
// so rows with a different subscriber count do not violate the constraint
MERGE (website:Website {name: line.website})
SET website.subscriber = toInteger(line.subscribers)
WITH website, SPLIT(line.tags, "|") AS tags

// For each tag in tags, create a Tag node with name as the property,
// then create the relationship between this website and the tag.
// As a tag can be repeated, we use MERGE to ensure only one node is created per tag.
FOREACH (each_tag IN tags |
  MERGE (tag:Tag {name: each_tag})
  MERGE (website)-[:TAG]->(tag))
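The import above boils down to three steps per row: upsert the website, split the `tags` column on `|`, and upsert each tag plus a TAG edge. Here is a plain-Python sketch of that logic (a hypothetical helper, not what Neo4j runs internally), where sets play the role of MERGE by silently ignoring duplicates:

```python
def build_graph(rows):
    """Mimic the LOAD CSV + FOREACH/MERGE import in plain Python.

    rows: dicts with 'website', 'subscribers' and 'tags' (pipe-separated) keys,
    the column names used in the Cypher above.
    Returns (websites, tags, tag_edges).
    """
    websites = {}      # website name -> subscriber count
    tags = set()       # unique tag names (what MERGE on Tag guarantees)
    tag_edges = set()  # (website, tag) pairs; a set ignores duplicates like MERGE
    for row in rows:
        name = row["website"]
        websites[name] = int(row["subscribers"])  # roughly toInteger(line.subscribers)
        for tag in row["tags"].split("|"):        # SPLIT(line.tags, "|")
            tags.add(tag)
            tag_edges.add((name, tag))
    return websites, tags, tag_edges
```

Note that Cypher’s `toInteger` returns null for non-numeric input, whereas Python’s `int` raises, so this sketch fails loudly on a malformed row.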

You can check that the relationships were created correctly using the following query:

MATCH (website:Website)-[:TAG]->(tag:Tag {name: "apple"})
return website, tag

This is what you will probably get:

Adding More Nodes & Relationships

Now that we have our data ready in Neo4j, we can start to create some users.

We have two users, Alice and Bob. Alice loves to eat and catch up on technology news. On the other hand, Bob likes cats and dogs. He also likes to eat and drink alcohol.

In the editor, we set our parameter:

// Use parameter to store the data
:param props: [{"name": "Alice" }, {"name": "Bob"}]

Then, we unwind the parameter and create the nodes.

// Create user based on the given parameters
UNWIND $props AS userMap
MERGE (user:User {name:userMap.name})
RETURN user

Now, we want to create relationships between the users and the tags they love.

MATCH (tag:Tag) where tag.name in ["food", "apple", "mac", "tech"]
MATCH (user:User {name:"Alice"})
MERGE (user)-[:LOVE]->(tag)
RETURN tag, user

MATCH (tagB:Tag) where tagB.name in ["food", "cats", "dogs", "alcohol", "mac", "apple"]
MATCH (userB:User {name:"Bob"})
MERGE (userB)-[:LOVE]->(tagB)
RETURN tagB, userB

To check if you have executed the commands correctly, you can run the following:

MATCH (user:User)-[:LOVE]->(tag:Tag)
return user, tag

And this is what you will see:

Let’s say Alice has already subscribed to some of the food websites. We can create a new SUBSCRIBE relationship between Alice and those websites.

// Subscribe Alice to a few websites
MATCH (user:User {name: "Alice"})

// Note that your node IDs will likely differ from mine
MATCH (website:Website) WHERE ID(website) IN [504, 517, 541, 501, 522, 478, 503, 500, 634]
MERGE (user)-[:SUBSCRIBE]->(website)
RETURN user, website

What’s Next?

Now, let’s recommend some websites to Alice and Bob to get them started. We can run the following query:

MATCH (user:User)-[:LOVE]->(tag:Tag)
MATCH (website:Website)-[:TAG]->(tag)
return user, tag, website

This is what we will get:

First step to our recommendation engine

In short, we query the database for all the websites tagged with the tags that Alice and Bob are interested in.
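The pattern `(user)-[:LOVE]->(tag)<-[:TAG]-(website)` is essentially a set intersection per user and website. Here is a plain-Python sketch of the same idea, with the graph held in hypothetical dictionaries of tag sets rather than in Neo4j:

```python
from collections import defaultdict


def recommend_by_tags(user_loves, website_tags):
    """For each user, collect the websites sharing at least one loved tag.

    user_loves:   {user: set of tag names}     (the LOVE relationships)
    website_tags: {website: set of tag names}  (the TAG relationships)
    Mirrors MATCH (user)-[:LOVE]->(tag)<-[:TAG]-(website).
    """
    recommendations = defaultdict(set)
    for user, loved in user_loves.items():
        for website, tags in website_tags.items():
            if loved & tags:  # at least one shared tag
                recommendations[user].add(website)
    # Like MATCH, users with no matching website do not appear at all
    return dict(recommendations)
```

A graph database shines here because it follows the LOVE and TAG edges directly instead of scanning every website as this loop does.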

We can make the query even more precise. As Alice is a very picky reader, we recommend only the food websites she has not subscribed to yet and that have more than 20,000 subscribers.

MATCH (user:User {name: "Alice"})-[:LOVE]->(tag:Tag {name:"food"})
MATCH (website:Website)-[:TAG]->(tag) WHERE NOT (user)-[:SUBSCRIBE]->(website) and website.subscriber > 20000
return user, tag, website

This is what it looks like:

These are the websites Alice might like.
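For comparison, here is the picky-reader filter as a plain-Python sketch over hypothetical in-memory dictionaries. It applies the same three conditions as the Cypher query (a loved tag, no existing SUBSCRIBE, more than 20,000 subscribers); it also sorts by subscriber count, which the query above does not do (adding `ORDER BY website.subscriber DESC` would be the Cypher equivalent).

```python
def picky_recommendations(user, tag, loves, website_tags, subscribers,
                          subscriptions, min_subscribers=20000):
    """Websites tagged `tag` that `user` loves, is not yet subscribed to,
    and that have more than `min_subscribers` subscribers, most popular first."""
    if tag not in loves.get(user, set()):
        return []  # the MATCH on (user)-[:LOVE]->(tag) finds nothing
    already = subscriptions.get(user, set())
    candidates = [
        site for site, tags in website_tags.items()
        if tag in tags
        and site not in already                          # WHERE NOT (user)-[:SUBSCRIBE]->(website)
        and subscribers.get(site, 0) > min_subscribers  # website.subscriber > 20000
    ]
    # Extra step, not in the Cypher above: most subscribers first
    return sorted(candidates, key=lambda site: subscribers.get(site, 0), reverse=True)
```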

Neo4j, Python and REST API

There are two ways to access data in Neo4j from an application. First, we can install the Python driver to access the graph database. Second, we can access the data directly through Neo4j’s HTTP (REST) API.

You can read more about the Python driver here.

In short, you use the Neo4j Python driver to connect to the database and then run queries through a session:

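#!/usr/bin/env python3
# Requires the official driver (pip install neo4j); the old neo4j.v1 module is gone from current versions
import os

from neo4j import GraphDatabase, basic_auth

# Read the password from the environment (NEO4J_PASSWORD is just a name chosen here) instead of hard-coding it
password = os.environ.get("NEO4J_PASSWORD", "neo4j")
# Our test container runs with NEO4J_AUTH=none, so the server does not require these credentials
driver = GraphDatabase.driver("bolt://localhost:7687", auth=basic_auth("neo4j", password))

query = (
    "MATCH (user:User {name: 'Alice'})-[:LOVE]->(tag:Tag {name: 'food'}) "
    "MATCH (website:Website)-[:TAG]->(tag) WHERE NOT (user)-[:SUBSCRIBE]->(website) AND website.subscriber > 20000 "
    "RETURN user.name AS user, tag.name AS tag, website.name AS link, website.subscriber AS count"
)

# Open a session, run the query and print each recommended website
with driver.session() as session:
    for record in session.run(query):
        print(record["user"], record["link"], record["count"])

driver.close()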
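A natural refinement is to pass the values as query parameters (`$name`, `$tag`, `$min_subscribers`) instead of embedding them in the string. This lets Neo4j reuse the cached query plan and avoids building Cypher by string concatenation. The sketch below assumes `session` is a driver session such as the one opened above; the function name and defaults are hypothetical.

```python
# Same query as above, but with $parameters instead of hard-coded literals
RECOMMEND_QUERY = (
    "MATCH (user:User {name: $name})-[:LOVE]->(tag:Tag {name: $tag}) "
    "MATCH (website:Website)-[:TAG]->(tag) "
    "WHERE NOT (user)-[:SUBSCRIBE]->(website) AND website.subscriber > $min_subscribers "
    "RETURN user.name AS user, website.name AS link, website.subscriber AS count"
)


def recommend(session, name, tag="food", min_subscribers=20000):
    """Run the recommendation query and return (user, link, count) tuples.

    `session` is assumed to be a neo4j driver session (e.g. from driver.session());
    keyword arguments to session.run become the query parameters.
    """
    result = session.run(RECOMMEND_QUERY, name=name, tag=tag,
                         min_subscribers=min_subscribers)
    return [(record["user"], record["link"], record["count"]) for record in result]
```

Usage, inside the `with driver.session() as session:` block: `recommend(session, "Bob", tag="cats")`.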

I have also created an API for my app using Flask. You can see the code here. Here’s a preview of the API:

My API to the graph database
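For the second access option mentioned earlier, querying over HTTP, here is a minimal standard-library sketch that builds (but does not send) a request to Neo4j’s transactional HTTP endpoint. The URL is an assumption for Neo4j 4.x/5.x with the default database named `neo4j` (Neo4j 3.x used `/db/data/transaction/commit`), and no `Authorization` header is added because our test container has authentication disabled.

```python
import json
import urllib.request

# Assumed endpoint for Neo4j 4.x/5.x with the default "neo4j" database
COMMIT_URL = "http://localhost:7474/db/neo4j/tx/commit"


def build_commit_request(statement, parameters=None, url=COMMIT_URL):
    """Build (but do not send) a POST request for Neo4j's transactional HTTP endpoint."""
    body = {"statements": [{"statement": statement, "parameters": parameters or {}}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_commit_request(
        "MATCH (user:User {name: $name})-[:LOVE]->(tag:Tag) RETURN tag.name AS tag",
        {"name": "Alice"},
    )
    # With the container from this post running, this would send the query:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp)["results"])
    print(req.get_method(), req.full_url)
```

The response body is JSON with `results` and `errors` keys, so a Flask endpoint could forward either the driver records or this payload.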

Conclusion

I find that a graph database is really useful for modeling connected data. The hardest part is modeling the graph from the given data.

Neo4j is easy to set up, and Cypher is fairly straightforward to use. The challenge is to formulate queries that return what you expect.