My journey began with an inspiring blog post: Implementing advanced RAG strategies with Neo4j. Last few months I’ve been working to grasp RAG, LLMs, and LangChain, while also nurturing an idea in HRTech. Currently, we’re preparing mockups, and if things align with potential customers’ needs, we aim to develop an MVP by Q1 2024. That’s when a robust RAG method will be crucial for the project.
My experience in coding chat applications using LLMs and LangChain includes a project utilising a vectorstore. While embedding data into a vectorstore is straightforward, Neo4j presents a bit more of a learning curve. It’s not particularly hard, but it does require understanding the basics of Cypher, Neo4j’s query language, similar to SQL.
What is Neo4j?
Neo4j is a graph database that stores data in nodes and relationships, rather than in tables or documents. This format is akin to sketching ideas on a whiteboard, offering a flexible approach to data management. For more information, visit their website.
Having always been in tune with Mind Maps and a concept Systems Thinking, I found that working with a graph database like Neo4j felt incredibly natural and intuitive. I appreciate methods that can be visualised and easily explained.
Training in Neo4j
Neo4j offers extensive training material. When you sign up for their free AuraDB instance, you’ll find manuals and training links. There’s also the Graph Academy for free, self-paced, hands-on training. Training modules range from Neo4j fundamentals to Data Science, Developer languages, and advanced Cypher techniques.
O’Reilly also offers a free PDF book on Neo4j: O’Reilly Graph Databases.
Additionally, Neo4j’s Sandbox environment allows you to experiment with Cypher using various datasets. Although some manuals dive quickly into advanced data extraction techniques, they’re generally helpful.
My Experience Prepping for the Professional Certificate
I started with the Neo4j Certificated Professional, which consists of 80 questions to be answered in 60 minutes. It’s free, with one attempt allowed every 24 hours. I’ll write separately about the Data Science certificates and LLM training.
I found two different lists of trainings that helps to pass the certificate so here I paste those I completed myself:
- Neo4j property graph model
- Cypher queries
- Graph data modeling
- Importing data
- Application development concepts (Python one)
- Intermediate Cypher Queries – this one seems optional like others available in the Academy curriculum
Preparing for the certificate required more time than anticipated, especially for the ‘Immediate Cypher Queries’ module. I restarted this module midway to parallelly practice with the sandbox. This hands-on approach proved beneficial, though time consuming.
I generally spent 50% more time on each training and I also skimmed through almost a trainings just before the certificate.
Failing & Passing the Certificate
Upon taking the test, I scored 78%, just shy of the 80% passing mark. I noticed that as a non-native English speaker, some questions seemed straightforward at first but required careful reading. The Developer section and questions about data importing were particularly challenging to me.
Each incorrect answer in the test comes with an explanation or a link to documentation, which was quite helpful for understanding my mistakes. Motivated by this, I revisited all my incorrect answers and took the second test the following day. This time, I decided to proceed more cautiously with the questions. Despite my intention to go slower, I completed the test in under 40 minutes, double-checking my responses as I went along.
I observed some changes in the test: a few questions were different, others had the order of their multiple-choice answers shuffled, while some remained the same. But the fact that you already are familiar with the format and pace, pays off. I scored 93.4% and successfully passed the certificate.
Some Post-Study Notes
As I went through the training materials, I have saved some of the advices for later, when I build my first Neo4j database.
- Maximum 4 labels per node as a recommendation
- Eliminate duplicates
- Refactor
- Keep meta model simple as there is little value in building rich and expressive features which are not used
- Embrace just-enough semantics when creating new organising principles
- Careful use of EXPLAIN and PROFILE can significantly boost query performance
- Be aware of so called Eager operators. Eager operators pull in all data immediately and often create a choke point
- For imports with admin import tool: You don’t have to make sure the CSV files are perfect, just in good enough condition
- Performing one match clause will perform better than multiple match clauses since they are fewer relationship traversed
As I plan to pass LLM & Data Science trainings, as well as part of the O’Reilly book this list will get longer 🙂
I hope those of view who were considering training in Neo4j get encouraged by my post.
Cheers,
Ali