Thesis Topic Opportunity (Fall 2024)



Malmö, Sweden
Posted on Tuesday, February 13, 2024

About Neo4j:

Neo4j is the leader in Graph Database & Analytics, helping organizations uncover hidden relationships and patterns across billions of data connections deeply, easily and quickly. Customers use Neo4j to gain a deeper understanding and reveal new ways of solving their most pressing problems. Over 75% of Fortune 100 companies use Neo4j, along with a vibrant community of 250,000+ developers, data scientists, and architects across the globe.

At Neo4j, we’re proud to be building the technology that powers breakthrough solutions for our customers, helping them cure diseases, fight fraud, crush pandemics, and accomplish their most ambitious missions—even if it’s getting humans to Mars. Learn more at and follow us @Neo4j.

Our Vision:

At Neo4j, we have always strived to help the world make sense of data.

As business, society and knowledge become increasingly connected, our technology promotes innovation by helping organizations to find and understand data relationships. We created, drive and lead the graph database category, and we’re disrupting how organizations leverage their data to innovate and stay competitive.

Job Overview:

Are you at the end of your studies and want to immerse yourself in graph technology? We are now looking for students who want to do their Master’s Thesis alongside us at Neo4j!
As part of Neo4j engineering in Malmö, you will work with a diverse team of talented colleagues worldwide. You will receive advice and continuous support from us - we are experts in graph technology and positioned to help you perform to the best of your ability.

Past Thesis Topics:

Predicting Loss of Fault Tolerance in a Cloud Graph Database:
With the development of cloud computing, applications that are hosted in the cloud and used over the internet are becoming increasingly popular. In order to keep the system operational and prevent loss of data in case of failure, many systems adopt fault tolerance. Fault tolerance is defined as a system’s ability to continue operating without loss of functionality when one or more of its components fail. Therefore, it becomes essential to be able to detect and predict when a system is at risk of losing fault tolerance. Every anomalous behaviour in a system is a potential cause of an incident that can lead to the system losing this quality. By detecting anomalies that can contribute to a loss of fault tolerance, such incidents can be prevented.

Random Generation of Semantically Valid Cypher Queries:
Database management systems (DBMS) are integral tools at the center of many software applications, which means that these applications are deeply dependent on the correctness of their DBMS. In recent years, graph DBMSs have seen a significant rise in popularity, but they have not gotten the same amount of academic attention when it comes to testing as their relational counterparts. The most popular graph DBMS is called Neo4j and it has its own query language called Cypher. In this thesis, we present a tool that generates random semantically correct Cypher queries. This query generator has a versatile set of use-cases and is built to be configurable, and in this thesis we have focused on using it for random testing of the Neo4j DBMS. Random testing of a DBMS means generating random but correct queries, executing them on the database and then checking whether the output is incorrect, which can be accomplished in a few different ways.

Finding Candidate Node Pairs for Link Prediction at Scale:
There are methods for inferring whether a pair of people are likely to become friends or whether two kinds of drugs are likely to interact if consumed simultaneously. The methods solve the problem of link prediction, i.e. answer the question "Is a link (friendship, interaction) likely to form between two particular nodes (people, drugs)?". Generalizing the problem to graphs translates it to predicting if particular node pairs are likely to form links. As predicting links between all possible node pairs is computationally infeasible for larger graphs, methods for narrowing down the search space are required to efficiently solve the problem. We propose a novel algorithm, DAPPR, for resolving this issue and compare it against an existing solution, LinkWaldo, along with breadth-first search and a variant of KNN. The algorithms are evaluated by their ability to find hidden edges on real-world graphs, and it is shown that DAPPR outperforms all compared algorithms.

Preserving Availability in a Consensus Module Using Back Pressure:
In distributed systems, the consensus algorithm Raft is used to replicate a globally ordered log of entries. However, members that fall behind in replicating the log entries can cause system write unavailability. One reason for this write unavailability is that Raft needs a majority of members to replicate a log entry before it is accepted into the system.

Row vs. column data layout in a graph database query engine:
This thesis aims to examine if there is any performance improvement to be gained by changing the memory layout from row-wise to column-wise inside the Neo4j query engine. In order to test this, a column-wise representation was created along with a new implementation for a few operators to better leverage the potential of the new memory layout, such as using SIMD. This change means that the query execution strategy is changed from the current approach, which relies upon fusing and compilation, to a vectorized approach instead.

Categorization of Cypher Queries to Improve Benchmark Coverage for Graph Databases:
Benchmarks are often used to find regressions to avoid performance dropping over time. To make benchmarks relevant for a product, the benchmarks should mirror the users’ needs and uses of functionality. To achieve this, user data can be used as a foundation when creating new benchmarks and thus improving the coverage. This thesis was carried out at Neo4j which develops the most frequently used graph database. Using data from their database as a service (AuraDB), we focused on finding a way to improve the coverage of the benchmark suite run by them.

We tackle challenges in:

  • Concurrency and parallelism
  • Distributed systems and fault tolerance
  • Language design and type systems
  • Performance tuning and benchmarking
  • Cloud architecture and service design
  • Site Reliability Engineering and cloud automation
  • Continuous Integration and Continuous Delivery
  • Graph algorithms and machine learning

Please send us a description in English of:

  1. Your area of study
  2. Your thesis idea and the area of engineering that it corresponds to
  3. If you are not completely sure, that is okay - please let us know if you would like to find out more information
  4. If you are applying as a group, please apply separately and indicate who you are applying together with in your Cover Letter.

Why Join Neo4j?

Neo4j is, without question, the most popular graph database in the world. We have customers in every industry across the globe, and our products have a proven product/market fit. Joining our team is an opportunity to shape the future of data and analytics. Below are just a few exciting facts about Neo4j.

  • Neo4j is one of the fastest scaling technology companies in this industry. Well over $100M ARR and still rapidly growing.
  • Raised the biggest round of funding in all of database history ($325M Series F).
  • Backed by world-class investors like Google Ventures (GV), Neo4j has raised over $582M in funding and is currently valued at $2Bn. This puts us among the most well-funded database companies in history.
  • 75% of Fortune 100 use Neo4j with more than 800 enterprise customers including Comcast, eBay, Adobe, Lyft, UBS, IBM, Volvo Cars and many more.
  • Emil Eifrem (CEO) has built an amazing culture that prides itself on relationships, inclusiveness, innovation and customer success.
  • Countless awards in the industry. Massive enterprises and individual developers/data scientists love Neo4j. A strong sense of community and ecosystem is built around the platform.
  • A recent Forrester Total Economic Impact Study pegged Neo4j as delivering 417% ROI to customers.

Research shows that members of underrepresented communities are less likely to apply for jobs when they don’t meet all of the qualifications. If this is part of the reason you hesitate to apply, we’d encourage you to reconsider and give us the opportunity to review your application. At Neo4j, we are committed to building awareness and helping to improve these issues.

One of our central objectives is to provide an inclusive, diverse, and equitable workplace for everyone to develop their potential and have a positive, career-defining experience. We look forward to receiving your application.

Neo4j Values:

Neo4j is a Silicon Valley company with a Swedish soul. We foster collaboration and each of us is empowered to contribute and put our innovative stamp on projects. We hire candidates who reflect the following Neo4j core values:

(we)-[:THRIVE_IN]->(:Culture {type: ['Open', 'Inclusive']})
(we)-[:ASSUME]->(:Intent {direction: 'Positive'})
(we)-[:WELCOME]->(:Discussions {nature: 'IntellectuallyHonest'})

Neo4j is committed to protecting and respecting your privacy. Please read the privacy notice regarding Neo4j's recruitment process to understand how we will handle the personal data that you provide.

More information at