
We still chose C++ (instead of Rust) for new database development


We have recently introduced EloqKV, our distributed database product built on a cutting-edge architecture known as Data Substrate. Over the past several years, the EloqData team has worked tirelessly to develop this software, ensuring it meets the highest standards of performance and scalability. One key detail we’d like to share is that the majority of EloqKV’s codebase was written in C++.

Had we launched our product a decade ago, using C++ would have been an obvious and unremarkable choice. However, it's 2024, and the landscape has changed. Today, languages such as Rust and Zig, along with other type-safe options such as Go, are considered the modern, fashionable choices for systems programming. So when we chose C++, a language that some view as outdated, less "cool", or even bug-prone and "unsafe", it's natural for people to wonder why.

In this article, we'd like to share the thought process behind our decision to choose C++ over newer, more fashionable languages, the historical lessons we drew on, and the progress we expect going forward.

Selecting the right programming language is crucial for any software project, but it becomes even more significant for complex systems software such as databases. The choice of language influences various aspects, including performance, ease of development, and maintainability. In a domain where efficiency and reliability are paramount, the programming language serves as the foundation upon which the entire system is built.

For databases, the implications of this choice are profound. A database must be capable of handling vast amounts of data while providing fast query responses and ensuring data integrity. These requirements necessitate a language that not only excels in performance but also allows for scalable and efficient development practices. Additionally, databases often undergo continuous development and enhancement over decades, making maintainability a critical factor. A well-chosen language can simplify the process of updating and expanding the software's features over time, ensuring that it remains relevant and effective in an ever-evolving technological landscape.

Consider the Hadoop big data stack, which is predominantly built on the Java Virtual Machine (JVM). Java and the broader JVM ecosystem have long been among the most popular programming platforms, lauded for their portability and rich features, yet in retrospect this choice has proved controversial. The performance and memory overhead of the JVM, particularly around garbage collection, has caused numerous challenges for developers. Indeed, Redpanda and ScyllaDB are notable examples of rewriting mature, widely used Java-based systems (Kafka and Cassandra, respectively) from scratch in C++ to avoid the JVM penalties.

Another important consideration is the popularity of the programming language and the availability of developers familiar with it. For instance, Spark and Kafka were largely developed in Scala, while RabbitMQ and parts of Couchbase are written in Erlang. Although these languages offer robust features and capabilities, they are not as widely adopted as mainstream languages. This relative lack of popularity makes it harder to engage developers at scale and to find experienced programmers, and toolchain support is generally not on par with that of more popular languages. As a result, a less common language can make recruiting more difficult, slow down development, and limit the community support available for troubleshooting and innovation.

By the late 2010s, Rust had emerged as one of the leading programming languages for developing database software. Newer projects such as TiKV (the storage layer of TiDB), RisingWave, DataFusion, and Neon are prominent examples that leverage Rust's capabilities to build efficient and high-quality databases. Notably, RisingWave even published a blog post detailing their decision to discard ten months of work in C++ and rewrite their entire codebase in Rust. Given that EloqData began its journey around 2021, when Rust was already well established as a robust language with excellent features for building safe and performant databases, one might wonder why we opted for C++ instead.

When we began our project, we were keenly aware that Rust was a highly competitive language for building the foundations of our database. Our eventual decision to choose C++ was based on three main factors.

The first factor is the strength of the C/C++ database ecosystem. Most existing and popular databases are developed in C/C++, providing a wealth of resources and innovations we could leverage. Our Data Substrate technology aims to create a unified, modular architecture that can capitalize on these existing resources and avoid reinventing the wheel. Although Rust offers good interoperability with C/C++, its memory management model and certain safety restrictions can complicate integration with many established projects.
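To make the ecosystem argument concrete, here is a minimal sketch of how C++ code can consume a C-style engine API directly, with no binding layer in between. The `kv_*` functions and the `Engine` wrapper are hypothetical stand-ins invented for this illustration (they are not EloqKV or Data Substrate code), and the toy in-memory implementation exists only so that the example compiles on its own. It assumes C++17.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <optional>
#include <string>

// Hypothetical C-style storage API, standing in for an existing C/C++ library.
// The toy in-memory implementation below only keeps the sketch self-contained.
struct kv_engine {
  std::map<std::string, std::string> data;
};

extern "C" {
kv_engine* kv_open() { return new kv_engine{}; }

void kv_close(kv_engine* e) { delete e; }

// Returns 0 on success, -1 on invalid arguments.
int kv_put(kv_engine* e, const char* key, const char* val) {
  if (e == nullptr || key == nullptr || val == nullptr) return -1;
  e->data[key] = val;
  return 0;
}

// Returns a pointer owned by the engine, or nullptr if the key is absent.
const char* kv_get(kv_engine* e, const char* key) {
  if (e == nullptr || key == nullptr) return nullptr;
  auto it = e->data.find(key);
  return it == e->data.end() ? nullptr : it->second.c_str();
}
}  // extern "C"

// Thin RAII wrapper: the C handle is released automatically by the deleter,
// and the C functions are called directly, without an FFI or binding layer.
class Engine {
 public:
  Engine() : handle_(kv_open(), &kv_close) { assert(handle_ != nullptr); }

  bool put(const std::string& key, const std::string& val) {
    return kv_put(handle_.get(), key.c_str(), val.c_str()) == 0;
  }

  // Copies the value out so callers never hold engine-owned memory.
  std::optional<std::string> get(const std::string& key) const {
    const char* val = kv_get(handle_.get(), key.c_str());
    if (val == nullptr) return std::nullopt;
    return std::string(val);
  }

 private:
  std::unique_ptr<kv_engine, decltype(&kv_close)> handle_;
};
```

With this wrapper, a caller would construct an `Engine`, call `put("user:1", "alice")`, and read the value back with `get("user:1")`. The same C API could be wrapped from Rust as well, but that typically means declaring the foreign functions and using `unsafe` blocks at the boundary, which is the kind of integration friction described above.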
