Density is a DBA's Best Friend: Data Capacity Improvements for Cassandra

In the world of Cassandra, there is an implementation rule-of-thumb that says the amount of data per node should be limited to 2TB. The motivation for this standard is performance. Cassandra database nodes slow down considerably when READ request volume is high. By limiting the amount of data per node, one hopes to limit the number of READ requests hitting any single database node, spreading the overall load across many nodes in a cluster.

But there are problems with this rule-of-thumb.

The first is obvious: if you have a lot of data, say tens or hundreds of terabytes or even petabytes, spreading it into 500GB - 2TB chunks creates a very sparse data distribution and requires a large number of database nodes. And as new data arrives or is generated, you need more and more nodes. This can get extremely costly very quickly, and it is no secret that the more complex the application, or the greater the number of sources from which it receives data, the quicker the overall data set grows. AI, ML and IoT/edge application data sets can grow by TBs a day.
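To make the sprawl concrete, here is a minimal back-of-the-envelope sizing sketch. The 2TB-per-node cap comes from the rule of thumb above; the replication factor of 3 and the dataset sizes are illustrative assumptions, not figures from this post.

```python
import math

def nodes_needed(dataset_tb, per_node_tb=2.0, replication_factor=3):
    """Minimum node count to hold dataset_tb under a per-node data cap,
    assuming each piece of data is stored on replication_factor nodes."""
    return math.ceil(dataset_tb * replication_factor / per_node_tb)

# Illustrative dataset sizes (assumed), sized under the 2 TB rule of thumb:
for dataset_tb in (10, 100, 1000):
    print(f"{dataset_tb} TB dataset -> {nodes_needed(dataset_tb)} nodes")
```

Under these assumptions, 100TB of data already calls for 150 nodes, and a petabyte for 1,500, before accounting for any headroom for growth.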

The second problem with the rule-of-thumb is that it is only that: a rule of thumb. No physics or architectural limitation drives this number. It is an average of real-world experience, and like most averages it may not apply optimally to any specific situation. If you have low-cost, low-performance database nodes, the amount of data per node should probably be even lower. And if your data is not evenly distributed in terms of access volume, that is, if you have "hot spots" of data that are accessed far more often than the rest, then simply spreading the data over more database nodes may not help, because the hot data still resides on a single node or a limited number of nodes. In that case, some combination of data replication and data spreading is probably needed, which means even more nodes. Even worse, it means understanding your data and access patterns in great detail in order to execute a complex deployment schema across many nodes. As data is added and usage patterns change, you need to continually monitor, learn, and adjust. Is this what you want your best and brightest spending their time on?
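The hot-spot problem above can be sketched with some simple arithmetic. In Cassandra, a single hot partition lives on only its replicas (RF nodes), so adding nodes does not dilute its load. The cluster size, replication factor, and 30% hot-read fraction below are all assumed figures for illustration.

```python
def replica_load_share(hot_fraction, replication_factor, total_nodes):
    """Approximate share of all reads handled by each replica of one hot
    partition, assuming the hot reads split evenly across its replicas
    and the remaining reads spread evenly across the whole cluster."""
    hot_per_replica = hot_fraction / replication_factor
    cold_per_node = (1.0 - hot_fraction) / total_nodes
    return hot_per_replica + cold_per_node

# Assumed scenario: 100-node cluster, RF=3, one partition gets 30% of reads.
share = replica_load_share(0.30, 3, 100)
print(f"each hot replica serves ~{share:.1%} of all reads")
```

In this scenario each of the three replicas carries roughly 10.7% of the cluster's total read traffic, versus about 1% for an average node; growing the cluster from 100 to 200 nodes barely changes that.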

FPGA Backed Database Acceleration vs. CPU-Based Servers

The necessity for distributing your data stems from the performance limitations of standard CPU-based servers in executing the data-centric functions that databases are asked to do more than any others: terminate TCP/IP connections, parse READ requests, access stored data, and deliver that data back over TCP/IP connections. These inherent limitations of CPUs greatly restrict the number of READ requests a CPU-based database node can handle. This generally means it can serve less data, which in turn results in:
• sprawling and broad data deployments
• large and ever growing clusters
• continuous management overhead to keep nodes from being overloaded, in an often-failed attempt to prevent poor performance and user dissatisfaction

But what if database nodes did not have to service any READ requests at all? This can be achieved by front-ending database nodes with FPGA-based data engines that cache all the data and service all the READ requests. Being FPGA-based, such data engines can deliver data with superior scalability versus CPU-based database servers, and at much lower latency even at high loads.

Two or three such engines can service READs that might otherwise require fifty or more database nodes. What that means is that the Cassandra 2TB rule-of-thumb goes out the window. The primary design decision is now based on the write capacity of the back-end cluster, which means that you can probably increase the amount of data per node by 10x-20x. Imagine the cost savings associated with that. Perhaps even more importantly, you no longer need to worry about hot spots, data replication, careful and nuanced understanding of usage patterns, or disgruntled user complaints.
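A rough comparison makes the savings tangible. The sketch below assumes a hypothetical 100TB dataset with replication factor 3, and contrasts the 2TB-per-node rule of thumb with a hypothetical 10x denser 20TB per node; none of these cluster sizes are measured figures from this post.

```python
import math

def cluster_size(dataset_tb, per_node_tb, replication_factor=3):
    """Minimum nodes to hold dataset_tb at a given per-node density."""
    return math.ceil(dataset_tb * replication_factor / per_node_tb)

dataset_tb = 100                                   # assumed dataset size
baseline = cluster_size(dataset_tb, per_node_tb=2)   # 2 TB rule of thumb
dense = cluster_size(dataset_tb, per_node_tb=20)     # hypothetical 10x density
print(f"{baseline} nodes -> {dense} nodes")
```

Under these assumptions a 150-node cluster shrinks to 15 nodes, before even counting the reduced operational burden of monitoring and rebalancing a far smaller fleet.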

Data Density Improvements for Apache Cassandra

We ran a data density benchmark in our lab to prove this concept: that by offloading READ requests from a traditional CPU onto a highly optimized processor (an FPGA), through a coordinating solution (rENIAC), data density would increase.
The benchmark setup is below.

Bare Metal Baseline:
• Read/write ratio: 80:20
• Data payload size (per request): 20KB
• Total memory: 574GB

With rENIAC Data Engine:
• Read/write ratio: 80:20
• Data payload size (per request): 20KB
• Total memory: 5.9TB

The results with the rENIAC Data Engine in the mix are 2.8x higher throughput and 10.3x more data. In other words, by offloading those critical operations and READ requests onto rENIAC's FPGA-based data accelerator, users can see 2.8x higher throughput while also increasing data density by more than 10x, a combined improvement of nearly 30x (2.8 × 10.3 ≈ 29).

They say that one of nature’s most dense objects – the diamond – can be a best friend. But in IT, rENIAC is a data professional’s best friend.

To see rENIAC in action, book a live demo.
