With the rapid growth of data, customers are looking for ways to build data applications without having to learn the intricacies and advanced techniques required to tune and scale their databases. To this end, Amazon Keyspaces delivers a scalable, highly available, and managed Apache Cassandra-compatible database service. Amazon Keyspaces is serverless, so you pay for only the resources you use, and the service can automatically scale tables up and down in response to application traffic.
New hardware acceleration instances such as AWS F1 provide a performance boost beyond what is achievable with traditional CPU-based software approaches. AWS F1 instances use FPGAs to deliver hardware acceleration. FPGAs are seeing increasing adoption in the cloud for data acceleration; for example, Amazon Redshift queries are accelerated by AQUA (Advanced Query Accelerator), which uses FPGAs.
rENIAC, in partnership with Xilinx and AWS (press release of rDE F1 launch), released its patented Data Engine (rDE) to hardware-accelerate the Apache Cassandra NoSQL database using AWS F1 instances. The rENIAC Data Engine for Cassandra is available in the AWS Marketplace.
As James Hamilton from AWS stated, "I can give you any bandwidth you want. It's just parallel lanes and I can do it with anyone’s equipment and pay for. It is not absolutely hard to do. You know what is hard to do? Latency, that is physics. Bandwidth is money and latency is physics (and harder). So latency is key. When you move to HW, latency is fundamentally changed. I tell SW people, the thing you measure are called milliseconds, and if you put it in HW the thing they measure are nanoseconds and microseconds. So you are changing by big margins." (Source: 2016 AWS re:Invent talk, Tuesday Night Live with James Hamilton, https://video.cube365.net/c/928374)
Developers on AWS with applications based on the Cassandra Query Language (CQL) are using Amazon Keyspaces to quickly build and scale applications without needing deep experience with the internals of the database. Now, with the rENIAC Data Engine, they can tap into hardware acceleration when using Amazon Keyspaces. This post describes how to use Amazon Keyspaces with the rENIAC Data Engine.
Introduction to rENIAC Data Engine
rENIAC's Data Engine (rDE) is an acceleration solution for data-centric workloads, such as NoSQL databases like Cassandra, that uses FPGAs within cloud instances (e.g., AWS F1 instances). rDE has multiple deployment options for Cassandra; one of them is a transparent caching proxy that provides these key benefits:
- CQL compatible, requiring no changes to the application(s)
- Write-through cache; auto-managed data consistency
- Decouples the read and write pipelines, providing sub-millisecond read SLAs
- Scales out performance with a cluster of data proxy nodes
Figure 1 shows a simple illustration of how rDE is deployed as a caching proxy for a Cassandra cluster. The results can be astounding. When rDE nodes are added to an existing cluster, users experience predictably low latency, even at high loads.
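To make the write-through behavior listed above concrete, here is a minimal, purely illustrative sketch. It is not rDE's implementation: the `WriteThroughCache` class and the dictionary standing in for the Cassandra cluster are hypothetical, and the real proxy speaks CQL rather than exposing `read` and `write` methods.

```python
class WriteThroughCache:
    """Toy write-through cache in front of a backing store (hypothetical)."""

    def __init__(self, backing_store):
        self.backing_store = backing_store  # stands in for the Cassandra cluster
        self.cache = {}                     # stands in for the proxy's cache
        self.hits = 0
        self.misses = 0

    def write(self, key, value):
        # Write-through: update the backing store first, then the cache,
        # so the cache always reflects the latest acknowledged write.
        self.backing_store[key] = value
        self.cache[key] = value

    def read(self, key):
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        # Cache miss: fetch from the backing store and remember the value.
        self.misses += 1
        value = self.backing_store[key]  # raises KeyError if the key is absent
        self.cache[key] = value
        return value


store = {"user:1": "alice"}      # data that existed before the proxy was added
proxy = WriteThroughCache(store)
proxy.write("user:2", "bob")     # goes to both the store and the cache
first = proxy.read("user:1")     # miss: fetched from the store, then cached
second = proxy.read("user:1")    # hit: served from the cache
third = proxy.read("user:2")     # hit: cached at write time
```

Because every write goes to the backing store as well as the cache, the application does not need any cache-invalidation logic of its own, which is the sense in which consistency is "auto-managed" in the list above.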
We now discuss rDE support for Amazon Keyspaces, starting with a brief introduction to Amazon Keyspaces.
Introduction to Amazon Keyspaces
Amazon Keyspaces (for Apache Cassandra) is a scalable, highly available, and managed Apache Cassandra–compatible database service in AWS. Its main advantage is that you can run your Cassandra workloads on AWS using the same Cassandra application code and developer tools that you use today. You don’t have to provision, patch, or manage servers, and you don’t have to install, maintain, or operate software.
The following diagram shows the architecture of Amazon Keyspaces. A client program accesses Amazon Keyspaces by connecting to a predetermined endpoint (hostname and port number) and issuing CQL statements. rDE fully supports CQL versions 4 and 5. Example client code is discussed in a subsequent section.
Figure 2: Architecture of Amazon Keyspaces
rDE for Keyspaces
Traditionally, rENIAC has been used to supercharge open source databases, but with managed services built on open source technology, such as Amazon Keyspaces, we see new and interesting architectures that benefit our mutual customers.
Connecting to rDE from application code involves changing three settings in your application: the IP address (to point to the rENIAC cluster), the port number, and the TLS configuration. You will also need to configure rDE to connect to Amazon Keyspaces.
The example below shows how to connect to rDE using the open source DataStax Python driver for Cassandra. rDE, in turn, connects to Amazon Keyspaces as a client to enable a transparent (caching) proxy deployment.
from cassandra.cluster import Cluster
from ssl import SSLContext, PROTOCOL_TLSv1_2, CERT_REQUIRED
from cassandra.auth import PlainTextAuthProvider
import boto3
from cassandra_sigv4.auth import SigV4AuthProvider

# Require TLS 1.2 and verify the server certificate against the root certificate.
ssl_context = SSLContext(PROTOCOL_TLSv1_2)
ssl_context.load_verify_locations('<path_to_file>/sf-class2-root.crt')
ssl_context.verify_mode = CERT_REQUIRED

# Use this if you want to use Boto to set the session parameters.
# rENIAC cluster will be available as a service within Boto.
# Note: boto3.Session takes aws_* keyword arguments.
# boto_session = boto3.Session(aws_access_key_id="<path_to_key>",
#                              aws_secret_access_key="<path_to_secret_key>",
#                              aws_session_token="<path_to_token>")
# auth_provider = SigV4AuthProvider(boto_session)

# Use this instead of the above lines if you want to use the default
# credentials and not bother with a session.
auth_provider = SigV4AuthProvider()

# Point the driver at the rENIAC cluster rather than at Amazon Keyspaces directly.
cluster = Cluster(['<ip_address_for_rENIAC_cluster>'],
                  ssl_context=ssl_context,
                  auth_provider=auth_provider,
                  port=8002)
session = cluster.connect()

# Issue CQL queries based on your schema(s).
r = session.execute('select * from example_schema.keyspaces')
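Once a session is connected, you may want to compare read latency with and without the proxy in the path. The helper below is a rough sketch for doing that from the client side. The function name `measure_latency_ms` and the stand-in workload are hypothetical. The percentile is a simple index-based estimate, and client-side timings include network and driver overhead.

```python
import statistics
import time


def measure_latency_ms(run_query, iterations=100):
    """Time repeated calls to run_query and summarize latency in milliseconds."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Index-based p99 estimate, clamped to the last sample for small runs.
    p99_index = min(len(samples) - 1, int(0.99 * len(samples)))
    return {
        "mean_ms": statistics.mean(samples),
        "p50_ms": statistics.median(samples),
        "p99_ms": samples[p99_index],
    }


# Stand-in workload so the sketch runs without a database connection.
result = measure_latency_ms(lambda: sum(range(1000)), iterations=50)
print(result)
```

To time real queries, pass a callable that uses the session from the example above, such as `lambda: session.execute('select * from example_schema.keyspaces')`. Run it once against the rDE endpoint and once against the Amazon Keyspaces endpoint.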
There are two main reasons for customers to consider rDE: first, as a performance enhancement and TCO-saving solution; second, as a means to migrate workloads from Apache Cassandra to Amazon Keyspaces.
rDE for performance enhancement and TCO saving
From a performance enhancement perspective, there are two main metrics that are enhanced by use of rDE:
- Improved Latency - Amazon Keyspaces provides consistent single-digit millisecond latencies at any scale. Customers looking for sub-millisecond latency can use rENIAC to boost performance for Amazon Keyspaces.
- Increased Throughput (for partitions) - To utilize the full capacity of an Amazon Keyspaces table, throughput is expected to be well distributed across the partitioned dataset. With skewed access patterns, such as Zipfian distributions, customers can see increased throttling on frequently accessed keys. With rENIAC, frequently accessed keys are cached, allowing for greater throughput.
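To see why caching helps most when access is skewed, the sketch below simulates a small LRU cache under a Zipf-like workload and under a uniform one. Everything here is an illustrative assumption: the key count, cache capacity, and Zipf exponent are arbitrary, and the LRU policy is a stand-in that says nothing about how rDE actually manages its cache.

```python
import random
from collections import OrderedDict


def zipf_weights(num_keys, exponent=1.0):
    """Unnormalized Zipf weights: the key at rank r gets weight 1 / r**exponent."""
    return [1.0 / (rank ** exponent) for rank in range(1, num_keys + 1)]


def lru_hit_ratio(requests, capacity):
    """Replay a request stream through an LRU cache and return the hit ratio."""
    cache = OrderedDict()
    hits = 0
    for key in requests:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used key
    return hits / len(requests)


rng = random.Random(42)                    # fixed seed for repeatable runs
num_keys = 10_000
keys = list(range(num_keys))

skewed = rng.choices(keys, weights=zipf_weights(num_keys), k=50_000)
uniform = rng.choices(keys, k=50_000)

skewed_ratio = lru_hit_ratio(skewed, capacity=500)
uniform_ratio = lru_hit_ratio(uniform, capacity=500)
print(f"skewed hit ratio:  {skewed_ratio:.2f}")
print(f"uniform hit ratio: {uniform_ratio:.2f}")
```

With the same cache size, the skewed stream produces a much higher hit ratio than the uniform one, because a small set of hot keys accounts for most requests. Those are the requests that would otherwise concentrate on a few partitions.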
Finally, from a TCO savings perspective, Amazon Keyspaces is priced per usage, so cost reduction can come from caching frequent reads of the same data. Caching frequently accessed items or recently inserted data reduces the capacity required from Amazon Keyspaces, lowering cost for the end user.
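As a back-of-the-envelope illustration of that reasoning, the sketch below estimates how many reads still reach the backend for a given cache hit ratio. The function names, the request volume, the hit ratio, and the unit price are all hypothetical placeholders, not Amazon Keyspaces pricing, and the estimate ignores the cost of running the cache itself.

```python
def backend_reads(total_reads, cache_hit_ratio):
    """Reads that still reach the backend after the cache absorbs its hits."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return total_reads * (1.0 - cache_hit_ratio)


def read_cost(total_reads, cache_hit_ratio, price_per_million_reads):
    """Backend read cost, given a placeholder price per million reads."""
    return backend_reads(total_reads, cache_hit_ratio) / 1_000_000 * price_per_million_reads


monthly_reads = 500_000_000   # hypothetical workload
unit_price = 1.0              # placeholder price unit, not a real rate

baseline = read_cost(monthly_reads, 0.0, unit_price)
with_cache = read_cost(monthly_reads, 0.8, unit_price)
savings_fraction = 1 - with_cache / baseline
print(f"read cost reduced by {savings_fraction:.0%}")
```

In this simple model the saving on read cost equals the cache hit ratio. A real estimate would substitute your measured hit ratio and current pricing, and would add the cost of the proxy nodes.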
In this post, we have described how rDE can be used with Amazon Keyspaces and the benefits of doing so. A demo is available for interested customers. Please contact email@example.com to see the demo.
“Customers can stand up Cassandra clusters in minutes and scale their database up and down with ease using Amazon Keyspaces and its serverless architecture. It is important to provide these customers an option to tap into performance available beyond traditional software optimizations. That’s the need delivered by this combined solution using rENIAC Data Engine and Amazon Keyspaces. Low, predictable latency drives business KPIs, especially for use cases that involve session data, and those can be met or exceeded with the hardware-accelerated cache of the Data Engine.” Chidamber Kulkarani, CTO, rENIAC.