Amazon DynamoDB

Dirk Brys | Big Data

DynamoDB is a NoSQL database provided by Amazon Web Services (AWS). It delivers extremely high performance, handling trillions of requests per day with peaks of millions of requests per second. It supports virtually any data size through horizontal scaling. DynamoDB is cloud-native and integrates very well with other AWS services like Amazon Redshift, Amazon EMR, Amazon S3 and Amazon Cognito.

A key-value store with document-oriented features

Main characteristics

DynamoDB is at its core a key-value store, but it also supports document-oriented database features. It provides full support for JSON (JavaScript Object Notation) values with up to 32 levels of nesting. DynamoDB supports queries on those values through predefined indices. It doesn’t provide the full query power of full-fledged document-oriented databases like MongoDB, but performance remains practically unchanged regardless of the query structure and database size.

And that’s one of its strengths. If you know the query requirements for your use case upfront, DynamoDB can be a perfect fit for a solution that has to run 24/7.

DynamoDB is also schema-less: apart from the primary key attributes, you don’t define a schema that the data has to adhere to. This makes it a true NoSQL database by default. DynamoDB is also partially ACID compliant: ACID transactions apply to table data only, not to indices.

For DML and DQL operations, DynamoDB supports PartiQL, a common language that provides SQL-compatible access to relational, semi-structured, and nested data.
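
As a minimal sketch of what a PartiQL read could look like from Python, the snippet below builds a parameterized SELECT and hands it to a DynamoDB client. The helper name select_by_key and any table or attribute names passed to it are hypothetical; client is assumed to behave like a boto3 DynamoDB client, whose execute_statement call takes a statement plus typed parameters.

```python
from typing import Any


def select_by_key(client: Any, table: str, key_attr: str, key_value: str) -> list:
    """Run a parameterized PartiQL SELECT and return the matching items.

    `client` is assumed to behave like boto3.client("dynamodb").
    """
    # Identifiers are double-quoted; the value is passed as a typed
    # parameter ("S" = string) instead of being pasted into the statement.
    statement = f'SELECT * FROM "{table}" WHERE "{key_attr}" = ?'
    response = client.execute_statement(
        Statement=statement,
        Parameters=[{"S": key_value}],
    )
    return response.get("Items", [])
```

Passing the value as a parameter rather than formatting it into the string keeps the statement safe against injection, just as with SQL.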

Security

Amazon DynamoDB automatically encrypts all stored user data at rest, using encryption keys held in AWS Key Management Service (AWS KMS): an AWS owned key, an AWS managed key, or a key managed by the customer.

AWS provides full authentication and access control through AWS IAM and policies, including federated access through an identity provider such as Amazon Cognito or your own enterprise directory server.
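
To give an idea of what such a policy can look like, here is a sketch of an IAM policy, written as a Python dictionary, that lets a Cognito-authenticated user read only the items whose partition key matches their own identity. The table name UserProfiles, the region and the account ID are placeholders, and the policy is an illustration rather than a recommended production setup.

```python
import json

# Hypothetical table ARN: region, account ID and table name are placeholders.
TABLE_ARN = "arn:aws:dynamodb:eu-west-1:123456789012:table/UserProfiles"

read_own_items_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:Query"],
            "Resource": TABLE_ARN,
            # Restrict access to items whose partition key equals the
            # caller's Cognito identity ID.
            "Condition": {
                "ForAllValues:StringEquals": {
                    "dynamodb:LeadingKeys": [
                        "${cognito-identity.amazonaws.com:sub}"
                    ]
                }
            },
        }
    ],
}

# IAM expects the policy document as JSON text.
policy_json = json.dumps(read_own_items_policy, indent=2)
```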

Querying in DynamoDB

The most basic query is a Get operation on the key. Primary keys in DynamoDB are, however, limited to the partition key and an optional sort key, which limits query flexibility. But if you use DynamoDB purely as a key-value store, that should be enough.

It is possible to extend the basic query functionality by adding Global Secondary Indices (GSIs) or Local Secondary Indices (LSIs). But there’s a limit to how many indices can be built: by default 20 GSIs and 5 LSIs per table. The key schema of an existing index cannot be changed. GSIs can be added to or deleted from an existing table, but LSIs can only be defined when the table is created; to change them you have to create a new table under a new name and drop the old one.
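
To make the key and index terminology concrete, here is a sketch of a table definition in the shape that boto3’s create_table call accepts. All table, attribute and index names are hypothetical. The small helper at the end is also just an illustration: it checks two rules that DynamoDB enforces itself, namely that every key attribute is declared and that an LSI shares the table’s partition key.

```python
# Keyword arguments in the shape boto3's create_table expects.
# Table, attribute and index names are hypothetical.
orders_table = {
    "TableName": "Orders",
    "BillingMode": "PAY_PER_REQUEST",
    # Only attributes used in a key schema are declared; all other
    # attributes stay schema-less.
    "AttributeDefinitions": [
        {"AttributeName": "customer_id", "AttributeType": "S"},
        {"AttributeName": "order_date", "AttributeType": "S"},
        {"AttributeName": "status", "AttributeType": "S"},
        {"AttributeName": "total", "AttributeType": "N"},
    ],
    "KeySchema": [
        {"AttributeName": "customer_id", "KeyType": "HASH"},   # partition key
        {"AttributeName": "order_date", "KeyType": "RANGE"},   # sort key
    ],
    # LSI: same partition key as the table, different sort key.
    "LocalSecondaryIndexes": [{
        "IndexName": "by_customer_total",
        "KeySchema": [
            {"AttributeName": "customer_id", "KeyType": "HASH"},
            {"AttributeName": "total", "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "ALL"},
    }],
    # GSI: free choice of partition and sort key.
    "GlobalSecondaryIndexes": [{
        "IndexName": "by_status",
        "KeySchema": [
            {"AttributeName": "status", "KeyType": "HASH"},
            {"AttributeName": "order_date", "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "KEYS_ONLY"},
    }],
}


def definition_is_consistent(table: dict) -> bool:
    """Check that key attributes are declared and LSIs reuse the partition key."""
    declared = {a["AttributeName"] for a in table["AttributeDefinitions"]}
    table_hash = table["KeySchema"][0]["AttributeName"]
    lsis = table.get("LocalSecondaryIndexes", [])
    gsis = table.get("GlobalSecondaryIndexes", [])
    schemas = [table["KeySchema"]] + [i["KeySchema"] for i in lsis + gsis]
    all_declared = all(k["AttributeName"] in declared for s in schemas for k in s)
    lsi_ok = all(i["KeySchema"][0]["AttributeName"] == table_hash for i in lsis)
    return all_declared and lsi_ok
```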

The Query operation then comes to the rescue. It runs on a table or an index. The result set, however, cannot exceed 1 MB per query, so you need follow-up queries when more data has to be retrieved.
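
Those follow-up queries can be sketched as a simple loop. A Query response that was cut off carries a LastEvaluatedKey, which is passed back as ExclusiveStartKey on the next call. The helper name query_all_pages is hypothetical, and client is assumed to behave like a boto3 DynamoDB client.

```python
from typing import Any, Iterator


def query_all_pages(client: Any, **query_args: Any) -> Iterator[dict]:
    """Yield every item of a Query, following LastEvaluatedKey across pages.

    `client` is assumed to behave like boto3.client("dynamodb").
    """
    start_key = None
    while True:
        if start_key is not None:
            # Resume where the previous page stopped.
            query_args["ExclusiveStartKey"] = start_key
        page = client.query(**query_args)
        yield from page.get("Items", [])
        start_key = page.get("LastEvaluatedKey")
        if not start_key:
            break  # no LastEvaluatedKey means this was the last page
```

Because the function is a generator, a caller can stop early without fetching the remaining pages.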

For more elaborate queries, you load the data into other AWS services like Redshift or export it to S3.

So for some flexibility in querying, indices are required. But this also comes at a cost. Indices are sized, billed and provisioned separately from the data, which also makes the system more complex to manage.

Other limitations

DynamoDB also supports only a limited set of datatypes and limits the amount of data you can store per item (400 KB, attribute names included). DynamoDB supports string, number, binary, Boolean, null, list, set and map datatypes.
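
In DynamoDB’s low-level JSON format, every value carries a type tag such as S, N, B, BOOL, NULL, L, M, SS, NS or BS. The sketch below converts plain Python values into that tagged form. The function name to_attribute_value is hypothetical, and floats are left out on purpose to keep the example short; SDKs such as boto3 ship their own serializers for this.

```python
from decimal import Decimal
from typing import Any


def to_attribute_value(value: Any) -> dict:
    """Convert a Python value to DynamoDB's type-tagged JSON representation."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return {"BOOL": value}
    if isinstance(value, (int, Decimal)):
        return {"N": str(value)}  # numbers travel as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, (bytes, bytearray)):
        return {"B": bytes(value)}
    if isinstance(value, (list, tuple)):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {str(k): to_attribute_value(v) for k, v in value.items()}}
    if isinstance(value, (set, frozenset)) and value:
        # Sets must be non-empty and hold a single scalar type.
        if all(isinstance(v, str) for v in value):
            return {"SS": sorted(value)}
        if all(isinstance(v, (int, Decimal)) and not isinstance(v, bool)
               for v in value):
            return {"NS": sorted(str(v) for v in value)}
        if all(isinstance(v, bytes) for v in value):
            return {"BS": sorted(value)}
    raise TypeError(f"unsupported value for DynamoDB: {value!r}")


# Hypothetical catalogue item with nested and set-valued attributes.
item = to_attribute_value({
    "name": "Lamp",
    "price": 25,
    "in_stock": True,
    "tags": {"indoor", "led"},
    "dimensions": [30, 12],
})
```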

It is important to understand these limitations in order to make a well-considered decision when selecting the appropriate database for your use case. But the fact that DynamoDB has limitations doesn’t mean it can’t be a perfect fit for your use case.

Its strength lies in its ability to scale out without practical limits and in its replication support. This makes it an ideal serverless database that will be up and running all the time, 24/7.

However, as with any AWS service, you also need to take into account that costs increase when data volume or the number of transactions grows. So be sure to investigate the related costs, which are also an important decision criterion.

Horizontal scaling in DynamoDB

To achieve horizontal scaling, DynamoDB assigns data to different partitions hosted on distinct machines. DynamoDB uses a cluster of machines in which each machine is responsible for storing part of the data. Whenever DynamoDB adds a machine to the cluster, it randomly assigns that machine an integer value. You can consider this a kind of distribution key offset.

When we insert a key-value pair into DynamoDB, the key is hashed into an integer. The pair is then stored on the machine whose distribution key offset covers that integer. Suppose machine A has a distribution key offset of 1000 and machine B has a distribution key offset of 3000. DynamoDB stores all hashed keys and corresponding data in the range 0 to 1000 on machine A, and all hashed keys and corresponding data above 1000 and up to 3000 on machine B.

If we receive a lot of keys that hash into the range between 1000 and 3000, machine B will probably get overloaded. DynamoDB then adds another machine with a distribution key offset of 2000, which spreads the data evenly across both machines. Because DynamoDB can split and merge data ranges this way, it can keep scaling out. And this all comes out of the box: you don’t have to worry about how it actually works, because DynamoDB scales automatically.
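
The offset model described above can be sketched in a few lines of Python. This is a toy illustration of the idea, not DynamoDB’s actual implementation: the class name Cluster, the hash space of 4000 and the wrap-around for hashes above the highest offset are all assumptions made for the example.

```python
import bisect
import hashlib

HASH_SPACE = 4000  # hypothetical size of the hash range, kept small for readability


class Cluster:
    """Toy model: each machine owns the hashes up to and including its offset."""

    def __init__(self) -> None:
        self._offsets = []   # sorted distribution key offsets
        self._machines = {}  # offset -> machine name

    def add_machine(self, name: str, offset: int) -> None:
        bisect.insort(self._offsets, offset)
        self._machines[offset] = name

    def machine_for_hash(self, hashed_key: int) -> str:
        # Find the first offset >= hashed_key; wrap around to the first
        # machine when the hash lies beyond the highest offset (assumption).
        i = bisect.bisect_left(self._offsets, hashed_key)
        if i == len(self._offsets):
            i = 0
        return self._machines[self._offsets[i]]

    def machine_for_key(self, key: str) -> str:
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.machine_for_hash(int(digest, 16) % HASH_SPACE)


cluster = Cluster()
cluster.add_machine("A", 1000)
cluster.add_machine("B", 3000)
before = cluster.machine_for_hash(1500)  # "B": falls between 1000 and 3000

cluster.add_machine("C", 2000)           # relieve the overloaded machine B
after = cluster.machine_for_hash(1500)   # "C": now covered by offset 2000
```

Adding machine C only moves the hashes between 1000 and 2000; everything else stays where it was, which is what makes this kind of scheme attractive for scaling out.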

Replication in DynamoDB

In DynamoDB each machine has complete knowledge of the distribution key offsets of all machines inside the cluster.

For replication, DynamoDB replicates data among AWS Availability Zones and AWS Regions, and reads support either eventual consistency or strong consistency:

  • With eventual consistency the response might not reflect the results of a recently completed write operation. The response might include some stale data. If you repeat your read request after a short time, the response should return the latest data.
  • With strong consistency, the response returns the most up-to-date data, reflecting the updates from all prior write operations that were successful. However, a strongly consistent read may have higher latency, and it might return an error instead of data if there is a network delay or outage.
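
The difference between the two read modes can be illustrated with a toy model of one up-to-date copy and one lagging replica. This is a simplified sketch, not how DynamoDB is built internally; the class ReplicatedStore and its methods are hypothetical. In the real API, reads are eventually consistent by default and a strongly consistent read is requested with the ConsistentRead parameter.

```python
class ReplicatedStore:
    """Toy model of one up-to-date copy and one lagging read replica."""

    def __init__(self) -> None:
        self._leader = {}   # always holds the latest successful writes
        self._replica = {}  # catches up only when replicate() runs

    def write(self, key, value) -> None:
        self._leader[key] = value

    def replicate(self) -> None:
        """Simulate the replica catching up with the leader."""
        self._replica = dict(self._leader)

    def read(self, key, consistent: bool = False):
        # Strong reads go to the up-to-date copy, eventual reads may be stale.
        source = self._leader if consistent else self._replica
        return source.get(key)


store = ReplicatedStore()
store.write("stock", 10)
store.replicate()
store.write("stock", 9)  # not yet replicated

stale = store.read("stock")                   # 10: replica has not caught up
fresh = store.read("stock", consistent=True)  # 9: reflects the latest write
store.replicate()
caught_up = store.read("stock")               # 9: replica is up to date again
```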

Typical use cases for Amazon DynamoDB

Amazon DynamoDB could be an ideal choice for systems that need to store images linked to some kind of key. A typical example is a product catalogue, so websites can use Amazon DynamoDB as their backbone to serve product data.

Let’s look at a more advanced example: GE Health Cloud. GE Healthcare Systems provides access to a single portal for healthcare professionals all over the US to process and share images of patient cases. They use DynamoDB to store all these images.

Other examples where DynamoDB can be a perfect fit are IoT edge device configurations, product catalogs, single transaction data etc.

Basically, if you need an OLTP backbone in the cloud that has to be available 24/7, Amazon DynamoDB is a perfect fit. However, if you require an OLAP solution, don’t go for Amazon DynamoDB. Also, if you know upfront that you will need many different kinds of queries on your data, DynamoDB will not suit your use case.

Amazon DynamoDB truly serves as a key-value store with extended document-oriented features.

Key take-away:

Amazon DynamoDB is an enhanced key-value store with full JSON support and unlimited scaling and performance capabilities. It is a perfect fit for OLTP applications that do not require elaborate query capabilities.

Keep an eye on costs though if you require quite a number of different query types!
