Connecting to Amazon DocumentDB as a Replica Set
Amazon DocumentDB has three endpoints that you can use to connect to your cluster:
Reader endpoint
Instance endpoints
In most cases when you connect to Amazon DocumentDB, we recommend that you use the cluster endpoint. This is a CNAME that points to the primary instance in your cluster, as shown in the following diagram.
When using an SSH tunnel, we recommend that you connect to your cluster using the cluster endpoint and do not attempt to connect in replica set mode (i.e., specifying in your connection string) as it will result in an error.
Note
For more information about Amazon DocumentDB endpoints, see Amazon DocumentDB Endpoints.
The advantage of connecting as a replica set is that it enables your SDK to discover the cluster topography automatically, including when instances are added or removed from the cluster. You can then use your cluster more efficiently by routing read requests to your replica instances.
When you connect as a replica set, you can specify the readPreference
for the connection. If you specify a read preference of secondaryPreferred
, the client routes read queries to your replicas and write queries to your primary instance (as in the following diagram). This is a better use of your cluster resources. For more information, see .
Reads from Amazon DocumentDB replicas are eventually consistent. They return the data in the same order as it was written on the primary, and there is often less than a 50 ms replication lag. You can monitor the replica lag for your cluster using the Amazon CloudWatch metrics DBInstanceReplicaLag
and . For more information, see Monitoring Amazon DocumentDB with CloudWatch.
Unlike traditional monolithic database architecture, Amazon DocumentDB separates storage and compute. Given this modern architecture, we encourage you to read scale on replica instances. Reads on replica instances don’t block writes being replicated from the primary instance. You can add up to 15 read replica instances in a cluster and scale out to millions of reads per second.
The key benefit of connecting as a replica set and distributing reads to replicas is that it increases the overall resources in your cluster that are available to do work for your application. We recommend connecting as a replica set as a best practice. Further, we recommend it most commonly in the following scenarios:
You’re using nearly 100 percent CPU on your primary.
You reach the connection or cursor limits for an individual instance.
For more information, see the following topics:
Consider the scenario of using all the connections in your cluster. For example, an r5.2xlarge
instance has a limit of 4,500 connections (and 450 open cursors). If you create a three-instance Amazon DocumentDB cluster and connect only to the primary instance using the cluster endpoint, your cluster limits for open connections and cursors are 4,500 and 450 respectively. You might reach these limits if you’re building applications that use many workers that get spun up in containers. The containers open up a number of connections all at once and saturate the cluster.
Instead, you could connect to the Amazon DocumentDB cluster as a replica set and distribute your reads to the replica instances. You could then effectively triple the number of available connections and cursors available in the cluster to 13,500 and 1,350 respectively. Adding more instances to the cluster only increases the number of connections and cursors for read workloads. If you need to increase the number of connections for writes to your cluster, we recommend increasing the instance size.
Note
The number of connections for large
, , and 2xlarge
instances increases with the instance size up to 4,500. The maximum number of connections per instance for 4xlarge
instances or greater is 4,500. For more information on limits by instance types, see .
Typically we don’t recommend that you connect to your cluster using the read preference of secondary
. This is because if there are no replica instances in your cluster, the reads fail. For example, suppose that you have a two-instance Amazon DocumentDB cluster with one primary and one replica. If the replica has an issue, read requests from a connection pool that is set as secondary
fail. The advantage of secondaryPreferred
is that if the client can’t find a suitable replica instance to connect to, it falls back to the primary for reads.
In some scenarios, reads in an application need to have read-after-write consistency, which can be served only from the primary instance in Amazon DocumentDB. In these scenarios, you might create two client connection pools: one for writes and one for reads that need read-after-write consistency. To do that, your code would look something like the following.
To better use the resources in your cluster, we recommend that you connect to your cluster using the replica set mode. If it’s suitable for your application, you can read scale your application by distributing your reads to the replica instances.