Amazon Relational Database Service (RDS) is a feature of Amazon's cloud computing platform that lets customers run a managed installation of a relational database management system (RDBMS), such as MySQL, "in the cloud." An Amazon RDS MySQL instance behaves much like MySQL installed on a conventional server: most existing software works without changes, and new software needs no special design considerations. Some administrative capabilities are not available, such as shell access to the host and the MySQL SUPER privilege. Unlike conventional servers, however, Amazon RDS offers significant advantages in scaling and fault tolerance.
Amazon RDS Scaling
Scaling refers to the ability to increase the capacity of a database server to meet customer demand. As demand grows, a server may be unable to keep up. It becomes progressively slower once its capacity is exceeded, and it eventually begins dropping requests and returning errors. Amazon RDS provides cloud-based tools that head off this trouble through both horizontal and vertical scaling.
Horizontal scaling refers to increasing the capacity of the database by adding more machines. Amazon RDS supports this by making database replication simple to set up. With replication, one Amazon RDS instance serves as the "master," and one or more separate Amazon RDS instances are set up as "read replicas." All updates to the database take place on the single master instance, and those updates propagate to the read replicas. Each read replica has its own endpoint, so the application (or a proxy in front of it) must send writes to the master and distribute reads among the replicas. In database-driven web applications, the typical bottleneck is a large number of simultaneous reads. Replication addresses this by spreading the work of retrieving data across multiple servers. Amazon RDS supports multiple read replicas per source instance, up to a limit that depends on the database engine.
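Routing reads and writes is the application's job. The following is a minimal sketch of that logic. It assumes a hypothetical `ReadWriteRouter` class and uses placeholder endpoint strings in place of real RDS endpoint names. It sends plain `SELECT` statements to the replicas in round-robin order and sends everything else to the master. A production router would also need to keep every statement inside a multi-statement transaction on the master.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field


@dataclass
class ReadWriteRouter:
    """Send writes to the master and spread reads across read replicas.

    Hypothetical helper: the endpoint strings are placeholders; with
    Amazon RDS each would be the DNS endpoint of a DB instance.
    """

    master: str
    replicas: list[str]
    _next_replica: itertools.cycle = field(init=False, repr=False)

    def __post_init__(self) -> None:
        # Round-robin over the replicas; with none configured, read from the master.
        self._next_replica = itertools.cycle(self.replicas or [self.master])

    def endpoint_for(self, sql: str) -> str:
        stmt = sql.strip().upper()
        # Plain SELECTs may go to a replica. Writes (INSERT, UPDATE, DELETE,
        # DDL) and locking reads (SELECT ... FOR UPDATE) must hit the master.
        if stmt.startswith("SELECT") and "FOR UPDATE" not in stmt:
            return next(self._next_replica)
        return self.master
```

For example, with `ReadWriteRouter("master", ["r1", "r2"])`, three consecutive `SELECT`s go to `r1`, `r2` and `r1`, and an `UPDATE` goes to `master`.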
One drawback of this form of replication is that it is asynchronous. Updates must propagate from the master to the read replicas, and this is not instantaneous. As a result, a read replica may briefly return data that is older than the data on the master. This delay is known as replication lag. For example, a user who updates a profile and immediately reloads the page may see the old value if that read is served by a replica. Applications that use this type of scaling must be designed to tolerate or avoid such stale reads.
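One common way to avoid the stale-read problem is "read-your-writes": after a session writes, its own reads go to the master for a short window. The sketch below illustrates the idea with a toy in-memory model. It is not how RDS itself works. `LaggyReplica`, `Session` and `sticky_seconds` are hypothetical names. The lag value is an assumption, and time is passed in explicitly so the example is deterministic.

```python
from __future__ import annotations


class LaggyReplica:
    """Toy model of an asynchronous read replica (hypothetical).

    A change becomes visible here only `lag` seconds after it commits on
    the master. Real replication lag varies with load.
    """

    def __init__(self, lag: float) -> None:
        self.lag = lag
        self._pending: list[tuple[float, str, str]] = []
        self._data: dict[str, str] = {}

    def replicate(self, key: str, value: str, now: float) -> None:
        self._pending.append((now + self.lag, key, value))

    def read(self, key: str, now: float) -> str | None:
        # Apply, in commit order, every change whose lag has elapsed.
        still_pending = []
        for visible_at, k, v in self._pending:
            if visible_at <= now:
                self._data[k] = v
            else:
                still_pending.append((visible_at, k, v))
        self._pending = still_pending
        return self._data.get(key)


class Session:
    """Read-your-writes for one user session (hypothetical).

    After this session writes, its reads go to the master for
    `sticky_seconds`, which should exceed the worst replica lag you
    observe. All other reads go to the replica.
    """

    def __init__(self, master: dict[str, str], replica: LaggyReplica,
                 sticky_seconds: float) -> None:
        self.master = master
        self.replica = replica
        self.sticky_seconds = sticky_seconds
        self._last_write = float("-inf")

    def write(self, key: str, value: str, now: float) -> None:
        self.master[key] = value
        self.replica.replicate(key, value, now)
        self._last_write = now

    def read(self, key: str, now: float) -> str | None:
        if now - self._last_write < self.sticky_seconds:
            return self.master.get(key)
        return self.replica.read(key, now)
```

Suppose a write happens at `now=10.0` with a lag of 0.5 seconds. A read from the replica at `now=10.1` still returns `None`. The session's own read at that moment is served by the master and returns the new value.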
Another type of horizontal scaling is called "sharding." Sharding splits the data across separate database servers, typically by a key such as a customer ID, so that no single server has to bear the entire load. Each shard holds the authoritative copy of its own portion of the data, so a sharded database does not suffer the consistency drawbacks of asynchronous replication. However, Amazon RDS does not shard data for you. Sharding requires careful design of both the database schema and the application, for example to handle queries that span more than one shard. For extra scaling power, sharding may be combined with replication, with each shard served by its own set of read replicas.
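At the heart of any sharding scheme is a function that maps a record's key to the server holding it. The sketch below uses simple hash-modulo placement, which is one technique among several. `shard_for` is a hypothetical helper, and the shard names are placeholders. It uses a stable SHA-256 hash because Python's built-in `hash()` for strings changes between processes.

```python
from __future__ import annotations

import hashlib


def shard_for(key: str, shards: list[str]) -> str:
    """Map a sharding key (e.g. a customer ID) to one of `shards`.

    Hypothetical helper using hash-modulo placement.
    """
    if not shards:
        raise ValueError("at least one shard is required")
    # Stable across processes and machines, unlike Python's hash().
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]
```

Hash-modulo placement has one drawback. Changing the number of shards remaps most keys, which forces a large data migration. Consistent hashing or a lookup table that maps keys to shards are the usual alternatives when shards will be added over time.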
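Vertical scaling refers to increasing database capacity without adding more machines, usually through faster processors, more RAM, larger disks, faster network connections and so on. Because an Amazon RDS instance runs in the cloud, you can get more RAM and processor cores by moving to a larger instance class, or add storage by raising the allocation. Either way, this means changing a setting rather than buying hardware. Storage can usually be increased while the instance stays online. Changing the instance class causes a brief outage, which can be applied immediately or deferred to the scheduled maintenance window.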
Amazon RDS Fault Tolerance
Amazon RDS offers an option called a Multi-Availability Zone, or Multi-AZ, deployment. Instead of a single server instance, RDS maintains an additional standby instance with an identical configuration in a different Availability Zone. An Availability Zone is a physically separate data center within the same AWS region. Updates are replicated to the standby synchronously, so its data matches the primary's. If the primary becomes unreachable (for example, because of a network outage or hardware failure), RDS automatically fails over to the standby and points the database's endpoint at it. Failover is not instantaneous and typically takes a minute or two. Open connections are dropped, so applications should reconnect and retry. Multi-AZ goes a long way toward keeping the database available, but it is not a substitute for backups. An accidental deletion or data corruption on the primary is replicated to the standby as well.
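During a Multi-AZ failover, the endpoint name stays the same, but connection attempts fail until the standby has taken over. The sketch below shows one way an application can ride this out: retry with capped exponential backoff and jitter. `with_failover_retry` is a hypothetical helper. `ConnectionError` stands in for whatever exception types your database driver raises, and you can pass those through `retry_on`. The delay defaults are assumptions that you should tune to the failover times you actually observe.

```python
from __future__ import annotations

import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_failover_retry(
    operation: Callable[[], T],
    attempts: int = 8,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on connection errors with backoff.

    Hypothetical helper. `operation` should open a fresh connection to
    the RDS endpoint on every call so that, once failover completes, it
    reaches the new primary. With the defaults, the total sleep is at
    most 91 seconds (1 + 2 + 4 + 8 + 16 + 30 + 30).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Capped exponential backoff, with jitter so that many clients
            # do not all reconnect at the same instant.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable")
```

Exceptions not listed in `retry_on`, such as a SQL syntax error, propagate immediately. Retrying them would not help.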
Recent Amazon Outage
In April 2011, Amazon suffered a much-publicized outage of parts of its cloud computing platform, illustrating quite dramatically that even Amazon's system is not perfect. At a high level, the outage affected multiple Availability Zones in the "US-EAST-1" region. Spreading a deployment across zones within that region was therefore not always enough to stay online. The outage made some high-profile websites unavailable and caused some permanent data loss. Outages of this nature are rare, but they are a risk that must be accounted for. This risk can be mitigated in two ways. First, keep a copy of the database in a separate AWS region, not just a separate Availability Zone. Second, have a backup plan for the database that is independent of Amazon RDS.
Combining Scaling with Fault Tolerance
Scaling and fault tolerance may be combined to maximize the availability of the database to clients. In an environment with a single master and multiple read replicas, the master may be configured as a Multi-AZ deployment. If the master becomes unavailable, its standby takes over after failover, and the read replicas switch to replicating from the new primary. Read replicas can also be configured as Multi-AZ deployments, though the advantage of doing so is less obvious. Multiple read replicas also back each other up for read traffic, and a replica can be promoted to a standalone instance if needed. However, replicas in the same Availability Zone or region share the same failure risks, so placing them in separate zones or regions reduces that risk.
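For read replicas to back each other up, the application has to stop sending reads to a replica that is down. The sketch below shows one way to do this. `pick_read_endpoint` and its `is_healthy` hook are hypothetical. In practice, `is_healthy` might be the cached result of a periodic lightweight probe query or a replication-lag check. The function picks a healthy replica at random and falls back to the master's (Multi-AZ) endpoint only when no replica is usable.

```python
from __future__ import annotations

import random
from typing import Callable


def pick_read_endpoint(
    replicas: list[str],
    master: str,
    is_healthy: Callable[[str], bool],
    rng: random.Random | None = None,
) -> str:
    """Choose a healthy read replica at random; otherwise use the master.

    Hypothetical helper; `is_healthy` is supplied by the application.
    """
    rng = rng or random.Random()
    healthy = [r for r in replicas if is_healthy(r)]
    if healthy:
        return rng.choice(healthy)
    # Every replica is down or lagging: serve reads from the master
    # rather than failing the request.
    return master
```

Falling back to the master keeps reads working, but it also moves the whole read load onto a single instance. The master therefore needs enough headroom to absorb that load for the duration of a replica outage.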
Finding the Right Solution
Boomcycle benchmarks and load-tests Amazon RDS instances using real data scenarios from the client's application to determine how much capacity each configuration provides. These results are weighed against the client's requirements and budget to arrive at a configuration that meets the client's needs.
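To give a sense of what a load test measures, here is a minimal client-side harness. It is not Boomcycle's actual tooling. It runs a query repeatedly across several threads and reports throughput and latency percentiles. `load_test` is a hypothetical helper, and `query` is a placeholder for a function that sends one representative request to a test database. Dedicated tools such as sysbench or mysqlslap are better suited to heavy loads, because a single Python process can itself become the bottleneck.

```python
from __future__ import annotations

import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def load_test(query: Callable[[], object], total: int,
              concurrency: int) -> dict[str, float]:
    """Run `query` `total` times across `concurrency` threads.

    Hypothetical helper. Returns throughput (requests per second) and
    latency percentiles in milliseconds, measured from the client side.
    """
    if total < 2 or concurrency < 1:
        raise ValueError("need total >= 2 and concurrency >= 1")

    def timed(_: int) -> float:
        start = time.perf_counter()
        query()
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed, range(total)))
    wall = time.perf_counter() - wall_start

    cuts = statistics.quantiles(latencies, n=100)  # 99 percentile cut points
    return {
        "throughput_per_s": total / wall,
        "p50_ms": statistics.median(latencies) * 1000,
        "p95_ms": cuts[94] * 1000,
        "p99_ms": cuts[98] * 1000,
    }
```

Running the harness at increasing `concurrency` values shows where latency starts to climb sharply. That point is a practical estimate of an instance's capacity for that workload.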