Amazon Aurora: A Production Horror Story

We moved to Amazon Aurora to save costs, reduce replication lag, and improve database performance. Our plan was to shut down read-only slaves in the evening and add them back during the day.

Before Aurora “Cluster”

With a vanilla MySQL master-slave setup, the code had to know which servers were read-only slaves. Historically, we specified a list of read-only servers:

 $hostnames['hostnames'][] = 'aurora-rds.slave1.us-east-1.rds.amazonaws.com';
 $hostnames['hostnames'][] = 'aurora-rds.slave2.us-east-1.rds.amazonaws.com';
 $hostnames['hostnames'][] = 'aurora-rds.slave3.us-east-1.rds.amazonaws.com';
 $hostnames['hostnames'][] = 'aurora-rds.slave4.us-east-1.rds.amazonaws.com';
...

To connect to a reader, we picked a read instance at random:

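// Create a connection to a randomly chosen read-only slave
$rand = rand(0, count($hostnames['hostnames']) - 1);
// mysqli takes (host, username, password, database name), so the reader host goes first
$conn = new mysqli($hostnames['hostnames'][$rand], $username, $password, $dbname);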
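A more defensive variant of the random pick, sketched as an illustration rather than what we actually ran: shuffle the list and fall through to the next slave when a connection fails. `connectToAnyReader()` and its `$connect` callback are hypothetical names; the callback is assumed to return a connection on success and `false` (or throw) on failure.

```php
<?php
// Hypothetical helper: try the read-only slaves in random order and
// return the first connection that succeeds.
function connectToAnyReader(array $hosts, callable $connect)
{
    shuffle($hosts); // random order spreads load, like the rand() pick did
    foreach ($hosts as $host) {
        try {
            $conn = $connect($host);
            if ($conn !== false) {
                return $conn; // first reader that answers wins
            }
        } catch (Throwable $e) {
            // mysqli throws mysqli_sql_exception on PHP 8.1+; treat the host as down
        }
    }
    throw new RuntimeException('No read-only slave accepted a connection');
}
```

In practice `$connect` could wrap `mysqli_init()`, `$m->options(MYSQLI_OPT_CONNECT_TIMEOUT, 2)` and `$m->real_connect($host, $username, $password, $dbname)`, so a dead host fails within a couple of seconds instead of waiting out the default connect timeout.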

Problem 1: nothing expands and contracts

A hard-coded list of read servers does not expand and contract. If load increases dramatically, you have to add new servers through the RDS console and then add their host strings to $hostnames. Because of our database size, adding a server typically took us 2-4 hours.

On the other hand, when load dropped, we had to carefully remove machines and push code to delete their connection strings.
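One way to make the reader list follow the cluster without a code push is to ask RDS for the current members at startup. The sketch below assumes the response shape of the RDS `DescribeDBClusters` API; with the AWS SDK for PHP (v3) you would pass it `$rds->describeDBClusters(['DBClusterIdentifier' => $clusterId])->toArray()`. `readerInstanceIds()` is a hypothetical helper, and turning the instance identifiers into host names would take a further `DescribeDBInstances` call (each instance's `Endpoint['Address']`).

```php
<?php
// Hypothetical helper: given a DescribeDBClusters result as a plain array,
// return the identifiers of every cluster member that is not the writer.
function readerInstanceIds(array $describeResult): array
{
    $ids = [];
    foreach ($describeResult['DBClusters'] ?? [] as $cluster) {
        foreach ($cluster['DBClusterMembers'] ?? [] as $member) {
            // IsClusterWriter is true only for the primary (master) instance
            if (empty($member['IsClusterWriter'])) {
                $ids[] = $member['DBInstanceIdentifier'];
            }
        }
    }
    return $ids;
}
```

Because the list is rebuilt from AWS, adding or removing a slave in the console changes it without editing $hostnames.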

Solution: Amazon Aurora read-only cluster endpoint

We were excited to see that Amazon Aurora offers a read-only connection string. Instead of listing every read database host:

 $hostnames['hostnames'][] = 'aurora-rds.slave1.us-east-1.rds.amazonaws.com';
 $hostnames['hostnames'][] = 'aurora-rds.slave2.us-east-1.rds.amazonaws.com';
 $hostnames['hostnames'][] = 'aurora-rds.slave3.us-east-1.rds.amazonaws.com';
 $hostnames['hostnames'][] = 'aurora-rds.slave4.us-east-1.rds.amazonaws.com';
...

Now we could use a single read-only connection string.

Amazon Aurora Clusters Production

As you add Aurora slaves, they join the pool of machines behind that single read-only connection string. Click through to the cluster's details to see it.

Under the cluster details you get the reader connection string. Sweet, now we have it:

 $hostnames['hostnames'][] = 'aurora-rds.cluster.read-1.rds.amazonaws.com';

Amazon Aurora: Let’s start dropping databases

Well, it was late at night and load was light. Let's see what happens if we nuke a reader. What could go wrong? We deleted a machine:

[Screenshot, 2016-10-30: deleting a reader]

While the machine was deleting, we assumed that since it was a 'cluster', Amazon would just redirect the reads to another machine. We kept high-fiving.

Shit just got real.

OK. A PagerDuty page comes in. Sweating. A second page comes in. Apparently, Amazon keeps sending read requests to the bad slave.

[Screenshot: Amazon Aurora production, stopping a reader]

Our site starts to time out.

[Screenshot, 2016-10-25: site timeouts]

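Within 5 minutes, the site began to stabilize.

Thankfully, things fully cleared after 10 minutes. So, **warning**: Amazon Aurora clusters are not meant to act like a traditional cluster. Deleting a reader does not immediately take it out of the read-only connection string's rotation.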
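One client-side mitigation, sketched under our own assumptions rather than taken from the setup above: Aurora's read-only endpoint balances new connections (using DNS), not individual queries, so a connection that fails against a removed reader can be retried, and a fresh connection may land on a healthy replica (subject to DNS caching on the client). `connectWithRetry()` and its parameters are hypothetical; `$connect` is assumed to return a connection or `false`, or to throw.

```php
<?php
// Hypothetical helper: retry opening a read-only connection a few times,
// pausing a little longer after each failed attempt.
function connectWithRetry(callable $connect, int $attempts = 3, int $delayMs = 200)
{
    $lastError = null;
    for ($i = 1; $i <= $attempts; $i++) {
        try {
            $conn = $connect();
            if ($conn !== false) {
                return $conn;
            }
        } catch (Throwable $e) {
            $lastError = $e; // e.g. mysqli_sql_exception on PHP 8.1+
        }
        if ($i < $attempts) {
            usleep($delayMs * 1000 * $i); // linear backoff: delay, 2x delay, ...
        }
    }
    throw new RuntimeException("Reader unavailable after $attempts attempts", 0, $lastError);
}
```

Pair it with a short `MYSQLI_OPT_CONNECT_TIMEOUT` inside `$connect`; otherwise each attempt can wait on the default connect timeout before it fails, and the retries only stretch out the outage.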

2 thoughts on “Amazon Aurora: A production Horror story”

  1. jeff

    With Aurora, how did you connect your instances that are not in a VPC to the Aurora instance?

    Did you read about this?
    “You can communicate with an EC2 instance that is not in a VPC and an Amazon Aurora DB cluster using ClassicLink. For more information, see A DB Instance in a VPC Accessed by an EC2 Instance Not in a VPC.”

    1. James Ransom Post author

      We used ClassicLink. Through the EC2 API you can link your EC2-Classic instances to the VPC that the Aurora cluster runs in. Currently, Amazon has disabled adding new instances to ClassicLink. API example:

      // Build an AWS SDK for PHP v2 service builder from an array of settings
      $aws = \Aws\Common\Aws::factory(array(
          'region' => $region,
          'key'    => self::KEY,
          'secret' => self::SECRET,
      ));

      // Get the EC2 client from the builder by namespace
      $client = $aws->get('Ec2');
      debug($instance_id, "INSTANCE ID"); // debug() is an app-level helper, not part of the SDK
      // Link the EC2-Classic instance to the VPC that hosts the Aurora cluster
      $client->attachClassicLinkVpc(array(
          'DryRun'     => false,
          'InstanceId' => $instance_id,  // required: the EC2-Classic instance
          'VpcId'      => VPCID,         // required: the target VPC
          'Groups'     => array(GROUP),  // required: VPC security group IDs
      ));

      “If you don’t have a default VPC or you have not created a VPC, you can have Amazon RDS automatically create a VPC for you when you create an Aurora DB cluster using the RDS console. Otherwise, you must do the following”

