Category Archives: AWS

Amazon Aurora: A production Horror story

We moved to Amazon Aurora to save costs, lower replication time and to help our database performance.  We planned to shut down read only $slaves during the evening. During the day we would add these read only slaves back.

Before Aurora “Cluster”

With vanilla mysql master-slave setup the code had to know which servers were read only slaves. Historically, we would specify a list of read only servers:

 ['hostnames'][] => '',
 ['hostnames'][] => '',
 ['hostnames'][] => '',
 ['hostnames'][] => '',

To then create a connection to a reader we would pick a read instance at random.

// Create connection
$rand = rand(0,count($hostnames['hostnames']) - 1); 
$conn = new mysqli($servername, $username, $password, $hostnames['hostnames][$rand]);

The problem 1: nothing expands and contracts

The specified servers for reading do not expand and contract.  If load dramatically increases, you would need to add new servers through the RDS console then add the host strings to your $hostnames. Due to size, adding servers typically would take us 2-4 hours.

On the other hand, reducing the load would required us to carefully drop machines and push code to remove the connection strings.

Solution, Amazon Aurora: Read only cluster

We were excited to see amazon aurora offered a read only connection string.  Instead of specifying the read db database strings:

 ['hostnames'][] => '',
 ['hostnames'][] => '',
 ['hostnames'][] => '',
 ['hostnames'][] => '',

We could now just use one read only connection string

Amazon Aurora Clusters Production

As you add Aurora slaves, they are added to the pool of machines for that one read only connection string.  You can click for details to see your one connection string.

Under cluster details you will get a connection string, sweet we now have it

 ['hostnames'][] => ''

Amazon Aurora: Let’s start dropping databases

Well. It was late at night load was light. Let’s see what happens if we nuke a reader. What could go wrong? We deleted a machine:


With the machine deleting, we thought since it was a ‘cluster’ amazon would just redirect the reads to another machine.  We continued to high-five.

Shit just got real.

Ok.  PageDuty page comes in.  Sweating. Second page comes inApparently, amazon continues to send read requests to a bad slave.


 Our site starts to timeout.

Screen Shot 2016-10-25 at 11.54.31 AM

Soon within 5 minutes, the site stabilized.

Thankfully, things cleared after 10 minutes.  So **warning** amazon aurora clusters are not meant to act like a cluster.

AWS EC2: Moving /var to EBS

From Tamas at work! not mine! :0)

Moving var to /mnt on aws. Often if you have the default storage size set when you create a new instance it is 8gig. To move your var director to another mount point:

rm -rf /mnt/*
cd /var
cp -ax * /mnt/
cd /
mv var var.old
mkdir var
umount -l /dev/xvdb
mount /dev/xvdb /var
nano /etc/fstab # make sure it will automount after restart
rm -rf /var.old