
s3-parallel-put: Move files to AWS S3 Fast

Today I used s3-parallel-put to send files to S3. The directory I was working with contained millions of small files. Running the standard s3cmd with the sync option never seemed to finish, and it gave no error messages. With s3-parallel-put I can push files in parallel and even control the number of processes.
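The idea of pushing files in parallel with a capped number of workers can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions, not s3-parallel-put's code: `list_files`, `upload_one` and `parallel_put` are hypothetical names, `upload_one` is a stub that only reads each file (a real version would make the boto call there), and it uses a thread pool rather than separate processes to keep the example portable.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def list_files(root):
    """Yield (local_path, key) pairs for every file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Use the path relative to root, with forward slashes, as the key.
            key = os.path.relpath(path, root).replace(os.sep, "/")
            yield path, key


def upload_one(item):
    """Hypothetical stand-in for a single S3 PUT.

    It only reads the file and returns (key, size in bytes), so the
    sketch runs without AWS credentials.
    """
    path, key = item
    with open(path, "rb") as f:
        size = len(f.read())
    return key, size


def parallel_put(root, workers=8):
    """Process every file under root with at most `workers` running at once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(upload_one, list_files(root)))
```

For example, `parallel_put("/uploads/", workers=20)` would walk the same directory as the command below with 20 concurrent workers. One caveat of this simple version: `Executor.map` queues every file up front, so with millions of files a real tool would want to feed work in batches.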

I used this command:

python /usr/bin/s3-parallel-put \
  --bucket=[YOUR BUCKET] \
  /uploads/ >> /tmp/backup/log.txt 2>&1

The only tricky part here is the “guess” option. It tells s3-parallel-put to guess the content type of each object you upload (from the file name) and send it along with the upload. S3 stores that content type and returns it as a Content-Type header whenever the object is retrieved. Web browsers do most of the retrieving, and they want headers, including Content-Type, to know how to handle a file.
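Here is a rough sketch of what guessing a content type from a file name can look like, using Python's standard `mimetypes` module. I'm assuming, not confirming, that s3-parallel-put does something along these lines; `guess_content_type` and its `application/octet-stream` fallback are my own illustrative choices.

```python
import mimetypes


def guess_content_type(key, default="application/octet-stream"):
    """Guess a Content-Type for an object key from its file extension.

    Falls back to `default` when the extension is unknown or missing,
    so every upload still gets some Content-Type value.
    """
    content_type, _encoding = mimetypes.guess_type(key)
    return content_type or default


for key in ["index.html", "images/logo.png", "README"]:
    print(key, "->", guess_content_type(key))
```

The returned string is what you would send as the object's Content-Type header at upload time; S3 then hands it back to the browser on every download.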

Also, the examples in the GitHub project use a “PREFIX”. I still have no idea what it is.
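For what it's worth, in S3 a prefix is normally just the leading part of an object key: a bucket has no real folders, and a key like `backups/2015-05-18/img/logo.png` only looks like a path. The sketch below assumes (I have not checked the project's source) that the tool's prefix is placed in front of each file's path relative to the source directory; `make_key` is a hypothetical helper.

```python
import posixpath


def make_key(prefix, relative_path):
    """Hypothetical helper: build an S3 key by putting `prefix` in front of a path.

    S3 keys use forward slashes, so posixpath does the joining
    regardless of the local operating system.
    """
    relative_path = relative_path.lstrip("/")
    if not prefix:
        return relative_path
    return posixpath.join(prefix, relative_path)


print(make_key("backups/2015-05-18", "img/logo.png"))
print(make_key("", "img/logo.png"))
```

With a prefix, the same local tree can be uploaded under different "folders" of one bucket without the files colliding.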

You may need to install boto. If so, this is what I did (on Linux):

  sudo easy_install pip
  pip install boto
  sudo pip install boto
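Before installing anything, it can be worth checking whether boto is already importable by the Python that runs the script. A small sketch using the standard `importlib.util.find_spec`; `has_module` is a hypothetical helper name.

```python
import importlib.util
import sys


def has_module(name):
    """Return True if `name` can be found by this Python interpreter."""
    return importlib.util.find_spec(name) is not None


print(sys.executable, "has boto:", has_module("boto"))
```

Run it with the same `python` you use for `/usr/bin/s3-parallel-put`, since `sudo pip` and plain `pip` can end up installing into different places.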