Today I used s3-parallel-put to push files to S3. The directory I was working with contained millions of small files, and the standard s3cmd with the sync option never seemed to finish, while producing no error messages. With s3-parallel-put I can upload files in parallel and even control the number of processes.
Using the command:
python /usr/bin/s3-parallel-put --content-type=guess --processes=30 --verbose --bucket=[YOUR BUCKET] /uploads/ >> /tmp/backup/log.txt 2>&1
The only tricky part here is the “guess” option. It tells s3-parallel-put to guess the content type of each object before uploading it. S3 stores that content type as metadata and sends it back as a header when the object is retrieved. Web browsers do most of the retrieving, and they want headers (which include the Content-Type)!
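Since s3-parallel-put is a Python tool, the guessing is presumably done with Python's standard mimetypes module (an assumption on my part, not verified against its source). A quick illustration of what that kind of guess produces:

```python
# Illustration of content-type guessing with Python's standard mimetypes
# module. This is likely what a --content-type=guess option relies on,
# but that is an assumption, not confirmed from the s3-parallel-put code.
import mimetypes

for name in ["index.html", "logo.png", "archive.tar.gz"]:
    # guess_type returns (content_type, encoding); encoding is set for
    # compressed files such as .gz
    ctype, encoding = mimetypes.guess_type(name)
    print(name, "->", ctype, encoding)
```

So an upload of index.html would get a Content-Type of text/html, which is exactly what a browser expects when it fetches the object from S3.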
Also, the examples in the GitHub project mention a “PREFIX”; I still have no idea what it is.
You may need to install boto; if so, this is what I did (on Linux):
sudo easy_install pip
pip install boto
sudo pip install boto