The general tutorial from Heroku: https://devcenter.heroku.com/articles/s3-upload-python
The main benefit of this approach is that AWS handles the actual upload, so our server doesn’t need to worry about the performance, and it scales well.
Basic workflow:
- Set up AWS credentials for your instance so that it has access to your S3 bucket
- Create the S3 bucket to upload to. (CORS needs to be configured for the client to do the upload)
- Create an API in your server for the client to get the generated presigned S3 upload URL (see the sketch after this list):
- https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.generate_presigned_post
- https://docs.aws.amazon.com/AmazonS3/latest/dev/PresignedUrlUploadObject.html
- The client uses the URL to upload the file and gets the file’s readable S3 URL
- https://ant.design/components/upload/
- The client makes another call to the server to save the URL to the database
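As a rough illustration of the server-side API, here is a minimal sketch using Flask (the framework choice, endpoint path, and bucket name are my own assumptions, not part of the original setup):

import boto3
from flask import Flask, jsonify, request

app = Flask(__name__)
s3 = boto3.client("s3")

BUCKET_NAME = "my-upload-bucket"  # hypothetical bucket name


@app.route("/api/presigned-upload")
def presigned_upload():
    # The client tells us the object key and content type it wants to upload;
    # we return the presigned POST data it needs to upload directly to S3.
    file_name = request.args["file_name"]
    file_type = request.args["file_type"]
    presigned = s3.generate_presigned_post(
        Bucket=BUCKET_NAME,
        Key=file_name,
        Fields={"Content-Type": file_type},
        Conditions=[{"Content-Type": file_type}],
        ExpiresIn=3600,  # URL is valid for one hour
    )
    # presigned contains "url" (the bucket endpoint) and "fields"
    # (the form fields the client must include in its POST).
    return jsonify(presigned)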
More details
S3 Bucket
You don’t need to make the S3 bucket public; you can just keep the policy empty. For CORS, set it to allow traffic from your own domain, and you might also want to expose the “Location” header, the value of which will be the newly created URL:
<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <CORSRule>
    <AllowedOrigin>*</AllowedOrigin>
    <AllowedMethod>HEAD</AllowedMethod>
    <AllowedMethod>GET</AllowedMethod>
    <AllowedMethod>POST</AllowedMethod>
    <ExposeHeader>ETag</ExposeHeader>
    <ExposeHeader>Location</ExposeHeader>
    <AllowedHeader>*</AllowedHeader>
  </CORSRule>
</CORSConfiguration>
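If you’d rather set this programmatically than through the console, boto3’s put_bucket_cors can apply the equivalent rules (a minimal sketch, with a hypothetical bucket name):

import boto3

s3 = boto3.client("s3")

# Apply the same rules as the XML configuration above.
s3.put_bucket_cors(
    Bucket="my-upload-bucket",  # hypothetical bucket name
    CORSConfiguration={
        "CORSRules": [
            {
                "AllowedOrigins": ["*"],
                "AllowedMethods": ["HEAD", "GET", "POST"],
                "AllowedHeaders": ["*"],
                "ExposeHeaders": ["ETag", "Location"],
            }
        ]
    },
)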
Double check your credentials
You will be able to create the presigned S3 URL even if you don’t have correct credentials, but the URL will not be valid. That’s what got me stuck when I kept getting “Access Denied” while trying to upload. If that happens to you, double check your credentials. (https://stackoverflow.com/a/55293112/980050)
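One quick way to sanity-check which credentials boto3 is actually picking up is to ask STS who you are (a minimal sketch):

import boto3

# Raises an error if no valid credentials are configured; otherwise
# returns the account ID and ARN of the identity making the calls.
identity = boto3.client("sts").get_caller_identity()
print(identity["Account"], identity["Arn"])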
Adding Restrictions
By handing the entire upload process off to AWS, we lose control over what files the users are actually able to upload. Still, AWS provides settings to restrict the content length, file type, or ACL (https://stackoverflow.com/questions/13390343/s3-direct-upload-restricting-file-size-and-type):
import boto3

s3 = boto3.client("s3")

# bucket_name, bucket_key, file_type, maximum_file_size and
# expires_in are parameters supplied by the caller.
response = s3.generate_presigned_post(
    Bucket=bucket_name,
    Key=bucket_key,
    Fields={"acl": "public-read", "Content-Type": file_type},
    Conditions=[
        # The upload is rejected unless the form fields match these values.
        {"acl": "public-read"},
        {"Content-Type": file_type},
        # Restrict the upload size to at most maximum_file_size bytes.
        ["content-length-range", 0, maximum_file_size],
    ],
    ExpiresIn=expires_in,
)
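For reference, the returned dict contains a “url” and the “fields” the client has to echo back in its form POST; a minimal sketch of consuming it with the requests library (the local file name is a made-up example):

import requests

# "response" is the dict returned by generate_presigned_post above.
with open("photo.png", "rb") as f:  # hypothetical local file
    upload = requests.post(
        response["url"],
        data=response["fields"],
        files={"file": ("photo.png", f)},
    )

# S3 returns 204 by default on success; the Location header (exposed
# via the CORS config above) holds the new object's URL.
print(upload.status_code, upload.headers.get("Location"))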
How to Handle Other Security Issues?
Since the project I’m working on is just an MVP, I didn’t put too much work into this part, but even with the conditions you can add to generate_presigned_post, the user can always fake their file types.
My idea for preventing potentially malicious files is to use Lambda to listen to the S3 bucket’s events. Once a new file is uploaded, the Lambda can do extra file content checks and scanning. At the same time, when the newly created URL is saved to the database, it can be marked as “being verified” so that it isn’t used by the application immediately. Once the Lambda finishes all the checks, it can call the server to mark the upload as safe.
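Here is a minimal sketch of what such a Lambda handler could look like (the scan logic is a placeholder and the server callback endpoint is a hypothetical URL I made up):

import json
import urllib.parse
import urllib.request

import boto3

s3 = boto3.client("s3")


def handler(event, context):
    # Triggered by the bucket's s3:ObjectCreated:* notification.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

        # Placeholder: run a real content check / virus scan here.
        is_safe = len(body) > 0

        if is_safe:
            # Tell our server to flip the record from "being verified" to safe.
            req = urllib.request.Request(
                "https://example.com/api/uploads/verify",  # hypothetical endpoint
                data=json.dumps({"bucket": bucket, "key": key}).encode(),
                headers={"Content-Type": "application/json"},
            )
            urllib.request.urlopen(req)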