I recently had a task which required updating a large number of JSON manifest files housed within S3 folders, so that Quicksight could read and import the data.
boto3 makes this achievable with just a few lines of code. This blog post will show a few different ways to use this module effectively.
For reference, the boto3 documentation lives here:
https://boto3.amazonaws.com/v1/documentation/api/latest/index.html
Assume an IAM role which can access the S3 bucket
If you have a robust permissions model configured within AWS, you may need to assume an IAM role in order to execute actions against a particular S3 bucket. The STS assume_role call makes this possible.
import boto3

sess = boto3.Session(aws_access_key_id=ACCESS_KEY,
                     aws_secret_access_key=SECRET_KEY)
sts_connection = sess.client('sts')
assume_role_object = sts_connection.assume_role(
    RoleArn="arn:aws:iam::12345678901:role/s3userrole_admin",
    RoleSessionName="billing",
    DurationSeconds=900)
Create a boto3 session
Create a boto3 session, passing in the temporary credentials held in the assume_role_object variable generated in the previous step.
session = boto3.Session(
    aws_access_key_id=assume_role_object['Credentials']['AccessKeyId'],
    aws_secret_access_key=assume_role_object['Credentials']['SecretAccessKey'],
    aws_session_token=assume_role_object['Credentials']['SessionToken'],
    region_name='eu-west-2')
client = session.client('s3')

>>> client
<botocore.client.S3 object at 0x7f51dd378050>
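The temporary credentials above only last for the DurationSeconds requested (900 seconds here), so it is worth checking that the session actually works before doing anything else. A minimal sketch, assuming the assumed role has at least s3:ListBucket on the bucket used in the later examples:

from botocore.exceptions import ClientError

# A HEAD request against the bucket fails fast if the assumed role
# cannot reach it, without listing or transferring any objects.
try:
    client.head_bucket(Bucket='tg-master-bucket')
    print('Assumed-role session can access the bucket')
except ClientError as err:
    print(f'Cannot access bucket: {err}')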
Querying the S3 folder structure
The function we are calling here is list_objects. For buckets of unknown size, running it through a paginator means that if more than 1,000 keys exist, the paginator handles the continuation requests behind the scenes and lets us iterate over every page of results.
paginator = client.get_paginator('list_objects')
s3_bucket_name = 'tg-master-bucket'

# Top level
for result in paginator.paginate(Bucket=s3_bucket_name, Delimiter='/'):
    for prefix in result.get('CommonPrefixes'):
        a = prefix.get('Prefix')
        print(prefix.get('Prefix'))

# 2nd level
for result in paginator.paginate(Bucket=s3_bucket_name, Delimiter='/', Prefix=a):
    for prefix in result.get('CommonPrefixes'):
        b = prefix.get('Prefix')
        print(prefix.get('Prefix'))

>> /folder1
>> /folder1/folder2
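The CommonPrefixes entries above only return the 'folder' names. If you also want the individual object keys under a prefix, each page returned by the same paginator carries them in its Contents list. A minimal sketch reusing the paginator and bucket from above; the 'folder1/folder2/' prefix is just an illustrative value:

# Print every object key (and size) under a given prefix, across all pages.
for page in paginator.paginate(Bucket=s3_bucket_name, Prefix='folder1/folder2/'):
    for obj in page.get('Contents', []):
        print(obj['Key'], obj['Size'])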
Deleting S3 objects
A simple operation, where filename needs to be the full key path within the bucket, excluding the bucket name.
Note that to start working with specific services, we need to create a resource session as follows:
s3 = session.resource('s3')
s3 = session.resource('s3')
s3_bucket_name = 'tg-master-bucket'
filename = '/folder1/folder2/example-file.json'
s3.Object(s3_bucket_name, filename).delete()
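If you need to remove everything under a prefix rather than a single key, the resource API's object collections can batch the deletes for you. A sketch, assuming the same bucket and an illustrative prefix:

# Batch-delete every object whose key starts with the given prefix.
# boto3 groups the keys into DeleteObjects requests behind the scenes.
bucket = s3.Bucket(s3_bucket_name)
bucket.objects.filter(Prefix='folder1/folder2/').delete()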
Reading an S3 JSON file into a usable Python object
In this example, I needed to read the JSON file from the bucket into a Python object, which could then be manipulated before pushing back to the S3 bucket.
import json

content_object = s3.Object(s3_bucket_name, json_file)
file_content = content_object.get()['Body'].read()
json_content = json.loads(file_content)
json_content['Name'] = 'New Name'
Depending on the file format that is being read, the content_object.get()['Body'].read() call above may need .decode('utf-8') chained onto the end:
content_object.get()['Body'].read().decode('utf-8')
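To complete the round trip described above and push the modified JSON back to the bucket, the same Object can be written with put(). A minimal sketch, assuming json_content has been edited as shown:

# Serialise the modified data and overwrite the original key in S3.
content_object.put(
    Body=json.dumps(json_content).encode('utf-8'),
    ContentType='application/json')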
Copying S3 files
A simple action to copy an existing file in a bucket.
s3_bucket_name = 'tg-master-bucket'
existing_file = '/folder1/folder2/example-file.json'
new_filename = '/folder1/folder2/folder3/new-file.json'
copy_source = {'Bucket': s3_bucket_name, 'Key': existing_file}
s3.Object(s3_bucket_name, new_filename).copy_from(CopySource=copy_source)
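boto3 has no single 'move' action, but the same copy_from call followed by a delete on the original key achieves it. A sketch reusing the variables from the copy example above:

# 'Move' = copy to the new key, then remove the original object.
s3.Object(s3_bucket_name, new_filename).copy_from(CopySource=copy_source)
s3.Object(s3_bucket_name, existing_file).delete()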
Upload local files to S3 bucket
Upload a file from a local path to a defined path within a bucket.
s3 = session.resource('s3')
s3_bucket_name = 'tg-master-bucket'
s3_file_name = "folder1/folder2/newjsonfile.json"
s3.meta.client.upload_file(Filename='./temp.json', Bucket=s3_bucket_name, Key=s3_file_name)
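If whatever reads the file cares about its content type (Quicksight manifests are JSON, for example), upload_file accepts an ExtraArgs dictionary that is passed through to the underlying upload call. A sketch of the same upload with an explicit content type:

s3.meta.client.upload_file(
    Filename='./temp.json',
    Bucket=s3_bucket_name,
    Key=s3_file_name,
    ExtraArgs={'ContentType': 'application/json'})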
It would be helpful to see examples of using the Bucket object and the 'action' properties.
For example:
import boto3
import os

# assuming existing & valid ~/.aws/config , ~/.aws/credentials
s3client = boto3.client('s3')
s3client.create_bucket(Bucket='object_key_string')
s3rsrc = boto3.resource('s3')
oBukt = s3rsrc.Bucket(name='object_key_string')  # now you have a 'Bucket Object'

for filename in os.listdir():
    oBukt.upload_file('./' + filename, filename)  # you can call 'actions' on it

for object in oBukt.objects.all():  # and refer to the contents of it
    print(object.key)

os.mkdir('new')
os.chdir('new')
for object in oBukt.objects.all():
    oBukt.download_file(object.key, object.key)
    object.delete()  # and call 'actions' on the content objects
Particularly, how to copy / move objects within a bucket object, from one 'folder' (prefix) to another.
Something like the below would be very helpful:
oBukt.copy_file(object.key, 'subfolder_name/' + object.key)  # replicate same in new location
oBukt.move_file(object.key, 'subfolder_name/' + object.key)  # replicate, and remove original
I understand that those specific 'actions' (.copy_file, .move_file) do not exist, but clear examples of how to achieve the intended result would be much appreciated.