Managing AWS S3 objects via Python & boto3

I recently had a task that required updating a large number of JSON manifest files stored in S3 folders so that QuickSight could read and import the data.

boto3 makes this achievable with just a few lines of code. This blog post will show a few different ways to use this module effectively.

For reference, the boto3 documentation lives here:

https://boto3.amazonaws.com/v1/documentation/api/latest/index.html

Assume an IAM role which can access the S3 bucket

If you have a robust permissions model configured within AWS, you may need to assume an IAM role in order to execute actions against a particular S3 bucket. The STS client's assume_role call allows this.

import boto3

# ACCESS_KEY / SECRET_KEY belong to the principal that is allowed to assume the role
sess = boto3.Session(aws_access_key_id=ACCESS_KEY,
                     aws_secret_access_key=SECRET_KEY)
sts_connection = sess.client('sts')
assume_role_object = sts_connection.assume_role(RoleArn="arn:aws:iam::12345678901:role/s3userrole_admin",
                                                RoleSessionName="billing",
                                                DurationSeconds=900)

Create a boto3 session

Create a boto3 session, passing in the temporary credentials returned in the assume_role_object variable from the previous step.

session = boto3.Session(
    aws_access_key_id=assume_role_object['Credentials']['AccessKeyId'],
    aws_secret_access_key=assume_role_object['Credentials']['SecretAccessKey'],
    aws_session_token=assume_role_object['Credentials']['SessionToken'],
    region_name='eu-west-2')
client = session.client('s3')

>>> client
<botocore.client.S3 object at 0x7f51dd378050>

Querying S3 folder structure

The underlying call here is list_objects, but for buckets of unknown size it is best run through a paginator: list_objects returns at most 1,000 keys per request, and the paginator abstracts away the repeated requests and continuation tokens needed to page through them all.

paginator = client.get_paginator('list_objects')
s3_bucket_name = 'tg-master-bucket'
# Top-level 'folders' (common prefixes directly under the bucket root)
for page in paginator.paginate(Bucket=s3_bucket_name, Delimiter='/'):
    for top_prefix in page.get('CommonPrefixes', []):
        top = top_prefix.get('Prefix')
        print(top)
        # 2nd-level 'folders' beneath each top-level prefix
        for sub_page in paginator.paginate(Bucket=s3_bucket_name, Delimiter='/', Prefix=top):
            for sub_prefix in sub_page.get('CommonPrefixes', []):
                print(sub_prefix.get('Prefix'))
                
>> /folder1/
>> /folder1/folder2/

Deleting S3 objects

A simple operation, where filename needs to be the full object key, i.e. the path within the bucket, excluding the bucket name itself.

Note that to start working with specific service resources, we first need to create a resource session:

s3 = session.resource('s3')
s3_bucket_name = 'tg-master-bucket'
filename = '/folder1/folder2/example-file.json'
s3.Object(s3_bucket_name,filename).delete()

Reading an S3 JSON file into a usable Python object

In this example, I needed to read a JSON file from the bucket into a Python object, which could then be manipulated before being pushed back to the S3 bucket.

import json

json_file = '/folder1/folder2/example-file.json'  # object key of the JSON file to read
content_object = s3.Object(s3_bucket_name, json_file)
file_content = content_object.get()['Body'].read()
json_content = json.loads(file_content)
json_content['Name'] = 'New Name'

Depending on the format of the file being read, the bytes returned by the .read() call above may need an extra .decode('utf-8') chained on the end before being passed to json.loads:

content_object.get()['Body'].read().decode('utf-8')
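
Once the content has been manipulated, it can be pushed back to the bucket by overwriting the original object. A minimal sketch, reusing the content_object and json_content variables from above:

import json

# Serialise the modified object and overwrite the original file in the bucket
content_object.put(Body=json.dumps(json_content).encode('utf-8'))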

Copying S3 files

A simple action to copy an existing file in a bucket.

s3_bucket_name = 'tg-master-bucket'
existing_file = '/folder1/folder2/example-file.json'
new_filename = '/folder1/folder2/folder3/new-file.json'
copy_source = {'Bucket': s3_bucket_name, 'Key': existing_file}
s3.Object(s3_bucket_name,new_filename).copy_from(CopySource=copy_source)
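
S3 has no native 'move' action, but a move from one 'folder' (prefix) to another can be emulated by copying the object to its new key and then deleting the original, combining the two operations already shown. A rough sketch, reusing the variables from the copy example:

# 'Move' = copy to the new key, then delete the original object
s3.Object(s3_bucket_name, new_filename).copy_from(CopySource=copy_source)
s3.Object(s3_bucket_name, existing_file).delete()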

Upload local files to S3 bucket

Upload a file from a local path to a defined path within a bucket.

s3 = session.resource('s3')
s3_bucket_name = 'tg-master-bucket'
s3_file_name = "folder1/folder2/newjsonfile.json"
s3.meta.client.upload_file(Filename='./temp.json', Bucket=s3_bucket_name, Key=s3_file_name)
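
Putting it together

Finally, the original task of updating every JSON manifest under a given prefix can be sketched roughly as follows. The bucket name, prefix and modification are illustrative, and error handling is omitted:

import json
import boto3

session = boto3.Session(region_name='eu-west-2')  # or the assumed-role session from earlier
client = session.client('s3')
s3 = session.resource('s3')

s3_bucket_name = 'tg-master-bucket'   # illustrative bucket name
manifest_prefix = 'folder1/folder2/'  # illustrative prefix containing the manifest files

paginator = client.get_paginator('list_objects')
for page in paginator.paginate(Bucket=s3_bucket_name, Prefix=manifest_prefix):
    for obj in page.get('Contents', []):
        key = obj['Key']
        if not key.endswith('.json'):
            continue
        # Read, modify and write back each manifest file
        content_object = s3.Object(s3_bucket_name, key)
        manifest = json.loads(content_object.get()['Body'].read().decode('utf-8'))
        manifest['Name'] = 'New Name'  # example modification
        content_object.put(Body=json.dumps(manifest).encode('utf-8'))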

1 Comment

  1. Jerry

    It would be helpful to see examples of using the Bucket object, and the 'action' properties.

    For example:

    #!python
    import boto3
    import os

    # assuming existing & valid ~/.aws/config , ~/.aws/credentials
    s3client = boto3.client('s3')
    s3client.create_bucket(Bucket='object_key_string')

    s3rsrc = boto3.resource('s3')
    oBukt = s3rsrc.Bucket(name='object_key_string')  # now you have a 'Bucket Object'

    for filename in os.listdir():
        oBukt.upload_file('./' + filename, filename)  # you can call 'actions' on it

    for object in oBukt.objects.all():  # and refer to the contents of it
        print(object.key)

    os.mkdir('new')
    os.chdir('new')

    for object in oBukt.objects.all():
        oBukt.download_file(object.key, object.key)
        object.delete()  # and call 'actions' on the content objects

    Particularly, how to copy / move objects within a bucket object, from one 'folder' (prefix) to another.

    Something like the below would be very helpful:

    oBukt.copy_file(object.key, 'subfolder_name/' + object.key)  # replicate same in new location
    oBukt.move_file(object.key, 'subfolder_name/' + object.key)  # replicate, and remove original

    I understand that those specific 'actions' (.copy_file, .move_file) do not exist, but clear examples of how to achieve the intended result would be much appreciated.
