
Transcoding video files with S3 Batch Operations

Many media companies store their extensive video repositories in S3. Typical practice suggests storing current videos in S3 Standard and archiving older videos into S3 Glacier. Lambda invocations could be triggered on S3 events such as putting a new object into a bucket. S3 Batch Operations provides you with a managed solution to assist with triggering Lambda functions for existing objects and performing other large-scale tasks in S3.

Create a Lambda transcoding workflow

First, set up the VOD automation workflow using the post linked above. When you reach step 2, creating the Lambda function, return to this post. You must change the Lambda function so that it works with S3 Batch Operations: the modified function reads metadata from new fields in the event JSON passed to Lambda and sends a job response back to S3 Batch Operations.

The VOD Automation post uses an S3 trigger in Lambda pointing to the bucket ingesting video files. In the Lambda function you create for S3 Batch Operations, do not set up an S3 trigger, as your S3 Batch Operations job will invoke the Lambda function directly.

In the convert.py file, add the following lines of code at the beginning to extract the S3 Batch Operations job information and specific task information, such as s3BucketArn and s3Key.

First, the S3 Batch Operations job will send job parameters to Lambda in the event JSON, which includes jobId, invocationId, and invocationSchemaVersion. Lambda grabs these parameters by adding the following three lines:

Python

jobId = event['job']['id']
invocationId = event['invocationId']
invocationSchemaVersion = event['invocationSchemaVersion']
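For local testing, it can help to see the overall shape of the event that S3 Batch Operations passes to Lambda. The following sample follows the documented event schema, but every ID, bucket, and key value is a placeholder:

Python

# Hypothetical S3 Batch Operations invocation event for local testing.
# All IDs, bucket names, and keys below are placeholders.
sample_event = {
    'invocationSchemaVersion': '1.0',
    'invocationId': 'example-invocation-id',
    'job': {'id': 'example-job-id'},
    'tasks': [{
        'taskId': 'example-task-id',
        's3Key': 'videos/video1.mp4',
        's3VersionId': '1',
        's3BucketArn': 'arn:aws:s3:::example-source-bucket'
    }]
}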

The specific S3 bucket and key information for each task arrives in the event JSON's tasks list; a few more lines extract this information directly from the event. Add the following lines:

Python

task = event['tasks'][0]
taskId = task['taskId']
sourceS3BucketArn = task['s3BucketArn']
sourceS3Key = task['s3Key']
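If a later step needs the bucket name rather than its ARN (for example, to build the s3:// input path for MediaConvert), one way to recover it is to strip the ARN prefix. This small addition is a sketch, not part of the original workflow:

Python

# An S3 bucket ARN has the form arn:aws:s3:::bucket-name, so the
# bucket name is everything after the ':::' separator.
sourceS3Bucket = sourceS3BucketArn.split(':::')[-1]
sourceS3Path = 's3://{}/{}'.format(sourceS3Bucket, sourceS3Key)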

Each invocation of Lambda from S3 Batch Operations also needs a results dictionary that includes taskId, resultCode, and resultString. I set resultCode and resultString depending on the response from the AWS Elemental MediaConvert jobs. For example, if there is an exception when submitting the MediaConvert jobs, then resultCode = 500 and resultString = e for the exception string. Then, at the end of the MediaConvert job submission, I've added the following code block to append the results to the results array:

Python

finally:
    results.append({
        'taskId': taskId,
        'resultCode': resultCode,
        'resultString': resultString,
    })
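For context, here is a condensed sketch of the try/except that surrounds the MediaConvert submission and sets resultCode and resultString. The mediaconvert_client, role_arn, and job_settings names stand in for values built in the VOD Automation workflow, and the success code of 200 is an assumption to pair with the post's use of 500 for exceptions:

Python

results = []
resultCode = 200   # assumed success code; the post uses 500 for failures
resultString = ''
try:
    # Submit the transcode job. role_arn and job_settings are assumed
    # to come from the VOD Automation workflow.
    response = mediaconvert_client.create_job(
        Role=role_arn,
        Settings=job_settings
    )
    resultString = 'MediaConvert job submitted: ' + response['Job']['Id']
except Exception as e:
    # Report the failure back to S3 Batch Operations.
    resultCode = 500
    resultString = str(e)
finally:
    results.append({
        'taskId': taskId,
        'resultCode': resultCode,
        'resultString': resultString,
    })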

When triggering Lambda with S3 Batch Operations, the function must return a response that contains specific data fields. These tell S3 Batch Operations whether the Lambda function successfully executed each task. In this specific case, Lambda returns whether or not the MediaConvert job was successfully submitted:

Python

return {
    'invocationSchemaVersion': invocationSchemaVersion,
    'treatMissingKeysAs': 'PermanentFailure',
    'invocationId': invocationId,
    'results': results
}
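One caveat worth noting: the documented response schema for Lambda invocations from S3 Batch Operations expects resultCode to be one of the string values Succeeded, TemporaryFailure, or PermanentFailure, so if you adopt HTTP-style numeric codes as shown here, consider mapping them to those strings before returning the results.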

Create an S3 Batch Operations job using the console

In the S3 console, choose Batch Operations in the navigation pane on the left, under Buckets.

Choose Create Job.

Choose the appropriate Region for your S3 bucket. Under Choose manifest, select the manifest format (in this case, a CSV file) and enter the path to the manifest in your S3 bucket. If your manifest contains version IDs, make sure to check that box, and choose Next.
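The CSV manifest itself is simple: one line per object, in the form bucket,key (or bucket,key,versionId when versions are included), with object keys URL-encoded. For example, with placeholder names:

example-source-bucket,videos/video1.mp4
example-source-bucket,videos/video2.mp4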

Select Invoke AWS Lambda function, and select the Lambda function you created, in this case VODLambdaConvertBatch. Select a function version if you need a version other than $LATEST.

Choose Next.

In Step 3 of the setup wizard, give your job a description and priority level, and choose a report type and destination.

Choose an IAM role for your S3 Batch Operations job to assume. Select a role with permissions for s3:GetObject and s3:GetObjectVersion on the object source buckets, as well as on the bucket that holds the manifest file.

The role also needs s3:PutObject for the destination bucket for the job completion report.

Lastly, the role needs the lambda:InvokeFunction permission for the Lambda function that it invokes. In this case, the role name is S3BatchLambdaRole.
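Pieced together, the role's permissions policy looks roughly like the following sketch, here expressed as a Python dict so it can be created with boto3. All bucket names, the account ID, and the function ARN are placeholders, and the role's trust policy (allowing batchoperations.s3.amazonaws.com to assume it) is not shown:

Python

import json
import boto3

# Sketch of the S3BatchLambdaRole permissions policy. Every resource
# name below is a placeholder.
batch_policy = {
    'Version': '2012-10-17',
    'Statement': [
        {   # Read the source objects and the manifest file
            'Effect': 'Allow',
            'Action': ['s3:GetObject', 's3:GetObjectVersion'],
            'Resource': [
                'arn:aws:s3:::example-source-bucket/*',
                'arn:aws:s3:::example-manifest-bucket/*'
            ]
        },
        {   # Write the job completion report
            'Effect': 'Allow',
            'Action': 's3:PutObject',
            'Resource': 'arn:aws:s3:::example-report-bucket/*'
        },
        {   # Invoke the transcoding Lambda function
            'Effect': 'Allow',
            'Action': 'lambda:InvokeFunction',
            'Resource': 'arn:aws:lambda:us-east-1:111122223333:function:VODLambdaConvertBatch'
        }
    ]
}

iam = boto3.client('iam')
iam.create_policy(
    PolicyName='S3BatchLambdaPolicy',
    PolicyDocument=json.dumps(batch_policy)
)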

Choose Next.

In Step 4, review and verify your job parameters before choosing Create Job.

After S3 finishes reading your job's manifest, it moves the job to the Awaiting your confirmation state. From here, you can check the number of objects in the manifest and choose Confirm the job to run it.
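If you prefer to confirm and track jobs programmatically, the S3 Control API provides equivalent operations. A brief sketch, where the account ID and job ID are placeholders:

Python

import boto3

s3control = boto3.client('s3control')
account_id = '111122223333'  # placeholder AWS account ID
job_id = 'example-job-id'    # placeholder job ID from job creation

# Confirm the job so it moves out of the awaiting-confirmation state.
s3control.update_job_status(
    AccountId=account_id,
    JobId=job_id,
    RequestedJobStatus='Ready'
)

# Check overall status and per-object progress counts.
job = s3control.describe_job(AccountId=account_id, JobId=job_id)['Job']
print(job['Status'], job.get('ProgressSummary'))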

After the job starts running, you can check its object-level progress through the console dashboard view or by selecting the specific job. As each Lambda invocation occurs, S3 writes logs to CloudWatch Logs.

In the Lambda console, select your Lambda function, and choose Monitoring. You can see Lambda Invoke metrics along with an easy navigation button to View logs in CloudWatch.

When the S3 Batch Operations job completes, view the Successful and Failed object counts to confirm that everything performed as expected. For the details on failed objects, see your job report.

To monitor the video transcode jobs, navigate to the MediaConvert console and review the job for each file.
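You can also list recent transcode jobs from code. A short sketch; note that MediaConvert uses per-account endpoints, which must be discovered first:

Python

import boto3

# Discover the account-specific MediaConvert endpoint, then rebuild
# the client against it.
mc = boto3.client('mediaconvert')
endpoint = mc.describe_endpoints()['Endpoints'][0]['Url']
mc = boto3.client('mediaconvert', endpoint_url=endpoint)

# Print the ID and status of the most recent jobs.
for job in mc.list_jobs(MaxResults=20, Order='DESCENDING')['Jobs']:
    print(job['Id'], job['Status'])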

Changes for S3 Glacier Restore

When creating your Lambda function using the VOD Automation workflow, configure an S3 trigger on the S3 bucket that receives your restored videos. For the event type, choose Restore from Glacier Completed. Create a separate Lambda function for triggering with S3, as the function that you created earlier needed specific additions for S3 Batch Operations reporting.
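For reference, the same trigger can be configured through the API. In the notification configuration, the console's Restore from Glacier Completed option corresponds to the s3:ObjectRestore:Completed event type. A sketch with placeholder bucket and function names, assuming the function already grants S3 permission to invoke it:

Python

import boto3

s3 = boto3.client('s3')

# Invoke the (non-Batch) transcoding function whenever a Glacier
# restore completes for an object in the restore bucket.
s3.put_bucket_notification_configuration(
    Bucket='example-restore-bucket',
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [{
            'LambdaFunctionArn': 'arn:aws:lambda:us-east-1:111122223333:function:VODLambdaConvert',
            'Events': ['s3:ObjectRestore:Completed'],
        }]
    }
)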

Create the S3 Batch Operations job as before, with the following changes:

In step 2, instead of selecting Invoke AWS Lambda function, select Restore.

Under Restore options, select the number of days that the restored objects should remain available. The original objects remain in S3 Glacier. After the selected number of days, S3 removes the restored object, leaving only the S3 Glacier copy. For a transcoding job, you only have to restore objects long enough to complete the transcode job, at which point you or S3 can remove the restored object.

Select the retrieval time required, either Bulk or Standard. Choose Next.
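Each restore that the job performs is equivalent to the following S3 API call, shown here as a sketch with placeholder bucket, key, and duration:

Python

import boto3

s3 = boto3.client('s3')

# Restore one archived object for five days using Standard retrieval.
s3.restore_object(
    Bucket='example-archive-bucket',
    Key='videos/video1.mp4',
    RestoreRequest={
        'Days': 5,
        'GlacierJobParameters': {'Tier': 'Standard'}
    }
)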

In step 3, use the same options described before, except for the IAM role. In the case of an S3 Glacier restore job, the IAM role instead needs s3:RestoreObject permissions for the bucket containing the S3 Glacier objects.

Complete the job creation and confirmation steps as before.

The S3 Batch Operations job submits the S3 Glacier restore requests, and based on the retrieval time selected, S3 restores the objects from S3 Glacier. Each completed object restore then triggers the new Lambda function, which runs the VOD automation workflow and completes the transcode jobs in MediaConvert.
