Automated AWS Receipt Processing System

3.1 Overview of Project

This project automates receipt processing using AWS services. Handling receipts manually is time-consuming, error-prone, and difficult to scale; this system instead extracts structured data from receipts and stores it for record-keeping and auditing.

The architecture consists of:

  • Storage Layer: Amazon S3 stores receipt images and PDFs.

  • Processing Layer: Amazon Textract extracts text from receipts using AI-powered OCR.

  • Database Layer: DynamoDB stores the extracted data in a structured format.

  • Notification System: Amazon SES sends email alerts with receipt details.

  • Compute Layer: AWS Lambda automates the workflow by processing the receipts in real-time.

3.2 Storage and Database Setup: S3 bucket and DynamoDB Table

Steps to be Performed 👩‍💻

  1. Set Up an S3 Bucket for Receipt Storage and Archiving.

  2. Create a DynamoDB Table to Store Extracted Receipt Data.

#1 Set Up an S3 Bucket for Receipt Storage and Archiving

The receipt processing system needs a central, secure location to store the original receipt files and keep them organized by processing status.

Amazon S3 (Simple Storage Service) provides the ideal solution with its durability, availability, and security features. This bucket will serve as both the entry point for new receipts and the archive for processed documents.

Steps

  • Sign in to the AWS Management Console

  • Navigate to the Amazon S3 service

  • Click "Create bucket"

    • This will open the bucket creation wizard

    • The bucket will store all your receipt files

    • Select “Bucket type” as General purpose

  • Enter a unique bucket name (e.g., "automated-receipts-yourusername")

    • Note: S3 bucket names must be globally unique across all AWS accounts

    • Best practice: Include your username or organization name to ensure uniqueness

Important: Keep all default settings for the bucket configuration

  • Do not change any of the default options for:

    • Object Ownership

    • Block Public Access settings

    • Bucket Versioning

    • Tags

    • Default encryption

    • Advanced settings

  • Click "Create bucket"

    • AWS will validate your settings and create the bucket

    • You'll be redirected to the S3 buckets list where your new bucket should appear

  • Create an organizational folder (Recommended)

    • Navigate into your newly created bucket

    • Click "Create folder" and name it "incoming" for new receipt uploads

      • This folder helps organize receipts by processing status

The S3 bucket serves as both the starting point and archive for the receipt processing workflow:

  1. Users or systems upload receipt images/PDFs to the "incoming" folder

  2. These uploads automatically trigger the Lambda function

  3. The system maintains the original documents for audit purposes while extracting their data
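The console steps above can also be scripted. The following is a minimal sketch, assuming boto3 is available and that the caller passes in an S3 client; the function name `create_receipt_bucket` and the default region are illustrative, not part of the original tutorial.

```python
def create_receipt_bucket(s3_client, bucket_name, region="us-east-1"):
    """Create the receipt bucket and an 'incoming/' folder marker.

    s3_client is expected to behave like boto3.client("s3").
    """
    params = {"Bucket": bucket_name}
    # Outside us-east-1, S3 requires an explicit location constraint.
    if region != "us-east-1":
        params["CreateBucketConfiguration"] = {"LocationConstraint": region}
    s3_client.create_bucket(**params)

    # The console's "Create folder" button creates a zero-byte object
    # whose key ends in "/"; this call does the same thing.
    s3_client.put_object(Bucket=bucket_name, Key="incoming/")
    return params


# Example usage (needs AWS credentials; the bucket name is a placeholder):
#   import boto3
#   create_receipt_bucket(boto3.client("s3"), "automated-receipts-yourusername")
```

All other bucket settings are left at their defaults, matching the instruction to keep the default configuration.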

#2 Create a DynamoDB Table to Store Extracted Receipt Data

After extracting structured data from receipts, the system needs a scalable, high-performance database to store this information.

DynamoDB is an ideal choice for this application because it provides consistent single-digit millisecond response times at any scale, without the need to manage database servers or worry about capacity planning.

Steps

  • Navigate to DynamoDB in the AWS Console

  • Click "Create table"

    • This opens the table creation interface

    • The table will store all extracted receipt data in a structured format

  • Configure the table basics

    • Table name: Enter "Receipts"

    • Partition key: Enter "receipt_id" and select "String" type

      • This will be a unique identifier for each receipt

      • Generated automatically by the processing Lambda function

    • Sort key: Enter "date" and select "String" type

      • This lets you add a date or date-range condition on the sort key when querying a given receipt_id

      • Format will be "YYYY-MM-DD" for consistent sorting

  • Review and create

    • Leave default settings for the rest of the options

    • Click "Create table"

  • AWS will provision your table (this typically takes less than a minute)
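For reference, the same table could be created programmatically. This is a sketch only: the key schema comes from the steps above, while the on-demand billing mode and the helper names are assumptions (the console defaults may differ in your account).

```python
def receipts_table_definition(table_name="Receipts"):
    """Build the arguments for DynamoDB's create_table call."""
    return {
        "TableName": table_name,
        # Only key attributes are declared; other attributes are schemaless.
        "AttributeDefinitions": [
            {"AttributeName": "receipt_id", "AttributeType": "S"},
            {"AttributeName": "date", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "receipt_id", "KeyType": "HASH"},  # partition key
            {"AttributeName": "date", "KeyType": "RANGE"},       # sort key
        ],
        # Assumption: on-demand capacity, so no throughput planning is needed.
        "BillingMode": "PAY_PER_REQUEST",
    }


def create_receipts_table(dynamodb_client, table_name="Receipts"):
    """Create the table and wait until it is ready to accept writes."""
    dynamodb_client.create_table(**receipts_table_definition(table_name))
    dynamodb_client.get_waiter("table_exists").wait(TableName=table_name)


# Example usage (needs AWS credentials):
#   import boto3
#   create_receipts_table(boto3.client("dynamodb"))
```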

DynamoDB Table Setup

After creating your "Receipts" table in DynamoDB:

  • The table will initially show "No items" as seen in the screenshot

  • Items will only appear after:

  1. Receipt files are uploaded to the S3 "incoming" folder

  2. The Lambda function processes these uploads

  3. Extracted data is stored in this DynamoDB table

This empty state is normal and indicates the table is ready to receive data once the workflow begins.

Data Model for Receipt Information

The DynamoDB table will store the following attributes for each receipt:

The DynamoDB table serves as the structured data repository for all processed receipts:

  1. The Lambda function extracts data from receipts using Amazon Textract

  2. The extracted data is formatted according to the data model

  3. The Lambda function writes this data to the DynamoDB table

  4. Applications and reports can query the table for receipt information

  5. The original receipt images remain in S3, linked via the receipt_url attribute

3.3 Notification Setup: Configuring Amazon SES

Steps to be Performed 👩‍💻

  1. Setting up SES for Email Notifications.

  2. Verifying the Email Address.

  3. Configuring SES for Receipt Notifications.

#1 Setting up SES for Email Notifications

After receipts are processed, users need to be notified about the extracted information and any actions required.

Amazon Simple Email Service (SES) provides a reliable, cost-effective email solution for sending these notifications. Setting up SES ensures that appropriate stakeholders are informed when new receipts are processed.

  • Navigate to Amazon SES in the AWS Console

  • Access Identity Management

    • In the SES console, go to Configuration > Identities

    • This section is where you'll verify email addresses or domains

  • Begin Identity Creation

    • Click the Create identity button

    • This starts the process of verifying an email address or domain for sending

  • Select Identity Type

    • Select Email address option

    • Note: For production systems, you might want to verify an entire domain instead

  • Configure Sender Email

    • Enter your sender email address (the address you'll send notifications from)

    • Best practice: Use a business email or a dedicated notification address

    • Example: receipts-noreply@yourcompany.com

  • Complete Initial Setup

    • Click Create identity

    • AWS will process your request and prepare verification

#2 Verifying the Email Address

After creating the identity, open the inbox of the email address you entered and complete the verification steps below:

  • Verify Sender Email

    • AWS will send a verification email to the address you provided

    • This verification ensures you own the email address

    • Check both inbox and spam/junk folders for this email

  • Complete Verification

    • Go to the inbox of that email address and find the AWS verification email

    • Click the verification link in that email

    • You'll be redirected to a confirmation page in AWS
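The identity creation and verification check can also be done from code. The sketch below assumes a client that behaves like `boto3.client("ses")`; the two helper names are hypothetical.

```python
def request_verification(ses_client, email):
    """Ask SES to send the verification email to the given address."""
    ses_client.verify_email_identity(EmailAddress=email)


def is_verified(ses_client, email):
    """Return True once the address has been verified in SES."""
    response = ses_client.get_identity_verification_attributes(Identities=[email])
    attributes = response.get("VerificationAttributes", {}).get(email, {})
    # The status stays "Pending" until the link in the email is clicked.
    return attributes.get("VerificationStatus") == "Success"


# Example usage (needs AWS credentials; the address is a placeholder):
#   import boto3
#   ses = boto3.client("ses")
#   request_verification(ses, "receipts-noreply@yourcompany.com")
#   print(is_verified(ses, "receipts-noreply@yourcompany.com"))
```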

#3 Configuring SES for Receipt Notifications

❗Important: Email Verification Options

  • Option 1: Use the same email for sender and recipient (simplest approach)

    • Verify only one email address to use for both sending and receiving notifications

    • This is recommended for tutorial and testing purposes

  • Option 2: Use different sender and recipient emails

    • Repeat steps 1 and 2 for each recipient email address

    • Remember: While in SES sandbox mode, both sender and recipient emails must be verified

The SES configuration enables the notification component of the receipt processing workflow:

  1. The Lambda function processes a receipt and stores its data in DynamoDB

  2. It then uses SES to send an email notification with the receipt details

  3. The notification includes key information like vendor, date, amount, and a link to the original receipt

  4. Users receive timely updates without needing to check the system manually

3.4 Processing Setup: Creating a Lambda function

Steps to be Performed 👩‍💻

  1. Create an IAM role for AWS Lambda.

  2. Create a Lambda Function.

  3. Add the Lambda function Code.

  4. Understand the Lambda function Code.

#1 Create an IAM role

Lambda functions need permission to access other AWS services.

Creating a dedicated IAM role follows security best practices by explicitly defining what actions your Lambda function can perform. To create the role:

  • Navigate to IAM in the AWS Console

  • Initiate Role Creation

    • Click "Roles" in the left navigation pane

    • Click "Create role" button to start the process

  • Select Trusted Entity

    • Select "AWS service" as the trusted entity type

    • This specifies which AWS service can assume this role

  • Specify Service Use Case

    • Choose "Lambda" as the service that will use this role

    • This allows Lambda functions to assume this role at runtime

  • Configure Permissions

    • Click "Next: Permissions" to continue

    • On the Permissions policies page, you'll attach policies that define what the role can do

  • Attach Required Policies

    • In the search box, search for and select these policies:

      • AmazonS3ReadOnlyAccess: Allows reading from S3 buckets

      • AmazonTextractFullAccess: Enables document analysis with Textract

      • AmazonDynamoDBFullAccess: Permits writing extracted data to DynamoDB

      • AmazonSESFullAccess: Enables sending email notifications

      • AWSLambdaBasicExecutionRole: Allows Lambda to write logs to CloudWatch

  • Review and Name Role

    • Click "Next"

    • Role name: "ReceiptProcessingLambdaRole"

    • Description (optional): "Allows Lambda functions to process receipts using S3, Textract, DynamoDB, and SES"

  • Create the Role

    • Review the role configuration

    • Click "Create role" to finalize

  • Validate Role Creation

    • Verify the role appears in the IAM roles list

    • Click on the role name to view its details and ensure all policies are attached correctly

The IAM role connects all components of the receipt processing system:

  1. When the Lambda function executes, it assumes this role

  2. The role's permissions allow the function to:

    • Read receipt files from the S3 bucket

    • Send those files to Textract for analysis

    • Write extracted data to DynamoDB

    • Send notification emails via SES

    • Write logs to CloudWatch for troubleshooting
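To make the role setup concrete, here is a sketch of the trust policy and policy attachments expressed in code. It assumes a client that behaves like `boto3.client("iam")`; the constant and function names are illustrative.

```python
import json

# Trust policy: lets the Lambda service assume the role at runtime.
LAMBDA_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# AWS managed policies named in the steps above.
MANAGED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
    "arn:aws:iam::aws:policy/AmazonTextractFullAccess",
    "arn:aws:iam::aws:policy/AmazonDynamoDBFullAccess",
    "arn:aws:iam::aws:policy/AmazonSESFullAccess",
    "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
]


def create_receipt_role(iam_client, role_name="ReceiptProcessingLambdaRole"):
    """Create the execution role and attach the managed policies."""
    iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(LAMBDA_TRUST_POLICY),
        Description="Allows Lambda functions to process receipts "
                    "using S3, Textract, DynamoDB, and SES",
    )
    for arn in MANAGED_POLICY_ARNS:
        iam_client.attach_role_policy(RoleName=role_name, PolicyArn=arn)
```

The full-access managed policies keep the tutorial simple; a production role would more likely use a custom policy scoped to the specific bucket, table, and SES identity.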

#2 Create a Lambda Function

The Lambda function serves as the central processing engine for our receipt processing system.

This serverless function is triggered automatically when a new receipt is uploaded to S3, coordinates the extraction of data using Textract, stores the structured information in DynamoDB, and sends notifications via SES.

  • Navigate to Lambda in the AWS Console

  • Begin Function Creation

    • Click "Create function" to start the process

    • This opens the function creation wizard

  • Select Creation Method

    • Select "Author from scratch"

    • This allows you to create a custom function from the beginning

  • Configure Basic Settings

    • Function name: Enter "ReceiptProcessor"

    • Runtime: Select "Python 3.9" from the dropdown

    • Architecture: Leave as default (x86_64)

  • Set Permissions

    • Under "Permissions" expand "Change default execution role"

    • Select "Use an existing role"

    • From the dropdown, choose "ReceiptProcessingLambdaRole"

  • Create the Function

    • Review your settings

    • Click "Create function"

    • AWS will provision your Lambda function (this takes just a few seconds)

  • Adjust Function Timeout

    • Navigate to the "Configuration" tab

    • Select "General configuration"

    • Click "Edit"

    • Change the Timeout from the default 3 seconds to 3 minutes (180 seconds)

      • This extended timeout is necessary because Textract processing can take time for complex receipts

    • Click "Save"

  • Configure Environment Variables

    • Still in the "Configuration" tab, select "Environment variables"

    • Click "Edit"

    • Add the following key-value pairs:

      • Key: DYNAMODB_TABLE, Value: Receipts (the name of the table you created earlier)

      • Key: SES_SENDER_EMAIL, Value: sender-email@example.com (use the sender email you verified)

      • Key: SES_RECIPIENT_EMAIL, Value: recipient-email@example.com (use the recipient email you verified)

    • Note: For details on how the Lambda function uses these variables, refer to the Environment Variables list later in this section.

    • Click "Save"
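The timeout and environment-variable settings can be applied in one call. This sketch assumes a client that behaves like `boto3.client("lambda")` and uses the three variable names listed in the Environment Variables section of this document; the function name `configure_function` is hypothetical.

```python
def configure_function(lambda_client, sender, recipient,
                       function_name="ReceiptProcessor",
                       table_name="Receipts"):
    """Set the 3-minute timeout and the environment variables."""
    return lambda_client.update_function_configuration(
        FunctionName=function_name,
        Timeout=180,  # seconds; the default is 3
        # Note: this replaces the function's whole set of variables.
        Environment={
            "Variables": {
                "DYNAMODB_TABLE": table_name,
                "SES_SENDER_EMAIL": sender,
                "SES_RECIPIENT_EMAIL": recipient,
            }
        },
    )


# Example usage (needs AWS credentials; addresses are placeholders):
#   import boto3
#   configure_function(boto3.client("lambda"),
#                      "sender-email@example.com",
#                      "recipient-email@example.com")
```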

Lambda Environment Variables:

Environment variables provide configuration without code changes:

The Lambda function forms the core of the receipt processing workflow:

  1. When a new receipt is uploaded to S3, it triggers the Lambda function

  2. Lambda retrieves the receipt file from S3

  3. Lambda calls Amazon Textract to extract data from the receipt

  4. Lambda processes and structures the extracted data

  5. Lambda writes the structured data to DynamoDB

  6. Lambda sends a notification email via SES

  7. Lambda moves the original receipt to the "processed" folder

#3 Add the Lambda function Code

  • Access the Code Editor

    • In the Lambda console, navigate to your "ReceiptProcessor" function

    • Scroll down to the code source section where you'll see the code editor

    • The default Lambda function contains a simple "Hello World" example

  • Replace the default code with the provided Python code

    • Delete the existing code in the editor

    • Copy and paste the following Python code:

  • Deploy the Function

    • After pasting the code, click the "Deploy" button

    • This saves your code and makes it available for execution

    • You should see a confirmation message once deployment is complete

This Lambda function code ties together all the components we've set up:

  1. It's triggered automatically when a new file appears in the S3 bucket

  2. It uses the IAM role we created to access other AWS services

  3. It processes the receipt image using Textract

  4. It stores the structured data in our DynamoDB table

  5. It sends notifications via our configured SES email

The code is designed to be resilient, with comprehensive error handling and logging to help troubleshoot any issues that might arise during processing.

#4 Understand the Lambda function Code

This section breaks down the Lambda function code that powers the AWS Receipt Processing System, explaining each major component and its purpose.

Code Structure Overview

The Lambda function is organized into four main components:

  1. Lambda Handler: Entry point that coordinates the entire workflow

  2. Textract Processing: Extracts structured data from receipt images

  3. DynamoDB Storage: Saves the processed data to the database

  4. Email Notification: Sends formatted results via email

Lambda Handler Function

What it does:

  • Acts as the orchestrator for the entire process

  • Extracts information about which file was uploaded from the S3 event

  • URL-decodes the file path to handle special characters

  • Verifies the file exists before attempting processing

  • Calls the specialized functions for each step of the process

  • Handles errors gracefully with detailed logging
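As an illustration of the first two handler responsibilities, the sketch below pulls the bucket and key out of an S3 event and URL-decodes the key. It is not the tutorial's actual handler; the helper name `extract_s3_objects` is hypothetical, while the event layout follows the standard S3 notification format.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs for every record in an S3 event."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in events: spaces arrive as "+"
        # and special characters as %XX escapes.
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


def lambda_handler(event, context):
    """Skeleton entry point; the processing steps are left out here."""
    for bucket, key in extract_s3_objects(event):
        print(f"Processing receipt from {bucket}/{key}")
    return {"statusCode": 200}
```

Without the decoding step, a file named "my receipt(1).pdf" would be looked up under its encoded key and the read from S3 would fail.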

Textract Processing Function

What it does:

  • Uses Amazon Textract's specialized analyze_expense API

  • Creates a unique ID for the receipt

  • Sets up default values for all expected fields

  • Extracts summary information (vendor, date, total amount)

  • Extracts individual line items with their quantities and prices

  • Returns structured data in a consistent format

Key insights:

  • Textract understands the structure of receipts, not just the text

  • The function handles missing data gracefully with default values

  • The unique ID ensures each receipt can be tracked independently
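The following sketch shows one way the points above might look in code. It parses a response shaped like the output of Textract's `analyze_expense` call; the output field names (`vendor`, `total`, `items`, and so on) and the default values are assumptions for illustration, not the tutorial's exact code.

```python
import uuid

# Textract summary field types mapped to the (assumed) output field names.
SUMMARY_FIELDS = {
    "VENDOR_NAME": "vendor",
    "INVOICE_RECEIPT_DATE": "date",
    "TOTAL": "total",
}
LINE_ITEM_FIELDS = {"ITEM": "name", "PRICE": "price", "QUANTITY": "quantity"}


def parse_expense_response(response):
    """Turn an analyze_expense response into a flat receipt dict."""
    receipt = {
        "receipt_id": str(uuid.uuid4()),  # unique ID per receipt
        "vendor": "Unknown",              # defaults cover missing fields
        "date": "Unknown",
        "total": "0.00",
        "items": [],
    }
    for document in response.get("ExpenseDocuments", []):
        for field in document.get("SummaryFields", []):
            field_type = field.get("Type", {}).get("Text")
            value = field.get("ValueDetection", {}).get("Text")
            if field_type in SUMMARY_FIELDS and value:
                receipt[SUMMARY_FIELDS[field_type]] = value
        for group in document.get("LineItemGroups", []):
            for line in group.get("LineItems", []):
                item = {"name": "Unknown", "price": "0.00", "quantity": "1"}
                for field in line.get("LineItemExpenseFields", []):
                    field_type = field.get("Type", {}).get("Text")
                    value = field.get("ValueDetection", {}).get("Text")
                    if field_type in LINE_ITEM_FIELDS and value:
                        item[LINE_ITEM_FIELDS[field_type]] = value
                receipt["items"].append(item)
    return receipt


# The response itself would come from a call such as:
#   textract.analyze_expense(
#       Document={"S3Object": {"Bucket": bucket, "Name": key}})
```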

DynamoDB Storage Function

What it does:

  • Connects to the DynamoDB table

  • Formats the receipt data for database storage

  • Adds a processing timestamp for tracking

  • Stores all receipt information as a single item

  • Includes the S3 path to link back to the original document

Key insights:

  • The structured format makes it easy to query receipts later

  • The timestamp records when processing occurred

  • The S3 path allows you to access the original receipt if needed
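A storage step along these lines could look like the sketch below. It assumes `table` behaves like `boto3.resource("dynamodb").Table("Receipts")` and that the receipt dict has `receipt_id`, `date`, `vendor`, `total`, and `items` keys; apart from `receipt_id`, `date`, and `receipt_url`, the attribute names are assumptions.

```python
from datetime import datetime, timezone


def build_receipt_item(receipt, bucket, key):
    """Format the extracted receipt as a single DynamoDB item."""
    return {
        "receipt_id": receipt["receipt_id"],  # partition key
        "date": receipt["date"],              # sort key
        "vendor": receipt["vendor"],
        # Amounts are kept as strings here; DynamoDB rejects Python floats.
        "total": receipt["total"],
        "items": receipt["items"],
        # Link back to the original document in S3.
        "receipt_url": f"s3://{bucket}/{key}",
        # Records when processing happened (UTC, ISO 8601).
        "processed_timestamp": datetime.now(timezone.utc).isoformat(),
    }


def store_receipt(table, receipt, bucket, key):
    """Write the receipt to the table and return the stored item."""
    item = build_receipt_item(receipt, bucket, key)
    table.put_item(Item=item)
    return item
```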

Email Notification Function

What it does:

  • Creates a formatted HTML email with receipt details

  • Includes a list of all extracted line items

  • Uses Amazon SES to send the email

  • Creates a descriptive subject line with vendor and total

  • Includes the S3 path to access the original document

Key insights:

  • HTML formatting makes the email easy to read

  • Including line items provides complete information

  • The receipt ID and S3 path enable tracking and retrieval
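As a rough illustration, the notification step might be written as follows. The subject format matches the one shown in the testing section of this document; the HTML layout, the receipt dict keys, and the function names are assumptions, and `ses_client` is expected to behave like `boto3.client("ses")`.

```python
from html import escape


def build_email(receipt, bucket, key):
    """Return (subject, html_body) for a processed receipt."""
    subject = f"Receipt Processed: {receipt['vendor']} - ${receipt['total']}"
    # Escape extracted text so stray "<" or "&" cannot break the markup.
    rows = "".join(
        f"<li>{escape(item['name'])} x {escape(item['quantity'])}: "
        f"${escape(item['price'])}</li>"
        for item in receipt["items"]
    )
    html_body = (
        "<html><body>"
        f"<h2>{escape(receipt['vendor'])}</h2>"
        f"<p>Receipt ID: {escape(receipt['receipt_id'])}</p>"
        f"<p>Date: {escape(receipt['date'])}</p>"
        f"<p>Total: ${escape(receipt['total'])}</p>"
        f"<ul>{rows}</ul>"
        f"<p>Original: s3://{escape(bucket)}/{escape(key)}</p>"
        "</body></html>"
    )
    return subject, html_body


def send_receipt_email(ses_client, sender, recipient, receipt, bucket, key):
    """Send the HTML notification through SES."""
    subject, html_body = build_email(receipt, bucket, key)
    return ses_client.send_email(
        Source=sender,
        Destination={"ToAddresses": [recipient]},
        Message={
            "Subject": {"Data": subject},
            "Body": {"Html": {"Data": html_body}},
        },
    )
```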

Error Handling and Logging

Throughout the code, you'll notice:

  • Try-except blocks that catch and log errors

  • Detailed log messages at each processing step

  • Default values to handle missing data gracefully

  • Continuation of execution when non-critical parts fail (e.g., email)

This robust error handling ensures the system can process imperfect receipts and recover from temporary issues without manual intervention.
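One way to express the "continue when a non-critical step fails" pattern is a small wrapper like the sketch below. The helper `run_step` is hypothetical and only illustrates the idea; the tutorial's code may structure its try-except blocks differently.

```python
import logging

logger = logging.getLogger(__name__)


def run_step(name, func, *args, critical=True):
    """Run one processing step, logging success or failure.

    Critical failures are re-raised so the invocation fails visibly;
    non-critical failures (for example, the email) are logged and skipped.
    """
    try:
        result = func(*args)
        logger.info("Step '%s' succeeded", name)
        return result
    except Exception:
        # logger.exception records the full traceback in CloudWatch.
        logger.exception("Step '%s' failed", name)
        if critical:
            raise
        return None
```

For example, the storage step would run with the default `critical=True`, while the email step would pass `critical=False` so a mail failure does not discard data that was already saved.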

Environment Variables

The code uses these environment variables for configuration:

  • DYNAMODB_TABLE: The name of the DynamoDB table to store receipts

  • SES_SENDER_EMAIL: The verified email address to send notifications from

  • SES_RECIPIENT_EMAIL: The email address to receive notifications

Using environment variables allows you to change these settings without modifying the code.
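A minimal sketch of reading these settings inside the function is shown below. The helper name `load_config` and the fallback table name are assumptions; the variable names are the ones listed above.

```python
import os


def load_config(env=None):
    """Read the function's settings from environment variables."""
    env = os.environ if env is None else env
    return {
        # Assumption: fall back to the table name used in this tutorial.
        "table_name": env.get("DYNAMODB_TABLE", "Receipts"),
        # No fallback for the addresses: a missing value raises KeyError,
        # which surfaces a misconfiguration on the first invocation.
        "sender": env["SES_SENDER_EMAIL"],
        "recipient": env["SES_RECIPIENT_EMAIL"],
    }
```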

3.5 Integration and Testing

Steps to be Performed 👩‍💻

  1. Set Up S3 Event Notification Trigger.

  2. Test the Project Execution.

  3. Project Improvement Ideas.

#1 Set Up S3 Event Notification Trigger

For our receipt processing system to operate automatically, we need to establish a trigger that will invoke our Lambda function whenever a new receipt is uploaded.

Amazon S3 offers event notifications that can detect new file uploads and trigger specific actions. This integration creates the vital connection between file upload (input) and processing (execution).

  • Navigate to S3 in the AWS Console

  • Access Your Bucket

    • From the list of buckets, select the receipt storage bucket you created earlier

    • This opens the bucket management interface

  • Access Properties Settings

    • Navigate to the "Properties" tab at the top of the bucket management interface

    • This tab contains various bucket configurations

  • Create Event Notification

    • Scroll down to the "Event Notifications" section

    • Click "Create event notification" to begin configuration

    • This opens the event notification creation form

  • Configure Event Details

    • Name: "ReceiptUploadEvent"

    • Prefix (optional): "incoming/" (if using folders)

    • Suffix (optional): Leave blank, or enter a single extension such as ".pdf"

      • An event notification accepts only one suffix value, not a comma-separated list; to filter on several extensions (.pdf, .jpg, .jpeg, .png), create one notification per extension

    • Event types: Check "All object create events"

      • This includes put, post, copy, and multipart upload completions

      • Keep the rest of the boxes unchecked

      • Ensures any method of adding files triggers the process

    • Destination: Select "Lambda Function"

    • Lambda function: Select "ReceiptProcessor"

      • This connects the S3 event to your processing function

  • Save the Configuration

    • Review all settings to ensure they match your requirements

    • Click "Save" to activate the event notification

    • AWS will validate and create the notification configuration

The S3 event notification creates the automated workflow for receipt processing:

  1. When a receipt image is uploaded to S3, it generates an event

  2. The event notification system detects this upload event

  3. If the file matches the configured prefix and suffix filters, S3 invokes the Lambda function

  4. The Lambda function receives event data including the bucket name and file key

  5. Processing begins automatically without any manual intervention
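The trigger configured above corresponds roughly to the following sketch. It assumes clients that behave like `boto3.client("s3")` and `boto3.client("lambda")`; the function names and the statement ID are illustrative. When the trigger is created in the console, the invoke permission is added for you, but a scripted setup has to grant it explicitly.

```python
def receipt_notification_config(lambda_arn, prefix="incoming/", suffix=None):
    """Build the bucket notification configuration for the trigger."""
    rules = [{"Name": "prefix", "Value": prefix}]
    if suffix:
        # Only a single suffix value is allowed per notification.
        rules.append({"Name": "suffix", "Value": suffix})
    return {
        "LambdaFunctionConfigurations": [
            {
                "Id": "ReceiptUploadEvent",
                "LambdaFunctionArn": lambda_arn,
                "Events": ["s3:ObjectCreated:*"],  # all object create events
                "Filter": {"Key": {"FilterRules": rules}},
            }
        ]
    }


def attach_trigger(s3_client, lambda_client, bucket, lambda_arn):
    """Allow S3 to invoke the function, then register the notification."""
    lambda_client.add_permission(
        FunctionName=lambda_arn,
        StatementId="AllowS3Invoke",  # hypothetical statement ID
        Action="lambda:InvokeFunction",
        Principal="s3.amazonaws.com",
        SourceArn=f"arn:aws:s3:::{bucket}",
    )
    s3_client.put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=receipt_notification_config(lambda_arn),
    )
```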

#2 Test the Project Execution

This section explains how to verify that your receipt processing system works correctly end-to-end.

Step 1: Upload a Test Receipt

  • Navigate to your S3 bucket in the AWS Console

  • Upload a sample receipt from the provided Receipts folder (into the "incoming" folder if you created one)

  • Wait 10-15 seconds for processing to complete

Step 2: Monitor the Lambda Execution

  • Go to Lambda > Functions > ReceiptProcessor

  • Select the "Monitor" tab

  • Check that your function was recently invoked

📝Note: It may take a few minutes for the Lambda execution to appear in the monitoring dashboard. If you don't see your execution immediately, wait 2-3 minutes and refresh the page.

Important: Uploading a receipt in Step 1 is mandatory to trigger the Lambda function execution.

  • Click "View logs in CloudWatch" for detailed execution logs

What to Look For:

  • Successful function invocation

  • Log entries showing processing steps

  • Any error messages that need attention

You may encounter this error: "Log group '/aws/lambda/ReceiptProcessor' does not exist for account ID '<your-account-id>'"

This occurs because the CloudWatch log group isn't created until your Lambda function runs for the first time. To resolve:

  1. Upload a receipt to trigger your Lambda function

  2. Wait 2-3 minutes

  3. Refresh the CloudWatch Logs page

Step 3: Verify Data in DynamoDB

  • Go to DynamoDB > Tables > Receipts

  • Select the "Items" tab

  • Look for your recently processed receipt

Check:

  • Receipt data is stored correctly

  • Key fields (vendor, date, total amount) are accurate

  • Line items were extracted properly
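The DynamoDB check can also be scripted for repeated test runs. The sketch below polls the table until an item pointing at the uploaded file appears. It assumes `table` behaves like `boto3.resource("dynamodb").Table("Receipts")` and that the function stores the file location in the `receipt_url` attribute in the form `s3://bucket/key`, which is an assumption about the stored format.

```python
import time


def wait_for_receipt(table, receipt_url, attempts=6, delay=5.0, sleep=time.sleep):
    """Poll the table until an item for the uploaded file shows up.

    Returns the matching item, or None if it never appears.
    """
    for _ in range(attempts):
        # A plain scan is acceptable for a small test table; it reads
        # every item, so avoid it on large production tables.
        for item in table.scan().get("Items", []):
            if item.get("receipt_url") == receipt_url:
                return item
        sleep(delay)
    return None


# Example usage (needs AWS credentials; names are placeholders):
#   import boto3
#   table = boto3.resource("dynamodb").Table("Receipts")
#   item = wait_for_receipt(
#       table, "s3://automated-receipts-yourusername/incoming/test.pdf")
#   print(item)
```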

Step 4: Check Email Notifications

  • Open your configured recipient email account

  • Find the email with subject "Receipt Processed: [vendor] - $[total]"

  • Verify the receipt details in the email body

Troubleshooting Tips

  • No Lambda trigger? Check S3 event configuration

  • Processing errors? Review CloudWatch logs

  • Missing data? Examine text extraction results

Next Steps

After successful testing, try processing multiple receipts with different formats to ensure robustness before deploying the system for regular use.

Sample Output

Below are screenshots of email notifications received after successful receipt processing, demonstrating the expected output format:

Clean Up🗑️

  • Empty and Delete the S3 Bucket:

    • In the S3 Console, select your receipt bucket, click "Empty" to remove all objects, and then delete the bucket. A bucket must be empty before it can be deleted.

  • Delete the Lambda Function:

    • In the Lambda Console, delete the "ReceiptProcessor" function.

  • Delete the DynamoDB Table:

    • In the DynamoDB Console, delete the "Receipts" table.

  • Delete the SES Identities:

    • In the SES Console, go to Configuration > Identities and delete the email identities you verified for this project.

  • Delete the IAM Role:

    • In the IAM Console, delete the "ReceiptProcessingLambdaRole" role.

  • Delete the CloudWatch Log Group:

    • In the CloudWatch Console, delete the "/aws/lambda/ReceiptProcessor" log group.