Automated AWS Receipt Processing System

3.1 Overview of Project

This project automates receipt processing using AWS services. Handling receipts manually is time-consuming, error-prone, and difficult to scale, so this system extracts structured data from receipts and stores it for record-keeping and auditing.

The architecture consists of:

Storage Layer: Amazon S3 stores receipt images and PDFs.
Processing Layer: Amazon Textract extracts text from receipts using AI-powered OCR.
Database Layer: DynamoDB stores the extracted data in a structured format.
Notification System: Amazon SES sends email alerts with receipt details.
Compute Layer: AWS Lambda automates the workflow by processing the receipts in real-time.

3.2 Storage and Database Setup: S3 bucket and DynamoDB Table

Steps to be Performed 👩‍💻

Setup S3 Buckets for Receipt Storage and Archiving.
Create a DynamoDB Table to Store Extracted Receipt Data.

#1 Set Up S3 Buckets for Receipt Storage and Archiving

The receipt processing system needs a central, secure location to store the original receipt files and keep them organized by processing status.

Amazon S3 (Simple Storage Service) provides the ideal solution with its durability, availability, and security features. This bucket will serve as both the entry point for new receipts and the archive for processed documents.

Steps

Sign in to the AWS Management Console
Navigate to the Amazon S3 service

Click "Create bucket"
- This will open the bucket creation wizard
- The bucket will store all your receipt files
- Select "General purpose" as the bucket type

Enter a unique bucket name (e.g., "automated-receipts-yourusername")
- Note: S3 bucket names must be globally unique across all AWS accounts
- Best practice: Include your username or organization name to ensure uniqueness

❗Important: Keep all default settings for the bucket configuration

Do not change any of the default options for:
- Object Ownership
- Block Public Access settings
- Bucket Versioning
- Tags
- Default encryption
- Advanced settings
Click "Create bucket"
- AWS will validate your settings and create the bucket
- You'll be redirected to the S3 buckets list where your new bucket should appear

Create an organizational folder (Recommended)
- Navigate into your newly created bucket
- Click "Create folder" and name it "incoming" for new receipt uploads
  - This folder helps organize receipts by processing status

The S3 bucket serves as both the starting point and archive for the receipt processing workflow:

Users or systems upload receipt images/PDFs to the "incoming" folder
These uploads automatically trigger the Lambda function
The system maintains the original documents for audit purposes while extracting their data
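
The upload step can also be scripted. The following is a minimal sketch, assuming a boto3 S3 client is passed in and that the bucket and "incoming" folder from the steps above already exist. The helper names `incoming_key` and `upload_receipt` are hypothetical and not part of the project code.

```python
from pathlib import Path

INCOMING_PREFIX = "incoming/"  # folder created in the step above


def incoming_key(local_path: str) -> str:
    """Return the S3 object key that places a local file in the incoming folder."""
    return INCOMING_PREFIX + Path(local_path).name


def upload_receipt(s3_client, bucket: str, local_path: str) -> str:
    """Upload one receipt and return the key it was stored under.

    s3_client is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    key = incoming_key(local_path)
    s3_client.upload_file(local_path, bucket, key)
    return key
```

For example, `upload_receipt(boto3.client("s3"), "automated-receipts-yourusername", "lunch.pdf")` would store the file under `incoming/lunch.pdf`.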

#2 Create a DynamoDB Table to Store Extracted Receipt Data

After extracting structured data from receipts, the system needs a scalable, high-performance database to store this information.

DynamoDB is an ideal choice for this application because it provides consistent single-digit millisecond response times at any scale, without the need to manage database servers or worry about capacity planning.

Steps

Navigate to DynamoDB in the AWS Console

Click "Create table"
- This opens the table creation interface
- The table will store all extracted receipt data in a structured format

Configure the table basics
- Table name: Enter "Receipts"
- Partition key: Enter "receipt_id" and select "String" type
  - This will be a unique identifier for each receipt
  - Generated automatically by the processing Lambda function
- Sort key: Enter "date" and select "String" type
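  - This stores the receipt date as part of the key so items can be sorted and filtered by date
  - Because a DynamoDB query must specify the partition key, looking up receipts by date across the whole table requires a scan or a secondary index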
  - Format will be "YYYY-MM-DD" for consistent sorting

Review and create
- Leave default settings for the rest of the options
- Click "Create table"
- AWS will provision your table (this typically takes less than a minute)
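
For reference, the same table can be described in code. This is a sketch rather than the tutorial's method: it assumes a boto3 DynamoDB client and on-demand billing (`PAY_PER_REQUEST`), which may differ from the console defaults in your account. The function names are hypothetical.

```python
def receipts_table_definition(table_name: str = "Receipts") -> dict:
    """Build create_table arguments matching the key schema configured above."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "receipt_id", "AttributeType": "S"},
            {"AttributeName": "date", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "receipt_id", "KeyType": "HASH"},  # partition key
            {"AttributeName": "date", "KeyType": "RANGE"},       # sort key
        ],
        "BillingMode": "PAY_PER_REQUEST",  # assumption: on-demand capacity
    }


def create_receipts_table(dynamodb_client) -> dict:
    """Create the table; dynamodb_client is assumed to be boto3.client("dynamodb")."""
    return dynamodb_client.create_table(**receipts_table_definition())
```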

DynamoDB Table Setup

After creating your "Receipts" table in DynamoDB:

The table will initially show "No items" as seen in the screenshot
Items will only appear after:

Receipt files are uploaded to the S3 "incoming" folder
The Lambda function processes these uploads
Extracted data is stored in this DynamoDB table

This empty state is normal and indicates the table is ready to receive data once the workflow begins.

Data Model for Receipt Information

The DynamoDB table will store the following attributes for each receipt:

The DynamoDB table serves as the structured data repository for all processed receipts:

The Lambda function extracts data from receipts using Amazon Textract
The extracted data is formatted according to the data model
The Lambda function writes this data to the DynamoDB table
Applications and reports can query the table for receipt information
The original receipt images remain in S3, linked via the receipt_url attribute
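
As an illustration of how a report might read receipts for a date range, here is a sketch that builds the arguments for a DynamoDB scan with a filter. It assumes the low-level boto3 client and the "YYYY-MM-DD" date format described earlier. The function name is hypothetical. A placeholder (`#d`) stands in for the attribute name because `date` is on DynamoDB's reserved-word list.

```python
def date_range_scan_params(start: str, end: str, table_name: str = "Receipts") -> dict:
    """Build scan arguments selecting receipts whose date lies in [start, end]."""
    return {
        "TableName": table_name,
        "FilterExpression": "#d BETWEEN :start AND :end",
        "ExpressionAttributeNames": {"#d": "date"},
        "ExpressionAttributeValues": {
            ":start": {"S": start},
            ":end": {"S": end},
        },
    }
```

The result would be passed as `dynamodb_client.scan(**params)`. A scan reads the whole table and may return results in pages, so this approach suits small tables; a secondary index keyed on date would be the usual alternative for larger ones.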

3.3 Notification Setup: Configuring Amazon SES

Steps to be Performed 👩‍💻
1. Setting up SES for Email Notifications.
2. Verifying the Email Address.
3. Configuring SES for Receipt Notifications.

#1 Setting up SES for Email Notifications

After receipts are processed, users need to be notified about the extracted information and any actions required.

Amazon Simple Email Service (SES) provides a reliable, cost-effective email solution for sending these notifications. Setting up SES ensures that appropriate stakeholders are informed when new receipts are processed.

Navigate to Amazon SES in the AWS Console

Access Identity Management
- In the SES console, go to Configuration > Identities
- This section is where you'll verify email addresses or domains

Begin Identity Creation
- Click the Create identity button
- This starts the process of verifying an email address or domain for sending

Select Identity Type
- Select Email address option
- Note: For production systems, you might want to verify an entire domain instead

Configure Sender Email
- Enter your sender email address (the address you'll send notifications from)
- Best practice: Use a business email or a dedicated notification address
- Example: receipts-noreply@yourcompany.com

Complete Initial Setup
- Click Create identity
- AWS will process your request and prepare verification

#2 Verifying the Email Address

After creating the identity, open the inbox of the email address you entered and complete the verification steps below:

Verify Sender Email
- AWS will send a verification email to the address you provided
- This verification ensures you own the email address
- Check both inbox and spam/junk folders for this email

Complete Verification
- Go to the inbox of that email address and find the AWS verification email
- Click the verification link in that email

You'll be redirected to a confirmation page in AWS

#3 Configuring SES for Receipt Notifications

❗Important: Email Verification Options

Option 1: Use the same email for sender and recipient (simplest approach)
- Verify only one email address to use for both sending and receiving notifications
- This is recommended for tutorial and testing purposes

Option 2: Use different sender and recipient emails
- Repeat steps 1 and 2 for each recipient email address
- Remember: While in SES sandbox mode, both sender and recipient emails must be verified
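
To confirm that every address is verified before testing, the verification status can be read back from SES. The sketch below only parses the response of the SES `get_identity_verification_attributes` call; the helper name `unverified_addresses` is hypothetical.

```python
def unverified_addresses(response: dict, addresses: list[str]) -> list[str]:
    """Return the addresses whose SES verification status is not "Success".

    response is the dict returned by
    ses_client.get_identity_verification_attributes(Identities=addresses).
    """
    attributes = response.get("VerificationAttributes", {})
    return [
        address
        for address in addresses
        if attributes.get(address, {}).get("VerificationStatus") != "Success"
    ]
```

An empty list means both sender and recipient are ready to use in sandbox mode.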

The SES configuration enables the notification component of the receipt processing workflow:

After the Lambda function processes a receipt and stores data in DynamoDB
It uses SES to send an email notification with receipt details
The notification includes key information like vendor, date, amount, and a link to the original receipt
Users receive timely updates without needing to check the system manually

3.4 Processing Setup: Creating a Lambda function

Steps to be Performed 👩‍💻

Create an IAM role for AWS Lambda.
Create a Lambda Function.
Add the Lambda function Code.
Understand the Lambda function Code.

#1 Create an IAM role

Lambda functions need permission to access other AWS services.

Creating a dedicated IAM role follows security best practices by explicitly defining what actions your Lambda function can perform. To do this:

Navigate to IAM in the AWS Console

Initiate Role Creation
- Click "Roles" in the left navigation pane
- Click "Create role" button to start the process

Select Trusted Entity
- Select "AWS service" as the trusted entity type
- This specifies which AWS service can assume this role

Specify Service Use Case
- Choose "Lambda" as the service that will use this role
- This allows Lambda functions to assume this role at runtime

Configure Permissions
- Click "Next: Permissions" to continue
- On the Permissions policies page, you'll attach policies that define what the role can do

Attach Required Policies
In the search box, search for and select these policies:
- AmazonS3ReadOnlyAccess: Allows reading from S3 buckets
- AmazonTextractFullAccess: Enables document analysis with Textract
- AmazonDynamoDBFullAccess: Permits writing extracted data to DynamoDB
- AmazonSESFullAccess: Enables sending email notifications
- AWSLambdaBasicExecutionRole: Allows Lambda to write logs to CloudWatch

Review and Name Role
- Click "Next"
- Role name: "ReceiptProcessingLambdaRole"
- Description (optional): "Allows Lambda functions to process receipts using S3, Textract, DynamoDB, and SES"

Create the Role
- Review the role configuration
- Click "Create role" to finalize

Validate Role Creation
- Verify the role appears in the IAM roles list
- Click on the role name to view its details and ensure all policies are attached correctly

The IAM role connects all components of the receipt processing system:

When the Lambda function executes, it assumes this role
The role's permissions allow the function to:

- Read receipt files from the S3 bucket
- Send those files to Textract for analysis
- Write extracted data to DynamoDB
- Send notification emails via SES
- Write logs to CloudWatch for troubleshooting
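
The console steps above correspond roughly to a trust policy plus five managed-policy attachments. The following sketch shows that shape, assuming a boto3 IAM client is passed in; the function name `create_receipt_role` is hypothetical.

```python
import json

# Trust policy: lets the Lambda service assume the role at runtime.
LAMBDA_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# AWS managed policies selected in the console steps above.
MANAGED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
    "arn:aws:iam::aws:policy/AmazonTextractFullAccess",
    "arn:aws:iam::aws:policy/AmazonDynamoDBFullAccess",
    "arn:aws:iam::aws:policy/AmazonSESFullAccess",
    "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
]


def create_receipt_role(iam_client, role_name: str = "ReceiptProcessingLambdaRole") -> None:
    """Create the role and attach each policy; iam_client is assumed to be boto3.client("iam")."""
    iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(LAMBDA_TRUST_POLICY),
    )
    for arn in MANAGED_POLICY_ARNS:
        iam_client.attach_role_policy(RoleName=role_name, PolicyArn=arn)
```

The full-access policies keep the tutorial simple; a production role would normally be narrowed to the specific bucket, table, and actions the function uses.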

#2 Create a Lambda Function

The Lambda function serves as the central processing engine for our receipt processing system.

This serverless function is triggered automatically when a new receipt is uploaded to S3, coordinates the extraction of data using Textract, stores the structured information in DynamoDB, and sends notifications via SES.

Navigate to Lambda in the AWS Console

Begin Function Creation
- Click "Create function" to start the process
- This opens the function creation wizard

Select Creation Method
- Select "Author from scratch"
- This allows you to create a custom function from the beginning

Configure Basic Settings
- Function name: Enter "ReceiptProcessor"
- Runtime: Select "Python 3.9" from the dropdown
- Architecture: Leave as default (x86_64)

Set Permissions
- Under "Permissions" expand "Change default execution role"
- Select "Use an existing role"
- From the dropdown, choose "ReceiptProcessingLambdaRole"

Create the Function
- Review your settings
- Click "Create function"
- AWS will provision your Lambda function (this takes just a few seconds)

Adjust Function Timeout
- Navigate to the "Configuration" tab
- Select "General configuration"
- Click "Edit"

- Change the Timeout from the default 3 seconds to 3 minutes (180 seconds)
  - This extended timeout is necessary because Textract processing can take time for complex receipts
- Click "Save"

Configure Environment Variables
- Still in the "Configuration" tab, select "Environment variables"
- Click "Edit"

Add the following key-value pairs:
- Key: DYNAMODB_TABLE, Value: Receipts
- Key: SES_SENDER_EMAIL, Value: your-verified-email@example.com (use the email you verified in SES)
- Key: SES_RECIPIENT_EMAIL, Value: recipient-email@example.com (use the recipient email you verified)
- Note: For detailed information about these environment variables and their usage in the Lambda function, please refer to the Lambda Environment Variables table given below.
Click Save.
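
The timeout and environment variables set in the console can also be expressed as arguments to the Lambda `update_function_configuration` call. This is a sketch under the names used in this guide; the helper `function_settings` is hypothetical.

```python
def function_settings(sender: str, recipient: str, table: str = "Receipts") -> dict:
    """Build update_function_configuration arguments for the ReceiptProcessor function."""
    return {
        "FunctionName": "ReceiptProcessor",
        "Timeout": 180,  # seconds, i.e. 3 minutes
        "Environment": {
            "Variables": {
                "DYNAMODB_TABLE": table,
                "SES_SENDER_EMAIL": sender,
                "SES_RECIPIENT_EMAIL": recipient,
            }
        },
    }
```

With a boto3 Lambda client this would be applied as `lambda_client.update_function_configuration(**function_settings(sender, recipient))`.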

Lambda Environment Variables:

Environment variables provide configuration without code changes:

The Lambda function forms the core of the receipt processing workflow:

When a new receipt is uploaded to S3, it triggers the Lambda function
Lambda retrieves the receipt file from S3
Lambda calls Amazon Textract to extract data from the receipt
Lambda processes and structures the extracted data
Lambda writes the structured data to DynamoDB
Lambda sends a notification email via SES
Lambda moves the original receipt to the "processed" folder (this requires S3 write permission, which AmazonS3ReadOnlyAccess alone does not provide)

#3 Add the Lambda function Code

Access the Code Editor
- In the Lambda console, navigate to your "ReceiptProcessor" function
- Scroll down to the code source section where you'll see the code editor
- The default Lambda function contains a simple "Hello World" example

Replace the default code with the provided Python code
- Delete the existing code in the editor
- Copy and paste the following Python code:

Deploy the Function
- After pasting the code, click the "Deploy" button
- This saves your code and makes it available for execution
- You should see a confirmation message once deployment is complete

This Lambda function code ties together all the components we've set up:

It's triggered automatically when a new file appears in the S3 bucket
It uses the IAM role we created to access other AWS services
It processes the receipt image using Textract
It stores the structured data in our DynamoDB table
It sends notifications via our configured SES email

The code is designed to be resilient, with comprehensive error handling and logging to help troubleshoot any issues that might arise during processing.

#4 Understand the Lambda function Code

This section breaks down the Lambda function code that powers the AWS Receipt Processing System, explaining each major component and its purpose.

Code Structure Overview

The Lambda function is organized into four main components:

Lambda Handler: Entry point that coordinates the entire workflow
Textract Processing: Extracts structured data from receipt images
DynamoDB Storage: Saves the processed data to the database
Email Notification: Sends formatted results via email

Lambda Handler Function

What it does:

Acts as the orchestrator for the entire process
Extracts information about which file was uploaded from the S3 event
URL-decodes the file path to handle special characters
Verifies the file exists before attempting processing
Calls the specialized functions for each step of the process
Handles errors gracefully with detailed logging
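
The event-parsing part of the handler can be sketched as follows. This is an illustration, not the project's actual code: it reads the bucket and key from each record of an S3 event and URL-decodes the key, since S3 encodes spaces and special characters in event payloads. The function name `uploaded_objects` is hypothetical.

```python
from urllib.parse import unquote_plus


def uploaded_objects(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs for every record in an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded, e.g. "my+receipt%281%29.pdf".
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects
```

An existence check before processing could then be made with the S3 client's `head_object` call for each pair.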

Textract Processing Function

What it does:

Uses Amazon Textract's specialized analyze_expense API
Creates a unique ID for the receipt
Sets up default values for all expected fields
Extracts summary information (vendor, date, total amount)
Extracts individual line items with their quantities and prices
Returns structured data in a consistent format

Key insights:

Textract understands the structure of receipts, not just the text
The function handles missing data gracefully with default values
The unique ID ensures each receipt can be tracked independently
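
To make the extraction step concrete, here is a sketch that turns an `analyze_expense` response into a flat dictionary with defaults. The response keys (`ExpenseDocuments`, `SummaryFields`, `LineItemGroups`, `LineItemExpenseFields`) follow Textract's response format; the output field names, the default values, and the function name `parse_expense` are assumptions made for illustration.

```python
import uuid

# Textract field types mapped to the (assumed) output field names.
SUMMARY_FIELDS = {"VENDOR_NAME": "vendor", "INVOICE_RECEIPT_DATE": "date", "TOTAL": "total"}
LINE_FIELDS = {"ITEM": "name", "QUANTITY": "quantity", "PRICE": "price"}


def parse_expense(response: dict) -> dict:
    """Extract vendor, date, total and line items from an analyze_expense response."""
    receipt = {
        "receipt_id": str(uuid.uuid4()),  # unique ID for tracking
        "vendor": "Unknown",
        "date": "Unknown",
        "total": "0.00",
        "items": [],
    }
    for document in response.get("ExpenseDocuments", []):
        for field in document.get("SummaryFields", []):
            name = SUMMARY_FIELDS.get(field.get("Type", {}).get("Text"))
            value = field.get("ValueDetection", {}).get("Text")
            if name and value:
                receipt[name] = value
        for group in document.get("LineItemGroups", []):
            for line in group.get("LineItems", []):
                item = {"name": "Unknown", "quantity": "1", "price": "0.00"}
                for field in line.get("LineItemExpenseFields", []):
                    name = LINE_FIELDS.get(field.get("Type", {}).get("Text"))
                    value = field.get("ValueDetection", {}).get("Text")
                    if name and value:
                        item[name] = value
                receipt["items"].append(item)
    return receipt
```

The response itself would come from `textract_client.analyze_expense(Document={"S3Object": {"Bucket": bucket, "Name": key}})`. Textract returns the date as printed on the receipt, so converting it to "YYYY-MM-DD" would be a separate step.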

DynamoDB Storage Function

What it does:

Connects to the DynamoDB table
Formats the receipt data for database storage
Adds a processing timestamp for tracking
Stores all receipt information as a single item
Includes the S3 path to link back to the original document

Key insights:

The structured format makes it easy to query receipts later
The timestamp records when processing occurred
The S3 path allows you to access the original receipt if needed
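
A sketch of the storage step is shown below. It assumes a receipt dictionary with `receipt_id`, `date`, `vendor`, `total`, and `items` keys (illustrative names), stores the S3 location under `receipt_url` in `s3://bucket/key` form, and adds a UTC timestamp. The function name and the `processed_timestamp` attribute are hypothetical.

```python
from datetime import datetime, timezone


def build_item(receipt: dict, bucket: str, key: str) -> dict:
    """Format one receipt as a single DynamoDB item."""
    return {
        "receipt_id": receipt["receipt_id"],  # partition key
        "date": receipt["date"],              # sort key
        "vendor": receipt["vendor"],
        "total": receipt["total"],
        "items": receipt["items"],
        "receipt_url": f"s3://{bucket}/{key}",  # link back to the original file
        "processed_timestamp": datetime.now(timezone.utc).isoformat(),
    }
```

With the boto3 resource API, the item could be written using `boto3.resource("dynamodb").Table(table_name).put_item(Item=item)`.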

Email Notification Function

What it does:

Creates a formatted HTML email with receipt details
Includes a list of all extracted line items
Uses Amazon SES to send the email
Creates a descriptive subject line with vendor and total
Includes the S3 path to access the original document

Key insights:

HTML formatting makes the email easy to read
Including line items provides complete information
The receipt ID and S3 path enable tracking and retrieval
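
The notification step can be sketched as a function that builds the arguments for the SES `send_email` call. The HTML layout, the receipt dictionary keys, and the function name `build_email` are assumptions for illustration; the subject line follows the "Receipt Processed: [vendor] - $[total]" pattern used in this guide.

```python
from html import escape


def build_email(receipt: dict, sender: str, recipient: str, receipt_url: str) -> dict:
    """Build send_email arguments for one processed receipt."""
    rows = "".join(
        "<li>{} x {}: ${}</li>".format(
            escape(str(item["name"])),
            escape(str(item["quantity"])),
            escape(str(item["price"])),
        )
        for item in receipt["items"]
    )
    html = (
        "<h2>Receipt {}</h2>".format(escape(receipt["receipt_id"]))
        + "<p>Vendor: {}<br>Date: {}<br>Total: ${}</p>".format(
            escape(receipt["vendor"]), escape(receipt["date"]), escape(receipt["total"])
        )
        + "<ul>{}</ul>".format(rows)
        + "<p>Original: {}</p>".format(escape(receipt_url))
    )
    subject = "Receipt Processed: {} - ${}".format(receipt["vendor"], receipt["total"])
    return {
        "Source": sender,
        "Destination": {"ToAddresses": [recipient]},
        "Message": {
            "Subject": {"Data": subject},
            "Body": {"Html": {"Data": html}},
        },
    }
```

The result would be sent with `ses_client.send_email(**params)`. Escaping the extracted text keeps characters such as "&" or "<" in vendor names from breaking the HTML.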

Error Handling and Logging

Throughout the code, you'll notice:

Try-except blocks that catch and log errors
Detailed log messages at each processing step
Default values to handle missing data gracefully
Continuation of execution when non-critical parts fail (e.g., email)

This robust error handling ensures the system can process imperfect receipts and recover from temporary issues without manual intervention.
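
The pattern of treating extraction and storage as critical while letting the email fail quietly can be sketched like this. The three steps are passed in as callables so the example stays self-contained; the function name, return shape, and log messages are assumptions, not the project's actual code.

```python
import logging

logger = logging.getLogger(__name__)


def process_receipt(extract, store, notify) -> dict:
    """Run the three steps; only extraction and storage failures abort processing."""
    try:
        receipt = extract()
        store(receipt)
    except Exception:
        # Critical failure: log the traceback and report an error result.
        logger.exception("Receipt processing failed")
        return {"statusCode": 500, "body": "processing failed"}

    try:
        notify(receipt)
    except Exception:
        # Non-critical: the data is already stored, so log and carry on.
        logger.exception("Notification failed; continuing")

    return {"statusCode": 200, "body": "processed"}
```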

Environment Variables

The code uses these environment variables for configuration:

DYNAMODB_TABLE: The name of the DynamoDB table to store receipts
SES_SENDER_EMAIL: The verified email address to send notifications from
SES_RECIPIENT_EMAIL: The email address to receive notifications

Using environment variables allows you to change these settings without modifying the code.
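
A minimal sketch of reading these settings at startup, failing early with a clear message if one is missing. The function name `load_config` is hypothetical.

```python
import os

REQUIRED_VARIABLES = ("DYNAMODB_TABLE", "SES_SENDER_EMAIL", "SES_RECIPIENT_EMAIL")


def load_config(env=None) -> dict:
    """Read the required settings from the environment (or a supplied mapping)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARIABLES if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARIABLES}
```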

3.5 Integration and Testing

Steps to be Performed 👩‍💻

Setup S3 Event Notification Trigger.
Test the Project Execution.
Project Improvement Ideas.

#1 Setup S3 Event Notification Trigger

For our receipt processing system to operate automatically, we need to establish a trigger that will invoke our Lambda function whenever a new receipt is uploaded.

Amazon S3 offers event notifications that can detect new file uploads and trigger specific actions. This integration creates the vital connection between file upload (input) and processing (execution).

Navigate to S3 in the AWS Console

Access Your Bucket
- From the list of buckets, select the receipt storage bucket you created earlier
- This opens the bucket management interface

Access Properties Settings
- Navigate to the "Properties" tab at the top of the bucket management interface
- This tab contains various bucket configurations

Create Event Notification
- Scroll down to the "Event Notifications" section
- Click "Create event notification" to begin configuration
- This opens the event notification creation form

Configure Event Details
- Name: "ReceiptUploadEvent"
- Prefix (optional): "incoming/" (if using folders)
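- Suffix (optional): Leave blank, or enter a single extension such as ".pdf" (S3 accepts only one suffix per event notification, so use a separate notification for each file type you want to filter on)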

- Event types: Check "All object create events"
  - This includes put, post, copy, and multipart upload completions
  - Ensures any method of adding files triggers the process
  - Keep the rest of the boxes unchecked

- Destination: Select "Lambda Function"
- Lambda function: Select "ReceiptProcessor"
  - This connects the S3 event to your processing function

Save the Configuration
- Review all settings to ensure they match your requirements
- Click "Save" to activate the event notification
- AWS will validate and create the notification configuration

The S3 event notification creates the automated workflow for receipt processing:

When a receipt image is uploaded to S3, it generates an event
The event notification system detects this upload event
If the file matches the configured prefix and suffix filters, S3 invokes the Lambda function
The Lambda function receives event data including the bucket name and file key
Processing begins automatically without any manual intervention
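
The console form above maps onto a notification configuration document. The sketch below builds one with the event name and prefix from this guide; the function name is hypothetical and the Lambda ARN must be supplied by the caller.

```python
def notification_configuration(lambda_arn: str, prefix: str = "incoming/") -> dict:
    """Build the S3 notification configuration that invokes Lambda on new objects."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "Id": "ReceiptUploadEvent",
                "LambdaFunctionArn": lambda_arn,
                "Events": ["s3:ObjectCreated:*"],  # all object create events
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}
                },
            }
        ]
    }
```

It would be applied with `s3_client.put_bucket_notification_configuration(Bucket=bucket, NotificationConfiguration=config)`. When configuring through the API rather than the console, the Lambda function also needs a resource-based permission allowing S3 to invoke it, which the console normally adds for you.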

#2 Test the Project Execution

This section explains how to verify that your receipt processing system works correctly end-to-end.

Step 1: Upload a Test Receipt

Navigate to your S3 bucket in the AWS Console

Upload a receipt from this folder: Receipts (to the "incoming" folder if you created one)

Wait 10-15 seconds for processing to complete

Step 2: Monitor the Lambda Execution

Go to Lambda > Functions > ReceiptProcessor
Select the "Monitor" tab

Check that your function was recently invoked

📝Note: It may take a few minutes for the Lambda execution to appear in the monitoring dashboard. If you don't see your execution immediately, wait 2-3 minutes and refresh the page.

❗Important: Uploading a receipt in Step 1 is mandatory to trigger the Lambda function execution.

Click "View logs in CloudWatch" for detailed execution logs

What to Look For:

Successful function invocation
Log entries showing processing steps
Any error messages that need attention

You may encounter this error: "Log group '/aws/lambda/ReceiptProcessor' does not exist for account ID '211125755223'"

This occurs because the CloudWatch log group isn't created until your Lambda function runs for the first time. To resolve:

Upload a receipt to trigger your Lambda function
Wait 2-3 minutes
Refresh the CloudWatch Logs page

Step 3: Verify Data in DynamoDB

Go to DynamoDB > Tables > Receipts
Select the "Items" tab

Look for your recently processed receipt

Check:

Receipt data is stored correctly
Key fields (vendor, date, total amount) are accurate

Line items were extracted properly

Step 4: Check Email Notifications

Open your configured recipient email account

Find the email with subject "Receipt Processed: [vendor] - $[total]"
Verify the receipt details in the email body

Troubleshooting Tips

No Lambda trigger? Check S3 event configuration
Processing errors? Review CloudWatch logs
Missing data? Examine text extraction results

Next Steps

After successful testing, try processing multiple receipts with different formats to ensure robustness before deploying the system for regular use.

Sample Output

Below are screenshots of email notifications received after successful receipt processing, demonstrating the expected output format:

Clean Up🗑️

Empty and Delete the S3 Bucket:
- In the S3 Console, empty the receipt bucket (including the "incoming" folder), then delete the bucket.
Delete the Lambda Function:
- In the Lambda Console, delete the "ReceiptProcessor" function.
Delete the DynamoDB Table:
- In the DynamoDB Console, delete the "Receipts" table.
Delete the SES Identities:
- In the SES Console, go to Configuration > Identities and delete the email identities verified for this project.
Delete the IAM Role:
- In the IAM Console, delete the "ReceiptProcessingLambdaRole" role.
Delete the CloudWatch Log Group:
- In the CloudWatch Console, delete the "/aws/lambda/ReceiptProcessor" log group.
