Automated AWS Receipt Processing System
3.1 Overview of Project
This project focuses on automating receipt processing using AWS services. Handling receipts manually is time-consuming, error-prone, and difficult to scale; instead, this system extracts structured data from receipts and stores it efficiently for record-keeping and auditing.
The architecture consists of:
Storage Layer: Amazon S3 stores receipt images and PDFs.
Processing Layer: Amazon Textract extracts text from receipts using AI-powered OCR.
Database Layer: DynamoDB stores the extracted data in a structured format.
Notification System: Amazon SES sends email alerts with receipt details.
Compute Layer: AWS Lambda automates the workflow by processing the receipts in real-time.
3.2 Storage and Database Setup: S3 bucket and DynamoDB Table
Steps to be Performed 👩💻
Set Up an S3 Bucket for Receipt Storage and Archiving.
Create a DynamoDB Table to Store Extracted Receipt Data.
#1 Set Up an S3 Bucket for Receipt Storage and Archiving
The receipt processing system needs a central, secure location to store the original receipt files and to organize them by processing status.
Amazon S3 (Simple Storage Service) provides the ideal solution with its durability, availability, and security features. This bucket will serve as both the entry point for new receipts and the archive for processed documents.
Steps
Sign in to the AWS Management Console
Navigate to the Amazon S3 service


Click "Create bucket"
This will open the bucket creation wizard
The bucket will store all your receipt files
Select “Bucket type” as General purpose


Enter a unique bucket name (e.g., "automated-receipts-yourusername")
Note: S3 bucket names must be globally unique across all AWS accounts
Best practice: Include your username or organization name to ensure uniqueness


❗Important: Keep all default settings for the bucket configuration
Do not change any of the default options for:
Object Ownership
Block Public Access settings
Bucket Versioning
Tags
Default encryption
Advanced settings
Click "Create bucket"
AWS will validate your settings and create the bucket
You'll be redirected to the S3 buckets list where your new bucket should appear
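If you prefer to script this step, the sketch below creates the same general-purpose bucket with boto3's S3 client. It is an illustration, not part of the original guide: the function name `create_receipt_bucket` is hypothetical, and the client is passed in so the helper can be tried without AWS credentials. It handles one S3 quirk: `us-east-1` must not be sent as a `LocationConstraint`, while every other region must be named explicitly.

```python
def create_receipt_bucket(s3_client, bucket_name, region):
    """Create a general-purpose S3 bucket, leaving all other settings at their defaults.

    Hypothetical helper; `s3_client` is expected to be boto3.client("s3").
    """
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is the default location and is rejected as a LocationConstraint;
    # every other region has to be passed explicitly.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return s3_client.create_bucket(**kwargs)
```

Typical use, assuming boto3 is installed and credentials are configured: `create_receipt_bucket(boto3.client("s3", region_name="eu-west-1"), "automated-receipts-yourusername", "eu-west-1")`. The console's "Create folder" button in the next step writes a zero-byte object whose key ends in "/", so `put_object(Bucket=..., Key="incoming/")` has the same effect from code.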


Create an organizational folder (Recommended)
Navigate into your newly created bucket
Click "Create folder" and name it "incoming" for new receipt uploads
This folder helps organize receipts by processing status


The S3 bucket serves as both the starting point and archive for the receipt processing workflow:
Users or systems upload receipt images/PDFs to the "incoming" folder
These uploads automatically trigger the Lambda function (once the S3 event notification from section 3.5 is configured)
The system maintains the original documents for audit purposes while extracting their data
#2 Create a DynamoDB Table to Store Extracted Receipt Data
After extracting structured data from receipts, the system needs a scalable, high-performance database to store this information.
DynamoDB is an ideal choice for this application because it provides consistent single-digit millisecond response times at any scale, without the need to manage database servers or worry about capacity planning.
Steps
Navigate to DynamoDB in the AWS Console

Click "Create table"
This opens the table creation interface
The table will store all extracted receipt data in a structured format


Configure the table basics
Table name: Enter "Receipts"
Partition key: Enter "receipt_id" and select "String" type
This will be a unique identifier for each receipt
Generated automatically by the processing Lambda function
Sort key: Enter "date" and select "String" type
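Note: because every receipt_id is unique, this sort key does not by itself let you query all receipts by date. A DynamoDB Query always requires the partition key, so searching across receipts by date or date range needs a Scan with a filter or a global secondary index on date
Use the "YYYY-MM-DD" format so that string ordering matches chronological order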
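Textract generally returns dates as they are printed on the receipt (for example "03/15/2024"), so something has to convert them to YYYY-MM-DD before they are used as the sort key. Key attributes also cannot be empty, so a fallback value is needed when no date is found. The sketch below is one way to do this under stated assumptions: the helper `normalize_receipt_date`, the `DATE_FORMATS` list (which assumes US month-first dates), and the `1970-01-01` fallback are all hypothetical choices, not part of the original code.

```python
from datetime import datetime

# Hypothetical list of printed date formats to try, in order.
# Month-first (US) formats are assumed to take priority over day-first ones.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%m/%d/%y", "%d.%m.%Y", "%b %d, %Y", "%d %b %Y")


def normalize_receipt_date(raw, default="1970-01-01"):
    """Convert a printed receipt date to YYYY-MM-DD, or return `default`.

    A non-empty default is required because `date` is part of the table key.
    """
    if raw:
        text = raw.strip()
        for fmt in DATE_FORMATS:
            try:
                return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
            except ValueError:
                continue  # not this format; try the next one
    return default
```

Unrecognized formats fall back to the default rather than failing, which keeps the write to DynamoDB valid; extend `DATE_FORMATS` for the receipts you actually process.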


Review and create
Leave default settings for the rest of the options
Click "Create table"
AWS will provision your table (this typically takes less than a minute)
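The same table can be created from code. This sketch uses boto3's low-level DynamoDB client with the key schema described above (`receipt_id` as the HASH key, `date` as the RANGE key, both strings). The function name is hypothetical, and on-demand billing (`PAY_PER_REQUEST`) is an assumption that may differ from the console's default capacity settings.

```python
def create_receipts_table(dynamodb_client, table_name="Receipts"):
    """Create the receipts table (hypothetical helper; expects boto3.client("dynamodb"))."""
    return dynamodb_client.create_table(
        TableName=table_name,
        KeySchema=[
            {"AttributeName": "receipt_id", "KeyType": "HASH"},  # partition key
            {"AttributeName": "date", "KeyType": "RANGE"},       # sort key
        ],
        # Only key attributes are declared; other attributes are schemaless.
        AttributeDefinitions=[
            {"AttributeName": "receipt_id", "AttributeType": "S"},
            {"AttributeName": "date", "AttributeType": "S"},
        ],
        BillingMode="PAY_PER_REQUEST",  # assumption: on-demand capacity
    )
```

Because provisioning is asynchronous, a script would typically wait afterwards with `client.get_waiter("table_exists").wait(TableName="Receipts")`.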





DynamoDB Table Setup
After creating your "Receipts" table in DynamoDB:
The table will initially show "No items" as seen in the screenshot
Items will only appear after:
Receipt files are uploaded to the S3 "incoming" folder
The Lambda function processes these uploads
Extracted data is stored in this DynamoDB table
This empty state is normal and indicates the table is ready to receive data once the workflow begins.


Data Model for Receipt Information
The DynamoDB table will store the following attributes for each receipt:


The DynamoDB table serves as the structured data repository for all processed receipts:
The Lambda function extracts data from receipts using Amazon Textract
The extracted data is formatted according to the data model
The Lambda function writes this data to the DynamoDB table
Applications and reports can query the table for receipt information
The original receipt images remain in S3, linked via the receipt_url attribute
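As one example of how a report might query this table, the sketch below finds receipts in a date range. Since each `receipt_id` is unique, a date-range search across receipts cannot be a Query on the key; this version uses a Scan with a filter, which reads the whole table and is only reasonable for small tables. The function name is hypothetical, and the items come back in DynamoDB's low-level format (for example `{"S": "2024-03-15"}`).

```python
def receipts_between(dynamodb_client, start_date, end_date, table_name="Receipts"):
    """Scan for receipts whose `date` lies between start_date and end_date (inclusive).

    Dates are YYYY-MM-DD strings, so string comparison matches date order.
    """
    scan_kwargs = {
        "TableName": table_name,
        # "date" is a DynamoDB reserved word, so it is referenced via a placeholder.
        "FilterExpression": "#d BETWEEN :start AND :end",
        "ExpressionAttributeNames": {"#d": "date"},
        "ExpressionAttributeValues": {":start": {"S": start_date}, ":end": {"S": end_date}},
    }
    items = []
    while True:
        page = dynamodb_client.scan(**scan_kwargs)
        items.extend(page.get("Items", []))
        # Each Scan call returns at most 1 MB; LastEvaluatedKey marks where to resume.
        if "LastEvaluatedKey" not in page:
            return items
        scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
```

For larger tables, a global secondary index on `date` (or on a coarser attribute such as the month) would let reports use Query instead of scanning every item.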
3.3 Notification Setup: Configuring Amazon SES
Steps to be Performed 👩💻
Setting up SES for Email Notifications.
Verifying the Email Address.
Configuring SES for Receipt Notifications.
#1 Setting up SES for Email Notifications
After receipts are processed, users need to be notified about the extracted information and any actions required.
Amazon Simple Email Service (SES) provides a reliable, cost-effective email solution for sending these notifications. Setting up SES ensures that appropriate stakeholders are informed when new receipts are processed.
Navigate to Amazon SES in the AWS Console


Access Identity Management
In the SES console, go to Configuration > Identities
This section is where you'll verify email addresses or domains


Begin Identity Creation
Click the Create identity button
This starts the process of verifying an email address or domain for sending


Select Identity Type
Select Email address option
Note: For production systems, you might want to verify an entire domain instead


Configure Sender Email
Enter your sender email address (the address you'll send notifications from)
Best practice: Use a business email or a dedicated notification address
Example: receipts-noreply@yourcompany.com
Complete Initial Setup
Click Create identity
AWS will process your request and prepare verification


#2 Verifying the Email Address
After creating the identity, open the inbox of the email address you entered and complete the verification steps below:
Verify Sender Email
AWS will send a verification email to the address you provided
This verification ensures you own the email address
Check both inbox and spam/junk folders for this email


Complete Verification
Go to the inbox of that email address and find the AWS verification email
Click the verification link in that email


You'll be redirected to a confirmation page in AWS


#3 Configuring SES for Receipt Notifications
❗Important: Email Verification Options
Option 1: Use the same email for sender and recipient (simplest approach)
Verify only one email address to use for both sending and receiving notifications
This is recommended for tutorial and testing purposes
Option 2: Use different sender and recipient emails
Repeat steps 1 and 2 for each recipient email address
Remember: While in SES sandbox mode, both sender and recipient emails must be verified
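The verification request and the sandbox check can also be done from code. This sketch uses two SES (v1) API calls available in boto3, `verify_email_identity` and `get_identity_verification_attributes`; the helper names are hypothetical, and the client is passed in so the logic can be tried without AWS.

```python
def request_email_verification(ses_client, email_address):
    """Ask SES to email a verification link to `email_address` (expects boto3.client("ses"))."""
    ses_client.verify_email_identity(EmailAddress=email_address)


def unverified_addresses(ses_client, addresses):
    """Return the addresses whose SES verification status is not 'Success'."""
    response = ses_client.get_identity_verification_attributes(Identities=list(addresses))
    statuses = response.get("VerificationAttributes", {})
    # Identities that SES has never seen are simply absent from the response.
    return [a for a in addresses if statuses.get(a, {}).get("VerificationStatus") != "Success"]
```

In sandbox mode, `unverified_addresses(boto3.client("ses"), [sender, recipient])` should return an empty list before you test the workflow; any address it returns still needs its verification link clicked.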


The SES configuration enables the notification component of the receipt processing workflow:
After the Lambda function processes a receipt and stores data in DynamoDB
It uses SES to send an email notification with receipt details
The notification includes key information like vendor, date, amount, and a link to the original receipt
Users receive timely updates without needing to check the system manually
3.4 Processing Setup: Creating a Lambda function
Steps to be Performed 👩💻
Create an IAM role for AWS Lambda.
Create a Lambda Function.
Add the Lambda function Code.
Understand the Lambda function Code.
#1 Create an IAM role
Lambda functions need permission to access other AWS services.
Creating a dedicated IAM role follows security best practices by explicitly defining what actions your Lambda function can perform. To create the role:
Navigate to IAM in the AWS Console

Initiate Role Creation
Click "Roles" in the left navigation pane
Click "Create role" button to start the process


Select Trusted Entity
Select "AWS service" as the trusted entity type
This specifies which AWS service can assume this role




Specify Service Use Case
Choose "Lambda" as the service that will use this role
This allows Lambda functions to assume this role at runtime
Configure Permissions
Click "Next" to continue
On the Permissions policies page, you'll attach policies that define what the role can do


Attach Required Policies
In the search box, search for and select these policies:
AmazonS3ReadOnlyAccess: Allows reading from S3 buckets
AmazonTextractFullAccess: Enables document analysis with Textract
AmazonDynamoDBFullAccess: Permits writing extracted data to DynamoDB
AmazonSESFullAccess: Enables sending email notifications
AWSLambdaBasicExecutionRole: Allows Lambda to write logs to CloudWatch


Review and Name Role
Click "Next"
Role name: "ReceiptProcessingLambdaRole"
Description (optional): "Allows Lambda functions to process receipts using S3, Textract, DynamoDB, and SES"


Create the Role
Review the role configuration
Click "Create role" to finalize


Validate Role Creation
Verify the role appears in the IAM roles list
Click on the role name to view its details and ensure all policies are attached correctly
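For reference, the role built in the console above can be sketched with boto3's IAM client. The trust policy lets the Lambda service (`lambda.amazonaws.com`) assume the role, and the five managed policies are attached by their AWS-managed ARNs. The function and constant names are hypothetical. A newly created role can take a few seconds to propagate, so an immediate attempt to use it from Lambda may briefly fail.

```python
import json

# Trust policy: allows the Lambda service to assume this role at runtime.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# AWS-managed policies selected in the console steps above.
MANAGED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
    "arn:aws:iam::aws:policy/AmazonTextractFullAccess",
    "arn:aws:iam::aws:policy/AmazonDynamoDBFullAccess",
    "arn:aws:iam::aws:policy/AmazonSESFullAccess",
    "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
]


def create_lambda_role(iam_client, role_name="ReceiptProcessingLambdaRole"):
    """Create the role and attach the managed policies; returns the role ARN."""
    role = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(TRUST_POLICY),
        Description="Allows Lambda functions to process receipts using S3, Textract, DynamoDB, and SES",
    )
    for arn in MANAGED_POLICY_ARNS:
        iam_client.attach_role_policy(RoleName=role_name, PolicyArn=arn)
    return role["Role"]["Arn"]
```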


The IAM role connects all components of the receipt processing system:
When the Lambda function executes, it assumes this role
The role's permissions allow the function to:
Read receipt files from the S3 bucket
Send those files to Textract for analysis
Write extracted data to DynamoDB
Send notification emails via SES
Write logs to CloudWatch for troubleshooting
#2 Create a Lambda Function
The Lambda function serves as the central processing engine for our receipt processing system.
This serverless function is triggered automatically when a new receipt is uploaded to S3, coordinates the extraction of data using Textract, stores the structured information in DynamoDB, and sends notifications via SES.
Navigate to Lambda in the AWS Console
Begin Function Creation
Click "Create function" to start the process
This opens the function creation wizard




Select Creation Method
Select "Author from scratch"
This allows you to create a custom function from the beginning
Configure Basic Settings
Function name: Enter "ReceiptProcessor"
Runtime: Select "Python 3.9" from the dropdown (if it is no longer offered, choose a newer Python 3 runtime)
Architecture: Leave as default (x86_64)


Set Permissions
Under "Permissions" expand "Change default execution role"
Select "Use an existing role"
From the dropdown, choose "ReceiptProcessingLambdaRole"


Create the Function
Review your settings
Click "Create function"
AWS will provision your Lambda function (this takes just a few seconds)




Adjust Function Timeout
Navigate to the "Configuration" tab
Select "General configuration"
Click "Edit"


Change the Timeout from the default 3 seconds to 3 minutes (180 seconds)
This extended timeout is necessary because Textract processing can take time for complex receipts
Click "Save"
Configure Environment Variables
Still in the "Configuration" tab, select "Environment variables"
Click "Edit"


Add the following key-value pairs:
Key: DYNAMODB_TABLE, Value: Receipts
Key: SES_SENDER_EMAIL, Value: your-verified-email@example.com (use the email you verified in SES)
Key: SES_RECIPIENT_EMAIL, Value: recipient-email@example.com (use the recipient email you verified)
Note: For detailed information about these environment variables and their usage in the Lambda function, please refer to the Lambda Environment Variables table given below.
Click Save.
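Both the timeout and the environment variables can be applied in a single boto3 call, `update_function_configuration`. This is a sketch with a hypothetical helper name; note that the `Environment` parameter replaces the function's entire set of variables rather than merging with it, and that Lambda's maximum timeout is 900 seconds.

```python
def configure_receipt_processor(lambda_client, sender, recipient,
                                function_name="ReceiptProcessor", table_name="Receipts"):
    """Set the 3-minute timeout and the three environment variables in one update."""
    return lambda_client.update_function_configuration(
        FunctionName=function_name,
        Timeout=180,  # seconds; Textract calls on complex receipts can be slow
        # Replaces ALL existing environment variables on the function.
        Environment={"Variables": {
            "DYNAMODB_TABLE": table_name,
            "SES_SENDER_EMAIL": sender,
            "SES_RECIPIENT_EMAIL": recipient,
        }},
    )
```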


Lambda Environment Variables:
Environment variables provide configuration without code changes:


The Lambda function forms the core of the receipt processing workflow:
When a new receipt is uploaded to S3, it triggers the Lambda function
Lambda retrieves the receipt file from S3
Lambda calls Amazon Textract to extract data from the receipt
Lambda processes and structures the extracted data
Lambda writes the structured data to DynamoDB
Lambda sends a notification email via SES
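The original receipt stays in S3 for auditing (the AmazonS3ReadOnlyAccess policy attached to the role lets the function read the file, but not move or delete it)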
#3 Add the Lambda function Code
Access the Code Editor
In the Lambda console, navigate to your "ReceiptProcessor" function
Scroll down to the code source section where you'll see the code editor
The default Lambda function contains a simple "Hello World" example


Replace the default code with the provided Python code
Delete the existing code in the editor
Copy and paste the following Python code:
Deploy the Function
After pasting the code, click the "Deploy" button
This saves your code and makes it available for execution
You should see a confirmation message once deployment is complete


This Lambda function code ties together all the components we've set up:
It's triggered automatically when a new file appears in the S3 bucket
It uses the IAM role we created to access other AWS services
It processes the receipt image using Textract
It stores the structured data in our DynamoDB table
It sends notifications via our configured SES email
The code is designed to be resilient, with comprehensive error handling and logging to help troubleshoot any issues that might arise during processing.
#4 Understand the Lambda function Code
This step breaks down the Lambda function code that powers the AWS Receipt Processing System, explaining each major component and its purpose.
Code Structure Overview
The Lambda function is organized into four main components:
Lambda Handler: Entry point that coordinates the entire workflow
Textract Processing: Extracts structured data from receipt images
DynamoDB Storage: Saves the processed data to the database
Email Notification: Sends formatted results via email
Lambda Handler Function
What it does:
Acts as the orchestrator for the entire process
Extracts information about which file was uploaded from the S3 event
URL-decodes the file path to handle special characters
Verifies the file exists before attempting processing
Calls the specialized functions for each step of the process
Handles errors gracefully with detailed logging
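The first job of the handler, reading the bucket and key out of the S3 event and URL-decoding the key, might look like the sketch below. It is an illustration, not the guide's own code, and `uploaded_objects` is a hypothetical name. S3 URL-encodes object keys in event notifications (a space arrives as "+"), which is why `urllib.parse.unquote_plus` is used rather than `unquote`.

```python
from urllib.parse import unquote_plus


def uploaded_objects(event):
    """Return (bucket, key) pairs for every record in an S3 event notification."""
    pairs = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Keys arrive URL-encoded: spaces become '+', other characters become %XX.
        key = unquote_plus(s3_info["object"]["key"])
        pairs.append((bucket, key))
    return pairs
```

A handler would then loop over these pairs, passing each one to the Textract, DynamoDB, and SES steps.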
Textract Processing Function
What it does:
Uses Amazon Textract's specialized analyze_expense API
Creates a unique ID for the receipt
Sets up default values for all expected fields
Extracts summary information (vendor, date, total amount)
Extracts individual line items with their quantities and prices
Returns structured data in a consistent format
Key insights:
Textract understands the structure of receipts, not just the text
The function handles missing data gracefully with default values
The unique ID ensures each receipt can be tracked independently
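The sketch below shows how such a function might walk the AnalyzeExpense response. The response nests `ExpenseDocuments`, each with `SummaryFields` (typed, for example, `VENDOR_NAME`, `INVOICE_RECEIPT_DATE`, `TOTAL`) and `LineItemGroups` whose line items carry `LineItemExpenseFields` (for example `ITEM`, `QUANTITY`, `PRICE`). The function name, the output dictionary layout, and the default values are assumptions for illustration. The synchronous `analyze_expense` call handles single-page documents; multipage PDFs need the asynchronous `start_expense_analysis` API instead.

```python
import uuid


def extract_receipt(textract_client, bucket, key):
    """Run AnalyzeExpense on an S3 object and flatten the result into a dict."""
    response = textract_client.analyze_expense(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    # Defaults used when Textract does not find a field (assumed values).
    receipt = {
        "receipt_id": str(uuid.uuid4()),
        "vendor": "Unknown",
        "date": None,  # raw printed text; must be normalized before use as the sort key
        "total": "0.00",
        "items": [],
    }
    summary_targets = {"VENDOR_NAME": "vendor", "INVOICE_RECEIPT_DATE": "date", "TOTAL": "total"}
    for doc in response.get("ExpenseDocuments", []):
        for field in doc.get("SummaryFields", []):
            target = summary_targets.get(field.get("Type", {}).get("Text"))
            value = field.get("ValueDetection", {}).get("Text")
            if target and value:
                receipt[target] = value
        for group in doc.get("LineItemGroups", []):
            for line in group.get("LineItems", []):
                item = {}
                for field in line.get("LineItemExpenseFields", []):
                    field_type = field.get("Type", {}).get("Text")
                    if field_type in ("ITEM", "QUANTITY", "PRICE"):
                        item[field_type.lower()] = field.get("ValueDetection", {}).get("Text", "")
                if item:
                    receipt["items"].append(item)
    return receipt
```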
DynamoDB Storage Function
What it does:
Connects to the DynamoDB table
Formats the receipt data for database storage
Adds a processing timestamp for tracking
Stores all receipt information as a single item
Includes the S3 path to link back to the original document
Key insights:
The structured format makes it easy to query receipts later
The timestamp records when processing occurred
The S3 path allows you to access the original receipt if needed
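A storage step along these lines is sketched below. One detail matters when using boto3's DynamoDB `Table` resource: it rejects Python floats, so monetary amounts are converted to `Decimal`. The helper names and the attribute names `total`, `items`, `s3_path`, and `processed_timestamp` are assumptions for illustration (the text above refers both to an S3 path and to a receipt_url attribute), and the sketch assumes `receipt["date"]` already holds a YYYY-MM-DD string.

```python
from datetime import datetime, timezone
from decimal import Decimal, InvalidOperation


def parse_amount(text):
    """Turn printed text such as '$1,234.50' into Decimal('1234.50'); Decimal('0') if unreadable."""
    cleaned = "".join(ch for ch in (text or "") if ch.isdigit() or ch in ".-")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return Decimal("0")


def save_receipt(table, receipt, bucket, key):
    """Write one receipt as a single item (expects boto3.resource("dynamodb").Table(...))."""
    item = {
        "receipt_id": receipt["receipt_id"],  # partition key
        "date": receipt["date"],              # sort key, YYYY-MM-DD
        "vendor": receipt["vendor"],
        "total": parse_amount(receipt["total"]),  # Decimal, not float
        "items": receipt["items"],
        "s3_path": f"s3://{bucket}/{key}",        # link back to the original file
        "processed_timestamp": datetime.now(timezone.utc).isoformat(),
    }
    table.put_item(Item=item)
    return item
```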
Email Notification Function
What it does:
Creates a formatted HTML email with receipt details
Includes a list of all extracted line items
Uses Amazon SES to send the email
Creates a descriptive subject line with vendor and total
Includes the S3 path to access the original document
Key insights:
HTML formatting makes the email easy to read
Including line items provides complete information
The receipt ID and S3 path enable tracking and retrieval
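A notification step like this could be sketched with the SES `send_email` call, using an HTML body and the subject format described in the testing section ("Receipt Processed: [vendor] - $[total]"). The helper names and the HTML layout are hypothetical, and the sketch assumes `total` is stored without a currency symbol. OCR output is untrusted text, so every extracted value is passed through `html.escape` before it goes into the HTML.

```python
from html import escape


def build_receipt_email(receipt, s3_path):
    """Return (subject, html_body) for a processed receipt."""
    subject = f"Receipt Processed: {receipt['vendor']} - ${receipt['total']}"
    rows = "".join(
        "<li>{} (qty {}): {}</li>".format(
            escape(str(item.get("item", "Unknown item"))),
            escape(str(item.get("quantity", "1"))),
            escape(str(item.get("price", "N/A"))),
        )
        for item in receipt.get("items", [])
    )
    html_body = (
        f"<h2>Receipt from {escape(str(receipt['vendor']))}</h2>"
        f"<p>Date: {escape(str(receipt['date']))}<br>"
        f"Total: ${escape(str(receipt['total']))}<br>"
        f"Receipt ID: {escape(str(receipt['receipt_id']))}</p>"
        f"<ul>{rows or '<li>No line items extracted</li>'}</ul>"
        f"<p>Original file: {escape(s3_path)}</p>"
    )
    return subject, html_body


def send_receipt_email(ses_client, sender, recipient, receipt, s3_path):
    """Send the notification via SES (expects boto3.client("ses")); returns the MessageId."""
    subject, html_body = build_receipt_email(receipt, s3_path)
    response = ses_client.send_email(
        Source=sender,
        Destination={"ToAddresses": [recipient]},
        Message={
            "Subject": {"Data": subject},
            "Body": {"Html": {"Data": html_body}},
        },
    )
    return response["MessageId"]
```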
Error Handling and Logging
Throughout the code, you'll notice:
Try-except blocks that catch and log errors
Detailed log messages at each processing step
Default values to handle missing data gracefully
Continuation of execution when non-critical parts fail (e.g., email)
This robust error handling ensures the system can process imperfect receipts and recover from temporary issues without manual intervention.
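One way to separate critical from non-critical steps is sketched below. It is a possible design, not necessarily how the guide's code is structured: `process_receipt` and its three injected step functions are hypothetical. Extraction and storage failures propagate so the invocation is recorded as failed in Lambda's metrics and logs, while a failed email is logged with its traceback and processing continues.

```python
import logging

logger = logging.getLogger(__name__)


def process_receipt(bucket, key, extract, store, notify):
    """Run extract -> store -> notify; only the notification step may fail silently."""
    receipt = extract(bucket, key)  # critical: errors propagate to the caller
    store(receipt, bucket, key)     # critical: data must not be silently lost
    try:
        notify(receipt, bucket, key)  # non-critical: the data is already saved
    except Exception:
        logger.exception("Notification failed for s3://%s/%s", bucket, key)
    return receipt
```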
Environment Variables
The code uses these environment variables for configuration:
DYNAMODB_TABLE: The name of the DynamoDB table to store receipts
SES_SENDER_EMAIL: The verified email address to send notifications from
SES_RECIPIENT_EMAIL: The email address to receive notifications
Using environment variables allows you to change these settings without modifying the code.
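A small sketch of reading these settings, failing fast with a clear message if any are missing or empty, is shown below. `load_settings` and `REQUIRED_SETTINGS` are hypothetical names; passing a plain dict instead of `os.environ` makes the function easy to test.

```python
import os

REQUIRED_SETTINGS = ("DYNAMODB_TABLE", "SES_SENDER_EMAIL", "SES_RECIPIENT_EMAIL")


def load_settings(environ=None):
    """Return the three settings as a dict, raising if any are missing or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_SETTINGS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_SETTINGS}
```

Calling this once at module load surfaces a misconfigured function on its first invocation instead of partway through processing a receipt.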
3.5 Integration and Testing
Steps to be Performed 👩💻
Set Up S3 Event Notification Trigger.
Test the Project Execution.
Project Improvement Ideas.
#1 Set Up S3 Event Notification Trigger
For our receipt processing system to operate automatically, we need to establish a trigger that will invoke our Lambda function whenever a new receipt is uploaded.
Amazon S3 offers event notifications that can detect new file uploads and trigger specific actions. This integration creates the vital connection between file upload (input) and processing (execution).
Navigate to S3 in the AWS Console


Access Your Bucket
From the list of buckets, select the receipt storage bucket you created earlier
This opens the bucket management interface


Access Properties Settings
Navigate to the "Properties" tab at the top of the bucket management interface
This tab contains various bucket configurations


Create Event Notification
Scroll down to the "Event Notifications" section
Click "Create event notification" to begin configuration
This opens the event notification creation form


Configure Event Details
Name: "ReceiptUploadEvent"
Prefix (optional): "incoming/" (if using folders)
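Suffix (optional): Leave blank. Each event notification accepts a single suffix, so a comma-separated value such as ".pdf,.jpg,.jpeg,.png" would be matched literally; to filter by file type, create one event notification per extension (for example, suffix ".jpg")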


Event types: Check "All object create events"
This includes put, post, copy, and multipart upload completions
Ensures any method of adding files triggers the process
Keep the rest of the boxes unchecked


Destination: Select "Lambda Function"
Lambda function: Select "ReceiptProcessor"
This connects the S3 event to your processing function
Save the Configuration
Review all settings to ensure they match your requirements
Click "Save" to activate the event notification
AWS will validate and create the notification configuration
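The same trigger can be configured with boto3. When you pick the function in the console, AWS adds a resource-based permission letting S3 invoke it; from code you add that permission yourself, and it must exist before the notification is saved, because S3 validates the destination. The helper name and the `AllowS3Invoke` statement ID are hypothetical. Note that `put_bucket_notification_configuration` replaces the bucket's entire notification configuration.

```python
def connect_bucket_to_lambda(s3_client, lambda_client, bucket, function_arn, account_id,
                             prefix="incoming/"):
    """Allow S3 to invoke the function, then route object-created events to it."""
    # 1. Resource-based permission: only this bucket, in this account, may invoke.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId="AllowS3Invoke",
        Action="lambda:InvokeFunction",
        Principal="s3.amazonaws.com",
        SourceArn=f"arn:aws:s3:::{bucket}",
        SourceAccount=account_id,
    )
    # 2. Event notification (overwrites any existing notification configuration).
    s3_client.put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration={
            "LambdaFunctionConfigurations": [{
                "Id": "ReceiptUploadEvent",
                "LambdaFunctionArn": function_arn,
                "Events": ["s3:ObjectCreated:*"],  # "All object create events"
                "Filter": {"Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}},
            }]
        },
    )
```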


The S3 event notification creates the automated workflow for receipt processing:
When a receipt image is uploaded to S3, it generates an event
The event notification system detects this upload event
If the file matches the configured prefix and suffix filters, S3 invokes the Lambda function
The Lambda function receives event data including the bucket name and file key
Processing begins automatically without any manual intervention
#2 Test the Project Execution
This step explains how to verify that your receipt processing system works correctly end-to-end.
Step 1: Upload a Test Receipt
Navigate to your S3 bucket in the AWS Console


Upload a receipt from this folder: Receipts (to the "incoming" folder if you created one)




Wait 10-15 seconds for processing to complete
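The test upload can also be scripted. This sketch uses boto3's `upload_file` and places the file under the "incoming/" prefix so that it matches the event notification filter; the helper name and the example file path are hypothetical.

```python
from pathlib import Path


def upload_test_receipt(s3_client, bucket, local_path, prefix="incoming/"):
    """Upload a local receipt under `prefix` and return the object key used."""
    key = prefix + Path(local_path).name  # e.g. "incoming/receipt1.jpg"
    s3_client.upload_file(str(local_path), bucket, key)
    return key
```

For example, `upload_test_receipt(boto3.client("s3"), "automated-receipts-yourusername", "receipt1.jpg")`, then wait for processing as described above.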
Step 2: Monitor the Lambda Execution
Go to Lambda > Functions > ReceiptProcessor
Select the "Monitor" tab


Check that your function was recently invoked
📝Note: It may take a few minutes for the Lambda execution to appear in the monitoring dashboard. If you don't see your execution immediately, wait 2-3 minutes and refresh the page.
❗Important: Uploading a receipt in Step 1 is mandatory to trigger the Lambda function execution.




Click "View logs in CloudWatch" for detailed execution logs


What to Look For:
Successful function invocation
Log entries showing processing steps
Any error messages that need attention
You may encounter this error: "Log group '/aws/lambda/ReceiptProcessor' does not exist for account ID '<your-account-id>'"
This occurs because the CloudWatch log group isn't created until your Lambda function runs for the first time. To resolve:
Upload a receipt to trigger your Lambda function
Wait 2-3 minutes
Refresh the CloudWatch Logs page
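Recent log lines can also be pulled without the console using CloudWatch Logs' `filter_log_events`, following `nextToken` until no more pages remain. The helper name is hypothetical; `start_time_ms` is a Unix timestamp in milliseconds. If the function has never run, the call fails with a ResourceNotFoundException, which corresponds to the missing-log-group error described above.

```python
def recent_log_messages(logs_client, start_time_ms, group="/aws/lambda/ReceiptProcessor"):
    """Return log messages from `group` since `start_time_ms` (expects boto3.client("logs"))."""
    kwargs = {"logGroupName": group, "startTime": start_time_ms}
    messages = []
    while True:
        page = logs_client.filter_log_events(**kwargs)
        messages.extend(event["message"] for event in page.get("events", []))
        token = page.get("nextToken")
        if not token:  # no more pages to fetch
            return messages
        kwargs["nextToken"] = token
```

For the last 15 minutes, pass `int((time.time() - 15 * 60) * 1000)` as `start_time_ms`.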
Step 3: Verify Data in DynamoDB
Go to DynamoDB > Tables > Receipts
Select the "Items" tab


Look for your recently processed receipt
Check:
Receipt data is stored correctly
Key fields (vendor, date, total amount) are accurate




Line items were extracted properly
Step 4: Check Email Notifications
Open your configured recipient email account
Find the email with subject "Receipt Processed: [vendor] - $[total]"
Verify the receipt details in the email body


Troubleshooting Tips
No Lambda trigger? Check S3 event configuration
Processing errors? Review CloudWatch logs
Missing data? Examine text extraction results
Next Steps
After successful testing, try processing multiple receipts with different formats to ensure robustness before deploying the system for regular use.
Sample Output
Below are screenshots of email notifications received after successful receipt processing, demonstrating the expected output format:



Clean Up🗑️
Delete the S3 Bucket:
In the S3 Console, empty the receipt bucket (S3 will not delete a bucket that still contains objects), then delete it.
Delete the DynamoDB Table:
In the DynamoDB Console, delete the "Receipts" table.
Delete the Lambda Function:
In the Lambda Console, delete the "ReceiptProcessor" function.
Delete the IAM Role:
In the IAM Console, delete the "ReceiptProcessingLambdaRole" role.
Remove SES Identities:
In the SES Console, go to Configuration > Identities and delete the email identities you verified for this project.
Delete the CloudWatch Log Group:
In the CloudWatch Console, delete the "/aws/lambda/ReceiptProcessor" log group.