This section documents the practical difficulties encountered while deploying each infrastructure step for Money Manager in section 5.3 (VPC/EC2, RDS/ElastiCache, DynamoDB/SQS, S3/VPC Endpoints, CloudWatch). It compares the result against the High Availability architecture proposed in section 2, explains how each problem was resolved, and outlines future development directions.
Single EC2 instead of Multi-AZ Auto Scaling Group: The proposal in section 2, item 4 set the goal of High Availability using EC2 + Auto Scaling Group across 2 Availability Zones. Within the workshop timeframe, I only managed to deploy a single t3.micro EC2 instance running two containers, moneymanager-api and moneymanager-worker (Step 1, section 5.3). I didn’t have time to set up a Launch Template, ALB Target Group, and Auto Scaling Policy, because the health checks still needed more testing before the instance could be safely replicated.
Confusion between Gateway Endpoint and Interface Endpoint for S3: Initially I thought a single type of VPC Endpoint would be sufficient. After reading the documentation carefully, I understood that the Gateway Endpoint is free but can only be reached by resources inside the VPC, while the Interface Endpoint (PrivateLink) is billed per hour and per GB of data processed but gets private IP addresses in the VPC subnets, which makes it reachable from VPN/on-prem clients once their DNS queries are resolved through Route 53 Resolver. Since Step 4 (section 5.3) required both the EC2 instance inside the VPC and office workstations over VPN to access S3, I ended up using a combination of both types.
Endpoint Policy and S3 Bucket Policy conflicting: During the first setup, I configured the Bucket Policy to block all access not going through the VPC Endpoint before I had put the correct bucket Resource ARN in the Endpoint Policy. As a result, even the API running on EC2 received Access Denied when calling S3. I had to go back through each step and test directly from EC2 with aws s3 cp before I discovered that the problem was the order in which the two policies were configured.
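The ordering problem described above can be sketched in a few lines of Python. This is only an illustration, not the policies actually deployed in Step 4: the bucket name, the endpoint IDs, the list of allowed S3 actions, and the helper names (build_endpoint_policy, build_bucket_policy, endpoint_policy_covers_bucket) are all assumptions made for the example. The idea is to build the Endpoint Policy first, check that it names both the bucket ARN and the object ARN, and only then generate the restrictive Bucket Policy.

```python
import json


def bucket_arns(bucket: str) -> list[str]:
    """Return the bucket-level and object-level ARNs for one bucket."""
    return [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"]


def build_endpoint_policy(bucket: str) -> dict:
    """Endpoint Policy that allows a small set of S3 actions on one bucket.

    The action list is an assumption for this sketch.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": "*",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
            "Resource": bucket_arns(bucket),
        }],
    }


def build_bucket_policy(bucket: str, vpce_ids: list[str]) -> dict:
    """Bucket Policy that denies any request not arriving via the listed endpoints."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyOutsideVpcEndpoints",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": bucket_arns(bucket),
            # Listing both the Gateway and the Interface endpoint IDs keeps
            # either path usable; a request matching neither ID is denied.
            "Condition": {"StringNotEquals": {"aws:SourceVpce": list(vpce_ids)}},
        }],
    }


def endpoint_policy_covers_bucket(policy: dict, bucket: str) -> bool:
    """True if some Allow statement names both ARNs of the bucket."""
    wanted = set(bucket_arns(bucket))
    for stmt in policy["Statement"]:
        resources = stmt["Resource"]
        if isinstance(resources, str):
            resources = [resources]
        if stmt["Effect"] == "Allow" and wanted <= set(resources):
            return True
    return False


if __name__ == "__main__":
    bucket = "moneymanager-example-bucket"             # hypothetical name
    endpoints = ["vpce-0aaa111122223333", "vpce-0bbb444455556666"]  # hypothetical IDs

    endpoint_policy = build_endpoint_policy(bucket)
    # Step 1: only lock the bucket down once the endpoint policy is known to be correct.
    if endpoint_policy_covers_bucket(endpoint_policy, bucket):
        print(json.dumps(build_bucket_policy(bucket, endpoints), indent=2))
    else:
        print("Endpoint Policy does not cover the bucket; fix it before applying the Bucket Policy.")
```

A Deny on s3:* like this one also blocks requests that do not come through the endpoints, including those from the AWS console, so it is safer to apply it last and to re-run aws s3 cp from EC2 immediately afterwards.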
Route 53 Resolver Inbound Endpoint was entirely new territory: Previously I had only used Route 53 for regular public DNS records, and had never configured an internal DNS resolver that lets office machines on the VPN resolve S3 domain names to the Interface Endpoint’s private IP addresses. This was the most time-consuming part of Step 4 to learn.
Migration from MongoDB Atlas to DynamoDB and redesigning the SQS queue: When implementing Step 3 (section 5.3), removing the old MongoDB Atlas library and redesigning the Partition Key for the two tables chat_sessions and chat_messages using the AWS SDK v2 took considerable trial and error. The Redrive Policy for the DLQ moneymanager-async-jobs-dlq (maximum 3 retries) also had to be adjusted several times, because the initial retry count was too low and legitimate jobs were being moved to the DLQ prematurely.
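The Redrive Policy mentioned above is a small JSON document attached to the source queue as its RedrivePolicy attribute. The sketch below builds that document for the DLQ moneymanager-async-jobs-dlq with a maxReceiveCount of 3. The region and account ID in the ARN are placeholders, and the helper names (build_redrive_policy, job_completes) are hypothetical. The second helper is a simplified model of why a low count sends legitimate jobs to the DLQ: SQS moves a message once its receive count exceeds maxReceiveCount, so a job that needs more receives than that never gets the chance to succeed.

```python
import json


def build_redrive_policy(dlq_arn: str, max_receive_count: int = 3) -> str:
    """Return the JSON string expected by the SQS RedrivePolicy queue attribute."""
    # SQS accepts values from 1 to 1000 for maxReceiveCount.
    if not 1 <= max_receive_count <= 1000:
        raise ValueError("maxReceiveCount must be between 1 and 1000")
    return json.dumps({
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": max_receive_count,
    })


def job_completes(receives_needed: int, max_receive_count: int) -> bool:
    """Simplified model: a job that succeeds on its Nth receive is processed
    only if N does not exceed maxReceiveCount; otherwise it lands in the DLQ."""
    return receives_needed <= max_receive_count


if __name__ == "__main__":
    # Region and account ID are placeholders, not the real deployment values.
    dlq_arn = "arn:aws:sqs:ap-southeast-1:123456789012:moneymanager-async-jobs-dlq"
    print(build_redrive_policy(dlq_arn, 3))

    # A job that fails once on a transient error and succeeds on the second receive:
    print(job_completes(receives_needed=2, max_receive_count=1))  # too strict
    print(job_completes(receives_needed=2, max_receive_count=3))  # tolerated
```

With boto3, a string like this would typically be passed as Attributes={"RedrivePolicy": ...} to set_queue_attributes on the source queue. The value 3 leaves room for a couple of transient failures while still isolating jobs that fail on every attempt.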
aws s3 cp, aws dynamodb scan, aws sqs receive-message), then added the Interface Endpoint and Endpoint Policy, testing after each change to easily isolate errors.