We recently migrated 360 million records across 1,500+ tables from DigitalOcean to AWS Aurora. To keep the database running smoothly after the move, we needed a robust monitoring setup that could detect performance bottlenecks, track resource utilization, and send real-time alerts for critical issues.
Core Focus Areas
Monitor historical metrics and visualize them on a user-friendly dashboard.
Trigger alerts and send Slack notifications when a metric breaches its threshold.
Ensure alerts are clear and human-readable rather than raw CloudWatch data.
Services involved in RDS Monitoring
Amazon CloudWatch – Collects RDS metrics and evaluates alarms against them.
Amazon SNS (Simple Notification Service) Topic – Sends notifications to Lambda when CloudWatch alarms are triggered.
AWS Lambda – Processes the SNS notification payload (JSON), converts it into a human-readable message, and sends that message to Slack.
You can find the Lambda Source Code here: GitHub Repository
Slack – A webhook URL is created and passed as an environment variable in the Lambda function to send alerts.
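To make the CloudWatch → SNS → Lambda → Slack flow concrete, here is a minimal sketch of what such a Lambda handler could look like. It is not the code from the repository mentioned above. The environment variable name `SLACK_WEBHOOK_URL`, the function `format_alarm`, and the message layout are assumptions made for illustration; the field names (`AlarmName`, `NewStateValue`, `NewStateReason`, `StateChangeTime`, `Trigger`) follow the JSON that CloudWatch alarms publish to SNS.

```python
import json
import os
import urllib.request


def format_alarm(alarm: dict) -> str:
    """Turn a CloudWatch alarm notification into a short, readable Slack message."""
    trigger = alarm.get("Trigger", {})
    state = alarm.get("NewStateValue", "UNKNOWN")
    icon = ":red_circle:" if state == "ALARM" else ":large_green_circle:"
    lines = [
        f"{icon} *{alarm.get('AlarmName', 'Unnamed alarm')}* is {state}",
        f"Metric: {trigger.get('MetricName', 'n/a')} ({trigger.get('Namespace', 'n/a')})",
        f"Threshold: {trigger.get('Threshold', 'n/a')}",
        f"Reason: {alarm.get('NewStateReason', 'n/a')}",
        f"Time: {alarm.get('StateChangeTime', 'n/a')}",
    ]
    return "\n".join(lines)


def lambda_handler(event, context):
    # Assumed variable name; use whatever name the function is configured with.
    webhook_url = os.environ["SLACK_WEBHOOK_URL"]
    for record in event["Records"]:
        # SNS delivers the alarm as a JSON string inside Sns.Message.
        alarm = json.loads(record["Sns"]["Message"])
        body = json.dumps({"text": format_alarm(alarm)}).encode("utf-8")
        request = urllib.request.Request(
            webhook_url,
            data=body,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request, timeout=5) as response:
            response.read()
    return {"statusCode": 200}
```

Keeping the formatting in its own function makes it easy to test the message text locally without calling Slack.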
Key Benefits of Monitoring the RDS Database
Early Issue Detection – Identifies high CPU usage, slow queries, and connection spikes for proactive performance tuning.
Prevents Resource Exhaustion – Tracks memory and storage to avoid unexpected crashes.
Ensures High Availability – Detects connection overloads and reduces downtime with real-time alerts.
Optimizes Costs – Prevents over-provisioning by analyzing resource usage trends.
Supports Scaling – Identifies peak usage for auto-scaling and capacity adjustments.
Improves Disaster Recovery – Ensures backups, snapshots, and failover mechanisms are functioning.
Key Metrics to Monitor
1. CPU Utilization
This metric helps assess the database load and determine whether the database cluster is appropriately sized.
Metric to monitor - CPUUtilization
Alert Condition - CPUUtilization > 75% for 5 datapoints within 5 minutes
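As a rough illustration, the condition above could be expressed as parameters for CloudWatch's `PutMetricAlarm` API. This sketch assumes one-minute periods (so 5 datapoints span 5 minutes), the `Average` statistic, and a cluster-level `DBClusterIdentifier` dimension; the helper name `cpu_alarm_params`, the alarm name, and the ARN in the usage comment are hypothetical.

```python
def cpu_alarm_params(cluster_id: str, sns_topic_arn: str) -> dict:
    """Build PutMetricAlarm keyword arguments for the CPU alert condition."""
    return {
        "AlarmName": f"{cluster_id}-cpu-high",  # hypothetical naming scheme
        "Namespace": "AWS/RDS",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "DBClusterIdentifier", "Value": cluster_id}],
        "Statistic": "Average",       # assumption: alert on the average
        "Period": 60,                 # seconds per datapoint
        "EvaluationPeriods": 5,       # look at the last 5 datapoints (5 minutes)
        "DatapointsToAlarm": 5,       # all 5 must breach
        "Threshold": 75.0,            # percent
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(
#       **cpu_alarm_params("my-cluster", "arn:aws:sns:...:rds-alerts")
#   )
```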
Alert Message
2. Database Connections
This metric tracks active database connections to prevent overload and ensure stability.
Metric to monitor - DatabaseConnections
Alert Condition - DatabaseConnections >= 115 for 5 data points within 5 minutes
Alert Message
3. Freeable Memory & Free Local Storage
These metrics monitor available memory and storage to prevent resource exhaustion.
Metrics to monitor - FreeableMemory, FreeLocalStorage
Alert Condition -
FreeableMemory <= 0.25GB for 3 datapoints within 5 minutes
FreeLocalStorage < 5GB for 3 datapoints within 3 minutes
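One detail worth keeping in mind: CloudWatch reports `FreeableMemory` and `FreeLocalStorage` in bytes, so thresholds written in GB have to be converted before they go into an alarm. The sketch below assumes binary gigabytes (1 GB = 1024³ bytes) and one-minute periods; the names `gib_to_bytes` and `STORAGE_ALARMS` are made up for this example.

```python
GIB = 1024 ** 3  # assumption: "GB" in the thresholds means 1024^3 bytes


def gib_to_bytes(gib: float) -> int:
    """Convert a threshold given in GB to the byte value CloudWatch expects."""
    return int(gib * GIB)


# Alarm settings keyed by metric name; Period is in seconds.
STORAGE_ALARMS = {
    "FreeableMemory": {
        "Threshold": gib_to_bytes(0.25),
        "ComparisonOperator": "LessThanOrEqualToThreshold",
        "Period": 60,
        "EvaluationPeriods": 5,   # 5-minute window
        "DatapointsToAlarm": 3,   # 3 breaching datapoints inside that window
    },
    "FreeLocalStorage": {
        "Threshold": gib_to_bytes(5),
        "ComparisonOperator": "LessThanThreshold",
        "Period": 60,
        "EvaluationPeriods": 3,   # 3-minute window
        "DatapointsToAlarm": 3,   # all 3 must breach
    },
}
```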
Alert Message
4. Write Latency & Read Latency
These metrics ensure optimal database performance by monitoring read/write delays.
Metrics to monitor - WriteLatency, ReadLatency
Alert Condition -
WriteLatency > 30ms for 3 datapoints within 5 minutes
ReadLatency > 20ms for 3 datapoints within 5 minutes
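The latency metrics are published in seconds, so a 30 ms threshold is entered as 0.03 in the alarm, and the raw value in a notification is easier to read once converted back to milliseconds. A small sketch of both directions, with hypothetical helper names:

```python
def ms_to_seconds(ms: float) -> float:
    """Convert a millisecond threshold to the seconds value used by the alarm."""
    return ms / 1000.0


def describe_latency(seconds: float) -> str:
    """Render a raw latency value (seconds) as milliseconds for a Slack message."""
    return f"{seconds * 1000:.1f} ms"


# Thresholds from the alert conditions above, expressed in seconds.
LATENCY_THRESHOLDS = {
    "WriteLatency": ms_to_seconds(30),
    "ReadLatency": ms_to_seconds(20),
}
```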
Alert Message
All the key metrics to monitor are summarized in the table below.
Metric | Alert Condition |
---|---|
CPUUtilization | CPUUtilization > 75% for 5 data points within 5 minutes |
DatabaseConnections | DatabaseConnections >= 115 for 5 data points within 5 minutes |
FreeableMemory & FreeLocalStorage | FreeableMemory <= 0.25GB for 3 data points within 5 minutes; FreeLocalStorage < 5GB for 3 data points within 3 minutes |
WriteLatency & ReadLatency | WriteLatency > 30ms for 3 data points within 5 minutes; ReadLatency > 20ms for 3 data points within 5 minutes |
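The phrase "M data points within N minutes" means the alarm fires when at least M of the most recent N one-minute datapoints breach the threshold. The function below is a simplified local model of that rule, useful for reasoning about thresholds against historical data. It is not how CloudWatch is implemented, and it ignores CloudWatch's missing-data handling; the name `breaches` and its signature are assumptions for this example.

```python
import operator

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def breaches(datapoints, op, threshold, datapoints_to_alarm, evaluation_periods):
    """Return True if at least `datapoints_to_alarm` of the last
    `evaluation_periods` datapoints satisfy `value <op> threshold`."""
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        return False  # simplification: not enough data, stay quiet
    compare = OPS[op]
    breaching = sum(1 for value in recent if compare(value, threshold))
    return breaching >= datapoints_to_alarm


# CPU: all 5 of the last 5 datapoints must exceed 75%.
cpu_alarm = breaches([70, 80, 90, 76, 78, 77], ">", 75, 5, 5)

# WriteLatency (seconds): any 3 of the last 5 datapoints above 0.03.
latency_alarm = breaches([0.031, 0.01, 0.04, 0.035, 0.01], ">", 0.03, 3, 5)
```

With the sample data above, both conditions evaluate to true: the last five CPU readings are all above 75, and three of the five latency readings are above 0.03.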
Conclusion
With CloudWatch alarms, SNS, and a Lambda function, we now have a streamlined monitoring setup for the RDS instance. We get notified in Slack about potential issues before they escalate, which helps keep database performance smooth. If you're running RDS, setting up a similar monitoring system can save you from unexpected downtime and performance hiccups.
EzyInfra.dev is a DevOps and infrastructure consulting company that helps clients set up cloud infrastructure (AWS, GCP), optimize cloud costs, and manage Kubernetes-based infrastructure. If you have any requirements or want a free consultation for your infrastructure or architecture, feel free to schedule a call here.