Real-Time RDS Monitoring: Avoiding Downtime with CloudWatch & Slack Alerts

We recently migrated 360 million records across 1,500+ tables from DigitalOcean to AWS Aurora. To keep operations seamless after the move, we needed a robust monitoring setup that could detect performance bottlenecks, track resource utilization, and send real-time alerts for critical issues.

Core Focus Areas

Track historical metrics and visualize them on a user-friendly dashboard.
Trigger alerts and send Slack notifications when a metric breaches its threshold.
Ensure alerts are clear and human-readable rather than raw CloudWatch data.
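
For the dashboard part, a CloudWatch dashboard is defined by a JSON body of widgets. The snippet below is a minimal sketch of how such a body could be generated for the metrics covered in this post. The function name, the instance identifier, the region, and the two-column layout are assumptions made for illustration, not details of our actual dashboard.

```python
import json

# Metrics discussed in this post; all are published in the AWS/RDS namespace.
METRICS = [
    "CPUUtilization",
    "DatabaseConnections",
    "FreeableMemory",
    "FreeLocalStorage",
    "ReadLatency",
    "WriteLatency",
]


def build_dashboard_body(instance_id: str, region: str) -> str:
    """Return a CloudWatch dashboard body (JSON string), one graph per metric.

    `instance_id` and `region` are placeholders supplied by the caller.
    """
    widgets = []
    for i, name in enumerate(METRICS):
        widgets.append({
            "type": "metric",
            "x": (i % 2) * 12,   # two widgets per row on the 24-column grid
            "y": (i // 2) * 6,
            "width": 12,
            "height": 6,
            "properties": {
                "title": name,
                "metrics": [["AWS/RDS", name, "DBInstanceIdentifier", instance_id]],
                "stat": "Average",
                "period": 60,
                "region": region,
                "view": "timeSeries",
            },
        })
    return json.dumps({"widgets": widgets})


# With boto3 installed and credentials configured, the body could be uploaded with:
#   boto3.client("cloudwatch").put_dashboard(
#       DashboardName="rds-monitoring",  # hypothetical name
#       DashboardBody=build_dashboard_body("my-db-instance", "us-east-1"),
#   )
```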

Services involved in RDS Monitoring

RDS Monitoring Workflow

Amazon CloudWatch – Collects RDS metrics and evaluates alarms against them.
Amazon SNS (Simple Notification Service) Topic – Sends notifications to Lambda when CloudWatch alarms are triggered.
AWS Lambda – Processes the SNS notification payload (JSON), converts it into a human-readable format, and sends it to Slack.
You can find the Lambda Source Code here: GitHub Repository
Slack – A webhook URL is created and passed as an environment variable in the Lambda function to send alerts.
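
The Lambda step can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code from the repository mentioned above. The environment variable name SLACK_WEBHOOK_URL, the function names, and the message layout are assumptions; the alarm fields read here (AlarmName, NewStateValue, NewStateReason, StateChangeTime, Trigger) are the ones CloudWatch includes in the JSON it publishes to SNS.

```python
import json
import os
import urllib.request


def format_alarm(alarm: dict) -> str:
    """Turn a CloudWatch alarm payload into a short, readable Slack message."""
    trigger = alarm.get("Trigger", {})
    return "\n".join([
        f"*{alarm.get('AlarmName', 'Unknown alarm')}* is now {alarm.get('NewStateValue', 'UNKNOWN')}",
        f"Metric: {trigger.get('MetricName', 'n/a')} (threshold {trigger.get('Threshold', 'n/a')})",
        f"Reason: {alarm.get('NewStateReason', 'n/a')}",
        f"Time: {alarm.get('StateChangeTime', 'n/a')}",
    ])


def post_to_slack(text: str, webhook_url: str) -> None:
    """POST the message to a Slack incoming webhook."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        response.read()


def lambda_handler(event, context):
    # Assumed variable name; set it in the Lambda configuration.
    webhook_url = os.environ["SLACK_WEBHOOK_URL"]
    for record in event.get("Records", []):
        # SNS delivers the alarm as a JSON string inside the record.
        alarm = json.loads(record["Sns"]["Message"])
        post_to_slack(format_alarm(alarm), webhook_url)
    return {"statusCode": 200}
```

Keeping the formatting in its own function makes it easy to test the message text locally without calling Slack.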

Key Benefits of Monitoring the RDS Database

Early Issue Detection – Identifies high CPU usage, slow queries, and connection spikes for proactive performance tuning.
Prevents Resource Exhaustion – Tracks memory and storage to avoid unexpected crashes.
Ensures High Availability – Detects connection overloads and reduces downtime with real-time alerts.
Optimizes Costs – Prevents over-provisioning by analyzing resource usage trends.
Supports Scaling – Identifies peak usage for auto-scaling and capacity adjustments.
Improves Disaster Recovery – Ensures backups, snapshots, and failover mechanisms are functioning.

Key Metrics to Monitor

1. CPU Utilization

This metric helps assess the database load and determine whether the database cluster is appropriately sized.

Metric to monitor - CPUUtilization

Alert Condition - CPUUtilization > 75% for 5 datapoints within 5 minutes
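
As an illustration, the condition above could map onto CloudWatch alarm parameters roughly like this. It is a sketch that assumes a 1-minute period (so "5 datapoints within 5 minutes" becomes 5 out of 5 evaluation periods) and the Average statistic. The function name, alarm name, instance identifier, and SNS topic ARN are placeholders.

```python
def cpu_alarm_params(instance_id: str, sns_topic_arn: str) -> dict:
    """Build keyword arguments for CloudWatch's PutMetricAlarm call."""
    return {
        "AlarmName": f"{instance_id}-cpu-high",  # hypothetical naming scheme
        "Namespace": "AWS/RDS",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        "Statistic": "Average",          # assumption: the post does not name the statistic
        "Period": 60,                    # assumption: one datapoint per minute
        "EvaluationPeriods": 5,          # look at the last 5 minutes
        "DatapointsToAlarm": 5,          # all 5 datapoints must breach
        "Threshold": 75.0,               # percent
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # SNS topic that triggers the Lambda
    }


# With boto3 installed and credentials configured:
#   boto3.client("cloudwatch").put_metric_alarm(**cpu_alarm_params(
#       "my-db-instance", "arn:aws:sns:us-east-1:123456789012:rds-alerts"))
```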

2. Database Connections

This metric tracks active database connections to prevent overload and ensure stability.

Metric to monitor - DatabaseConnections

Alert Condition - DatabaseConnections >= 115 for 5 datapoints within 5 minutes

3. Freeable Memory & Free Local Storage

These metrics monitor available memory and storage to prevent resource exhaustion.

Metrics to monitor - FreeableMemory, FreeLocalStorage

Alert Condition -

FreeableMemory <= 0.25GB for 3 datapoints within 5 minutes

FreeLocalStorage < 5GB for 3 datapoints within 3 minutes
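
One detail worth noting when turning these conditions into alarms: CloudWatch reports FreeableMemory and FreeLocalStorage in bytes, so the thresholds have to be converted. The sketch below assumes "GB" means GiB (1024^3 bytes) and a 1-minute period; the helper and constant names are made up for illustration.

```python
GIB = 1024 ** 3  # assumption: the GB values in the post are treated as GiB


def gib_to_bytes(gib: float) -> int:
    """Convert a threshold in GiB to the byte value CloudWatch expects."""
    return int(gib * GIB)


MEMORY_AND_STORAGE_ALARMS = [
    {
        "MetricName": "FreeableMemory",
        "ComparisonOperator": "LessThanOrEqualToThreshold",
        "Threshold": gib_to_bytes(0.25),
        "Period": 60,
        "EvaluationPeriods": 5,   # window of 5 minutes
        "DatapointsToAlarm": 3,   # 3 breaching datapoints within that window
    },
    {
        "MetricName": "FreeLocalStorage",
        "ComparisonOperator": "LessThanThreshold",
        "Threshold": gib_to_bytes(5),
        "Period": 60,
        "EvaluationPeriods": 3,   # window of 3 minutes
        "DatapointsToAlarm": 3,
    },
]
```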

4. Write Latency & Read Latency

These metrics track read and write delays so that degraded database performance is caught early.

Metrics to monitor - WriteLatency, ReadLatency

Alert Condition -

WriteLatency > 30ms for 3 datapoints within 5 minutes

ReadLatency > 20ms for 3 datapoints within 5 minutes
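
CloudWatch reports WriteLatency and ReadLatency in seconds, so the millisecond thresholds above need converting before they go into an alarm. The sketch below does the conversion and also simulates the "3 datapoints within 5 minutes" rule locally on a list of samples. The function names and sample values are hypothetical, and the local check is only an approximation of how CloudWatch evaluates alarms.

```python
from typing import Sequence


def ms_to_seconds(ms: float) -> float:
    """Convert a millisecond threshold to the seconds unit used by CloudWatch."""
    return ms / 1000.0


LATENCY_THRESHOLDS = {
    "WriteLatency": ms_to_seconds(30),
    "ReadLatency": ms_to_seconds(20),
}


def breaches(metric: str, datapoints_seconds: Sequence[float], needed: int = 3) -> bool:
    """Return True if at least `needed` datapoints in the window exceed the threshold."""
    threshold = LATENCY_THRESHOLDS[metric]
    breaching = sum(1 for value in datapoints_seconds if value > threshold)
    return breaching >= needed


# Five one-minute samples (seconds); three of them exceed 30 ms.
write_window = [0.031, 0.05, 0.01, 0.04, 0.02]
print(breaches("WriteLatency", write_window))
```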

All the key metrics to monitor are summarized in the table below:

Metric	Alert Condition
CPUUtilization	> 75% for 5 datapoints within 5 minutes
DatabaseConnections	>= 115 for 5 datapoints within 5 minutes
FreeableMemory	<= 0.25GB for 3 datapoints within 5 minutes
FreeLocalStorage	< 5GB for 3 datapoints within 3 minutes
WriteLatency	> 30ms for 3 datapoints within 5 minutes
ReadLatency	> 20ms for 3 datapoints within 5 minutes

Conclusion

With CloudWatch alarms, SNS, and a Lambda function, we now have a streamlined monitoring setup for the RDS instance. We get notified in Slack about potential issues before they escalate, which helps keep database performance smooth. If you're running RDS, setting up a similar monitoring system can save you from unexpected downtime and performance hiccups.

Survived this deep dive? Stay ahead: subscribe to the EzyInfra Knowledge Base for more DevOps wisdom.

EzyInfra.dev is a DevOps and infrastructure consulting company that helps clients set up cloud infrastructure (AWS, GCP), optimize cloud costs, and manage Kubernetes-based infrastructure. If you have any requirements or want a free consultation on your infrastructure or architecture, feel free to schedule a call here.

Want to discuss DevOps practices, infrastructure audits, or free consulting for your AWS cloud?

Prasanna would be glad to jump on a call.
