Why Every Client Should Adopt the AWS Well-Architected Framework for Cloud Architecture

The secret sauce behind resilient, high‑performance cloud builds.

Many businesses are moving to AWS for its flexibility, speed, and cost benefits. But with so many options, it’s easy to feel overwhelmed or miss important details.

That’s where the AWS Well-Architected Framework (WAFR) helps. It gives a clear set of best practices to build secure, reliable, and cost-effective systems. We use it to help our clients build smarter, not just bigger.

Before vs After: Why the Well-Architected Framework (WAFR) Mattered

Let’s imagine you run a fast-growing SaaS startup called FBasket, a grocery delivery platform operating across India.

Your dev team is shipping features fast, and your AWS bill hit $35,000 last month. Suddenly, your CTO is being grilled in a board meeting:

  • "Why are we spending so much?"

  • "Are we secure enough to onboard enterprise clients?"

  • "What’s our backup plan if the Mumbai region goes down?"

You realize your cloud infrastructure is a bit like a messy kitchen: everything works, but no one knows where the ingredients are, some appliances are always on (even when unused), and you're winging it every night.

When Reality Hits Hard: What Our Audit Found in the Current Architecture

| Problem | Real-World Analogy | What It Looked Like |
|---|---|---|
| No clear tagging policy | Food jars with no labels | Developers couldn't tell who launched which EC2 instance, leading to orphaned resources |
| Overly permissive IAM roles | Everyone has keys to the safe | Interns had access to production S3 buckets (Scary) |
| No cost governance | Grocery bills piling up | Idle RDS instances, underutilized EC2 instances, and high data transfer charges (the backbone of humongous bills) |
| No resilience (HA) plan | One kitchen, no backup stove | If Mumbai went down, so did your whole app (Humpty Dumpty had a great fall) |
| Manual provisioning | Chopping veggies from scratch every night | Every infra change was made by hand, leading to errors and no audit trail (Worrisome) |

But we had not lost hope. They had put their faith in us, and we were determined to get them out of this mess.

Over about four weeks, we used the AWS Well-Architected Framework to help them identify the mistakes that were driving up the bill.

| Solution | What We Did | Outcome |
|---|---|---|
| Implemented tagging enforcement | AWS Config + Terraform to enforce Owner, Project, Env tags | Clean billing reports, ownership clarity |
| IAM least privilege model | Scoped IAM roles, MFA, and audit logging | No more over-permissioned users, improved security posture |
| Introduced auto-scheduling | Scheduled dev EC2 and RDS to stop after 10 PM | Possible saving of ₹35,000/month |
| Set up cross-region backup | Deployed cross-region S3 replication + RDS snapshots | Reduced RTO to <15 mins |
| IaC with Terraform | Replaced manual setup with Terraform modules and GitOps via Atlantis | Reliable, repeatable infra with audit trails |
| Resizing instances | Moved EC2 workloads from Intel-based instances to Graviton-based instances | Cut EC2 costs by at least 18% compared with x86-based instances |

These changes calmed the chaos, left the team with a cost-effective and maintainable setup, and settled the nerves of senior management.

The Pillars of the WAFR, and Why They Matter

Operational Excellence

Organize small teams around individual modules of the business, so that each team owns one area and the pieces add up to a more sustainable and efficient business outcome.

Use observability to get a top-level view of what is going on, so you can make informed decisions and act promptly when business outcomes are at risk.

Reduce the operational burden by using AWS managed services wherever possible.

Security

This pillar covers the ability to protect data, systems, and assets, and to take advantage of AWS features to improve security.

Follow the principle of least privilege for all users, so that nobody has access to what they don't need. A counterexample is interns having access to prod S3 buckets.
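As a minimal sketch of what least privilege can look like in Terraform, the policy below grants read-only access to a single non-production bucket and nothing else. The bucket name `fbasket-dev-assets` and the policy name are hypothetical and used only for illustration; this is not the exact policy from the engagement.

```hcl
# Read-only access to one dev bucket (bucket name is a hypothetical example).
data "aws_iam_policy_document" "intern_dev_read" {
  statement {
    sid       = "ListDevBucket"
    effect    = "Allow"
    actions   = ["s3:ListBucket"]
    resources = ["arn:aws:s3:::fbasket-dev-assets"]
  }

  statement {
    sid       = "ReadDevObjects"
    effect    = "Allow"
    actions   = ["s3:GetObject"]
    resources = ["arn:aws:s3:::fbasket-dev-assets/*"]
  }
}

# Managed policy that can be attached to an intern group or role.
resource "aws_iam_policy" "intern_dev_read" {
  name   = "intern-dev-s3-read"
  policy = data.aws_iam_policy_document.intern_dev_read.json
}
```

Because IAM denies by default, anything not listed here, including every production bucket, stays out of reach for principals that carry only this policy.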

Tag every resource consistently, so you can track who created it and who is responsible for maintaining it.
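One low-effort way to get baseline tags onto everything Terraform creates is the AWS provider's `default_tags` block. The sketch below assumes the Mumbai region (`ap-south-1`) and uses illustrative tag values; only the tag keys come from the tagging policy described earlier.

```hcl
provider "aws" {
  region = "ap-south-1" # Mumbai; adjust to your region

  # Applied to every taggable resource this provider creates.
  default_tags {
    tags = {
      Owner       = "platform-team" # illustrative value
      Project     = "fbasket"       # illustrative value
      Environment = "dev"           # illustrative value
    }
  }
}
```

Tags set on an individual resource still override these defaults, so teams can keep per-resource ownership where it differs.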

Reliability

This pillar deals with the ability of a workload to perform its intended function correctly and consistently when it is expected to.

Prefer multiple smaller resources over one large resource, to reduce the impact of a single point of failure (SPOF).

Test your disaster recovery strategy by simulating a failover, either one that has happened before or one that could happen in a worst-case scenario. This exposes the weak spots in the DR plan so they can be fixed before a real outage.

Cost Optimization

This pillar deals with the ability to run systems that deliver business value at the lowest price point.

Adopt a consumption model that matches your usage. If a workload is going to run for a long time, commit to a longer-term Savings Plan to reduce costs.

Schedule non-production resources to go offline on weekends and during off-hours on weekdays, so you are not paying for capacity nobody is using.
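The off-hours schedule described above can be sketched with EventBridge Scheduler. This example stops a set of dev instances every night at 10 PM India time. The variable `dev_instance_ids`, the resource names, and the wildcard `Resource` in the policy are assumptions made for illustration; narrow the policy to specific instance ARNs in real use.

```hcl
# Assumption: IDs of the dev instances to stop, supplied by the caller.
variable "dev_instance_ids" {
  type = list(string)
}

# Role that EventBridge Scheduler assumes to call the EC2 API.
resource "aws_iam_role" "scheduler_stop" {
  name = "scheduler-stop-dev-ec2"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "scheduler.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

resource "aws_iam_role_policy" "scheduler_stop" {
  name = "allow-stop-instances"
  role = aws_iam_role.scheduler_stop.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = "ec2:StopInstances"
      Resource = "*" # illustrative; restrict to instance ARNs in real use
    }]
  })
}

resource "aws_scheduler_schedule" "stop_dev_nightly" {
  name       = "stop-dev-nightly"
  group_name = "default"

  flexible_time_window {
    mode = "OFF"
  }

  # Every day at 22:00 in the timezone below
  schedule_expression          = "cron(0 22 * * ? *)"
  schedule_expression_timezone = "Asia/Kolkata"

  target {
    arn      = "arn:aws:scheduler:::aws-sdk:ec2:stopInstances"
    role_arn = aws_iam_role.scheduler_stop.arn

    input = jsonencode({
      InstanceIds = var.dev_instance_ids
    })
  }
}
```

A matching schedule that calls `startInstances` in the morning brings the environment back before the team logs in.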

Sustainability

This pillar focuses on the environmental impact of your workloads, especially energy consumption and efficiency.

Right-sizing instances plays a critical role in meeting sustainability goals: by minimizing idle resources, processing, and storage, you reduce the total energy required to power your workload.

Use managed services to minimize the impact.

For example, use S3 lifecycle rules to move data to the Infrequent Access tier, if that fits within your budget constraints.
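A minimal sketch of such a lifecycle rule in Terraform is shown below. The variable `bucket_name` and the 30-day threshold are assumptions chosen for illustration; pick a threshold that matches how your data is actually accessed.

```hcl
# Assumption: name of an existing bucket, supplied by the caller.
variable "bucket_name" {
  type = string
}

resource "aws_s3_bucket_lifecycle_configuration" "tiering" {
  bucket = var.bucket_name

  rule {
    id     = "move-to-infrequent-access"
    status = "Enabled"

    # Empty filter: the rule applies to every object in the bucket.
    filter {}

    # After 30 days, transition objects to the Standard-IA storage class.
    transition {
      days          = 30
      storage_class = "STANDARD_IA"
    }
  }
}
```

Standard-IA charges for retrieval, so this tends to pay off only for data that is rarely read after the first month.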

Performance Efficiency

This pillar covers the ability to use cloud resources efficiently to meet performance requirements.

Use serverless architectures where they fit, removing the need to run and maintain servers for short-lived processes.

Deploy workloads across multiple AWS Regions to give customers lower latency and a better experience at minimal cost.

Terraform to the Rescue: Snippets We Used

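Prerequisites: Terraform with the AWS provider configured. The snippets also reference resources defined elsewhere, such as aws_iam_role.my_role and aws_instance.my_ec2.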

Terraform to Configure AWS to Periodically Start an EC2 Instance


resource "aws_scheduler_schedule" "my_scheduler" {
  name       = "my-scheduler"
  group_name = "default"

  flexible_time_window {
    mode = "OFF"
  }

  # Run every Monday at 09:00 (UTC unless a timezone is set)
  schedule_expression = "cron(0 9 ? * MON *)"

  target {
    # Send the event to the EC2 API and trigger the startInstances action
    arn      = "arn:aws:scheduler:::aws-sdk:ec2:startInstances"
    # Role assumed by the scheduler; it must be allowed to call ec2:StartInstances
    role_arn = aws_iam_role.my_role.arn

    # This block is passed as the request body to the startInstances API
    input = jsonencode({
      InstanceIds = [
        aws_instance.my_ec2.id
      ]
    })
  }
}

Terraform to Configure and Enforce the Tags on the Created Resources

Create the SNS Topic:

After applying, confirm the subscription from the email that SNS sends to the address below. Alerts are not delivered until the subscription is confirmed.

resource "aws_sns_topic" "config_alerts" {
  name = "config-noncompliance-alerts"
}

resource "aws_sns_topic_subscription" "email_alert" {
  topic_arn = aws_sns_topic.config_alerts.arn
  protocol  = "email"
  endpoint  = "alerts@example.com" # Placeholder; replace with your email
}


Add the AWS Config Rule

resource "aws_config_config_rule" "required_tags" {
  name = "required-tags"

  source {
    owner             = "AWS"
    source_identifier = "REQUIRED_TAGS"
  }

  # The managed rule expects lower-camel-case parameter names (tag1Key, tag2Key, ...)
  input_parameters = jsonencode({
    tag1Key = "Owner"
    tag2Key = "Project"
    tag3Key = "Environment"
  })

  # Only evaluate EC2 instances
  scope {
    compliance_resource_types = ["AWS::EC2::Instance"]
  }

  depends_on = [aws_sns_topic_subscription.email_alert]
}


CloudWatch Event Rule to Detect Non-Compliance

resource "aws_cloudwatch_event_rule" "config_noncompliant" {
  name        = "config-rule-noncompliance"
  description = "Trigger when AWS Config rule becomes NON_COMPLIANT"

  event_pattern = jsonencode({
    source        = ["aws.config"]
    "detail-type" = ["Config Rules Compliance Change"]
    detail = {
      newEvaluationResult = {
        complianceType = ["NON_COMPLIANT"]
      }
      configRuleName = [aws_config_config_rule.required_tags.name]
    }
  })
}

Add the CloudWatch Event Target

resource "aws_cloudwatch_event_target" "sns_target" {
  rule      = aws_cloudwatch_event_rule.config_noncompliant.name
  target_id = "send-to-sns"
  arn       = aws_sns_topic.config_alerts.arn
}
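One piece that is easy to miss: EventBridge can only publish to the topic if the topic's resource policy allows it. The console adds this permission for you, but Terraform does not. A minimal sketch, reusing the `aws_sns_topic.config_alerts` topic defined above (the data source and policy resource names are illustrative):

```hcl
# Allow the EventBridge service to publish to the alert topic.
data "aws_iam_policy_document" "allow_eventbridge_publish" {
  statement {
    effect    = "Allow"
    actions   = ["sns:Publish"]
    resources = [aws_sns_topic.config_alerts.arn]

    principals {
      type        = "Service"
      identifiers = ["events.amazonaws.com"]
    }
  }
}

resource "aws_sns_topic_policy" "config_alerts" {
  arn    = aws_sns_topic.config_alerts.arn
  policy = data.aws_iam_policy_document.allow_eventbridge_publish.json
}
```

Without a policy like this, the rule can still match non-compliance events while the email never arrives.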



Terraform Script for AWS Monthly Budgets

# Set up a monthly cost budget using AWS Budgets
resource "aws_budgets_budget" "monthly_budget" {
  name         = "monthly-cost-budget"
  budget_type  = "COST"
  limit_amount = "2000"
  limit_unit   = "INR"
  time_unit    = "MONTHLY"

  # Which cost categories count toward the budget
  cost_types {
    include_credit             = true
    include_discount           = true
    include_other_subscription = true
    include_recurring          = true
    include_refund             = true
    include_subscription       = true
    include_support            = true
    include_tax                = true
    include_upfront            = true
  }

  # Notify when actual spend exceeds 90% of the limit
  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 90
    threshold_type             = "PERCENTAGE"
    notification_type          = "ACTUAL"
    subscriber_email_addresses = ["alerts@example.com"] # Placeholder; must be a list of addresses
  }
}

Conclusion

The AWS Well-Architected Framework gave us a clear path to build smarter, run safer, and spend better. It’s like having a blueprint that grows with your business.

If you're building on AWS, don’t just build; build it well.

