Why Every Client Should Adopt the AWS Well-Architected Framework for Cloud Architecture

Many businesses are moving to AWS for its flexibility, speed, and cost benefits. But with so many options, it’s easy to feel overwhelmed or miss important details.

That’s where the AWS Well-Architected Framework (WAFR) helps. It gives a clear set of best practices to build secure, reliable, and cost-effective systems. We use it to help our clients build smarter, not just bigger.

Before vs After: Why the Well-Architected Framework (WAFR) Mattered

Let’s imagine you run a fast-growing SaaS startup called FBasket, a grocery delivery platform operating across India.

Your dev team is pushing features fast, and your AWS bill hit $35,000 last month. Suddenly, your CTO is being grilled in a board meeting:

"Why are we spending so much?"
"Are we secure enough to onboard enterprise clients?"
"What’s our backup plan if the Mumbai region goes down?"

You realize your cloud infrastructure is a bit like a messy kitchen: everything works, but no one knows where the ingredients are, some appliances are always on (even when unused), and you're winging it every night.

When Reality Hit Hard: Auditing the Current Architecture

Problem	Real-World Analogy	What It Looked Like
No clear tagging policy	Food jars with no labels	Developers couldn't tell who launched which EC2 instance, leading to orphaned resources
Overly permissive IAM roles	Everyone has keys to the safe	Interns had access to production S3 buckets (Scary)
No cost governance	Grocery bills piling up	Idle RDS instances, underutilized EC2 instances, and high data transfer charges (the main drivers of the ballooning bill)
No resilience (HA/DR) plan	One kitchen, no backup stove	If the Mumbai region went down, so did your whole app (Humpty Dumpty had a great fall)
Still provisioning manually	Chopping veggies every night from scratch	Every infra change was made by hand, leading to errors and no audit trail (Worrisome)

But all was not lost. The client had put their faith in us, and we were determined to help them out of this mess.

Over about four weeks, we used the AWS Well-Architected Framework to help them identify and fix these problems, several of which were driving up their bill.

Solution	What We Did	Outcome
Implemented tagging enforcement	AWS Config + Terraform to enforce `Owner`, `Project`, `Environment` tags	Clean billing reports, ownership clarity
Least-privilege IAM model	Scoped IAM roles, MFA, and audit logging	No more over-permissioned users, improved security posture
Introduced auto-scheduling	Scheduled dev EC2 and RDS instances to stop after 10 PM	Potential savings of ₹35,000/month
Set up cross-region backup	Deployed cross-region S3 replication and RDS snapshot copies	Reduced RTO to under 15 minutes
IaC with Terraform	Replaced manual setup with Terraform modules and GitOps via Atlantis	Reliable, repeatable infra with audit trails
Moved to Graviton instances	Migrated EC2 workloads from Intel (x86) instances to AWS Graviton (Arm) instances	Cut the EC2 bill by at least 18% compared with the x86 instances

These changes brought order to the chaos and left the team with a cost-effective, maintainable setup. That went a long way toward calming senior management's nerves.

The Six Pillars of the WAFR, and Why They Matter

Operational Excellence

Organize small teams around modules, with each team owning one area of the business. Their combined work then adds up to a more sustainable and efficient business outcome.

For a top-down view of what's going on, invest in observability. It lets you make informed decisions and act promptly when business outcomes are at risk.
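Here is a minimal sketch of observability in Terraform. The alarm below watches one EC2 instance's CPU and notifies an SNS topic. The variable names, the 80% threshold, and the three 5-minute evaluation periods are illustrative assumptions, not values from the FBasket engagement.

```hcl
# Hypothetical inputs: the instance to watch and an existing SNS topic for alerts.
variable "instance_id" {
  type        = string
  description = "ID of the EC2 instance to monitor (placeholder)"
}

variable "alerts_topic_arn" {
  type        = string
  description = "ARN of an existing SNS topic that receives alarm notifications (placeholder)"
}

# Fire when average CPU stays above 80% for three consecutive 5-minute periods.
resource "aws_cloudwatch_metric_alarm" "high_cpu" {
  alarm_name          = "ec2-high-cpu"
  namespace           = "AWS/EC2"
  metric_name         = "CPUUtilization"
  statistic           = "Average"
  period              = 300
  evaluation_periods  = 3
  threshold           = 80
  comparison_operator = "GreaterThanThreshold"

  dimensions = {
    InstanceId = var.instance_id
  }

  # Notify the team both when the alarm fires and when it recovers.
  alarm_actions = [var.alerts_topic_arn]
  ok_actions    = [var.alerts_topic_arn]
}
```

The same pattern extends to business-level metrics, such as failed checkouts, if the application publishes them as custom CloudWatch metrics.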

Reduce the operational burden by using AWS managed services wherever possible.
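Here is a sketch of what "managed" buys you, using Amazon RDS. With a few arguments, AWS takes over backups, standby failover, minor-version patching, and storage of the master password. The identifier, engine, instance class, and subnet-group variable are hypothetical placeholders.

```hcl
variable "db_subnet_group_name" {
  type        = string
  description = "Existing DB subnet group spanning at least two AZs (placeholder)"
}

resource "aws_db_instance" "app" {
  identifier        = "app-db" # hypothetical name
  engine            = "postgres"
  instance_class    = "db.t4g.micro"
  allocated_storage = 20

  username                    = "app_admin"
  manage_master_user_password = true # RDS keeps the master password in AWS Secrets Manager

  # Operational tasks handed over to AWS:
  multi_az                   = true # standby in a second AZ with automatic failover
  backup_retention_period    = 7    # automated backups retained for 7 days
  auto_minor_version_upgrade = true # minor engine patches applied in the maintenance window

  db_subnet_group_name      = var.db_subnet_group_name
  skip_final_snapshot       = false
  final_snapshot_identifier = "app-db-final"
}
```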

Security

This pillar covers protecting data, systems, and assets, and taking advantage of cloud technologies to improve your security.

Follow the principle of least privilege for every user: nobody should have access to anything they don't need. For example, interns should not have access to production S3 buckets.
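A least-privilege policy can be expressed with the `aws_iam_policy_document` data source. This sketch grants a hypothetical intern IAM group read-only access to one development bucket and nothing else. Because IAM denies by default, production buckets stay out of reach unless some other policy grants access. The bucket and group names are assumptions.

```hcl
variable "dev_bucket_name" {
  type        = string
  description = "Development bucket interns may read (placeholder)"
}

variable "intern_group_name" {
  type        = string
  description = "Existing IAM group for interns (placeholder)"
}

data "aws_iam_policy_document" "intern_read_dev" {
  # Listing is granted on the bucket itself
  statement {
    sid       = "ListDevBucket"
    effect    = "Allow"
    actions   = ["s3:ListBucket"]
    resources = ["arn:aws:s3:::${var.dev_bucket_name}"]
  }

  # Reading is granted on the objects inside it
  statement {
    sid       = "ReadDevObjects"
    effect    = "Allow"
    actions   = ["s3:GetObject"]
    resources = ["arn:aws:s3:::${var.dev_bucket_name}/*"]
  }
}

resource "aws_iam_policy" "intern_read_dev" {
  name   = "intern-read-dev-bucket"
  policy = data.aws_iam_policy_document.intern_read_dev.json
}

resource "aws_iam_group_policy_attachment" "interns_read_dev" {
  group      = var.intern_group_name
  policy_arn = aws_iam_policy.intern_read_dev.arn
}
```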

Tag all resources consistently so you can track who created each resource and who is responsible for maintaining it.
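One low-effort way to get consistent tags from Terraform is the AWS provider's `default_tags` block. It adds the listed tags to every taggable resource the provider creates. The tag values below are hypothetical; the keys match the ones the AWS Config rule later in this post checks for.

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.0"
    }
  }
}

provider "aws" {
  region = "ap-south-1" # Mumbai

  # Added to every taggable resource this provider creates
  default_tags {
    tags = {
      Owner       = "platform-team" # hypothetical values
      Project     = "fbasket"
      Environment = "dev"
    }
  }
}
```

Resources can still set their own `tags`. A resource-level tag with the same key overrides the default.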

Reliability

This pillar covers a workload's ability to perform its intended function correctly and consistently when it's expected to.

Prefer several smaller resources over one large resource. A single failure then takes out only part of your capacity instead of all of it, which avoids a single point of failure (SPOF).
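For example, an Auto Scaling group spread across subnets in several Availability Zones runs a few small instances instead of one large one. If an instance or even a whole AZ fails, the rest keep serving while the group launches replacements. The subnet list, launch template, and group sizes are assumptions for this sketch.

```hcl
variable "private_subnet_ids" {
  type        = list(string)
  description = "Subnets in at least two Availability Zones (placeholder)"
}

variable "launch_template_id" {
  type        = string
  description = "Launch template for the app servers (placeholder)"
}

resource "aws_autoscaling_group" "app" {
  name             = "app-asg"
  min_size         = 2 # never fewer than two instances
  max_size         = 6
  desired_capacity = 3

  # Spreading across AZs means one AZ outage removes only part of the capacity
  vpc_zone_identifier = var.private_subnet_ids
  health_check_type   = "EC2"

  launch_template {
    id      = var.launch_template_id
    version = "$Latest"
  }
}
```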

Test your disaster recovery (DR) strategy by simulating failover scenarios, covering both ones that have happened before and ones that could happen in a worst case. Drills like this expose the weak spots in your DR plan so you can fix them before a real incident.
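For smaller, repeatable drills, you can drive AWS Fault Injection Service (FIS) from Terraform. This sketch defines an experiment that stops one of the instances tagged `Environment = staging`, so you can check that the workload keeps serving. It is not a full Region failover test. The role ARN variable and tag value are assumptions, and the role must allow FIS to stop EC2 instances.

```hcl
variable "fis_role_arn" {
  type        = string
  description = "IAM role that FIS assumes to stop instances (placeholder)"
}

resource "aws_fis_experiment_template" "stop_one_app_instance" {
  description = "DR drill: stop one tagged app instance and verify the workload keeps serving"
  role_arn    = var.fis_role_arn

  # No automatic stop guard in this sketch; real drills should stop on a CloudWatch alarm
  stop_condition {
    source = "none"
  }

  action {
    name      = "stop-app-instance"
    action_id = "aws:ec2:stop-instances"

    target {
      key   = "Instances"
      value = "app-instances"
    }
  }

  # Pick one instance among those carrying the staging tag
  target {
    name           = "app-instances"
    resource_type  = "aws:ec2:instance"
    selection_mode = "COUNT(1)"

    resource_tag {
      key   = "Environment"
      value = "staging"
    }
  }
}
```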

Cost Optimization

This pillar covers running systems that deliver business value at the lowest price point.

Adopt a consumption model and pay only for what you use. For workloads you expect to run for a long time, commit to a Savings Plan (1 or 3 years) to cut costs substantially.

Schedule resources to go offline on weekends and outside weekday working hours so you don't pay for idle capacity.
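Here is a minimal sketch of off-hours scheduling. This EventBridge Scheduler schedule calls the EC2 `StopInstances` API at 22:00 IST every weekday, in line with the "stop after 10 PM" schedule described for FBasket. The Friday-night stop also keeps the instance off over the weekend, so the sketch assumes a separate start schedule on weekday mornings. The instance ID and role ARN variables are hypothetical, and the role must allow `ec2:StopInstances`.

```hcl
variable "dev_instance_id" {
  type        = string
  description = "Dev EC2 instance to stop at night (placeholder)"
}

variable "scheduler_role_arn" {
  type        = string
  description = "Role EventBridge Scheduler assumes; must allow ec2:StopInstances (placeholder)"
}

resource "aws_scheduler_schedule" "stop_dev_nightly" {
  name       = "stop-dev-ec2-nightly"
  group_name = "default"

  flexible_time_window {
    mode = "OFF"
  }

  # 22:00 Monday to Friday, evaluated in India Standard Time instead of the UTC default
  schedule_expression          = "cron(0 22 ? * MON-FRI *)"
  schedule_expression_timezone = "Asia/Kolkata"

  target {
    # Universal target: call the EC2 StopInstances API directly
    arn      = "arn:aws:scheduler:::aws-sdk:ec2:stopInstances"
    role_arn = var.scheduler_role_arn

    input = jsonencode({
      InstanceIds = [var.dev_instance_id]
    })
  }
}
```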

Sustainability

This pillar focuses on environmental impact, especially energy consumption and efficiency.

Right-sizing instances plays a critical role in meeting sustainability goals. Minimizing idle resources, processing, and storage reduces the total energy required to power your workload.
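One way to right-size with efficiency in mind is to move to AWS Graviton (Arm) instances, as described for FBasket above. This sketch builds a launch template from the latest Amazon Linux 2023 arm64 AMI, looked up through AWS's public SSM parameter. The `t4g.medium` size is a placeholder: choose the size from observed utilization, and confirm your software runs on Arm before switching.

```hcl
# Latest Amazon Linux 2023 AMI for arm64, published by AWS as a public SSM parameter
data "aws_ssm_parameter" "al2023_arm64" {
  name = "/aws/service/ami-amazon-linux-latest/al2023-ami-kernel-default-arm64"
}

resource "aws_launch_template" "app_graviton" {
  name_prefix   = "app-graviton-"
  image_id      = data.aws_ssm_parameter.al2023_arm64.value
  instance_type = "t4g.medium" # hypothetical size; pick it from measured CPU and memory use
}
```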

Use managed services where possible to minimize your environmental impact.

For example, use S3 Lifecycle rules to move infrequently accessed data to the S3 Standard-Infrequent Access (Standard-IA) storage class, when your access patterns and budget allow.
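Here is a minimal sketch of such a lifecycle rule. It assumes an existing bucket whose name is passed in, plus a hypothetical `logs/` prefix. S3 requires objects to be at least 30 days old before they can transition to Standard-IA, so the rule uses 30 days.

```hcl
variable "bucket_name" {
  type        = string
  description = "Existing bucket to apply the rule to (placeholder)"
}

resource "aws_s3_bucket_lifecycle_configuration" "logs_to_ia" {
  bucket = var.bucket_name

  rule {
    id     = "logs-to-standard-ia"
    status = "Enabled"

    # Only objects under this (hypothetical) prefix are affected
    filter {
      prefix = "logs/"
    }

    # Move objects to Standard-IA once they are 30 days old
    transition {
      days          = 30
      storage_class = "STANDARD_IA"
    }
  }
}
```

Standard-IA adds a per-GB retrieval charge, so it pays off only for data that is rarely read.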

Performance Efficiency

This pillar covers using cloud resources efficiently to meet performance requirements.

Use serverless architectures for short-lived processes. This removes the need to run and maintain servers for them.
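Here is a sketch of a short-lived job on AWS Lambda. It zips a hypothetical `handler.py` (which must define `lambda_handler`) with the `archive_file` data source, then deploys it with a minimal execution role. The function name, runtime, timeout, and memory size are assumptions.

```hcl
# Package the (hypothetical) handler file into a zip for Lambda
data "archive_file" "job" {
  type        = "zip"
  source_file = "${path.module}/handler.py"
  output_path = "${path.module}/build/job.zip"
}

# Execution role that the Lambda service assumes
resource "aws_iam_role" "job" {
  name = "short-job-lambda-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "lambda.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

# Lets the function write its logs to CloudWatch Logs
resource "aws_iam_role_policy_attachment" "job_logs" {
  role       = aws_iam_role.job.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
}

resource "aws_lambda_function" "job" {
  function_name    = "short-lived-job"
  role             = aws_iam_role.job.arn
  runtime          = "python3.12"
  handler          = "handler.lambda_handler"
  filename         = data.archive_file.job.output_path
  source_code_hash = data.archive_file.job.output_base64sha256 # redeploy when the code changes
  timeout          = 60
  memory_size      = 256
}
```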

Go global by running workloads in AWS Regions close to your customers, for lower latency and a better customer experience at minimal cost.
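Terraform's provider aliases make it straightforward to deploy the same stack in more than one Region. In this sketch, a hypothetical `./modules/app` module is instantiated in Mumbai and in Singapore; the second Region is an assumption.

```hcl
# Default provider: Mumbai
provider "aws" {
  region = "ap-south-1"
}

# Aliased provider for a second Region (hypothetical choice)
provider "aws" {
  alias  = "singapore"
  region = "ap-southeast-1"
}

# Same (hypothetical) module, once per Region
module "app_mumbai" {
  source = "./modules/app"
}

module "app_singapore" {
  source = "./modules/app"

  providers = {
    aws = aws.singapore
  }
}
```

Routing each user to the nearest Region, for example with Route 53 latency-based routing or CloudFront, is a separate step not shown here.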

Terraform to the Rescue: Snippets We Used

Prerequisites:

A basic understanding of Terraform
An AWS provider already configured in Terraform
Refer to the repository for a detailed explanation

Terraform to Configure AWS to Periodically Start an EC2 Instance


resource "aws_scheduler_schedule" "my_scheduler" {
  name       = "my-scheduler"
  group_name = "default"

  flexible_time_window {
    mode = "OFF"
  }

  # Run it each Monday
  schedule_expression = "cron(0 9 ? * MON *)"

  target {
    # Send the event to the EC2 API and trigger the StartInstances action
    arn      = "arn:aws:scheduler:::aws-sdk:ec2:startInstances"
    role_arn = aws_iam_role.my_role.arn

    # This object is passed to the StartInstances API as its request parameters
    input = jsonencode({
      InstanceIds = [
        aws_instance.my_ec2.id
      ]
    })
  }
}
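The schedule above references `aws_iam_role.my_role` without defining it. Below is a minimal sketch of that role, assuming `aws_instance.my_ec2` is defined elsewhere in your configuration. EventBridge Scheduler assumes the role, and the role may only start that one instance.

```hcl
# Role that EventBridge Scheduler assumes when the schedule fires
resource "aws_iam_role" "my_role" {
  name = "scheduler-start-ec2"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "scheduler.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

# Allow starting only the instance the schedule targets
resource "aws_iam_role_policy" "start_my_ec2" {
  name = "start-my-ec2"
  role = aws_iam_role.my_role.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["ec2:StartInstances"]
      Resource = aws_instance.my_ec2.arn
    }]
  })
}
```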

Terraform to Configure and Enforce the Tags on the Created Resources

Create the SNS Topic:

Note: AWS sends a confirmation email to this address, and alerts are delivered only after you confirm the subscription.

resource "aws_sns_topic" "config_alerts" {
  name = "config-noncompliance-alerts"
}

resource "aws_sns_topic_subscription" "email_alert" {
  topic_arn = aws_sns_topic.config_alerts.arn
  protocol  = "email"
  endpoint  = "alerts@example.com" # Replace with your email
}

Add the AWS Config Rule

resource "aws_config_config_rule" "required_tags" {
  name = "required-tags"
  source {
    owner             = "AWS"
    source_identifier = "REQUIRED_TAGS"
  }

  input_parameters = jsonencode({
    Tag1Key = "Owner"
    Tag2Key = "Project"
    Tag3Key = "Environment"
  })
  scope {
    compliance_resource_types = ["AWS::EC2::Instance"]
  }
  depends_on = [aws_sns_topic_subscription.email_alert]
}
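AWS Config rules only evaluate resources once a configuration recorder is running in the account and Region. If Config isn't already enabled, the sketch below turns it on. It assumes an existing S3 bucket whose bucket policy already lets AWS Config write to it; the `config_bucket_name` variable is hypothetical.

```hcl
variable "config_bucket_name" {
  type        = string
  description = "Existing S3 bucket that receives AWS Config snapshots (placeholder)"
}

# Role that AWS Config assumes to read resource configurations
resource "aws_iam_role" "config" {
  name = "aws-config-recorder"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "config.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

resource "aws_iam_role_policy_attachment" "config" {
  role       = aws_iam_role.config.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWS_ConfigRole"
}

resource "aws_config_configuration_recorder" "main" {
  name     = "default"
  role_arn = aws_iam_role.config.arn

  recording_group {
    all_supported = true
  }
}

resource "aws_config_delivery_channel" "main" {
  name           = "default"
  s3_bucket_name = var.config_bucket_name
  depends_on     = [aws_config_configuration_recorder.main]
}

# Recording only starts once the recorder is switched on
resource "aws_config_configuration_recorder_status" "main" {
  name       = aws_config_configuration_recorder.main.name
  is_enabled = true
  depends_on = [aws_config_delivery_channel.main]
}
```

With this in place, the `required_tags` rule should also depend on the recorder status resource so that Terraform creates the recorder first.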

CloudWatch Event Rule to Detect Non-Compliance

resource "aws_cloudwatch_event_rule" "config_noncompliant" {
  name        = "config-rule-noncompliance"
  description = "Trigger when AWS Config rule becomes NON_COMPLIANT"

  event_pattern = jsonencode({
    source       = ["aws.config"]
    "detail-type": ["Config Rules Compliance Change"]
    detail       = {
      newEvaluationResult = {
        complianceType = ["NON_COMPLIANT"]
      }
      configRuleName = [aws_config_config_rule.required_tags.name]
    }
  })
}

Add the CloudWatch Event Target

resource "aws_cloudwatch_event_target" "sns_target" {
  rule      = aws_cloudwatch_event_rule.config_noncompliant.name
  target_id = "send-to-sns"
  arn       = aws_sns_topic.config_alerts.arn
}
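EventBridge can deliver to the SNS topic only if the topic's access policy allows it. Below is a sketch of that policy. It reuses the topic and rule defined above and uses the `aws:SourceArn` condition so that only this rule can publish.

```hcl
data "aws_iam_policy_document" "config_alerts_publish" {
  statement {
    sid     = "AllowEventBridgePublish"
    effect  = "Allow"
    actions = ["sns:Publish"]

    principals {
      type        = "Service"
      identifiers = ["events.amazonaws.com"]
    }

    resources = [aws_sns_topic.config_alerts.arn]

    # Only the non-compliance rule may publish to this topic
    condition {
      test     = "ArnEquals"
      variable = "aws:SourceArn"
      values   = [aws_cloudwatch_event_rule.config_noncompliant.arn]
    }
  }
}

resource "aws_sns_topic_policy" "config_alerts" {
  arn    = aws_sns_topic.config_alerts.arn
  policy = data.aws_iam_policy_document.config_alerts_publish.json
}
```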

Terraform Script for AWS Monthly Budgets

# Setting up a cost budget in AWS using Terraform

# Define a cost budget using AWS Budgets
resource "aws_budgets_budget" "monthly_budget" {
  name         = "monthly-cost-budget" 
  budget_type  = "COST"               
  limit_amount = "2000"                
  limit_unit   = "INR"                 
  time_unit    = "MONTHLY"            

  # Choose which cost types count toward the budget
  cost_types {
    include_credit             = true 
    include_discount           = true 
    include_other_subscription = true 
    include_recurring          = true 
    include_refund             = true 
    include_subscription       = true 
    include_support            = true 
    include_tax                = true 
    include_upfront            = true 
  }

  # Define the notification settings
  notification {
    comparison_operator        = "GREATER_THAN" 
    threshold                  = 90           
    threshold_type             = "PERCENTAGE"  
    notification_type          = "ACTUAL"      
    subscriber_email_addresses = ["alerts@example.com"] # a list; replace with your email
  }
}
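Here is a variation for organizations that use consolidated billing. The budget is scoped to a single linked account with `cost_filter`, and it alerts when the month's forecasted spend is on track to exceed the limit. The account ID variable, the 500 USD limit, and the email address are placeholders.

```hcl
variable "linked_account_id" {
  type        = string
  description = "12-digit ID of the member account to budget (placeholder)"
}

resource "aws_budgets_budget" "dev_account_budget" {
  name         = "dev-account-monthly-budget"
  budget_type  = "COST"
  limit_amount = "500" # hypothetical limit
  limit_unit   = "USD"
  time_unit    = "MONTHLY"

  # Count only the costs of one linked account
  cost_filter {
    name   = "LinkedAccount"
    values = [var.linked_account_id]
  }

  # Warn early: fire when the month's forecast exceeds 100% of the limit
  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 100
    threshold_type             = "PERCENTAGE"
    notification_type          = "FORECASTED"
    subscriber_email_addresses = ["finops@example.com"]
  }
}
```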

Conclusion

The AWS Well-Architected Framework gave us a clear path to build smarter, run safer, and spend better. It’s like having a blueprint that grows with your business.

If you're building on AWS, don’t just build; build it well.

Want to secure your Terraform infrastructure?

Learn how tfsec, Checkov, and TFLint can significantly strengthen your security.

EzyInfra.dev – Expert DevOps & Infrastructure consulting! We help you set up, optimize, and manage cloud (AWS, GCP) and Kubernetes infrastructure—efficiently and cost-effectively. Need a strategy? Get a free consultation now!


Want to discuss DevOps practices, infrastructure audits, or free consulting for your AWS cloud?

Prasanna would be glad to jump into a call