How We Audited GCP Projects for Compliance

How one GCP audit unlocked security, scalability, and serious savings.

The Cloud Health Check No One Wants, But Everyone Needs

When Strongg (Aliased name) set out to acquire its subsidiary Fandomm (Aliased name), the question wasn’t just “Do they have a cloud setup?” It was “Is this cloud setup healthy enough to survive the real world?”

We weren’t here to nitpick code comments. We were here to see if the building blocks of their Google Cloud Platform (GCP) world were the right ones in the first place. Did they choose the right services and components for the job, or were they forcing a Ferrari engine into a bicycle frame?

Next, we asked: “Okay, they’ve got the right components, but are they running them at the right size?” This is where right-sizing comes in. A VM idling at 5% CPU is a parked sports car burning fuel.

From there, we zoomed out to the cost perspective: were they paying for power they didn’t need, or bleeding money through idle services and zombie resources?

Security was another pillar. If you’ve left the keys under the doormat, the fanciest lock in the world won’t help. We checked if Fandomm’s setup was truly locked down, with least privilege, encryption, and guarded entry points.

Scalability and reliability came next, because what’s the point of a perfect architecture if it crumbles under traffic spikes or can’t heal itself after failure?

Finally, we tested disaster recovery (DR), the “what if” scenarios.

  • What if the primary region vanished tomorrow?

  • Could they get back online without chaos?

  • Could they recover not just their data, but their business continuity?

This wasn’t just a checklist; it was a stress test of their cloud maturity, and the results? Well, that’s where things got interesting.


The Challenge

Strongg wanted an end-to-end architectural and backup audit of Fandomm’s GCP environment before the acquisition.

  • Cloud Provider: GCP

  • Projects: 40 total (13 production, 27 non-production)

  • Yearly Cloud Spend: approximately $50K

  • Audit Scope: Architectural Setup, Backups & Disaster Recovery

  • Excluded: Monitoring & alerting


Our Approach & Observations

Our architectural review followed 7 key principles:

  1. Right Components: Are they using the best-fit service for the job?

  2. Efficiency (Right-Sizing): Are they running at the right capacity?

  3. Cost Optimization: Are they overpaying due to poor configuration?

  4. Security: Are they following least privilege, encryption, and secure defaults?

  5. Scalability: Can it handle traffic spikes without manual intervention?

  6. Reliability: Is the architecture fault-tolerant?

  7. Disaster Recovery: Can it restore services quickly after failure?

Right-Sizing Issues:

We audited every resource’s size and utilization. Using monitoring data, we found many underutilized VMs and oversized disks. Industry studies have reported that only about 16% of cloud instances are properly sized, which suggests large savings are usually on the table.

For example, we saw several general-purpose VMs running at under 20% CPU. We recommended downsizing them or moving them into managed instance groups (MIGs) with autoscaling to match demand. (As one guide notes, AWS users could save up to 36% by right-sizing, and the same principle applies on GCP.)
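The utilization check described above can be sketched in a few lines. This is a minimal illustration, not the tooling we used: it assumes per-VM average and p95 CPU figures have already been exported (for example from Cloud Monitoring), the 20% average threshold mirrors the figure above, and the 50% p95 threshold and the "halve the vCPUs" heuristic are hypothetical choices.

```python
from dataclasses import dataclass

@dataclass
class VmUsage:
    name: str
    vcpus: int
    avg_cpu_pct: float  # mean CPU utilization over the lookback window, in percent
    p95_cpu_pct: float  # 95th-percentile CPU utilization, in percent

def rightsizing_candidates(vms, avg_threshold=20.0, peak_threshold=50.0):
    """Return (name, suggested_vcpus) for VMs that look oversized.

    A VM is flagged only when both its average and its p95 CPU stay low,
    so bursty workloads are not downsized blindly.
    """
    candidates = []
    for vm in vms:
        if vm.vcpus > 1 and vm.avg_cpu_pct < avg_threshold and vm.p95_cpu_pct < peak_threshold:
            # Hypothetical heuristic: halve the vCPU count as a first step, then re-measure.
            candidates.append((vm.name, vm.vcpus // 2))
    return candidates

fleet = [
    VmUsage("api-1", 8, 12.0, 35.0),    # idle most of the time: flagged
    VmUsage("batch-1", 4, 15.0, 90.0),  # low average but real peaks: kept
    VmUsage("web-1", 2, 60.0, 80.0),    # busy: kept
]
print(rightsizing_candidates(fleet))  # [('api-1', 4)]
```

Checking the p95 as well as the average is the design choice that matters here: a batch VM that averages 15% but peaks at 90% is not oversized, it is bursty, and is a better fit for autoscaling than for a smaller machine type.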

We also cleaned up “zombie” resources (orphaned disks, idle app servers, old test environments), since as much as 30% of cloud spend can hide in such waste. By reclaiming these resources and using committed use discounts for steady workloads, we made the cloud bill more cost-effective.
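A first pass at zombie hunting can work directly on inventory listings. The sketch below assumes the JSON shape produced by `gcloud compute disks list --format=json` and `gcloud compute instances list --format=json` (a disk's `users` field lists the instances it is attached to; a stopped VM reports `status` as `TERMINATED`). The sample records are hypothetical and trimmed to the fields the check reads.

```python
def find_zombies(disks, instances):
    """Flag unattached disks and stopped VMs as candidates for review."""
    # A disk with no "users" is attached to no VM, yet its storage is still billed.
    orphan_disks = [d["name"] for d in disks if not d.get("users")]
    # Compute Engine reports a stopped VM as TERMINATED; its disks keep costing money.
    stopped_vms = [i["name"] for i in instances if i.get("status") == "TERMINATED"]
    return {"orphan_disks": orphan_disks, "stopped_vms": stopped_vms}

# Hypothetical, trimmed inventory records
disks = [
    {"name": "old-test-disk", "sizeGb": "500"},
    {"name": "api-boot", "users": ["projects/demo/zones/us-central1-a/instances/api-1"]},
]
instances = [
    {"name": "api-1", "status": "RUNNING"},
    {"name": "legacy-test", "status": "TERMINATED"},
]
print(find_zombies(disks, instances))
# {'orphan_disks': ['old-test-disk'], 'stopped_vms': ['legacy-test']}
```

The output is a review list, not a delete list: a stopped VM may be a deliberate standby, so each finding still needs an owner to confirm it.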

In Pub/Sub, they had a single publisher and a single subscriber for each topic.

Recommendation:

Move to a fan-out design: publish each message once to a topic and attach one subscription per consumer (one publisher, multiple subscribers).

Example: Instead of routing messages subscriber → Cloud Storage → BigQuery, publish once and fan out directly to BigQuery (for example, through a Pub/Sub BigQuery subscription), cutting latency and cost.
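To make the fan-out idea concrete, here is an in-memory model of the semantics. It is deliberately not the Pub/Sub client library; the `Topic` class and the topic and subscription names are hypothetical. It shows the one property that matters: the publisher sends once, and every subscription receives its own independent copy.

```python
from collections import deque

class Topic:
    """In-memory model of Pub/Sub fan-out: each subscription gets its own copy."""

    def __init__(self, name):
        self.name = name
        self.subscriptions = {}  # subscription name -> pending messages

    def subscribe(self, sub_name):
        self.subscriptions.setdefault(sub_name, deque())

    def publish(self, message):
        # The publisher sends once; delivery to every subscription is the topic's job.
        for pending in self.subscriptions.values():
            pending.append(message)

    def pull(self, sub_name):
        pending = self.subscriptions[sub_name]
        return pending.popleft() if pending else None

orders = Topic("orders")
for sub in ("orders-to-bigquery", "orders-to-billing", "orders-to-email"):
    orders.subscribe(sub)
orders.publish({"order_id": 42})
print([orders.pull(s) for s in orders.subscriptions])
# [{'order_id': 42}, {'order_id': 42}, {'order_id': 42}]
```

As in real Pub/Sub, a subscription only sees messages published after it exists, and adding a new consumer means adding a subscription rather than touching the publisher.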

Cost Optimization:

Financially, default cloud settings can overcharge you. Fandomm was largely using the default VPC network; we moved workloads into custom VPCs tailored to each environment. This eliminated unused NAT gateways and the extra subnets that the auto-mode default network creates in every region. We enabled detailed billing reports and resource labels to map costs to teams and projects.

Using cost dashboards, we confirmed ~30% of their spend was being wasted on idle resources.
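Once resources carry labels, attributing cost and measuring idle waste is a simple aggregation. This sketch assumes billing rows have been flattened into dicts with a `cost`, a `labels` map, and an `idle` flag from the utilization review; the real Cloud Billing export to BigQuery stores labels as key/value records, so this flattening is an assumption, and the sample rows are hypothetical.

```python
from collections import defaultdict

def cost_by_label(rows, key="team"):
    """Sum cost per value of one label; rows without that label land in 'unlabeled'."""
    totals = defaultdict(float)
    for row in rows:
        totals[row.get("labels", {}).get(key, "unlabeled")] += row["cost"]
    return dict(totals)

def idle_share(rows):
    """Fraction of total cost that comes from rows flagged as idle."""
    total = sum(r["cost"] for r in rows)
    idle = sum(r["cost"] for r in rows if r.get("idle"))
    return idle / total if total else 0.0

rows = [
    {"cost": 100.0, "labels": {"team": "data"}},
    {"cost": 50.0, "labels": {"team": "web"}, "idle": True},
    {"cost": 10.0},  # an unlabeled resource: exactly what the audit should surface
]
print(cost_by_label(rows))  # {'data': 100.0, 'web': 50.0, 'unlabeled': 10.0}
print(idle_share(rows))     # 0.3125
```

For scale: at roughly $50K of yearly spend, an idle share near 30% works out to about $15K a year.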

Guided by cloud cost audit checklists, we deleted or hibernated idle services (idle containers, stopped instances) and adjusted storage classes for infrequently accessed data.
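Storage-class changes like these are usually expressed as a Cloud Storage lifecycle configuration. The sketch below builds one using the `SetStorageClass` action and the `age` and `matchesStorageClass` conditions; the 30/90/365-day thresholds are hypothetical defaults that happen to line up with the minimum storage durations of Nearline, Coldline, and Archive.

```python
import json

def lifecycle_policy(nearline_days=30, coldline_days=90, archive_days=365):
    """Build a Cloud Storage lifecycle config that moves objects to colder classes by age."""
    def transition(to_class, age_days, from_class):
        return {
            "action": {"type": "SetStorageClass", "storageClass": to_class},
            # "age" counts days since the object was created, not since it was last read.
            "condition": {"age": age_days, "matchesStorageClass": [from_class]},
        }
    return {"rule": [
        transition("NEARLINE", nearline_days, "STANDARD"),
        transition("COLDLINE", coldline_days, "NEARLINE"),
        transition("ARCHIVE", archive_days, "COLDLINE"),
    ]}

print(json.dumps(lifecycle_policy(), indent=2))
```

Saved to a file, a config like this can be applied with `gcloud storage buckets update gs://BUCKET_NAME --lifecycle-file=lifecycle.json`. Because `age` tracks object age rather than access, data that is old but still read often is better served by Autoclass than by fixed rules.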

We also looked at networking costs: cross-region data transfer and NAT egress charges add up quietly, so we reorganized traffic to minimize expensive routes. In short, we set up budgets and alerts to catch surprises and kept guiding Fandomm to “use only what you need” in their cloud.
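Budgets and alerts are configured in Cloud Billing, but the logic behind them is easy to reason about. This sketch checks which alert thresholds current spend has crossed and makes a naive run-rate forecast for month-end. The 50%/90%/100% thresholds are a common choice, and the monthly budget figure is hypothetical.

```python
def crossed_thresholds(spend, budget, thresholds=(0.5, 0.9, 1.0)):
    """Return the alert thresholds (fractions of budget) that spend has reached."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    return [t for t in thresholds if spend / budget >= t]

def forecast_month_end(spend_to_date, day_of_month, days_in_month):
    """Naive linear run-rate projection of month-end spend."""
    return spend_to_date * days_in_month / day_of_month

monthly_budget = 4000.0  # hypothetical monthly budget
print(crossed_thresholds(3700.0, monthly_budget))  # [0.5, 0.9]
print(forecast_month_end(2000.0, 15, 30))          # 4000.0
```

Alerting on forecasted spend as well as actual spend is what turns a budget from a post-mortem into an early warning.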

Recommendation: Shift to custom VPCs with tighter CIDR ranges to reduce costs and improve security.
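Planning tighter CIDR ranges is mostly arithmetic, and Python's standard `ipaddress` module handles it well. This sketch carves one non-overlapping subnet per environment out of a private supernet and detects overlapping ranges, which would block VPC peering. The supernet, prefix length, and environment names are hypothetical. For context, every auto-mode VPC draws its subnets from the same 10.128.0.0/9 block, which is one reason two default-style networks cannot be peered.

```python
import ipaddress

def plan_subnets(supernet, prefixlen, envs):
    """Carve one non-overlapping subnet per environment out of a supernet."""
    blocks = ipaddress.ip_network(supernet).subnets(new_prefix=prefixlen)
    plan = {}
    for env in envs:
        try:
            plan[env] = str(next(blocks))
        except StopIteration:
            raise ValueError(f"{supernet} has no room for {len(envs)} /{prefixlen} subnets") from None
    return plan

def overlaps(cidrs):
    """Return every pair of CIDR ranges that overlap (these cannot be peered cleanly)."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return [(str(a), str(b)) for i, a in enumerate(nets) for b in nets[i + 1:] if a.overlaps(b)]

print(plan_subnets("10.20.0.0/16", 20, ["prod", "staging", "dev"]))
# {'prod': '10.20.0.0/20', 'staging': '10.20.16.0/20', 'dev': '10.20.32.0/20'}
print(overlaps(["10.20.0.0/20", "10.20.16.0/20", "10.20.8.0/24"]))
# [('10.20.0.0/20', '10.20.8.0/24')]
```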

Security Posture:

Security was a top pillar. We audited Identity & Access Management (IAM) settings to ensure least privilege. Every user and service account was given only the roles it needed, and multi-factor authentication was enforced for all logins.

We reviewed VPC firewall rules to ensure only necessary traffic was allowed, and we avoided exposing private services publicly by using private IPs and Cloud NAT. All data was encrypted at rest and in transit.

We enabled comprehensive logging: admin activity, data access, and system events were all captured, so nothing happens untracked. Finally, we applied Google Cloud Armor (a WAF) on any public endpoints to block DDoS and common web attacks.
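On GCP, Admin Activity audit logs are always on, but Data Access audit logs (`ADMIN_READ`, `DATA_READ`, `DATA_WRITE`) are mostly off by default and are enabled through the `auditConfigs` section of a project's IAM policy. This sketch checks a policy dict, such as the output of `gcloud projects get-iam-policy PROJECT_ID --format=json`, for missing log types; the sample policy is hypothetical.

```python
DATA_ACCESS_LOG_TYPES = {"ADMIN_READ", "DATA_READ", "DATA_WRITE"}

def missing_audit_log_types(policy, service="allServices"):
    """Return Data Access log types that an IAM policy does not enable for `service`.

    A config for "allServices" also counts, because it applies to every service.
    """
    enabled = set()
    for cfg in policy.get("auditConfigs", []):
        if cfg.get("service") in (service, "allServices"):
            enabled |= {c["logType"] for c in cfg.get("auditLogConfigs", [])}
    return sorted(DATA_ACCESS_LOG_TYPES - enabled)

policy = {"auditConfigs": [
    {"service": "allServices", "auditLogConfigs": [{"logType": "ADMIN_READ"}]},
    {"service": "storage.googleapis.com", "auditLogConfigs": [{"logType": "DATA_READ"}]},
]}
print(missing_audit_log_types(policy))                            # ['DATA_READ', 'DATA_WRITE']
print(missing_audit_log_types(policy, "storage.googleapis.com"))  # ['DATA_WRITE']
```

Data Access logs can be high-volume, and Cloud Logging bills for ingestion, so enabling every type on every service is a trade-off worth making deliberately.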

In short, we locked down the system in line with recommended cloud security checklists, so that strong IAM, encryption, and network controls were in place.

Recommendation:

  • Enforce least privilege

  • Enable encryption at rest & in transit everywhere

  • Audit service accounts for over-permissioning
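The over-permissioning check in these recommendations can start with a scan for basic (primitive) roles. The sketch reads the `bindings` list of an IAM policy, in the shape returned by `gcloud projects get-iam-policy PROJECT_ID --format=json`, and reports every member holding `roles/owner`, `roles/editor`, or `roles/viewer`. The sample members are hypothetical. Service accounts deserve particular attention, since the Compute Engine default service account has historically been granted Editor automatically.

```python
BASIC_ROLES = {"roles/owner", "roles/editor", "roles/viewer"}

def overprivileged_members(policy):
    """Map each member holding a basic role to the basic roles it holds."""
    findings = {}
    for binding in policy.get("bindings", []):
        if binding.get("role") in BASIC_ROLES:
            for member in binding.get("members", []):
                findings.setdefault(member, []).append(binding["role"])
    return findings

def service_accounts_only(findings):
    """Keep only service-account members, the usual source of silent over-permissioning."""
    return {m: roles for m, roles in findings.items() if m.startswith("serviceAccount:")}

policy = {"bindings": [
    {"role": "roles/editor", "members": [
        "serviceAccount:etl@demo-project.iam.gserviceaccount.com", "user:alice@example.com"]},
    {"role": "roles/pubsub.publisher", "members": [
        "serviceAccount:api@demo-project.iam.gserviceaccount.com"]},
    {"role": "roles/owner", "members": ["user:alice@example.com"]},
]}
print(service_accounts_only(overprivileged_members(policy)))
# {'serviceAccount:etl@demo-project.iam.gserviceaccount.com': ['roles/editor']}
```

Each finding then becomes a ticket: replace the basic role with the narrowest predefined role that still covers what the member actually does.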

Scalability & Resilience:

We verified that critical services were multi-zone or multi-region for high availability, and that auto-scaling was properly configured.

For example, compute clusters were spread across zones and had auto-scale policies tied to CPU/memory or request metrics. Databases had replicas or failover enabled.

We ran load tests based on expected user traffic to confirm the system could scale up smoothly and that scale-down steps were timely to save costs.

Essentially, we made sure the architecture could grow and heal itself, aligning with the “reliability” and “performance” pillars of a well-architected cloud system.
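The autoscaling check boils down to a proportional rule: resize the group so that utilization lands near its target. The sketch uses the same idea as the Kubernetes Horizontal Pod Autoscaler formula; Compute Engine's autoscaler also works from a target utilization, but its exact algorithm is not reproduced here. The min/max bounds and sample numbers are hypothetical.

```python
import math

def desired_replicas(current, current_util_pct, target_util_pct, min_replicas=2, max_replicas=20):
    """Proportional scaling rule: size the group so utilization lands near the target."""
    if current < 1 or not 0 < target_util_pct <= 100:
        raise ValueError("need at least one replica and a target in (0, 100]")
    raw = math.ceil(current * current_util_pct / target_util_pct)
    # Clamp: never drop below the high-availability floor or exceed the cost ceiling.
    return max(min_replicas, min(max_replicas, raw))

print(desired_replicas(4, 90, 60))   # 6: scale out under load
print(desired_replicas(10, 20, 60))  # 4: scale in when traffic drops
print(desired_replicas(2, 10, 60))   # 2: held at the minimum
```

A minimum of two replicas spread across zones is what keeps scale-in from quietly undoing the multi-zone design.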

Recommendation:

  • Implement autoscaling for compute (managed instance groups on GCP)

  • Use managed serverless where possible for burst workloads

  • Migrate to multi-zone or multi-region, where applicable

  • Regularly test backup restores and failovers

Backup & DR:

Finally, we audited the backup strategy. All critical data stores (databases, storage buckets, file systems) had automated backups and retention policies. We enforced the 3-2-1 backup rule: at least three copies of the data, on two different media, with one copy offsite.

In practice, this meant daily snapshots plus an extra asynchronous copy to another region or long-term vault.
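The 3-2-1 rule is easy to state and easy to drift away from, so it helps to check it mechanically. This sketch counts the live data as one of the three copies, which is the usual reading of the rule. The `BackupCopy` type, the media labels, and the sample inventory are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BackupCopy:
    name: str
    media: str     # hypothetical media label, e.g. "persistent-disk", "pd-snapshot", "gcs-bucket"
    offsite: bool  # True if stored outside the primary region/site

def meets_3_2_1(copies):
    """At least 3 copies (including the live data), 2 media types, and 1 offsite copy."""
    return (
        len(copies) >= 3
        and len({c.media for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )

primary = BackupCopy("orders-db", "persistent-disk", offsite=False)
daily = BackupCopy("daily-snapshot", "pd-snapshot", offsite=False)
dr_copy = BackupCopy("cross-region-copy", "gcs-bucket", offsite=True)
print(meets_3_2_1([primary, daily, dr_copy]))  # True
print(meets_3_2_1([primary, daily]))           # False: only two copies, none offsite
```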

All backups were encrypted (both at rest and in flight), and the backup systems themselves required secure access (e.g., SAML/Okta, MFA).

Crucially, we tested restores: full and partial recovery drills were run periodically to ensure data could be recovered as expected.

We documented the Recovery Point Objective (RPO) and Recovery Time Objective (RTO) for each system, so there were clear business recovery targets. By the end, our backup audit confirmed a robust disaster-recovery plan was in place.
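Documented RPO and RTO targets become useful when every restore drill is checked against them. In this sketch, worst-case data loss is approximated by the backup interval (a failure just before the next backup completes), and the drill's measured restore time is compared with the RTO. The system name and numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class RecoveryTarget:
    system: str
    rpo_minutes: int  # maximum tolerable data loss
    rto_minutes: int  # maximum tolerable downtime

def check_drill(target, backup_interval_minutes, measured_restore_minutes):
    """Compare a restore drill against the system's RPO/RTO targets."""
    return {
        # Worst case, roughly one full backup interval of data is lost.
        "rpo_ok": backup_interval_minutes <= target.rpo_minutes,
        "rto_ok": measured_restore_minutes <= target.rto_minutes,
    }

orders_db = RecoveryTarget("orders-db", rpo_minutes=60, rto_minutes=120)
print(check_drill(orders_db, backup_interval_minutes=24 * 60, measured_restore_minutes=45))
# {'rpo_ok': False, 'rto_ok': True}
```

A result like this one is exactly where change data capture earns its place: daily snapshots restore quickly enough, but they cannot meet a one-hour RPO.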

Recommendation:

  • Implement snapshot rotation policies

  • Use change data capture (CDC) tools such as Debezium for databases with high write frequency
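A snapshot rotation policy decides which snapshots to keep and which to delete. This sketch implements a simple daily-plus-weekly scheme: keep the newest N daily snapshots, plus the newest snapshot from each of the last M ISO weeks. The 7/4 defaults and the sample history are hypothetical. On Compute Engine, snapshot schedules can also enforce a maximum retention in days natively, which covers the simple case.

```python
from datetime import date

def snapshots_to_keep(snapshot_dates, daily=7, weekly=4):
    """Return the snapshot dates a daily/weekly rotation would retain.

    Keeps the newest `daily` snapshots, plus the newest snapshot from each of
    the `weekly` most recent ISO weeks. Everything else is a deletion candidate.
    """
    ordered = sorted(set(snapshot_dates), reverse=True)  # newest first
    keep = set(ordered[:daily])
    weeks_seen = []
    for d in ordered:
        week = d.isocalendar()[:2]  # (ISO year, ISO week number)
        if week in weeks_seen:
            continue
        weeks_seen.append(week)
        if len(weeks_seen) > weekly:
            break
        keep.add(d)  # the first date seen in a week is that week's newest snapshot
    return sorted(keep)

# 30 consecutive daily snapshots ending on Sunday 2024-03-31
history = [date.fromordinal(date(2024, 3, 31).toordinal() - i) for i in range(30)]
kept = snapshots_to_keep(history)
print(len(kept), kept[0])  # 10 2024-03-10
```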


Key Audit Checklist

Here are the points you can use as a checklist for your own audit.

  • Inventory & Ownership: Document all cloud projects, resources, and responsible teams (use standardized tags).

  • Service Model Alignment: Verify each workload uses the right model (SaaS/PaaS/IaaS) – e.g., use managed platform services where it makes sense.

  • Right-Size Resources: For every VM, container, or database, check CPU/RAM/disk utilization. Downsize underused instances or move them behind autoscaling (on GCP, a managed instance group).

  • Cost Controls: Identify idle or “zombie” resources (unattached disks, stopped servers) and remove them. Enable cost monitoring tools, set budgets/alerts, and review monthly bills for anomalies.

  • Networking Configuration: Replace default VPCs with custom VPCs/subnets. Audit firewall rules and NAT usage. Ensure traffic stays in-region when possible to avoid egress charges.

  • Security Audit: Enforce least-privilege IAM (no broad Owner/Editor roles). Require MFA and dedicated service accounts. Enable data encryption and regular key rotation. Ensure VPC firewall rules and Cloud Armor/WAF protection are in place.

  • Logging & Monitoring: Enable audit logs for all services. Set up alerts on anomalies (e.g., sudden cost or traffic spikes). Use a centralized logging/monitoring dashboard.

  • Scalability & Resilience: Check that critical services are multi-zone/multi-region. Test auto-scaling configurations against real workloads. Ensure health checks and failovers are in place.

  • Backups & DR: Follow the 3-2-1 backup rule for critical data. Automate backups with appropriate frequency and retention. Encrypt backup data and secure access (SAML/MFA). Perform and document regular restore drills to confirm recovery meets RPO/RTO requirements.

  • Compliance Readiness: Map resources and controls to required standards (e.g., GDPR, SOC 2). Use built-in compliance tools or external auditors as needed.

  • Documentation: Maintain architecture diagrams, inventory lists, and audit findings. Plan for regular re-audits (at least annually) to catch drift.
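The inventory and ownership item at the top of this checklist depends on consistent labels, and those can be checked automatically. This sketch flags resources missing required labels and label keys that break GCP's key rules (start with a lowercase letter; only lowercase letters, digits, underscores, and dashes; at most 63 characters). International characters, which GCP also allows, are not handled here. The required-label set and sample resources are hypothetical.

```python
import re

LABEL_KEY = re.compile(r"[a-z][a-z0-9_-]{0,62}")  # ASCII subset of GCP label-key rules
REQUIRED = ("team", "env", "cost-center")         # hypothetical tagging standard

def label_findings(resources, required=REQUIRED):
    """Map resource name -> list of labeling problems (missing or malformed keys)."""
    findings = {}
    for res in resources:
        labels = res.get("labels", {})
        problems = [f"missing:{k}" for k in required if k not in labels]
        problems += [f"bad-key:{k}" for k in labels if not LABEL_KEY.fullmatch(k)]
        if problems:
            findings[res["name"]] = problems
    return findings

resources = [
    {"name": "api-1", "labels": {"team": "web", "env": "prod", "cost-center": "cc-12"}},
    {"name": "tmp-vm", "labels": {"Team": "data"}},
    {"name": "bucket-logs"},
]
print(label_findings(resources))
```

Running a check like this on every re-audit is a cheap way to catch the ownership drift the checklist warns about.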

Conclusion:

Auditing cloud infrastructure isn’t just a compliance checkbox; it’s the key to building a leaner, faster, and more secure environment. By right-sizing resources, tightening security, and improving scalability, you not only save costs but also future-proof your architecture for growth and resilience.


EzyInfra.dev – Expert DevOps & Infrastructure consulting! We help you set up, optimize, and manage cloud (AWS, GCP) and Kubernetes infrastructure efficiently and cost-effectively. Need a strategy? Get a free consultation now!


Want to discuss DevOps practices, infrastructure audits, or free consulting for your AWS cloud?

Prasanna would be glad to jump into a call