FinOps in Practice: What Actually Works When Cloud Costs Get Out of Control

Cloud cost problems rarely start with architecture. They start with operations.

Most organizations don’t have a cost problem because they chose the wrong services or instance types. They have a cost problem because no one owns the outcome. Engineers provision resources without cost visibility. Finance tracks spend but can’t influence decisions. Waste accumulates because there’s no accountability.

The pattern is consistent: tooling gets added, dashboards get built, but nothing changes. Costs continue to climb.

Fixing cloud spend is not a tooling problem. It’s an operational discipline problem.

I’ve applied the framework below across healthcare and managed services environments and documented over $127K in annual savings — from licensing audits, environment rationalization, and storage tier optimization — without changing any core architecture. The work is organizational, not technical.

The Reality of Cloud Cost Optimization

Cloud cost optimization is mostly behavioral, not technical.

If the people creating resources don’t understand cost, and the people tracking cost can’t enforce change, optimization will fail. It doesn’t matter how advanced the tooling is.

The foundation requires four things to be true simultaneously:

Visibility exists at the team level
Ownership is clearly defined
Waste has consequences
Cost is treated as an engineering metric — not a finance problem

Without all four, everything else is noise.

Phase 1: Establish Visibility

Before anything can be optimized, it has to be understood. In practice, most environments lack even basic visibility.

The common failure point is tagging.

Without consistent tagging, there is no way to attribute cost. You cannot answer basic questions like:

What does a specific system cost to run?
Which team is responsible for a spend spike?
Where is non-production spend occurring?

Once tagging is enforced and tied to accountability, cost visibility becomes actionable instead of theoretical. A useful forcing mechanism: route untagged spend to a separate cost center owned by the VP of Engineering. Nothing motivates tagging compliance faster than a CFO asking questions.

At the same time, enable anomaly detection early. Unexpected spend is common — forgotten GPU clusters, misconfigured autoscaling, runaway data transfer — and catching it quickly prevents avoidable loss.

This phase is less about optimization and more about creating a reliable view of reality. You cannot improve what you cannot see.

Phase 2: Eliminate Obvious Waste

Once visibility exists, the first meaningful reductions come from removing waste that should never have existed.

Typical patterns in almost every environment:

Waste Category	Common Cause
Idle compute running indefinitely	No lifecycle policy
Detached storage still accruing cost	Forgotten after instance termination
Snapshots with no retention limit	Default settings never changed
Temporary infrastructure never cleaned up	No ownership, no expiry
Non-production environments running 24/7	No scheduled shutdown

These are not complex problems. They are operational gaps.

Cleaning this up does not require architectural changes. It requires discipline, communication, and enforcement. This phase consistently produces fast, defensible cost reduction — typically 15–25% of total spend — with minimal risk.

Real-world example: A licensing audit in a healthcare environment identified $48,000 in annual spend on unassigned or inactive licenses — accounts that had accumulated through turnover, role changes, and service trials that were never cleaned up. In the same environment, an Azure Virtual Desktop deployment audit found a $79,000 annual cost with utilization patterns that didn’t justify the deployment; the legitimate use cases were addressed through better Conditional Access policies at a fraction of the cost. Neither required new tooling. They required visibility and the willingness to act on what the data showed.

Phase 3: Align Spend With Reality

After waste is removed, the next step is aligning pricing models with actual usage.

Cloud providers charge a premium for flexibility. Most production workloads are not fully variable — they have predictable baselines. You are paying on-demand rates for stability you already know you need.

Optimizing this layer involves:

Understanding steady-state usage — measure actual consumption patterns over 60–90 days after rightsizing
Applying commitment-based discounts where usage is stable and predictable
Maintaining flexibility for variable workloads — don’t over-commit

A critical sequencing note: commit to pricing after rightsizing, not before. A common and expensive mistake is locking in commitment-based discounts before workloads are properly sized. That locks in inefficiency at a discount — you still overpay, just less visibly.

Done correctly, this phase can reduce cost 30–70% on committed workloads without changing any architecture — only how it is paid for. Azure Reserved Instances alone offer up to 72% savings versus on-demand pricing on eligible VM families.

Phase 4: Optimize Architecture Where It Matters

The deeper savings require engineering changes. This is where effort increases and trade-offs become real.

Common high-impact areas:

Data transfer costs are frequently the hidden driver of bills that don’t respond to compute optimization. Poor resource placement, cross-AZ traffic, and uncompressed origin responses are typical culprits. Audit data movement before assuming compute is the problem.

Storage tiering is consistently overlooked. Not all data is hot. Object storage lifecycle policies and archival tiers exist for a reason — use them. Most environments keep years of data on the highest-cost tier by default. In a healthcare data environment, implementing lifecycle policies that moved clinical data older than 90 days to cold storage tiers produced $1,500 per month in immediate, ongoing savings — from a one-time configuration effort against data that had been accumulating for years.

Managed service economics cut both ways. Sometimes managed services save money once operational overhead is factored in. Sometimes they’re egregiously overpriced at scale compared to self-hosted alternatives. Evaluate each category on its own merits rather than applying a blanket policy.

Over-engineered systems carry overhead that compounds. A distributed system with more moving parts than the problem requires costs more to run and more to maintain. Simplicity is a cost optimization strategy.

These optimizations are selective, not universal. Focus effort where it materially impacts total spend, not where it’s technically interesting.

What Makes It Stick

Most cost optimization efforts fail because they are treated as one-time projects instead of ongoing operational practices. The cost comes back. Sometimes faster than it left.

Sustainable outcomes require structural change:

Teams see and own their cost — weekly visibility, not quarterly reports
Cost is reviewed regularly, not reactively after a budget breach
Optimization is built into engineering cycles — not a separate initiative
Cross-team collaboration exists to share patterns and decisions
Cost sits alongside performance and reliability as a first-class engineering metric

When cost becomes part of how systems are built and operated, improvements persist. When it doesn’t, regression is guaranteed — usually within two quarters of the project being declared complete.

Bottom Line

Cloud cost optimization is not a technical challenge. It is an operational one.

If ownership, visibility, and accountability are not in place first, no amount of tooling or architectural sophistication will produce lasting results. I’ve seen organizations spend six figures on FinOps platforms while their bill continued to grow — because the platform couldn’t fix the fact that no one was accountable for the outcome.

Get the operational foundation right. The technical optimizations are straightforward once you do.

The question to answer before starting any cost reduction effort: Who is accountable for this number, and what can they actually change? If you can’t answer that clearly, start there.

Have a cost problem that’s resisted previous optimization attempts? The cause is almost always organizational, not architectural. I’m happy to talk through it.

Tagged: FinOps cloud costs Azure cost optimization reserved instances tagging licensing

Darren Bell

Senior Cloud Architect & IT Leader

Senior Cloud Architect with 10+ years leading technology operations across healthcare, managed services, and regulated industries. Specializing in Microsoft 365, Azure, identity architecture, and IT cost optimization. I write about what it actually takes to build resilient, compliant, and cost-efficient operations.

LinkedIn → GitHub → About Me →