How a SaaS platform cut infrastructure costs by 40% while improving response times

Breaking the performance vs cost trade-off: A SaaS infrastructure optimization case study

Infrastructure optimization often feels like a zero-sum game between performance and cost. Faster systems cost more, right? A recent project with a European CRM platform proved this assumption wrong, cutting monthly infrastructure costs by roughly 40% (€18,000 to €11,000) while reducing p95 response times from 2.8 to 1.1 seconds.

The challenge: scaling pains at 50,000 users

The platform had grown to 50,000 active users processing 2.3 million daily API requests, but success brought new problems. Monthly infrastructure costs had reached €18,000, yet performance was deteriorating:

  • Average database query time: 450ms
  • P95 response times: 2.8 seconds during peak hours
  • Customer complaints about slow loading increasing weekly

Their traditional infrastructure approach consisted of three oversized virtual machines, a monolithic database server, and basic load balancing. Traffic analysis showed 78% of daily load concentrated during European business hours (9 AM to 6 PM CET), yet infrastructure remained fully provisioned 24/7.
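To make the mismatch concrete, the traffic figures above can be turned into average request rates. This is a back-of-the-envelope sketch: it assumes load is spread evenly within each window, which real traffic is not, so treat the numbers as rough averages rather than measured peaks.

```python
# Figures taken from the case study text.
DAILY_REQUESTS = 2_300_000
PEAK_SHARE = 0.78   # share of daily load during business hours
PEAK_HOURS = 9      # 9 AM to 6 PM CET


def average_rps(requests: float, hours: float) -> float:
    """Average requests per second over a window of the given length."""
    return requests / (hours * 3600)


peak_rps = average_rps(DAILY_REQUESTS * PEAK_SHARE, PEAK_HOURS)
off_peak_rps = average_rps(DAILY_REQUESTS * (1 - PEAK_SHARE), 24 - PEAK_HOURS)

print(f"business hours: {peak_rps:.1f} req/s")
print(f"off-peak:       {off_peak_rps:.1f} req/s")
print(f"ratio:          {peak_rps / off_peak_rps:.1f}x")
```

Under that even-spread assumption, business hours carry roughly six times the off-peak request rate, while the provisioned capacity stayed constant around the clock.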

The engineering team had attempted vertical scaling twice in six months, increasing CPU and RAM allocation. Each time, costs rose while performance problems persisted. This reinforced a common misconception: that performance improvements require more infrastructure spending.

Technical audit findings: efficiency gaps everywhere

A comprehensive infrastructure audit revealed multiple performance bottlenecks that were simultaneously driving unnecessary costs.

Database inefficiencies

Query pattern analysis exposed severe N+1 problems: 73% of queries followed inefficient patterns hitting identical tables repeatedly. A single user dashboard load generated 47 individual database queries, when 3 optimized queries could deliver the same data.
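For readers unfamiliar with the N+1 pattern, the following is a minimal, self-contained illustration using Python's built-in sqlite3 module. The schema, table names, and helper functions are invented for the demo and are not the platform's actual code; the point is only to show how one query per related row adds up compared with a single JOIN.

```python
import sqlite3

# Toy schema: table and column names are illustrative, not the platform's real ones.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE projects (id INTEGER PRIMARY KEY, user_id INTEGER, name TEXT);
    CREATE TABLE tasks (id INTEGER PRIMARY KEY, project_id INTEGER, title TEXT);
""")
conn.executemany("INSERT INTO projects VALUES (?, ?, ?)",
                 [(i, 1, f"project-{i}") for i in range(1, 11)])
conn.executemany("INSERT INTO tasks VALUES (?, ?, ?)",
                 [(i, (i % 10) + 1, f"task-{i}") for i in range(1, 31)])
conn.commit()

# Record every statement SQLite executes from here on.
statements = []
conn.set_trace_callback(statements.append)


def count_selects(fn, *args):
    """Run fn and return (result, number of SELECT statements it issued)."""
    statements.clear()
    result = fn(*args)
    selects = [s for s in statements if s.lstrip().upper().startswith("SELECT")]
    return result, len(selects)


def dashboard_n_plus_one(user_id):
    """One query for the projects, then one more query per project."""
    projects = conn.execute(
        "SELECT id, name FROM projects WHERE user_id = ? ORDER BY id", (user_id,)
    ).fetchall()
    return {
        name: [title for (title,) in conn.execute(
            "SELECT title FROM tasks WHERE project_id = ? ORDER BY id", (pid,))]
        for pid, name in projects
    }


def dashboard_batched(user_id):
    """A single JOIN returns the same data in one round trip."""
    rows = conn.execute(
        """SELECT p.name, t.title
           FROM projects p
           LEFT JOIN tasks t ON t.project_id = p.id
           WHERE p.user_id = ?
           ORDER BY p.id, t.id""",
        (user_id,),
    ).fetchall()
    grouped = {}
    for name, title in rows:
        tasks = grouped.setdefault(name, [])
        if title is not None:  # LEFT JOIN yields NULL for projects without tasks
            tasks.append(title)
    return grouped


slow, slow_queries = count_selects(dashboard_n_plus_one, 1)
fast, fast_queries = count_selects(dashboard_batched, 1)
print(f"N+1: {slow_queries} queries, batched: {fast_queries} query, "
      f"same data: {slow == fast}")
```

With ten projects, the first version issues one query for the project list plus one per project, while the JOIN needs a single statement. In an ORM, eager loading typically produces the second shape for you.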

The database server, provisioned with 32 CPU cores and 128GB RAM, was CPU-bound due to query inefficiency rather than data volume constraints.

Application server waste

Application servers averaged 12% CPU utilization while consuming full allocated resources. The load balancer distributed traffic evenly across three identical instances, ignoring actual load patterns and time-based traffic variations.

Caching underutilization

Redis cache hit rates sat at 23%, far below the 80%+ typical in optimized systems. The configuration used default settings, treating Redis as simple key-value storage and missing opportunities for query result caching and session management.

Synchronous bottlenecks

Network analysis revealed synchronous external API calls during user requests, adding 200-800ms latency to operations that should run as asynchronous background jobs.

Strategic approach: simultaneous optimization

Rather than treating performance and cost as competing priorities, we identified three areas where improvements would benefit both: query optimization, intelligent caching, and demand-based scaling.

Database query optimization

We addressed the N+1 patterns through eager loading and query consolidation. Fewer, larger queries reduce both database server load and response times, which in turn can make it possible to downsize the database server.

Implementation involved replacing inefficient query patterns with batch operations and strategic JOIN statements. The typical user dashboard query transformation:

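-- Original: 47 separate queries
SELECT * FROM users WHERE id = ?;
SELECT * FROM projects WHERE user_id = ?;
-- Repeated for each related entity...

-- Optimized: 3 consolidated queries (one shown)
-- The status filter sits in the ON clause: in the WHERE clause it would
-- discard users with no active projects and turn the LEFT JOIN into an
-- inner join. Grouping by id assumes id is the primary key of each table.
SELECT u.*, p.*, t.*, COUNT(c.id) AS comment_count
FROM users u
LEFT JOIN projects p ON u.id = p.user_id AND p.status = 'active'
LEFT JOIN tasks t ON p.id = t.project_id
LEFT JOIN comments c ON t.id = c.task_id
WHERE u.id = ?
GROUP BY u.id, p.id, t.id;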

Multi-tier caching strategy

We implemented Redis caching at three levels to reduce database load and external service costs:

  • Query result caching: 15-minute TTL for user-specific data, 1-hour TTL for shared reference data
  • API response caching: 30-minute TTL for external service calls with asynchronous refresh
  • Session caching: User authentication and preference data to eliminate repeated database lookups

Cache keys were derived by hashing the query, so identical queries were served from cache rather than hitting the database repeatedly.
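One way to build such keys is sketched below. It is not the platform's implementation: the `cache_key` helper, the `q:` prefix, and the `TTLCache` class are hypothetical, and the class is an in-memory stand-in for Redis so the example runs without a server. In a real deployment the get and set would map to Redis reads and writes with an expiry.

```python
import hashlib
import json
import time

USER_TTL = 15 * 60        # 15 minutes for user-specific data (from the article)
REFERENCE_TTL = 60 * 60   # 1 hour for shared reference data (from the article)


def cache_key(sql: str, params: tuple) -> str:
    """Hash the whitespace-normalized SQL text together with its parameters.

    Assumes parameterized queries, so collapsing whitespace cannot alter
    a string literal embedded in the SQL.
    """
    normalized = " ".join(sql.split())
    payload = json.dumps([normalized, list(params)], default=str)
    return "q:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


class TTLCache:
    """In-memory stand-in for a Redis cache with per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key, ttl, compute):
        """Return the cached value, or call compute() and cache it for ttl seconds."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = compute()
        self._store[key] = (now + ttl, value)
        return value

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = TTLCache()
sql = "SELECT name FROM projects WHERE user_id = ?"
for _ in range(4):
    cache.get_or_compute(cache_key(sql, (42,)), USER_TTL, lambda: ["project-1"])
print(f"hit rate: {cache.hit_rate:.0%}")
```

Hashing the parameters along with the SQL text matters: two users running the same dashboard query must not share a cache entry.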

Horizontal auto-scaling implementation

We replaced the continuously running, oversized instances with demand-responsive scaling:

Auto Scaling Configuration:
  MinSize: 1
  MaxSize: 4
  DesiredCapacity: 2
  ScaleUpPolicy:
    MetricName: CPUUtilization
    Threshold: 70
    ComparisonOperator: GreaterThanThreshold
    EvaluationPeriods: 2
    Period: 300
  ScaleDownPolicy:
    MetricName: CPUUtilization  
    Threshold: 30
    ComparisonOperator: LessThanThreshold
    EvaluationPeriods: 3
    Period: 300

Instance sizing changed from three c5.2xlarge instances (8 vCPU, 16GB RAM each) to dynamic c5.large instances (2 vCPU, 4GB RAM each) scaling with actual demand.
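The decision logic behind those two policies can be sketched in a few lines. This is a simplified model, not how the cloud provider evaluates alarms internally: it assumes one average CPU sample per 300-second period, a step of one instance per scaling action, and no cooldowns, none of which are stated in the configuration above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    threshold: float
    evaluation_periods: int
    above: bool  # True: GreaterThanThreshold, False: LessThanThreshold


# Values copied from the configuration in the article.
SCALE_UP = Policy(threshold=70, evaluation_periods=2, above=True)
SCALE_DOWN = Policy(threshold=30, evaluation_periods=3, above=False)
MIN_SIZE, MAX_SIZE = 1, 4


def breached(policy: Policy, samples: list) -> bool:
    """True if the most recent N samples all cross the policy threshold."""
    recent = samples[-policy.evaluation_periods:]
    if len(recent) < policy.evaluation_periods:
        return False  # not enough data yet
    if policy.above:
        return all(s > policy.threshold for s in recent)
    return all(s < policy.threshold for s in recent)


def next_capacity(current: int, samples: list) -> int:
    """Assumed step size of one instance, clamped to the group limits."""
    if breached(SCALE_UP, samples):
        return min(current + 1, MAX_SIZE)
    if breached(SCALE_DOWN, samples):
        return max(current - 1, MIN_SIZE)
    return current


print(next_capacity(2, [50, 75, 80]))  # two periods above 70% -> scale up
print(next_capacity(2, [20, 25, 10]))  # three periods below 30% -> scale down
print(next_capacity(2, [80, 20, 25]))  # neither policy fully breached -> hold
```

Requiring three quiet periods to scale down but only two busy ones to scale up makes the group quicker to add capacity than to remove it. In capacity terms, the old setup held 24 vCPU permanently (3 × 8), while the new group ranges from 2 to 8 vCPU (1 to 4 instances × 2).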

Asynchronous processing architecture

We moved external API calls, report generation, and email operations to background job processing on dedicated worker instances. This removed 200-800ms of latency from user-facing operations and allowed background tasks to scale independently.
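The general shape of that change is shown below using only the Python standard library. It is a single-process sketch under stated assumptions: `handle_request`, `call_external_api`, and the payload are hypothetical names, the sleep stands in for a slow external call, and a production setup would use a durable job queue with separate worker processes rather than an in-process thread.

```python
import queue
import threading
import time

jobs = queue.Queue()
results = []


def call_external_api(payload: str) -> str:
    """Stand-in for a slow external call (the article cites 200-800ms)."""
    time.sleep(0.05)
    return f"synced:{payload}"


def worker() -> None:
    """Drain the queue until a None sentinel arrives."""
    while True:
        job = jobs.get()
        try:
            if job is None:
                return
            results.append(call_external_api(job))
        finally:
            jobs.task_done()


def handle_request(payload: str) -> dict:
    """Enqueue the slow work and answer immediately instead of blocking on it."""
    jobs.put(payload)
    return {"status": "accepted", "payload": payload}


thread = threading.Thread(target=worker, daemon=True)
thread.start()

start = time.perf_counter()
response = handle_request("contact-42")
request_ms = (time.perf_counter() - start) * 1000

jobs.join()      # wait for the background work (only needed for this demo)
jobs.put(None)   # tell the worker to stop
thread.join()

print(response, f"request path took {request_ms:.2f} ms")
print(results)
```

The request handler returns as soon as the job is queued, so the external call's latency no longer sits on the user's critical path.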

Measurable results and impact

Performance improvements

  • Database query times: 450ms → 89ms average (80% reduction)
  • API response times: 2.8s → 1.1s p95 (61% improvement)
  • Cache hit rates: 23% → 87%
  • Page load times: 4.2s → 2.1s average
  • Time to first byte: 890ms → 312ms average
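The percentages quoted above follow directly from the before and after values. A short check, using only the figures listed in this section:

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from before to after, in percent."""
    return (before - after) / before * 100


# (before, after) pairs copied from the results list.
metrics = {
    "database query time (ms)": (450, 89),
    "p95 API response time (s)": (2.8, 1.1),
    "page load time (s)": (4.2, 2.1),
    "time to first byte (ms)": (890, 312),
}

reductions = {name: percent_reduction(b, a) for name, (b, a) in metrics.items()}
for name, value in reductions.items():
    print(f"{name}: {value:.0f}% lower")
```

Page load time and time to first byte, which the list gives without percentages, work out to reductions of about 50% and 65% respectively.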

Cost optimization outcomes

Monthly infrastructure spending decreased from €18,000 to €11,000 (39% reduction):

  • Application servers: 64% cost reduction through rightsizing and auto-scaling
  • Database server: 47% cost reduction via optimization and downsizing
  • External API costs: 31% reduction through caching and request optimization
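For budgeting purposes, the monthly figures can be annualized. The sketch below assumes the €11,000 monthly spend holds steady for a full year, which the case study does not claim; it is simple arithmetic on the two totals given above.

```python
MONTHLY_BEFORE_EUR = 18_000  # from the article
MONTHLY_AFTER_EUR = 11_000   # from the article

monthly_saving = MONTHLY_BEFORE_EUR - MONTHLY_AFTER_EUR
reduction_pct = monthly_saving / MONTHLY_BEFORE_EUR * 100
annual_saving = monthly_saving * 12  # assumes spend stays flat for 12 months

print(f"monthly saving: €{monthly_saving:,}")
print(f"reduction:      {reduction_pct:.1f}%")
print(f"annualized:     €{annual_saving:,}")
```

The exact reduction is just under 39%, which the headline rounds to 40%.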

Operational efficiency gains

  • System uptime: Maintained 99.97% availability
  • Resource utilization: Improved from 12% to 45-65% CPU utilization on properly sized instances
  • Manual intervention: 40% reduction in on-call incidents due to auto-scaling handling traffic spikes
  • Peak capacity: Successfully handled 3.7M daily API requests (60% increase) three months post-optimization

Key lessons and recommendations

Implementation insights

  1. Gradual migration approach: Implementing all database optimizations simultaneously made measuring individual impact difficult. A phased approach provides better insights into optimization value.

  2. Cache warming strategies: Initial cache miss rates during traffic spikes caused temporary performance issues. Implementing cache pre-warming for predictable traffic patterns would improve consistency.

  3. Monitoring and alerting: Enhanced monitoring during optimization phases helps identify unexpected behavior patterns early.
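Cache pre-warming, mentioned in the second lesson, can be as simple as loading the entries you expect to need before the morning peak. The sketch below is hypothetical: the `dashboard:<user_id>` key format, the `load_dashboard` loader, and the list of recently active users are all assumptions for illustration, and a plain dict stands in for the cache.

```python
from typing import Callable, Dict, Iterable


def warm_cache(cache: Dict[str, object],
               keys: Iterable[str],
               loader: Callable[[str], object]) -> int:
    """Load every missing key ahead of demand; return how many were loaded."""
    warmed = 0
    for key in keys:
        if key not in cache:
            cache[key] = loader(key)
            warmed += 1
    return warmed


def load_dashboard(key: str) -> dict:
    """Hypothetical loader; in practice this would run the dashboard queries."""
    return {"key": key, "projects": []}


# Assumed input: users who were active on the previous business day.
recently_active_users = [101, 102, 103]
cache: Dict[str, object] = {"dashboard:101": {"key": "dashboard:101", "projects": []}}

loaded = warm_cache(cache,
                    (f"dashboard:{uid}" for uid in recently_active_users),
                    load_dashboard)
print(f"pre-warmed {loaded} entries, cache now holds {len(cache)}")
```

Run from a scheduled job shortly before business hours, a routine like this moves the first-request cache misses out of the peak window.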

Strategic takeaways

  • Profile before scaling: Performance bottlenecks often stem from inefficiency rather than resource constraints
  • Optimize for usage patterns: Match infrastructure provisioning to actual demand cycles
  • Cache strategically: High cache hit rates eliminate expensive operations and reduce external service dependencies
  • Embrace async processing: Remove blocking operations from user request paths
  • Monitor resource utilization: Low utilization on expensive resources indicates optimization opportunities

Conclusion

This optimization project demonstrated that performance and cost efficiency aren't mutually exclusive goals. By focusing on query optimization, intelligent caching, and demand-responsive scaling, the platform achieved significant improvements in both areas.

The key insight: before adding more infrastructure, optimize existing resource utilization. Many performance problems stem from inefficient patterns rather than insufficient capacity. Addressing these inefficiencies often enables better performance at lower cost.

For engineering teams facing similar challenges, start with comprehensive profiling to identify actual bottlenecks before implementing solutions. The most expensive infrastructure changes may not address the root causes of performance issues.

Originally published on binadit.com
