How a SaaS platform cut infrastructure costs by 40% while improving response times

Breaking the performance vs cost trade-off: A SaaS infrastructure optimization case study
Infrastructure optimization often feels like a zero-sum game between performance and cost. Faster systems cost more, right? A recent project with a European CRM platform proved this assumption wrong: it cut monthly infrastructure costs by roughly 40% while reducing P95 response times from 2.8 seconds to 1.1 seconds.
The challenge: scaling pains at 50,000 users
The platform had grown to 50,000 active users processing 2.3 million daily API requests, but success brought new problems. Monthly infrastructure costs had reached €18,000, yet performance was deteriorating:
- Average database query time: 450ms
- P95 response times: 2.8 seconds during peak hours
- Customer complaints about slow loading increasing weekly
Their traditional infrastructure approach consisted of three oversized virtual machines, a monolithic database server, and basic load balancing. Traffic analysis showed 78% of daily load concentrated during European business hours (9 AM to 6 PM CET), yet infrastructure remained fully provisioned 24/7.
The engineering team had attempted vertical scaling twice in six months, increasing CPU and RAM allocation. Each time, costs rose while the performance problems persisted, which reinforced the common misconception that performance improvements require more infrastructure spending.
Technical audit findings: efficiency gaps everywhere
A comprehensive infrastructure audit revealed multiple performance bottlenecks that were simultaneously driving unnecessary costs.
Database inefficiencies
Query pattern analysis exposed severe N+1 problems: 73% of queries followed inefficient patterns hitting identical tables repeatedly. A single user dashboard load generated 47 individual database queries, when 3 optimized queries could deliver the same data.
The database server, provisioned with 32 CPU cores and 128GB RAM, was CPU-bound due to query inefficiency rather than data volume constraints.
Application server waste
Application servers averaged 12% CPU utilization while consuming full allocated resources. The load balancer distributed traffic evenly across three identical instances, ignoring actual load patterns and time-based traffic variations.
Caching underutilization
Redis cache hit rates sat at 23%, far below the 80%+ typical in optimized systems. The configuration used default settings, treating Redis as simple key-value storage and missing opportunities for query result caching and session management.
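The hit rate itself is easy to track. Redis reports `keyspace_hits` and `keyspace_misses` counters in the stats section of its `INFO` output, and the ratio can be computed as in the minimal sketch below. The counter values are hypothetical, chosen only to reproduce the 23% figure; the source does not say how the audit measured it.

```python
def cache_hit_rate(keyspace_hits: int, keyspace_misses: int) -> float:
    """Return the hit rate as a fraction in [0, 1]; 0.0 when there were no lookups."""
    total = keyspace_hits + keyspace_misses
    if total == 0:
        return 0.0
    return keyspace_hits / total


# Hypothetical counters, as they might be read from the Redis INFO stats section.
stats = {"keyspace_hits": 23_000, "keyspace_misses": 77_000}
rate = cache_hit_rate(stats["keyspace_hits"], stats["keyspace_misses"])
print(f"hit rate: {rate:.0%}")
```

Sampling this ratio before and after a caching change gives a direct measure of whether the change worked.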
Synchronous bottlenecks
Network analysis revealed synchronous external API calls during user requests, adding 200-800ms latency to operations that should run as asynchronous background jobs.
Strategic approach: simultaneous optimization
Rather than treating performance and cost as competing priorities, we identified three areas where improvements would benefit both: query optimization, intelligent caching, and demand-based scaling.
Database query optimization
We addressed the N+1 patterns through eager loading and query consolidation. This reduces both database server load and response times, which in turn makes it possible to downsize the database server.
Implementation involved replacing inefficient query patterns with batch operations and strategic JOIN statements. A typical transformation for the user dashboard query:
-- Original: 47 separate queries (N+1 pattern)
SELECT * FROM users WHERE id = ?;
SELECT * FROM projects WHERE user_id = ?;
-- Repeated for each related entity...
-- Optimized: 3 consolidated queries (one shown here)
SELECT u.*, p.*, t.*, COUNT(c.id) AS comment_count
FROM users u
-- The status filter sits in the ON clause so that the LEFT JOIN still
-- returns the user row when the user has no active projects
LEFT JOIN projects p ON u.id = p.user_id AND p.status = 'active'
LEFT JOIN tasks t ON p.id = t.project_id
LEFT JOIN comments c ON t.id = c.task_id
WHERE u.id = ?
GROUP BY u.id, p.id, t.id;
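The difference in round trips can be made concrete with a small, self-contained sketch. It uses an in-memory SQLite database and a hypothetical two-table schema (`projects`, `tasks`), not the platform's real schema, and counts the queries each strategy issues for the same result.

```python
import sqlite3


class CountingDB:
    """Wrap a connection and count every query that is executed."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def run(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


def make_db():
    """Build a tiny in-memory database with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE projects (id INTEGER PRIMARY KEY, user_id INTEGER);
        CREATE TABLE tasks (id INTEGER PRIMARY KEY, project_id INTEGER, title TEXT);
        INSERT INTO projects VALUES (1, 1), (2, 1), (3, 1);
        INSERT INTO tasks VALUES (1, 1, 'a'), (2, 1, 'b'), (3, 2, 'c'), (4, 3, 'd');
    """)
    return CountingDB(conn)


def tasks_n_plus_one(db, user_id):
    """One query for the projects, then one more query per project."""
    projects = db.run("SELECT id FROM projects WHERE user_id = ? ORDER BY id", (user_id,))
    result = {}
    for (project_id,) in projects:
        rows = db.run(
            "SELECT title FROM tasks WHERE project_id = ? ORDER BY id", (project_id,)
        )
        result[project_id] = [title for (title,) in rows]
    return result


def tasks_batched(db, user_id):
    """A single JOIN returns the same data in one round trip."""
    rows = db.run(
        "SELECT p.id, t.title FROM projects p "
        "LEFT JOIN tasks t ON t.project_id = p.id "
        "WHERE p.user_id = ? ORDER BY p.id, t.id",
        (user_id,),
    )
    result = {}
    for project_id, title in rows:
        result.setdefault(project_id, [])
        if title is not None:  # projects without tasks yield a NULL title
            result[project_id].append(title)
    return result


if __name__ == "__main__":
    slow_db, fast_db = make_db(), make_db()
    same = tasks_n_plus_one(slow_db, 1) == tasks_batched(fast_db, 1)
    print(slow_db.count, fast_db.count, same)
```

With three projects the N+1 version issues four queries and the batched version one; the gap grows linearly with the number of related rows, which is how a single dashboard load can reach dozens of queries.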
Multi-tier caching strategy
We implemented Redis caching at several levels to reduce database load and external service costs:
- Query result caching: 15-minute TTL for user-specific data, 1-hour TTL for shared reference data
- API response caching: 30-minute TTL for external service calls with asynchronous refresh
- Session caching: User authentication and preference data to eliminate repeated database lookups
Cache keys were derived by hashing the query, so identical queries were served from cache rather than hitting the database repeatedly.
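A minimal sketch of query-hash cache keys with TTLs follows. It uses an in-memory dictionary as a stand-in for Redis so that it runs on its own; the key format, the `TTLCache` class, and the `cached_query` helper are illustrative assumptions, not the platform's actual implementation. Only the two TTL values come from the list above.

```python
import hashlib
import json
import time

USER_DATA_TTL = 15 * 60       # 15 minutes for user-specific data
REFERENCE_DATA_TTL = 60 * 60  # 1 hour for shared reference data


def cache_key(sql, params):
    """Hash the normalized query text plus its parameters into a stable key."""
    normalized = " ".join(sql.split())  # collapse whitespace differences
    payload = json.dumps({"sql": normalized, "params": list(params)}, sort_keys=True)
    return "query:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


class TTLCache:
    """In-memory stand-in for Redis: values expire after their TTL."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]
            return None
        return value

    def set(self, key, value, ttl):
        self._store[key] = (value, self._clock() + ttl)


def cached_query(cache, sql, params, ttl, fetch):
    """Return the cached result if present; otherwise fetch and store it.

    Note: a result of None is treated as a miss and is never cached.
    """
    key = cache_key(sql, params)
    hit = cache.get(key)
    if hit is not None:
        return hit
    value = fetch(sql, params)
    cache.set(key, value, ttl)
    return value
```

Including the parameters in the hash matters: the same SQL text with a different user ID must map to a different key, otherwise one user's data could be served to another.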
Horizontal auto-scaling implementation
We replaced the continuously running, oversized instances with demand-responsive scaling:
Auto Scaling Configuration:
  MinSize: 1
  MaxSize: 4
  DesiredCapacity: 2
  ScaleUpPolicy:
    MetricName: CPUUtilization
    Threshold: 70
    ComparisonOperator: GreaterThanThreshold
    EvaluationPeriods: 2
    Period: 300
  ScaleDownPolicy:
    MetricName: CPUUtilization
    Threshold: 30
    ComparisonOperator: LessThanThreshold
    EvaluationPeriods: 3
    Period: 300
Instance sizing changed from three c5.2xlarge instances (8 vCPU, 16GB RAM each) to dynamic c5.large instances (2 vCPU, 4GB RAM each) scaling with actual demand.
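The decision logic implied by the scaling policies can be sketched as a small function. The thresholds, evaluation periods, and size limits come from the configuration above; the one-instance adjustment step is an assumption, since the source does not state it, and a real auto-scaling service evaluates alarms and cooldowns in more detail than this.

```python
MIN_SIZE, MAX_SIZE = 1, 4
SCALE_UP_THRESHOLD, SCALE_UP_PERIODS = 70.0, 2      # CPU > 70% for 2 periods
SCALE_DOWN_THRESHOLD, SCALE_DOWN_PERIODS = 30.0, 3  # CPU < 30% for 3 periods


def next_capacity(current, cpu_history):
    """Return the new instance count given per-period average CPU readings.

    cpu_history holds one average CPU percentage per 300-second period,
    oldest first. The step of one instance per decision is an assumption.
    """
    recent_up = cpu_history[-SCALE_UP_PERIODS:]
    if len(recent_up) == SCALE_UP_PERIODS and all(
        cpu > SCALE_UP_THRESHOLD for cpu in recent_up
    ):
        return min(current + 1, MAX_SIZE)

    recent_down = cpu_history[-SCALE_DOWN_PERIODS:]
    if len(recent_down) == SCALE_DOWN_PERIODS and all(
        cpu < SCALE_DOWN_THRESHOLD for cpu in recent_down
    ):
        return max(current - 1, MIN_SIZE)

    return current
```

Requiring three quiet periods to scale down but only two busy periods to scale up makes the policy quicker to add capacity than to remove it, which reduces flapping around the thresholds.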
Asynchronous processing architecture
We moved external API calls, report generation, and email operations to background jobs processed by dedicated worker instances. This removed 200-800ms of latency from user-facing operations and allowed background tasks to scale independently.
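The pattern can be illustrated with a minimal sketch in which the request handler only enqueues work and returns immediately. An in-process thread and `queue.Queue` stand in for the dedicated worker instances and job queue; the function names and the 200ms sleep are hypothetical placeholders for a real external call.

```python
import queue
import threading
import time

jobs = queue.Queue()
completed = []


def slow_external_call(payload):
    """Stand-in for an external API call that takes about 200ms."""
    time.sleep(0.2)
    return f"synced:{payload}"


def worker():
    """Process jobs until a None sentinel arrives."""
    while True:
        payload = jobs.get()
        try:
            if payload is None:
                break
            completed.append(slow_external_call(payload))
        finally:
            jobs.task_done()


def handle_request(payload):
    """User-facing path: enqueue the slow work and respond right away."""
    jobs.put(payload)
    return {"status": "accepted"}


if __name__ == "__main__":
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    start = time.perf_counter()
    response = handle_request("contact-42")
    print(response, f"returned in {time.perf_counter() - start:.4f}s")
    jobs.join()     # wait for the background job to finish
    jobs.put(None)  # stop the worker
    thread.join()
    print(completed)
```

The trade-off is that the caller no longer gets the result of the external call in the response, so the user-facing flow has to tolerate eventual completion.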
Measurable results and impact
Performance improvements
- Database query times: 450ms → 89ms average (80% reduction)
- API response times: 2.8s → 1.1s P95 (61% reduction)
- Cache hit rates: 23% → 87%
- Page load times: 4.2s → 2.1s average
- Time to first byte: 890ms → 312ms average
Cost optimization outcomes
Monthly infrastructure spending decreased from €18,000 to €11,000 (39% reduction):
- Application servers: 64% cost reduction through rightsizing and auto-scaling
- Database server: 47% cost reduction via optimization and downsizing
- External API costs: 31% reduction through caching and request optimization
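The headline percentages follow directly from the before and after figures reported in this section, as this short check shows. The helper name is illustrative.

```python
def pct_reduction(before, after):
    """Percentage decrease from before to after."""
    return (before - after) / before * 100


query_time = pct_reduction(450, 89)          # average query time in ms
p95_latency = pct_reduction(2.8, 1.1)        # P95 response time in seconds
monthly_cost = pct_reduction(18_000, 11_000) # monthly spend in euros

print(f"query time:   {query_time:.1f}%")
print(f"P95 latency:  {p95_latency:.1f}%")
print(f"monthly cost: {monthly_cost:.1f}%")
```

The monthly saving works out to just under 39%, which the title rounds to 40%.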
Operational efficiency gains
- System uptime: Maintained 99.97% availability
- Resource utilization: Improved from 12% to 45-65% CPU utilization on properly sized instances
- Manual intervention: 40% reduction in on-call incidents due to auto-scaling handling traffic spikes
- Peak capacity: Successfully handled 3.7M daily API requests (60% increase) three months post-optimization
Key lessons and recommendations
Implementation insights
Gradual migration approach: Rolling out all database optimizations at once made it difficult to measure the impact of each one. A phased rollout would have shown more clearly which optimizations delivered the most value.
Cache warming strategies: High initial cache miss rates during traffic spikes caused temporary performance issues. Pre-warming the cache for predictable traffic patterns would improve consistency.
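Pre-warming could look like the sketch below, run shortly before a predictable peak such as the start of European business hours. A plain dictionary stands in for the cache, and the key names and loader are hypothetical; this is an illustration of the idea, not something the project implemented.

```python
def warm_cache(cache, keys, loader):
    """Load every missing key ahead of the peak; return how many were loaded."""
    loaded = 0
    for key in keys:
        if key not in cache:  # leave entries that are already cached untouched
            cache[key] = loader(key)
            loaded += 1
    return loaded


if __name__ == "__main__":
    # Hypothetical example: dashboards of users who are active every morning.
    cache = {"dashboard:1": {"projects": 3}}
    hot_keys = ["dashboard:1", "dashboard:2", "dashboard:3"]
    count = warm_cache(cache, hot_keys, loader=lambda key: {"projects": 0})
    print(count, sorted(cache))
```

Choosing which keys count as hot is the hard part; recent access logs are one plausible source.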
Monitoring and alerting: Enhanced monitoring during optimization phases helps identify unexpected behavior patterns early.
Strategic takeaways
- Profile before scaling: Performance bottlenecks often stem from inefficiency rather than resource constraints
- Optimize for usage patterns: Match infrastructure provisioning to actual demand cycles
- Cache strategically: High cache hit rates eliminate expensive operations and reduce external service dependencies
- Embrace async processing: Remove blocking operations from user request paths
- Monitor resource utilization: Low utilization on expensive resources indicates optimization opportunities
Conclusion
This optimization project demonstrated that performance and cost efficiency aren't mutually exclusive goals. By focusing on query optimization, intelligent caching, and demand-responsive scaling, the platform achieved significant improvements in both areas.
The key insight: before adding more infrastructure, optimize existing resource utilization. Many performance problems stem from inefficient patterns rather than insufficient capacity. Addressing these inefficiencies often enables better performance at lower cost.
For engineering teams facing similar challenges, start with comprehensive profiling to identify actual bottlenecks before implementing solutions. The most expensive infrastructure changes may not address the root causes of performance issues.
Originally published on binadit.com





