The Challenge
Our client, a FinOps SaaS startup, was building a platform that uses machine learning to recommend cloud cost optimizations to enterprise customers. However, they were blocked by the sheer complexity of data ingestion.
To make intelligent recommendations, their platform needed real-time read access to clients' AWS, Azure, GCP, and DigitalOcean accounts. Three obstacles stood out:
- Data Standardization: Every cloud provider formats billing data, CPU utilization, and instance metadata completely differently.
- Scale: Ingesting this data across hundreds of connected client cloud accounts resulted in massive, unpredictable bursts of data.
- Security: Handling cross-account API keys and IAM roles securely was a logistical nightmare.
Our Approach
We built a highly scalable, serverless ingestion engine capable of normalizing multi-cloud data autonomously.
1. Serverless Ingestion Pipelines
We used AWS EventBridge and AWS Lambda to create a robust, event-driven pipeline. Scheduled jobs securely assume temporary cross-account IAM roles (for AWS) or obtain short-lived OAuth tokens (for GCP and Azure), then pull each provider's billing and usage exports, such as AWS Cost and Usage Reports (CUR) and the equivalent GCP and Azure cost exports.
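The AWS side of this flow can be sketched as building a per-tenant STS AssumeRole request. The role name, session naming, and account IDs below are illustrative placeholders, not the client's actual configuration:

```python
# Sketch of how a scheduled ingestion job might build the cross-account
# AssumeRole request for one connected client account. Role and session
# names here are hypothetical.

def build_assume_role_params(account_id: str, external_id: str) -> dict:
    """Return keyword arguments for an STS AssumeRole call.

    A unique ExternalId per tenant mitigates the confused-deputy problem
    when assuming roles that live in client-owned accounts.
    """
    return {
        "RoleArn": f"arn:aws:iam::{account_id}:role/CostIngestReadOnly",
        "RoleSessionName": f"cost-ingest-{account_id}",
        "ExternalId": external_id,
        "DurationSeconds": 900,  # shortest allowed session; credentials expire fast
    }

# With boto3, the temporary credentials would be obtained roughly like:
#   sts = boto3.client("sts")
#   creds = sts.assume_role(**build_assume_role_params("111122223333", "tenant-42"))
#   # creds["Credentials"] then scopes a session to the client's account.
```

Keeping the session duration short means a leaked credential is only useful for minutes, which is what makes per-job role assumption safer than storing long-lived keys.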
2. Real-Time Data Normalization Layer
We built a custom transformation layer in Python. It takes heavily fragmented provider data, such as AWS EC2 usage versus GCP Compute Engine usage, and maps it to a single, unified GraphQL schema. This ensured the proprietary machine learning models received clean, standardized data regardless of cloud origin.
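In outline, the layer reduces each provider's billing rows to one shared shape. The unified field names below (`provider`, `service`, `cost_usd`, and so on) are hypothetical stand-ins; the client's actual schema was defined in GraphQL:

```python
# Illustrative normalization step: map provider-specific billing rows to a
# single unified record shape. Unified field names are hypothetical.

def normalize_aws(row: dict) -> dict:
    # AWS Cost and Usage Report columns use slash-delimited names.
    return {
        "provider": "aws",
        "service": row["product/ProductName"],
        "resource_id": row["lineItem/ResourceId"],
        "cost_usd": float(row["lineItem/UnblendedCost"]),
        "usage_start": row["lineItem/UsageStartDate"],
    }

def normalize_gcp(row: dict) -> dict:
    # GCP's billing export nests service and resource details in sub-objects.
    return {
        "provider": "gcp",
        "service": row["service"]["description"],
        "resource_id": row["resource"]["name"],
        "cost_usd": float(row["cost"]),
        "usage_start": row["usage_start_time"],
    }
```

Because every downstream consumer sees only the unified record, adding a new provider means writing one more `normalize_*` function rather than touching the ML pipeline.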
3. Highly Scalable Storage
The normalized data is streamed into Amazon Timestream, a purpose-built time-series database. This allows the AI models to rapidly query historical resource utilization to identify abandoned resources, over-provisioned VMs, and optimal reserved instance purchases.
The Result
The new backend enabled the startup to onboard large enterprise clients rapidly.
- Unified Visibility: Users get a single dashboard translating complex multi-cloud billing into plain English.
- Massive Scale: The serverless architecture scales automatically to process terabytes of billing data seamlessly at the end of every month.
- Impactful AI: Thanks to clean, standardized data, the AI engine identified more than $1M in actionable cloud savings for the platform's beta cohort.