Dynamic Cost-Aware Language Models: A Real-Time Framework for Optimizing Cloud Resource Recommendations
Abstract
This research introduces a dynamic framework for optimizing cloud resource recommendations by integrating real-time cost metrics into language model outputs. The framework employs a three-tier architecture comprising a cost-aware attention mechanism that incorporates AWS Pricing API signals into the transformer architecture, a dynamic programming-based resource allocation optimizer that ensures Pareto efficiency in performance-cost trade-offs, and a reinforcement learning layer that refines recommendations based on actual usage patterns. Extensive evaluations on AWS workloads demonstrate a 31% reduction in cloud costs while maintaining performance requirements. Leveraging a modified boto3 SDK with custom middleware for real-time metric collection and a caching strategy that reduces API latency by 76%, the system was tested over six months on 150 AWS account configurations, proving its scalability, robustness, and effectiveness in real-world scenarios.