As cloud-native designs rely increasingly on APIs for communication among microservices, applications, and third-party integrations, these APIs have become prime attack vectors. This paper proposes an adaptive security model designed specifically for cloud-based API environments, emphasizing real-time threat detection, behavioral anomaly inspection, and policy-as-code enforcement in multi-tenant, serverless, and containerized settings. Because cloud infrastructures such as Amazon Web Services (AWS) are dynamic and complex, static security controls are no longer sufficient. To address this shortcoming, we present a dynamic policy management system based on reinforcement learning (RL). We use Proximal Policy Optimization (PPO) and Deep Q-Networks (DQN) to continuously optimize and adapt security controls such as firewall rules and Identity and Access Management (IAM) policies. Using cloud monitoring data (e.g., AWS CloudTrail logs, GuardDuty network traffic data, and threat intelligence feeds), our RL-based system learns in real time to maximize threat mitigation while minimizing resource overhead. Experimental results show that GuardDuty with adaptive policies achieves a higher intrusion detection rate than with static policies (92% vs. 82%) and reduces incident detection and response time by 58%. In addition, the system maintains strong compliance with security policies while using resources efficiently. These results highlight the potential of reinforcement learning as an effective tool for dynamic management of cloud security policies.
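To make the core idea concrete, the following is a minimal sketch of an agent that learns which policy profile to apply by trading off threat mitigation against resource cost. It is not the system evaluated in this paper: it uses tabular Q-learning as a much simpler stand-in for PPO and DQN, and every state, action, weight, and function name in it (`reward`, `next_state`, `train`, the three policy profiles, and all numeric values) is a hypothetical assumption chosen for illustration only.

```python
import random

# All names and numbers below are hypothetical, chosen only for illustration.
STATES = ["low_threat", "high_threat"]          # coarse threat level from monitoring data
ACTIONS = ["permissive", "moderate", "strict"]  # candidate firewall/IAM policy profiles

THREAT_WEIGHT = {"low_threat": 0.3, "high_threat": 2.0}            # value of blocking attacks
BLOCK_RATE = {"permissive": 0.2, "moderate": 0.6, "strict": 0.95}  # assumed mitigation rate
RESOURCE_COST = {"permissive": 0.0, "moderate": 0.2, "strict": 0.5}


def reward(state, action):
    """Threat mitigated minus resource overhead (illustrative weights)."""
    return THREAT_WEIGHT[state] * BLOCK_RATE[action] - RESOURCE_COST[action]


def next_state(state):
    """Toy dynamics: the threat level simply alternates on every step."""
    return "high_threat" if state == "low_threat" else "low_threat"


def train(steps=5000, alpha=0.1, gamma=0.5, epsilon=0.2, seed=0):
    """Run tabular Q-learning and return the greedy policy per state."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    state = "low_threat"
    for _ in range(steps):
        # Epsilon-greedy selection: explore occasionally, otherwise exploit.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        r = reward(state, action)
        nxt = next_state(state)
        # Standard Q-learning update toward the bootstrapped target.
        best_next = max(q[(nxt, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (r + gamma * best_next - q[(state, action)])
        state = nxt
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}


if __name__ == "__main__":
    print(train())
```

Under these assumed weights, the learned policy relaxes controls when the threat level is low and tightens them when it is high, which is the adaptive behavior that static policies cannot provide. A production system would replace the toy states with features derived from real monitoring data and the table with a neural policy.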