Recent coverage of Amazon’s internal handling of several AI-related incidents is frequently framed as a story about AI-generated code. The broader reporting, and especially the strong reactions to it, points to a deeper issue: the industry conversation is rapidly shifting toward how software organizations supervise and validate AI-influenced changes.
The following is a condensed timeline of the events surrounding the Amazon controversy:
- July 2025: Amazon introduced Kiro, an agentic AI-powered IDE, and reportedly set an internal objective of achieving 80% weekly engineer usage.
- Mid-December 2025: A Kiro-assisted production change reportedly triggered a 13-hour AWS Cost Explorer outage in mainland China after the tool deleted and recreated an environment.
- Late 2025: A separate incident involving Amazon Q Developer reportedly caused disruption to an internal service under similar circumstances.
- February 20, 2026: The Financial Times published an investigation into the Kiro incident. Amazon responded by describing the event as “user error” and misconfigured access controls rather than an issue caused by AI itself.
- March 5, 2026: Amazon’s retail website and shopping application were unavailable for roughly six hours due to what the company described as an erroneous software code deployment.
- March 10, 2026: Reporting by the Financial Times indicated that Amazon conducted a mandatory deep-dive review after four Sev-1 incidents occurred within a single week. An internal briefing note, later deleted, reportedly linked “Gen-AI assisted changes” to a wider trend of incidents.
Amazon’s public position has been that these incidents resulted from user error and misconfigured access controls rather than AI itself.
That explanation has been met with widespread skepticism, but the real lessons do not lie in the details of any single incident.
The more significant story is that software organizations are expanding their use of AI more quickly than the supervisory and validation mechanisms required to control that use.
Code assistants are only part of this landscape. It also includes agentic workflows, automated change recommendations, and systems capable of acting with extensive permissions inside development and production environments. The underlying issue is not the presence or use of AI itself. The problem is that many organizations continue to sacrifice validation for deployment speed.
That approach was already fragile before AI became widely integrated into development processes. With AI now embedded across the workflow, it becomes even harder to justify and can turn genuinely dangerous.
Faster software changes raise the cost of weak validation
The Amazon story matters, and so do the reactions to it, because both reflect a broader trend across the industry. Many teams are adopting AI to remove friction from software delivery. In doing so, they are discovering that some of that friction previously performed important control functions.
Code review slows development, yet it also identifies flawed assumptions. Restrictive permissions may appear inefficient during development, but they can limit the blast radius when problems occur in production. Independent testing requires time and resources, yet it reveals whether a change introduces a real issue in a live environment.
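To make the blast-radius point concrete, here is a minimal sketch of a default-deny gate around agent-initiated actions. The action names, the `run_action` callable, and the policy itself are illustrative assumptions, not any specific vendor’s API:

```python
# Minimal sketch: default-deny permissions for an automated agent.
# Action names and the policy below are hypothetical examples.

ALLOWED_ACTIONS = {
    "read_logs",          # read-only diagnostics carry little blast radius
    "open_pull_request",  # proposed changes still pass through human review
}

def execute_agent_action(action: str, run_action):
    """Run an agent-initiated action only if it is explicitly allowed.

    Anything not on the allow-list is refused, so a flawed suggestion
    such as "delete and recreate the environment" cannot execute
    directly against production.
    """
    if action not in ALLOWED_ACTIONS:
        raise PermissionError(
            f"Agent action '{action}' is not allowed; route to human approval"
        )
    return run_action()

# Example: a destructive action is rejected before it can touch anything.
try:
    execute_agent_action("delete_environment", lambda: None)
except PermissionError as err:
    print(err)
```

The point is not the specific mechanism but the default: an agent’s reach is whatever the allow-list grants, and nothing more.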
Once organizations begin using AI to accelerate code generation, propose fixes, implement infrastructure changes, or operate across systems, these controls become even more important. The question is no longer limited to whether an AI tool can generate insecure code. The central question becomes whether the organization can validate the quality and consequences of AI-influenced changes across the entire software lifecycle.
That lifecycle includes not only functionality but also reliability, availability, and security. Each of these aspects can fail when validation is treated as an afterthought.
This isn’t only about coding assistants
One risk in this debate is that “AI coding” becomes a convenient label for a much wider group of control failures. If a development team relies on AI-generated code without meaningful review, the problem is validation. If an AI agent is granted excessive permissions to modify systems or workflows, the problem is supervision. If teams rely on automated outputs without confirming what is actually running in production, the problem is assurance. None of these failures are unique to AI-generated code.
Furthermore, these failure modes are interconnected and can easily overlap. A functionality defect can develop into an availability issue. A reliability problem can create a security gap. An overly powerful agent can make a flawed recommendation, execute an incorrect action, and do so at machine speed. None of these scenarios require speculative or futuristic thinking. They arise directly from unchecked automation that grants excessive autonomy to systems while leaving oversight at pre-AI levels or even lower.
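As a sketch of what oversight at machine speed can mean in practice, the snippet below caps how many automated changes can land within a review window. The threshold and window size are illustrative assumptions, not recommended values:

```python
# Minimal sketch: keep automated change velocity within human review capacity.
import time
from collections import deque

class ChangeRateGate:
    """Pauses automation once changes outpace what reviewers can validate."""

    def __init__(self, max_changes: int = 5, window_seconds: float = 3600.0):
        self.max_changes = max_changes
        self.window_seconds = window_seconds
        self._timestamps = deque()  # when recent automated changes landed

    def permit(self) -> bool:
        now = time.monotonic()
        # Forget changes that have aged out of the review window.
        while self._timestamps and now - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_changes:
            return False  # over budget: queue for human review instead
        self._timestamps.append(now)
        return True

gate = ChangeRateGate(max_changes=2, window_seconds=60.0)
print([gate.permit() for _ in range(3)])  # [True, True, False]
```

A gate like this says nothing about change quality. It only prevents automation from outrunning the validation layer, which is exactly the gap these incidents expose.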
For that reason, the technology industry should resist the temptation to turn every story that mentions AI and software into another debate about whether AI is “good” or “bad” at writing code. That remains a legitimate discussion. However, the more urgent question is whether organizations have established a sufficiently strong validation layer around everything AI systems are asked to do.
The industry is trying to automate judgment without proving outcomes
Many organizations are not simply using AI to accelerate code delivery. They are doing so while compressing review capacity, stretching experienced teams more thinly, and still expecting the same or improved outcomes.
When human oversight becomes thinner at the same moment that automation and AI autonomy grow stronger, validation must become more precise to compensate. Otherwise, failures become inevitable.
Simply stating that humans remain accountable for the final decision is insufficient. Although such statements appear responsible, they often carry little practical meaning when reviewers are overloaded, permissions are too broad, or testing processes cannot keep pace with release velocity.
In software development, outcomes matter more than intentions. A rapid code suggestion that introduces a defect remains a defect. An efficient AI-assisted workflow that contributes to an outage still disrupts availability. A fully automated change that exposes an application or API to attack still creates risk. When the final result is a major global incident, efficiency in reaching that outcome offers no benefit.
This is precisely why validation is essential. Claims about increased productivity are easy to present. Claims about safety, resilience, and security require demonstrable verification.
Security teams should keep the conversation anchored in runtime reality
Although the Amazon incidents are not directly security-related, they reinforce several practical lessons for AppSec teams. As AI accelerates both the pace and scale of change, security controls must stay tightly connected to what applications and APIs actually do in operation. Integrating security testing into AI assistants in the IDE, which Mend.io tools such as SAST and SCA can support, is a start, but it remains essential to validate runtime behavior with DAST to confirm real exposure.
This is particularly important because AI-originated failures can manifest in multiple ways. AI may occasionally generate insecure code directly, but it can also significantly amplify ordinary operational mistakes.
At AI-driven speeds, identifying and anticipating every possible failure mode in advance becomes extremely difficult. As a result, greater importance must be placed on continuously validating what is real, what is reachable, and what demands immediate attention.
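One way to ground that in practice is a lightweight post-deployment check that asks only what is actually reachable right now. The endpoints below are hypothetical placeholders, and a real DAST scan goes far beyond this sketch:

```python
# Minimal sketch: validate runtime reachability after a deployment.
# URLs are hypothetical placeholders; real DAST covers far more than this.
import urllib.error
import urllib.request

EXPECTED_PUBLIC = ["https://example.com/health"]           # must answer
EXPECTED_BLOCKED = ["https://example.com/internal/admin"]  # must not answer 2xx/3xx

def probe(url: str, timeout: float = 5.0):
    """Return the HTTP status code, or None if the URL does not answer."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, just with an error status
    except urllib.error.URLError:
        return None      # no answer at all

def validate_runtime_exposure():
    findings = []
    for url in EXPECTED_PUBLIC:
        if probe(url) != 200:
            findings.append(f"AVAILABILITY: {url} did not return 200")
    for url in EXPECTED_BLOCKED:
        status = probe(url)
        if status is not None and status < 400:
            findings.append(f"EXPOSURE: {url} answered {status} but should be blocked")
    return findings

print(validate_runtime_exposure())
```

Checks like this do not replace a scanner; they encode the habit of confirming runtime reality after every AI-assisted change.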
Automated security testing and continuous verification therefore have a clear role in the broader discussion about AI security. Their function is not to slow innovation or impose compliance burdens on developer productivity. Instead, they help ensure that software organizations remain grounded in actual operational outcomes.
Final thoughts
Amazon is an easy target for hindsight criticism as yet another example of AI enthusiasm colliding with operational reality. Yet the specific details of the story matter less than the central lesson it highlights: situations like this arise when supervision and validation fail to keep pace with the adoption of AI.
The problem is not limited to a single company, one coding assistant, or one category of incident. It reflects a broader industry pattern of placing excessive trust in AI outputs, recommendations, and automated actions before sufficient discipline is established around how those outputs are reviewed, tested, and constrained.
As organizations integrate AI more deeply and more widely into software creation and operations, validation must move closer to the center of development processes. Ultimately, the challenge is to ensure that software quality keeps pace with the AI-accelerated release cadence. Organizations that succeed in this balance may not attract headlines for maintaining reliability, availability, and security. However, they are far more likely to avoid appearing in reports about major outages and breaches.
If you would like to try Invicti (DAST) and Mend.io (SAST, SCA), which integrate seamlessly, for free, please leave your contact details below and the team will get in touch.