AI agents are quietly generating chaos engineering failures enterprises don’t track yet

There is a category of production incident that engineering teams are not yet tracking, because it doesn’t fit any existing postmortem template.

The agent initiated an action. The action was technically correct given the agent’s context. The context was incomplete. The infrastructure cascaded. And, by the time the incident review happened, three teams were arguing about whether it was an agent failure or an infrastructure failure, because the frameworks for thinking about these two things have never been connected.

The scale of this exposure is no longer theoretical.

Image provided by author.

Where language models help, and exactly where they fail

Several engineering organizations are now running experiments using large language models (LLMs) to generate chaos hypotheses from dependency graphs and incident postmortem corpora. The results are directionally useful. Language models surface plausible failure modes that experienced SREs recognize as worth testing, and they generate hypotheses faster than manual processes, particularly when working from rich postmortem history.

The limit is dependency graph staleness, and it is a hard limit. A hypothesis generated from a graph that doesn’t reflect last month’s service extraction, or a new shared library dependency added two sprints ago, will propose an experiment with incorrect blast radius assumptions. The problem is not that the model makes a mistake; it’s that the model doesn’t know it’s making one. It will be confidently incorrect about a system boundary that no longer exists, and in chaos engineering, confident incorrectness in production means an unplanned outage.
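One way to picture the staleness problem is a guard that runs before any generated hypothesis is executed. The Python sketch below is illustrative only and is not taken from any tool the article describes. The service names, the `blast_radius` and `is_stale` helpers, and the seven-day threshold are all assumptions made for the example. It computes the set of services that transitively depend on a target, then shows how a stale graph understates that set.

```python
from collections import deque
from datetime import datetime, timedelta, timezone


def blast_radius(dependents, target):
    """Return every service that transitively depends on `target`.

    `dependents` maps a service name to the services that call it directly.
    """
    seen = set()
    queue = deque([target])
    while queue:
        service = queue.popleft()
        for caller in dependents.get(service, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    seen.discard(target)  # a dependency cycle can lead back to the target
    return seen


def is_stale(snapshot_time, last_topology_change,
             max_age=timedelta(days=7), now=None):
    """Flag the graph as stale if the topology changed after the snapshot,
    or if the snapshot is older than `max_age` (an arbitrary threshold)."""
    now = now or datetime.now(timezone.utc)
    return last_topology_change > snapshot_time or now - snapshot_time > max_age


# Hypothetical graph the model was given (captured before a service extraction).
stale_graph = {
    "payments-db": ["payments"],
    "payments": ["checkout"],
}

# Hypothetical graph as it exists today: "ledger" was extracted and now
# reads from the same database, and "reporting" depends on "ledger".
current_graph = {
    "payments-db": ["payments", "ledger"],
    "payments": ["checkout"],
    "ledger": ["reporting"],
}

assumed = blast_radius(stale_graph, "payments-db")
actual = blast_radius(current_graph, "payments-db")
missed = actual - assumed  # services the hypothesis did not know it would hit
```

In this toy case `missed` contains `ledger` and `reporting`, the two services a hypothesis built on the older graph would not account for. A real guard would need a trustworthy source for the last topology change, such as deploy or service-registry events, which this sketch simply takes as an input.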

Stanford’s
