Research · 2026-04-29
Capability-Scoped Runtimes for Desktop Agents: Risk-Gated Execution, Durable Tasks, and Reversibility-Aware Containment
Euraika Labs Research Group
Abstract
Desktop agents are advancing on planning and perception while remaining fragile on execution. We argue this is a runtime problem, not a planner problem, and present pan-agent — a managed desktop-agent runtime that composes three load-bearing mechanisms: a pre-execution risk-gated classifier that intercepts dangerous tool calls, a durable taskrunner that survives crashes and zombie processes, and a reversibility-aware containment layer that records each side-effect with a typed receipt and wires it to per-tool reversers backed by capability-probed filesystem snapshots. The contribution is framed around a taxonomy that classifies every action as local-reversible, runtime-compensable, or externally-irreversible, and a four-experiment pre-registered evaluation plan over OSWorld, OS-Harm, RedTeamCUA, and a long-horizon crash generator. Implementation is open source — approximately 32 kLOC of Go, MIT-licensed.
The runtime is the right place to add the guarantees the planner cannot make
Computer-use agents have moved from research prototypes to production previews in fewer than thirty months. Benchmarks like OSWorld now ship hundreds of execution-graded tasks; adversarial benchmarks like OS-Harm and RedTeamCUA document how readily the same agents can be coerced into unsafe behavior, whether by deliberate misuse, by indirect prompt injection, or by mere model misjudgment.
Less attention has been paid to the layer between the planner and the operating system. In most current designs, that layer is functionally absent. A tool call decided by a language model is dispatched directly to exec-class APIs; if the call is harmful, the harm lands. If the process crashes mid-task, the state is left half-written. If the user later realizes the agent took the wrong action, there is, in the general case, no git revert for the operating system.
This paper argues the runtime is the right place to add the guarantees the planner cannot make, and presents pan-agent, an open-source desktop-agent runtime built around three load-bearing mechanisms.
The Local State Fallacy
A naïve framing would claim that local rollback makes dangerous actions safely attemptable: the agent tries something risky; if it goes wrong, the runtime restores the snapshot. This framing is wrong, and importantly so. Many of the most consequential desktop actions — sending an email, posting to Slack, calling a billing API, exfiltrating a secret to a remote server, triggering an account lockout — leave zero residual filesystem delta and are not reversible by any local mechanism. A snapshot-and-rollback runtime that pretends otherwise builds a fraudulent safety claim on top of a real engineering achievement.
We therefore treat reversibility as a capability scope the runtime declares per action. Every action falls into exactly one of three tiers:
- local-reversible: effects bounded to the local filesystem, registry, or process tree; rollback is principled and complete.
- runtime-compensable: effects partially observable to the runtime (e.g., a temp file uploaded to a known-cooperative cache); rollback is partial, and compensation must be explicitly designed and measured.
- externally-irreversible: effects exit the runtime's jurisdiction (network egress, third-party API mutations, UI-driven send actions); rollback is a category error.
The runtime makes claims appropriate to each tier — and refuses to make claims outside the tier where they apply.
Three composing layers
- A risk-gated execution layer: a 108-pattern regex classifier, maintained as two severity-level collections (DangerousPatterns and CatastrophicPatterns) and applied via tool-type dispatch. Each call resolves to one of Safe, Dangerous, or Catastrophic: Catastrophic is rejected outright, Dangerous requires a UI round-trip, and Safe proceeds.
- A durable taskrunner: a CAS state machine (queued → running → {succeeded, failed, paused, zombie, cancelled}) with heartbeat-based zombie detection, pause/resume via step memoization, and per-task budget and approval enforcement.
- A reversibility-aware containment layer: an action journal that records typed receipts with a ReversalStatus (reversible | audit_only | reversed_externally | irrecoverable), a per-tool reverser registry exposed at /v1/recovery/*, and a capability-probed snapshot subsystem with three tiers (SnapshotTier = cow | copyfs | audit_only), backed by per-platform capture implementations for Linux and macOS and a stub for Windows.
The combination is straightforward to describe but, we argue, novel in composition. Existing transactional tool-use systems address atomic semantics at the tool layer; existing long-horizon GUI agents address recovery at the agent-loop layer. Neither operates at the OS-level runtime; neither classifies actions by their reversibility scope before execution; and neither composes recovery with risk-gating and durability into a single managed substrate.
Contributions
- A taxonomy of action reversibility (local-reversible, runtime-compensable, externally-irreversible), presented as both an analytical lens and the basis for a routing primitive.
- A composed managed-runtime architecture integrating risk-gated execution, durable task management, and reversibility-aware containment, with explicit interfaces between the three layers.
- A pre-registered, falsifiable, four-experiment evaluation plan over harmonized subsets of OSWorld, OS-Harm, RedTeamCUA, and a long-horizon crash generator.
- An open-source artifact — pan-agent, approximately 32 kLOC of Go (24.7k non-test, 7.1k test), MIT license, at github.com/Euraika-Labs/pan-agent.
Honest scope
We make no claim that the runtime makes desktop AI control safe in any general sense. We do claim — and propose to demonstrate — that inside a declared reversibility envelope, the integrated runtime expands the safety/utility frontier in a way no single layer achieves alone, and that outside that envelope, the runtime's job is to refuse to make a claim it cannot honor.
The paper invites independent replication, attack, and extension. The artifact is open. The evaluation is pre-registered. The falsification bars are written down. We will report results honestly, including negative ones.
The full paper — including related work, the formal architecture, the evaluation methodology, threats to validity, and the complete reference list — is available as a PDF below.

