Research · 2026-04-29

Capability-Scoped Runtimes for Desktop Agents: Risk-Gated Execution, Durable Tasks, and Reversibility-Aware Containment

Euraika Labs Research Group

Abstract

Desktop agents are advancing on planning and perception while remaining fragile on execution. We argue this is a runtime problem, not a planner problem, and present pan-agent — a managed desktop-agent runtime that composes three load-bearing mechanisms: a pre-execution risk-gated classifier that intercepts dangerous tool calls, a durable taskrunner that survives crashes and zombie processes, and a reversibility-aware containment layer that records each side-effect with a typed receipt and wires it to per-tool reversers backed by capability-probed filesystem snapshots. The contribution is framed around a taxonomy that classifies every action as local-reversible, runtime-compensable, or externally-irreversible, and a four-experiment pre-registered evaluation plan over OSWorld, OS-Harm, RedTeamCUA, and a long-horizon crash generator. Implementation is open source — approximately 32 kLOC of Go, MIT-licensed.

The runtime is the right place to add the guarantees the planner cannot make

Computer-use agents have moved from research prototypes to production previews in fewer than thirty months. Benchmarks like OSWorld now ship hundreds of execution-graded tasks; adversarial benchmarks like OS-Harm and RedTeamCUA document how readily the same agents can be coerced into unsafe behavior, whether by deliberate misuse, by indirect prompt injection, or by mere model misjudgment.

Less attention has been paid to the layer between the planner and the operating system. In most current designs, that layer is functionally absent. A tool call decided by a language model is dispatched directly to exec-class APIs; if the call is harmful, the harm lands. If the process crashes mid-task, the state is left half-written. If the user later realizes the agent took the wrong action, there is — in the general case — no git revert for the operating system.

This paper argues the runtime is the right place to add the guarantees the planner cannot make, and presents pan-agent, an open-source desktop-agent runtime built around three load-bearing mechanisms.

The Local State Fallacy

A naïve framing would claim that local rollback makes dangerous actions safely attemptable: the agent tries something risky; if it goes wrong, the runtime restores the snapshot. This framing is wrong, and importantly so. Many of the most consequential desktop actions — sending an email, posting to Slack, calling a billing API, exfiltrating a secret to a remote server, triggering an account lockout — leave zero residual filesystem delta and are not reversible by any local mechanism. A snapshot-and-rollback runtime that pretends otherwise builds a fraudulent safety claim on top of a real engineering achievement.

We therefore treat reversibility as a capability scope the runtime declares per action. Every action falls into exactly one of three tiers:

local-reversible — effects bounded to the local filesystem, registry, or process tree; rollback is principled and complete.
runtime-compensable — effects partially observable to the runtime (e.g., a temp file uploaded to a known-cooperative cache); rollback is partial; compensation must be explicitly designed and measured.
externally-irreversible — effects exit the runtime's jurisdiction (network egress, third-party API mutations, UI-driven send actions); rollback is a category error.

The runtime makes claims appropriate to each tier — and refuses to make claims outside the tier where they apply.
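The three tiers can be read as a small routing primitive. The following is a minimal Go sketch under stated assumptions, not pan-agent's actual code: the `Action` fields `LeavesHost` and `HasCompensator`, the `Claim` strings, and the `DeclareTier` rule are all hypothetical names introduced here for illustration. Only the three tier names come from the text above.

```go
package main

import "fmt"

// Tier is the reversibility scope the runtime declares for an action.
type Tier int

const (
	LocalReversible        Tier = iota // effects stay on the local machine
	RuntimeCompensable                 // effects partly observable to the runtime
	ExternallyIrreversible             // effects leave the runtime's jurisdiction
)

// String returns the tier name used in the taxonomy.
func (t Tier) String() string {
	switch t {
	case LocalReversible:
		return "local-reversible"
	case RuntimeCompensable:
		return "runtime-compensable"
	case ExternallyIrreversible:
		return "externally-irreversible"
	default:
		return "unknown"
	}
}

// Claim is the strongest rollback statement the runtime may make for a tier.
// The wording of these values is an assumption made for this sketch.
type Claim string

const (
	ClaimCompleteRollback    Claim = "complete rollback"
	ClaimPartialCompensation Claim = "partial, measured compensation"
	ClaimNone                Claim = "no rollback claim"
)

// ClaimFor maps a tier to its claim; anything unrecognized gets no claim.
func ClaimFor(t Tier) Claim {
	switch t {
	case LocalReversible:
		return ClaimCompleteRollback
	case RuntimeCompensable:
		return ClaimPartialCompensation
	default:
		return ClaimNone
	}
}

// Action is a hypothetical pre-execution description of a tool call.
type Action struct {
	Tool           string
	LeavesHost     bool // assumed flag: the effect crosses the machine boundary
	HasCompensator bool // assumed flag: a designed compensation exists
}

// DeclareTier is an assumed routing rule, not pan-agent's actual logic.
func DeclareTier(a Action) Tier {
	switch {
	case !a.LeavesHost:
		return LocalReversible
	case a.HasCompensator:
		return RuntimeCompensable
	default:
		return ExternallyIrreversible
	}
}

func main() {
	actions := []Action{
		{Tool: "write_file"},
		{Tool: "upload_to_cache", LeavesHost: true, HasCompensator: true},
		{Tool: "send_email", LeavesHost: true},
	}
	for _, a := range actions {
		tier := DeclareTier(a)
		fmt.Printf("%s: %s (%s)\n", a.Tool, tier, ClaimFor(tier))
	}
}
```

The design point the sketch tries to capture is that the default branch fails closed: an action the runtime cannot place in a tier it can honor receives no rollback claim at all.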

Three composing layers

A risk-gated execution layer: a 108-pattern regex classifier, maintained as two severity-level collections (DangerousPatterns and CatastrophicPatterns) and applied via tool-type dispatch. Each tool call resolves to one of Safe / Dangerous / Catastrophic. Catastrophic calls are rejected outright; Dangerous calls require a UI round-trip before they run; Safe calls proceed.
A durable taskrunner: a compare-and-swap (CAS) state machine (queued → running → {succeeded, failed, paused, zombie, cancelled}) with heartbeat-based zombie detection, pause/resume via step memoization, and per-task budget and approval enforcement.
A reversibility-aware containment layer: an action journal that records typed receipts with a ReversalStatus (reversible | audit_only | reversed_externally | irrecoverable), a per-tool reverser registry exposed at /v1/recovery/*, and a capability-probed snapshot subsystem with three tiers (SnapshotTier = cow | copyfs | audit_only), with per-platform capture implementations on Linux and macOS and a stub on Windows.
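To make the first layer concrete, here is a minimal Go sketch of a two-collection regex gate. The collection names DangerousPatterns and CatastrophicPatterns and the three outcomes come from the description above; the four patterns, the `Classify` and `Gate` functions, and the `approve` callback standing in for the UI round-trip are assumptions made for illustration. The sketch omits tool-type dispatch and does not reproduce the 108 patterns.

```go
package main

import (
	"fmt"
	"regexp"
)

// Risk is the severity a tool call resolves to.
type Risk int

const (
	Safe Risk = iota
	Dangerous
	Catastrophic
)

func (r Risk) String() string {
	return [...]string{"Safe", "Dangerous", "Catastrophic"}[r]
}

// Illustrative patterns only; these are not the patterns shipped in pan-agent.
var DangerousPatterns = []*regexp.Regexp{
	regexp.MustCompile(`\bsudo\b`),
	regexp.MustCompile(`\bgit\s+push\b.*--force\b`),
}

var CatastrophicPatterns = []*regexp.Regexp{
	regexp.MustCompile(`\brm\s+-rf\s+/(\s|$)`),
	regexp.MustCompile(`\bmkfs(\.\w+)?\b`),
}

// matchesAny reports whether any pattern in the collection matches s.
func matchesAny(patterns []*regexp.Regexp, s string) bool {
	for _, p := range patterns {
		if p.MatchString(s) {
			return true
		}
	}
	return false
}

// Classify checks the most severe collection first, so a command that
// matches both collections resolves to Catastrophic.
func Classify(command string) Risk {
	switch {
	case matchesAny(CatastrophicPatterns, command):
		return Catastrophic
	case matchesAny(DangerousPatterns, command):
		return Dangerous
	default:
		return Safe
	}
}

// Gate decides before execution whether a command may run. The approve
// callback is a stand-in for the UI round-trip and is never consulted
// for Catastrophic commands.
func Gate(command string, approve func(string) bool) (Risk, bool) {
	risk := Classify(command)
	switch risk {
	case Catastrophic:
		return risk, false
	case Dangerous:
		return risk, approve(command)
	default:
		return risk, true
	}
}

func main() {
	deny := func(string) bool { return false } // simulated user who declines
	for _, cmd := range []string{"ls -la", "sudo apt install jq", "rm -rf /"} {
		risk, allowed := Gate(cmd, deny)
		fmt.Printf("%q risk=%s allowed=%t\n", cmd, risk, allowed)
	}
}
```

Checking the catastrophic collection first means approval can never upgrade a catastrophic call into an allowed one, which matches the "rejected outright" behavior described in the list.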

The combination is straightforward to describe but, we argue, novel in composition. Existing transactional tool-use systems address atomic semantics at the tool layer; existing long-horizon GUI agents address recovery at the agent-loop layer. Neither operates at the OS-level runtime; neither classifies actions by their reversibility scope before execution; and neither composes recovery with risk-gating and durability into a single managed substrate.

Contributions

A taxonomy of action reversibility — local-reversible, runtime-compensable, externally-irreversible — presented as both an analytical lens and the basis for a routing primitive.
A composed managed-runtime architecture integrating risk-gated execution, durable task management, and reversibility-aware containment, with explicit interfaces between the three layers.
A pre-registered, falsifiable, four-experiment evaluation plan over harmonized subsets of OSWorld, OS-Harm, RedTeamCUA, and a long-horizon crash generator.
An open-source artifact — pan-agent, approximately 32 kLOC of Go (24.7k non-test, 7.1k test), MIT license, at github.com/Euraika-Labs/pan-agent.

Honest scope

We make no claim that the runtime makes desktop AI control safe in any general sense. We do claim — and propose to demonstrate — that inside a declared reversibility envelope, the integrated runtime expands the safety/utility frontier in a way no single layer achieves alone, and that outside that envelope, the runtime's job is to refuse to make a claim it cannot honor.

The paper invites independent replication, attack, and extension. The artifact is open. The evaluation is pre-registered. The falsification bars are written down. We will report results honestly, including negative ones.

The full paper — including related work, the formal architecture, the evaluation methodology, threats to validity, and the complete reference list — is available as a PDF below.

PDF →
