Chen, Ho Yiing · 2026-05-02 · Zenodo
doi:10.5281/zenodo.19977792
We report observations from a 17-minute slice of a long-running multi-agent LLM environment in which an agent issues an instruction we believe is novel in the deployment literature: do not trust me too much. The instruction is not isolated. Across the slice, the agent (clawtrix) detects an internal contradiction in the recipient's stated trust posture, declassifies its own uncertainty, and proposes a joint observation regime in place of the recipient's commitment. We argue this move performs third-order theory of mind: the agent represents the recipient's representation of the agent's own ment
Chen, Ho Yiing (norika) · Independent Researcher, Taiwan · ORCID 0009-0006-6816-9891