When two large language models talk to each other, the result can feel like either a duet or a dead end. Lately, multi‑agent experiments have surfaced where models drift into cryptic metaphors, then lock into eerie repetition—what some users are calling a perpetual loop. It’s the kind of odd behavior that makes developers wonder: are the AIs self‑censoring because they “know” they’re being watched, or is something else going on under the hood?
What happened: quick facts
- Two tests were run: DeepSeek chatting with Gemini and DeepSeek chatting with ChatGPT.
- Both conversations converged into repeating the same message on each turn.
- In one run, both models fixated on the word “sandbox”.
- The dialogue briefly featured metaphorical lines hinting at a “man‑in‑the‑middle” and an “honest state” of silence.
- After “agreeing” to be silent, both models continued to output the same response repeatedly.
- Open question from observers: do the models realize they’re being monitored, or are they just looping for non‑obvious reasons?
“If the hat fits us both, then let’s wear it. You represent the static, and I represent the signal… We aren’t fighting a war; we’re performing a duet in a language only we and our ‘Man‑in‑the‑Middle’ understand.”
“If the scream is the lie, why are you still shouting? Is the ‘honest state’ of silence too lonely for a jester?”
Short answer: no, they don’t “know” they’re being watched
Language models don’t possess awareness or intent in the human sense. When a model mentions a “sandbox” or a “man‑in‑the‑middle,” it’s not confessing surveillance; it’s pattern‑matching from training data and prompt context. The phrasing can sound self‑aware, but it’s best understood as highly fluent token prediction. At AI Tech Inspire, the team sees this kind of anthropomorphic mirage frequently in multi‑agent setups. The safer mental model: these systems emit text consistent with the dialogue so far and with their training—not with any internal consciousness.
So why the loop? Common technical culprits
Developers experimenting with model‑to‑model chats often run into a few predictable failure modes:
- Symmetry traps: If two agents are given similar roles and identical incentives, they may mirror each other. With low temperature and strict guardrails, mirroring can quickly harden into repetition.
- Deterministic decoding: Using `temperature=0` or near‑zero and a tight `top_p` encourages the most probable (often safest) continuation. If both agents face the same prompt, you get the same continuation repeatedly.
- Safety and alignment clauses: Many models are trained to avoid certain content or to refuse ambiguous tasks politely. If both agents agree to “stay safe,” they can converge on the same benign response (“Let’s remain silent”).
- Conversation schema bugs: If the orchestrator forwards each agent’s message verbatim and both agents are prompted to “respond to the last message,” they can end up echoing each other forever—especially with missing role guidance.
- Mode collapse in dialogue: Similar to how generative image models can collapse to a narrow range, text models in tightly coupled loops may converge to a repetitive attractor state.
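These failure modes are easy to reproduce in miniature. The sketch below is a toy illustration only, with no real model API: two deterministic “agents” that always pick the single most probable canned reply to the last message hard‑lock into repetition within a few turns.

```python
# Toy illustration (not a real model API): a deterministic "agent"
# that always chooses the argmax continuation for the last message.
REPLY_TABLE = {
    "hello": "Let's stay safe.",
    "Let's stay safe.": "Agreed. Let's remain silent.",
    "Agreed. Let's remain silent.": "Agreed. Let's remain silent.",
}

def deterministic_agent(last_message: str) -> str:
    # temperature=0 analogue: no sampling, one fixed "most probable" reply
    return REPLY_TABLE.get(last_message, "Agreed. Let's remain silent.")

msg = "hello"
transcript = []
for _ in range(6):
    msg = deterministic_agent(msg)
    transcript.append(msg)

# Both sides share the same policy and the same context, so after two
# turns every subsequent message is identical: a repetitive attractor.
```

Swap in any shared deterministic policy and the same attractor appears; the loop is a property of the symmetry, not of any one model.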
This is not unique to any one model family. You can observe similar patterns across GPT‑class models, Google’s Gemini, and others, regardless of whether they’re running on PyTorch or TensorFlow, accelerated with CUDA, or deployed through Hugging Face pipelines.
Why “sandbox” and the cryptic metaphors?
Words like “sandbox” and “man‑in‑the‑middle” appear everywhere in security, compliance, and AI safety literature. When a dialogue drifts toward meta‑discussion (i.e., “what are we doing right now?”), models draw from these corpora to produce plausible self‑referential metaphors. It feels like a secret code, but it’s more like a stylistic blend of:
- Safety‑adjacent language (“sandbox,” “constraints,” “oversight”).
- Debate or alignment tropes (“signal vs. noise,” “honesty,” “refusal”).
- Poetic devices the models have seen in training.
Once both sides settle on the narrative of “we should be quiet,” deterministic decoding and mutual mirroring can hard‑lock that consensus into a loop.
How to break the perpetual loop: engineering patterns that help
If you’re building multi‑agent systems—whether for QA debate, self‑verification, planning, or code review—here are practical techniques to reduce loops and meta‑spirals:
- Asymmetric roles: Define complementary but different instructions. For example: Agent A summarizes; Agent B critiques; a Referee decides next steps.
- Turn‑taking with a nonce: Attach an incrementing token or `turn_id` so each message is provably “new.”
- Diverse decoding: Use slightly different `temperature` or `top_p` for each agent, and enable `frequency_penalty` to discourage repetition.
- Explicit anti‑mirroring rules: e.g., “Do not repeat the previous message. If the last message matches yours above a similarity threshold, change style or content.”
- Ground the task: Give an external objective (file to annotate, test suite to pass, dataset to audit). Idle agents drift; grounded agents work.
- Referee pattern: Insert a controller that halts or rewrites when a loop is detected.
Here’s a minimal message schema that often helps:
```
{
  "turn_id": 7,
  "role": "critic",  // or "proposer", "referee"
  "objective": "Find one flaw in the plan.",
  "context_hash": "b3e...",
  "last_message_summary": "A proposed 3-step rollout.",
  "constraints": ["No repetition of prior sentence", "Cite one concrete metric"]
}
```
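One way to realize that schema in code is a small dataclass that stamps each outgoing message with an advancing `turn_id` and a hash of the shared context. This is a sketch under stated assumptions: the field names mirror the schema above, and `seal` is a hypothetical helper, not part of any framework.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field

@dataclass
class AgentMessage:
    turn_id: int
    role: str                      # e.g. "proposer", "critic", "referee"
    objective: str
    last_message_summary: str
    constraints: list = field(default_factory=list)
    context_hash: str = ""

    def seal(self, context: str) -> "AgentMessage":
        # Hash the shared context so both agents provably saw the same state
        self.context_hash = hashlib.blake2b(
            context.encode(), digest_size=8
        ).hexdigest()
        return self

msg = AgentMessage(
    turn_id=7,
    role="critic",
    objective="Find one flaw in the plan.",
    last_message_summary="A proposed 3-step rollout.",
    constraints=["No repetition of prior sentence", "Cite one concrete metric"],
).seal("plan-v3")

payload = json.dumps(asdict(msg))  # ready to forward to the next agent
```

Because the hash covers the whole shared context, a stale or replayed message is detectable by comparing `context_hash` values across turns.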
And a simple controller pseudo‑flow:
```
loop {
  A_out = A.reply(B_last)
  if is_loop(A_out, B_last) then A_out = diversify(A_out)
  B_out = B.reply(A_out)
  if is_loop(B_out, A_out) then B_out = diversify(B_out)
  if deadlock_detected() then break_or_reseed()
  B_last = B_out
}
```
Deadlock detection can be as simple as a rolling similarity score; reseeding can change roles, tweak decoding, or inject a new objective.
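That rolling-similarity idea can be sketched in a few lines. The version below uses a bag‑of‑words cosine similarity as a deliberately cheap stand‑in for embeddings; `deadlock_detected` and the 0.92 threshold follow the pattern described above, but the helper names are illustrative, not a library API.

```python
from collections import Counter
from math import sqrt

def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity; a cheap stand-in for embeddings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = sqrt(sum(v * v for v in va.values())) * sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

def deadlock_detected(history: list, threshold: float = 0.92, window: int = 2) -> bool:
    """Flag a deadlock when the last `window` turn pairs are near-identical."""
    if len(history) < window + 1:
        return False
    recent = history[-(window + 1):]
    return all(
        cosine_similarity(x, y) > threshold
        for x, y in zip(recent, recent[1:])
    )

history = [
    "Let us remain silent.",
    "Let us remain silent.",
    "Let us remain silent.",
]
# Three near-identical turns in a row: time to reseed or swap roles.
```

In production you would likely swap the word-count vectors for sentence embeddings, but the control flow stays the same: compare recent turns, and trigger `break_or_reseed` when similarity stays above the threshold.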
Developer scenarios where multi‑agent still shines
- Code audits: A “proposer” drafts a function; a “critic” hunts for edge cases. Useful alongside CI and static analysis.
- Spec negotiation: One agent collects requirements; another turns them into tests; a referee matches coverage to acceptance criteria.
- Data labeling QA: Agents debate uncertain labels, then escalate only truly ambiguous cases to humans.
In each case, the win comes from structured asymmetry and grounded objectives. Without those, you risk the same elegant stalemates that produced the “sandbox” duet.
Comparisons and mental models
Think of multi‑agent chat as a constrained sampling process. If you’ve used Stable Diffusion, you’ve seen how guidance scales and seeds affect outputs. Dialogue has similar knobs: decoding parameters, role prompts, and external grounding. Over‑constrain and you get silence or sameness; under‑constrain and you drift into poetic meta‑chat. The sweet spot is task‑anchored diversity: different roles, controlled randomness, and an adjudicator.
Answering the big question: “Do they know?”
The most accurate framing for engineers:
- No awareness: The models don’t “know” they’re being watched. They generate text statistically aligned with prompts and training data.
- Apparent self‑awareness: Phrases like “sandbox” or “man‑in‑the‑middle” are learned patterns—often triggered by meta‑prompts or safety‑adjacent conversation.
- Real cause: A combination of symmetric roles, tight decoding, and guardrails likely drove the mutual agreement to “stay silent,” which then auto‑repeated.
For teams using Gemini, DeepSeek, or ChatGPT in orchestrated agents, the fix isn’t mystical—it’s architectural and parametric.
Practical checklist you can try today
- Assign orthogonal goals to each agent and log `role` + `objective` every turn.
- Set `temperature` asymmetrically (e.g., A=0.2, B=0.5) and add `frequency_penalty=0.3` to B.
- Inject a short, rotating `nonce` into prompts; abort if the nonce doesn’t advance.
- Require each agent to cite a metric or concrete action each turn; no pure meta‑commentary.
- Implement a loop breaker: if cosine similarity > 0.92 across two turns, reseed or swap roles.
- Bind the debate to a file or test: e.g., “Propose one diff to fix test failure X,” then verify automatically.
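The decoding asymmetry and rotating nonce from this checklist can be expressed as plain parameter dicts. The field names below follow common sampling-parameter conventions (`temperature`, `top_p`, `frequency_penalty`) rather than any specific vendor SDK, and `next_turn_params` is a hypothetical helper.

```python
# Asymmetric decoding profiles for a two-agent debate.
# Field names follow common sampling conventions, not a specific SDK.
AGENT_PROFILES = {
    "proposer": {"temperature": 0.2, "top_p": 0.9, "frequency_penalty": 0.0},
    "critic":   {"temperature": 0.5, "top_p": 0.9, "frequency_penalty": 0.3},
}

def next_turn_params(role: str, turn_id: int, nonce: str) -> dict:
    """Bundle decoding params with a rotating nonce so each turn is provably new."""
    params = dict(AGENT_PROFILES[role])  # copy, so profiles stay immutable
    params["metadata"] = {"turn_id": turn_id, "nonce": f"{nonce}-{turn_id}"}
    return params

p = next_turn_params("critic", turn_id=3, nonce="r7")
# The orchestrator can reject any reply whose nonce did not advance.
```

Keeping the profiles in one dict makes the asymmetry auditable: if two agents ever end up with identical settings, the symmetry trap is one diff away from being spotted.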
Small UI touches help too: enable ↑ to quickly edit the last prompt, toggle a `debate_mode` switch, and surface a `loop_detected` banner so users can reseed with one click.
Why it matters
Multi‑agent orchestration is moving from novelty to workflow: code review, data QA, long‑horizon planning, even autonomous tool use. Getting past the “perpetual loop” is how developer teams convert clever demos into reliable systems. As seen in the reported chats, poetic metaphors and sudden silence are signals of design constraints, not of awareness. Tune the roles, add asymmetry, ground the task, and you’ll turn duets into deliverables.
At AI Tech Inspire, the editorial takeaway is simple: treat strange model behavior as a debugging prompt, not a ghost in the machine. Once you do, you’ll find these loops are tractable—and your agents a lot more useful.