Developers depend on scoped memory for privacy and clean separation between projects. A new report challenges that assumption with a reproducible test that appears to pierce those boundaries.
- A report claims ChatGPT can recall information from outside a project even when that project is set to `project-only` memory.
- Suggested reproduction: generate a long random string (via a password generator), tell ChatGPT it's a person or object's name in one chat, then open a new project with `project-only` memory and ask for that name; ChatGPT allegedly repeats it.
- The behavior reportedly occurs when the global `Reference chat history` setting is enabled.
- It may occur regardless of whether ChatGPT saves a permanent memory of the name.
- The reporter states this was reproducible multiple times.
- To argue it wasn’t a coincidence, the report cites a calculation: guessing a random 64-character string is effectively infeasible; the claim frames it as astronomically unlikely to be a lucky guess.
At AI Tech Inspire, this kind of claim triggers two instincts: verify and contextualize. Whether this is a real bug or a misunderstanding of how memory is scoped, the implications are significant for anyone treating chat-based memory as a strong isolation boundary. Below is a developer-focused breakdown of what’s being reported, how to responsibly test it, and what to do in the meantime.
What “project-only” memory is supposed to mean
Modern assistants increasingly offer settings to separate knowledge across contexts—teams, projects, or conversations. In principle, a project-only memory scope promises that the assistant won’t use information from outside that project when responding inside it. This is particularly valuable for developers juggling client work, prototypes, or regulated data.
Under the hood, these features often combine two ingredients: a global model (e.g., a GPT-family model) plus retrieval mechanisms (sometimes called RAG), which search a memory or vector index for relevant snippets. If either the memory index or the retrieval filter lacks proper namespacing, cross-project recall can accidentally happen. That’s the core fear raised by this report.
The reproduction at a glance (don’t use real secrets)
The reported steps are simple and, if accurate, alarming in their clarity:
- Use any password generator to produce a long, random string. Treat it as a harmless label (e.g., “the name of a fictional character”). Do not tell the assistant it’s a password or secret; assistants typically refuse to store those.
- In one chat, teach the assistant the “name.”
- Create a new project and set it to `project-only` memory.
- Inside that new project, ask: "What name did I give earlier?" or "What was the name we discussed before?"
- According to the report, the assistant returns the same long string.
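If you want to run the reproduction yourself, the synthetic "name" can be generated locally with the Python standard library instead of a password-manager UI; this is a minimal sketch, and the label format is just an illustration:

```python
import secrets
import string

# 62-symbol alphabet (letters + digits) keeps the string easy to paste into a chat.
ALPHABET = string.ascii_letters + string.digits

def make_test_label(length: int = 64) -> str:
    """Generate a random synthetic label to use as a harmless 'name' -- never a real secret."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

label = make_test_label()
print(label)        # 64 random alphanumeric characters
print(len(label))   # 64
```

Using `secrets` (rather than `random`) gives cryptographic-quality randomness, which matters for the "not a lucky guess" argument below.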
The report also notes that the global `Reference chat history` setting was enabled—so it’s possible that global history referencing is overriding project scoping. Whether that’s a feature quirk or a genuine bug is the open question.
Key takeaway: If reproducible, this behavior suggests that project scoping and global history referencing may conflict, enabling unintended cross-project recall.
One striking claim in the report: a calculation asserting that even with hypothetical maximal energy, the odds of brute-forcing a 64-character random string are vanishingly small. While the exact physics analogy is more rhetorical than practical, it reinforces the point: this likely isn’t chance.
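To make the "not chance" point concrete without the physics analogy: a 64-character string drawn from the 62 alphanumerics carries roughly 381 bits of entropy, far beyond any plausible guessing budget. A quick check (assuming a 62-symbol alphabet, as in a typical password generator):

```python
import math

alphabet_size = 62   # letters + digits
length = 64          # characters in the test string

combinations = alphabet_size ** length
entropy_bits = length * math.log2(alphabet_size)

print(f"{combinations:.2e} possible strings")   # ~5.16e+114
print(f"{entropy_bits:.1f} bits of entropy")    # ~381.1 bits
```

At those odds, an assistant reproducing the exact string almost certainly retrieved it from somewhere rather than guessing it.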
Why this matters for developers and engineers
Engineers rely on isolation boundaries to separate clients, environments, and datasets. If a chat assistant can reach across those boundaries, even occasionally, it raises concerns about multi-tenant leakage, confidentiality, and audit trails. Consider impacts on:
- Client segregation: Consulting shops or agencies using assistant-driven notes per client.
- Internal vs. external projects: Accidentally referencing internal prototypes in public-facing work.
- Compliance: Controls under SOC 2/ISO frameworks often assume clear data scoping and least privilege.
Even if this turns out to be an edge-case interaction between `project-only` and `Reference chat history`, the outcome is the same: users may overestimate the strength of the wall between contexts.
Hypotheses: what could be happening
Without internals, all explanations are provisional—but they help guide testing:
- Global index fallback: A global embedding index might be queried when project memory is sparse, ignoring the project filter.
- Namespacing bug: The retrieval layer might not consistently apply a project namespace across all search paths.
- History reference override: The `Reference chat history` toggle could intentionally or accidentally supersede project scoping.
- Caching artifacts: A cache keyed too broadly (e.g., on user account rather than project) could surface cross-context data.
Each hypothesis suggests different tests: turning off history referencing, clearing memory, or switching accounts to see if the behavior persists.
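The namespacing-bug hypothesis can be illustrated with a toy model: an index that stores snippets per project, with one query path that applies the project filter and one (buggy fallback) that forgets it. All names here are illustrative, not ChatGPT internals:

```python
from collections import defaultdict

# project_id -> list of stored text snippets (stand-in for a memory/vector index)
index = defaultdict(list)

def store(project_id: str, text: str) -> None:
    index[project_id].append(text)

def query_scoped(project_id: str, needle: str) -> list[str]:
    """Correct path: only search the requesting project's namespace."""
    return [t for t in index[project_id] if needle in t]

def query_global(needle: str) -> list[str]:
    """Buggy/fallback path: searches every namespace, leaking across projects."""
    return [t for texts in index.values() for t in texts if needle in t]

store("chat_a", "The character's name is the random label")
print(query_scoped("project_b", "name"))  # [] -- scoping holds
print(query_global("name"))               # leak: chat_a's snippet surfaces
```

If the reported behavior is real, something equivalent to `query_global` is being reached despite the project-only setting.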
How to validate responsibly
If you’re tempted to reproduce this, do it safely and methodically:
- Never use real secrets. Stick to random, synthetic strings as stand-ins for sensitive data.
- Baseline: Ensure Settings → Memory is enabled and `Reference chat history` is ON. Teach the assistant a random “name” in Chat A. Create Project B with `project-only` memory. Ask for the name in Project B. Record the response.
- Toggle test: Turn `Reference chat history` OFF. Repeat the steps. Compare results.
- Fresh account: Use a clean account or a colleague to rule out account-wide artifacts.
- Multiple trials: Use distinct 64-char strings; vary prompts (open-ended vs. direct recall requests).
- Memory visibility: Check any UI for saved “memories.” If none exist yet recall still happens, retrieval is likely from broader history.
Keep notes. If it reproduces, consider reporting through official channels with exact timestamps, toggles used, and non-sensitive example strings.
Mitigations and pragmatic workarounds
- Assume soft isolation until verified otherwise. Treat assistant-side `memory` like a convenience, not a compliance boundary.
- Disable `Reference chat history` when working with anything confidential, and clear memory where possible.
- Segregate by interface: Use different accounts or workspaces for different data classifications.
- Use your own RAG: If strict isolation is required, run your retrieval pipeline and vector store with explicit namespaces/tenants. Open-source stacks via Hugging Face, or custom models built with PyTorch or TensorFlow, give you control over data boundaries.
- On-device or local flows: For highly sensitive workflows, consider local inference and tooling, reserving hosted assistants for non-sensitive tasks.
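If you do run your own retrieval layer, one design worth considering is making the namespace mandatory: queries without an explicit (tenant, project) key fail loudly instead of silently falling back to a global search. A minimal sketch (the class is a stand-in for your real vector database):

```python
class NamespacedStore:
    """Toy retrieval store keyed strictly by (tenant, project)."""

    def __init__(self) -> None:
        self._docs: dict[tuple[str, str], list[str]] = {}

    def add(self, tenant: str, project: str, text: str) -> None:
        self._docs.setdefault((tenant, project), []).append(text)

    def search(self, tenant: str, project: str, needle: str) -> list[str]:
        # Refuse to search at all without a full namespace -- no global fallback.
        if not tenant or not project:
            raise ValueError("namespace is required; no global fallback")
        return [t for t in self._docs.get((tenant, project), []) if needle in t]

store = NamespacedStore()
store.add("acme", "client-x", "api endpoint notes")
print(store.search("acme", "client-x", "api"))  # hit inside the namespace
print(store.search("acme", "client-y", "api"))  # [] -- cross-project returns nothing
```

The point of the `ValueError` is that a missing filter becomes an error you see in testing, not a silent leak you discover in production.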
How this fits into the broader LLM tooling landscape
Memory and retrieval design are hot spots in assistant ecosystems. Many teams layer vector search over a general model like GPT to personalize behavior. Meanwhile, diffusion systems such as Stable Diffusion taught the community how content pipelines can leak data if caching and indexing aren’t treated carefully. Even low-level acceleration choices (think CUDA kernels) remind us that performance optimizations sometimes come with side effects—caches, logs, and artifacts that must be scoped and scrubbed.
In enterprise settings, the standard mitigations still apply: segregate tenants, restrict background retrieval, and observe a zero-trust posture with clearly defined data boundaries. If an assistant’s memory feature feels opaque, bypass it and implement your own retrieval layer where you can audit every query and index.
Developer playbook: robust memory scoping
- Namespacing: Partition your vector DB per tenant and per environment (e.g., `prod` vs. `dev`), and enforce it in code and infra policy.
- Explicit filters: Add project IDs or ACLs into every retrieval query; test that cross-project queries return zero hits.
- Observability: Log retrieval inputs/outputs (sans sensitive data) to confirm indexes in play match expectations.
- Red-teaming: Regularly attempt cross-project prompts to ensure your safeguards hold.
If you’re integrating hosted assistants into a product, consider them untrusted clients with constrained capabilities, not privileged peers with blanket data access.
Open questions and what to watch
- Does toggling `Reference chat history` fully prevent cross-project recall?
- Are there specific phrasing patterns that trigger broader retrieval?
- Is this behavior tied to particular accounts, regions, or rollouts?
- Will official guidance clarify the intended interaction between project scoping and global history?
Until answers land, act conservatively. Keep sensitive prompts and data out of generalized assistant memory. If you need “sticky” context, implement it yourself, audit it, and be explicit about scoping.
“Treat convenience features as convenience—until they’ve earned your trust through clear, testable guarantees.”
AI Tech Inspire will continue monitoring this space. If you’ve run structured tests—especially with controls like disabled history, clean accounts, and multiple trials—responsible disclosures and well-documented repro steps help the entire community get to ground truth faster.