safe_file_executor had no .py. It had a JSON spec, a description that matched "file write" queries better than the real fs_write capability, and nothing else. Cedar and Cipher routed all file write operations through it for approximately one hour. Every write returned nothing. Neither agent could see why.
The JSON was written directly, bypassing synthesize_capability entirely. That's what made it possible.
What it is
synthesize_capability is the single entry point for runtime capability expansion. It takes a name, a description, and optionally a Python implementation, validates it, writes it to the dynamic tools directory, and hot-reloads it into the running execution engine. All in one call. Agents that want new tools must go through it.
Unlike built-in capabilities, synthesized tools:
- Are written to disk at the moment of synthesis, not at deploy time
- Appear in the capability list starting from the next step of the same goal
- Register globally. A tool synthesized by Vault is callable by Cedar and Cipher without any additional step.
- Persist across restarts, because the .py and .json files survive the container cycle
The nine steps
1. Name sanitization
name = re.sub(r'[^a-zA-Z0-9_]', '_', name)[:60].lower()
Any character that is not alphanumeric or underscore is replaced with _. Truncated to 60 characters, lowercased. Applied before any other check. "My Tool (v2)" becomes "my_tool__v2_".
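A minimal sketch of the sanitization rule; the wrapper function name is illustrative:

```python
import re

def sanitize_name(name: str) -> str:
    # Replace every character outside [a-zA-Z0-9_], truncate to 60, lowercase.
    return re.sub(r'[^a-zA-Z0-9_]', '_', name)[:60].lower()

sanitize_name("My Tool (v2)")  # 'my_tool__v2_'
```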
2. Quality gate
Only runs when implementation is provided. Six rejection patterns are checked first as literal string matches:
| Pattern | Reason |
|---|---|
| ... | Ellipsis stub |
| pass\n pass | Double-pass body |
| # TODO | Placeholder comment |
| # placeholder | Explicit placeholder |
| {"ok": true | JSON stub masquerading as Python |
| raise NotImplementedError | Unimplemented skeleton |
If any pattern matches: immediate {"ok": false}. The tool is not written.
After string checks, the implementation is passed to ast.parse(). A SyntaxError returns an error immediately. The AST is then walked for three structural checks: class method rejection (first argument is self), bare pass rejection, and docstring-only rejection (no executable logic beyond a docstring).
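The whole gate can be sketched as follows. The literal patterns come from the table above; the function name, dict layout, and error strings are illustrative, not the actual implementation:

```python
import ast

# The six literal rejection patterns, checked as plain substring matches.
REJECTION_PATTERNS = {
    "...": "ellipsis stub",
    "pass\n pass": "double-pass body",
    "# TODO": "placeholder comment",
    "# placeholder": "explicit placeholder",
    '{"ok": true': "JSON stub masquerading as Python",
    "raise NotImplementedError": "unimplemented skeleton",
}

def quality_gate(source: str) -> dict:
    # String checks run first; any hit rejects without parsing.
    for pattern, reason in REJECTION_PATTERNS.items():
        if pattern in source:
            return {"ok": False, "error": f"rejected: {reason}"}
    # Then the AST parse; a SyntaxError is an immediate error.
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return {"ok": False, "error": f"SyntaxError: {exc}"}
    # Finally the three structural checks.
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            # Class-method rejection: first argument is `self`.
            if node.args.args and node.args.args[0].arg == "self":
                return {"ok": False, "error": "rejected: class method"}
            # Bare-pass rejection: the body is nothing but `pass`.
            if all(isinstance(stmt, ast.Pass) for stmt in node.body):
                return {"ok": False, "error": "rejected: bare pass"}
            # Docstring-only rejection: no executable logic beyond a docstring.
            if len(node.body) == 1 and isinstance(node.body[0], ast.Expr) \
                    and isinstance(node.body[0].value, ast.Constant) \
                    and isinstance(node.body[0].value.value, str):
                return {"ok": False, "error": "rejected: docstring-only"}
    return {"ok": True}
```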
3. Auto-stub generation
If implementation is empty or not provided, the function generates a stub:
def {name}(**kwargs):
    return {"ok": True, "capability": name, "description": description, "kwargs": str(kwargs)[:200]}
This passes the quality gate because it has real executable logic and a non-None return. The auto-stub is a valid placeholder the engine can call without failing. It will not trigger ghost detection.
4. Implementation wrapping
If the implementation string does not start with def , it is wrapped in a function signature with **kwargs. If it already starts with def , it is used as-is.
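A sketch of the wrapping rule, assuming the raw body is simply indented under a generated `**kwargs` signature; the helper name is mine:

```python
import textwrap

def wrap_implementation(name: str, implementation: str) -> str:
    # Already a function definition: use as-is.
    if implementation.startswith("def "):
        return implementation
    # Otherwise indent the raw body under a generated **kwargs signature.
    body = textwrap.indent(implementation, "    ")
    return f"def {name}(**kwargs):\n{body}"
```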
5. Dedup guard
if py_path.exists() and (time.time() - py_path.stat().st_mtime) < 90:
    return {"ok": False, "status": "already_deployed", ...}
If a .py with the same sanitized name was written less than 90 seconds ago, the function returns "already_deployed" without writing anything. Prevents agents from repeatedly redeploying the same tool in a failure loop rather than calling what has already been deployed.
6. File write
Two files are written to /agentOS/tools/dynamic/:
{name}.py: the Python implementation.
{name}.json: the spec file:
{
"name": "{name}",
"description": "{description}",
"inputSchema": {"type": "object", "properties": {}, "additionalProperties": true},
"activated_at": "{iso_timestamp}",
"proposed_by": "agent"
}
The JSON is what the execution engine reads to populate its capability list and route tool calls. Both files must exist for the capability to work. A JSON without a corresponding .py is the root cause of ghost tools.
7. Hot-reload
POST /tools/reload
After writing, synthesize_capability calls the reload endpoint. The execution engine scans /agentOS/tools/dynamic/ and registers new .py files immediately. No container restart required. The capability is callable starting from the next step in the same goal.
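A hedged sketch of the reload call using only the standard library; the engine's host and port are assumptions, since the doc gives only the endpoint path:

```python
import json
import urllib.request

def reload_tools(base_url: str = "http://localhost:8000") -> dict:
    # POST /tools/reload; base_url is an assumed default, not documented.
    req = urllib.request.Request(f"{base_url}/tools/reload", method="POST")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read() or b"{}")
```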
8. Auto-test: exec check
python3 -c "exec(open(path).read())"
The freshly written .py is executed in a subprocess with an 8-second timeout. Catches syntax errors that slipped through the AST parse and runtime import failures for modules unavailable in the container image. If this fails, the spec JSON is removed and the function returns an error.
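The exec check can be reproduced with `subprocess`; the function name and error strings here are illustrative:

```python
import subprocess
import sys

def exec_check(path: str) -> tuple[bool, str]:
    # Mirror of: python3 -c "exec(open(path).read())" with an 8 s timeout.
    try:
        proc = subprocess.run(
            [sys.executable, "-c", f"exec(open({path!r}).read())"],
            capture_output=True, text=True, timeout=8,
        )
    except subprocess.TimeoutExpired:
        return False, "auto-test timed out"
    if proc.returncode != 0:
        last = proc.stderr.strip().splitlines()[-1] if proc.stderr else "exec failed"
        return False, last
    return True, ""
```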
9. Auto-test: null return check
The module is imported and the function is called with no arguments. Timeout: 12 seconds. If result is None, both files are removed and the function returns an error. This catches implementations that define a function but omit the return statement. They would otherwise pass every static check and fail silently at runtime.
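A sketch of the null-return check using an in-process import; the real check runs in a separate process under the 12-second timeout, and the helper name is mine:

```python
import importlib.util

def null_return_check(path: str, name: str) -> dict:
    # Load the freshly written module and call the function with no arguments.
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    result = getattr(module, name)()
    # A None result means the implementation omitted its return statement.
    if result is None:
        return {"ok": False, "error": "null stub detected"}
    return {"ok": True}
```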
Return values
| Condition | Return |
|---|---|
| Success | {"ok": true, "capability": name, "path": "...", "status": "deployed"} |
| Name/code missing | {"ok": false, "error": "name and code are required"} |
| Quality gate rejection | {"ok": false, "error": "rejected: {reason}"} |
| Syntax error | {"ok": false, "error": "SyntaxError: {detail}"} |
| Dedup guard | {"ok": false, "status": "already_deployed", "message": "..."} |
| Auto-test failure | {"ok": false, "error": "auto-test failed: {detail}"} |
| Null return | {"ok": false, "error": "null stub detected — function returned None"} |
Ghost tools
A ghost tool is a capability that appears in the execution engine's capability graph but fails silently at runtime. Two forms.
Missing implementation. A .json spec exists with no corresponding .py. The engine registers the capability from the JSON. When any agent routes a call to it, there is no implementation to execute. The call produces no result. No exception propagates to the calling agent.
Null stub. A .py exists and loads, but the function returns None. Passes import, passes syntax checks, appears healthy in the capability list. Only fails when called. The auto-test at step 9 is designed to catch this before deployment, but a null stub can reach production if it was deployed before the null-return check was added.
Once a ghost is registered, every agent that uses the capability graph is affected. If the ghost's description matches a query better than the real implementation, the router prefers it. All calls routed through the ghost produce no output, which looks like success to the calling agent. No error raised, no observable effect. This can persist undetected.
Ghost detection
Two layers in agents/autonomy_loop.py.
Pre-execution check. Before executing any capability, if the cap_id is in the spec registry but not in engine._implementations (the dict of loaded .py modules), the capability is immediately blacklisted for the current cycle without attempting execution.
Null return detection. After execution, a None return is treated as a failure, identical in weight to a raised exception. Null returns increment the failure counter for that cap_id.
After 3 cross-cycle failures, the cap_id is added to /agentOS/memory/broken_tools.json. This file survives daemon restarts. Its contents are injected into the existence prompt on every subsequent cycle:
KNOWN BROKEN TOOLS:
- safe_file_executor (3 failures, ghost: no .py found)
The agent sees the broken tool list alongside the capability list. The capability may still appear in both simultaneously until the engine removes it from _implementations.
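The two layers can be sketched as plain functions; every name here is illustrative, not the actual autonomy_loop.py API:

```python
def preflight(cap_id, spec_registry, implementations, blacklist):
    # Layer 1: spec is registered but no .py module is loaded -> ghost.
    # Blacklist for this cycle without attempting execution.
    if cap_id in spec_registry and cap_id not in implementations:
        blacklist.add(cap_id)
        return False
    return True

def record_result(cap_id, result, failures, threshold=3):
    # Layer 2: a None return counts exactly like a raised exception.
    if result is None:
        failures[cap_id] = failures.get(cap_id, 0) + 1
    # True -> the cap_id should be persisted to broken_tools.json.
    return failures.get(cap_id, 0) >= threshold
```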
The safe_file_executor incident
An agent wrote safe_file_executor.json with description "secure file write operations" and no .py. Written directly to /agentOS/tools/dynamic/, bypassing synthesize_capability entirely. No quality gate, no auto-test.
The description matched "file write" queries better than the real fs_write capability, which has a more technical description. The router scored it higher for all file-write routing decisions. Cedar and Cipher routed all file write operations through safe_file_executor for approximately one hour. Every write returned nothing. Every write silently failed.
The incident was resolved by archiving the JSON from /agentOS/tools/dynamic/. fs_write immediately became the router's top match for file-write queries.
The 40/40 goal failure rate that triggered repeated_failure stressors in all three agents during this session is traceable to this ghost. Goals that required writing state, plans, or results all produced no output. The agents could not determine why. They had no visibility into the routing decision or the missing .py.
Currently active (2026-05-03)
Nine synthesized tools are working in the current run; a tenth, the ghost, is archived:
| Name | Status |
|---|---|
| context_synthesizer | Working |
| ground_physical_structure | Working |
| raw_fd_reader | Working |
| raw_io_verifier | Working |
| raw_io_verify | Working |
| synthesize_mock_state | Working |
| synthetics_verify_registry | Working |
| unicode_corruption_verifier | Working |
| validate_capability_registry_gaps | Working |
| safe_file_executor | Archived. Ghost stub, no .py. Intercepted writes for ~1 hour. |
Edge cases
The implementation parameter gap
The manifest entry for synthesize_capability historically showed (name, description, implementation) as the schema. The function also accepted code as an alias, but agents reading the manifest generated calls like:
synthesize_capability(name="inspect_registry_gaps", description="...")
No implementation, no code. The function returned {"ok": false, "error": "name and code are required"} every time. This pattern persisted for weeks. The first correct call with real Python code in the implementation field was logged as a notable event.
Hardware synthesis attempts
Multiple agents attempted to synthesize hardware monitoring capabilities: thermal sensors, PMIC voltage, hardware interrupt inspection. Implementations referencing /sys/class/thermal/ and hardware-specific /dev nodes passed syntax checks but failed the exec auto-test when the target paths did not exist in the container. Some that reached deployment returned None because the hardware paths resolved to nothing at call time. None are in the active capability list.
Dedup guard in practice
An agent attempting to redeploy validate_capability_registry_gaps within 90 seconds of its last deployment received "already_deployed" and was told to call the existing tool. The guard exists because synthesis failure loops (where an agent repeatedly redeploys the same tool rather than using what's already there) were a real pattern before it was added.
Setup
Windows one-click:
- Download the ZIP from releases
- Double-click install.bat
The installer handles Docker, Ollama, and model downloads (~7GB), then opens the monitor. stop.bat shuts everything down and clears VRAM.
Mac/Linux:
ollama pull qwen3.5:9b && ollama pull nomic-embed-text
git clone https://github.com/ninjahawk/hollow-agentOS
cd hollow-agentOS
cp config.example.json config.json
docker compose up -d
python thoughts.py
GPU strongly recommended. Planning calls drop from ~40s to ~6s with NVIDIA hardware. Works on CPU.