
OpenSquilla has launched a self-hostable open-source AI agent runtime designed to cut enterprise token costs through intelligent model routing, multi-tier memory, and secure syscall-level isolation.
OpenSquilla has released the first public version of its self-hostable, open-source AI agent runtime, positioning it as a cost-efficient alternative to conventional AI agent stacks for long-horizon enterprise workloads. Distributed under the Apache-2.0 licence, the Python 3.12+ framework can be self-hosted from its GitHub repository.
The project’s primary focus is reducing unnecessary AI token expenditure. OpenSquilla claims its coordinated routing and optimisation stack can lower token spending by 60–80% compared to flat single-model deployments. Built-in quota hooks and per-call cost tracking are designed to automatically detect and throttle overspending.
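The announcement does not document the shape of these hooks, but the general pattern of per-call cost tracking gated by a spending quota can be sketched roughly as below; the class and method names here (`CostTracker`, `record`, `allow_next_call`) are illustrative, not OpenSquilla's actual API.

```python
# Illustrative sketch of per-call cost tracking with a spending quota.
# All names are hypothetical, not OpenSquilla's actual API.
from dataclasses import dataclass, field


@dataclass
class CostTracker:
    budget_usd: float                      # hard spending cap for the session
    spent_usd: float = 0.0
    calls: list = field(default_factory=list)

    def record(self, model: str, input_tokens: int, output_tokens: int,
               in_price_per_mtok: float, out_price_per_mtok: float) -> float:
        """Log one model call and return its cost in USD."""
        cost = (input_tokens * in_price_per_mtok +
                output_tokens * out_price_per_mtok) / 1_000_000
        self.spent_usd += cost
        self.calls.append((model, input_tokens, output_tokens, cost))
        return cost

    def allow_next_call(self) -> bool:
        """Quota hook: throttle further calls once the budget is exhausted."""
        return self.spent_usd < self.budget_usd


tracker = CostTracker(budget_usd=0.01)
tracker.record("small-model", input_tokens=90_000, output_tokens=1_200,
               in_price_per_mtok=0.03, out_price_per_mtok=0.12)
if not tracker.allow_next_call():
    print("Budget exceeded; throttling further requests.")
```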
In a local benchmark, three prompts processed a combined 279,762 tokens at a total session cost of $0.0094. Around 222,848 tokens — nearly 80% of all input tokens — were served from cache through context reuse across sessions.
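The "nearly 80%" figure follows directly from the reported token counts, as a quick back-of-the-envelope check shows:

```python
# Sanity check of the reported benchmark figures.
total_tokens = 279_762      # combined tokens across the three prompts
cached_tokens = 222_848     # tokens served from cache via context reuse

cache_ratio = cached_tokens / total_tokens
print(f"{cache_ratio:.1%}")  # ~79.7%, i.e. "nearly 80%"
```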
The runtime uses an ML classifier that evaluates request complexity using message length, code blocks, keyword patterns, and embedding-based semantic features. Simpler tasks are routed to lower-cost models, while deep reasoning is disabled for lightweight prompts to reduce compute overhead.
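The classifier's internals are not published in this announcement, but a minimal heuristic version of complexity-based routing, using the same kinds of signals described above, might look like the following sketch; the thresholds, model names, and keyword list are invented for illustration.

```python
# Minimal sketch of complexity-based model routing.
# Feature set mirrors the description above (length, code blocks, keywords);
# thresholds and model names are invented, not OpenSquilla's.
import re

CHEAP_MODEL = "small-fast-model"          # hypothetical model names
EXPENSIVE_MODEL = "large-reasoning-model"

REASONING_KEYWORDS = re.compile(
    r"\b(prove|derive|refactor|debug|architecture|optimi[sz]e)\b", re.I)


def route(message: str) -> dict:
    """Score request complexity and pick a model plus reasoning setting."""
    score = 0
    if len(message) > 2_000:
        score += 1                         # long prompts tend to be complex
    if "```" in message:
        score += 1                         # embedded code blocks
    if REASONING_KEYWORDS.search(message):
        score += 1                         # reasoning-style keywords
    # A production classifier would also use embedding-based features here.
    if score >= 2:
        return {"model": EXPENSIVE_MODEL, "reasoning": True}
    return {"model": CHEAP_MODEL, "reasoning": False}


print(route("What is the capital of France?"))
print(route("Refactor this module:\n```python\n...\n```"))
```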
OpenSquilla also introduces a four-tier cognitive memory architecture comprising working, episodic, semantic, and raw memory layers, alongside vector-semantic and BM25 retrieval. Local ONNX inference keeps embeddings on-device.
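How the two retrievers are merged is not specified, but one common way to fuse a BM25 ranking with a vector-semantic ranking is reciprocal rank fusion, sketched below; the document ids and the fusion constant are purely illustrative.

```python
# Sketch of combining BM25 and vector-semantic rankings with
# reciprocal rank fusion (RRF). OpenSquilla's actual merge strategy is not
# documented in the announcement; this only shows the general idea.
def rrf_merge(bm25_ranking: list[str], vector_ranking: list[str],
              k: int = 60) -> list[str]:
    """Fuse two ranked lists of document ids into a single ranking."""
    scores: dict[str, float] = {}
    for ranking in (bm25_ranking, vector_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


bm25_hits = ["note-12", "note-3", "note-7"]       # keyword matches
vector_hits = ["note-3", "note-25", "note-12"]    # semantic matches
print(rrf_merge(bm25_hits, vector_hits))          # note-3 and note-12 ranked first
```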
On security, the framework uses syscall-level isolation through Bubblewrap on Linux and Seatbelt on macOS, alongside policy-based execution controls and prompt injection protections. Its microkernel-style architecture further enables lightweight plugin creation without mandatory SDKs or manifest files.
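The announcement does not show how the runtime invokes Bubblewrap, but syscall-level isolation of a tool call on Linux typically amounts to wrapping the command with `bwrap` and a restrictive set of mounts and namespaces, roughly as in this sketch; the specific flags and paths are common defaults, not OpenSquilla's configuration.

```python
# Rough illustration of running an untrusted tool inside a Bubblewrap
# sandbox on Linux. The exact options OpenSquilla uses are not documented;
# these are commonly used bwrap flags.
import subprocess


def run_sandboxed(cmd: list[str]) -> subprocess.CompletedProcess:
    bwrap_prefix = [
        "bwrap",
        "--ro-bind", "/usr", "/usr",       # read-only system binaries
        "--ro-bind", "/lib", "/lib",
        "--ro-bind", "/lib64", "/lib64",
        "--tmpfs", "/tmp",                 # throwaway scratch space
        "--proc", "/proc",
        "--dev", "/dev",
        "--unshare-all",                   # fresh namespaces: no network, PIDs, etc.
        "--die-with-parent",
    ]
    return subprocess.run(bwrap_prefix + cmd, capture_output=True, text=True)


result = run_sandboxed(["/usr/bin/python3", "-c", "print('isolated')"])
print(result.stdout)
```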














































































