Lemony.ai has open-sourced Cascadeflow, a prompt routing tool that aims to reduce AI development costs by up to 85%.
Lemony.ai, operating as Uptime Industries Inc., has released Cascadeflow, an open-source dynamic prompt routing tool designed to help developers reduce the cost of running AI applications and calling model APIs. Cascadeflow selects the most cost-effective language model for each query while maintaining output quality and performance.
Cascadeflow works by routing prompts through a cascading pipeline. A request is first processed by a smaller, low-cost model, and the result is evaluated against defined quality metrics, including completeness and correctness. If the result does not meet the required threshold, the prompt is automatically escalated to a larger model. This approach applies speculative execution techniques to large language model inference, so that higher-cost flagship models are used only when a cheaper model falls short.
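The escalation logic described above can be illustrated with a minimal Python sketch. It is not Cascadeflow's actual API: the names `Tier`, `cascade`, `small_model`, `large_model` and `quality_score`, as well as the 0.7 threshold, are hypothetical, and the two "models" are stand-in functions rather than real LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Tier:
    """One step of the cascade (hypothetical type, not a Cascadeflow class)."""
    name: str
    generate: Callable[[str], str]  # prompt -> answer


def cascade(prompt: str,
            tiers: List[Tier],
            score: Callable[[str, str], float],
            threshold: float = 0.7) -> Tuple[str, str]:
    """Try tiers from cheapest to most expensive.

    Returns (tier name, answer) for the first answer whose quality score
    reaches the threshold, or the last tier's answer if none does.
    """
    if not tiers:
        raise ValueError("at least one tier is required")
    answer = ""
    for tier in tiers:
        answer = tier.generate(prompt)
        if score(prompt, answer) >= threshold:
            return tier.name, answer
    return tiers[-1].name, answer


# Stand-in models: the small one only handles a trivial arithmetic question.
def small_model(prompt: str) -> str:
    return "4" if "2 plus 2" in prompt else ""


def large_model(prompt: str) -> str:
    return "detailed answer to: " + prompt


# Stand-in quality check: an empty answer fails, anything else passes.
def quality_score(prompt: str, answer: str) -> float:
    return 1.0 if answer.strip() else 0.0


tiers = [Tier("small", small_model), Tier("large", large_model)]
easy = cascade("what's 2 plus 2", tiers, quality_score)
hard = cascade("summarise this contract", tiers, quality_score)
```

In this sketch the easy prompt is answered by the small tier, while the second prompt fails the quality check and is escalated. A real deployment would replace the stand-in scorer with checks for completeness and correctness.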
“Cascadeflow lets developers run smarter, not bigger, by dynamically choosing the right model for every task,” said Sascha Buehrle, Co-Founder and Chief Executive of Lemony.ai. He added, “You don’t need a flagship model to answer ‘what’s 2 plus 2.’” The software is designed to run in cloud environments, on local machines, or on edge devices. “You can run it wherever your AI application runs,” Buehrle noted.
The tool tracks token usage across providers, offers per-query spending caps, and lets developers define custom pricing that reflects their own contract terms. Early internal benchmarks suggest that approximately 85% of prompts can be handled using smaller or domain-specific models.
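As a rough illustration of how per-query spending caps and custom pricing could fit together, the following Python sketch computes cost from token counts and refuses a call that would push a query over its cap. Every name here (`Price`, `CostTracker`, `BudgetExceeded`) and every price is an assumption made for the example; none of it is taken from Cascadeflow's code, and a real system would likely estimate the cost before making the call.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Price:
    """Hypothetical contract price in USD per one million tokens."""
    input_per_million: float
    output_per_million: float


class BudgetExceeded(Exception):
    """Raised when a call would push a query past its spending cap."""


@dataclass
class CostTracker:
    prices: Dict[str, Price]          # model name -> locally defined price
    per_query_cap_usd: float
    spent_by_provider: Dict[str, float] = field(default_factory=dict)

    def cost(self, model: str, input_tokens: int, output_tokens: int) -> float:
        p = self.prices[model]
        return (input_tokens * p.input_per_million
                + output_tokens * p.output_per_million) / 1_000_000

    def charge(self, query_spent: float, provider: str, model: str,
               input_tokens: int, output_tokens: int) -> float:
        """Add one call to a query's running total and return the new total."""
        new_total = query_spent + self.cost(model, input_tokens, output_tokens)
        if new_total > self.per_query_cap_usd:
            raise BudgetExceeded(f"{new_total:.4f} USD exceeds the cap")
        self.spent_by_provider[provider] = (
            self.spent_by_provider.get(provider, 0.0) + new_total - query_spent
        )
        return new_total


# Illustrative prices only; real contract rates will differ.
tracker = CostTracker(
    prices={"small-model": Price(0.5, 1.0), "large-model": Price(10.0, 30.0)},
    per_query_cap_usd=0.02,
)
spent = tracker.charge(0.0, "provider-a", "small-model", 1000, 500)
try:
    tracker.charge(spent, "provider-b", "large-model", 1000, 500)
    escalation_allowed = True
except BudgetExceeded:
    escalation_allowed = False
```

With these illustrative numbers the small-model call costs about 0.001 USD, and the escalation to the large model is blocked because it would bring the query to roughly 0.026 USD, above the 0.02 USD cap.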
Cascadeflow supports commercial and open-source models from OpenAI, Anthropic, Hugging Face, Groq, Together, vLLM, Ollama and LiteLLM. It also integrates with agent frameworks, the Model Context Protocol and the n8n automation platform.
Buehrle emphasised the importance of transparency in cost control, stating, “It’s important to push the core of Lemony out as open source. It’s important to build a community and to get from the companies using it.”
Cascadeflow is available now on GitHub.