New Arthur Bench tool empowers firms to optimise language model choices.
The machine learning startup Arthur is responding to the surge in generative AI interest with a practical solution. Its newly launched open source tool, Arthur Bench, aims to change how companies choose Large Language Models (LLMs) for specific tasks. With companies struggling to find the right LLM fit, Arthur Bench offers a methodical approach.
Adam Wenchel, Arthur’s CEO and co-founder, says the industry’s excitement is outpacing practical adoption. “Arthur Bench solves one of the critical problems that we just hear with every customer which is [with all of the model choices], which one is best for your particular application,” Wenchel says.
In the less than a year since ChatGPT’s debut, companies have struggled to gauge LLM effectiveness. Arthur Bench introduces an extensive suite of evaluation features, but its standout capability is assessing how different user prompts affect the performance of different LLMs.
Wenchel illustrates, “You could potentially test 100 different prompts, and then see how two different LLMs – like how Anthropic compares to OpenAI – on the kinds of prompts that your users are likely to use.” This scalability aids informed decisions on model selection.
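The workflow Wenchel describes can be sketched in a few lines of Python. This is a minimal, generic illustration of prompt-level model comparison, not the Arthur Bench API: the model callables and the scoring function are hypothetical stand-ins for real LLM clients and real quality metrics.

```python
from typing import Callable, Dict, List


def compare_models(
    prompts: List[str],
    model_a: Callable[[str], str],
    model_b: Callable[[str], str],
    score: Callable[[str, str], float],  # score(prompt, response) -> quality
) -> Dict[str, int]:
    """Run the same prompts through two models and tally per-prompt wins."""
    wins = {"model_a": 0, "model_b": 0, "tie": 0}
    for prompt in prompts:
        score_a = score(prompt, model_a(prompt))
        score_b = score(prompt, model_b(prompt))
        if score_a > score_b:
            wins["model_a"] += 1
        elif score_b > score_a:
            wins["model_b"] += 1
        else:
            wins["tie"] += 1
    return wins


# Toy stand-ins: one "model" echoes the prompt, the other truncates it,
# and the scorer simply rewards longer responses. In practice the callables
# would wrap real LLM APIs and the scorer a real evaluation metric.
prompts = [f"prompt {i}" for i in range(100)]
result = compare_models(
    prompts,
    model_a=lambda p: p,
    model_b=lambda p: p[:4],
    score=lambda p, r: len(r),
)
print(result)
```

The point of the pattern is the aggregate view: instead of eyeballing individual responses, a team sees which model wins on the distribution of prompts its users actually send.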
Arthur Bench launches as an open source tool, a step towards more transparent LLM selection. An optional Software-as-a-Service (SaaS) version offers a streamlined experience for larger test runs. For now, Arthur is prioritising the open source project, favouring accessibility over added complexity.
This release follows May’s Arthur Shield debut, an LLM firewall for countering model hallucinations, harmful content, and data leaks. With Arthur Bench, businesses gain a pragmatic tool for adeptly navigating the evolving LLM landscape and optimising their selections.