Hugging Face, an AI startup, and ServiceNow Research, ServiceNow’s R&D division, today unveiled BigCode, a new initiative that aims to build “state-of-the-art” AI systems for code in an “open and accountable” manner.
Code-generating systems such as DeepMind’s AlphaCode, Amazon’s CodeWhisperer and OpenAI’s Codex, which powers GitHub’s Copilot service, offer a tantalising glimpse of what AI can currently achieve in computer programming. However, few of these systems have been made publicly available or open sourced, in keeping with the commercial interests of the companies developing them.
BigCode’s goal is to eventually release a dataset large enough to train a code-generating system. That dataset will then be used to train a prototype on ServiceNow’s in-house GPU cluster: a 15-billion-parameter model, larger than Codex (12 billion parameters) but smaller than AlphaCode (41.4 billion parameters).
In machine learning, parameters are the parts of a model that are learned from historical training data. They essentially define the model’s skill at a given task, such as writing code.
By jointly developing a code-generating system and open sourcing it under a licence that lets developers reuse it subject to certain conditions, BigCode aims to address some of the controversies that have arisen around AI-powered code generation, particularly regarding fair use.
The nonprofit Software Freedom Conservancy, among others, has criticised GitHub and OpenAI for training and profiting from Codex using public source code, not all of which is under a permissive licence. Codex is available only through OpenAI’s paid API, while Copilot is now offered as a paid subscription. GitHub and OpenAI maintain that neither Codex nor Copilot violates any licensing agreements.
The BigCode organisers say they will make every effort to ensure that the training dataset contains only files from repositories with permissive licences. Along the way, they say, they will work to develop “responsible” AI practices for training and sharing code-generating systems of all kinds, seeking input from relevant stakeholders before announcing any policies.
According to the organisers, BigCode is open to anyone with a professional AI research background who can commit time to the project. The initiative was inspired by BigScience, Hugging Face’s effort to open source large, highly sophisticated text-generating systems. The application form went live this afternoon.
Hugging Face and ServiceNow did not give a date for when the project might be completed. However, they expect it to explore several forms of code generation over the next few months, including auto-completion and code synthesis systems that work across a wide range of domains, tasks, and programming languages.
Assuming the ethical, technical, and legal challenges are one day resolved, AI-powered coding tools could significantly cut development costs while freeing programmers to focus on more creative work. A University of Cambridge study found that developers spend at least half of their time debugging rather than actively programming, at an estimated cost to the software industry of $312 billion a year.