What will RLHF-based PaLM apps be able to accomplish? As the model scales up, its performance across tasks keeps improving, which opens up more use cases. PaLM has up to 540 billion parameters. By comparison, GPT-3 has only about 175 billion.
The first open source ChatGPT equivalent appears to have arrived. It is an implementation of RLHF (Reinforcement Learning from Human Feedback) on top of Google’s PaLM architecture, which has 540 billion parameters. The developer known for reverse engineering closed-source AI systems such as Meta’s Make-A-Video has released PaLM + RLHF, an open source project that is meant to behave much like ChatGPT. It is described as a work in progress. The system combines PaLM, a large language model from Google, with RLHF to build a system that can perform practically any task ChatGPT can, including drafting emails and suggesting code.
PaLM + RLHF does not come pre-trained. In other words, the system has not yet been trained on the example data from the web that it needs in order to actually work. Downloading PaLM + RLHF will not instantly deliver a ChatGPT-like experience; that would require compiling gigabytes of text from which the model can learn and finding hardware able to handle the training workload. PaLM + RLHF won’t be able to take the place of ChatGPT unless a well-funded initiative (or individual) goes to the trouble of training it and making it available to the general public.
The good news is that several other projects to replicate ChatGPT are emerging quickly, including one run by the research group CarperAI. Together with the open AI research organisation EleutherAI and the start-ups Scale AI and Hugging Face, CarperAI plans to make public the first ready-to-use ChatGPT-like AI model trained with human feedback. The nonprofit LAION, which supplied the initial dataset used to train Stable Diffusion, is also leading a project to replicate ChatGPT using the latest machine learning techniques.
ChatGPT and PaLM + RLHF share a secret sauce called Reinforcement Learning from Human Feedback, which aims to better align language models with what users want them to do. In RLHF, a language model (PaLM, in the case of PaLM + RLHF) is fine-tuned on a dataset of prompts paired with what human volunteers expect the model to say, such as “Machine learning is a form of AI…” Those prompts are then fed into the fine-tuned model, which generates several responses, and the volunteers rank each response from best to worst. Finally, the rankings are used to train a “reward model” that takes the first model’s responses and sorts them in order of preference, filtering for the best ones. Gathering this training data is an expensive process.
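The reward-model step can be illustrated with a toy sketch. This is not the PaLM + RLHF code: it assumes each response has already been turned into a small numeric feature vector (the two-number vectors below are made up), uses a plain linear scorer instead of a language model, and trains it with a pairwise logistic loss so that higher-ranked responses receive higher scores. All function names are hypothetical.

```python
import math

def pairs_from_ranking(ranked):
    """Turn one best-to-worst ranking into (preferred, rejected) pairs."""
    return [(ranked[i], ranked[j])
            for i in range(len(ranked))
            for j in range(i + 1, len(ranked))]

def reward(weights, features):
    """Toy linear reward: a real reward model would be a neural network."""
    return sum(w * x for w, x in zip(weights, features))

def train_reward_model(rankings, dim, lr=0.1, epochs=200):
    """Fit weights so preferred responses score above rejected ones.

    Minimises -log(sigmoid(r_preferred - r_rejected)) with plain
    gradient descent over every pair derived from the rankings.
    """
    weights = [0.0] * dim
    for _ in range(epochs):
        for ranked in rankings:
            for better, worse in pairs_from_ranking(ranked):
                margin = reward(weights, better) - reward(weights, worse)
                # d(-log sigmoid(m))/dm = -(1 - sigmoid(m)); step against it.
                step = 1.0 - 1.0 / (1.0 + math.exp(-margin))
                for k in range(dim):
                    weights[k] += lr * step * (better[k] - worse[k])
    return weights

# Made-up feature vectors for three responses to one prompt,
# listed in the order a volunteer ranked them (best first).
example_rankings = [[[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]]
learned = train_reward_model(example_rankings, dim=2)

# Sort unseen candidate responses by learned reward, best first.
candidates = [[0.2, 0.8], [0.9, 0.1]]
print(sorted(candidates, key=lambda f: reward(learned, f), reverse=True))
```

In a full RLHF pipeline, the scores from a model like this would then serve as the reward signal for a reinforcement learning step that further tunes the language model.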
Training doesn’t come cheap either. PaLM has 540 billion parameters, “parameters” being the parts of the language model that are learned from the training data. A 2020 study estimated that developing a text-generating model with only 1.5 billion parameters could cost up to $1.6 million. The open source model Bloom, which has 176 billion parameters, was trained over the course of three months using 384 Nvidia A100 GPUs, each of which costs thousands of dollars. Running a trained model the size of PaLM + RLHF is not trivial either.
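A rough back-of-the-envelope calculation gives a feel for these numbers. The sketch below rests on assumptions that are not in the figures above: weights stored at 2 bytes per parameter (16-bit precision), an 80 GB A100 variant, and "three months" taken as 90 days. It counts only the memory needed to hold the weights, so real requirements for training or serving would be higher.

```python
import math

def weight_memory_gb(num_params, bytes_per_param=2):
    """Memory needed just to store the weights, in gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9

def gpus_to_hold(num_params, gpu_memory_gb=80, bytes_per_param=2):
    """Minimum GPU count whose combined memory fits the weights alone."""
    return math.ceil(weight_memory_gb(num_params, bytes_per_param) / gpu_memory_gb)

def gpu_hours(num_gpus, days):
    """Total GPU-hours for a run that keeps every GPU busy the whole time."""
    return num_gpus * days * 24

palm_params = 540 * 10**9   # parameter count quoted for PaLM
bloom_params = 176 * 10**9  # parameter count quoted for Bloom

print("PaLM weights (GB):", weight_memory_gb(palm_params))
print("80 GB GPUs to hold PaLM weights:", gpus_to_hold(palm_params))
print("80 GB GPUs to hold Bloom weights:", gpus_to_hold(bloom_params))
print("Bloom training GPU-hours (assuming 90 days):", gpu_hours(384, 90))
```

Under these assumptions, the weights alone of a 540-billion-parameter model exceed a terabyte, which is why a single GPU cannot serve a model of this size.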