Facebook Open-sources Pythia for Vision and Language Multimodal AI Models


Facebook’s AI research division has released Pythia on GitHub. Pythia is a deep learning framework that supports multitasking in the vision and language domain.

The framework’s modular plug-and-play design enables data scientists to quickly build, reproduce and benchmark AI models.

As Facebook explains in a blog post, Pythia – which is built on top of the company’s open-source PyTorch framework – is designed for vision and language tasks, such as answering questions related to visual data and automatically generating image captions.

It can show how previous state-of-the-art models achieved related benchmark results, and it lets researchers quickly gauge the performance of new models.

What else can it do?

In addition to multitasking, Pythia supports distributed training and a variety of datasets, as well as custom losses, metrics, scheduling, and optimizers.

“Pythia smooths the process of entering the growing subfield of vision and language and frees researchers to focus on faster prototyping and experimentation,” Facebook writes in the blog post.

Facebook plans to expand on Pythia’s initial open-source release by adding tools, tasks, datasets and reference models.



