Yahoo open sources its big data-supported search tech

Months after being acquired by Verizon, Yahoo has decided to open source Vespa, its big data processing and serving engine. The technology was previously used exclusively in-house to power search on key Yahoo products, including Yahoo News and Flickr.

Verizon-owned Oath, the parent company of Yahoo, claims that Vespa processes and serves content and ads almost 90,000 times every second with latencies in the tens of milliseconds. It is also touted to handle keyword and image searches at a scale of a few hundred queries per second over tens of billions of images.

Developer teams can use Vespa to select content through SQL-like queries and text search, organise matches to generate data-driven pages, and write data in real time. The technology is capable of distributing data and computation over several machines at once.
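
As a rough illustration of what a SQL-like query to such an engine might look like, the sketch below builds a search URL carrying a query statement. The host, port, endpoint path and the "title" field are assumptions made for this example and are not taken from Oath's announcement; Vespa's own documentation is the authority on the actual query interface.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_search_url(base_url, field, term, hits=10):
    """Build a search URL that carries a SQL-like query statement.

    The endpoint path and the field name are illustrative assumptions;
    adjust them to match the application that is actually deployed.
    """
    # SQL-like statement: select documents whose field contains the term.
    yql = f'select * from sources * where {field} contains "{term}"'
    # urlencode percent-escapes the statement so it is safe in a URL.
    return f"{base_url}/search/?" + urlencode({"yql": yql, "hits": hits})


# Hypothetical local instance and field name.
url = build_search_url("http://localhost:8080", "title", "flickr")
print(url)
```

The function only constructs the request URL; sending it and rendering the returned matches into a page is left to the application.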

“By releasing Vespa, we are making it easy for anyone to build applications that can compute responses to user requests, over large datasets, at real-time and at internet scale — capabilities that up until now, have been within reach of only a few large companies,” Vespa’s distinguished architect Jon Bratseth wrote in a blog post.

Oath sees a similarity between the release of the Vespa engine and the arrival of Hadoop, which Yahoo helped develop and whose engineering team it spun off as Hortonworks in June 2011. However, the open source market has evolved considerably since the Hadoop launch.

The demand for a big data processing and serving engine like Vespa may not be as high as the demand for a storage framework like Hadoop. Moreover, Hadoop debuted as an open source offering, while the Vespa code has been released after around 12 years as an internal part of Yahoo’s portfolio. The company began developing the technology in 2005, building on its acquisition of AlltheWeb.

Vespa can be run on-premises or in the cloud and is available both as Docker images and as RPM packages. Its code is available in a GitHub repository along with detailed documentation.

