Yahoo is moving to bring Google’s deep learning innovation to big data clusters in distributed form through its open source framework TensorFlowOnSpark. The new TensorFlow-Spark combination draws heavily on a Caffe solution the Internet giant launched last year.
TensorFlowOnSpark solves the problem of deploying deep learning on big data clusters in a distributed form. It is not a completely new deep learning model but an upgrade to existing frameworks, which required developing multiple programs to deploy intelligence on big data clusters and left room for unwanted system complexity as well as end-to-end learning latency.
Yahoo’s Big ML team has taken some notes from the development of CaffeOnSpark, which was launched last year as a solution that gives Caffe users distributed deep learning and big data processing on Spark and Hadoop clusters.
“We are taking a page from our own playbook and doing for TensorFlow for what we did for Caffe,” the Big ML team, including Lee Yang, Jun Shi, Bobbie Chern and Andy Feng, writes in a blog post.
Like CaffeOnSpark, TensorFlowOnSpark enables distributed TensorFlow execution on Spark and Hadoop clusters. It also works with SparkSQL, MLlib and other Spark libraries in a single pipeline or program. Further, any existing TensorFlow program can be modified to run on the framework.
“Process-to-process direct communication enables TensorFlowOnSpark programs to scale easily by adding machines. TensorFlowOnSpark does not involve Spark drivers in tensor communication, and thus achieves similar scalability as stand-alone TensorFlow clusters,” the Big ML team explains in the post.
Yahoo engineers have enhanced the original TensorFlow framework that Google released in 2015 to support remote direct memory access (RDMA). The enhancement sits in TensorFlow’s C++ layer. Alpha-quality code has also been released so the open source community can leverage the efforts made by the Big ML team.