Microsoft today announced the release of SynapseML (previously MMLSpark), an open source library designed to simplify the creation of machine learning pipelines.
With the new release, Microsoft says, developers can build scalable and intelligent systems for solving challenges in domains such as anomaly detection, computer vision, deep learning and model interpretability.
“Over the past five years, we have worked to improve and stabilise the SynapseML library for production workloads. Developers who use Azure Synapse Analytics will be pleased to learn that SynapseML is now generally available on this service with enterprise support. They can now build large-scale ML pipelines using Azure Cognitive Services, LightGBM, ONNX, and other selected SynapseML features,” Microsoft software engineer Mark Hamilton wrote in a blog post.
The company says it has observed that building production-ready distributed ML pipelines can be difficult, even for the most seasoned developers. Composing tools from different ecosystems often requires considerable “glue” code, and many frameworks aren’t designed with thousand-machine elastic clusters in mind.
SynapseML aims to resolve this challenge by unifying several existing ML frameworks and new Microsoft algorithms in a single, scalable API that’s usable across Python, R, Scala, and Java.
Microsoft explains that to make SynapseML’s integration with Azure Cognitive Services fast and efficient, it has introduced several new tools into Apache Spark. In particular, SynapseML automatically parses common throttling responses to ensure that jobs don’t overwhelm backend services.
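To illustrate what parsing a throttling response can involve, here is a minimal Python sketch. It is not SynapseML's implementation or API. The function names `retry_delay` and `call_with_backoff` are hypothetical. The sketch assumes the backend signals throttling with HTTP status 429 or 503 and may send a `Retry-After` header expressed in seconds under exactly that key.

```python
import time
from typing import Callable, Mapping, Optional, Tuple

# A response is modelled as (status code, headers, body). This shape is an
# assumption made for the sketch, not a type taken from any library.
Response = Tuple[int, Mapping[str, str], str]


def retry_delay(status: int, headers: Mapping[str, str], attempt: int,
                base: float = 1.0, cap: float = 60.0) -> Optional[float]:
    """Return seconds to wait before retrying, or None if not throttled."""
    if status not in (429, 503):
        return None
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            # Honour the server's hint, clamped to the range [0, cap].
            return max(0.0, min(float(retry_after), cap))
        except ValueError:
            pass  # Retry-After can also be an HTTP date; fall back below.
    # No usable hint: exponential backoff based on the attempt number.
    return min(base * (2 ** attempt), cap)


def call_with_backoff(send: Callable[[], Response], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send` until it is no longer throttled or attempts run out."""
    response = send()
    for attempt in range(max_attempts - 1):
        delay = retry_delay(response[0], response[1], attempt)
        if delay is None:
            return response
        sleep(delay)
        response = send()
    return response
```

Passing `sleep` as a parameter keeps the waiting logic testable without real delays. A distributed job would also need to coordinate this backoff across workers, which the sketch does not attempt.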
SynapseML enables developers to use models from many different ML ecosystems through the Open Neural Network Exchange (ONNX) framework and runtime. Beyond using existing models and services, developers can also build and train their own. The library introduces new algorithms for personalised recommendation and contextual bandit reinforcement learning using the Vowpal Wabbit framework, an open source ML system originally developed at Yahoo Research.
SynapseML also introduces several new capabilities for unsupervised responsible AI. With Microsoft’s new tools for understanding dataset imbalance, the company says, researchers can detect whether sensitive dataset features, such as race or gender, are over- or under-represented and take steps to improve model fairness. Furthermore, SynapseML’s distributed isolation forest enables researchers to detect outliers and anomalies in their datasets without needing labelled training data.
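The idea of checking whether a sensitive feature is over- or under-represented can be sketched in a few lines of plain Python. This is an illustration only and does not use SynapseML's API. The function `representation_gaps` and the reference shares passed to it are hypothetical. The sketch assumes you already know what share of the data each group should make up.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Mapping


def representation_gaps(values: Iterable[Hashable],
                        reference: Mapping[Hashable, float]
                        ) -> Dict[Hashable, float]:
    """Observed share minus reference share for every group.

    A positive gap means the group is over-represented relative to the
    reference; a negative gap means it is under-represented.
    """
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot measure imbalance on an empty column")
    # Include groups that appear only in the reference, so that groups
    # missing from the data show up with a negative gap.
    groups = set(counts) | set(reference)
    return {g: counts.get(g, 0) / total - reference.get(g, 0.0)
            for g in groups}


# Hypothetical column of a sensitive feature with three groups expected
# in equal proportion.
column = ["a", "a", "a", "b"]
gaps = representation_gaps(column, {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3})
for group, gap in sorted(gaps.items()):
    print(f"{group}: {gap:+.2f}")
```

A simple difference of shares is only one possible measure. It is chosen here because it is easy to read, not because it is the measure SynapseML computes.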