Shopify has open-sourced its internal ML platform, Tangle, helping developers cut redundant compute and build faster, reproducible pipelines.
Shopify has open-sourced Tangle, its internal machine-learning experimentation platform designed to reduce repetition, enforce reproducibility, and accelerate development cycles. The platform originated from challenges faced by Shopify’s search and discovery teams, which work with millions of products and billions of queries.
Before Tangle, engineers often rebuilt identical datasets, reran long preprocessing steps, and struggled to reproduce historical results. According to Shopify, “Machine learning development shouldn’t work this way, but it does. 80% of development time is spent on data engineering, not algorithms.”
The platform has already saved more than a year of compute time internally. “The CPU time savings alone are ridiculous,” said Mikhail Parakhin, CTO of Shopify. A 10-hour pipeline can now complete in just 20 minutes when only one component changes.
Tangle features a visual pipeline interface backed by content-based caching. Pipelines are built as directed acyclic graphs composed of components, which are language-agnostic units that wrap arbitrary CLI programs. Components run in isolation inside containers, ensuring deterministic behaviour and automatic artefact reuse. Shopify notes: “Tangle’s cache operates globally across all users… all three pipelines share the artefact—even for still-running executions.”
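To make the caching idea concrete, here is a minimal Python sketch of content-based caching over a directed acyclic graph. It is not Tangle's code or API: the names `cache_key`, `run_pipeline`, and `fake_runner`, the dictionary-based pipeline format, and the in-memory cache are all assumptions made for illustration. The sketch derives each component's cache key from its command plus the keys of its inputs, so editing one component invalidates only that component and anything downstream of it.

```python
import hashlib
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def cache_key(command, input_keys):
    """Hash a component's command together with the cache keys of its inputs."""
    payload = json.dumps(
        {"command": command, "inputs": sorted(input_keys.items())},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_pipeline(pipeline, cache, runner):
    """Run components in dependency order, reusing cached artifacts.

    pipeline: {name: {"command": str, "inputs": [upstream names]}} (hypothetical format)
    cache:    {cache_key: artifact}, shared across calls
    runner:   callable(command, input_artifacts) -> artifact
    Returns the names of the components that actually executed.
    """
    graph = {name: set(spec["inputs"]) for name, spec in pipeline.items()}
    keys, executed = {}, []
    for name in TopologicalSorter(graph).static_order():
        spec = pipeline[name]
        input_keys = {dep: keys[dep] for dep in spec["inputs"]}
        key = cache_key(spec["command"], input_keys)
        keys[name] = key
        if key not in cache:  # cache miss: this step has to run
            inputs = {dep: cache[keys[dep]] for dep in spec["inputs"]}
            cache[key] = runner(spec["command"], inputs)
            executed.append(name)
    return executed


def fake_runner(command, inputs):
    """Stand-in for launching a containerised CLI program."""
    return f"{command}({', '.join(sorted(inputs.values()))})"


pipeline = {
    "load": {"command": "load_products", "inputs": []},
    "features": {"command": "build_features", "inputs": ["load"]},
    "train": {"command": "train_model --lr 0.1", "inputs": ["features"]},
}
cache = {}
first_run = run_pipeline(pipeline, cache, fake_runner)

# Change only the last component, then run again against the same cache.
pipeline["train"]["command"] = "train_model --lr 0.01"
second_run = run_pipeline(pipeline, cache, fake_runner)
```

In this toy run, `first_run` lists all three components, while `second_run` contains only `train`, because the keys for `load` and `features` are unchanged. A shared production cache would also need persistent storage and a way to attach to executions that are still in flight, which the sketch leaves out.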
The platform is language- and environment-neutral, supporting Python, JavaScript, Rust, or any file-based workflow, across cloud or on-prem setups. The visual editor provides real-time execution status, cached steps, logs, and performance insights, while storing complete lineage for reproducibility.
Tobi Lütke, CEO of Shopify, said: “Tangle is a major piece of our Shopify data and ML system. It makes complex things easy and automatically avoids doing things more than once, saving an insane amount of waste.”
By open-sourcing Tangle, Shopify enables the broader developer community to reduce redundant compute, build reproducible ML workflows, and integrate existing code without constraints, promoting best practices in machine-learning engineering.