OpenAI Releases Programming Language ‘Triton’ For AI Workload Optimisation


OpenAI today released Triton, an open-source, Python-like programming language that allows researchers to write GPU code for AI workloads. Triton is said to help programmers reach peak hardware performance with little effort. OpenAI claims Triton can produce code on par with what an expert could achieve, in just a few lines. It is said to deliver substantial ease-of-use benefits over coding in CUDA for neural network tasks.

“Triton is for machine learning researchers and engineers who are unfamiliar with GPU programming despite having good software engineering skills,” said Philippe Tillet, Triton’s original creator and a member of OpenAI’s staff.

Deep neural networks have emerged as an important type of AI model that can achieve state-of-the-art performance across many domains, including natural language and computer vision. The strength of these models comes from their hierarchical structure, which generates work well suited to multicore hardware such as GPUs. Though frameworks for general-purpose GPU computing such as CUDA and OpenCL have made the development of high-performance programs easier, optimising for GPUs remains a challenge. Triton aims to automate these optimisations so developers can focus on the logic of their code.

“Novel research ideas in the field of Deep Learning are generally implemented using a combination of native framework operators. While convenient, this approach often requires the creation or movement of many temporary tensors, which can hurt the performance of neural networks at scale. These issues can be mitigated by writing specialized GPU kernels, but doing so can be surprisingly difficult due to the many intricacies of GPU programming. And, although a variety of systems have recently emerged to make this process easier, we have found them to be either too verbose, lack flexibility or generate code noticeably slower than our hand-tuned baselines,” Tillet wrote in a blog post.
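The cost of temporary tensors that Tillet describes can be illustrated with a minimal NumPy sketch (illustrative only, not Triton code): composing framework operators materialises an intermediate array at every step, whereas a fused kernel computes the same result in a single pass over the data.

```python
import numpy as np

x = np.random.rand(1024).astype(np.float32)

def softmax_unfused(x):
    # Composing operators: each step allocates a temporary array,
    # adding memory traffic between them.
    e = np.exp(x - x.max())   # temporary: shifted exponentials
    s = e.sum()               # temporary: reduction result
    return e / s              # temporary: final output

def softmax_fused(x):
    # A fused kernel would do the same work in one pass,
    # avoiding the intermediate allocations.
    m = x.max()
    out = np.empty_like(x)
    total = 0.0
    for i in range(x.shape[0]):
        out[i] = np.exp(x[i] - m)
        total += out[i]
    out /= total
    return out

assert np.allclose(softmax_unfused(x), softmax_fused(x))
```

The Python loop here is of course slow on a CPU; the point is only the memory-traffic pattern, which is what a hand-written or Triton-generated GPU kernel avoids.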

By accounting for the architectural considerations of modern GPUs, Triton is said to simplify code through its programming model and high-level system architecture.

“Triton exposes intra-instance parallelism via operations on blocks—small arrays whose dimensions are powers of two—rather than a Single Instruction, Multiple Thread execution model. In doing so, Triton effectively abstracts away all the issues related to concurrency within CUDA thread blocks… It can greatly simplify the development of more complex GPU programs,” reads the blog.
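This block-based model can be sketched in plain Python (a hedged emulation, not actual Triton API; the names `add_kernel_instance`, `pid` and `BLOCK` are illustrative): each program instance processes one power-of-two block of elements as a whole-array operation, instead of reasoning about individual threads.

```python
import numpy as np

BLOCK = 256  # block dimensions in Triton must be powers of two

def add_kernel_instance(pid, x, y, out, n):
    # Each "program instance" handles one contiguous block of elements,
    # roughly analogous to a CUDA thread block, but written as array ops.
    offsets = pid * BLOCK + np.arange(BLOCK)
    mask = offsets < n  # guard against out-of-bounds on the final block
    out[offsets[mask]] = x[offsets[mask]] + y[offsets[mask]]

n = 1000
x = np.random.rand(n).astype(np.float32)
y = np.random.rand(n).astype(np.float32)
out = np.empty_like(x)

# Launch one instance per block over a 1D "grid", as a GPU runtime would.
for pid in range((n + BLOCK - 1) // BLOCK):
    add_kernel_instance(pid, x, y, out, n)

assert np.allclose(out, x + y)
```

Because concurrency *within* a block is hidden behind these array operations, the programmer never writes per-thread index arithmetic or synchronisation, which is the simplification the blog describes.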

Triton’s documentation website explains that the main challenge posed by this paradigm is how the work done by each program instance should be partitioned for efficient execution on modern GPUs. “To address this issue, the Triton compiler makes heavy use of block-level data-flow analysis, a technique for scheduling iteration blocks statically based on the control- and data-flow structure of the target program. The resulting system actually works surprisingly well: our compiler manages to apply a broad range of interesting optimisations automatically.”

The software is offered under an open-source licence that requires the copyright notice and permission notice to be included in any distribution of substantial portions of the code.

