We're releasing Triton 1.0, an open-source Python-like programming language which enables researchers with no CUDA experience to write highly efficient GPU code, most of the time on par with what an expert would be able to produce. Triton makes it possible to reach peak hardware performance with relatively little effort; for example, it can be used to write FP16 matrix multiplication kernels that match the performance of cuBLAS (something that many GPU programmers can't do) in under 25 lines of code. Our researchers have already used it to produce kernels that are up to 2x more efficient than equivalent Torch implementations, and we're excited to work with the community to make GPU programming more accessible to everyone.
Novel research ideas in the field of Deep Learning are generally implemented using a combination of native framework operators. While convenient, this approach often requires the creation (and/or movement) of many temporary tensors, which can hurt the performance of neural networks at scale. These issues can be mitigated by writing specialized GPU kernels, but doing so can be surprisingly difficult due to the many intricacies of GPU programming. And, although a variety of systems have recently emerged to make this process easier, we have found them to be either too verbose, too inflexible, or prone to generating code noticeably slower than our hand-tuned baselines. This has led us to extend and improve Triton, a recent language and compiler whose original creator now works at OpenAI.
The architecture of modern GPUs can be roughly divided into three major components (DRAM, SRAM and ALUs), each of which must be considered when optimizing CUDA code:
Basic architecture of a GPU.
Reasoning about all these factors can be challenging, even for seasoned CUDA programmers with many years of experience. The purpose of Triton is to fully automate these optimizations, so that developers can better focus on the high-level logic of their parallel code. Triton aims to be broadly applicable, and therefore does not automatically schedule work across SMs -- leaving some important algorithmic considerations (e.g. tiling, inter-SM synchronization) to the discretion of developers.
Compiler optimizations in CUDA vs Triton.
Out of all the Domain Specific Languages and JIT-compilers available, Triton is perhaps most similar to Numba: kernels are defined as decorated Python functions, and launched concurrently with different program_ids on a grid of so-called instances. However, as shown in the code snippet below, the resemblance stops there: Triton exposes intra-instance parallelism via operations on blocks (small arrays whose dimensions are powers of two) rather than a Single Instruction, Multiple Thread (SIMT) execution model. In doing so, Triton effectively abstracts away all the issues related to concurrency within CUDA thread blocks (e.g., memory coalescing, shared memory synchronization/conflicts, tensor core scheduling).
Vector addition in Triton.
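The kernel itself is not reproduced here, but its blocked execution model can be emulated in plain NumPy. This is a hypothetical sketch, not Triton code: the sequential loop over `pid` stands in for kernel instances that would run concurrently on the GPU, and the names `BLOCK`, `add_instance`, and `add` are our own.

```python
import numpy as np

BLOCK = 8  # block size; Triton blocks have power-of-two dimensions

def add_instance(x, y, z, pid):
    """What one kernel instance (program id `pid`) computes."""
    offs = pid * BLOCK + np.arange(BLOCK)  # this instance's block of offsets
    mask = offs < x.size                   # guard against out-of-bounds lanes
    z[offs[mask]] = x[offs[mask]] + y[offs[mask]]

def add(x, y):
    z = np.empty_like(x)
    grid = (x.size + BLOCK - 1) // BLOCK   # number of instances to launch
    for pid in range(grid):                # concurrent on a GPU; sequential here
        add_instance(x, y, z, pid)
    return z
```

Note that each instance operates on a whole block of elements at once: there is no per-thread index arithmetic, which is exactly the concurrency detail Triton abstracts away.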
While this may not be particularly helpful for embarrassingly parallel (i.e., element-wise) computations, it can greatly simplify the development of more complex GPU programs.
Consider for example the case of a fused softmax kernel (below) in which each instance normalizes a different row of the given input tensor $X \in \mathbb{R}^{M \times N}$. Standard CUDA implementations of this parallelization strategy can be challenging to write, requiring explicit synchronization between threads as they concurrently reduce the same row of $X$. Most of this complexity goes away with Triton, where each kernel instance loads the row of interest and normalizes it sequentially using NumPy-like primitives.
Fused softmax in Triton.
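The snippet is not reproduced here, but the per-row computation each kernel instance performs is easy to state in actual NumPy. A hedged reference sketch (the max-subtraction for numerical stability mirrors what a production kernel would do; the function names are our own):

```python
import numpy as np

def softmax_row(row):
    """Numerically stable softmax of one row, as one instance computes it."""
    row = row - row.max()   # subtract the row max so exp() cannot overflow
    num = np.exp(row)
    return num / num.sum()

def fused_softmax(X):
    """One softmax_row call per kernel instance, i.e. per row of X."""
    return np.stack([softmax_row(r) for r in X])
```

Because each row is normalized independently, the rows map naturally onto independent kernel instances with no inter-instance synchronization.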
Note that the Triton JIT treats X and Y as pointers rather than tensors; we felt that retaining low-level control of memory accesses was important for addressing more complex data structures (e.g., block-sparse tensors).
Importantly, this particular implementation of softmax keeps the rows of $X$ in SRAM throughout the entire normalization process, which maximizes data reuse when applicable (~<32K columns). This differs from PyTorch's internal CUDA code, whose use of temporary memory makes it more general but significantly slower (below). The bottom line here is not that Triton is inherently better, but that it simplifies the development of specialized kernels that can be much faster than those found in general-purpose libraries.
A100 performance of fused softmax for M=4096.
The lower performance of the Torch (v1.9) JIT highlights the difficulty of automatic CUDA code generation from sequences of high-level tensor operations.
Fused softmax with the Torch JIT.
Being able to write fused kernels for element-wise operations and reductions is important, but not sufficient given the prominence of matrix multiplication tasks in neural networks. As it turns out, Triton also works very well for those, achieving peak performance with just ~25 lines of Python code. On the other hand, implementing something similar in CUDA would take a lot more effort and would likely achieve even lower performance.
Matrix multiplication in Triton.
One important advantage of handwritten matrix multiplication kernels is that they can be customized as desired to accommodate fused transformations of their inputs (e.g., slicing) and outputs (e.g., Leaky ReLU). Without a system like Triton, non-trivial modifications of matrix multiplication kernels would be out-of-reach for developers without exceptional GPU programming expertise.
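To make the fusion idea concrete, here is a hedged NumPy emulation of a blocked matrix multiplication kernel with a fused Leaky-ReLU epilogue. The tile sizes, the 0.01 slope, and the assumption that the matrix shapes divide evenly by the block sizes are our own; on a GPU, each (pid_m, pid_n) tile would be one concurrently executing instance.

```python
import numpy as np

BLOCK_M, BLOCK_N, BLOCK_K = 4, 4, 4  # tile sizes (powers of two)

def matmul_leaky_relu(a, b):
    """Blocked matmul with a fused Leaky-ReLU epilogue (shapes must tile evenly)."""
    M, K = a.shape
    K2, N = b.shape
    assert K == K2
    c = np.zeros((M, N), dtype=a.dtype)
    for pid_m in range(0, M, BLOCK_M):      # each (pid_m, pid_n) tile is
        for pid_n in range(0, N, BLOCK_N):  # one kernel instance on a GPU
            acc = np.zeros((BLOCK_M, BLOCK_N), dtype=a.dtype)
            for k in range(0, K, BLOCK_K):  # sequential K loop inside the kernel
                acc += (a[pid_m:pid_m + BLOCK_M, k:k + BLOCK_K]
                        @ b[k:k + BLOCK_K, pid_n:pid_n + BLOCK_N])
            # fused epilogue: apply Leaky ReLU before writing the tile back,
            # avoiding a second pass over the output in DRAM
            c[pid_m:pid_m + BLOCK_M, pid_n:pid_n + BLOCK_N] = \
                np.where(acc >= 0, acc, 0.01 * acc)
    return c
```

The point of the fusion is visible in the last step: the activation is applied while the accumulator tile is still in fast memory, rather than launching a separate element-wise kernel afterwards.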
V100 tensor-core performance of matrix multiplication with appropriately tuned values for BLOCK$_M$, BLOCK$_N$, BLOCK$_K$, GROUP$_M$.
The good performance of Triton comes from a modular system architecture centered around Triton-IR, an LLVM-based intermediate representation in which multi-dimensional blocks of values are first-class citizens.
High-level architecture of Triton.
The @triton.jit decorator works by walking the Abstract Syntax Tree (AST) of the provided Python function so as to generate Triton-IR on-the-fly using a common SSA construction algorithm. The resulting IR code is then simplified, optimized and automatically parallelized by our compiler backend, before being converted into high-quality LLVM-IR (and eventually PTX) for execution on recent NVIDIA GPUs. CPUs and AMD GPUs are not supported at the moment, but we welcome community contributions aimed at addressing this limitation.
We have found that the use of blocked program representations via Triton-IR allows our compiler to automatically perform a wide variety of important program optimizations. For example, data can be automatically stashed to shared memory by looking at the operands of computationally intensive block-level operations (e.g., tl.dot), and allocated/synchronized using standard liveness analysis techniques.
The Triton compiler allocates shared memory by analyzing the live range of block variables used in computationally intensive operations.
In addition, Triton programs can be efficiently and automatically parallelized both (1) across SMs, by executing different kernel instances concurrently, and (2) within SMs, by analyzing the iteration space of each block-level operation and partitioning it adequately across different SIMD units, as shown below.
Diagram: within each SM, element-wise operations are vectorized and FP16 matrix multiplications are tensorized; across the GPU, kernel instances execute on different SMs.
Automatic parallelization in Triton. Each block-level operation defines a blocked iteration space that is automatically parallelized to make use of the resources available on a Streaming Multiprocessor (SM).
We intend for Triton to become a community-driven project. Feel free to fork our repository on GitHub!
If you're interested in joining our team and working on Triton & GPU kernels, we're hiring!
Original post: Introducing Triton: Open-Source GPU Programming for Neural Networks