User-friendly system can help developers build more efficient simulations and AI models
Artificial intelligence models based on neural networks, used in applications like medical image processing and speech recognition, perform operations on hugely complex data structures that demand enormous amounts of computation. This is one reason deep-learning models consume so much energy.
To make AI models more efficient, MIT researchers developed an automated system that helps developers of deep learning algorithms exploit two types of data redundancy simultaneously. This reduces the computation, bandwidth, and memory needed for machine-learning tasks.
Existing techniques for optimizing these algorithms can be cumbersome, and they typically allow developers to capitalize on only one of two types of redundancy found in deep-learning data structures: sparsity or symmetry.
By allowing developers to create an algorithm from scratch that uses both redundancies at once, the MIT researchers' method increased computation speed by nearly 30 times in some experiments.
Since the system uses a user-friendly programming language, it can optimize machine-learning algorithms for many applications. It could also assist scientists who aren't deep learning experts but want to improve the AI algorithms they use for data processing. Additionally, the system could be useful in scientific computing.
“For a long time, capturing these data redundancies required a lot of effort. Instead, a scientist can now tell our system what they want to compute in a more abstract way, without specifying exactly how to do it,” says Willow Ahrens, an MIT postdoc and co-author of a paper on the system, which will be presented at the International Symposium on Code Generation and Optimization.
She co-authored the paper with lead author Radha Patel ’23, SM ’24, and senior author Saman Amarasinghe, a professor in the Department of Electrical Engineering and Computer Science (EECS) and a principal researcher in the Computer Science and Artificial Intelligence Laboratory (CSAIL).
Cutting Out Computation
In machine learning, data are often represented and manipulated as multidimensional arrays known as tensors. A tensor is like a matrix, a rectangular array of values arranged in rows and columns. But unlike a two-dimensional matrix, a tensor can have many dimensions, or axes, which makes tensors more complex to manipulate.
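For a concrete picture, here is a minimal sketch in Python using NumPy (an illustrative choice of library, not part of the researchers' system) contrasting a two-dimensional matrix with a three-dimensional tensor:

```python
import numpy as np

# A matrix is a two-dimensional array: rows by columns.
matrix = np.arange(6).reshape(2, 3)       # shape (2, 3)

# A tensor generalizes this to more axes, e.g. a stack of matrices.
tensor = np.arange(24).reshape(2, 3, 4)   # shape (2, 3, 4): three axes

print(matrix.ndim, tensor.ndim)           # prints: 2 3
```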
Deep-learning models perform operations on tensors using repeated matrix multiplication and addition. This process is how neural networks learn complex patterns in data. The large number of calculations needed for these multidimensional data structures requires a significant amount of computation and energy.
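As a rough illustration (again in NumPy, with hypothetical layer sizes), a single fully connected neural-network layer boils down to exactly this multiply-and-add pattern:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((128, 784))   # weight matrix (hypothetical sizes)
b = rng.standard_normal(128)          # bias vector
x = rng.standard_normal(784)          # one input, e.g. a flattened image

# One layer of a network: a matrix multiplication followed by an addition.
y = W @ x + b
```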
Because of the way data in tensors are arranged, engineers can often speed up a neural network by eliminating unnecessary calculations.
For example, if a tensor represents user review data from an e-commerce site, not every user reviews every product, so most values in that tensor are likely zero. This type of data redundancy is known as sparsity. A model can save time and computation by only storing and working with non-zero values.
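A minimal sketch of that idea, using SciPy's general-purpose sparse formats (an illustration of sparse storage, not the researchers' compiler):

```python
import numpy as np
from scipy import sparse

# A mostly-zero "users x products" ratings matrix.
dense = np.zeros((1000, 1000))
dense[0, 5] = 4.0
dense[7, 2] = 5.0

# Compressed sparse row storage keeps only the non-zero entries.
csr = sparse.csr_matrix(dense)
print(csr.nnz)          # 2 stored values instead of 1,000,000

# Multiplying touches only those non-zeros and skips the rest.
v = np.ones(1000)
result = csr @ v
```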
Additionally, a tensor is sometimes symmetric, meaning its upper and lower halves are mirror images of each other. In that case, the model only needs to operate on one half, roughly halving the computation. This type of data redundancy is called symmetry.
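To see why symmetry halves the work, consider a symmetric matrix-vector product sketched in plain Python (an illustration, not the paper's algorithm): each stored off-diagonal value can be read once and used for two output entries.

```python
import numpy as np

def symmetric_matvec(A, x):
    """y = A @ x for a symmetric matrix A, reading only its lower triangle."""
    n = len(x)
    y = np.zeros(n)
    for i in range(n):
        y[i] += A[i, i] * x[i]        # diagonal entry, used once
        for j in range(i):            # strictly lower triangle
            y[i] += A[i, j] * x[j]    # each stored value serves
            y[j] += A[i, j] * x[i]    # two output entries
    return y

A = np.array([[2.0, 1.0], [1.0, 3.0]])   # symmetric: A[0,1] == A[1,0]
x = np.array([1.0, 2.0])
assert np.allclose(symmetric_matvec(A, x), A @ x)
```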
“But when you try to use both of these optimizations, the situation becomes quite complex,” Ahrens says.
To make the process easier, she and her collaborators developed a new compiler, which is a computer program that translates complex code into a simpler language that a machine can process. Their compiler, called SySTeC, can optimize computations by automatically using both sparsity and symmetry in tensors.
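SySTeC generates such code automatically from a high-level description, and the paper is the authoritative source for its actual input language. As a hand-written sketch of what exploiting both redundancies at once looks like, here is a matrix-vector product over a symmetric matrix that stores only the non-zero entries of its lower triangle (illustrative Python with made-up data):

```python
import numpy as np

def sym_sparse_matvec(rows, cols, vals, x, n):
    """y = A @ x where A is symmetric and only its lower-triangle
    non-zeros are stored as coordinate lists (rows, cols, vals)."""
    y = np.zeros(n)
    for i, j, v in zip(rows, cols, vals):
        y[i] += v * x[j]
        if i != j:                 # symmetry: reuse v for the mirror entry
            y[j] += v * x[i]
    return y

# Sparse symmetric 3x3 matrix: only 3 stored values instead of 9.
# A = [[2, 0, 1],
#      [0, 0, 0],
#      [1, 0, 3]]
rows, cols, vals = [0, 2, 2], [0, 0, 2], [2.0, 1.0, 3.0]
x = np.array([1.0, 1.0, 1.0])
print(sym_sparse_matvec(rows, cols, vals, x, 3))   # [3. 0. 4.]
```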