Monday, 14 November, 2022 UTC


Summary

In a world of Stories and Novels,
She just started to learn the Alphabet.
Deep learning is a subset of machine learning where artificial neural networks, algorithms inspired by the brain, learn from large amounts of data. It has been used to achieve state-of-the-art results in many fields such as computer vision, natural language processing, and robotics.
DeepMind is a leading artificial intelligence company. It was founded in 2010 and acquired by Google in 2014. DeepMind is best known for creating the artificial intelligence program AlphaGo, which defeated a professional human Go player in 2016.
AlphaTensor (announced October 5, 2022) is an AI model based on AlphaZero, tasked with discovering algorithms to solve arbitrary matrix multiplication problems.
But why is Matrix Multiplication so important?

Matrix Multiplication and AI

“During the Islamic Golden Age, Persian mathematician Muhammad ibn Musa al-Khwarizmi designed new algorithms to solve linear and quadratic equations. In fact, al-Khwarizmi’s name, translated into Latin as Algoritmi, led to the term algorithm. But, despite the familiarity with algorithms today — used throughout society from classroom algebra to cutting-edge scientific research — the process of discovering new algorithms is incredibly difficult, and an example of the amazing reasoning abilities of the human mind.”
— Quoting from DeepMind PR
Matrix Multiplication
Matrix multiplication is a key operation in many numerical algorithms, especially in deep learning. It is an operation on two matrices that produces a third matrix as a result. The matrix product of two matrices A and B is denoted as AB. If A is an n×m matrix and B is an m×p matrix, then AB is an n×p matrix.
We know this much, but let us get a concrete sense of why the cost and speed of models depend so heavily on matrix multiplication.
Let us consider two 2×2 matrices A and B with simple values (matrices of ones):

A = |1 1|    B = |1 1|
    |1 1|        |1 1|

Now let's calculate the product A×B. Each entry of the result is the dot product of a row of A with a column of B:

A×B = |1·1 + 1·1   1·1 + 1·1|  =  |2 2|
      |1·1 + 1·1   1·1 + 1·1|     |2 2|
Now, let's count how many operations were needed to calculate the final values: it took 8 multiplications and 4 additions. What happens if we increase the dimensions (the number of rows and columns) of the matrices?
For two 3×3 Matrices, there will be a total of 27 Multiplication Operations.
For two 4×4 Matrices, there will be a total of 64 Multiplication Operations.
So, we can see that as the dimensions of the matrices increase, the number of multiplication operations grows quickly: this way of multiplying two N×N matrices together requires N^3 multiplications along the way.
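To make the cubic growth concrete, here is a small Python sketch (my own illustration, not from the paper) that multiplies two square matrices the schoolbook way and counts the scalar multiplications:

```python
def matmul_count(A, B):
    """Multiply two square matrices the schoolbook way, counting scalar multiplications."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    mults = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
                mults += 1
    return C, mults

for n in (2, 3, 4):
    ones = [[1] * n for _ in range(n)]
    _, mults = matmul_count(ones, ones)
    print(n, mults)   # prints 8, 27, 64 multiplications: exactly n^3
```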

How Is Data Fed to a Neural Network?

Data is fed to a neural network in a variety of ways, depending on the data type and the specific application. For example, image data can be prepared for image classification by first converting each image into an embedding vector. This can be done using a pre-trained convolutional neural network (CNN) that encodes the image into a low-dimensional vector (a matrix of features). The CNN is trained on a large dataset of images and then used to generate embeddings for new images; a downstream model such as a transformer can then be trained on the CNN embeddings to learn to classify images.
Text data can be fed into a transformer for text classification in a similar way, by first converting the text into embedding vectors. This can be done using a pre-trained word-embedding model, such as Word2Vec or GloVe, which is trained on a large corpus of text and then used to generate embeddings for new text. The transformer is then trained on the word embeddings to learn to classify text.
In general, any type of data can be fed into a neural network by first converting it into a vector or tensor; the specific method depends on the type of data and the application.
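As a concrete toy illustration, here is a minimal sketch of turning a sentence into a matrix of embedding vectors. The tiny vocabulary and the random embedding table below are my own stand-ins for a real pre-trained model such as Word2Vec or GloVe:

```python
import numpy as np

# Toy vocabulary and a random embedding table standing in for a
# pre-trained word-embedding model (illustrative only).
vocab = {"matrix": 0, "multiplication": 1, "is": 2, "everywhere": 3}
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), 8))  # 4 words x 8 dimensions

sentence = ["matrix", "multiplication", "is", "everywhere"]
token_ids = [vocab[word] for word in sentence]
X = embedding_table[token_ids]   # shape (4, 8): one embedding vector per token
print(X.shape)                   # this matrix is what the network actually consumes
```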
How Is Matrix Multiplication Relevant to Neural Networks?
The converted embedding vectors are nothing but N-dimensional tensors or matrices, and every type of neural network performs matrix multiplications at every step: the forward pass, backpropagation, weight updates, and so on.
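For instance, a single dense (fully connected) layer is essentially one matrix multiplication plus a bias. A minimal NumPy sketch, with layer sizes chosen arbitrarily for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 128))   # a batch of 32 inputs, 128 features each
W = rng.normal(size=(128, 64))   # dense-layer weights mapping 128 -> 64 features
b = np.zeros(64)

h = x @ W + b                    # the forward pass is one matrix multiplication
h = np.maximum(h, 0)             # ReLU non-linearity
print(h.shape)                   # (32, 64)
```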
If you are new to Deep Learning, please read my Blog Series on Deep Learning. The link to the Introduction is below:
Why Attention is all you need: Introduction

Use of Matrix Multiplication in AI

Matrix multiplication is a key operation in deep learning. It is used in many ways, including:
- Forward propagation in neural networks: In a neural network, matrix multiplication is used to compute the output of each layer from the input.
- Backpropagation: Backpropagation is an algorithm used to calculate the gradient of a loss function with respect to the weights of a neural network. Matrix multiplication is used in the backpropagation algorithm to compute the gradients of the loss function with respect to the weights.
- Optimization: Matrix multiplication is used in many optimization algorithms, such as gradient descent, to update the weights of a neural network.
- Convolution: Convolution is a mathematical operation that is used in many deep learning applications, such as image processing and computer vision. Convolution is typically performed using matrix multiplication.
- Recurrent neural networks: Recurrent neural networks are a type of neural network that can process sequences of data. Matrix multiplication is used in recurrent neural networks to update the hidden state of the network at each time step.
Matrix multiplication is thus a fundamental operation in deep learning, used across many different algorithms and applications, and understanding it is essential for understanding deep learning; the sketch below ties the first three uses in the list together.
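The following NumPy sketch (illustrative, not production code) shows one training step for a single linear layer. The forward pass, the backpropagated gradient, and the gradient-descent weight update each involve a matrix multiplication:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 128))           # batch of inputs
W = rng.normal(size=(128, 10)) * 0.01    # layer weights
y_true = rng.normal(size=(32, 10))       # toy regression targets

# Forward propagation: one matrix multiplication
y_pred = x @ W

# Loss: mean squared error
loss = np.mean((y_pred - y_true) ** 2)

# Backpropagation: the weight gradient is itself a matrix multiplication
grad_y = 2 * (y_pred - y_true) / y_pred.size   # dLoss/dy_pred
grad_W = x.T @ grad_y                          # dLoss/dW, shape (128, 10)

# Optimization: gradient-descent weight update
learning_rate = 0.1
W -= learning_rate * grad_W
```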

What Is DeepMind's AlphaTensor?

Now that we have established that matrix multiplications are an important part of AI, we need to understand why DeepMind's AlphaTensor is such a breakthrough discovery.
In the research paper "Discovering faster matrix multiplication algorithms with reinforcement learning", DeepMind introduced a new AI system for discovering novel, efficient, and provably correct algorithms for fundamental tasks such as matrix multiplication. This sheds light on a 50-year-old open question in mathematics about finding the fastest way to multiply two matrices.

Automated Algorithmic Discovery

The AlphaTensor agent uses reinforcement learning to play a single-player game (the paper calls it TensorGame, in which winning corresponds to finding a correct multiplication algorithm), starting without any knowledge of existing matrix multiplication algorithms. Through learning, AlphaTensor gradually improves over time, re-discovering historical fast matrix multiplication algorithms such as Strassen's, and eventually surpassing the realm of human intuition by discovering algorithms faster than previously known.
For example, if the traditional algorithm taught in school multiplies a 4×5 matrix by a 5×5 matrix using 100 multiplications, and human ingenuity reduced this number to 80 (Strassen's algorithm), AlphaTensor has found algorithms that do the same operation using just 76 multiplications.
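To get some intuition for how such savings are possible at all, consider Strassen's classic 2×2 trick: it trades 8 multiplications for 7 (at the cost of extra additions), and applying it recursively to matrix blocks yields the famous O(n^2.81) algorithm. A small Python sketch:

```python
def strassen_2x2(A, B):
    """Strassen's algorithm for 2x2 matrices: 7 multiplications instead of 8."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4,           m1 - m2 + m3 + m6]]

print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

AlphaTensor searches this same space of low-rank decompositions of the matrix multiplication tensor, but automatically rather than by human insight.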
The research paper also publishes results comparing the best previously known rank with the rank discovered by AlphaTensor for many tensor configurations (recipes), and in every case AlphaTensor either matched or improved on the previously known methods. Note that these results cover tensor (matrix) sizes where n, m, and p are less than 12.
Way more than just Multiplication
AlphaTensor can also find efficient matrix multiplication algorithms tailored to specific hardware, with zero prior hardware knowledge. To do so, the researchers modified AlphaTensor's reward: they provided an additional reward at the terminal state (after the agent found a correct algorithm) equal to the negative of the algorithm's runtime when benchmarked on the target hardware.
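A schematic sketch of what such a hardware-aware terminal reward could look like, written as self-contained toy Python (my own illustration, emphatically not DeepMind's implementation):

```python
import time
import numpy as np

def median_runtime(matmul_fn, n=1024, repeats=10):
    """Median wall-clock time of a candidate matmul routine on this machine."""
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        matmul_fn(a, b)
        times.append(time.perf_counter() - start)
    return sorted(times)[len(times) // 2]

def terminal_reward(matmul_fn):
    """Toy sketch: a correct algorithm earns the negative of its measured
    runtime, so algorithms that are faster on *this* hardware score higher.
    (Schematic only; not DeepMind's actual code.)"""
    a = np.random.rand(16, 16).astype(np.float32)
    b = np.random.rand(16, 16).astype(np.float32)
    if not np.allclose(matmul_fn(a, b), a @ b, atol=1e-3):
        return float("-inf")              # incorrect algorithms are rejected
    return -median_runtime(matmul_fn)

print(terminal_reward(np.matmul))
```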
The paper reports the time saved by AlphaTensor's algorithms relative to Strassen's algorithm on two different GPU setups across varying matrix sizes. The higher the percentage, the faster or more optimized the result, and AlphaTensor is the clear winner.
This means that AlphaTensor can find different optimal ways to multiply two matrices on different GPUs, even though it is not told at the start which GPU (hardware) it is running on.

Future of AI

Tensors can represent any bilinear operation, such as structured matrix multiplication, polynomial multiplication, or more custom bilinear operations used in machine learning. AlphaTensor can be applied to such custom bilinear operations and yield efficient algorithms that exploit the problem's structure.
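Polynomial multiplication makes a handy analogy here (my example, not the paper's): the product of two degree-1 polynomials is a bilinear operation, and a Karatsuba-style identity computes it with 3 scalar multiplications instead of the schoolbook 4, exactly the kind of saving AlphaTensor hunts for in tensor space:

```python
def poly_mul_karatsuba(a0, a1, b0, b1):
    """(a0 + a1*x) * (b0 + b1*x) using 3 multiplications instead of 4."""
    m_low = a0 * b0
    m_high = a1 * b1
    m_mid = (a0 + a1) * (b0 + b1) - m_low - m_high   # coefficient of x
    return (m_low, m_mid, m_high)                    # c0, c1, c2

print(poly_mul_karatsuba(1, 2, 3, 4))  # (3, 10, 8): (1+2x)(3+4x) = 3 + 10x + 8x^2
```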
Because matrix multiplication is a core component in many computational tasks, spanning computer graphics, digital communications, neural network training, and scientific computing, AlphaTensor-discovered algorithms could make computations in these fields significantly more efficient.

References

  1. Discovering faster matrix multiplication algorithms with reinforcement learning
  2. Discovering novel algorithms with AlphaTensor
  3. Research Paper
  4. GitHub