We have entered a golden era for natural language processing.
OpenAI’s release of GPT-3, the most powerful language model ever built, captivated the technology world this summer. It has set a new standard in NLP: it can write impressive poetry, generate functioning code, compose thoughtful business memos, write articles about itself, and so much more.
GPT-3 is just the latest (and largest) in a string of similarly architected NLP models—Google’s BERT, OpenAI’s GPT-2, Facebook’s RoBERTa and others—that are redefining what is possible in NLP.
The key technology breakthrough underlying this revolution in language AI is the Transformer.
Transformers were introduced in the landmark 2017 research paper "Attention Is All You Need." Previously, state-of-the-art NLP methods had all been based on recurrent neural networks (e.g., LSTMs). By definition, recurrent neural networks process data sequentially—that is, one word at a time, in the order that the words appear.
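To see why recurrence forces sequential processing, here is a minimal sketch of a vanilla RNN forward pass (toy dimensions and random weights, purely illustrative): each hidden state depends on the previous one, so the time-step loop cannot be parallelized.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions, chosen only for illustration.
d_in, d_h = 4, 8
W_x = rng.normal(size=(d_h, d_in))   # input-to-hidden weights
W_h = rng.normal(size=(d_h, d_h))    # hidden-to-hidden weights

def rnn_forward(tokens):
    """Process a sequence one token at a time.

    Each new hidden state h depends on the previous h, so the
    loop below is strictly sequential: step t cannot start
    until step t-1 has finished.
    """
    h = np.zeros(d_h)
    for x in tokens:                  # one word at a time, in order
        h = np.tanh(W_x @ x + W_h @ h)
    return h

sentence = rng.normal(size=(10, d_in))   # ten "word" vectors
final_state = rnn_forward(sentence)      # shape (8,)
```

The data dependency in the loop is exactly what Transformers remove.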
Transformers’ great innovation is to make language processing parallelized: all the tokens in a given body of text are analyzed at the same time rather than in sequence. In order to support this parallelization, Transformers rely heavily on an AI mechanism known as attention. Attention enables a model to consider the relationships between words regardless of how far apart they are and to determine which words and phrases in a passage are most important to “pay attention to.”
Why is parallelization so valuable? Because it makes Transformers vastly more computationally efficient than RNNs, meaning they can be trained on much larger datasets. GPT-3 was trained on roughly 500 billion tokens of text and consists of 175 billion parameters, dwarfing any RNN in existence.
Transformers have been associated almost exclusively with NLP to date, thanks to the success of models like GPT-3. But just this month, a groundbreaking new paper was released that successfully applies Transformers to computer vision. Many AI researchers believe this work could presage a new era in computer vision. (As well-known ML researcher Oriol Vinyals put it simply, “My take is: farewell convolutions.”)
While leading AI companies like Google and Facebook have begun to put Transformer-based models into production, most organizations remain in the early stages of productizing and commercializing this technology. OpenAI has announced plans to make GPT-3 commercially accessible via API, which could seed an entire ecosystem of startups building applications on top of it.
Expect Transformers to serve as the foundation for a whole new generation of AI capabilities in the years ahead, starting with natural language. As exciting as the past decade has been in the field of artificial intelligence, it may prove to be just a prelude to the decade ahead.