Artificial intelligence has become shockingly capable in the past year. The latest chatbots can conduct fluid conversations, craft poems and even write computer code, while the latest image-makers can create fake “photos” that are virtually indistinguishable from the real thing.
It wasn’t always this way. As recently as two years ago, AI produced robotic text riddled with errors. Images were tiny and pixelated, and they lacked artistic appeal. The mere suggestion that AI might one day rival human capability and talent drew ridicule from academics.
A confluence of innovations has spurred this growth. Breakthroughs in mathematical modeling, improvements in hardware and computing power, and the emergence of massive, high-quality data sets have supercharged generative AI tools.
While artificial intelligence is likely to improve even further, experts say the past two years have been uniquely fertile. Here’s how it all happened so fast.
AI-generated responses: how three AI systems completed the text prompt “A clever name for a play about dinosaurs is”

2018 (OpenAI GPT): “to quote the most brilliant of examples : ‘who can say for sure they weren’t aliens or aliens or aliens? they had the appearance of aliens ; they wore black leather pants and they walked with much” This model does not follow the sentence structure.

2019 (GPT-2): “Dino-Brawl. Dinosaur-Brawl — which can be viewed in the player’s perspective — starts with Mr. D, a dino-hunter, chasing down a lone” This model gave a name but followed it with a confusing sentence.

2023 (ChatGPT): “Dino-Mite: The Mesozoic Extravaganza!” This model used a pun for the name and provided a subtitle.
A training transformation
Much of this recent growth stems from a new AI architecture called the transformer. This design allows the technology to process large blocks of language quickly and to test the fluency of the outcome.
It originated in a 2017 Google paper, “Attention Is All You Need,” that quickly became one of the field’s most influential pieces of research.
To understand how the model works, consider a simple sentence: “The cat went to the litter box.”
Previously, artificial intelligence models would analyze a sentence sequentially, processing the word “the” before moving on to “cat” and so on. This took time, and the software would often forget its earlier learning as it read new sentences, said Mark Riedl, a professor of computing at Georgia Tech.
The transformer model instead processes the relationships between all the words in a sentence at once, a method called attention. New AI models can examine “cat” alongside “litter” and “box.”
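To make the idea concrete, here is a toy sketch of attention in Python, using the NumPy library. The sentence length and the size of the word vectors are invented for illustration; none of this is any company’s actual code.

    import numpy as np

    def attention(Q, K, V):
        # Score every word against every other word in one matrix product.
        scores = Q @ K.T / np.sqrt(K.shape[-1])
        # A softmax turns each row of scores into weights that sum to 1.
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        # Each word's output is a blend of the words it "attends" to most.
        return weights @ V

    # Stand-in number vectors for "The cat went to the litter box."
    rng = np.random.default_rng(0)
    words = rng.normal(size=(7, 4))  # 7 words, 4 numbers per word
    print(attention(words, words, words).shape)  # (7, 4): one vector per word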
To make sure the AI performs correctly, the transformer model builds in a testing step. It masks a word in the sentence to see whether the AI can predict what’s missing. Additionally, companies such as OpenAI have humans rate the quality of the responses. For example, if the word “cat” is masked and the computer offers “the dog went to the litter box,” it’s likely to get a thumbs down.
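Readers can try the masked-word game directly. As one hedged example, the open-source Hugging Face transformers library offers a fill-mask tool; the model named below is a small public one, not any of the systems described above.

    # Requires: pip install transformers torch
    from transformers import pipeline

    # BERT is a small, public model trained with exactly this masking game.
    fill = pipeline("fill-mask", model="bert-base-uncased")
    for guess in fill("The [MASK] went to the litter box."):
        print(guess["token_str"], round(guess["score"], 2))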
The model allows AI tools to ingest billions of sentences and quickly recognize patterns, resulting in more natural-sounding responses.
Another new training method, called diffusion, has improved AI image generators such as Dall-E and Midjourney, allowing nearly anyone to create hyper-realistic images with simple, even nonsensical, text prompts, such as: “Draw me a picture of a rabbit in outer space.”
Researchers feed these AI models billions of images, each paired with a text description, teaching the computer to identify relationships between images and words.
The diffusion method then layers “noise” — visual clutter that looks like TV static — over the images. The AI system learns to recognize the noise and subtract it until the image is once again clear.
This process of corrupting and regenerating images teaches the AI to remove imperfections, refining each response until it is crisp and sharp. The system also learns the relationships between neighboring pixels, making the generated image more realistic.
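Here is a toy sketch of the “noising” half of that process in Python, using NumPy. The image and the noise schedule are invented for illustration; a real system trains a neural network to reverse this corruption step by step.

    import numpy as np

    rng = np.random.default_rng(0)
    image = rng.uniform(size=(8, 8))  # stand-in for a clean training image

    def add_noise(img, t, T=100):
        # Blend the image toward pure static as step t approaches T.
        alpha = 1.0 - t / T                  # fraction of signal that survives
        static = rng.normal(size=img.shape)  # the TV-static "noise"
        noisy = np.sqrt(alpha) * img + np.sqrt(1 - alpha) * static
        return noisy, static

    # The model trains on pairs like this: given `noisy`, predict `static`,
    # so at generation time it can subtract noise until an image emerges.
    noisy, static = add_noise(image, t=50)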
AI-generated images: what three AI systems generated from the text prompt “A picture of a very clean living room”

2016 (Reed et al.): an image so small the details are impossible to see.

2022 (Stable Diffusion): an image that resembles a living room, but the furniture and mirror are disfigured.

2023 (Dall-E 2): an image with clean lines and reflections, but details like the coffee table’s legs are deformed.
The rise of the supercomputer
These bigger, more complicated AI models require advanced computer systems. This hardware, too, has become much more powerful.
Graphics processing units, or “GPUs” — circuits commonly used for video games — offer immense computing power and excel at running tasks simultaneously, a key requirement for training artificial intelligence models.
Researchers can string thousands of GPUs together into supercomputers that operate in the cloud. This lets people rent access to GPUs rather than build their own machines, making high-powered computing cheaper and more widely available.
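As a minimal illustration, using the open-source PyTorch library rather than any particular lab’s software, the same line of code fans out across thousands of GPU cores when a GPU is present; the matrix sizes here are arbitrary.

    # Requires: pip install torch
    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    a = torch.randn(4096, 4096, device=device)
    b = torch.randn(4096, 4096, device=device)
    c = a @ b  # one call; on a GPU, the work spreads across many cores at once
    print(device, c.shape)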
Better quality data
Over the past several years, the data sets used to train AI have gotten bigger and better, too.
Previous data sets had lower-quality content, Riedl said, like addresses or phone numbers, which aren’t helpful for learning natural speech.
Now, data sets include more news articles and book transcripts — content that shows how humans write and talk — which helps teach the AI to “speak” more fluidly.
As AI companies have trained more models, they have gotten better at creating high-quality data sets, Riedl said. Entire organizations are now dedicated to building them.
Scale AI, a San Francisco company, curates and labels training data sets for AI software. ImageNet, a data set started by Stanford University researchers, contains millions of labeled images. Common Crawl, a nonprofit that creates an open repository of web data, has developed text data sets from more than 12 years of internet scrapes. Some data sets have grown to contain billions of words.
A growing number of data sets are multimodal, meaning they combine images with text, which allows artificial intelligence systems to learn the relationship between the two. There has also been a rise in medical imaging data sets, fueling AI’s ability to spot diseases and ailments.
AI-generated responses: how three AI systems completed the text prompt “This is a haiku about baseball:”

2018 (OpenAI GPT): “a man who’s going to lose, a man who doesn’t need to lose. he wants the people to win by doing something he wants, rather than because they’re being manipulated. they’re playing the game” This model begins in a somewhat poetic place but quickly stops making sense.

2019 (GPT-2): “Fantasy Baseball: In the late 1800s the Yankees and his friends had a tournament where pitcher Francisco Gohara came to San Francisco from Texas at the height of what was called the Great Depression.” This model invents a sentence about a fake baseball player, seemingly ignoring the word “haiku.”

2023 (ChatGPT): “In summer’s embrace, / Bats crack, balls soar through the air, / Baseball’s timeless grace.” This model’s output follows the traditional 5-7-5 syllable pattern, though the first and third lines rhyme.
What’s next?
Experts say it’s hard to predict how much better AI will get. Major obstacles stand in the way of further development. These models are expensive to run and exact a staggering environmental toll. They confidently churn out wrong, nonsensical and sometimes biased answers, while creating lifelike images that could sow confusion.
As tech giants such as Google and Microsoft race to incorporate AI into their products, a slew of companies are trying to expand AI’s capabilities to generate video and music, and to create detection tools that screen for artificially generated content. Most people are likely to interact with this new technology in the near future. But how useful it will be, and what impact it will have on society, remains to be seen.