This week, Google unveiled Gemini, which already looks like a scarily smart rival to OpenAI’s GPT-4.
Gemini consists of three different models that vary in size and capability. Its most advanced model, Gemini Ultra, is not yet available to the public, but Google says it is designed for “highly complex tasks.” Ultra outperforms GPT-4 in several areas, from knowledge of subjects like history and law to generating code in Python to tasks that require multi-step reasoning, Google said in its announcement.
Google said that Gemini Ultra outperformed GPT-4 on the Massive Multitask Language Understanding test, or MMLU, which is one of the most popular methods to gauge the knowledge and problem-solving skills of AI models.
You could compare it to the “SATs for AI models,” Kevin Roose said on The New York Times tech podcast Hard Fork. The MMLU, however, is a bit more advanced than a typical college prep exam. It covers 57 subjects, including math, physics, history, law, medicine, and ethics, to test for both world knowledge and problem-solving abilities, according to Google’s announcement.
Gemini Ultra scored 90% on the MMLU, while GPT-4 scored 86.4%, according to Google.
But Gemini Ultra’s more impressive feat might be that it’s also the first model to outperform human experts on the MMLU. Human experts scored about 89.8%, Google said in a technical report on Gemini.
“I think if you went back even two or three years and told AI researchers that Google will have a model that gets a 90 percent on the MMLU, that is better than the sort of benchmark threshold for human experts, they would have said, well, that’s AGI,” Roose said. AGI, or artificial general intelligence, is a hypothetical form of artificial intelligence that would exhibit complex human capabilities like common sense and consciousness.
GPT-4 did beat out Gemini Ultra by several percentage points in an evaluation of common sense reasoning abilities for everyday tasks, according to Google.
But one advantage Google says Gemini has over other models is that it’s natively multimodal, which means it was designed from the ground up to process several types of data, from text to audio to code to images and video. Other multimodal models were created by “stitching together” text-only, vision-only, and audio-only models in a “suboptimal way,” Oriol Vinyals, the vice president of research at Google DeepMind, said in a video announcing Gemini.
As a result, Google says that Gemini’s design allows it to understand inputs better than existing multimodal models. Researchers behind the SemiAnalysis blog also say Gemini will likely “smash” GPT-4 through sheer computing power.
While Gemini Ultra has certainly set high expectations for its arrival, the jury is still out on how the trio of Gemini models will fare against OpenAI’s models, which already have an advantage in consumer awareness.
Early reactions to the less advanced Gemini Pro, which is accessible through Google’s chatbot Bard, have been positive. However, the model has also had issues with accuracy and hallucinations. It has even told people to resort to Google for answers to controversial questions.
Google and OpenAI did not respond to a request for comment from Business Insider.