Grok-2 brings image generation: is the world ready?





As expected based on updates and new settings in the mobile app for Elon Musk's social network X, a new large language model (LLM) called Grok-2 from Musk's company xAI, a sister company to X, landed last night, and it's a blast.

Integrated into X itself and available through the Premium ($7 USD/month) and Premium+ ($14/month ad-free) subscription levels, Grok-2 is available, fittingly enough, in two model sizes: Grok-2 and Grok-2 mini. Grok-2 offers state-of-the-art performance across a wide range of tasks, including chat, coding, reasoning, and vision-based applications, while Grok-2 mini is a smaller, faster version optimized for efficiency, suitable for simpler text-based prompts that require faster responses.

Grok-2 not only has image generation capabilities, based on a partnership with Black Forest Labs and its new, surprisingly photorealistic open-source diffusion AI model Flux.1, but it also outperforms the AI models of leading rivals including OpenAI (GPT-4o), Anthropic (Claude 3.5 Sonnet) and even Google (Gemini 1.5 Pro) in leading third-party benchmark tests.

A new, surprising leader in multiple benchmarks

Promotional screenshot of a graph comparing the performance of Grok-2 mini and Grok-2 to other leading frontier LLMs from rival companies. Credit: xAI

Notably, Grok-2 and Grok-2 mini outperform all other models in the GPQA, MMLU, MMLU-Pro, MATH, HumanEval, MMMU, MathVista, and DocVQA benchmarks.

Even the LMSYS Chatbot Arena, where many companies secretly test their AI models under alternate names prior to release (including xAI, where Grok-2 was initially called “sus-column-r”), congratulated xAI on the milestone.

As AI influencer and University of Pennsylvania Wharton School of Business professor Ethan Mollick noted on X, “There are now five GPT-4 class models: GPT-4o, Claude 3.5, Gemini 1.5, Llama 3.1, and now Grok 2.”

Musk congratulated his “hardworking xAI team!” on X.

Image generation steals the show

While Grok-2 performs best on all the different benchmarks related to math, writing, coding and other tasks, the main feature that has attracted the most attention from the start is its integration with Black Forest Labs' Flux.1 image generation model.

Before the release of Grok-2, Flux.1 had already caused a stir in AI and AI art circles, particularly in recent weeks as people discovered they could achieve incredibly photorealistic generations with the open-source model. The results are convincing enough to mimic familiar situations like a speaker at a TED talk, and users can fine-tune the model using low-rank adaptation (LoRA) to generate their own likeness in different situations.

A version of Flux.1 has now been integrated directly into Grok-2, in the same way that OpenAI integrated its DALL-E 3 image generation model directly into ChatGPT, allowing users to simply type text prompts into the chatbot and ask it to generate images on command. Users testing this capability in Grok-2 are finding it to be remarkably permissive: it is generating controversial, compromising images, even of public figures like US presidential candidates Kamala Harris and Donald Trump.

Other leading image generators, including Midjourney, DALL-E 3 and Microsoft Designer, have banned the generation of this type of content, particularly in the wake of the controversy earlier this year over unauthorized explicit deepfakes of popular musician Taylor Swift (created by prompt engineering around the Designer constraints). It is therefore notable that Grok-2 bucks that trend, allowing more freedom and, with it, more potential risk. That's in keeping with Musk's stated “free speech” ethos for X, though.

Still, users are concerned about what this capability means for the spread of deepfakes and disinformation on the web.

As user @Omiron33 put it well: “Yes, we've had MJ and Flux, but this is the first one that makes it usable and fast. Advertising, propaganda, and everything good or bad that comes with it just happened (in my opinion the good outweighs the bad)”