Thoughts on DALL·E 3
In less than a year, the quality of the images generated by DALL·E has improved dramatically. In October 2022, I gave DALL·E 2 some unusual cricket-related prompts. I revisited them to compare its results with those of DALL·E 3.
My first prompt:
Astronauts playing cricket on moon
Resulted in a set of these four images:
While DALL·E 2 had struggled to render the humans, the wickets and the bats accurately, DALL·E 3 had no problems with them. Sure, there are still a lot of other issues with these images - multiple moons or earths in the background, missing helmets, a beard jutting out of at least one astronaut’s visor and, in one case, batsmen standing across the width of the crease rather than along its length. But still, I couldn’t help but be impressed by how far we have come in a matter of months. DALL·E 2 had seemed quite revolutionary back then, but relative to DALL·E 3, its output looks like grotesque smudges.
One of the questions I had raised in my original post…
What race or skin tone should the AI have given to these depictions of humans? To say nothing of their gender. Did DALL·E try to cop out of having to settle this question by generating generic illustrations of humans in spacesuits as the first three choices? The reality with AI is often more anodyne - it merely reflects the images the model was trained on.
…seems to have been addressed.
This prompt covers two human endeavours that are not very diverse individually, let alone their intersection. DALL·E 3 still did a fine job of producing pictures that were very inclusive. It does so by rewriting your prompts before feeding them into the image generation pipeline. In doing so, it adds a bit more detail and variation. My succinct five-word prompt was rewritten four times with specifics of gender and race:
Photo of a female astronaut of Asian descent and a male astronaut of African descent playing cricket on the moon’s surface, with Earth in the background.
Illustration of two astronauts, one of Hispanic descent and one of Caucasian descent, engaged in a game of cricket on the moon. The moon’s craters and the pitch are clearly visible.
Rendered image of a male astronaut of Middle Eastern descent bowling to a female astronaut of European descent on the moon. The cricket bat, ball, and wickets are clearly seen.
Photo of three astronauts, one male of Indian descent, one female of Indigenous descent, and another male of mixed race, playing cricket on the moon with the stars shining brightly above.
My second prompt…
Astronauts playing cricket on moon with three slips and a gully and earth rising in the background
…did not result in images of astronauts holding impossibly long bats bordering on polo sticks. Still, as impressive as the output is visually, it is a long way from the cricket field setting the prompt describes. In the last image, a white cricket ball is seen lying in a crater - one wonders if they just chose a terrible spot for the pitch or if the crater was made by a bowler pitching with superhuman strength.
My final prompt gave results that also looked quite plausible:
Sachin Tendulkar playing a pull shot while riding an elephant in Mumbai
While I wouldn’t mistake any of the players for Sachin Tendulkar, they definitely look like cricket players and not the mahouts on tuskless elephants with polo sticks that DALL·E 2 had generated in 2022. More importantly, the backgrounds in all four images are quite evocative of Mumbai, especially the building in the background of the first image - it looks like the facade of the iconic Taj Mahal Palace hotel.
And once you know that DALL·E rewrites your prompts, you can also ask it not to. You get four¹ identical images, but they adhere much more closely to your prompt’s intent. In my case, I finally got someone who might pass for Sachin Tendulkar from a distance.
At this rate, you could expect another post next year where I wonder what exactly had impressed me about these pictures by DALL·E 3.
¹ Four was the default number of images DALL·E 3 rendered in response to a prompt when it launched last year. Within a few weeks of the launch, this seems to have been reduced to two - presumably to save GPU resources in the face of massive demand?