Thoughts on DALL•E 2 #1
Last week, I tried to move our coffee table (with a few heavy books still on it) to its correct place in the living room. The table is no longer in the wrong place. Unfortunately, now my lower back is. I’ve therefore been at home. A lot.
Fortunately, I also got my DALL•E invite that week, which gave me something to fill my time with.
For those of you reading this post who might not have heard about it:
DALL·E 2 is a new AI system that can create realistic images and art from a description in natural language.
The OpenAI website makes it really simple to use. You enter your description or “prompt” in a text box, click the generate button, wait for a few seconds and you’ll be presented with four square (1024px by 1024px) images to choose from.
The first prompt I issued to DALL•E was:[1]
“Astronauts playing cricket on moon”
I got these four pictures back:
Three were cartoonish digital art (one had even been titled as if by a first grader - cucket cinn - cricket scene?), while the last one looked sufficiently photorealistic.
Playing cricket professionally requires you to wear a lot of gear. I liked how DALL•E tried to give the players cricket batting pads under their spacesuits. It also retained the original spacesuit helmet - a cricket helmet alone on the moon would be very problematic indeed. The bat in the hands of the player in the front looks somewhat futuristic, while the bat in the hands of the player behind is more recognisable, although either its handle is weirdly long (and broken) or the player is holding a bail in his other hand. The face of the player behind is a distorted smudge. The ball looks like a worn white one - so that settles the question of the game’s format on the moon. There is only a single stump behind the main batsman. There is also a weird piece of debris between the two batsmen - a random glitch, or is it one of those white discs that fast bowlers use to mark their run-up? I guess we’ll never know.
What race or skin tone should the AI have given to these depictions of humans? To say nothing of their gender. Did DALL•E try to cop out of having to settle this question by generating generic illustrations of humans in spacesuits as the first three choices?[2]
My attempts at getting more specific with this prompt didn’t get me very far:
Astronauts playing cricket on moon with three slips and a gully and earth rising in the background
Again, for the astronaut cricket theme, DALL•E leans towards generating low-fidelity drawings.
DALL•E makes it really simple to generate variations based on the version you like. Generating variations of the third, somewhat photorealistic image gave results that looked rather disturbing:
For some reason the cricket bats seem to take on impossibly long handles - to the point that they become oars. And cricketers themselves seem to take on tortured, contorted forms with nightmarish faces and missing or extra limbs (or parts thereof). In one variant the astro-batter even seems to be attempting to commit Seppuku with their bat.
Staying with the cricket theme, I tried something closer to home:
Sachin Tendulkar playing a pull shot while riding an elephant in Mumbai
OpenAI does not allow you to generate faces of celebrities due to the sheer scope for misuse and the legal complications that would entail. So no, I wasn’t expecting to see Sachin Tendulkar, only some vague likeness of him.
I see Sreesanth in one if I squint really hard but not Sachin.
And I knew from the previous experiment that DALL•E has, let’s call it, a serious cricket bat problem, but here it was shockingly inept. Long poles these are, cricket bats these are not. Notice that I did not mention the word “cricket” in my prompt. It still seems to have made that connection somewhat - as evidenced by those white balls flying around and the blue of the Indian cricket jersey.
The elephants look real, even though 6 out of the 8 variations I generated were missing tusks. There is even a reddish stone wall in the background of one of the variants, the sort you are likely to come across at a temple in South India. The AI seems to have made a connection with the likely surroundings of elephants in India (or at least the surroundings that are likely to dominate the pictures in its training set):
The aesthetic of these pictures is definitely redolent of the Subcontinent - there is a small crowd of people in the background in most of them - as there undoubtedly would be were Sachin to venture to do the thing my prompt says. The sunlight is harsh. The colours of the clothes are what I’d expect in a random sampling of people from India. And of course, given the setting of our scene, this time the AI seems to have suffered no predicament about the colour of people’s skin. That said, nothing about these images makes me think that they are set in Mumbai.
It might be tempting to dismiss DALL•E based on the quality of these results, but there is more to DALL•E than generating photorealistic images of cricket in impossibly absurd settings.
I’ll be sharing more examples in the coming days.
[1] Why astronauts? There are a lot of pictures of the sort “astronaut(s) doing something” on DALL•E’s homepage. And this is perhaps what anchored my first prompt. Why cricket? I really don’t know - I guess I was going for something that I’d strongly identify with.
[2] The reality with AI is often more anodyne - it merely reflects the images the model was trained on.