You might have heard of DALL•E (a portmanteau of Dalí and WALL•E) at some point in the past year or so. It's a machine-learning system from the research lab OpenAI that can take a natural-language prompt like “A corgi made of jello dancing on top of a ball” and produce an image based on it. It can mimic a huge range of styles, from realistic photographs to digital art to hand-drawn sketches and paintings, and has produced some remarkable output. It got some coverage around June, when the team behind it began opening up access to a few people, who started sharing the images they were creating with it. I happened across the form to request access and, intrigued, signed up, not expecting for a moment that I'd get in without entering a Twitter, Instagram, or LinkedIn profile. Near the end of July I saw an article saying OpenAI had opened DALL•E up to a million people, and imagine my surprise when I got an email notifying me I'd been selected.
“A corgi made of jello dancing on top of a ball, realistic photo”, by DALL•E.
Above you can see one of the images I created with DALL•E. I was trying to explain it to someone as “That new AI system where you put in a crazy phrase [at which point I tried to come up with the craziest phrase I could think of] like, ‘A corgi made of jello dancing on top of a ball,’ and it spits out an image of it.” And then, well, I had to actually try it and see what I got. (DALL•E produces four variations every time you give it a prompt; they look broadly similar but differ somewhat in style, so you can choose the one you like best.)
“An Impressionistic painting of the Gemini Observatory on the summit of Mauna Kea in the style of Vincent Van Gogh”, by DALL•E.
It certainly has some quirks. While the site encourages you to be descriptive and specific in your prompts, it's hardly perfect at interpreting what you mean. In the image above I asked for two very specific things (the Gemini Observatory and Maunakea), and while it's produced a decent image (I rather like the composition of this one), it's inexplicably given the observatory two domes instead of the single one it has in reality. (DALL•E was weirdly fixated on the notion that Gemini means two domes: I tried several variations on this prompt, and nearly all of them had that feature.)
“A delicious-looking hamburger in the shape of a Rubik's cube, professional food photography”, by DALL•E.
One funny article I read recently had the author playing with phrases like “X in the shape of Y” applied to food, which led me to try the prompt above. (Which, side note, looks scrumptious, and I would totally eat it.) In general, I find that DALL•E works best when you give it a fairly specific prompt about generic objects, though you can certainly include phrases like “in the style of X artist.”
There's been some hand-wringing online about whether DALL•E might spell the death of various creative professions in the visual arts, like concept artist. Having played with it, I'm not too worried. Oh, sure, as it rolls out to wider use there'll probably be some changes, in the same way powerful new tools have always produced changes. DALL•E will probably take over a lot of small jobs that might previously have been done by hand (stock photography, in particular, is something I think it could fill in for quite well), and I read about someone using it to produce a logo for a program they wrote, which I thought was a neat use of it. But as my experiments showed, there are things I can see in my mind's eye that I can't figure out a prompt for, and if I want to show them to the world I still need to pick up a paintbrush, break out Blender, or something to that effect. Sometimes an image DALL•E puts out sparks something in my imagination, which is neat; I'm actually half-tempted to take my paints to the summit of Maunakea and attempt an Impressionist painting of the Gemini Observatory myself now. Ultimately, we're just on the cusp of generating images from natural-language prompts (several other models are pursuing similar things), and we'll have to wait and see where it takes us.
For now, I'll keep throwing in the occasional crazy prompt I think of. While writing this post I had the thought that perhaps it would be useful for generating images for text-heavy posts where I don't have a photo or something else to break it up. I often find it surprising how well DALL•E can handle fairly abstract or abstruse concepts. We'll see how it goes, but you might start seeing more DALL•E images around here in the future. A hui hou!