The Computer turns to Art

Author: Amy Smith

Copyright Notice: This article is Copyright AI Factory Ltd. Ideas and code belonging to AI Factory may only be used with the direct written permission of AI Factory Ltd.

The world is used to artificial intelligence solving tasks that computers previously tackled poorly. Deep learning, a subset of AI algorithms, has learnt, for example, to beat even the most skilled humans at totemic games such as Go, which is renowned for its difficulty. Other notoriously difficult and complex tasks, such as protein folding, have also proved susceptible to this form of computation. However, this success may have wrong-footed people's expectations when it comes to computers delivering interpretation-driven artistic work. This does not look like a domain in which computers should be able to deliver, yet recent systems are doing exactly that, and they are producing some significant work. This is so new and unexpected that the world has not really had much chance to determine what it will mean, or how it can be used; it is, however, very interesting and still seems extraordinary. This article takes a wide sweep through deep learning in its generative capacity to produce artistic media, and examines the generative Twitter bot '@artbhot' created by the author.


About the writer

I am currently a PhD student with the Intelligent Games and Game Intelligence (IGGI) program based at Queen Mary University of London. My project centres on the intersection of visual art and generative deep learning, specifically the affordances of 'text-to-image' generative artificial intelligence architectures. I have a background in Fine Art and work predominantly in the field of Computational Creativity, with an interest in automating creative interpretation and in exploring what it means for this technology to be placed within social media.

Introduction: 'Text-to-image'

It's difficult to deny that generative artificial intelligence algorithms are in the midst of a cultural renaissance. These generative algorithms produce 'something' as an output (hence the term 'generative'). This output can be text, an image, or another form of media (such as an animation). It is important to note that the output should always be novel: it won't have been seen before. Algorithms that produce images in particular have recently found their way into mainstream discourse, largely thanks to the likes of 'DALL.E mini' by Boris Dayma (inspired by OpenAI's original DALL.E model), Midjourney, Wombo Dream and other publicly accessible applications using this technology. These apps harness the power of a particular type of generative deep learning model: 'text-to-image'. As the name suggests, the model takes a user-given text phrase and produces an image that illustrates the given concept. For example, a user might ask for "a photograph of the blueprints to a time machine from the year 2066" and the model will do its best to produce an image that appropriately illustrates that concept. The text phrase is usually entered into a Google-search-style text box, and once generated, the images are typically loaded beneath it. Memes are a well-known art form to people who frequent social media, and the recent use of DALL.E mini to create memes has launched text-to-image technology firmly into the public consciousness. Some examples of how this has been used to make memes are shown below:

Even with the smallest of forays into the generative meme landscape, it becomes clear that there is a fascination with the ability to produce images of things that cannot conventionally exist, or cannot easily be depicted by other means. One's imagination is really the only limit to what can be made here. How else does one see Senator Palpatine from Star Wars working at a call center, and all in a matter of minutes? With this in mind, I have created a Twitter bot called '@artbhot' that leverages user tweets as the text input for a text-to-image generative deep learning model. This in essence means that the bot is able to transform user tweets into images and animations.
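
Before turning to the bot itself, it may help to reduce the text-to-image interaction described above to a few lines of code: a prompt string goes in and a batch of novel candidate images comes out. The sketch below is purely illustrative; 'TextToImageModel' is a hypothetical stand-in for whichever model or service actually sits behind the text box.

# Illustrative shape of the text-to-image interaction: prompt in, images out.
# 'TextToImageModel' is a hypothetical placeholder, not the API of any of the
# products named above.

from PIL import Image

class TextToImageModel:
    def generate(self, prompt: str, num_images: int = 9) -> list:
        # A real model would condition its decoder on the prompt text;
        # blank canvases stand in for the generated candidates here.
        return [Image.new("RGB", (256, 256)) for _ in range(num_images)]

model = TextToImageModel()
candidates = model.generate(
    "a photograph of the blueprints to a time machine from the year 2066"
)
# The app then lays these candidates out in a grid beneath the text box.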

@artbhot

As mentioned, @artbhot is a Twitter bot that processes user tweets, transforming them into never-before-seen images and animations. The bot is active 24 hours a day and has currently processed around 800 tweets, using CLIP-guided VQGAN to generate media. The bot communicates with Twitter through the Twitter API, and there are several different ways in which a user can interact with it. You can ask the bot to make images and animations by using different hashtags: to get the bot's attention, first '@' the bot account, then follow this with the hashtag for the kind of output you wish to receive in reply to your tweet. The bot can create painterly images, images that look drawn or sketched, animated GIFs and more:

@artbhot #paintme tweet text goes here
@artbhot #drawme tweet text goes here
@artbhot #makeme tweet text goes here
(more general)
@artbhot #animateme tweet text goes here
@artbhot #useimagetoo tweet text goes here
(add an image to the tweet too)

For example, one might tweet: '@artbhot #makeme a painting of an apple on a table' or '@artbhot #animateme a portrait painting of Dorian Gray'. Instructions for using the bot are also linked in the bio of the bot's Twitter account.
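
To give a flavor of how this kind of hashtag-driven interaction can be wired up, the sketch below polls for mentions with the Tweepy library and dispatches each one according to its hashtag. It is a simplified illustration: the credentials, the polling loop and the generate_media() helper are assumptions for the sake of the example, not @artbhot's actual source code.

# Sketch of hashtag-driven dispatch for a Twitter bot, using Tweepy (v4).
# The credentials, generate_media() and the polling loop are illustrative
# assumptions, not @artbhot's implementation.

import time
import tweepy

# v2 client for reading mentions and posting replies.
client = tweepy.Client(
    consumer_key="...", consumer_secret="...",
    access_token="...", access_token_secret="...",
)
# v1.1 API is still needed for media uploads.
auth = tweepy.OAuth1UserHandler("...", "...", "...", "...")
api = tweepy.API(auth)

BOT_ID = client.get_me().data.id
HASHTAGS = ("#paintme", "#drawme", "#makeme", "#animateme", "#useimagetoo")

def generate_media(text: str, hashtag: str) -> str:
    """Hypothetical helper: run the text-to-image model, return a file path."""
    raise NotImplementedError

def handle_mention(tweet) -> None:
    lowered = tweet.text.lower()
    for tag in HASHTAGS:
        if tag in lowered:
            # Strip the bot handle and the hashtag; what remains is the prompt.
            prompt = lowered.replace("@artbhot", "").replace(tag, "").strip()
            path = generate_media(prompt, tag)
            media = api.media_upload(path)
            client.create_tweet(
                text="Here you go!",
                media_ids=[media.media_id],
                in_reply_to_tweet_id=tweet.id,
            )
            return

since_id = None
while True:
    kwargs = {"since_id": since_id} if since_id else {}
    mentions = client.get_users_mentions(BOT_ID, **kwargs)
    for tweet in (mentions.data or []):
        handle_mention(tweet)
        since_id = max(since_id or 0, tweet.id)
    time.sleep(60)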

A high-level illustration of the bot's architecture is shown below:

Some examples of images produced by the bot, using some of the different hashtags discussed, can be seen below:

Here we can see examples of the '#makeme', '#drawme' and '#useimagetoo' hashtags at work, where the bot account has replied to the user's original tweet with a response containing the requested media. For the '#drawme' and '#paintme' functionality (below), the system autonomously engineers the user's tweet text into a text prompt that is optimized for a 'drawn' or 'painterly' aesthetic quality.
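
The exact wording @artbhot uses for this is not published here, but the idea can be sketched very simply: the hashtag selects a set of style modifiers that are appended to the user's text before it is handed to the image generator. The modifier strings below are illustrative assumptions rather than the bot's real prompt recipe.

# Illustrative prompt engineering: the hashtag chooses style modifiers that are
# appended to the tweet text. The wording here is an assumption, not
# @artbhot's actual prompt recipe.

STYLE_MODIFIERS = {
    "#paintme": "an oil painting, visible brush strokes, painterly",
    "#drawme": "a pencil sketch, hand drawn, line art",
    "#makeme": "",  # more general: pass the text through unchanged
}

def engineer_prompt(tweet_text: str, hashtag: str) -> str:
    modifiers = STYLE_MODIFIERS.get(hashtag, "")
    return f"{tweet_text}, {modifiers}" if modifiers else tweet_text

print(engineer_prompt("a lighthouse in a storm", "#paintme"))
# -> "a lighthouse in a storm, an oil painting, visible brush strokes, painterly"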

When using the '#useimagetoo' hashtag (below), a user can attach an image to their tweet along with their tweet text and send both to the bot. The bot will then combine the two inputs and produce an output that relates to both. As we can see in the example, the output is a combination of the text (an ocean) and the image (its colors and texture).
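
This text-plus-image behavior maps naturally onto how CLIP-guided VQGAN generation works: the latent code that VQGAN decodes into an image can be initialized from the attached picture instead of from noise, and CLIP then steers that latent towards the tweet text. The sketch below shows the core optimization loop; loading the VQGAN itself (via the taming-transformers codebase) is hidden behind hypothetical load_vqgan/encode/decode helpers, and details such as CLIP normalization and random cutouts are omitted, so treat it as a schematic of the technique rather than drop-in code.

# Schematic CLIP-guided VQGAN loop with an optional init image.
# load_vqgan(), vqgan.encode() and vqgan.decode() are hypothetical wrappers
# around the taming-transformers VQGAN; the CLIP calls use the OpenAI CLIP
# package (https://github.com/openai/CLIP).

import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, preprocess = clip.load("ViT-B/32", device=device)
vqgan = load_vqgan().to(device)  # hypothetical helper

def generate(prompt: str, init_image: Image.Image = None, steps: int = 300):
    # Target embedding for the tweet text.
    text_features = clip_model.encode_text(clip.tokenize(prompt).to(device))
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)

    # Start the latent from the attached image (#useimagetoo) or from noise.
    if init_image is not None:
        z = vqgan.encode(preprocess(init_image).unsqueeze(0).to(device))
    else:
        z = torch.randn(1, 256, 16, 16, device=device)
    z = z.detach().requires_grad_(True)

    optimizer = torch.optim.Adam([z], lr=0.1)
    for _ in range(steps):
        image = vqgan.decode(z)  # hypothetically (1, 3, H, W) in [0, 1]
        image_features = clip_model.encode_image(
            torch.nn.functional.interpolate(image, size=224)
        )
        image_features = image_features / image_features.norm(dim=-1, keepdim=True)

        # Push the decoded image towards the text embedding (cosine similarity).
        loss = -(image_features * text_features).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return vqgan.decode(z)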

Why is @artbhot interesting?

The social aspect of human creativity is something that is interesting to think about in terms of how it might map to computational creativity. Having a bot producing creative output in the middle of Twitter - a social media platform that to date has nearly 400 million users worldwide - is an exciting test bed for 'socializing' the bot's output. Recently, the bot has leveraged Twitter data through the API to autonomously engineer its own prompts, which it then sends to the image generator as input. It does this using data on what is trending on Twitter on any particular day. Twice a day, without prompting from a user, @artbhot can tweet an image that it has generated from what is trending on Twitter at that particular moment. Some examples of this can be seen below:
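
The sketch below shows one way such an unprompted, trend-driven tweet could be put together with Tweepy: fetch the current trends, fold one of them into a prompt, generate an image, and post it. The worldwide WOEID, the prompt template and the generate_media() helper are assumptions for illustration, not the bot's actual pipeline.

# Illustrative autonomous prompt engineering from Twitter trends.
# generate_media() is the same hypothetical text-to-image helper as before;
# the prompt template is an assumption, not @artbhot's actual wording.

import random
import tweepy

auth = tweepy.OAuth1UserHandler("...", "...", "...", "...")
api = tweepy.API(auth)       # v1.1: trends and media upload
client = tweepy.Client(
    consumer_key="...", consumer_secret="...",
    access_token="...", access_token_secret="...",
)

WORLDWIDE_WOEID = 1  # 'Where On Earth' ID for worldwide trends

def tweet_trending_image() -> None:
    trends = api.get_place_trends(id=WORLDWIDE_WOEID)[0]["trends"]
    topic = random.choice(trends)["name"].lstrip("#")
    prompt = f"a painting of {topic}"          # assumed template
    path = generate_media(prompt, "#makeme")   # hypothetical helper
    media = api.media_upload(path)
    client.create_tweet(
        text=f"What I imagine when Twitter talks about {topic}",
        media_ids=[media.media_id],
    )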

In a time when it is fast becoming normal to generate completely novel images within a matter of minutes from small pieces of imaginative text, the questions may become: 'are images with more or less detail perceived as more creative?' and 'what does it mean for code to creatively interpret text?'. I propose that @artbhot can be instrumental in answering such questions in this field. I also have plans to leverage GPT-3 in the autonomous prompt-engineering pipeline, as well as in bot responses to users, to create a fuller sense of an interaction with a 'creative agent'. The incorporation of GPT-3 outputs will allow for increased expressivity, and also a conversational element to interactions with the bot.
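
As a sketch of how that GPT-3 step could slot in, the snippet below uses the OpenAI completions endpoint (as it existed at the time of writing) to expand a short phrase into a richer description before it reaches the image generator. The instruction wording and model choice are assumptions; this illustrates a possible design rather than an implemented pipeline.

# Possible GPT-3 prompt-expansion step (OpenAI completions API, circa 2022).
# The instruction text and model choice are assumptions for illustration.

import openai

openai.api_key = "..."  # set via an environment variable in practice

def expand_prompt(seed_text: str) -> str:
    response = openai.Completion.create(
        model="text-davinci-002",
        prompt=(
            "Rewrite the following phrase as a vivid, detailed description "
            f"suitable for a text-to-image model:\n\n{seed_text}\n\nDescription:"
        ),
        max_tokens=60,
        temperature=0.9,
    )
    return response.choices[0].text.strip()

print(expand_prompt("a lighthouse in a storm"))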

For anyone that is interested in hearing more, I write a blog about various topics related to my PhD project, which can be found here: https://amyelizabethsmith01.medium.com/.

Note:

The bot is currently private while we iron out plans to keep it as family-safe as possible. If anyone reading this article wishes to follow the bot and interact with it, please message me at @AmysImaginarium on Twitter, quoting this article, and request to follow the bot account.

DALL.E mini memes:
https://knowyourmeme.com/photos/2386695-dall-e-mini-craiyon
https://www.memedroid.com/memes/tag/dalle
https://huggingface.co/spaces/dalle-mini/dalle-mini/discussions/5105

Amy Smith - August 2022
amyelizabethsmith01@gmail.com