Exploring AI Workflows

Hey, all. Checking in again… For those who don’t know (as there’s been an uptick in activity due to a false update notification), I’m still on hiatus from Interns. The break has given me a chance to destress and spend time exploring things that have captured my attention: primarily evaluating the state of AI and its potential to help with game creation.

Twice now, I’ve had posts written up, but so much has changed since writing them that I’ve had to rewrite and rework things. I’ve finally decided to buckle down and get something out there so people can see what I’ve been up to.

First up, Stable Diffusion as an art tool!

My Requirements

So I’m still spending a lot of my time exploring Stable Diffusion as a tool for making art assets for a game. I have a bit of a mental checklist that I feel needs to be satisfied before I can say that this will be a practical tool.

1. It has to be able to make reasonably attractive art.

So far, it passes with flying colors.

2. I need to be able to design characters to look the way I want them to.

This has been a bit of a challenge. A lot of it comes down to learning how to craft prompts properly and figuring out which words the AI understands and produces reliable results for. In general, though, I’ve found that I need to approach it with the understanding that the AI is chaotic, and I have to give it a bit of freedom to do what it wants—probably like dealing with a real-life artist.

3. I need to be able to get the SAME characters in multiple images.

Right here we have probably the single biggest challenge when dealing with an AI. I’ve got multiple methods to explore here. I’ll go into that more below, but, suffice it to say, getting the same characters in the same clothes is probably the big downfall of AI art right now.

4. I need to be able to control poses and facial expressions.

If I can’t make the characters do what I want them to do, there’s not much point to it. Posing and showing the proper emotion are important for using art in storytelling.

5. I need to get art in a usable resolution.

Stable Diffusion is pretty much built around a 512×512 resolution. Needless to say, most modern gamers are going to find that to be a tad on the small side. While it’s possible to render in larger sizes and different aspect ratios, anything larger than 512×512 -greatly- increases the chances of producing anatomical horrors beyond imagination.
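For anyone who’d rather script things than click around a UI like I do with Automatic1111, here’s a rough sketch of what basic 512×512 generation looks like with the Hugging Face diffusers library. The checkpoint and prompt are just placeholders, not what I actually use:

```python
# Minimal txt2img sketch with diffusers; checkpoint and prompt are placeholders.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # any SD 1.x checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# SD 1.x is happiest at 512x512; straying much beyond that is where the
# extra limbs start showing up.
image = pipe(
    prompt="portrait photo of a woman, studio lighting, 85mm",
    negative_prompt="blurry, deformed, extra limbs",
    width=512,
    height=512,
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
image.save("portrait_512.png")
```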

There are tips and tricks for upscaling, or rendering parts of the images and tiling them together, but most methods require more VRAM than I have or produce questionable results. I’m still playing around with my options.
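I haven’t settled on anything yet, but just as an illustration of one of those options, here’s roughly how a dedicated upscaling model (Stability’s x4 upscaler) would be wired up with diffusers. Fittingly, it also gets VRAM-hungry as the input grows:

```python
# Sketch of one upscaling option: Stability's x4 upscaler, run in diffusers.
# It's designed for fairly small inputs, and VRAM use climbs fast as the
# input grows, which is exactly the problem I keep running into.
import torch
from PIL import Image
from diffusers import StableDiffusionUpscalePipeline

upscaler = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler",
    torch_dtype=torch.float16,
).to("cuda")

low_res = Image.open("portrait_512.png").convert("RGB").resize((256, 256))
# The prompt steers the detail the upscaler invents as it enlarges the image.
upscaled = upscaler(
    prompt="portrait photo of a woman, studio lighting",
    image=low_res,
).images[0]
upscaled.save("portrait_1024.png")
```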

6. I want to be able to control the art style.

I’ve been starting off with photorealism, but I would like to be able to produce images that look more like digital illustrations. So far, it looks like the main options for this include changing the prompt tags, using a checkpoint model that has an inherent style baked in, or using LoRA to modify an existing model with a style. I’ll detail my experiments in this area a bit more below.
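To make those options a bit more concrete, here’s a hedged sketch of how they’d look in the diffusers library: a style-baked checkpoint is just a different model to load, a style LoRA gets layered on top of a base model, and prompt tags are simply extra keywords. The repo names are placeholders, not recommendations:

```python
# Sketch: steering style via checkpoint, LoRA, or prompt tags.
# Repo names are placeholders, not recommendations.
import torch
from diffusers import StableDiffusionPipeline

# Option 1: load a checkpoint that already has a style baked in.
pipe = StableDiffusionPipeline.from_pretrained(
    "some-user/illustration-style-checkpoint",  # placeholder repo
    torch_dtype=torch.float16,
).to("cuda")

# Option 2: keep a base model and layer a style LoRA on top
# (needs a reasonably recent diffusers release).
pipe.load_lora_weights("some-user/flat-color-style-lora")  # placeholder repo

# Option 3 is just prompt tags: style keywords added to the prompt itself.
image = pipe(
    "a woman reading in a cafe, digital illustration, flat colors, clean lineart",
    num_inference_steps=30,
).images[0]
image.save("styled.png")
```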

7. Being able to make porn would be handy…

Given the type of games I’m interested in making (and likely what my audience is interested in), being able to produce NSFW images would certainly be a nice feature. Thus far, though, AI-generated artwork tends to max out at softcore stuff. The lower extremities of the human body seem to be where it gets really confused—and, as bad as it is with a single person, having two or more people involved is when it can no longer remember which body part should connect to which person…or to anyone at all.

This is really not my problem to solve, however, and plenty of other individuals and groups are putting in the money, time, and expertise to build models and LoRA that should start making this sort of thing possible before long.

In Detail: Reproducible Characters

AI tends to generate some very random people, which makes it a big challenge to use for game characters. I’ve come across a few methods for getting the same person more than once:

  • Use a name – The AI recognizes a lot of celebrity names, but it has also made general associations with generic names. It’s possible to get a somewhat unique character just by giving it a randomly generated name.
  • Prompt blending/morphing – Typically used with a name from above, it’s possible to combine names in different ways in order to create a character that is a blend of two or more people. This does eat up a lot of tokens after a while, and can cause issues when mixed with other tricks.
  • Character model sheets – I haven’t explored this one much, but it’s possible to convince the AI that it’s building a character turnaround/model sheet with the same person from different angles. It can either generate this entirely from scratch, or be combined with inpainting, where part of the image contains a character I’ve created and the AI is told to fill in the rest of the image with a new render.
  • Training – There are different ways to train the AI on a specific person. Typically this includes training a new checkpoint model, a LoRA (essentially a model add-on), or Textual Inversion. The latter is what I’ve been focusing on.

As I understand it, Textual Inversion is a matter of creating a trigger word that replaces the combination of prompts required to get a specific look. So, it doesn’t teach the AI anything new…rather, it helps the AI know which collection of traits you want it to pull from its knowledge.
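In Automatic1111, using a finished embedding is just a matter of dropping the file into the embeddings folder and using its name in the prompt. For anyone scripting instead, here’s roughly what loading one looks like with diffusers; the file name and trigger token below are made up:

```python
# Sketch: loading a Textual Inversion embedding and using its trigger token.
# "my_heroine.pt" and "<my-heroine>" are placeholders for a trained embedding.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# The embedding maps the token to a combination of traits the model already
# knows; the base model itself is untouched.
pipe.load_textual_inversion("my_heroine.pt", token="<my-heroine>")

image = pipe(
    "photo of <my-heroine> sitting in a park, afternoon light",
    num_inference_steps=30,
).images[0]
image.save("heroine_park.png")
```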

Thus far, I’ve been working on “designing” characters inside of Stable Diffusion, so this seems like the ideal method, whereas LoRA training may end up being more useful when it comes to teaching the AI to do something it isn’t familiar with, say a character with traits that it’s otherwise unable to make.

Given how we’re on the bleeding edge of AI-generated art right now, there’s a ton of misinformation about how to actually train TIs properly. This has led to weeks of experimentation on my part, including uninstalling, reinstalling, and modifying my Automatic1111 install in order to get the right combination of software actually working. The Unstable Diffusion TI community has, fortunately, been a tremendous help in this regard.

My recent attempts have been getting closer to success, with this character being my best TI so far, allowing me to reproduce her in different models:

In Detail: Art Style

While it could be fun to do a game in a photorealistic style, I get the feeling that it could be a major challenge, especially given how much trouble AI has with anatomy. From what I’ve seen, anime-style models tend to do a bit better, as I imagine they’ve been trained on a lot of hentai, and the simpler art style hides a lot of flaws.

I’ve been exploring my options a little bit, doing large batches of tests of different prompts running on different models, as well as checking out new models that catch my eye.
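Those batch tests are really just a nested loop: every prompt through every model with a fixed seed, then eyeballing the results side by side. If I were doing it in script form with diffusers, it would look roughly like this (the model names beyond the base one are made up):

```python
# Sketch: batch-testing a few prompts across a few checkpoints and saving
# everything for side-by-side comparison. Extra model names are placeholders.
import torch
from diffusers import StableDiffusionPipeline

models = [
    "runwayml/stable-diffusion-v1-5",
    "some-user/anime-style-checkpoint",       # placeholder
    "some-user/illustration-checkpoint",      # placeholder
]
prompts = [
    "photo of a woman in a red dress, city street at night",
    "digital illustration of a woman in a red dress, city street at night",
]

for m in models:
    pipe = StableDiffusionPipeline.from_pretrained(
        m, torch_dtype=torch.float16
    ).to("cuda")
    for i, prompt in enumerate(prompts):
        # Fixed seed so the differences come from the model/prompt, not the noise.
        generator = torch.Generator("cuda").manual_seed(1234)
        image = pipe(prompt, generator=generator, num_inference_steps=30).images[0]
        image.save(f"{m.split('/')[-1]}_prompt{i}.png")
    del pipe
    torch.cuda.empty_cache()  # free VRAM before loading the next checkpoint
```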

A lot of the models are capable of doing some really lovely images, but there are also drawbacks. Many only seem to produce good results for limited subject matter, and others tend to give all characters the same face. Finding the right combos of models and keywords for consistent results across all characters might be a challenge.

Here’s our girl from above in some of my tests:

There’s an almost overwhelming amount of things to try when it comes to stylizing the images. Here are a few more samples that came out of my testing (with more NSFW ones in the backer post).

Alternatives: AI as a Filter

Since I wrote the above, I’ve gone back and tried other experiments, which include approaching AI as a very fancy photo filter to be used on top of 3D renders like the ones I’ve used for Interns. There’s no better way to explore this than to actually try it with some Interns characters, since I have so many renders on hand.

With this method, there still needs to be a descriptive prompt to tell the AI what it’s supposed to create, but where the txt2img method starts generating with pure noise, the img2img method begins with a base image. The “Denoise” setting basically controls the strength, with 0.0 giving you back the original image, and 1.0 giving you something that’s all AI.
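In diffusers terms, the equivalent of the Denoise slider is the img2img pipeline’s strength parameter. A rough sketch of the “AI as a filter” idea, with placeholder file names:

```python
# Sketch: using the AI as a filter over an existing render via img2img.
# "strength" plays the role of the Denoise slider: 0.0 returns the original
# image, 1.0 ignores it entirely. File names are placeholders.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

base = Image.open("interns_render.png").convert("RGB").resize((512, 512))
result = pipe(
    prompt="photo of a young woman in an office, soft lighting",
    image=base,
    strength=0.45,       # how far the AI is allowed to drift from the render
    guidance_scale=7.5,
).images[0]
result.save("filtered_render.png")
```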

Here is a grid of Chastity going through a range of Denoise settings:

Too much AI, and I lose all control over what I’m going to get, so the trick is figuring out the best point to stop.
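A sweep like the grid above can be scripted by stepping through the strength values with a fixed seed and pasting the results side by side. Something along these lines (placeholders again):

```python
# Sketch: sweeping the Denoise/strength value with a fixed seed and pasting
# the results into a comparison strip. File name and prompt are placeholders.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

base = Image.open("chastity_render.png").convert("RGB").resize((512, 512))
strengths = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7]

strip = Image.new("RGB", (512 * len(strengths), 512))
for i, s in enumerate(strengths):
    generator = torch.Generator("cuda").manual_seed(1234)  # same noise every run
    out = pipe(
        prompt="photo of a young woman in an office, soft lighting",
        image=base,
        strength=s,
        generator=generator,
    ).images[0]
    strip.paste(out, (512 * i, 0))
strip.save("denoise_sweep.png")
```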

It also occurred to me that I could do iterations—finding the versions I liked best and using those as the new base image before running the AI again. That’s the method I found to work the best for the Interns images, with the second run using more keywords to push the image towards looking like a digital illustration.
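In script form, the iteration idea is just feeding one pass’s output back in as the next pass’s base image, with the prompt pushed harder toward the style I want on the second run. A rough sketch, with made-up file names, prompts, and strength values:

```python
# Sketch: two-pass img2img. The first pass stays close to the original render;
# the second pass starts from that result and pushes the style harder.
# File names, prompts, and strength values are all placeholders.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

base = Image.open("cyn_render.png").convert("RGB").resize((512, 512))

# Pass 1: clean up the render without drifting too far from it.
pass1 = pipe(
    prompt="photo of a young woman in an office",
    image=base,
    strength=0.35,
).images[0]

# Pass 2: feed the pass-1 result back in with style-heavy keywords.
pass2 = pipe(
    prompt="digital illustration of a young woman in an office, clean lineart, cel shading",
    image=pass1,
    strength=0.45,
).images[0]
pass2.save("cyn_two_pass.png")
```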

Cyn was the first I tried. The original model is on the left, with the first AI pass in the middle and the second pass on the right.

AI for Coding

Changing gears from the visual side of things, I did play around with text AI…

I’ve used Novel AI in the past as an aid for writing story segments, but it was very hit-or-miss. With ChatGPT-4, however, text AI has really made major leaps ahead. I’ve found that it can now offer amazing coding support.

After a few early experiments with having it suggest code for various mechanics, I tried an experiment where I provided entire sections of Interns code for it to review and refactor. To my amazement, it actually appeared to understand my code and was able to make suggestions on ways to streamline things considerably.
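The same “paste code, ask for a review” loop could also be scripted against the API with the openai package. A hypothetical sketch; the file path and model name are placeholders:

```python
# Hypothetical sketch: sending a chunk of game code to the API for review,
# the scripted version of pasting it into ChatGPT. Requires the openai
# package and an API key; file path and model name are placeholders.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

with open("interns_section.py") as f:  # placeholder path
    code = f.read()

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are an experienced game developer reviewing code."},
        {"role": "user", "content": "Review this code and suggest ways to streamline it:\n\n" + code},
    ],
)
print(response.choices[0].message.content)
```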

This was just a half-hearted effort, but it does seem entirely possible that I could work with ChatGPT to improve the existing system and possibly help me figure out how to do things that I’ve been struggling with for years. Needless to say, it’s an exciting prospect.

Backer “Rewards”

I don’t really have any special “behind the scenes”-style content to offer this time around, but I do want to give the people who are backing me a little something extra, so I’m sharing a post full of some of the NSFW samples that have come about during my last few weeks of AI experiments. You can find the post here on Patreon, and here on SubscribeStar.
