I Trained AI To Be My Virtual Photographer
Using Stable Diffusion XL in the cloud means you can create stylized profile photos from anywhere
Keeping with the times, I made an AI version of myself! A bunch of versions actually. Along with the image above, here are some of the cooler ones that I created:
It’s fun to play with injecting yourself into any kind of scene your imagination can come up with!
In order to make these images, I trained a LoRA of myself and used Stable Diffusion XL to create them via text prompt. Training an image model accurately requires a solid base set of images to provide detail on appearance, so I gathered a bunch of my profile photos to use.
Here are the images that I used to train the model and create AI photos of myself.
You can see that not all of the generated images look exactly like me, but there is a lot that can be done to increase the accuracy of model training by tweaking your input images and updating parameters. This model is only my first attempt; I’m planning to train an improved version in the future.
Below is a more technical, high-level walkthrough of how to train a LoRA for use with Stable Diffusion XL.
Technical Process - Training on RunPod
If you’re looking to create images of yourself, there are a ton of options these days, from paid apps like Lensa and PhotoAI to tools like RunwayML. They all have varying degrees of accuracy and style, however, so for the most control over the images you create, you can train your own model.
For those of you interested in the fairly technical process of training your own likeness for much cheaper and with more power, here is how I did it.
Using your images, you are essentially training a LoRA model that can then be used to generate imagery with your specific likeness (or that of anyone or anything else you trained).
LoRA is short for Low-Rank Adaptation of Large Language Models (more info can be found here: https://stable-diffusion-art.com/lora/). It is a fine-tuning technique originally developed for large language models like GPT-3, and the same idea is now widely used for image models like Stable Diffusion. Instead of updating all of a pre-trained model’s weights, it freezes them and trains only a small set of additional low-rank parameters, reducing the computational cost and time required for fine-tuning while maintaining the model’s performance.
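To make the idea concrete, here is a minimal NumPy sketch of a low-rank update. This is an illustrative toy with made-up layer sizes, not a real Stable Diffusion layer:

```python
import numpy as np

# Toy illustration of LoRA's low-rank update (hypothetical sizes).
# The pre-trained weight W is frozen; training only touches the small
# matrices A and B, whose product B @ A is the learned adjustment.
d_out, d_in, rank = 8, 8, 2

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))        # frozen pre-trained weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))                   # trainable up-projection, starts at zero

x = rng.standard_normal(d_in)

# Adapted forward pass: y = W x + B (A x).
# Because B starts at zero, the adapter initially changes nothing.
y = W @ x + B @ (A @ x)

# Full fine-tuning would train d_out * d_in parameters for this layer;
# LoRA trains only rank * (d_in + d_out).
full_params = d_out * d_in
lora_params = rank * (d_in + d_out)
```

At real model scale the savings are dramatic, since the rank is typically tiny (single to low double digits) compared to the layer dimensions.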
Training models comes with high computational cost in general. In order to successfully train a custom model, you need a powerful computer with a top tier GPU (Graphics Processing Unit).
I personally don’t have a very powerful computer at home, and I’m constantly using the one I do have for other things (like writing this article, coding and making music), so I can’t have it locked up training profile photos for days.
Luckily there are plenty of options for cloud processing these days where you can offload your computation for a nominal fee.
The cloud computing provider that I used for this was Runpod.io. Runpod allows you to create instances of servers that have pre-installed configurations ready to suit specific needs, which in this case would be training models and creating images with Stable Diffusion.
Below are the high level steps I took to train the model in the cloud.
High Level Training Steps
If you want to try for yourself, this tutorial link is the one I followed and is a good guide: https://aituts.com/sdxl-lora/
Here are the high level steps:
Gather 20-100 images of one subject with framing and focus on the face (think profile photos). I used about 25 images for this version.
Edit images to center the face and crop to a 1:1 ratio at 1024x1024 pixels if possible, otherwise 512x512 pixels.
Set up Runpod and deploy a Stable Diffusion instance with a powerful GPU (template: Stable Diffusion Kohya_ss ComfyUI Ultimate [ashleykza/stable-diffusion-webui:3.6.1]).
Upload your images to the server file system.
Caption images thoroughly for use in prompting (important for connecting prompts to aspects of photos, such as smiling or serious, having facial hair, etc.).
Configure and train the LoRA model.
Generate new images from your model using the Stable Diffusion web interface on the server (A1111 or ComfyUI).
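The image-editing step above can be scripted; here is a rough sketch using Pillow (the directory names are hypothetical, and this naive center crop assumes the face is already roughly centered in the frame):

```python
from pathlib import Path

from PIL import Image  # Pillow


def center_crop_square(img: Image.Image) -> Image.Image:
    """Crop the largest centered square from an image."""
    side = min(img.size)
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    return img.crop((left, top, left + side, top + side))


def prepare_images(src_dir: str, dst_dir: str, size: int = 1024) -> None:
    """Center-crop each photo to a 1:1 ratio and resize to size x size."""
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(src_dir).glob("*.jpg")):
        img = Image.open(path).convert("RGB")
        img = center_crop_square(img).resize((size, size), Image.LANCZOS)
        img.save(out / path.name, quality=95)

# Example: prepare_images("raw_photos", "train_images", size=1024)
```

For photos where the face sits off-center, you’d still crop by hand (or bring in a face-detection library) so the subject stays in frame.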
That is a super high level overview of the steps, but again, this is the more detailed tutorial that I used if you want to try it yourself: https://aituts.com/sdxl-lora/
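For the captioning step, trainers like Kohya_ss typically read one sidecar .txt file per image, sharing the image’s filename stem. A small sketch of that layout (the filenames and captions are invented, and “ohwx” stands in for a rare trigger token that ties your prompts to the subject):

```python
from pathlib import Path

# Hypothetical Kohya-style sidecar captions: img001.jpg -> img001.txt.
captions = {
    "img001.jpg": "photo of ohwx man, smiling, short beard, studio lighting",
    "img002.jpg": "photo of ohwx man, serious expression, outdoors",
}

train_dir = Path("train_images")
train_dir.mkdir(parents=True, exist_ok=True)
for image_name, caption in captions.items():
    # Write the caption next to the image, swapping .jpg for .txt
    (train_dir / image_name).with_suffix(".txt").write_text(caption)
```

The richer the captions (expression, facial hair, lighting, setting), the better the model can separate your likeness from incidental details in each photo.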
These tools and technologies are changing every day, so I hope to try training an updated model sometime in the future. I’m happy with the success of this initial iteration though, as it provides easy access to cool-looking social profile photos in any context you could imagine.
If you want to dig even deeper, here’s an advanced approach: training a model of yourself that also includes images of celebrities you resemble, expanding the training data for even more accurate and realistic results.
Video: SDXL LORA Style Training
And here’s the celebrity look-alike generator it uses, if you just want to see which celebrity you look like to help extend your model training 😆
Apparently I look somewhat like Charlie Weber. I don’t know who that is, and I definitely don’t see it 😂
Anyway, it’s exciting to explore and create through the power of AI. I’m looking forward to doing more experiments in images, video and music in the future.
~ Michael
Exciting Tech of The Week
Stable Diffusion XL + Runpod.io
SDXL - https://stability.ai/stable-diffusion
Runpod - https://www.runpod.io/
Stable Diffusion XL is a state-of-the-art AI model designed for generating images from textual descriptions. It can be fine-tuned with LoRA (Low-Rank Adaptation) models to interpret and visualize text prompts with high accuracy, making it a valuable tool for creatives seeking to transform their ideas into visual art. This model excels at creating detailed and imaginative images, providing a seamless bridge between textual imagination and visual representation.
For ease of use, especially for those without extensive technical resources, Stable Diffusion XL can be accessed via cloud services like Runpod.io. This approach offers a user-friendly platform for harnessing the power of the AI without the need for advanced hardware or technical expertise. Users can access the power of Stable Diffusion XL on Runpod.io to train and generate images, facilitating a convenient and accessible way for creatives to explore AI-assisted art creation.
My Creative Updates
Here is a sneak peek of one of the songs that I’m working on, set to some visuals that I created in RunwayML and Plazma Punk. There are vocals for it that I haven’t yet recorded, but this is a preview of the verse’s musical vibe. This is the most chill song; the rest will be more rock focused. As part of my solo music, I really want to explore style, genre and approach to creation, but rock, and ambient electronic like this one, are what I know, so I’m starting with that!