AI Video Rising

Recent releases of impressive generative video models promise to disrupt video creation

Dec 16, 2023

prompt (midjourney): a robot acting like a director holding a camera shooting a scene on a futuristic film set by Enki Bilal --stylize 50 --style expressive --niji 5

It’s been over a year since ChatGPT was released to the public, triggering an avalanche of AI innovation and potential.

It’s been even longer since the first iteration of text-to-image favorite MidJourney entered the market. Since then, quality and accuracy have improved drastically, going from novel, artifact-riddled, robotic images to legitimately awe-inspiring creations.

These rapid advances in text and image generation have already changed how people create and do business.

AI Video

With the recent release of several new models from leading companies, including Stable Video from Stability AI, Pika 1.0, and the models from RunwayML, AI video is now poised to be the next area of innovation.

Currently, there are a few different methods with which to create AI video, each offering different strengths and applications.

Here is an outline of the different methods with some example experiments:

Text To Video

Text-to-video is the easiest way to generate AI video and the technique that most people are probably familiar with due to its prevalence in the AI image space.

It’s as simple as describing the scene you want in as much detail as possible with text, and then sending it to the model through a user interface for generation.

As with all AI prompting, there are techniques and structures for describing what you want to the system, which can be a bit of an art form in itself.

Generally, the more descriptive, the better.
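One way to keep prompts descriptive and consistent is to assemble them from labeled parts. The sketch below is a hypothetical helper, not part of RunwayML, Pika, or any other tool mentioned here. The class name and the field names (subject, setting, lighting, camera, style_tags) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VideoPrompt:
    """Hypothetical container for the parts of a text-to-video prompt."""
    subject: str
    setting: str = ""
    lighting: str = ""
    camera: str = ""
    style_tags: List[str] = field(default_factory=list)

    def to_text(self) -> str:
        # Keep only the parts that were filled in, without trailing periods
        parts = [self.subject, self.setting, self.lighting, self.camera]
        pieces = [p.strip().rstrip(".") for p in parts if p.strip()]
        # Style keywords go last as a comma-separated list
        if self.style_tags:
            pieces.append(", ".join(self.style_tags))
        return ". ".join(pieces)


prompt = VideoPrompt(
    subject="a lighthouse on a rocky coast",
    setting="waves crashing against the rocks below",
    lighting="golden hour light",
    camera="slow aerial camera view",
    style_tags=["film grain", "high quality"],
)
print(prompt.to_text())
```

The resulting string can be pasted into whichever tool you are using. Keeping the parts separate makes it easy to change one thing, such as the camera view, while leaving the rest of the description alone.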

Usage and Application

In its current state, text-to-video generation works really well for scene ideation and brainstorming, as well as b-roll and background videos that don’t require too much detail.

As you can see, the images aren’t always accurate, but so far I am just experimenting and haven’t spent the time to fully refine some of these prompts.

My Examples

Here is an example that I made of a Dinosaur Fighter using the following prompt:

prompt (runwayML): a muscular shirtless human man standing still face to face with a carnivorous dinosaur. They are standing away from each other about 50 yards apart on a open field of dirt and rock. Mid day sun brightly shining behind them as dust floats up from the ground. Side camera view. 8k uhd, dslr, high quality, film grain, Fujifilm XT3

It’s not quite a T-Rex, but it’s an intimidating dinosaur-like creature and captures the drama of the scene.

And here’s another one of a prog metal guitarist shredding arpeggios in 27/8 time on stage made with Pika. This model, and many others up to this point, still can’t get hands and fingers right 🤙😆.

prompt (pika): a mid distance shot of a young metal guitarist playing a guitar solo on stage at a large festival with a crowd in front of the stage and stage lights beaming down

Try It Yourself

Below is a hosted model on replicate.com that demonstrates the text-to-video technique using the AnimateDiff extension for Stable Diffusion. This extension for the popular image generation model even has different camera movement parameters that you can use to further control the scene.

Replace the current prompt text and click “Run” at the bottom to try it yourself.

https://replicate.com/zsxkib/animate-diff?prediction=q6l4fjlbjb4teeuupiinnfmncu

Image To Video

Image-to-video generation is just that: converting an image to a video. You input an image to the model, and it reads the contents of the image, from style to form to characters, and uses that data to inform and calculate movement.

At its core, video is only a sequence of images, so the model takes the context of the initial image and generates a sequence of following images that might make sense based on the input.
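To make the "sequence of images" idea concrete, here is a toy sketch in plain Python. It is not how a model like Stable Video works internally: the "motion" is a hard-coded pixel shift rather than anything learned, and the function names and frame rate are assumptions for illustration. It only shows the data shape, where a clip is a list of frames and each new frame is derived from what came before.

```python
from typing import List

Frame = List[List[int]]  # one grayscale image: rows of pixel values


def shift_right(frame: Frame) -> Frame:
    """Toy 'motion': move every pixel one column right, wrapping around."""
    return [row[-1:] + row[:-1] for row in frame]


def toy_image_to_video(first_frame: Frame, num_frames: int) -> List[Frame]:
    """Build a clip (num_frames >= 1) where each frame derives from the last."""
    frames = [first_frame]
    for _ in range(num_frames - 1):
        frames.append(shift_right(frames[-1]))
    return frames


# A tiny 2x4 "image" with a bright column on the left
image = [
    [9, 0, 0, 0],
    [9, 0, 0, 0],
]
clip = toy_image_to_video(image, num_frames=4)

fps = 2  # assumed playback rate for this toy clip
print(len(clip), "frames =", len(clip) / fps, "seconds at", fps, "fps")
for frame in clip:
    print(frame[0])  # top row of each frame: the bright pixel moves right
```

A real model replaces the fixed shift with a learned guess about what should move and how, which is why results can drift further from the input image as the clip gets longer.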

Usage and Application

This type of video creation is pretty novel at the moment and tends to lose accuracy the longer the video generation is, but it is definitely impressive to see static images come to life!

Image-to-video generation is great for social media content and ideation like live mood boards or storyboard videos to convey feel and aesthetic for final production in music videos or film.

A great workflow for ideation and storyboarding is prompting images in MidJourney or DALL·E 3, then bringing them to life with an image-to-video model like Gen2 from RunwayML, Pika, or Stable Video.

My Examples

Here’s an example from an image of myself I created for a previous article (linked below) using a trained LoRA and Stable Diffusion. I took the generated image and ran it through Stable Video to bring it to life (double AI generation)!

I Trained AI To Be My Virtual Photographer (Michael Meinhart, November 24, 2023)

And here’s another example of an image that I created in MidJourney for another post brought to life with RunwayML Gen2.

Artificial Intelligence vs. The Government vs. The People (Michael Meinhart, November 3, 2023)

Lastly, here’s a standoff between an AI Muscle Michael and a T-Rex. This is completely generated from the image with no extra prompt using Pika. It is particularly impressive that the model derived enough context from the image to make me turn my head and look at the T-Rex!!

Try It Yourself

Below is a link to an image-to-video model that uses the new Stable Video Diffusion. Replace the input image and click run to see your image come to life. The default parameters work well, but feel free to tweak them for different results.

https://replicate.com/stability-ai/stable-video-diffusion

Video To Video

Video-to-video generation is where you input an existing video and use AI to alter its appearance, changing its style or other aspects of the original footage.

This technique is more advanced, as you will need quality source video in order to produce the best results. In that lies flexibility and power though, so I view this method as having a ton of potential and the most practical impact in the near future.

Usage and Application

With this approach, you can still direct a basic scene, using either a traditional approach with a physical camera or a virtual approach with a 3D engine (such as Unity, Unreal Engine or Blender).

In creating the source video you can have complete control over characters and camera, but not have to worry as much about the style of the shot, such as background, wardrobe and even lighting. Then with AI generation, you can later transform character, setting and style into anything by processing the basic shot.
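The split between "direct the motion first, decide the look later" can be sketched with a toy example. This is only an illustration under loud assumptions: the frames are tiny grids of numbers, the "style" is a simple brightness inversion, and the function names are made up. Real tools like RunwayML Gen1 or Warpfusion use generative models and do far more than a per-pixel change.

```python
from typing import Callable, List

Frame = List[List[int]]  # grayscale frame, pixel values 0-255


def restyle_video(frames: List[Frame], style: Callable[[int], int]) -> List[Frame]:
    """Apply the same per-pixel style to every frame, keeping frame order."""
    return [[[style(px) for px in row] for row in frame] for frame in frames]


def invert(px: int) -> int:
    # Stand-in "style": flip dark and light
    return 255 - px


# Source clip: a bright dot moving left to right (the "directed" motion)
source = [
    [[255, 0, 0]],
    [[0, 255, 0]],
    [[0, 0, 255]],
]
styled = restyle_video(source, invert)
for before, after in zip(source, styled):
    print(before[0], "->", after[0])
```

The dot still travels left to right in the styled clip. The motion came from the source video, and only the look was swapped, which is the same division of labor described above.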

The best way to demonstrate what I mean is in the examples below.

I’m still learning techniques, so my examples aren’t too impressive, but a little farther down, you can see some impressive professional examples to better understand the potential of this technique.

My Examples

In this first example, I downloaded a pre-made, 3D animated character rig from Mixamo and imported it into the Unity game engine, where I added a simple camera animation.

From there, I exported the basic video and ran it through Gen1 in RunwayML with an input image reference directing the style. (A text prompt for style is also an option instead of a reference image.)

I believe this kind of approach will be immensely powerful for creating highly stylized renders for music videos and films as it continues to improve. I’m excited to continue to play around with and improve this technique for more videos in the future.

Input Image:

Here’s another older video that I did earlier this year using stock footage as the original video, and then transforming the style a bit with Warpfusion. Not quite as impressive, but you can see how it might be useful to alter the style of existing videos.

Try It Yourself?

As mentioned above, this technique is a bit more complicated to dive into, but you can grab video from a stock footage site and try running it through RunwayML Gen1 to transform it.

Better Examples from Pro Creators

My examples are a bit crude for now, as I was mostly experimenting and didn’t spend a ton of time tweaking prompts and parameters. AI is moving fast though, so I expect quality and ease of use to catch up quickly, and as a result, use cases will become more relevant and realistic.

Here are some really impressive examples of AI video made by professional creators using existing models.

Text-to-Video: Sci-Fi trailer

This entire video is made from text-to-video prompted clips in Pika, and then edited together with music.

Image-To-Video: Various Examples

Here is a video containing various impressive image-to-video results from RunwayML Gen2.

Video-to-video: AniMatrix

This user, MrBoofy, converted an actual scene from The Matrix into an anime version using AnimateDiff on Stable Diffusion.

AI Video Predictions

The first versions of AI generated video have been pretty rough, with a jerky, stutter-like style that most of the internet can recognize by now. Because these tools are so accessible, it feels like generative AI fatigue has started to set in a bit as more and more content of that style is shared and its novelty wears off.

It is my belief that as these tools continue to improve, and new techniques are discovered, we will see better and better quality content created by talented creators and storytellers.

The novelty aspect of “look I entered text and created this!” will fade, and the essence of talent in storytelling and creativity will re-emerge.

In the future, I can see creators of all types more easily creating video to accompany their artistic story and expression. Think anything from TikTok music videos created by musical artists, to Etsy creators featuring physical creations in AI generated story videos, to short films accompanying a podcast or blog story.

There will be a big wave of independent film creators, working in small teams, that will begin to rival bigger studios, which have become more focused on profit and commercialism than on expanding storytelling and art.

There are boundless applications in the commercial space as well. From commercials to product demos, the broader applications of generative AI video are set to revolutionize industries.

Ultimately generative AI is a tool that enhances process and provides new avenues to explore the core of human creativity and storytelling. It will provide independent creators with the ability to explore new limits of innovation without the restrictions of large budgets and studio backing.

And as a result, they will provide audiences with exciting stories and worlds that they have yet to see.

The Strength of Human Creativity

One last mention to touch on a recurring theme within the topic of AI generated content and art: I do strongly believe that the essence of art and expression is very human in nature, and when it comes to artistic connection, nothing can replace the human heart and mind.

More functional and less creative applications and industries will likely be more affected by the emergence of AI, but true creativity is, at its core, a very human trait.

As I mentioned earlier, AI in general is a tool that will serve to augment the vision and ambition of those who have an idea and the ability to harness this evolving technical power.

As time goes on, novelty will fade and true art will shine through, as it always has.

~ Michael

The AI Video Creation Tools from This Post

I used the amazing tools below for the experiments in this post. They are all in various states of release, with some like RunwayML having an easy-to-use UI, and others like Stable Video, Stable Diffusion and Warpfusion requiring technical setup to use.

All of these projects are actively iterating on new versions that will become easier and easier to use while producing better and better results.

2024 is looking like a big year for AI video, led by these exciting projects.

RunwayML - https://runwayml.com/
Pika - https://pika.art/
Stable Video - https://stability.ai/stable-video
Warpfusion - https://github.com/Sxela/WarpFusion
AnimateDiff / Stable Diffusion - https://animatediff.github.io/

My Creative Updates

With the holidays and work on an exciting new AI project that I’ll be announcing soon, my music progress has been a little slow of late. I’m planning on picking up work on the EP again in the new year though, and will share updates as they come.

The Artful Algorithm
