The age of AI-generated art is well underway, and three titans have emerged as favorite tools for digital creators: Stability AIs new SDXL, its good old Stable Diffusion v1.5, and their main competitor: MidJourney.
OpenAIs Dall-E started this revolution, but its lack of development and the fact that it's closed source mean Dall-E 2 doesn't stand out in any category against its competitors. However, as Decrypt reported a few days ago, this might change in the future, as openAI is testing a new version of Dall-E that is reportedly competent and produces outstanding pieces.
With unique strengths and limitations, choosing the right tool from among the leading platforms is key. Let's dive in to how these generative art technologies stack up in terms of capabilities, requirements, style and beauty.
As the most user-friendly of the trio, MidJourney makes AI art accessible even to non-technical usersprovided theyre hip to Discord. The platform runs privately on MidJourney's servers, with users interacting through Discord chat. This closed-off approach has both benefits and drawbacks. On the plus side, you don't need any specialized hardware or AI skills. But the lack of open-source transparency around MidJourney's model and training data makes it pretty limited regarding what you can do and makes it impossible for enthusiasts to improve it.
MidJourney is the smooth-talking charmer of the bunch, beloved by beginners for its user-friendly Discord interface. Just shoot the bot a text prompt and voila, you've got an aesthetic masterpiece in minutes. The catch? At $96 per year, it's pricey for an AI you can't customize or run locally. But hey, at least you'll look artsy (and nerdy) at parties!
Functionally, MidJourney churns out images rapidly based on text prompts, with impressive aesthetic cohesion. But dig deeper into a specific subject matter, and the output gets wonkier. MidJourney likes to put its own touch on every single creation, even if thats not what the prompter imagined. So most of the images may be saturated with a pump in the contrast and tend to be more photorealistic than realistic, up to the point that after some time people get to identify pictures created with MidJourney based on their aesthetic characteristics.
With MidJourney, your creative freedom is also limited by the platform's strict content rules. It is aggressively censored, both socially (in terms of depicting nudity or violence) and politically (in terms of controversial topics and specific leaders). Overall, MidJourney offers a tantalizing gateway into AI art but power users will hunger for more control and customizability. Thats when Stable Diffusion comes into play.
If MidJourney is a pony ride, Stable Diffusion v1.5 is the reliable workhorse. As an open-source model thats been under active development for over a year, Stable Diffusion v1.5 powers many of today's most popular AI art tools like Leonardo AI, Lexica, Mage Space, and all those AI waifu generators that are now available on the Google Play store.
The active MidJourney community has iterated on the base model to create specialized checkpoints, embeddings, and LoRAs focusing on everything from anime stylization to intricate landscapes, hyper realistic photographs and more. Downsides? Well, its starting to show its age next to younger AI whippersnappers.
By making some tweaks under the hood, Stable Diffusion v1.5 can generate crisp, detailed images tailored to your creative vision. Output resolution is currently capped at 512x512 or sometimes 768x768 before quality degrades, but rapid scaling techniques help. The popularity of tiled upscaling also boosted the models popularity, making it able to generate pictures at super resolution, far beyond what MidJourney can do.
Right now its the only technology that supports inpainting (changing things inside the image). Outpaintingletting the model expand the image beyond its frameis also supported. Its multidirectional, which means users can expand their image both in the vertical and horizontal axis. It also supports third party plugins like roop (used to create deepfakes), After Detailer (for improved faces and hands), Open Pose (to mimic a specific pose), and regional prompts.
To run it, creators suggest that you'll need an Nvidia RTX 2000-series GPU or better for decent performance, but Stable Diffusion v1.5's lightweight footprint runs smoothly even on 4GB VRAM cards. Despite its age, robust community support keeps this AI art OG solidly at the top of its game.
If Stable Diffusion v1.5 is the reliable workhorse, then SDXL is the young thoroughbred whipping around the racetrack. This powerful model, also from Stability AI, leverages dual text encoders to better interpret prompts, and its two-stage generation process achieves superior image coherence at high resolutions.
These capabilities sounds exciting, but they also make SDXL a little harder to master. One text encoder likes short natural language and the other uses SD v1.5s style of chopped, specific keywords to describe the composition.
The two-stage generation means it requires a refiner model to put the details in the main image. It takes time, RAM, and computing power, but the results are gorgeous.
SDXL is ready to turn heads. Supporting nearly 3x the parameters of Stable Diffusion v1.5, SDXL is flexing some serious musclegenerating images nearly 50% larger in resolution vs its predecessor without breaking a sweat. But this bleeding-edge performance comes at a cost: SDXL requires a GPU with a minimum of 6GB of VRAM, requires larger model files, and lacks pretrained specializations.
Out-of-the-box output isn't yet on par with a finely tuned Stable Diffusion model. However, as the community works its optimization magic, SDXL's potential blows the doors off what's possible with today's models.
A picture is worth a thousand words, so we summarized a few thousand sentences trying to compare different outputs using similar prompts so that you can choose the one you like the most. Please note that each model requires a different prompting technique, so even if it is not an apples-to-apples comparison, it is a good starting point.
To be more specific, we used a pretty generalized negative prompt for Stable Diffusion, something that MidJourney doesnt really need. Other than that, the prompts are the same, and the results were not handpicked.
Comment: Here is just a matter of style between SDXL and MidJourney. Both beat Stable Diffusion v1.5 even though it seems to be the only one able to create a dog that is properly "riding" the bike, or at least using it correctly.
Comment: MidJourney tried to create a red square in The Red Square. SDXL v1.0 is crispier, but the contrast of colors is better on SD v.15 (Model: Juggernaut v5).
Comment: MidJourney refused to generate an image due to its censorship rules. SDXL is richer in details caring to produce both the busty teacher and the futuristic classroom. SD v1.5 focused more on the busty teacher (the subject. Model: Photon v1) and less in the environment details.
Comment: Both MidJourney and SDXL produced results that stick to the prompt. SDXL reproduced the artistic style better, whereas MidJourney focused more on producing an aesthetically pleasing image instead recreating the artistic style, it also lost many details of the prompt (for example: the image doesnt show a brain powering a machine, but instead its a skull powering a machine).
So which Monet-in-training should you use? Frankly, you can't go wrong with any of these options. MidJourney excels in usability and aesthetic cohesion. Stable Diffusion v1.5 offers customizability and community support. And SDXL pushes the boundaries of photorealistic image generation. Meanwhile, stay tuned to see what Dall-E has coming down the pike.
Don't just take our word for it. The paintbrush is in your hands now, and the blank canvas is waiting. Grab your generative tool of choice and start creating! Just maybe keep the existential threats to humanity to a minimum, please.
Read this article:
AI Art Showdown: How Top Tools MidJourney, Stable Diffusion v1.5, and SDXL Stack Up - Decrypt