Let’s imagine the ideal model.

I recently tried out Anima. It’s an open-source anime model that, for the first time, has given me a real sense that we’re getting close to something like NovelAI. Testing this model got me thinking about how we can drive AI forward, and what I’d like to see the open-source community become in the coming years. In compiling this top 10 list, I’d like to try to capture not just my own expectations, but those of many others as well.

<aside> 👉

Think of this as a Christmas wish list; I’m not sure how feasible many of the things I’m about to present to you are (or to what extent they could be achieved this year if a group were to set out to make them a reality). I think just putting them into words is already a big step. If you share the hope that any of the proposals I’m about to make might come true, please pass this article on to the right people and groups so that this can happen.

</aside>

To succeed in 2026, there’s no need to obsess over the quality of NanoBanana or GPT-Image 7. (Or whichever version came out in May; to be honest, I lost count ages ago.) There are other, more achievable priorities that could go viral if a medium-sized, focused team set its mind to them. China has already made quite a name for itself in open source, but it could make an even bigger one by taking note of the following, for whoever masters these 10 points will master the art of war:

<aside> 👉

None of the images in this article belong to me; they have been taken from various online sources and are being used solely to illustrate the points made.

</aside>

1. The option to portray any well-known pop culture character:

Limitations: Anime models, for example, are trained on curated data from various websites (Boorus, DeviantArt, etc.) and therefore rely entirely on the fandom surrounding each character. In short, everything depends on there being plenty of fan art or images taken directly from the film or series. For a model to generalise and identify almost every character in pop culture, it would need hundreds of tagged images of each one, including characters that are not so popular. It should also cover characters that have few drawings but are very popular or lend themselves to memes (Heisenberg, Shrek, Queen Elizabeth II…).
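To make the "hundreds of tagged images" point concrete, here is a minimal sketch of how a dataset curator might check which characters fall short. Everything in it is an assumption for illustration: the record layout (a dict with a `characters` list), the function names, and the threshold of 200 are invented and do not come from any real Booru dump or training pipeline.

```python
from collections import Counter

# Hypothetical threshold standing in for the "hundreds of images" mentioned above.
MIN_IMAGES = 200


def count_character_images(records):
    """Count how many images carry each character tag.

    `records` is an iterable of dicts with an optional "characters" list.
    This layout is made up for the example, not a real dataset format.
    """
    counts = Counter()
    for record in records:
        # set() so a tag repeated on one image is only counted once
        counts.update(set(record.get("characters", [])))
    return counts


def underrepresented(counts, wanted, min_images=MIN_IMAGES):
    """Return (name, image_count) for wanted characters below the threshold, rarest first."""
    short = {
        name: counts.get(name, 0)
        for name in wanted
        if counts.get(name, 0) < min_images
    }
    return sorted(short.items(), key=lambda item: (item[1], item[0]))
```

Calling `underrepresented(counts, ["shrek", "heisenberg"])` on a real tag dump would give a rough shopping list of characters that need more data, whether through extra scraping or the synthetic images discussed further down.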

I have to say that many of the images featured in this article are a bit of a cheat: it’s not that the NovelAI model is that good by default, it’s that the users are experts (not to mention that they use NovelAI’s control tools to make the results look perfect).

<aside> 💡

In the absence of diverse data: Training on synthetic data could help. NovelAI and general-purpose generative models are currently the best sources for this type of training. However, great care must be taken with hidden watermarks, training biases, and any related issues, all of which could introduce errors. It is also possible to train on images generated with Illustrious LoRAs, although these are generally of poorer quality.

</aside>

2. Full control via extras:

Anime-specific editing model (Flux Klein type): An editing model could provide that extra something anime models need to succeed. For a start, it would remove the need for a ControlNet model for certain edits. Given that open editing models are only just getting started, we don’t know when we might have an anime editing model (we didn’t even have Z-image edit) or what state it would be in when it came out. What I can say is that, as an experiment, it would be brilliant.

<aside> 💡

The downside of capable generative image editing models is that they require reasonably powerful hardware. Developing a model that is both specialised in anime and lightweight could take several years. It is also worth noting that current editing models perform rather poorly with anime: they can handle only basic edits and need dedicated LoRAs for more specific tasks.

An example of a simple workflow for generative editing using the Flux Klein model in anime.

</aside>

3. Ability to understand a wide range of common languages when processing prompts:

We’re only just beginning to appreciate natural language and comprehension in open-source models. But I believe we should already be focusing on the next big step to take image AI to a new level. The closest we’ve come to a model that understands Spanish with complete accuracy has been Z-image, and I’d say its performance in this and other languages is merely acceptable.