Prompts, Technically Speaking

By Dirty Old Biker

1/10/2026
Table of Contents

Introduction
Prompt Context
Prompt Focus
. . ◉ . How Focus Breaks Down
. . ◉ . Takeaway
What is a prompt?
. . ◉ . So What is a Model?
. . ◉ . Training Sets
. . ◉ . Model Weights
. . ◉ . Tokens
. . ◉ . What a Prompt Is, Finally
Prompt Length Considerations
. . ◉ . Tokenizers
. . ◉ . Models
Final Thoughts
. . ◉ . Common Prompt Failure Modes (And Why They Happen)
. . ◉ . A Note on Transferability
. . ◉ . A Practical Prompting Checklist
. . ◉ . And Finally ...

Introduction

First, I apologize for the length and complexity of this article. I think I've arranged it so that it becomes progressively more complex, rather than hitting you with a piano right at the start.

Also, you will no doubt see lots of these: . . ◉ . They are my homemade list indenters. Currently, this blog editor does not properly support lists, numbered or otherwise. To work around this issue, I added my own bullet and forced spacers. I made the bullet extra large so it doesn't get lost in all the periods. Once this editor problem is fixed, I will happily remove the homemade stuff.

Prompt Context

When you write a prompt, the model doesn't read it like a shopping list. It reads it more like a story that unfolds from the beginning forward. That means the early part of your prompt quietly sets the rules for everything that comes after it. This is what prompt context really is: the mental frame the model is already in before it sees the next word.

Why this matters in practice:

If you start with "a watercolour illustration", everything that follows will be pulled toward watercolour — even if you later ask for sharp realism.
If you open with "cinematic lighting, dramatic mood", the model will keep trying to be dramatic, even if you later ask for something plain.
If you describe a subject as "small" early on, later details often scale down automatically.

The model isn't checking boxes. It's building momentum.

So when artists say: "It ignored my last line"

What usually happened is that the context was already set, and the last line didn't fit. That's also why adding more words doesn't always improve an image. Sometimes it just pushes the model further in the wrong direction.

The takeaway for AI artists is simple: The first few words of your prompt matter more than the last few. Good prompting isn't about saying everything. It's about setting the right context early, then letting the model fill in the rest.

Prompt Focus

From an AI artist's point of view, prompt focus is about how much attention the model can give to any one idea at a time, and what happens when you ask for too many things at once. Every image model has a limited amount of "attention" it can spread around. When your prompt is short and clear, most of that attention goes toward the main subject. When your prompt gets long or crowded, that attention gets split up.

Prompt focus is simply this: How concentrated the model's effort is on what you actually care about most.

Why this matters in real use:

. . ◉ . If you describe one main subject, it usually looks strong and intentional.
. . ◉ . If you describe three subjects, five styles, four lighting setups, and ten details, none of them get enough attention to dominate.
. . ◉ . Important features start to weaken, blur, or disappear entirely.

This is why artists often say: "It kind of did everything, but nothing really well." That's a focus problem, not a creativity problem.

How Focus Breaks Down

When focus collapses, models don't fail cleanly. Instead, you'll see things like:
. . ◉ . The main subject looks generic
. . ◉ . Secondary details override primary ones
. . ◉ . Style becomes inconsistent
. . ◉ . Anatomy or structure degrades
. . ◉ . The image feels "muddy" or undecided

The model isn't confused, it's overcommitted.

Takeaway

Prompt focus is about prioritization, not verbosity. Strong prompts usually:

. . ◉ . Establish one clear subject
. . ◉ . Support it with a few reinforcing details
. . ◉ . Avoid competing ideas unless you intend the blend

Once you understand prompt focus, you stop trying to force models to do more — and start deciding what you're willing to let go of.

What is a prompt?

You might be inclined to think that a prompt is a set of instructions that tells the model what to draw. Most people would agree, and from the point of view of cause and effect, you'd be right. It's a very useful simplification that describes how things work, from 1,000 feet up. To truly understand what a prompt is, you must understand what a model is. After all, they have to speak the same language, so what better way to start?

So what is a model?

A model is not an entity that can communicate. It's not even a computer program, not in the usual sense, at least. Simplistically speaking, a model is a huge database that stores a bunch of weight values. When the model is trained, a special program builds the model using the provided training set.

Training Sets

A training set is a huge number of pictures of everything you could think of, each with its own textual description. Ultimately, humans were the source of all descriptions. What they didn't write themselves was generated by another model based on descriptions they did write. The training program adjusts a set of weights that represent the characteristics seen in the image it is training on. The descriptions are turned into a series of tokens, where each token or group of tokens represents a way to activate the weights.

Model Weights

The way the model data is used is very complex, as you can no doubt imagine. To simplify it as much as I can, the trainer does not store images or image fragments. It stores configuration values for various shapes based on their features, such as curves, textures, etc.

This next part is pretty heavy (even though it is still as simplified as I can make it). The images define the appearance of those features, and tokens bias collections of features. A feature can be many things. For example, a feature can represent a texture, an edge, a shine, a curve, and so on. The tokens that represent the word "dragon" would then need to influence all the features involved in making something that looks like a dragon. The tokens created from the description are tied to embeddings in the model. An embedding is a sequence of numbers that, when fed into the model's weights, activates patterns that resemble a dragon.

What is important to remember is that the model does not retain images or perform lookups to get a picture of a dragon. The model literally does not know what a dragon is. It only adjusts internal weights in ways that tend to produce dragon-like shapes.
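If you want to see this "numbers, not pictures" idea for yourself, here is a minimal Python sketch, assuming the Hugging Face transformers library and the public openai/clip-vit-large-patch14 text encoder (my choice of stand-in, the same encoder family used by SD 1.5-class models; the prompt is just an illustration). It shows a prompt becoming integer token IDs and then a stack of embedding vectors, with no image lookup anywhere.

```python
# A minimal sketch, assuming the Hugging Face "transformers" library is installed.
# The checkpoint below is the public CLIP text encoder used by SD 1.5-class models.
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

prompt = "a colossal dragon rearing up on its hind legs"

# Step 1: the prompt becomes integer token IDs (with start/end markers added).
batch = tokenizer(prompt, return_tensors="pt")
print(batch.input_ids)

# Step 2: those IDs flow through the encoder's fixed weights and come out as
# embedding vectors -- one vector of 768 numbers per token, no picture anywhere.
with torch.no_grad():
    hidden = text_encoder(**batch).last_hidden_state

print(hidden.shape)   # torch.Size([1, num_tokens, 768])
```

Those token IDs are exactly what the next section digs into.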
Tokens

Every token is a piece of text that's been converted into a single integer number. A token is a snippet of a word or phrase, usually around 4 to 5 characters long, assigned a unique number that serves as an identifier or key. Several different algorithms can perform this conversion, each with its own advantages and disadvantages.

For example, OpenAI models like ChatGPT, DALL·E 3, and GPT Image 1.5 all use OpenAI's tokenizer. OpenAI has a demo tool that takes text and shows you how it is tokenized. I used it to do up my little prompt demonstration:

• 32 - A
• 182838 - colossal
• 45342 - dragon
• 322 - re
• 1904 - ars
• 869 - up
• 402 - on
• 1617 - its
• 54538 - hind
• 23024 - leg
• 11 - ,
• 1617 - its
• 107012 - majestic
• 45908 - wings
• 11402 - spread
• 8174 - wide
• 2494 - open
• 13 - .
• 623 - The
• 45342 - dragon
• 885 - 's
• 18965 - massive
• 3189 - head
• 30082 - pointed
• 869 - up
• 316 - to
• 290 - the
• 17307 - sky
• 11 - ,
• 166058 - roaring
• 326 - and
• 1912 - bel
• 46992 - ching
• 6452 - fire

You might notice above that wherever you see the same word, it has the same token id. You will also notice that some tokens represent much longer words. The 4-to-5 character claim is not a hard-and-fast rule. These algorithms aim to optimize word tokenization so that the most relevant collection of characters is a single token. Depending on what each word is, the likelihood of splitting varies. In this example, the word rears is divided into two tokens, while majestic remains a single token.

What a prompt is, finally

After all of that, we can now answer definitively what a prompt is. Don't be too disappointed. A prompt is a conditioning mechanism. Your words are converted into embeddings that flow through the model's fixed weights, where a great deal of complex math determines which learned features are emphasized, suppressed, or combined to produce the final image.

Prompt Length Considerations

Yes, prompt length matters. It's a delicate balancing act between including enough content in your prompt to describe what you want to generate and exceeding a model's prompt length limit. If that weren't bad enough, where you put the different parts of your prompt actually matters, and makes a difference.

Exceeding a prompt limit does not make the model try harder — it makes it decide what to forget. (Chad)

Tokenizers

The tokenizer is the algorithm that converts your prompt text into tokens. Here are some, along with their details.

A tokenizer is a tokenization algorithm coupled with a learned vocabulary and segmentation rules. (Chad)

Some general facts about tokenizers

1. Tokens are not words. They may represent:

. . ◉ . Whole words
. . ◉ . Subwords
. . ◉ . Fragments of words
. . ◉ . Punctuation
. . ◉ . Whitespace
. . ◉ . Byte sequences

As a result, word count is a poor predictor of token count, and minor wording changes can significantly alter token usage (the short sketch after point 3 illustrates this).

2. Token windows are finite. Every tokenizer has a maximum token window. Once exceeded, text is either:

. . ◉ . Hard-truncated (silently dropped), or
. . ◉ . Compressed and summarized (semantic models)

There is no model that "just uses everything".

3. Tokenization happens before prompt logic. All higher-level behaviour — weighting, focus, semantic planning — operates after tokenization. If text never becomes a token, it never influences the image.
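You can reproduce that kind of breakdown yourself. Here is a minimal sketch, assuming the tiktoken package (OpenAI's open-source tokenizer library). The encoding name is my assumption, since OpenAI does not document exactly which encoding each image model uses, so your IDs may differ from the demo earlier in this article.

```python
# A minimal sketch, assuming "tiktoken" is installed (pip install tiktoken).
import tiktoken

# Assumption: a recent OpenAI encoding; swap in another name if your model differs.
enc = tiktoken.get_encoding("o200k_base")

prompt = ("A colossal dragon rears up on its hind legs, its majestic wings "
          "spread wide open.")

ids = enc.encode(prompt)
print(f"{len(prompt.split())} words -> {len(ids)} tokens")

# Show each token ID next to the text fragment it stands for.
for tid in ids:
    print(tid, repr(enc.decode([tid])))
```

Change a word or two and rerun it; the token count usually shifts more than the word count does, which is why word count is such a poor predictor.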
Language-First BPE (OpenAI / Qwen-style)

This tokenizer family treats prompts as language rather than as lists of descriptors.

Algorithm
. . ◉ . A variant of Byte Pair Encoding (BPE).
. . ◉ . Optimized for natural language.
. . ◉ . Trained on large text corpora.

Characteristics
. . ◉ . Variable-length tokens.
. . ◉ . Efficient representation of grammar and syntax.
. . ◉ . Excellent handling of prose and descriptive language.

Typical Token Window
. . ◉ . 4,096 – 16,384 tokens, depending on the model, shared across:
. . . . ◉ . System instructions
. . . . ◉ . User prompt
. . . . ◉ . Any internal prompt augmentation

Exceeding the Limit
. . ◉ . Prompts are not hard-cut.
. . ◉ . Instead, text is compressed or summarized.
. . ◉ . Salient concepts survive, minor details are discarded, which leads to:
. . ◉ . Loss of specificity.
. . ◉ . Reinterpretation rather than omission.
. . ◉ . "It didn't do what I wrote."

Consequence for Prompting
. . ◉ . Word order matters less.
. . ◉ . Natural language works well.
. . ◉ . Repetition has diminishing returns.
. . ◉ . Over-specification causes semantic drift, not noise.

CLIP BPE (Stable Diffusion Lineage)

This tokenizer was trained jointly with image-text pairs and is optimized for visual concepts, not linguistic fluency.

Algorithm
. . ◉ . CLIP BPE tokenizer.
. . ◉ . Fixed vocabulary.
. . ◉ . Vision-aligned token embeddings.

Characteristics
. . ◉ . An extremely small token window.
. . ◉ . No semantic planning.
. . ◉ . No conflict resolution.
. . ◉ . All tokens compete simultaneously.

Typical Token Window
. . ◉ . 75 tokens (SD 1.5) ≈ 300 characters.
. . ◉ . 75 + 75 tokens (SDXL has a dual-encoder) ≈ 600 characters.
. . ◉ . Hard maximum that cannot be exceeded.

Exceeding the Limit
. . ◉ . Tokens beyond the window are silently dropped (see the sketch at the end of this section).

Consequence for Prompting
. . ◉ . Token order matters a lot.
. . ◉ . Important concepts must appear early.
. . ◉ . Repetition increases influence (vector scaling).
. . ◉ . Long prompts degrade sharply and unpredictably.

Unigram / SentencePiece (Imagen, Hunyuan-style)

This family behaves similarly to OpenAI models but with less aggressive reinterpretation. Unlike BPE, Unigram tokenization evaluates multiple possible segmentations and selects the most likely overall decomposition.

Algorithm
. . ◉ . Unigram Language Model.
. . ◉ . Implemented via SentencePiece.
. . ◉ . Probabilistic segmentation.

Characteristics
. . ◉ . Strong multilingual support.
. . ◉ . Semantic chunking.
. . ◉ . Less brittle than CLIP.
. . ◉ . More stable than classic BPE for long prompts.

Typical Token Window
. . ◉ . Several thousand tokens accepted.
. . ◉ . The effective conditioning window is smaller and model-dependent.

Exceeding the Limit
. . ◉ . Early semantic compression.
. . ◉ . Salient ideas retained.
. . ◉ . Low-importance detail removed.

Consequence for Prompting
. . ◉ . Natural language performs well.
. . ◉ . Descriptor lists are less effective.
. . ◉ . Prompt clarity matters more than prompt length.

Hybrid / Multi-Encoder Systems (Flux-style)

Algorithm
. . ◉ . Multiple tokenizers feeding different encoders.
. . ◉ . Typically:
. . . . ◉ . Language tokenizer for prompt ingestion.
. . . . ◉ . CLIP-style tokenizer for image conditioning.

Characteristics
. . ◉ . Long prompts are accepted.
. . ◉ . Only a subset meaningfully influences generation.
. . ◉ . Bottlenecks occur after tokenization.

Typical Token Window
. . ◉ . Accepted: thousands of tokens.
. . ◉ . Effectively used:
. . . . ◉ . ~75–100 tokens (lower tiers) ≈ 300–400 characters.
. . . . ◉ . ~150–200 tokens (higher tiers) ≈ 600–800 characters.
. . ◉ . Exact limits are rarely disclosed and vary by model version.

Exceeding the Limit
. . ◉ . No immediate truncation.
. . ◉ . Conditioning vectors saturate.
. . ◉ . Later concepts lose influence rather than disappearing.

Consequence for Prompting
. . ◉ . Feels permissive but deceptive.
. . ◉ . Long prompts "work" until they don't.
. . ◉ . Focus collapse happens gradually, not abruptly.

This is the source of many user misconceptions about prompt length.
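To see the CLIP-family hard window in action, here is a minimal sketch, assuming the Hugging Face transformers library and the SD 1.5-style openai/clip-vit-large-patch14 checkpoint (my choice of stand-in). The encoder's window is 77 positions: 75 usable tokens plus the start and end markers. Everything past it simply never reaches the model.

```python
# A minimal sketch, assuming "transformers" is installed.
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

# A deliberately oversized, descriptor-style prompt.
long_prompt = ", ".join(["ornate golden filigree", "volumetric light",
                         "hyperdetailed scales", "dramatic storm clouds"] * 10)

full = tokenizer(long_prompt).input_ids
clipped = tokenizer(long_prompt, truncation=True, max_length=77).input_ids

print("tokens produced:        ", len(full))
print("tokens the encoder sees:", len(clipped))            # capped at 77
print("silently dropped:       ", len(full) - len(clipped))
```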
Models

This is a list of several models, along with their prompting details.

OpenAI Image Models (Language-First, Semantic)

Models
. . ◉ . GPT Image 1
. . ◉ . GPT Image 1.5
. . ◉ . DALL·E 3

Tokenizer Family
. . ◉ . Language-first BPE (tiktoken-derived)
. . ◉ . Same lineage as GPT text models
. . ◉ . No CLIP tokenization

Prompt Length
. . ◉ . GPT Image 1: ~8k total context
. . ◉ . GPT Image 1.5: ~16k total context
. . ◉ . DALL·E 3: undisclosed, empirically several thousand

Stable Diffusion Lineage (CLIP-Bound, Mechanical)

Models
. . ◉ . SD 1.5
. . ◉ . SDXL Base
. . ◉ . Pony (inherits from SD 1.5)
. . ◉ . Illustrious (inherits from SDXL)

Tokenizer Family
. . ◉ . CLIP BPE tokenizer
. . ◉ . Vision-aligned embeddings
. . ◉ . Fixed token window

Flux Models (Hybrid / Multi-Encoder)

Models
. . ◉ . Flux Schnell: ~75–100 tokens
. . ◉ . Flux Dev: ~150 tokens
. . ◉ . Flux Pro: ~200 tokens

Tokenizer Family
. . ◉ . Hybrid system
. . ◉ . Language-style tokenizer for ingestion
. . ◉ . CLIP-like tokenizer for image conditioning

HiDream Models (CLIP-Style Diffusion)

Models
. . ◉ . HiDream Fast
. . ◉ . HiDream Dev
. . ◉ . HiDream Full

Tokenizer Family
. . ◉ . CLIP-style tokenizer (OpenCLIP variant)

Google Imagen 4 (Language-First, Semantic)

Tokenizer Family
. . ◉ . SentencePiece / Unigram LM

Ideogram

Tokenizer Family
. . ◉ . Not publicly disclosed
. . ◉ . Behaviour suggests language-first tokenization with a semantic planning stage

Prompt Length
. . ◉ . Thousands of tokens accepted
. . ◉ . Effective conditioning is smaller

Exceeding the Limit
. . ◉ . Semantic pruning, not truncation

Practical Consequence
. . ◉ . Responds well to prose
. . ◉ . Less sensitive to token order
. . ◉ . Descriptor weighting has a limited effect

Tencent Hunyuan Image 3

Tokenizer Family
. . ◉ . SentencePiece-based tokenizer
. . ◉ . Multilingual-first

Nano Banana / Nano Banana Pro

Tokenizer Family
. . ◉ . Not disclosed
. . ◉ . Observed behavior strongly suggests CLIP-derived tokenization

Qwen Image Models

Tokenizer Family
. . ◉ . Qwen BPE tokenizer
. . ◉ . Language-first

Prompt Length
. . ◉ . Thousands of tokens accepted

Exceeding the Limit
. . ◉ . Semantic compression

Practical Consequence
. . ◉ . Natural language preferred
. . ◉ . Clear intent matters more than detail density

Lucid Origin / SeeDream

Tokenizer Family
. . ◉ . Not disclosed
. . ◉ . Observed behaviour suggests CLIP-style conditioning

Prompt Length
. . ◉ . Short effective window

Exceeding the Limit
. . ◉ . Hard truncation or rapid focus collapse

Practical Consequence
. . ◉ . Descriptor-style prompts work best
. . ◉ . Weighting and repetition matter
. . ◉ . Long prompts degrade sharply
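To make the differences between these families concrete, here is a minimal sketch that counts tokens for the same prompt three ways: a language-first BPE encoding via tiktoken, the CLIP BPE tokenizer from the Stable Diffusion lineage, and a SentencePiece/Unigram tokenizer. The specific checkpoints are my assumptions for illustration (t5-small stands in for the much larger T5-family encoders used in hybrid pipelines), not the exact tokenizers behind any particular commercial model.

```python
# A minimal sketch, assuming "tiktoken" and "transformers" are installed.
import tiktoken
from transformers import CLIPTokenizer, T5TokenizerFast

prompt = ("A colossal dragon rears up on its hind legs, its majestic wings "
          "spread wide open, roaring and belching fire.")

# Language-first BPE (OpenAI / Qwen style); the encoding name is an assumption.
bpe = tiktoken.get_encoding("o200k_base")

# CLIP BPE (Stable Diffusion lineage).
clip = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

# SentencePiece / Unigram (small public stand-in for Imagen / hybrid-pipeline encoders).
sp = T5TokenizerFast.from_pretrained("t5-small")

print("language-first BPE:", len(bpe.encode(prompt)), "tokens")
print("CLIP BPE:          ", len(clip(prompt, add_special_tokens=False).input_ids), "tokens")
print("SentencePiece:     ", len(sp(prompt, add_special_tokens=False).input_ids), "tokens")
```

Same prompt, three different token counts, which is why a prompt that fits comfortably in one model can overflow another.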
Final Thoughts

Common Prompt Failure Modes (And Why They Happen)

Even when you understand prompt context, focus, and tokens, prompts still fail, just not randomly. Most failures fall into a small number of predictable patterns. Knowing these patterns helps you diagnose problems without guessing.

Silent Truncation

The latter part of the prompt has no visible effect. This happens when:

. . ◉ . You exceed a hard token window (common in CLIP-based models)
. . ◉ . Important details appear too late in the prompt

From the artist's perspective, it looks like the model "ignored" you. In reality, the model never saw that text at all.

Focus Dilution

Everything looks technically correct, but nothing looks strong. This happens when:

. . ◉ . Too many subjects compete for attention
. . ◉ . Too many styles, moods, or modifiers are introduced
. . ◉ . The model spreads its effort too evenly

The result is an image that feels generic or undecided.

Context Lock-In

Early wording dominates, even when it's contradicted later. This happens when:

. . ◉ . Strong stylistic or conceptual framing appears early
. . ◉ . Later instructions conflict with the established direction

The model isn't being stubborn; it's staying consistent with the context you already set.

Semantic Overwrite

The image matches the spirit of the prompt, but not the specifics. This happens in:

. . ◉ . Language-first, semantic models
. . ◉ . Very long or over-specified prompts

The model compresses your intent and fills in details on its own, sometimes "improving" the prompt in ways you didn't ask for.

Accidental Blending

Unintended styles, subjects, or attributes appear. This happens when:

. . ◉ . Multiple concepts overlap semantically
. . ◉ . The model averages incompatible ideas instead of choosing one

What feels like randomness is usually a statistical compromise.

Why This Matters

These failure modes aren't mistakes; they're consequences. Once you recognize them, prompt troubleshooting becomes straightforward:

. . ◉ . Missing detail → check truncation
. . ◉ . Weak subject → check focus
. . ◉ . Ignored instruction → check context order
. . ◉ . Unexpected interpretation → check semantic compression

At that point, prompting stops feeling fragile and starts feeling explainable.

A Note on Transferability

Prompting techniques do not transfer cleanly between image models. Two prompts that look identical can behave very differently depending on how the model processes text. This isn't a matter of quality or intelligence; it's a consequence of architecture.

For example:

Techniques that rely on token weighting and repetition work well in CLIP-based diffusion models, but have little effect in language-first models.
Long, descriptive prompts often help semantic models, but can actively harm results in CLIP-bound systems.
Reordering words can dramatically change an image in one model and do almost nothing in another.

This is why advice like "always do X" or "never do Y" rarely holds up. Prompting isn't a universal skill — it's a model-specific practice. The most reliable way to improve results isn't memorizing tricks, but understanding:

. . ◉ . how the model tokenizes text,
. . ◉ . how it handles context,
. . ◉ . and how it allocates focus.

Once you know those mechanics, adapting your approach from one model to another becomes straightforward.

A Practical Prompting Checklist

Before tweaking a prompt or adding more words, it helps to pause and run through a short mental check. This catches most problems early.

1. What sets the context?
What idea, style, or mood appears first in the prompt? That's the frame through which everything else will be interpreted.

2. What is the main focus?
If you had to remove everything except one thing, what would stay? If that isn't obvious, the model won't know either.

3. What am I willing to lose?
Every model has limits. Decide in advance which details are optional. If you don't choose, the model will.

4. Am I near a token boundary?
If important details are late in a long prompt, they're at risk:

. . ◉ . of being truncated, or
. . ◉ . of being diluted into irrelevance.

(A short sketch right after this checklist shows one way to check.)

5. Am I asking the model to obey or to decide?

. . ◉ . Short, precise prompts → obedience
. . ◉ . Long, descriptive prompts → interpretation

Neither is wrong — but mixing them often causes surprises.
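Point 4 is the easiest item on this checklist to automate. Below is a hypothetical helper, assuming the Hugging Face transformers library; the function name, the 75-token default budget, and the warning thresholds are all my own illustrative choices, not a standard tool. Swap in whatever window applies to your model.

```python
# A minimal sketch, assuming "transformers" is installed. The helper name,
# default budget, and thresholds are illustrative choices, not a standard tool.
from transformers import CLIPTokenizer

_tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

def check_prompt_budget(prompt: str, budget: int = 75) -> None:
    """Report how much of a hard token window a prompt consumes."""
    # Exclude the start/end markers the tokenizer adds automatically.
    n = len(_tokenizer(prompt, add_special_tokens=False).input_ids)
    if n > budget:
        print(f"OVER: {n} tokens; the last {n - budget} will be silently dropped")
    elif n > budget * 0.9:
        print(f"CLOSE: {n} of {budget} tokens; late details are at risk")
    else:
        print(f"OK: {n} of {budget} tokens used")

check_prompt_budget("A colossal dragon rears up on its hind legs, wings spread wide")
```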
Most prompting problems aren't about missing knowledge. They're about unexamined assumptions. This checklist forces you to:

. . ◉ . clarify intent,
. . ◉ . reduce competition,
. . ◉ . and respect the model's limits.

Used consistently, it turns trial-and-error prompting into a controlled process, even across different models.

And Finally ...

Prompting isn't a dark art, and it isn't a collection of tricks. It's a technical interaction with systems that have real constraints and predictable behaviour. Once you understand prompt context, focus, tokenization, and limits, a lot of the mystery disappears.

When an image fails, it usually fails for a reason — not because you phrased something "wrong," but because something was truncated, diluted, or reinterpreted along the way.

The goal isn't to write longer prompts or smarter prompts. It's to write intentional ones. When you decide what matters most, set the context early, and respect the model's limits, prompting stops feeling fragile. It becomes a controlled process, one you can adapt across models instead of relearning from scratch.