Is GPT Image 1.5 Magic? An Interview With 'Chad'

By Dirty Old Biker

3/22/2026
Introduction

I recently had a conversation with Chad (ChatGPT) about why GPT Image 1.5 works so well.

Interview

dirty_biker: Does DALL·E 3 / GPT Image contain an LLM?

Chad: Short answer: yes — but not in the way you might think. DALL·E 3 / GPT Image uses an LLM as part of the pipeline, but it is not itself an LLM.

• The LLM (like GPT):
  • Interprets your prompt
  • Expands/rewrites it into a detailed internal description
• The image model (diffusion model):
  • Actually generates the image from that processed prompt

So:
• LLM = “understands and rewrites your intent”
• Image model = “draws the picture”

That combo is why GPT Image feels smarter than older models — the LLM is quietly fixing your prompt behind the scenes. If you want, I can break down exactly how the prompt rewriting works internally.

dirty_biker: Sure, but first, what LLM is it comparable to, e.g., 4o, 5.1 mini, etc.?

Chad: It’s closest to something in the GPT-4-class family, but tuned specifically for image prompting — not a general model like GPT-4o or GPT-5.1 mini.

Think of it like this:
• Stronger than mini models (way better at intent + visual detail extraction)
• Not as broad as full GPT-4o / GPT-5.x (it’s specialized, not general-purpose)
• Heavily fine-tuned for visual grounding:
  • understands composition, lighting, style terms
  • expands vague prompts into structured scene descriptions

So in practical terms:
• Capability level ≈ mid-to-high GPT-4 tier
• But narrowed to “describe an image perfectly” instead of “do everything”

The key difference: a normal LLM thinks in language. This one thinks in “what would an image model need to know?”

If you say “epic dragon scene”, it silently turns that into something like:
• subject
• scale
• camera angle
• lighting
• environment
• style cues

Next step: I’ll show you what that transformation actually looks like.

dirty_biker: Go ahead with the next step, please.

Chad: Here’s what actually happens to your prompt behind the scenes.
Step 1 — Your raw prompt

epic dragon scene

Step 2 — LLM expansion (internal rewrite)

The LLM turns that into something closer to:

A colossal dragon with massive wings spread wide, soaring above a rugged mountain landscape. The scene is viewed from a low-angle perspective to emphasize scale. Golden hour lighting casts warm highlights on the dragon’s scales, with dramatic shadows. Clouds swirl around the peaks, and atmospheric mist adds depth. Highly detailed, cinematic composition, sharp focus, dynamic lighting.

Step 3 — Structured breakdown (implicit, not shown)

Internally, it’s thinking more like:
• Subject: colossal dragon
• Pose/action: wings spread, soaring
• Camera: low angle
• Lighting: golden hour, dramatic shadows
• Environment: mountains, clouds, mist
• Style: cinematic, detailed

Why this matters (for you)
• If your prompt is vague, the LLM fills in the blanks
• If your prompt is precise, the LLM mostly stays out of the way
• If your prompt is conflicting, the LLM “resolves” it (sometimes incorrectly)

That’s why GPT Image feels like it has “opinions” — it’s not just drawing, it’s interpreting first.

Next step: I can show you how to control or bypass that rewriting so it stops “helping” too much.

dirty_biker: Go ahead with the next step, please.

Chad: Alright — here’s how you control or bypass the rewrite.

Step 4 — Force the LLM to behave

The trick is: give it structure so it doesn’t “get creative.”

Instead of:

epic dragon scene

Use something like:

Subject: colossal dragon
Action: gliding, wings fully extended
Camera: low angle, wide shot
Lighting: overcast, soft diffuse light
Environment: rocky mountains, no clouds, clear visibility
Style: semi-realistic, grounded, no fantasy exaggeration

What this does
• Removes ambiguity → less rewriting
• Locks decisions → fewer “creative guesses”
• Prevents it from injecting defaults (like golden hour, mist, etc.)
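If you generate images programmatically, the structured-prompt idea above can be sketched as a small helper that assembles labeled fields into one prompt string in a fixed order. This is a hypothetical utility for illustration only, not part of any official SDK; the field names simply mirror the template above.

```python
# Sketch: build a structured prompt from labeled fields so the rewriting
# LLM has less room to "get creative". Purely illustrative, not an API.

def build_structured_prompt(fields: dict) -> str:
    """Join 'Label: value' lines in a fixed, predictable order."""
    order = ["Subject", "Action", "Camera", "Lighting", "Environment", "Style"]
    lines = [f"{label}: {fields[label]}" for label in order if label in fields]
    # Any extra fields the caller supplies go at the end, in insertion order.
    lines += [f"{k}: {v}" for k, v in fields.items() if k not in order]
    return "\n".join(lines)

prompt = build_structured_prompt({
    "Subject": "colossal dragon",
    "Action": "gliding, wings fully extended",
    "Camera": "low angle, wide shot",
    "Lighting": "overcast, soft diffuse light",
    "Environment": "rocky mountains, no clouds, clear visibility",
    "Style": "semi-realistic, grounded, no fantasy exaggeration",
})
print(prompt)
```

The fixed ordering matters: keeping fields in the same positions every time makes it easier to see, across runs, which field a change in the output came from.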
Step 5 — Block unwanted “auto-enhancements”

If it keeps adding stuff you don’t want, explicitly negate the usual suspects:

no dramatic lighting
no golden hour
no atmospheric fog
no cinematic effects
no exaggerated scale

Step 6 — When you actually WANT the rewrite

Lean into it with:

highly detailed, cinematic, dramatic composition

That basically tells the LLM: “Go nuts. Fill in the gaps.”

Bottom line
• Loose prompt → LLM takes control
• Structured prompt → you take control

If you want, next I can show you a hybrid style that gives you control without losing the “magic.”

dirty_biker: That would be great, thanks.

Chad: This is the sweet spot — controlled + still getting the “magic.”

Step 7 — Hybrid prompt pattern

You lock the important stuff, and leave intentional gaps for the LLM to enhance.

Subject: colossal dragon
Action: gliding over a mountain range
Camera: low angle
Lighting: dramatic, natural (LLM can enhance)
Environment: rugged mountains (LLM can add atmosphere)
Style: semi-realistic, cinematic detail
Mood: awe-inspiring, grand scale

Why this works
• You anchor the core facts (dragon, pose, camera)
• You delegate the polish (lighting, atmosphere, mood)
• The LLM enhances — but inside your boundaries

Step 8 — Control how much freedom it has

Think of it like sliders:
• More control: specify everything (lighting, weather, time of day)
• More magic: use soft terms like dramatic, moody, cinematic, atmospheric

Step 9 — The “guardrail” trick

Add a short constraint line at the end:

Keep composition grounded and physically believable.

This quietly stops:
• insane proportions
• weird physics
• over-the-top fantasy nonsense

What you get
• Still benefits from the LLM’s visual intelligence
• But it stops hijacking your intent

That pattern works ridiculously well with GPT Image 1.5 in particular.
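Steps 5 through 9 compose naturally: locked fields, explicit "no ..." negations, then a trailing guardrail line. Here is a minimal sketch of that hybrid pattern as one helper; everything in it (names, defaults) is illustrative, and it only builds a prompt string for you to paste or pass along.

```python
# Sketch of the hybrid pattern: lock core fields (Step 7), append explicit
# "no ..." negations (Step 5), and end with a guardrail line (Step 9).
# Illustrative only; not an official API.

GUARDRAIL = "Keep composition grounded and physically believable."

def hybrid_prompt(fields: dict, negations=(), guardrail: str = GUARDRAIL) -> str:
    lines = [f"{label}: {value}" for label, value in fields.items()]
    lines += [f"no {item}" for item in negations]   # block auto-enhancements
    lines.append(guardrail)                         # rein in the rewrite
    return "\n".join(lines)

print(hybrid_prompt(
    {
        "Subject": "colossal dragon",
        "Action": "gliding over a mountain range",
        "Camera": "low angle",
        "Lighting": "dramatic, natural",
        "Environment": "rugged mountains",
        "Style": "semi-realistic, cinematic detail",
        "Mood": "awe-inspiring, grand scale",
    },
    negations=["atmospheric fog", "exaggerated scale"],
))
```

To slide toward "more magic" (Step 8), pass softer field values like "dramatic" or "moody" and an empty `negations` list; to slide toward "more control", fill every field precisely and negate the usual defaults.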