Imagen generates images by reversing accumulated noise through iterative refinement: noise is removed through learned steps. All pixels update — every denoising step modifies the entire image.
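The refinement loop above can be sketched as a toy reverse-diffusion process. This is a minimal illustration, not Imagen's actual sampler: the `toy_denoiser` is a hypothetical stand-in for the learned noise-prediction network, and the update rule is simplified.

```python
import numpy as np

def toy_denoiser(x, t):
    # Hypothetical stand-in for a learned noise predictor:
    # here it just nudges every pixel toward zero.
    return 0.1 * x

def reverse_diffusion(shape=(8, 8), steps=50, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)      # start from pure noise
    for t in range(steps, 0, -1):       # iterate through denoising steps
        predicted_noise = toy_denoiser(x, t)
        x = x - predicted_noise         # EVERY pixel is updated each step
    return x

img = reverse_diffusion()
print(img.shape)  # (8, 8)
```

Note that the update touches the whole array on every step — there is no notion of editing one object while leaving the rest fixed, which is the point the bullet makes.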
or "in the centre" provides diminishing control
• Detail shifts style, not structure
  ◦ More descriptive prompts change texture and mood, not fundamental composition
• Improvements hit limits fast
  ◦ Prompt engineering reaches a ceiling far earlier than with LLMs
identity across space
  ◦ bidirectional consistency
  ◦ symbols as symbols
• It models local pixel correlations
• Realism collapses under global constraints
lighting, style
• makes unlikely images less likely
• produces convincing surface coherence
• "Does this look like something I've seen before?"

Understanding
• no enforced rules or invariants
• no object identity across space
• no constraint satisfaction
• "Is this necessarily correct?"
to generate virtual characters from text
• users describe characters
• Imagen creates visually plausible characters for video
• motion, timing, and storytelling are handled by the product
for text-to-character visual generation
• creators describe characters in natural language
• Imagen generates character visuals directly in the product
• characters are then animated and exported fully rigged
• Imagen supports creative ideation, not animation logic
like software. But Imagen shows us:
• outputs are sampled, not computed
• realism can emerge without understanding
• improvement does not imply new capabilities
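The "sampled, not computed" point can be made concrete with a toy generator. This is an illustrative sketch, not Imagen's API: `generate` is a hypothetical stand-in in which a fixed conditioning vector plays the role of a prompt.

```python
import numpy as np

def generate(prompt_vec, seed):
    # Toy stand-in for a sampler: the same "prompt" plus fresh noise.
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(prompt_vec.shape)
    return prompt_vec + noise

prompt = np.ones(4)          # same conditioning both times
a = generate(prompt, seed=1)
b = generate(prompt, seed=2)
print(np.allclose(a, b))     # False: identical prompt, different samples
```

A deterministic program would return the same output for the same input; a sampler returns a draw from a distribution, which is why re-running the same prompt yields different images.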