2026-01-24

The Physics of Virtual Try-On: 2D Overlays vs. Generative Diffusion

Why do most try-on apps look fake? Learn the difference between simple AR overlays and Kombinlio's generative diffusion engine that simulates gravity, lighting, and fabric drape.


Rated 4.7/5 by 1,700+ reviewers

Why do most try-on apps look fake? Because they use 2D Overlays: digital stickers that ignore physics. Kombinlio uses Generative Diffusion to solve the "Paper Doll" problem.

🎯 See the Physics (Free): Download for iOS | Download for Android

The Engine Protocol

The Problem: Traditional AR overlays ignore gravity and lighting.

The Solution: Generative Diffusion (Stable Diffusion adapted for Inpainting).

The Pipeline:

  1. Semantic Segmentation: Isolate skin/hair.
  2. Pose Estimation: Map skeleton joints (OpenPose).
  3. Warping: Deform garment mesh to body topology.
  4. Generative Inpainting: "Dream" the texture with correct lighting.
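The four stages above can be sketched as a simple function pipeline. This is a hypothetical toy, not Kombinlio's actual code: each stage is a numpy stand-in for the real model (SAM-class segmentation, OpenPose, a mesh warper, a diffusion inpainter), and the joint names and thresholds are illustrative.

```python
import numpy as np

def segment(photo):
    # Stage 1: semantic segmentation. A toy brightness threshold stands
    # in for a real model; returns the "clothing area" mask.
    return photo > 0.5

def estimate_pose(photo):
    # Stage 2: pose estimation. Hard-coded OpenPose-style (x, y) joints,
    # for illustration only.
    return {"l_shoulder": (2, 1), "r_shoulder": (6, 1), "hip": (4, 6)}

def warp(garment, joints):
    # Stage 3: warping. A real engine deforms a garment mesh to the body
    # topology; here we just shift the garment toward the hip joint.
    dx = joints["hip"][0] - garment.shape[1] // 2
    return np.roll(garment, dx, axis=1)

def inpaint(photo, mask, garment):
    # Stage 4: generative inpainting. Stand-in: paste warped garment
    # pixels where the mask allows; a diffusion model would "dream" them
    # with matching lighting instead of copying them.
    out = photo.copy()
    out[mask] = garment[mask]
    return out

rng = np.random.default_rng(42)
photo = rng.random((8, 8))
garment = np.full((8, 8), 0.9)

mask = segment(photo)
result = inpaint(photo, mask, warp(garment, estimate_pose(photo)))
```

Only the masked pixels change; everything outside the clothing area passes through untouched, which is the property the later masking step depends on.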

1. Beyond the "Paper Doll" Effect

Traditional AR uses simple 2D superposition.

  • No Gravity: Clothes float.
  • No Occlusion: Dresses cover hands.
  • Lighting Mismatch: Studio-lit clothes on a bedroom-lit body.
Technical comparison. Left: A 2D overlay where the dress lacks shadow and depth (Paper Doll). Right: A Generative Diffusion output where the dress follows the body's curvature and casts correct shadows.
Figure 1: The Physics Difference. 2D overlays ignore gravity. Diffusion models simulate it.

2. The Mechanics of "Visual Hero"

Kombinlio's engine treats fashion as a physics simulation, not an image overlay.

Step 1: Semantic Masking

We use advanced segmentation models (such as SAM, the Segment Anything Model) to understand the image at a pixel level.

  • Hair: Locked.
  • Face: Locked.
  • Background: Locked.
  • Clothing Area: Marked for regeneration.
Visual representation of 'Semantic Masking'. The AI identifies key facial features and body landmarks, creating a 'Locked Mask' for the face to preserve identity while regenerating the clothing area.
Figure 2: Semantic Segmentation. We lock your identity pixel-by-pixel before the diffusion process begins.
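The lock-versus-regenerate logic above reduces to a label map and a boolean mask. The sketch below is a toy: the label values and region boundaries are made up, and in production they would come from a segmentation model such as SAM rather than hand-drawn rectangles.

```python
import numpy as np

H, W = 10, 10
labels = np.zeros((H, W), dtype=np.uint8)  # 0 = background (locked)
labels[0:2, 4:6] = 1   # hair (locked)
labels[2:4, 4:6] = 2   # face (locked)
labels[4:9, 3:7] = 3   # clothing (marked for regeneration)

# The diffusion inpainter only ever touches pixels where this is True;
# every locked pixel is copied through unchanged, preserving identity.
regenerate = labels == 3
locked = ~regenerate
```

Passing `regenerate` as the inpainting mask is what guarantees the face and hair survive the generation pass pixel-for-pixel.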

Step 2: Denoising & Drift Control

The model generates new pixels to fill the "Clothing Area."

  • The Challenge: Generative AI can hallucinate (e.g., adding extra buttons).
  • The Fix: We use ControlNet to enforce the structural integrity of the garment, ensuring the generated pixels align with the original garment's edges and folds.

🛡️ Engineering Transparency: To prevent "Identity Drift" (where the user's face changes), Kombinlio uses a Reference-Only ControlNet for the face region. This ensures that while the body pixels are regenerated with new clothes, the facial pixels remain bit-exact or heavily constrained to the original photo.
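The effect of structural conditioning can be shown numerically. The sketch below is a toy analogue, not the real ControlNet math: a noisy "hallucinated" draft is blended back toward a reference structure, and the edge map of the constrained result drifts far less from the original garment's edges than the raw draft does. The blend weight and edge operator are illustrative assumptions.

```python
import numpy as np

def edges(img):
    # Horizontal gradient magnitude as a crude structural signal
    # (a real ControlNet conditions on richer maps, e.g. Canny edges).
    return np.abs(np.diff(img, axis=1))

rng = np.random.default_rng(0)
reference = np.zeros((6, 6))
reference[:, 3:] = 1.0                               # garment with one hard edge
proposal = reference + rng.normal(0.0, 0.3, (6, 6))  # "hallucinated" draft

# Conditioning modeled as a simple blend: the control signal pulls the
# draft back toward the reference structure. Strength is illustrative.
strength = 0.8
controlled = strength * reference + (1 - strength) * proposal

drift_controlled = np.abs(edges(controlled) - edges(reference)).mean()
drift_raw = np.abs(edges(proposal) - edges(reference)).mean()
```

The stronger the conditioning, the closer the generated edges stay to the source garment's folds, at the cost of less generative freedom.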

3. The "Uncanny Valley" Solution

The "Uncanny Valley" happens when lighting is off. Our engine estimates Global Illumination. If your photo has a light source on the left, the virtual dress will cast a shadow on the right. This shadow coherence is what tricks the brain into perceiving reality.

High-fidelity VTO result showing a user in a complex lighting environment. The virtual garment reflects the ambient light and casts realistic self-shadows, demonstrating 'Global Illumination' estimation.
Figure 3: Global Illumination. Lighting consistency is the key to crossing the Uncanny Valley.
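The left-light/right-shadow rule above can be captured with a back-of-envelope brightness cue. This is a hypothetical sketch: a real engine estimates full global illumination, while this helper reads only the horizontal brightness gradient, and both function names are made up for illustration.

```python
import numpy as np

def light_side(photo):
    # If the photo is brighter on its left half, assume the dominant
    # light source sits on the left.
    w = photo.shape[1]
    left = photo[:, : w // 2].mean()
    right = photo[:, w // 2 :].mean()
    return "left" if left > right else "right"

def shadow_side(photo):
    # Shadows fall opposite the light source, so a left-lit photo gets
    # the virtual garment's shadow composited on the right.
    return "right" if light_side(photo) == "left" else "left"

# A frame whose brightness falls off from left to right: lit from the left.
photo = np.tile(np.linspace(1.0, 0.2, 8), (8, 1))
```

Keeping the composited shadow consistent with this estimated light direction is the coherence cue the section above describes.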

🧠 You don't have to do this manually. The personal stylist app automates the entire process from your phone.

Experience the physics.