
Conversation


@Ando233 Ando233 commented Jan 28, 2026

What does this PR do?

This PR adds a new representation autoencoder (RAE) implementation, AutoencoderRAE, to diffusers.

  • Implements diffusers.models.autoencoders.autoencoder_rae.AutoencoderRAE with a frozen pretrained vision encoder (DINOv2 / SigLIP2 / ViT-MAE) and a ViT-MAE-style decoder.
  • The decoder is aligned with the RAE-main GeneralDecoder parameter structure, so existing trained decoder checkpoints (e.g. model.pt) load without key mismatches when the encoder/decoder settings are consistent.
  • Adds unit/integration tests under diffusers/tests/models/autoencoders/test_models_autoencoder_rae.py.
  • Registers exports so users can import directly via from diffusers import AutoencoderRAE.
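The patch-grid constructor arguments (image_size, patch_size, num_patches) are interdependent for a ViT-style encoder. As a rough sketch of that relationship (hypothetical helper, not part of the PR; DINOv2 ViT models use a 14-pixel patch, so a 224x224 input yields a 16x16 token grid):

```python
def expected_num_patches(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping patches a ViT-style encoder produces.

    Hypothetical helper for illustration only; the PR's AutoencoderRAE
    takes num_patches as an explicit constructor argument instead.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    grid = image_size // patch_size
    return grid * grid

print(expected_num_patches(224, 14))  # 256 patch tokens for a 224px DINOv2 input
```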

Fixes #13000


Usage

import torch

from diffusers import AutoencoderRAE

# The encoder weights are frozen and loaded from `encoder_name_or_path`;
# the decoder config below must match the checkpoint being loaded.
ae = AutoencoderRAE(
    encoder_cls="dinov2",
    encoder_name_or_path=encoder_path,
    image_size=image_size,
    encoder_input_size=image_size,
    patch_size=patch_size,
    num_patches=num_patches,
    decoder_hidden_size=1152,
    decoder_num_hidden_layers=28,
    decoder_num_attention_heads=16,
    decoder_intermediate_size=4096,
).to(device)
ae.eval()

# Load a trained decoder checkpoint (RAE-main GeneralDecoder format);
# strict=False tolerates keys that are not part of the decoder.
state = torch.load(args.decoder_ckpt, map_location="cpu")
ae.decoder.load_state_dict(state, strict=False)

with torch.no_grad():
    recon = ae(x).sample  # reconstructed images
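Since strict=False silently ignores mismatched keys, it can be worth inspecting the overlap between checkpoint keys and model keys (PyTorch's load_state_dict also returns these as missing_keys/unexpected_keys). A small illustration of the idea with plain dicts of keys, no torch needed:

```python
def report_key_mismatches(model_keys, ckpt_keys):
    """Return (missing, unexpected) key lists, mirroring what
    torch's load_state_dict(strict=False) reports back to the caller."""
    missing = sorted(set(model_keys) - set(ckpt_keys))       # in model, not in ckpt
    unexpected = sorted(set(ckpt_keys) - set(model_keys))    # in ckpt, not in model
    return missing, unexpected

missing, unexpected = report_key_mismatches(
    {"blocks.0.attn.qkv.weight", "norm.weight"},
    {"blocks.0.attn.qkv.weight", "extra.bias"},
)
print(missing)      # ['norm.weight']
print(unexpected)   # ['extra.bias']
```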

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@sayakpaul sayakpaul requested a review from kashif January 30, 2026 11:31
@sayakpaul
Member

@bytetriper if you could take a look?

@kashif
Contributor

kashif commented Jan 30, 2026

nice work @Ando233, checking

@kashif
Contributor

kashif commented Jan 30, 2026

Off the bat:

  • let's have a nice convention for the output dataclasses; have a look at the other autoencoders for the convention in diffusers
  • some of the tests might need to be marked as slow, and some paths are hard-coded

Let's sort out these things and then re-look.
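For context, diffusers autoencoders conventionally return a small output dataclass built on diffusers.utils.BaseOutput (e.g. DecoderOutput with a sample field), which is what makes the ae(x).sample access pattern in the usage snippet work. A minimal self-contained sketch of that convention, without importing diffusers or torch (the class name here is hypothetical):

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class AutoencoderRAEOutput:
    """Stand-in for the diffusers output convention: a small dataclass
    whose fields are read by attribute. The real BaseOutput subclasses
    in diffusers additionally support dict- and tuple-style access."""

    sample: Any  # the reconstructed image tensor in the real model


out = AutoencoderRAEOutput(sample=[0.0, 1.0])
print(out.sample)  # callers write `ae(x).sample`
```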



Successfully merging this pull request may close these issues.

RAE support
