Your Long-CLIP and SD1.5
Is there a way to replace the original CLIP-L in an SD1.5 model with yours?
Of course! How?
That really depends on what you are using. In ComfyUI, you can just put it in a loader node (ComfyUI natively supports Long-CLIP).
For diffusers / transformers, the instructions are in the model card.
In general, you might have to tell whatever you are using that this CLIP now takes 248 tokens rather than 77. Searching the code for "77" can be a good start. Please let me know what you are using if you need more help!
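The kind of hard-coded constant to look for can be sketched like this (a hypothetical illustration, not code from any real pipeline - the function and names are made up to show where the 77 → 248 change matters):

```python
# Hypothetical sketch: many SD pipelines hard-code the CLIP context length.
# Long-CLIP raises it from 77 to 248 tokens, so any such constant must change.

MAX_TOKENS_SHORT = 77   # original CLIP-L context length
MAX_TOKENS_LONG = 248   # Long-CLIP context length

def pad_token_ids(ids, max_tokens, pad_id=0):
    """Truncate or pad a list of token ids to the model's context length."""
    ids = ids[:max_tokens]
    return ids + [pad_id] * (max_tokens - len(ids))

prompt_ids = list(range(100))  # stand-in for tokenizer output, 100 tokens
short = pad_token_ids(prompt_ids, MAX_TOKENS_SHORT)  # prompt clipped to 77 ids
long = pad_token_ids(prompt_ids, MAX_TOKENS_LONG)    # full prompt kept, padded to 248
print(len(short), len(long))  # 77 248
```

Anywhere a pipeline does the equivalent of the 77-token truncation above is a place that needs updating for Long-CLIP.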
I mean replacing it directly in a model file.
As you said: load a checkpoint "Test", load Long-CLIP, and save a new checkpoint with the UNET+VAE from "Test" and the CLIP as Long-CLIP. As I understand it, at this point I have a new checkpoint with the old UNET and VAE and the new CLIP.
Do I need to do anything else to work with this new model in webui or Comfy?
In ComfyUI, it will just work (Comfy natively supports Long-CLIP). For anything else, you'll have to test it: if you see an error about a mismatch between 77 and 248, the code does not support Long-CLIP, and you'd have to adjust it (or open an issue on the GitHub repo and ask for an implementation). But in general, it should 'just work' wherever Long-CLIP support is implemented - whether you're just running inference / generating images or using the wrapped checkpoint to fine-tune the UNET, as long as the 248 tokens are supported, it behaves just like the original short CLIP-L.
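The checkpoint-wrapping step described above amounts to swapping the text-encoder tensors inside the state dict. A minimal sketch, with tiny stand-in dicts instead of real tensors; the "cond_stage_model.transformer." prefix is an assumption based on the usual SD1.5 checkpoint layout, so verify it against your actual file:

```python
# Hypothetical sketch of wrapping a new CLIP into an existing SD1.5 checkpoint.
# The key prefix is an assumption (common SD1.5 layout); check your checkpoint.

def swap_text_encoder(sd_state, clip_state, prefix="cond_stage_model.transformer."):
    """Return a new state dict: UNET/VAE keys kept, text-encoder keys replaced."""
    merged = {k: v for k, v in sd_state.items() if not k.startswith(prefix)}
    for k, v in clip_state.items():
        merged[prefix + k] = v
    return merged

# Tiny stand-in dicts instead of real tensors:
sd = {"model.diffusion_model.w": 1,     # UNET (kept)
      "first_stage_model.w": 2,         # VAE (kept)
      "cond_stage_model.transformer.text_model.embeddings.position_embedding.weight": 3}
longclip = {"text_model.embeddings.position_embedding.weight": 4}  # 248 positions in reality
merged = swap_text_encoder(sd, longclip)
print(merged["cond_stage_model.transformer.text_model.embeddings.position_embedding.weight"])  # 4
```

With real files you would load and save the state dicts with safetensors or torch instead of the toy dicts here.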
Having max_position_embeddings = 248 already set in the model's config.json would help with easy integration in tools.
A guide to downloading the CLIPTextModel directly could also be nice (it's used in the base Stable Diffusion models, so it could be plug-and-play in tools that use diffusers models).
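The effect of that config.json entry can be seen by building a CLIPTextModel from a config carrying the Long-CLIP context length. A sketch using transformers; the other sizes here are shrunk for illustration and are not real CLIP-L values:

```python
from transformers import CLIPTextConfig, CLIPTextModel

# max_position_embeddings=248 mirrors the config.json change discussed above.
# Hidden sizes below are deliberately tiny for the sketch, not real CLIP-L values.
cfg = CLIPTextConfig(max_position_embeddings=248,
                     hidden_size=64,
                     intermediate_size=128,
                     num_hidden_layers=2,
                     num_attention_heads=4)
model = CLIPTextModel(cfg)

# The position-embedding table now has 248 rows instead of the default 77:
pe = model.text_model.embeddings.position_embedding
print(pe.num_embeddings)  # 248
```

With the entry baked into the published config.json, `CLIPTextModel.from_pretrained(...)` picks up the 248-token context automatically, which is exactly the plug-and-play behavior mentioned above.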
Thanks @Heasterian - I added it to all Long-CLIP models. I really appreciate your contribution!