# Digital Odyssey: AI Image & Video Generation Platform 🎨

Welcome to our all-in-one AI platform for image and video generation!

## ✨ Key Features

- 🎨 High-quality image generation from text
- 🎥 Video creation from still images
- Multi-language support with automatic translation
- 🛠️ Advanced customization options

## 💫 Unique Advantages

- ⚡ Fast and accurate results using the FLUX.1-dev and Hyper-SD models (see the generation sketch below)
- Robust content safety filtering system
- 🎯 Intuitive user interface
- 🛠️ Extended toolkit, including image upscaling and logo generation

## 🎮 How to Use

1. Enter your image or video description.
2. Adjust the settings as needed.
3. Click generate.
4. Save and share your results automatically.
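As a rough illustration of the text-to-image step, the sketch below drives the FLUX.1-dev checkpoint mentioned above through the `diffusers` `FluxPipeline`. The prompt, resolution, and step count are placeholder values, and the platform's own pipeline (Hyper-SD acceleration, translation, and safety filtering) is not shown here.

```python
# Minimal sketch of text-to-image generation with FLUX.1-dev via diffusers.
# Illustrative settings only; not the platform's actual defaults.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # gated repo; requires accepting the license
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # helps on GPUs with limited VRAM

image = pipe(
    prompt="a minimalist logo of a paper airplane, flat design",
    height=1024,
    width=1024,
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]

image.save("generated.png")
```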
Quite excited by the ModernBERT release! Small (0.15B/0.4B parameters), 2T tokens of modern pre-training data, a new tokenizer, code released, an 8k context window, and a great, efficient model for embeddings & classification!
This will probably be the basis for many future SOTA encoders! And I can finally stop using DeBERTav3 from 2021 :D
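For anyone who wants to try it for embeddings right away, here is a minimal sketch using `transformers`. It assumes the `answerdotai/ModernBERT-base` checkpoint and a recent `transformers` release with ModernBERT support, and uses plain mean pooling rather than any particular fine-tuned embedding head.

```python
# Minimal sketch: sentence embeddings from ModernBERT via mean pooling.
# Assumes the answerdotai/ModernBERT-base checkpoint id and a transformers
# version that includes ModernBERT support; adjust to the model you use.
import torch
from transformers import AutoTokenizer, AutoModel

model_id = "answerdotai/ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()

sentences = ["ModernBERT is an efficient encoder.", "DeBERTav3 dates from 2021."]
batch = tokenizer(sentences, padding=True, truncation=True, max_length=8192, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # (batch, seq_len, dim)

# Mean-pool over non-padding tokens to get one vector per sentence.
mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # e.g. torch.Size([2, 768]) for the base model
```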
You are all happy that @meta-llama released Llama 3.
Then you are sad that it only has a context length of 8k.
Then you are happy that you can scale Llama-3 to a 96k context with PoSE without training, only needing to modify max_position_embeddings and rope_theta (see the config sketch below).
But then you are sad 😢 that it only improves the model's long-context retrieval performance (i.e., finding needles) while hardly improving its long-context utilization capability (QA and summarization).
But then you are happy that the @GradientsTechnologies community has released long-context versions of Llama-3-8B-Instruct, such as Llama-3-8B-Instruct-262K (262k up to 1M+ tokens).
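As a rough illustration of that config-only change, the sketch below loads Llama-3-8B-Instruct through `transformers` with an enlarged max_position_embeddings and a scaled rope_theta. The specific values (96k positions, a 12x theta multiplier) are illustrative assumptions, not settings taken from any of the posts or papers mentioned here.

```python
# Minimal sketch: override the context-related config fields at load time.
# The concrete numbers below are illustrative, not a recommended recipe.
import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # gated repo; requires access

config = AutoConfig.from_pretrained(model_id)
config.max_position_embeddings = 96 * 1024  # raise the positional limit to ~96k
config.rope_theta = config.rope_theta * 12  # stretch RoPE; tune this factor empirically

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```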
Now we have another paper, "Extending Llama-3's Context Ten-Fold Overnight".
The context length of Llama-3-8B-Instruct is extended from 8K to 80K using QLoRA fine-tuning.
The training cycle is highly efficient, taking "only" 8 hours on a single machine with 8x A800 (80GB) GPUs.
The model also preserves its original capability over short contexts.
The dramatic context extension is attributed mainly to just 3.5K synthetic training samples generated by GPT-4.
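For readers who want to see what such a QLoRA setup looks like in practice, here is a minimal sketch with `transformers`, `bitsandbytes`, and `peft`. It is an illustrative 4-bit NF4 + LoRA configuration, not the paper's exact recipe: the rank, target modules, and other hyperparameters are placeholder choices, and any long-context RoPE settings would be overridden at load time as in the earlier sketch.

```python
# Minimal QLoRA sketch for fine-tuning Llama-3-8B-Instruct.
# Illustrative 4-bit NF4 base + LoRA adapters; not the paper's exact recipe.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=32,                      # placeholder rank
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trained
```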
The paper suggests that the context length could be extended far beyond 80K with more computation resources (sadly, GPU-poor).
The team plans to publicly release all resources, including the data, model, data-generation pipeline, and training code, to facilitate future research from the community ❤️.