<div align="center">

# Xora
</div>

This is the official repository for Xora.

## Table of Contents

* [Introduction](#introduction)
* [Installation](#installation)
* [Inference](#inference)
  * [Inference Code](#inference-code)
* [Acknowledgement](#acknowledgement)

## Introduction

The performance of Diffusion Transformers is heavily influenced by the number of generated latent pixels (or tokens). In video generation, the token count becomes substantial as the number of frames increases. To address this, we designed a carefully optimized VAE that compresses videos into a smaller number of tokens while utilizing a deeper latent space. This approach enables our model to generate high-quality 768x512 videos at 24 FPS, achieving near real-time speeds.
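
To make the token-count argument concrete, the sketch below counts latent tokens for a short clip under two compression settings. The compression factors are illustrative placeholders, not the actual values of Xora's VAE, and the helper name `latent_token_count` is hypothetical. It also assumes one token per latent position.

```python
import math


def latent_token_count(frames, height, width, t_factor, s_factor):
    """Count latent tokens when a VAE downsamples time by t_factor and
    each spatial axis by s_factor (assuming one token per latent position)."""
    return (
        math.ceil(frames / t_factor)
        * math.ceil(height / s_factor)
        * math.ceil(width / s_factor)
    )


# Five seconds of 768x512 video at 24 FPS, as in the paragraph above.
frames = 24 * 5

# Both factor pairs are assumptions chosen only to show the trend.
mild = latent_token_count(frames, 512, 768, t_factor=4, s_factor=8)
strong = latent_token_count(frames, 512, 768, t_factor=8, s_factor=32)
print(f"mild compression: {mild} tokens, strong compression: {strong} tokens")
```

Because transformer cost grows with sequence length, a stronger compression ratio directly shrinks the workload per denoising step.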

## Installation

### Setup
The codebase currently uses Python 3.10.5 and CUDA 12.2, and supports PyTorch >= 2.1.2.


```bash
git clone https://github.com/LightricksResearch/xora-core.git
cd xora-core

# create env
python -m venv env
source env/bin/activate
python -m pip install -e .\[inference-script\]
```
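
After installing, it may help to confirm that the environment meets the PyTorch requirement stated above. The following is a minimal sketch using only the standard library; the helper names `parse_version`, `torch_is_supported`, and `report` are hypothetical and not part of the repository.

```python
import re
import sys
from importlib import metadata

MIN_TORCH = (2, 1, 2)  # minimum PyTorch version stated in this README


def parse_version(text):
    """Turn a string such as '2.1.2+cu121' into a tuple like (2, 1, 2)."""
    numbers = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def torch_is_supported(version_text, minimum=MIN_TORCH):
    """Return True when the given PyTorch version string meets the minimum."""
    return parse_version(version_text) >= minimum


def report():
    """Print the Python version and whether the installed PyTorch is new enough."""
    print("Python:", sys.version.split()[0])
    try:
        installed = metadata.version("torch")
    except metadata.PackageNotFoundError:
        print("PyTorch: not installed in this environment")
        return
    status = "ok" if torch_is_supported(installed) else "too old"
    print(f"PyTorch: {installed} ({status})")


report()
```

The check reads package metadata instead of importing `torch`, so it also runs in an environment where the installation has not completed.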

Then, download the model from [Hugging Face](https://huggingface.co/Lightricks/Xora):

```python
from huggingface_hub import snapshot_download

model_path = 'PATH'   # The local directory to save downloaded checkpoint
snapshot_download("Lightricks/Xora", local_dir=model_path, local_dir_use_symlinks=False, repo_type='model')
```

## Inference

### Inference Code

To use our model, follow the inference code in [`inference.py`](https://github.com/LightricksResearch/xora-core/blob/main/inference.py).

For text-to-video generation:

```bash
python inference.py --ckpt_dir 'PATH' --prompt "PROMPT" --height HEIGHT --width WIDTH
```

For image-to-video generation:

```bash
python inference.py --ckpt_dir 'PATH' --prompt "PROMPT" --input_image_path IMAGE_PATH --height HEIGHT --width WIDTH
```
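
When driving the script from Python, for example to loop over several prompts, the two commands above can be assembled programmatically. This is a sketch only: `build_inference_command` is a hypothetical helper, and it uses just the flags shown in this README.

```python
import sys


def build_inference_command(ckpt_dir, prompt, height, width, input_image_path=None):
    """Assemble the argument list for inference.py from the flags shown above."""
    command = [
        sys.executable, "inference.py",
        "--ckpt_dir", str(ckpt_dir),
        "--prompt", prompt,
        "--height", str(height),
        "--width", str(width),
    ]
    if input_image_path is not None:
        # Passing an input image corresponds to the image-to-video command.
        command += ["--input_image_path", str(input_image_path)]
    return command


# Placeholder values; 512x768 matches the resolution mentioned in the introduction.
command = build_inference_command("PATH", "PROMPT", height=512, width=768)
print(" ".join(command))

# To launch it from the repository root:
# import subprocess; subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems when a prompt contains spaces or quotation marks.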

## Acknowledgement

We are grateful to the following projects, which we drew on when implementing Xora:
* [DiT](https://github.com/facebookresearch/DiT) and [PixArt-alpha](https://github.com/PixArt-alpha/PixArt-alpha): vision transformers for image generation.


[//]: # (## Citation)