The only files needed for inference are inference.py, word2idk.pkl, and lstm_Hxxx.safetensors.
Input tokens must be space separated, since the prompt is not run through the same tokenizer as the training data.
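Concretely, encoding a prompt then amounts to splitting on whitespace and looking each token up in the pickled word-to-index mapping. A minimal sketch of that step (the `encode` helper, the `unk_id` fallback, and the toy vocabulary are assumptions for illustration, not the actual inference.py code):

```python
def encode(text, word2idx, unk_id=0):
    # Split on whitespace only -- no tokenizer is applied here, which is
    # why punctuation in the prompt must already be space separated.
    tokens = text.split()
    # Map out-of-vocabulary words to a fallback id (unk_id is an assumption).
    return [word2idx.get(tok, unk_id) for tok in tokens]

# Toy vocabulary standing in for the pickled word2idk.pkl mapping:
vocab = {"User": 1, ":": 2, "what": 3, "is": 4, "the": 5,
         "capital": 6, "of": 7, "France": 8, "?": 9}
print(encode("User : what is the capital of France ?", vocab))
# -> [1, 2, 3, 4, 5, 6, 7, 8, 9]
```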
python inference.py --temp 0.5 --top_k 64 --model_file lstm_H256.safetensors --start_sequence "User : what is the capital of France ? Bot : " --max_length 32
usually produces output like
The capital of the world of the world of the world of the world of the
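The --temp and --top_k flags control how the next token is drawn from the model's output distribution: logits are divided by the temperature and sampling is restricted to the top_k highest-scoring tokens. A pure-Python sketch of that scheme (sample_top_k is illustrative, not the function in inference.py):

```python
import math
import random

def sample_top_k(logits, temp=0.5, top_k=64):
    # Keep only the top_k highest logits, then sample from the
    # temperature-scaled softmax over that subset.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    scaled = [logits[i] / temp for i in top]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    r = random.random() * total
    for idx, e in zip(top, exps):
        r -= e
        if r <= 0:
            return idx
    return top[-1]

logits = [2.0, 0.5, -1.0, 3.0, 0.0]
# With top_k=2 the result is always index 3 or index 0 (the two largest logits).
print(sample_top_k(logits, temp=0.5, top_k=2))
```

Lower temperatures sharpen the distribution toward the argmax, which is one reason greedy-looking loops like the sample output above appear at temp 0.5.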
It's not very accurate yet; the model is trained on only 1.2 MB of text.
Each safetensors file represents a different hidden dim value, each trained for 1 epoch. The hidden dim value in inference.py must be edited to match the safetensors file being loaded. The remaining training hyperparameters are:
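One way to avoid hand-editing is to recover the hidden dim from the checkpoint filename itself; a small sketch assuming the lstm_Hxxx.safetensors naming shown above (hidden_dim_from_filename is a hypothetical helper, not part of inference.py):

```python
import re

def hidden_dim_from_filename(path):
    # Checkpoint names follow the lstm_Hxxx.safetensors pattern, so the
    # hidden dim can be parsed out instead of edited by hand.
    m = re.search(r"lstm_H(\d+)\.safetensors$", path)
    if m is None:
        raise ValueError(f"cannot infer hidden dim from {path!r}")
    return int(m.group(1))

print(hidden_dim_from_filename("lstm_H256.safetensors"))  # -> 256
```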
sequence_length = 64
batch_size = 16
learning_rate = 0.0001
embedding_dim = 256
num_layers = 4