Fix(model): Linear detected and added to target module with rope linear (#738) 440c3ab unverified Nanobit committed on Oct 19, 2023
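The detection this entry fixes amounts to scanning the model for `nn.Linear` layers to use as LoRA target modules while skipping rotary-embedding internals. A minimal sketch, assuming a generic `model`; the name-based exclusion rule is an illustration, not the PR's exact logic:

```python
import torch.nn as nn

def find_linear_target_modules(model):
    # Collect the leaf names of nn.Linear layers for LoRA targeting,
    # skipping rotary-embedding internals (exclusion rule is assumed).
    names = set()
    for full_name, module in model.named_modules():
        if isinstance(module, nn.Linear) and "rotary_emb" not in full_name:
            names.add(full_name.split(".")[-1])
    return sorted(names)
```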
Fix: Higher vram usage for mistral and sample_packing (#691) 669f1d0 unverified Nanobit committed on Oct 6, 2023
flash_attention + sample packing for stablelm 3b (#671) 2d60ba3 unverified winglian committed on Oct 5, 2023
Fix: ValueError when FA + Mistral when padding_side=right (#681) eb480df unverified Nanobit committed on Oct 5, 2023
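transformers' flash-attention path for Mistral rejects right-padded batches with a ValueError, so the usual workaround is to pad on the left when FA is enabled. A minimal sketch; the flag is a stand-in for a config value:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
use_flash_attention = True  # stand-in for a config flag

# Flash attention cannot handle right-padded Mistral batches, so switch
# the tokenizer to left padding when FA is on.
if use_flash_attention:
    tokenizer.padding_side = "left"
```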
Fix(tokenizer): Set rstrip,lstrip,norm to False (#678) e0b7eea unverified Nanobit committed on Oct 5, 2023
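Here "norm" refers to the `normalized` flag on `AddedToken`. Setting all three flags to False makes the tokenizer match added special tokens verbatim instead of silently stripping whitespace or normalizing them. A sketch of the pattern (the checkpoint is a stand-in):

```python
from transformers import AddedToken, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in checkpoint
# Register the pad token with stripping and normalization disabled so it
# round-trips exactly as written.
tokenizer.add_special_tokens(
    {"pad_token": AddedToken("<pad>", rstrip=False, lstrip=False, normalized=False)}
)
```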
Feat: Allow usage of native Mistral FA when no sample_packing (#669) 697c50d unverified Nanobit committed on Oct 4, 2023
skip some flash attn patches unless explicitly enabled (#643) 895f0a0 unverified winglian committed on Sep 27, 2023
btlm and falcon monkey patches for flash attn (#566) 6b9b229 unverified winglian committed on Sep 17, 2023
don't resize embeddings if it's already large enough (#577) 3607882 unverified winglian committed on Sep 15, 2023
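Unconditionally resizing token embeddings reallocates the matrix even when nothing changed; the guard compares the tokenizer's length against the current embedding rows and only grows. A minimal sketch:

```python
def resize_embeddings_if_needed(model, tokenizer):
    # Only grow the embedding matrix; skip the reallocation when the
    # model's vocab already covers every tokenizer id.
    embeddings_len = model.get_input_embeddings().weight.shape[0]
    if len(tokenizer) > embeddings_len:
        model.resize_token_embeddings(len(tokenizer))
```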
Add training callback to send predictions to WandB table (#521) 5b67ea9 unverified Glavin001 committed on Sep 13, 2023
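The callback hooks transformers' evaluation loop and logs generations into a `wandb.Table`. A sketch under assumptions: the prompt list, column names, and generation settings here are illustrations, not the PR's schema:

```python
import wandb
from transformers import TrainerCallback

class LogPredictionsCallback(TrainerCallback):
    def __init__(self, tokenizer, sample_prompts):
        self.tokenizer = tokenizer
        self.sample_prompts = sample_prompts

    def on_evaluate(self, args, state, control, model=None, **kwargs):
        # Build a fresh table each eval and log it against the global step.
        table = wandb.Table(columns=["prompt", "prediction"])
        for prompt in self.sample_prompts:
            inputs = self.tokenizer(prompt, return_tensors="pt").to(model.device)
            output_ids = model.generate(**inputs, max_new_tokens=64)
            table.add_data(
                prompt, self.tokenizer.decode(output_ids[0], skip_special_tokens=True)
            )
        wandb.log({"sample_predictions": table}, step=state.global_step)
```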
Add support for GPTQ using native transformers/peft (#468) 3355706 unverified winglian committed on Sep 5, 2023
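"Native" here means loading GPTQ checkpoints through transformers' built-in integration (backed by auto-gptq) rather than a bespoke loader, then stacking LoRA adapters with peft. A sketch, with the checkpoint and LoRA hyperparameters as stand-ins:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Load a pre-quantized GPTQ checkpoint via transformers' native support
# (requires the auto-gptq backend), then attach LoRA adapters with peft.
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GPTQ", device_map="auto"
)
model = get_peft_model(
    model,
    LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"]),
)
```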
fsdp requires params be the same type too (#493) 98bf76e unverified winglian committed on Aug 28, 2023
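FSDP flattens parameters into shared buffers, which fails if parameters arrive in mixed dtypes. A minimal sketch of the uniform cast (the target dtype is an assumption):

```python
import torch

def cast_params_to_dtype(model, dtype=torch.bfloat16):
    # FSDP's flat parameters require one dtype across the model, so cast
    # any stragglers (e.g. fp32 norms) to the training dtype.
    for param in model.parameters():
        if param.dtype != dtype:
            param.data = param.data.to(dtype)
```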
Fix(tokenizer): Make sure to add pad for CodeLlamaTokenizer (#489) 4c37bd0 unverified Nanobit committed on Aug 28, 2023
fix: finetune model inference needs the dtype fix to work with flash-attn f311df9 unverified Maxime committed on Aug 26, 2023
Fix(tokenizer): Fix condition to add pad token (#477) 71bd062 unverified Nanobit committed on Aug 25, 2023
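This entry and the CodeLlamaTokenizer fix above both guard pad-token setup: add one only when it is genuinely missing, rather than keying the check on tokenizer class or vocab contents. One plausible form of the guard (checkpoint is a stand-in):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in checkpoint
# Add a pad token only when the attribute is actually unset, so an
# existing pad token is never clobbered.
if tokenizer.pad_token is None:
    tokenizer.add_special_tokens({"pad_token": "<pad>"})
```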
recast loralayer, norm, lmhead + embed token weights per original qlora (#393) 96deb6b unverified winglian committed on Aug 21, 2023
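The original qlora repo keeps LoRA layers and embeddings in the compute dtype while pinning norms to fp32 for stability. A sketch of that recast loop; the name-based matching mirrors qlora's pattern but is still an approximation:

```python
import torch
from peft.tuners.lora import LoraLayer

def recast_for_qlora(model, compute_dtype=torch.bfloat16):
    for name, module in model.named_modules():
        if isinstance(module, LoraLayer):
            # LoRA adapters train in the compute dtype.
            module.to(compute_dtype)
        elif "norm" in name:
            # Norm layers stay in fp32 for numerical stability.
            module.to(torch.float32)
        elif "lm_head" in name or "embed_tokens" in name:
            if hasattr(module, "weight") and module.weight.dtype == torch.float32:
                module.to(compute_dtype)
```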
don't pass rope_scaling kwarg if it's None (#383) 919246f unverified winglian committed on Aug 13, 2023
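Passing `rope_scaling=None` explicitly can trip config validation, so the kwarg is simply omitted when unset. A minimal sketch (checkpoint and scaling dict are stand-ins):

```python
from transformers import AutoModelForCausalLM

rope_scaling = None  # stand-in for a config value that may be unset
model_kwargs = {}
if rope_scaling is not None:
    # e.g. {"type": "linear", "factor": 2.0}
    model_kwargs["rope_scaling"] = rope_scaling
model = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Llama-2-7b-hf", **model_kwargs
)
```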
try to detect accelerate and only use device_map=None in that case (#373) 094fc2c unverified tmm1 committed on Aug 13, 2023
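When `accelerate launch` manages device placement, an explicit `device_map` conflicts with it, so detection gates the fallback. A heuristic sketch; the environment-variable check is an assumption, not the PR's exact test:

```python
import os

def launched_with_accelerate():
    # Heuristic (assumed): `accelerate launch` exports ACCELERATE_*-prefixed
    # environment variables into the training process.
    return any(key.startswith("ACCELERATE_") for key in os.environ)

# Let accelerate handle placement itself; otherwise fall back to "auto".
device_map = None if launched_with_accelerate() else "auto"
```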
Attention mask and position id fixes for packing (#285) 2bb0b78 unverified winglian committed on Aug 12, 2023
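With sample packing, several sequences share one row, so position ids must restart at 0 for each packed sequence (and the attention mask must block cross-sequence attention). A sketch of the position-id part:

```python
import torch

def position_ids_for_packed(seq_lens):
    # Restart positions per packed sequence so RoPE sees per-sequence
    # offsets instead of one long ramp across the whole row.
    return torch.cat([torch.arange(n) for n in seq_lens])

print(position_ids_for_packed([3, 2, 4]))
# tensor([0, 1, 2, 0, 1, 0, 1, 2, 3])
```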