add gptneox embeddings, fix phi2 inputs, also fix the casting (#1083) 78c5b19 winglian committed on Jan 11, 2024
be more robust about checking embedding modules for lora finetunes (#1074) [skip ci] 0f10080 winglian committed on Jan 10, 2024
feature: better device mapping for large models (#918) bdfefaf Karl-Johan Alm (kallewoof) and winglian committed on Jan 5, 2024
bump transformers and update attention class map name (#1023) bcc78d8 winglian committed on Jan 3, 2024
remove landmark attn and xpos rope implementations (#1010) 70b46ca winglian committed on Dec 28, 2023
Feat: Warns to add to modules_to_save when adding tokens or switching special_tokens (#787) 1ffa386 Nanobit committed on Dec 22, 2023
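For context on #787: when a LoRA fine-tune adds tokens or changes special tokens, the embedding and output layers also need to be trained, which peft exposes through modules_to_save. A minimal sketch, assuming Llama-style module names ("embed_tokens", "lm_head"), which are not universal:

```python
# Sketch: adding tokens for a LoRA fine-tune generally requires training the
# embedding layers as well; peft's LoraConfig exposes this via modules_to_save.
# The module names below are typical for Llama-style models (an assumption here).
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    # train the full embedding/output matrices so new token rows get updated
    modules_to_save=["embed_tokens", "lm_head"],
    task_type="CAUSAL_LM",
)
```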
fix(tokenizer): handle fast tokenizer properly for bos/eos (#914) fde091c Nanobit committed on Dec 8, 2023
feat: add check for quantized model (#913) a581e9f Nanobit and winglian committed on Dec 4, 2023
Support device_map=sequential & max_memory config parameters (#903) 992e742 Bryan Thornbury and winglian committed on Dec 4, 2023
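For #903, both options map onto the standard transformers/accelerate loading API. A sketch with a placeholder model name and memory budgets:

```python
# Sketch: how device_map="sequential" plus a max_memory budget correspond to
# the underlying transformers/accelerate API. Model name and limits are
# placeholders.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    device_map="sequential",  # fill device 0 first, then spill over
    max_memory={0: "20GiB", 1: "20GiB", "cpu": "64GiB"},
)
```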
allow overriding of model_config parameters from the YML (#853) 1bc1186 winglian committed on Nov 16, 2023
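For #853, the idea is to apply YAML-declared overrides to the loaded config before instantiating the model. A rough sketch, with an example override dict that is not taken from a real config file:

```python
# Sketch: apply config overrides declared in the YAML to the loaded model
# config, then build the model from that config. The override values are
# illustrative only.
from transformers import AutoConfig, AutoModelForCausalLM

overrides = {"rope_scaling": {"type": "linear", "factor": 2.0}}

config = AutoConfig.from_pretrained("meta-llama/Llama-2-7b-hf")
for key, value in overrides.items():
    setattr(config, key, value)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", config=config
)
```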
fix(tokenizer): update log order after update (#806) 10388a8 Nanobit committed on Oct 31, 2023
fix(config): Set eos/bos to tokenizer if different (#801) 637ed09 Nanobit committed on Oct 29, 2023
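For #801, the intent is to keep bos/eos consistent between the model config and the tokenizer. A rough sketch of one direction of that sync (names illustrative, not the actual implementation):

```python
# Sketch: if the model config declares bos/eos ids that differ from the
# tokenizer's, push the corresponding token strings onto the tokenizer so
# training and generation agree. Assumes the config is the source of truth.
def sync_special_tokens(tokenizer, model_config):
    for attr in ("bos_token_id", "eos_token_id"):
        config_id = getattr(model_config, attr, None)
        if config_id is not None and getattr(tokenizer, attr) != config_id:
            token = tokenizer.convert_ids_to_tokens(config_id)
            # setting e.g. tokenizer.bos_token also updates bos_token_id
            setattr(tokenizer, attr.replace("_id", ""), token)
```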
refactor neft patch to be more re-usable similar to trl's impl (#796) 827ec3d winglian committed on Oct 29, 2023
Fix(model): Linear detected and added to target module with rope linear (#738) 440c3ab Nanobit committed on Oct 19, 2023
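For #738, the fix concerns auto-detecting linear layers to use as LoRA target modules without picking up layers that should not be adapted (such as rotary-embedding helpers or the output head). A common pattern, sketched with an illustrative function name:

```python
# Sketch of the common "find all linear layer names" pattern used to build
# LoRA target_modules automatically; skipping lm_head (and anything that is
# not a plain nn.Linear) is the point of the fix referenced above.
import torch.nn as nn

def find_linear_module_names(model, exclude=("lm_head",)):
    names = set()
    for full_name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            leaf = full_name.split(".")[-1]
            if leaf not in exclude:
                names.add(leaf)
    return sorted(names)
```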
Fix: Higher vram usage for mistral and sample_packing (#691) 669f1d0 Nanobit committed on Oct 6, 2023
flash_attention + sample packing for stablelm 3b (#671) 2d60ba3 winglian committed on Oct 5, 2023
Fix: ValueError when FA + Mistral when padding_side=right (#681) eb480df Nanobit committed on Oct 5, 2023
Fix(tokenizer): Set rstrip,lstrip,norm to False (#678) e0b7eea Nanobit committed on Oct 5, 2023
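For #678, the flags in question belong to tokenizers' AddedToken; disabling them keeps added special tokens from being stripped or normalized. A sketch with an example token and tokenizer:

```python
# Sketch: register a special token with rstrip/lstrip/normalized disabled so
# surrounding whitespace and normalization do not change how it tokenizes.
# The token string and base tokenizer are examples only.
from tokenizers import AddedToken
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.add_special_tokens(
    {"pad_token": AddedToken("<pad>", rstrip=False, lstrip=False, normalized=False)}
)
```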
Feat: Allow usage of native Mistral FA when no sample_packing (#669) 697c50d Nanobit committed on Oct 4, 2023
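For #669, with sample packing disabled the stock flash-attention path in transformers can be used directly rather than a patched attention. A sketch (recent transformers selects it via attn_implementation; older releases used a use_flash_attention_2 flag instead):

```python
# Sketch: let transformers' own flash-attention implementation handle Mistral
# when sample packing is off. Model name and dtype are placeholders.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)
```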
skip some flash attn patches unless explicitly enabled (#643) 895f0a0 winglian committed on Sep 27, 2023
btlm and falcon monkey patches for flash attn (#566) 6b9b229 winglian committed on Sep 17, 2023
don't resize embeddings if it's already large enough (#577) 3607882 winglian committed on Sep 15, 2023
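For #577, the guard is simply comparing the tokenizer's vocabulary size against the current embedding matrix before resizing, since some checkpoints ship embeddings already padded past the vocab (e.g. to a multiple of 64). A minimal sketch:

```python
# Sketch: only resize the embedding matrix when the tokenizer's vocabulary
# actually exceeds the model's current embedding size.
def maybe_resize_embeddings(model, tokenizer):
    embedding_size = model.get_input_embeddings().weight.shape[0]
    if len(tokenizer) > embedding_size:
        model.resize_token_embeddings(len(tokenizer))
```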
Add training callback to send predictions to WandB table (#521) 5b67ea9 Glavin001 committed on Sep 13, 2023
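For #521, a callback of this kind can be built on transformers' TrainerCallback and a wandb.Table. A minimal sketch, not the actual callback from the PR; the prompt list and generation settings are placeholders:

```python
# Sketch: log a few prompt/prediction pairs to a W&B table on each evaluation.
import wandb
from transformers import TrainerCallback

class LogPredictionsCallback(TrainerCallback):
    def __init__(self, tokenizer, prompts):
        self.tokenizer = tokenizer
        self.prompts = prompts

    def on_evaluate(self, args, state, control, model=None, **kwargs):
        table = wandb.Table(columns=["step", "prompt", "prediction"])
        for prompt in self.prompts:
            inputs = self.tokenizer(prompt, return_tensors="pt").to(model.device)
            output_ids = model.generate(**inputs, max_new_tokens=64)
            text = self.tokenizer.decode(output_ids[0], skip_special_tokens=True)
            table.add_data(state.global_step, prompt, text)
        wandb.log({"sample_predictions": table})
```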