fix(runner.sh): disable eager-loading so it uses CUDA graphs (for parallel and faster processing) 6bb48e9 yusufs committed 13 days ago
feat(llama3.2): run llama3.2 using bfloat16 with cache dtype fp8 and the same model len 38d356a yusufs committed on Nov 29, 2024
feat(llama3.2): using Llama-3.2-3B-Instruct 0cb88a4f764b7a12671c53f0838cd831a0843b95 8b37c20 yusufs committed on Nov 29, 2024
feat(add-model): always download the model during build; it will be cached for subsequent builds 8679a35 yusufs committed on Nov 27, 2024