metadata
language:
- zh
- en
tags:
- chatglm
- glm
- onnx
- onnxruntime
ChatGLM-6B + ONNX
This model is exported from ChatGLM-6b with int8 quantization and optimized for ONNXRuntime inference. The export code is available in this repo.
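To make the int8 quantization above more concrete, here is a minimal pure-Python sketch of how a float tensor can be mapped to uint8 at run time, following the formulas in the ONNX DynamicQuantizeLinear operator definition. It is an illustration only, not the export code of this model; the function name dynamic_quantize_u8 is hypothetical, and the handling of an all-zero input (falling back to a scale of 1.0) is an assumption made here to avoid dividing by zero.

```python
from typing import List, Tuple


def dynamic_quantize_u8(x: List[float]) -> Tuple[List[int], float, int]:
    """Quantize floats to uint8 using the DynamicQuantizeLinear formulas.

    Returns (quantized values, scale, zero point).
    """
    qmin, qmax = 0, 255

    # The range is widened to include 0 so that 0.0 is exactly representable.
    lo = min(0.0, min(x))
    hi = max(0.0, max(x))

    scale = (hi - lo) / (qmax - qmin)
    if scale == 0.0:
        # Assumption for this sketch: all-zero input, so any scale works.
        scale = 1.0

    # Python's round() rounds halves to even, as the operator specifies.
    zero_point = int(min(qmax, max(qmin, round(qmin - lo / scale))))

    # Scale each value, shift by the zero point, and saturate to [0, 255].
    quantized = [
        int(min(qmax, max(qmin, round(v / scale) + zero_point))) for v in x
    ]
    return quantized, scale, zero_point


def dequantize_u8(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Map uint8 values back to approximate floats."""
    return [(v - zero_point) * scale for v in q]
```

For example, the input [-51.0, 0.0, 204.0] spans a range of exactly 255, so the scale is 1.0, the zero point is 51, and the quantized values are [0, 51, 255].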
Inference code for ONNXRuntime is uploaded with the model. Install the requirements and run streamlit run web-ui.py to start chatting and preview the model. Because ONNXRuntime currently supports the MatMulInteger (for the u8s8 data type) and DynamicQuantizeLinear operators only on CPU, inference runs on CPU only for now.
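Since the quantized operators are available only on CPU, a session can be pinned to the CPU execution provider explicitly. The following is a minimal sketch and not the code shipped in web-ui.py: select_providers, open_session, and the model_path argument are hypothetical names, while get_available_providers and InferenceSession are the standard onnxruntime Python calls.

```python
from typing import List, Sequence

CPU_PROVIDER = "CPUExecutionProvider"


def select_providers(available: Sequence[str]) -> List[str]:
    """Return a provider list that restricts ONNXRuntime to the CPU."""
    if CPU_PROVIDER not in available:
        raise RuntimeError(
            f"{CPU_PROVIDER} is not available in this onnxruntime build"
        )
    # Other providers (for example CUDA) are dropped on purpose.
    return [CPU_PROVIDER]


def open_session(model_path: str):
    """Open an inference session for a hypothetical ONNX file on CPU only."""
    # Imported lazily so select_providers can be used without onnxruntime.
    import onnxruntime as ort

    providers = select_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)
```

Calling open_session with the path to one of the downloaded .onnx files would then create a CPU-only session, even on a machine where a GPU provider is installed.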
Usage
git lfs install
git clone https://huggingface.co/K024/ChatGLM-6b-onnx-u8s8
cd ChatGLM-6b-onnx-u8s8
pip install -r requirements.txt
streamlit run web-ui.py
The code is released under the MIT license.
The model weights are released under the same license as ChatGLM-6b; see MODEL LICENSE.