Building Cost-Efficient Enterprise RAG applications with Intel Gaudi 2 and Intel Xeon • May 9, 2024
A dynamic parallel method for performance optimization on hybrid CPUs • Paper • 2411.19542 • Published Nov 29, 2024
Efficient Post-training Quantization with FP8 Formats • Paper • 2309.14592 • Published Sep 26, 2023
Effective Quantization for Diffusion Models on CPUs • Paper • 2311.16133 • Published Nov 2, 2023
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs • Paper • 2309.05516 • Published Sep 11, 2023