Since Hugging Face has not published a standalone PyTorch SmolLM2_360M_model.py for loading, fine-tuning, and running inference with the released model weights and config at https://huggingface.co/HuggingFaceTB/SmolLM2-360M/, I have attempted to write a PyTorch model.py that can load the published weights and config and at least run in inference mode. Once a functioning PyTorch model.py is built, it may be possible to export a TorchScript version of the SmolLM2 model that can run on non-Python targets such as MPUs, RISC machines, or smartphones in edge devices. The SmolLM2_360M_model.py runs but is unable to load the safetensors data. Here is the error I encountered:
C:\Users\User\OneDrive\Desktop\SmolLM2>python SmolLM2_360M_model_debugging.py
Warning: SentencePiece not found, using rudimentary BPE tokenizer. Install SentencePiece for better performance.
A module that was compiled using NumPy 1.x cannot be run in NumPy 2.1.3 as it may crash. To support both 1.x and 2.x versions of NumPy, modules must be compiled with NumPy 2.0. Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.
If you are a user of the module, the easiest solution will be to downgrade to 'numpy<2' or try to upgrade the affected module. We expect that some modules will need time to support NumPy 2.
Traceback (most recent call last):
  File "C:\Users\User\OneDrive\Desktop\SmolLM2\SmolLM2_360M_model_debugging.py", line 470, in <module>
    model = SmolLM2_360M(config_path)
  File "C:\Users\User\OneDrive\Desktop\SmolLM2\SmolLM2_360M_model_debugging.py", line 243, in __init__
    self.embed_tokens = nn.Embedding(self.vocab_size, self.hidden_size)
  File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\sparse.py", line 142, in __init__
    self.weight = Parameter(torch.empty((num_embeddings, embedding_dim), **factory_kwargs),
C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\sparse.py:142: UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at ..\torch\csrc\utils\tensor_numpy.cpp:84.)
  self.weight = Parameter(torch.empty((num_embeddings, embedding_dim), **factory_kwargs),
An error occurred while loading weights: File does not contain tensor lm_head.weight
C:\Users\User\OneDrive\Desktop\SmolLM2>
So, what is the story with the safetensors error "File does not contain tensor lm_head.weight"?
Is there a Python script for inspecting the safetensors file?
Why does the model.safetensors file "not contain tensor lm_head.weight"?
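On the inspection question, a safetensors file begins with an 8-byte little-endian length, followed by a JSON header that maps each tensor name to its dtype, shape, and byte offsets. The sketch below reads only that header using the standard library, so it should work even while the NumPy/PyTorch install is misbehaving. The names `read_safetensors_header` and `describe` are illustrative and not part of any library. If `lm_head.weight` is reported missing while `model.embed_tokens.weight` is present, one likely explanation (worth confirming via `tie_word_embeddings` in config.json) is that the output projection shares the embedding matrix, so the checkpoint stores it only once.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the JSON header of a safetensors file as a dict.

    Layout: 8-byte little-endian unsigned length N, then N bytes of UTF-8 JSON.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header


def describe(path, wanted=("lm_head.weight", "model.embed_tokens.weight")):
    """Print every tensor name with dtype and shape, then check a few keys."""
    header = read_safetensors_header(path)
    for name in sorted(header):
        info = header[name]
        print(f"{name:60s} {info['dtype']:6s} {info['shape']}")
    for name in wanted:
        print(f"{name}: {'present' if name in header else 'MISSING'}")
    return header


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        describe(sys.argv[1])
```

Saved as a script (the file name is up to you), it can be run with the path to model.safetensors as its only argument.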
Help Needed: Building a Standalone PyTorch SmolLM2-360M Model
The Hugging Face Hub hosts the SmolLM2-360M model (HuggingFaceTB/SmolLM2-360M), but currently lacks a standalone PyTorch model.py file for loading, fine-tuning, and inference. This limits the model's usability outside the Hugging Face ecosystem.
I've started creating a SmolLM2_360M_model.py file to address this gap, aiming for compatibility with all SmolLM2 models. The initial goal is to enable inference using the published weights and config. A successful PyTorch implementation would pave the way for exporting a TorchScript version, broadening accessibility to non-Python environments like microcontrollers, RISC-V machines, smartphones, and other edge devices.
The Challenge:
While my SmolLM2_360M_model.py runs, it encounters problems loading the safetensors data. I'm receiving the following error:
# Insert the full error message here, including traceback. This will help others diagnose the problem quickly.
# For example:
Traceback (most recent call last):
  File "SmolLM2_360M_model.py", line 32, in <module>
    model.load_state_dict(torch.load("pytorch_model.bin"))
  File ".../python3.8/site-packages/torch/serialization.py", line 781, in load
    with _open_file_like(f, 'rb') as opened_file:
FileNotFoundError: [Errno 2] No such file or directory: 'pytorch_model.bin'
Call to Action:
I'm seeking assistance from experienced PyTorch developers to debug the loading issue and complete the SmolLM2_360M_model.py implementation. Your contributions will significantly expand the potential applications of SmolLM2.
Specific Areas Where Help is Needed:
- Safetensors Loading: Resolving the error encountered when loading the model weights from the safetensors file.
- Model Architecture Verification: Confirming the correctness of the PyTorch model architecture based on the config file.
- Inference Implementation: Ensuring the model can perform inference correctly.
- Fine-tuning Support (Optional): Adding functionality for fine-tuning the model on downstream tasks.
- TorchScript Export (Optional): Enabling export to TorchScript for deployment on resource-constrained devices.
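For the safetensors loading item, one possible approach is to normalise the checkpoint's key names before calling `load_state_dict`. The sketch below is a guess at what that could look like, not the repository's actual code. It assumes the checkpoint uses Llama-style names with a `model.` prefix, that the standalone module calls its embedding `embed_tokens` (as the traceback suggests) and its output layer `lm_head`, and that the embeddings are tied. `prepare_state_dict` is a hypothetical helper; it works on any mapping, so it would accept the dict of tensors returned by a safetensors loader.

```python
def prepare_state_dict(checkpoint, prefix="model.", tied=True):
    """Return a new dict whose keys match a standalone module.

    Assumptions (verify against your own model and config.json):
      * checkpoint keys look like "model.embed_tokens.weight"
      * the standalone module expects "embed_tokens.weight" and "lm_head.weight"
      * tied=True mirrors tie_word_embeddings in config.json
    """
    state = {}
    for name, tensor in checkpoint.items():
        # Strip the wrapper prefix only when it is actually present.
        new_name = name[len(prefix):] if name.startswith(prefix) else name
        state[new_name] = tensor
    if tied and "lm_head.weight" not in state:
        if "embed_tokens.weight" not in state:
            raise KeyError("no embedding matrix found to tie lm_head.weight to")
        # Same object, not a copy: the output projection reuses the embeddings.
        state["lm_head.weight"] = state["embed_tokens.weight"]
    return state
```

With the real checkpoint this might be used as `model.load_state_dict(prepare_state_dict(load_file("model.safetensors")))`, where `load_file` comes from `safetensors.torch`. Tying the parameters inside the module itself (`self.lm_head.weight = self.embed_tokens.weight`) is an alternative that keeps a single copy of the matrix in memory.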
How to Contribute:
- Fork the repository containing the SmolLM2_360M_model.py file.
- Debug the code and implement the missing functionality.
- Submit a pull request with your changes.
By working together, we can make SmolLM2 more accessible and empower a wider range of users to leverage its capabilities. Thank you for your time and expertise!
P.S. Here's a technical breakdown of the process for creating a TorchScript version of the model and deploying it to various platforms:
1. TorchScript Creation:
- Trace or Script: TorchScript offers two ways to convert your PyTorch model: tracing and scripting. Tracing records the operations performed on example inputs, creating a static graph. Scripting directly parses the model code, supporting control flow. Scripting is preferred if your model uses dynamic control flow.
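import torch

model.eval()  # disable dropout so the recorded graph is deterministic
# Tracing Example
# A language model takes integer token IDs, not an image tensor.
# model.vocab_size is the attribute set in the model's __init__.
example_input = torch.randint(0, model.vocab_size, (1, 16))  # (batch, sequence length)
traced_model = torch.jit.trace(model, example_input)
# Scripting Example
scripted_model = torch.jit.script(model)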
- Optimization (Optional): TorchScript provides optimization passes to improve the performance of the exported model.
optimized_model = torch.jit.optimize_for_inference(scripted_model)
- Saving: Save the TorchScript model to a file.
torch.jit.save(optimized_model, "smolLM2_360m.pt")
2. Deployment to Target Environments:
- C++: LibTorch, the C++ API for PyTorch, can load and execute TorchScript models. Integrate LibTorch into your C++ application for RISC-V or other edge-device deployments; this typically involves compiling your C++ code and linking against LibTorch. Bare-metal microcontrollers are unlikely to be practical targets, since LibTorch expects an operating system and far more memory than such devices offer.
- Android/iOS: Use the respective PyTorch Mobile libraries for these platforms. These libraries offer optimized runtime environments for executing TorchScript models within mobile applications.
- Other Edge Devices: Depending on the device and its capabilities, explore options like using a custom runtime or, if available, a cross-compilation toolchain to target the device from your development environment.
Example C++ Deployment (Simplified):
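#include <torch/script.h>
#include <iostream>
#include <vector>

int main() {
  // Load the TorchScript model and switch it to inference mode
  torch::jit::Module module = torch::jit::load("smolLM2_360m.pt");
  module.eval();

  // Prepare input tensor: a (batch, sequence) block of token IDs.
  // Zeros are placeholders; real IDs would come from the tokenizer.
  torch::Tensor input_tensor = torch::zeros({1, 16}, torch::kLong);

  // Run inference without tracking gradients
  torch::NoGradGuard no_grad;
  std::vector<torch::jit::IValue> inputs;
  inputs.push_back(input_tensor);  // Add input tensor(s)
  // Assumes forward() returns a single tensor of logits
  torch::Tensor output = module.forward(inputs).toTensor();

  // Process output
  std::cout << "output shape: " << output.sizes() << std::endl;
  return 0;
}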
Key Considerations:
- Hardware Limitations: Microcontrollers and other edge devices have limited resources. Model size and complexity may need adjustments (quantization, pruning) for optimal performance.
- Platform-Specific Tooling: Each target platform has its own build system and toolchain. Familiarize yourself with these tools for successful deployment.
- Cross-Compilation: If building directly on the target device isn't feasible, cross-compilation is necessary. This typically involves setting up a cross-compilation toolchain for the target architecture.
- Debugging: Debugging on edge devices can be challenging. Thoroughly testing the TorchScript model within a more accessible environment (e.g., your development machine) before deploying is essential.
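To put the hardware-limitations point in numbers, weight storage alone can be estimated as parameter count times bytes per parameter. The figure of 360 million parameters is taken from the model's name and is only nominal; activations, the KV cache, and runtime overhead are ignored in this rough sketch.

```python
# Nominal parameter count implied by the name "SmolLM2-360M" (an approximation).
NOMINAL_PARAMS = 360_000_000

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_megabytes(num_params, dtype):
    """Approximate weight storage in MB (10**6 bytes), weights only."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


for dtype in BYTES_PER_PARAM:
    print(f"{dtype:8s} {weight_megabytes(NOMINAL_PARAMS, dtype):8.0f} MB")
```

Under these assumptions the weights take about 1440 MB in float32 and about 180 MB even at 4 bits, which is far more than the RAM on a typical microcontroller. Smartphones and Linux-class boards look like the more realistic edge targets.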
This expanded explanation provides a more complete roadmap for creating and deploying TorchScript versions of the SmolLM2 model. Remember to consult the official PyTorch and LibTorch documentation for platform-specific instructions and best practices.