Dear community,
Late in 2019, we introduced the concept of Pipeline in transformers, providing single-line-of-code inference for downstream NLP tasks. At that time we only supported a few tasks, such as:
- Token Classification (ex: NER)
- Sentence Classification (ex: Sentiment Analysis)
- Question Answering
- Feature Extraction (i.e. computing embeddings from a pretrained model)
The initial design was very simple and served us well for quite a while. It allowed many new tasks to be integrated into this concept, among them:
- Fill mask
- Generation
- Summarization
- Translation
- Zero-Shot
Yet, over the last couple of months, we started to spot a few areas where the current pipeline design would not allow us to keep providing all the features we would like. In this context, the team started discussing a major refactoring of the pipelines in order to make it easier to address those points, with the following goals:
- Increased flexibility: We want you to be able to customize pipelines as much as possible.
- Performance: Improve pipeline performance so that pipelines can go to production more easily.
- Training: Pipelines encapsulate logic that makes it easier to work with SOTA NLP models. Training is one step of the overall process, and pipelines should definitely support it.
Today, we would like to share with you our thoughts on the future of the Pipelines:
Main Architectural Changes:
- Framework-Specific Pipelines
Pipelines currently share the same code for both the PyTorch and TensorFlow backends. We would like to split the implementation of pipelines in a framework-specific way.
Moving in this direction will make pipelines more consistent with the rest of the transformers repository, where model implementations are done in a framework-specific fashion.
Also, by using a framework-specific approach, it will be possible to express the entire computation graph with operators provided by the framework. This is particularly appealing as it would allow us to provide training capabilities along with easier export of pipelines for inference.
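As a rough illustration of what "framework-specific" could mean on the PyTorch side, here is a minimal sketch in which the pipeline itself is a torch.nn.Module wrapping the tokenizer and the model (all class and attribute names below are hypothetical, nothing here exists in the current library):

import torch
from torch import nn

# Hypothetical sketch: a PyTorch-specific pipeline implemented as an nn.Module,
# so the model part of the computation can be traced, exported and trained
# with standard PyTorch tooling.
class PyTorchTokenClassificationPipeline(nn.Module):
    def __init__(self, model, tokenizer):
        super().__init__()
        self.model = model          # a PyTorch PreTrainedModel (itself an nn.Module)
        self.tokenizer = tokenizer  # the matching PreTrainedTokenizer

    def forward(self, text):
        # Tokenization stays on the host; everything after it is framework operators
        encodings = self.tokenizer(text, return_tensors="pt")
        outputs = self.model(**encodings)
        return outputs.logits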
- Tokenization and model configuration / argument passing
Currently, pipelines offer little to no flexibility to change the behavior of the tokenizer/model and post-processing steps. For instance, it’s nearly impossible to change the maximum answer length when using the Question Answering pipeline…
Also, it is not possible to provide any parameters when creating the tokenizer for the pipeline…
As such, we would like to introduce the notion of Configuration. A Configuration lets you specify all the elements you care about and save them as part of the pipeline object. Configurations are essentially key/value mappings and can tune the overall behavior of every step involved in a pipeline. Finally, many different configurations can be attached to a pipeline, allowing you to iterate on and version all these elements in the same place.
Configurations are serialized and saved along with the tokenizer and model weights in order to make it as easy as possible to save training configurations and deploy such pipelines.
Here is an example of the proposed design:
# Define some configs
grouped_entities_config = TokenClassificationConfig(
    tokenizer={
        "max_length": 512,
        "padding": True,
        "truncation": True
    },
    group_entities=True
)
ungrouped_entities_config = TokenClassificationConfig(
    tokenizer={
        "max_length": 512,
        "padding": True,
        "truncation": True
    },
    group_entities=False
)
# Create the pipeline
nlp = TokenClassificationPipeline.from_pretrained(
    "dbmdz/electra-large-discriminator-finetuned-conll03-english"
)
# Register configurations
nlp.register_config("grouped", grouped_entities_config)
nlp.register_config("ungrouped", ungrouped_entities_config)
# Forward (output holds the input(s), the configuration which was used and the output(s))
default_outputs = nlp("My name is Morgan and I live in Paris") # Default config
grouped_outputs = nlp("My name is Morgan and I live in Paris", "grouped") # Config ref
ungrouped_outputs = nlp("My name is Morgan and I live in Paris", ungrouped_entities_config) # Config object
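Saving such a pipeline together with its registered configurations could then look like this (again a sketch of the proposed API, not something available today):

# Hypothetical: save_pretrained would persist the model weights, the tokenizer files
# and every registered configuration in a single directory...
nlp.save_pretrained("ner-pipeline-with-configs")
# ...so the exact same pipeline, configurations included, can be reloaded or deployed
reloaded = TokenClassificationPipeline.from_pretrained("ner-pipeline-with-configs")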
- Model export
Another area where we would like to give more flexibility is the export and serialization of pipelines. As you may have seen, PyTorch and TensorFlow are very versatile frameworks, but in the context of production and inference workloads one can leverage more dedicated tools such as ONNX Runtime, with which we have been collaborating a lot over the past months.
By using the framework-specific approach described above, we would like to rely on the tracing mechanisms provided by both PyTorch and TensorFlow to export most of the pipeline as a single ONNX graph, TorchScript module, or TF graph, making it much easier to run inference with it.
Still, not all pipelines would benefit from this feature at first, as some of them require computations that cannot be expressed with such operators, but we intend to bring support to more and more of them release after release.
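As a sketch of what this could look like on the PyTorch side, an nn.Module-based pipeline would let us reuse the standard tracing and export utilities. The nlp.model / nlp.tokenizer attributes and the file names below are assumptions made for the illustration:

import torch

# Example input used to trace/export the model part of the pipeline
example = nlp.tokenizer("My name is Morgan and I live in Paris", return_tensors="pt")

# TorchScript export via tracing (in practice the model may need to return plain
# tuples instead of dict-like outputs for tracing to succeed)
traced = torch.jit.trace(nlp.model, (example["input_ids"], example["attention_mask"]), strict=False)
traced.save("ner-pipeline.torchscript")

# ONNX export, runnable with ONNX Runtime
torch.onnx.export(
    nlp.model,
    (example["input_ids"], example["attention_mask"]),
    "ner-pipeline.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
    },
)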
- Training capabilities
Last but not least, we would like to provide a unified experience by allowing the Pipeline object to be used when training a model. This is made possible by the new framework-specific approach discussed above, as it allows us to express the overall computation graph.
As an example, the PyTorch implementation of such pipelines would be based on torch.nn.Module, hence providing all the tooling and integration required to train a model with the framework.
Along with this better integration, we would like to propose framework-specific methods/syntax to better fit the usage of each framework. For instance, TensorFlow pipelines might benefit from a compile() method, while PyTorch ones would expose a state_dict() generator.
# Create the pipeline
nlp = TokenClassificationPipeline.from_pretrained(
    "dbmdz/electra-large-discriminator-finetuned-conll03-english"
)
# Possible to train? The PyTorch pipeline inherits from torch.nn.Module
nlp.train(True)
optim = Adam(nlp.parameters(), lr=0.01)
for _ in some_data_loader:
    # forward doesn't post-process the model's output(s)
    logits, some_other_tensor = nlp.forward("My name is Morgan and I live in Paris")
    loss = cross_entropy(logits, some_labels)
    loss.backward()
    optim.step()
    optim.zero_grad()
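Because the PyTorch pipeline would be a plain nn.Module, the usual checkpointing hooks would come for free (same hypothetical nlp object as in the training loop above):

import torch

# Save / restore the trained pipeline like any other nn.Module
torch.save(nlp.state_dict(), "ner-pipeline-checkpoint.pt")
nlp.load_state_dict(torch.load("ner-pipeline-checkpoint.pt"))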
All the points detailed here would apply to both PyTorch and TensorFlow. Any other backend? (JAX?)
Also, do not hesitate to let us know if you see some points that would be useful to support in the new pipelines.
Morgan & team,