Documentation required for parameters of the Inference API

#9
by raihankhan - opened

For now, there is no documentation regarding the parameters of the Inference API. However, the API call returns an error when the input is too long. Hence, parameters such as max tokens should be documented in the README.md.

Unfortunately, max_tokens will not necessarily work, but you can probably use truncation:

import json
import os

import requests

# Read the API token from the environment rather than hard-coding it.
API_TOKEN = os.getenv("HF_API_TOKEN")
headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = "https://api-inference.huggingface.co/models/roberta-base-openai-detector"

def query(payload):
    # Send the payload as JSON and decode the JSON response.
    data = json.dumps(payload)
    response = requests.request("POST", API_URL, headers=headers, data=data)
    return json.loads(response.content.decode("utf-8"))

# "truncation": True asks the API to cut the input down instead of returning an error.
data = query({"inputs": "I like you. I love you" * 100, "parameters": {"truncation": True}})
print(data)

This is undocumented indeed. I will work on it, since this model is getting a lot more usage. In general, most pipeline parameters from transformers are supported, and they are deliberately not always documented (so we can modify/accept/reject them in the API where it makes sense). When we do restrict them, it is mostly to prevent abuse.

Is that clearer to you? Does the solution work for you?

Cheers,
Nicolas

Hey @Narsil, thanks for such a wonderful explanation. I just had one more small question: could you explain to me what truncation actually does?

Thanks in advance.

Regards,
Raihan

Truncation works by dropping some tokens. In this case, it drops the rightmost ones (so the end of the text).
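As a rough illustration of right-side truncation, the sketch below uses whitespace-separated words as stand-in tokens. The real model uses a subword tokenizer and its own maximum length, so `truncate_right` and `max_tokens` are hypothetical names for illustration only, not part of the API.

```python
def truncate_right(tokens, max_tokens):
    """Keep the first max_tokens tokens and drop the rest (the end of the text)."""
    return tokens[:max_tokens]

# Whitespace "tokens" stand in for the model's real subword tokens (an assumption).
tokens = "I like you. I love you. I miss you.".split()
kept = truncate_right(tokens, 6)
print(" ".join(kept))  # I like you. I love you.
```

Everything after the cut-off is simply never seen by the model.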

This is usually preferred for text-classification, since the initial part of the text is usually good enough (for regular text-classification, such as deciding whether a text is about politics or science).
There is no real way to make a model with a finite input length work on arbitrarily long text. The best you can do is send each part of the text separately and try to come up with some kind of aggregate decision.
If the start is not OpenAI-generated, and the middle is, does that mean that the text was generated? There's just no way to tell, tbh. So the API, just like the transformers pipeline, reflects that by throwing an error by default.
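A minimal sketch of the "send each part and aggregate" idea might look like the following. It is only an illustration: `split_into_chunks`, `aggregate_scores`, and the word-based `chunk_size` are hypothetical, and `fake_classify` is a stand-in for a real call to the API that would return the probability of the "generated" label for one chunk.

```python
from statistics import mean

def split_into_chunks(words, chunk_size):
    """Split a list of words into consecutive chunks of at most chunk_size words."""
    return [words[i:i + chunk_size] for i in range(0, len(words), chunk_size)]

def aggregate_scores(text, classify, chunk_size=200):
    """Score each chunk with `classify`; return (mean, max, per-chunk scores)."""
    chunks = split_into_chunks(text.split(), chunk_size)
    scores = [classify(" ".join(chunk)) for chunk in chunks]
    return mean(scores), max(scores), scores

# Hypothetical stand-in classifier: a real version would query the API
# and extract the score of the "generated" label from the response.
def fake_classify(chunk_text):
    return 0.9 if "generated" in chunk_text else 0.1

# A text whose middle part looks generated while the start and end do not.
text = "human " * 4 + "generated " * 4 + "human " * 4
avg, highest, per_chunk = aggregate_scores(text, fake_classify, chunk_size=4)
print(per_chunk)  # [0.1, 0.9, 0.1]
```

Here the mean score is low while the maximum is high, which is exactly the ambiguity described above: the aggregation rule you pick decides the answer, and neither choice is obviously right.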

Does that answer your question?

Yeah. Thanks for the awesome explanation!
