What are the text size limits for the input to and output from Dolly?

#51
Opened by danrama

I've set up a simple LangChain LLMChain (per the instructions in https://huggingface.co/databricks/dolly-v2-12b#langchain-usage), and I've been experimenting with asking Dolly to generate a JSON schema from some input JSON data. It works well on a small example, but when I give it a larger input, the returned text seems to be truncated. What are the default input/output text (token) limits, and can they be changed via parameters?

This works fine:

Prompt>
Create a JSON schema for the following data:

{"title": "The Hitchhiker's Guide to the Galaxy", "author": "Douglas Adams", "published": 1979}
{"title": "The Restaurant at the End of the Universe", "author": "Douglas Adams", "published": 1980}

Response (10s)>

{
    "$schema": "http://json-schema.org/draft/20191222/schema",
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "author": {"type": "string"},
        "published": {"type": "string"}
    },
    "required": ["title", "author", "published"]
}

But when I provide a much more complex JSON example, the response appears to be truncated.

Prompt>
Create a JSON schema for the following data:

[
    {
        "id": "0001",
        "type": "donut",
        "name": "Cake",
        "ppu": 0.55,
        "batters":
            {
                "batter":
                    [
                        { "id": "1001", "type": "Regular" },
                        { "id": "1002", "type": "Chocolate" },
                        { "id": "1003", "type": "Blueberry" },
                        { "id": "1004", "type": "Devil's Food" }
                    ]
            },
        "topping":
            [
                { "id": "5001", "type": "None" },
                { "id": "5002", "type": "Glazed" },
                { "id": "5005", "type": "Sugar" },
                { "id": "5007", "type": "Powdered Sugar" },
                { "id": "5006", "type": "Chocolate with Sprinkles" },
                { "id": "5003", "type": "Chocolate" },
                { "id": "5004", "type": "Maple" }
            ]
    },
    {
        "id": "0002",
        "type": "donut",
        "name": "Raised",
        "ppu": 0.55,
        "batters":
            {
                "batter":
                    [
                        { "id": "1001", "type": "Regular" }
                    ]
            },
        "topping":
            [
                { "id": "5001", "type": "None" },
                { "id": "5002", "type": "Glazed" },
                { "id": "5005", "type": "Sugar" },
                { "id": "5003", "type": "Chocolate" },
                { "id": "5004", "type": "Maple" }
            ]
    },
    {
        "id": "0003",
        "type": "donut",
        "name": "Old Fashioned",
        "ppu": 0.55,
        "batters":
            {
                "batter":
                    [
                        { "id": "1001", "type": "Regular" },
                        { "id": "1002", "type": "Chocolate" }
                    ]
            },
        "topping":
            [
                { "id": "5001", "type": "None" },
                { "id": "5002", "type": "Glazed" },
                { "id": "5003", "type": "Chocolate" },
                { "id": "5004", "type": "Maple" }
            ]
    }
]

Response (24s)>

{
        "type": "object",
        "properties": {
                "id": {
                        "type": "string"
                },
                "type": {
                        "type": "string"
                },
                "name": {
                        "type": "string"
                },
                "ppu": {
                        "type": "number",
                        "format": "percent"
                },
                "batters": {
                        "type": "array",
                        "items": {
                                "$ref": "#/definitions/batter"
                        }
                },
                "topping": {
                        "type": "array",
                        "items": {
                                "$ref": "#/definitions/topping"
                        }
                }
        },
        "definitions": {
                "batter": {
                        "type": "object",
                        "properties": {
                                "id": {
                                        "type": "string"
                                },
                                "type": {
                                        "type": "string"
                                },
                                "name": {
                                        "type": "string"
                                }
Databricks org

Increase max_new_tokens in the pipeline. I think the default is 256, and you will want to increase it to something much larger.
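A minimal sketch of that fix, assuming the `transformers` pipeline setup from the model card linked in the question. `build_dolly_pipeline` is a hypothetical helper; it assumes that extra keyword arguments such as `max_new_tokens` passed to `transformers.pipeline()` are forwarded to Dolly's custom pipeline, replacing its default. Because a truncated schema is not valid JSON, the hypothetical `generate_json` helper also shows one way to detect truncation: parse the output and retry with a doubled budget. Its `generate(prompt, max_new_tokens)` argument stands in for whatever model call you actually use (for example, the LangChain chain).

```python
import json
from typing import Any, Callable, Tuple


def build_dolly_pipeline(max_new_tokens: int = 1024):
    """Hypothetical helper: build the Dolly pipeline with a larger output budget.

    Assumes extra kwargs to transformers.pipeline() reach Dolly's custom
    pipeline. Not called below, because it downloads the full model.
    """
    import torch
    from transformers import pipeline

    return pipeline(
        model="databricks/dolly-v2-12b",
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,
        device_map="auto",
        max_new_tokens=max_new_tokens,
    )


def generate_json(
    generate: Callable[[str, int], str],
    prompt: str,
    start_tokens: int = 256,
    max_tokens: int = 2048,
) -> Tuple[Any, int]:
    """Hypothetical helper: retry with a doubled max_new_tokens until the
    output parses as JSON. Returns (parsed_json, budget_that_worked)."""
    budget = start_tokens
    while True:
        text = generate(prompt, budget)
        try:
            return json.loads(text), budget
        except json.JSONDecodeError:
            # Truncated output does not parse; give up once the cap is reached.
            if budget >= max_tokens:
                raise
            budget = min(budget * 2, max_tokens)
```

Doubling keeps the number of retries small. The 2048 cap reflects the base model's context length, and in practice the prompt also consumes part of that window, so the usable output budget is smaller.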

That worked, thanks!

srowen changed discussion status to closed

Is there a maximum length that the model can accept? I'm looking for documentation on this parameter. Thanks a lot!

Databricks org

It's 2048 tokens. There's an open issue about whether this is set correctly in the config, but that's the limit (the base model's context length is 2048).
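Since the prompt and the generated tokens share this window, the usable `max_new_tokens` shrinks as the input grows. A small sketch of that arithmetic, assuming the 2048 figure above; `CONTEXT_WINDOW`, `count_tokens`, and `max_new_tokens_for` are hypothetical names. Dolly's pipeline also wraps your text in its own instruction template before tokenizing, so the real prompt is somewhat longer than the raw text you count.

```python
# Hypothetical constant: the 2048-token context length quoted above.
# Prompt tokens and generated tokens both count against it.
CONTEXT_WINDOW = 2048


def count_tokens(text: str) -> int:
    """Count tokens with Dolly's tokenizer (downloads it; not called below)."""
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-12b")
    return len(tokenizer(text)["input_ids"])


def max_new_tokens_for(prompt_tokens: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Largest max_new_tokens that keeps prompt + output inside the window."""
    if prompt_tokens >= context_window:
        raise ValueError(
            f"prompt uses {prompt_tokens} tokens; it must be under {context_window}"
        )
    return context_window - prompt_tokens


# Example: a hypothetical 600-token prompt leaves room for 1448 new tokens.
print(max_new_tokens_for(600))  # 1448
```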
