What are the text size limits for the input to and output from Dolly?
I've set up a simple LangChain LLMChain (per the instructions in https://huggingface.co/databricks/dolly-v2-12b#langchain-usage), and I've been experimenting with asking Dolly to generate a JSON schema from some input JSON data. It works well on a small example, but when I give it a larger input the response comes back truncated. What are the default input/output token limits, and can they be changed via parameters?
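For reference, my setup is roughly the minimal chain from the model card (a sketch; the pipeline arguments follow the linked instructions):

```python
import torch
from transformers import pipeline

# Build the Dolly text-generation pipeline as shown in the model card.
generate_text = pipeline(
    model="databricks/dolly-v2-12b",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
    return_full_text=True,  # LangChain expects the full generated text back
)

from langchain import LLMChain, PromptTemplate
from langchain.llms import HuggingFacePipeline

# Wrap the pipeline for LangChain and build a one-variable prompt chain.
prompt = PromptTemplate(input_variables=["instruction"], template="{instruction}")
llm_chain = LLMChain(llm=HuggingFacePipeline(pipeline=generate_text), prompt=prompt)
```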
This works fine:
Prompt>
Create a JSON schema for the following data:
{"title": "The Hitchhiker's Guide to the Galaxy", "author": "Douglas Adams", "published": 1979}
{"title": "The Restaurant at the End of the Universe", "author": "Douglas Adams", "published": 1980}
Response (10s) >
{
"$schema": "http://json-schema.org/draft/20191222/schema",
"type": "object",
"properties": {
"title": {"type": "string"},
"author": {"type": "string"},
"published": {"type": "string"}
},
"required": ["title", "author", "published"]
}
But when I provide a much more complex JSON example, the response comes back truncated:
Prompt>
Create a JSON schema for the following data:
[
{
"id": "0001",
"type": "donut",
"name": "Cake",
"ppu": 0.55,
"batters":
{
"batter":
[
{ "id": "1001", "type": "Regular" },
{ "id": "1002", "type": "Chocolate" },
{ "id": "1003", "type": "Blueberry" },
{ "id": "1004", "type": "Devil's Food" }
]
},
"topping":
[
{ "id": "5001", "type": "None" },
{ "id": "5002", "type": "Glazed" },
{ "id": "5005", "type": "Sugar" },
{ "id": "5007", "type": "Powdered Sugar" },
{ "id": "5006", "type": "Chocolate with Sprinkles" },
{ "id": "5003", "type": "Chocolate" },
{ "id": "5004", "type": "Maple" }
]
},
{
"id": "0002",
"type": "donut",
"name": "Raised",
"ppu": 0.55,
"batters":
{
"batter":
[
{ "id": "1001", "type": "Regular" }
]
},
"topping":
[
{ "id": "5001", "type": "None" },
{ "id": "5002", "type": "Glazed" },
{ "id": "5005", "type": "Sugar" },
{ "id": "5003", "type": "Chocolate" },
{ "id": "5004", "type": "Maple" }
]
},
{
"id": "0003",
"type": "donut",
"name": "Old Fashioned",
"ppu": 0.55,
"batters":
{
"batter":
[
{ "id": "1001", "type": "Regular" },
{ "id": "1002", "type": "Chocolate" }
]
},
"topping":
[
{ "id": "5001", "type": "None" },
{ "id": "5002", "type": "Glazed" },
{ "id": "5003", "type": "Chocolate" },
{ "id": "5004", "type": "Maple" }
]
}
]
Response (24s)>
{
"type": "object",
"properties": {
"id": {
"type": "string"
},
"type": {
"type": "string"
},
"name": {
"type": "string"
},
"ppu": {
"type": "number",
"format": "percent"
},
"batters": {
"type": "array",
"items": {
"$ref": "#/definitions/batter"
}
},
"topping": {
"type": "array",
"items": {
"$ref": "#/definitions/topping"
}
}
},
"definitions": {
"batter": {
"type": "object",
"properties": {
"id": {
"type": "string"
},
"type": {
"type": "string"
},
"name": {
"type": "string"
}
Increase max_new_tokens in the pipeline. I think the default is 256, and you will want to increase it to something much larger.
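Concretely, something like this (a sketch; 1024 is just an example value, and the total of prompt plus output still has to fit in the model's context window):

```python
import torch
from transformers import pipeline

generate_text = pipeline(
    model="databricks/dolly-v2-12b",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
    max_new_tokens=1024,  # default is 256, which is why longer schemas get cut off
)
```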
That worked, thanks!
Is there a maximum input length that the model can accept? I'm looking for the documentation on this parameter. Thanks a lot!
It's 2048 tokens. There's an open issue about whether this is set correctly in the model config, but that's the limit (the base model's context length is 2048).
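If you'd rather check programmatically than rely on the docs, the context length should be readable from the hosted model config (a sketch; for GPT-NeoX-family models like dolly-v2 the relevant field is `max_position_embeddings`):

```python
from transformers import AutoConfig

# Fetch the config (no model weights) and read the context window.
config = AutoConfig.from_pretrained("databricks/dolly-v2-12b")
print(config.max_position_embeddings)  # expected to be 2048 per this thread
```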