How do I make the model output JSON?

#14
by vmajor - opened

Is it just an instruction or is there a particular prompt syntax that I need to follow? I am using a GGUF version if that matters.

Currently I am using it with pydantic but sometimes the output cannot be parsed correctly.
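For anyone hitting the same parse failures: a common cause is the model wrapping the JSON in extra prose or a code fence. Below is a minimal sketch, using only the standard library, that pulls the first JSON object out of a reply before it is handed to a validator such as Pydantic. The function name and the sample reply are made up for illustration, and this only works around stray text; it does not repair JSON that is itself malformed.

```python
import json


def extract_first_json_object(text: str) -> dict:
    """Return the first JSON object embedded in `text`, ignoring surrounding prose."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # tolerates trailing characters after it.
            value, _end = decoder.raw_decode(text, start)
            return value
        except json.JSONDecodeError:
            # This brace did not start a valid object; try the next one.
            start = text.find("{", start + 1)
    raise ValueError("no JSON object found in model output")


# Hypothetical model reply with chatter around the payload.
reply = 'Sure, here is the result: {"name": "Acme", "employees": 12} Anything else?'
data = extract_first_json_object(reply)
```

The resulting dict can then be passed to whatever schema validation you already use.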

I believe most setups that produce valid JSON 100% of the time rely on guided decoding (in addition to telling the model to generate JSON).

If you are using Ollama, there is a json mode: https://github.com/ollama/ollama/blob/main/docs/api.md#request-json-mode and the example scripts are at https://github.com/ollama/ollama/tree/main/examples/python-json-datagenerator
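As a rough sketch of what the JSON mode request looks like from Python with only the standard library: the payload sets "format" to "json" on the /api/generate endpoint. The URL assumes a default local Ollama install, and the helper names and model tag are placeholders of mine, so adjust them to your setup. The linked docs also recommend still asking for JSON in the prompt itself.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_json_mode_request(model: str, prompt: str) -> dict:
    """Build a non-streaming /api/generate payload with JSON mode enabled."""
    return {
        "model": model,
        "prompt": prompt,
        "format": "json",   # ask Ollama to constrain the output to JSON
        "stream": False,    # return one complete response object
    }


def generate_json(model: str, prompt: str, url: str = OLLAMA_URL) -> dict:
    """Send the request and parse the model's JSON reply (needs a running server)."""
    body = json.dumps(build_json_mode_request(model, prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        reply = json.loads(response.read())
    # The generated text is in the "response" field and should itself be JSON.
    return json.loads(reply["response"])


# Placeholder model tag; use whatever `ollama list` shows on your machine.
payload = build_json_mode_request(
    "qwen2.5:7b-instruct", "List three colors. Respond using JSON."
)
```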

Just make it output YAML instead! You save money on tokens and avoid the headache of parsing broken JSON from any LLM.

YAML? I have never even seen YAML mentioned as a possibility. Is this a serious suggestion or just something to consume more of my time? How do I get it to output YAML? Just by asking it nicely, or is there a "proper" way?

Could you please share sample code showing how you make it work with Pydantic? And which framework are you using to serve the model?

I cannot get it to work with TGI (model loads fine).

Actually, I no longer use pydantic. I instead consulted the documentation (lol) and now have this:

response = self.client.chat.completions.create(
    model="gpt-3.5-turbo-1106",
    messages=[
        {"role": "system", "content": "You are a professional business researcher analyzing manufacturer websites."},
        {"role": "user", "content": prompt}
    ],
    response_format={"type": "json_object"},
    temperature=0.1
)

I use llama.cpp to run the model with this command:

./llama-server -m ~models/qwen2.5-7b-instruct-q6_k-00001-of-00002.gguf -c 0 --mirostat 2 -fa -j {}

You can add your own flags for GPU offloading and other performance tuning.

Thank you very much. This solution also worked for me before; it does output JSON. But if I want to use Pydantic, I cannot get it to work with TGI.
