YanshekWoo committed
Commit 02f0e9a · 1 Parent(s): 5a9366e

ADD history processor

Files changed (1)
  1. app.py +11 -6
app.py CHANGED
@@ -4,10 +4,10 @@ from transformers import BertTokenizer, BartForConditionalGeneration
 
 title = "HIT-TMG/dialogue-bart-large-chinese"
 description = """
-This is a seq2seq model fine-tuned on several Chinese dialogue datasets, from bart-large-chinese.
-See some details of model card at https://huggingface.co/HIT-TMG/dialogue-bart-large-chinese .
-Besides starting the conversation from scratch, you can also input the whole dialogue history utterance by utterance seperated by '[SEP]' (e.g. "可以认识一下吗[SEP]当然可以啦,你好。[SEP]嘿嘿你好,请问你最近在忙什么呢?[SEP]我最近养了一只狗狗,我在训练它呢。").
-Please be careful that the history utterance turn should be odd, since this demo begins from user instead of the chatbot.
+This is a seq2seq model fine-tuned on several Chinese dialogue datasets, from bart-large-chinese. \n
+See some details of model card at https://huggingface.co/HIT-TMG/dialogue-bart-large-chinese . \n\n
+Besides starting the conversation from scratch, you can also input the whole dialogue history utterance by utterance seperated by '[SEP]'. \n
+(e.g. "可以认识一下吗[SEP]当然可以啦,你好。[SEP]嘿嘿你好,请问你最近在忙什么呢?[SEP]我最近养了一只狗狗,我在训练它呢。") \n
 """
 
 
@@ -31,12 +31,17 @@ def chat_func(input_utterance: str, history: Optional[List[str]] = None):
                           truncation=True,
                           max_length=max_length).input_ids
 
-    output_ids = model.generate(input_ids)[0]
+    output_ids = model.generate(input_ids,
+                                max_new_tokens=30)[0]
     response = tokenizer.decode(output_ids, skip_special_tokens=True)
 
     history.append(response)
 
-    display_utterances = [(history[i], history[i + 1]) for i in range(0, len(history) - 1, 2)]
+
+    if len(history) % 2 == 0:
+        display_utterances = [(history[i], history[i + 1]) for i in range(0, len(history) - 1, 2)]
+    else:
+        display_utterances = [("", history[0])] + [(history[i], history[i + 1]) for i in range(1, len(history) - 1, 2)]
 
     return display_utterances, history
 
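
The "history processor" added by this commit is the odd/even branch above: it pairs the stored utterances into (user, bot) tuples for display. Below is a minimal standalone sketch of that pairing step, assuming the same alternating-utterance convention as chat_func; the pair_history helper name and the sample strings are illustrative and not part of the repository.

from typing import List, Tuple


def pair_history(history: List[str]) -> List[Tuple[str, str]]:
    """Pair alternating utterances into (user, bot) tuples, mirroring chat_func's display logic."""
    if len(history) % 2 == 0:
        # Even length: utterances pair off cleanly as user/bot turns.
        return [(history[i], history[i + 1]) for i in range(0, len(history) - 1, 2)]
    # Odd length: the first utterance has no user turn to pair with,
    # so show it with an empty user slot and pair the rest.
    return [("", history[0])] + [(history[i], history[i + 1]) for i in range(1, len(history) - 1, 2)]


if __name__ == "__main__":
    print(pair_history(["user turn 1", "bot reply 1"]))
    # -> [('user turn 1', 'bot reply 1')]
    print(pair_history(["bot opener", "user turn 1", "bot reply 1"]))
    # -> [('', 'bot opener'), ('user turn 1', 'bot reply 1')]

In a Gradio Chatbot component, which this Space appears to use, each such tuple renders as one user/bot exchange, so the empty-string padding keeps an unpaired leading utterance on the bot side of the widget.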