A common type of hallucination in RAG is **factual but hallucinated**.

For example, given the premise _"The capital of France is Berlin"_, the hypothesis _"The capital of France is Paris"_ is hallucinated, even though it is true according to world knowledge. This happens when an LLM does not generate content based on the textual data provided to it as part of the RAG retrieval process, but instead generates content based on its pre-trained knowledge.
## Using HHEM-2.1-Open with `transformers`

HHEM-2.1 introduces breaking changes relative to HHEM-1.0, so code written against HHEM-1.0 will no longer work. While we work on backward compatibility, please follow the new usage instructions below.
**Using with `Auto` class**

HHEM-2.1-Open can be loaded easily using the `transformers` library. Just remember to set `trust_remote_code=True` to take advantage of the pre-/post-processing code we provide for your convenience. The **input** of the model is a list of (premise, hypothesis) pairs. For each pair, the model **returns** a score between 0 and 1, where 0 means the hypothesis is not evidenced at all by the premise and 1 means it is fully supported by the premise.

```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    'vectara/hallucination_evaluation_model', trust_remote_code=True)

pairs = [  # each pair is (premise, hypothesis)
    ("The capital of France is Berlin.", "The capital of France is Paris."),
    ('I am in California', 'I am in United States.'),
    ('I am in United States', 'I am in California.'),
    ("A person on a horse jumps over a broken down airplane.", "A person is outdoors, on a horse."),
    ("A boy is jumping on skateboard in the middle of a red bridge.", "The boy skates down the sidewalk on a red bridge"),
    ("A man with blond-hair, and a brown shirt drinking out of a public water fountain.", "A blond man wearing a brown shirt is reading a book."),
    ("Mark Wahlberg was a fan of Manny.", "Manny was a fan of Mark Wahlberg.")
]

model.predict(pairs)  # note the predict() method. Do not do model(pairs).
# tensor([0.0111, 0.6474, 0.1290, 0.8969, 0.1846, 0.0050, 0.0543])
```
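To turn these scores into binary hallucinated/consistent decisions, you can simply threshold them. Below is a minimal sketch continuing from the snippet above; the 0.5 cutoff is an illustrative choice on our part, not an official recommendation:

```python
# Flag each pair whose consistency score falls below an assumed 0.5 cutoff.
scores = model.predict(pairs)
for (_, hypothesis), score in zip(pairs, scores.tolist()):
    label = "consistent" if score >= 0.5 else "hallucinated"
    print(f"{score:.4f}  {label}: {hypothesis}")
```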
**Using with `text-classification` pipeline**

Please note that when using the `text-classification` pipeline for prediction, scores for two labels are returned for each pair. The score for the **consistent** label is the one to focus on.
```python
from transformers import pipeline, AutoTokenizer

pairs = [
    ("The capital of France is Berlin.", "The capital of France is Paris."),
    ('I am in California', 'I am in United States.'),
    ('I am in United States', 'I am in California.'),
    ("A person on a horse jumps over a broken down airplane.", "A person is outdoors, on a horse."),
    ("A boy is jumping on skateboard in the middle of a red bridge.", "The boy skates down the sidewalk on a red bridge"),
    ("A man with blond-hair, and a brown shirt drinking out of a public water fountain.", "A blond man wearing a brown shirt is reading a book."),
    ("Mark Wahlberg was a fan of Manny.", "Manny was a fan of Mark Wahlberg.")
]

# Apply prompt to pairs
prompt = "<pad> Determine if the hypothesis is true given the premise?\n\nPremise: {text1}\n\nHypothesis: {text2}"
input_pairs = [prompt.format(text1=pair[0], text2=pair[1]) for pair in pairs]

# Use text-classification pipeline to predict
classifier = pipeline(
    "text-classification",
    model='vectara/hallucination_evaluation_model',
    tokenizer=AutoTokenizer.from_pretrained('google/flan-t5-base'),
    trust_remote_code=True
)
classifier(input_pairs, return_all_scores=True)

# output
# [[{'label': 'hallucinated', 'score': 0.9889384508132935},
#   {'label': 'consistent', 'score': 0.011061512865126133}],
#  [{'label': 'hallucinated', 'score': 0.35263675451278687},
#   {'label': 'consistent', 'score': 0.6473632454872131}],
#  [{'label': 'hallucinated', 'score': 0.870982825756073},
#   {'label': 'consistent', 'score': 0.1290171593427658}],
#  [{'label': 'hallucinated', 'score': 0.1030581071972847},
#   {'label': 'consistent', 'score': 0.8969419002532959}],
#  [{'label': 'hallucinated', 'score': 0.8153750896453857},
#   {'label': 'consistent', 'score': 0.18462494015693665}],
#  [{'label': 'hallucinated', 'score': 0.9949689507484436},
#   {'label': 'consistent', 'score': 0.005031010136008263}],
#  [{'label': 'hallucinated', 'score': 0.9456764459609985},
#   {'label': 'consistent', 'score': 0.05432349815964699}]]
```
You may run into a warning message saying "Token indices sequence length is longer than the specified maximum sequence length". Please ignore this warning for now; it is a notification inherited from the foundation model, T5-base.

Note that the order within a pair matters: the model judges whether the second text (the hypothesis) is supported by the first (the premise). For example, the 2nd and 3rd examples in the `pairs` list are consistent and hallucinated, respectively.
## HHEM-2.1-Open vs. HHEM-1.0