abdullahmeda committed: Update app.py
app.py CHANGED
@@ -116,7 +116,7 @@ def predict(text):
 
 with gr.Blocks() as demo:
     gr.Markdown(
-        """
+        """\
         ## Detect text generated using LLMs 🤖
 
         Linguistic features such as Perplexity and other SOTA methods such as GLTR were used to classify between Human written and LLM Generated \
@@ -124,31 +124,36 @@ with gr.Blocks() as demo:
 
         - Source & Credits: [https://github.com/Hello-SimpleAI/chatgpt-comparison-detection](https://github.com/Hello-SimpleAI/chatgpt-comparison-detection)
         - Competition: [https://www.kaggle.com/competitions/llm-detect-ai-generated-text/leaderboard](https://www.kaggle.com/competitions/llm-detect-ai-generated-text/leaderboard)
-        - Solution WriteUp: [https://www.kaggle.com/competitions/llm-detect-ai-generated-text/discussion/470224](https://www.kaggle.com/competitions/llm-detect-ai-generated-text/discussion/470224)
-
-        ### Linguistic Analysis: Language Model Perplexity
-        The perplexity (PPL) is commonly used as a metric for evaluating the performance of language models (LM). It is defined as the exponential \
-        of the negative average log-likelihood of the text under the LM. A lower PPL indicates that the language model is more confident in its \
-        predictions, and is therefore considered to be a better model. The training of LMs is carried out on large-scale text corpora, it can \
-        be considered that it has learned some common language patterns and text structures. Therefore, PPL can be used to measure how \
-        well a text conforms to common characteristics.
-
-        ### GLTR: Giant Language Model Test Room
-        This idea originates from the following paper: arxiv.org/pdf/1906.04043.pdf. It studies 3 tests to compute features of an input text. Their \
-        major assumption is that to generate fluent and natural-looking text, most decoding strategies sample high probability tokens from the head \
-        of the distribution. I selected the most powerful Test-2 feature, which is the number of tokens in the Top-10, Top-100, Top-1000, and 1000+ \
-        ranks from the LM predicted probability distributions.
-
-        ### Modelling
-        Scikit-learn's VotingClassifier consisting of XGBClassifier, LGBMClassifier, CatBoostClassifier and RandomForestClassifier with default parameters
+        - Solution WriteUp: [https://www.kaggle.com/competitions/llm-detect-ai-generated-text/discussion/470224](https://www.kaggle.com/competitions/llm-detect-ai-generated-text/discussion/470224)\
         """
     )
-
-
-
-
-
-
-
+    with gr.Column():
+        gr.Markdown(
+            """\
+            ### Linguistic Analysis: Language Model Perplexity
+            The perplexity (PPL) is commonly used as a metric for evaluating the performance of language models (LM). It is defined as the exponential \
+            of the negative average log-likelihood of the text under the LM. A lower PPL indicates that the language model is more confident in its \
+            predictions, and is therefore considered to be a better model. The training of LMs is carried out on large-scale text corpora, it can \
+            be considered that it has learned some common language patterns and text structures. Therefore, PPL can be used to measure how \
+            well a text conforms to common characteristics.
+
+            ### GLTR: Giant Language Model Test Room
+            This idea originates from the following paper: arxiv.org/pdf/1906.04043.pdf. It studies 3 tests to compute features of an input text. Their \
+            major assumption is that to generate fluent and natural-looking text, most decoding strategies sample high probability tokens from the head \
+            of the distribution. I selected the most powerful Test-2 feature, which is the number of tokens in the Top-10, Top-100, Top-1000, and 1000+ \
+            ranks from the LM predicted probability distributions.
+
+            ### Modelling
+            Scikit-learn's VotingClassifier consisting of XGBClassifier, LGBMClassifier, CatBoostClassifier and RandomForestClassifier with default parameters\
+            """
+        )
+    with gr.Group():
+        a1 = gr.Textbox( lines=7, label='Text', value=example )
+        button1 = gr.Button("🤖 Predict!")
+        gr.Markdown("Prediction:")
+        label1 = gr.Textbox(lines=1, label='Predicted Label')
+        score1 = gr.Textbox(lines=1, label='Predicted Probability')
+
+        button1.click(predict, inputs=[a1], outputs=[label1, score1])
 
 demo.launch()
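The description in the diff defines perplexity as the exponential of the average negative log-likelihood of the text under the language model. A minimal sketch of that computation, assuming a GPT-2 checkpoint from `transformers` (the model choice and the `perplexity` helper name are illustrative assumptions, not taken from app.py):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Assumed model: any causal LM works the same way; the Space's actual checkpoint may differ.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels makes the model return the mean negative
        # log-likelihood per predicted token as `loss`.
        out = model(**enc, labels=enc["input_ids"])
    # PPL = exp(average negative log-likelihood), as defined in the description.
    return torch.exp(out.loss).item()

print(perplexity("The quick brown fox jumps over the lazy dog."))
```

Here `exp(loss)` is exactly the PPL defined above, since the returned loss is the average negative log-likelihood over the predicted tokens.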
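The GLTR Test-2 feature counts how many of a text's tokens fall into the Top-10, Top-100, Top-1000, and 1000+ ranks of the LM's predicted next-token distribution. A rough sketch of those four counts, again assuming a GPT-2 causal LM (the function name and model are assumptions):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def gltr_test2_counts(text: str):
    """Return [top-10, top-100, top-1000, 1000+] rank counts for the text's tokens."""
    enc = tokenizer(text, return_tensors="pt")
    input_ids = enc["input_ids"][0]
    with torch.no_grad():
        logits = model(**enc).logits[0]  # (seq_len, vocab_size)
    counts = [0, 0, 0, 0]
    # The token at position i+1 is scored against the distribution predicted at position i.
    for i in range(len(input_ids) - 1):
        scores = logits[i]
        # Rank of the observed token = number of vocabulary entries scored higher.
        rank = int((scores > scores[input_ids[i + 1]]).sum())
        if rank < 10:
            counts[0] += 1
        elif rank < 100:
            counts[1] += 1
        elif rank < 1000:
            counts[2] += 1
        else:
            counts[3] += 1
    return counts
```

Together with PPL, these four counts give a small per-document feature vector for the classifier.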
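The Modelling section names scikit-learn's VotingClassifier over XGBClassifier, LGBMClassifier, CatBoostClassifier and RandomForestClassifier with default parameters. A sketch of that ensemble; soft voting is an assumption here (the text does not specify it), made because the UI reports a predicted probability, which requires `predict_proba`:

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

# Each row of X would hold the PPL value plus the four GLTR rank counts
# for one document; y is the human/LLM label.
ensemble = VotingClassifier(
    estimators=[
        ("xgb", XGBClassifier()),
        ("lgbm", LGBMClassifier()),
        ("cat", CatBoostClassifier(verbose=0)),  # verbose=0 only silences training logs
        ("rf", RandomForestClassifier()),
    ],
    voting="soft",  # assumption: soft voting exposes predict_proba for the probability output
)

# Hypothetical usage with precomputed features:
# ensemble.fit(X_train, y_train)
# label = ensemble.predict(features)
# probability = ensemble.predict_proba(features).max()
```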