{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import os; os.chdir('..')"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"s= '''Ah, spring. It's our season of hope, a time when nature stirs from its winter slumber. The days lengthen, the frost surrenders its grip, and a world once dormant reawakens. The air, once frigid and crisp, transforms, carrying the gentle fragrance of blooming flowers and fresh grass that invigorates the senses. Spring, a time of magic, sees nature's dormant forces burst forth in a vivid spectacle of colors and life.\n",
"\n",
"The trees, once bare, now bud, and delicate green leaves unfurl, creating a lush canopy overhead. Cherry blossoms, daffodils, and tulips paint gardens and parks with their vibrant palettes, infusing the landscape with joy. Birds, returning from their long migrations, fill the air with their'''"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['Ah, spring',\n",
" \"It's our season of hope, a time when nature stirs from its winter slumber\",\n",
" 'The days lengthen, the frost surrenders its grip, and a world once dormant reawakens',\n",
" 'The air, once frigid and crisp, transforms, carrying the gentle fragrance of blooming flowers and fresh grass that invigorates the senses',\n",
" \"Spring, a time of magic, sees nature's dormant forces burst forth in a vivid spectacle of colors and life\",\n",
" 'The trees, once bare, now bud, and delicate green leaves unfurl, creating a lush canopy overhead',\n",
" 'Cherry blossoms, daffodils, and tulips paint gardens and parks with their vibrant palettes, infusing the landscape with joy',\n",
" 'Birds, returning from their long migrations, fill the air with their']"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
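"import re\n",
"\n",
"def split_sentence(sentence: str) -> list[str]:\n",
"    \"\"\"Split text into sentence-like parts on '. ', '.' and ':'.\"\"\"\n",
"    # Drop newlines so paragraph breaks do not end up inside a part\n",
"    sentence = sentence.replace('\\n', '')\n",
"    separators = ['. ', '.', ':']\n",
"\n",
"    # Create a regular expression pattern from the list of separators;\n",
"    # '. ' is listed before '.' so the trailing space is consumed too\n",
"    pattern = '|'.join(map(re.escape, separators))\n",
"\n",
"    # Split the sentence using the pattern as a delimiter\n",
"    parts = re.split(pattern, sentence)\n",
"\n",
"    return parts\n",
"\n",
"split_sentence(s)\n"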
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/ubuntu/SentenceStructureComparision/venv/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
" from .autonotebook import tqdm as notebook_tqdm\n",
"Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n"
]
}
],
"source": [
"from transformers import AutoTokenizer\n",
"\n",
"tokenizer = AutoTokenizer.from_pretrained(\"gpt3_finetuned_model/checkpoint-30048\")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"tokenizer = AutoTokenizer.from_pretrained(\"gpt2-large\")\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def calculate_burst(list_of_sentences):\n",
"    # Token count of each sentence under the currently loaded tokenizer\n",
"    arr = []\n",
"    for i in list_of_sentences:\n",
"        ei = tokenizer(i, return_tensors=\"pt\")\n",
"        arr.append(ei.input_ids.size(1))\n",
"    # Spread of the sentence lengths (population variance and std, ddof=0)\n",
"    print(f\"arr= {(arr)}\")\n",
"    print(f'variance: {np.var(np.array(arr))}')\n",
"    print(f'std: {np.std(np.array(arr))}')\n",
"    print(f'average length: {np.average(np.array(arr))}')"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['Ah, spring',\n",
" \"It's our season of hope, a time when nature stirs from its winter slumber\",\n",
" 'The days lengthen, the frost surrenders its grip, and a world once dormant reawakens',\n",
" 'The air, once frigid and crisp, transforms, carrying the gentle fragrance of blooming flowers and fresh grass that invigorates the senses',\n",
" \"Spring, a time of magic, sees nature's dormant forces burst forth in a vivid spectacle of colors and life\",\n",
" 'The trees, once bare, now bud, and delicate green leaves unfurl, creating a lush canopy overhead',\n",
" 'Cherry blossoms, daffodils, and tulips paint gardens and parks with their vibrant palettes, infusing the landscape with joy',\n",
" 'Birds, returning from their long migrations, fill the air with their']"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"list_of_sentences= split_sentence(s)\n",
"list_of_sentences"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"arr= [3, 18, 21, 28, 22, 21, 29, 15]\n",
"variance: 58.484375\n",
"std: 7.647507763971214\n",
"average length: 19.625\n"
]
}
],
"source": [
"calculate_burst(list_of_sentences)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Model Prediction\n"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n"
]
}
],
"source": [
"import torch\n",
"from transformers import AutoModelForSequenceClassification, AutoTokenizer\n",
"\n",
"# Reload the fine-tuned tokenizer (replacing the gpt2-large one used above) together with its classifier\n",
"tokenizer = AutoTokenizer.from_pretrained(\"gpt3_finetuned_model/checkpoint-30048\")\n",
"model = AutoModelForSequenceClassification.from_pretrained(\"gpt3_finetuned_model/checkpoint-30048\")\n"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
"from torch.nn import functional as F\n",
"\n",
"def predict(sentence):\n",
"    inputs = tokenizer(sentence, return_tensors=\"pt\")\n",
"    # Inference only, so skip gradient tracking\n",
"    with torch.no_grad():\n",
"        logits = model(**inputs).logits\n",
"\n",
"    print(\"logits: \", logits)\n",
"    predicted_class_id = logits.argmax().item()\n",
"    # Convert the logits to probabilities with softmax, then to a NumPy array\n",
"    probabilities_scores = F.softmax(logits, dim=-1).numpy()[0]\n",
"    # Index 0 is the NEGATIVE (human) class, index 1 the POSITIVE (AI) class\n",
"    print(\"P(Human): \", probabilities_scores[0])\n",
"    print(\"P(AI): \", probabilities_scores[1])\n",
"    label = \"Human Written\" if model.config.id2label[predicted_class_id] == 'NEGATIVE' else 'AI written'\n",
"    print(\"Label: \", label)\n",
"    print(model.config.id2label[predicted_class_id])\n"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"logits: tensor([[-7.7618, 7.7867]])\n",
"P(Human): 1.7674812e-07\n",
"P(AI): 0.9999999\n",
"Label: AI written\n",
"POSITIVE\n"
]
}
],
"source": [
"predict('''The Flash (or simply Flash) is the name of several superheroes in the DC Comics universe. Each iteration of the character possesses superhuman speed, allowing them to move at incredible velocities, run on water, phase through solid objects, and even time travel. The most iconic Flash is Barry Allen, who first appeared in 1956 and became the Scarlet Speedster known for his distinctive red costume with a lightning bolt emblem.\n",
"\n",
"Barry Allen's origin story involves a lightning strike combined with a chemical accident, granting him his incredible speed powers. He adopts the superhero persona of The Flash to fight crime in Central City. His adventures often revolve around thwarting supervillains and metahuman threats, while also serving as a founding member of the Justice League.\n",
"\n",
"''')"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"logits: tensor([[ 8.0190, -7.4839]])\n",
"P(Human): 0.99999976\n",
"P(AI): 1.8500727e-07\n",
"Label: Human Written\n",
"NEGATIVE\n"
]
}
],
"source": [
"predict(\n",
" '''The Flash first appeared in the Golden Age Flash Comics #1 (January 1940), from All-American Publications, one of three companies that would eventually merge to form DC Comics. Created by writer Gardner Fox and artist Harry Lampert, this Flash was Jay Garrick, a college student who gained his speed through the inhalation of hard water vapors. When re-introduced in the 1960s Garrick's origin was modified slightly, gaining his powers through exposure to heavy water.\n",
"\n",
"Jay Garrick was a popular character in the 1940s, supporting both Flash Comics and All-Flash Quarterly (later published bi-monthly as simply All-Flash); co-starring in Comic Cavalcade; and being a charter member of the Justice Society of America, the first superhero team, whose adventures ran in All Star Comics. With superheroes' post-war decline in popularity, Flash Comics was canceled with issue #104 (1949) which featured an evil version of the Flash called the Rival. The Justice Society's final Golden Age story ran in All Star Comics #57 (1951; the title itself continued as All Star Western).'''\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"logits: tensor([[ 7.9124, -7.3888]])\n",
"P(Human): 0.99999976\n",
"P(AI): 2.2633e-07\n",
"Label: Human Written\n",
"NEGATIVE\n"
]
}
],
"source": [
"predict(\n",
" '''Virat Kohli (Hindi pronunciation: [ʋɪˈɾɑːʈ ˈkoːɦli] ⓘ; born 5 November 1988) is an Indian international cricketer and the former captain of the Indian national cricket team who plays for Royal Challengers Bangalore in the IPL and Delhi in domestic cricket. Considered to be one of the best cricketers in the world, he is widely regarded as one of the greatest batsmen in the history of the sport.[4] Nicknamed \"The King\", due to his dominant style of play and popularity, Kohli holds numerous records in his career across all formats. In 2020, the International Cricket Council named him the male cricketer of the decade. Kohli has also contributed to India's successes, captaining the team from 2014 to 2022, and winning the 2011 World Cup and the 2013 Champions trophy. He is among the only four Indian cricketers who have played over 500 matches for India.[5]\n",
"\n",
"Born and raised in New Delhi, Kohli trained at the West Delhi Cricket Academy and started his youth career with the Delhi Under-15 team. He made his international debut in 2008 and quickly became a key player in the ODI team and later made his Test debut in 2011. In 2013, Kohli reached the number one spot in the ICC rankings for ODI batsmen for the first time. During 2014 T20 World Cup, he set a record for the most runs scored in the tournament. In 2018, he achieved yet another milestone, becoming the world's top-ranked Test batsman, making him the only Indian cricketer to hold the number one spot in all three formats of the game. His form continued in 2019, when he became the first player to score 20,000 international runs in a single decade. In 2021, Kohli made the decision to step down as the captain of the Indian national team for T20Is, following the T20 World Cup and in early 2022 he stepped down as the captain of the Test team as well.'''\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"logits: tensor([[-8.4224, 8.2709]])\n",
"P(Human): 5.6263374e-08\n",
"P(AI): 1.0\n",
"Label: AI written\n",
"POSITIVE\n"
]
}
],
"source": [
"predict(\n",
" '''Virat Kohli is an Indian cricketing sensation who has left an indelible mark on the world of sports. Born in Delhi, India, Kohli's journey from a young aspiring cricketer to becoming one of the greatest batsmen in the history of the game is nothing short of remarkable.\n",
"\n",
"Kohli's cricketing prowess was evident from a tender age, and he quickly rose through the ranks of junior cricket in India. He made his debut for the Indian national team in 2008, and since then, he has been a symbol of consistency and excellence. His distinctive blend of aggression and technical finesse at the crease has earned him a reputation as a modern-day batting maestro.'''\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
}