{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "807c463e-cd0f-4ffb-b974-b19a33a674bb",
   "metadata": {
    "tags": []
   },
   "source": [
    "# Demo of the KAILAS UAT labeler capabilities\n",
    "This notebooks shows how to use KAILAS to automatically tag text with Unified Astronomy Thesaurus concepts"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1fbf6cb4-1110-4883-9ffd-d39d81e4301a",
   "metadata": {},
   "source": [
    "## Preliminaries\n",
    "1. load UAT concepts\n",
    "2. load a dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "9bc5eaae-4e37-425f-9a0e-b6a7e8eca3e0",
   "metadata": {},
   "outputs": [],
   "source": [
    "# We need to build version of the UAT that is more suited to our needs\n",
    "# Download the original here: https://github.com/astrothesaurus/UAT/blob/master/UAT.json\n",
    "# and replace the path to it below\n",
    "# build the UAT labels dict\n",
    "import json\n",
    "with open('../data/UAT/UAT_list.json', 'r') as f:\n",
    "    uat_list = json.load(f)\n",
    "\n",
    "# build the dict that matches UAT ID (numbers) to common names\n",
    "uat_names = {}\n",
    "for entry in uat_list:\n",
    "    uat_id = entry['uri'].split('/')[-1]\n",
    "    uat_names[uat_id] = entry['name'].lower().strip()\n",
    "\n",
    "# sort by key\n",
    "uat_names = dict(sorted(uat_names.items()))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "688bdebb-f711-49c7-8c1b-f9ede9529ce8",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load the open dataset\n",
    "from datasets import load_dataset\n",
    "uat_dataset = load_dataset('adsabs/SciX_UAT_keywords')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "d4f2fd73-9cd1-44cb-bf7a-df3c91e19509",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "DatasetDict({\n",
       "    val: Dataset({\n",
       "        features: ['bibcode', 'title', 'abstract', 'verified_uat_ids', 'verified_uat_labels'],\n",
       "        num_rows: 3025\n",
       "    })\n",
       "    train: Dataset({\n",
       "        features: ['bibcode', 'title', 'abstract', 'verified_uat_ids', 'verified_uat_labels'],\n",
       "        num_rows: 18677\n",
       "    })\n",
       "})"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Hugginface datasets can be interface both by rows (int) or columns (str)\n",
    "uat_dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c2a9c73b-1d0b-4c8b-92e5-d964b8191003",
   "metadata": {},
   "source": [
    "## Main Demo\n",
    "1. create the prediction pipeline\n",
    "2. make your predictions\n",
    "3. format predictions for readability"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6f7e90b5-8aa0-4272-9c3c-ff365c95a0f9",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# create the pipeline\n",
    "\n",
    "from transformers import pipeline, AutoTokenizer\n",
    "\n",
    "model_path = 'adsabs/KAILAS'\n",
    "revision = None\n",
    "\n",
    "# sentiment-analysis means loading ModelForSequenceClassification\n",
    "pipe = pipeline(task='sentiment-analysis',\n",
    "                model=model_path,\n",
    "                tokenizer=AutoTokenizer.from_pretrained(model_path, \n",
    "                                                        model_max_length=512, \n",
    "                                                        do_lower_case=False,\n",
    "                                                       ),\n",
    "                revision=revision,\n",
    "                num_workers=1,\n",
    "                batch_size=32,\n",
    "                return_all_scores=True,\n",
    "                truncation=True,\n",
    "               )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "bf6071ce-692c-4870-a0c1-1d8249614227",
   "metadata": {},
   "outputs": [],
   "source": [
    "# custom top_k function \n",
    "import heapq\n",
    "\n",
    "def top_k_scores(scores, k):\n",
    "    return(heapq.nlargest(k, scores, key=lambda x: x['score']) )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "1ae70736-19fd-42c7-bd5f-7491e6a97cd8",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "BIBCODE: 2022ApJ...933..110X\n",
      "\tTITLE: Spatially Resolved Ionized Outflows Extending to   2 kpc in Seyfert 1 Galaxy NGC 7469 Revealed by the Very Large Telescope/MUSE\n",
      "\tABSTRACT: The Seyfert 1 galaxy NGC 7469 possesses a prominent nuclear starburst ring and a luminous active galactic nucleus (AGN). Evidence of an outflow in the innermost nuclear region has been found in previous works. We detect the ionized gas outflow on a larger scale in the galaxy using the archival Very Large Telescope/MUSE and Chandra observations. The optical emission lines are modeled using two Gaussian components, and a nonparametric approach is applied to measure the kinematics of [O III] and Hα emitting gas. Line ratio diagnostics and spatially resolved maps are derived to examine the origin of the outflow. The kiloparsec-scale kinematics of [O III] are dominated by a blueshifted component whereas the velocity map of Hα shows a rotational disk with a complex nonrotational substructure. The starburst wind around the circumnuclear ring is confirmed, and we find evidence of an AGN-driven outflow extending to a radial distance of ~ 2 kpc from the nucleus, with a morphology consistent with a nearly face-on ionization cone. The previously reported circumnuclear outflow resembles part of the bright base. We derive mass and energy outflow rates for both the starburst wind and the AGN-driven outflow. The estimated kinetic coupling efficiency of the kiloparsec-scale AGN outflow is ${\\dot{E}}_{\\mathrm{out}}/{L}_{\\mathrm{bol}}\\sim 0.1 \\% $ Ėout/Lbol~0.1% , lower than the threshold predicted by the \"two-stage\" theoretical model for effective feedback. Our results reinforce the importance of spatially resolved study to disentangle feedback where AGNs and starbursts coexist, which may be common during the cosmic noon of black hole and galaxy growth.\n",
      "\n",
      "AUTHOR ASSIGNED: ['seyfert galaxies', 'luminous infrared galaxies', 'interstellar medium wind']\n",
      "MODEL ASSIGNED: [('galactic winds', '0.9747'), ('active galactic nuclei', '0.4349'), ('starburst galaxies', '0.3764'), ('galaxy evolution', '0.2259')]\n",
      "\n",
      "NEXT SCORES [('stellar feedback', '0.0507'), ('agn host galaxies', '0.0381'), ('star formation', '0.0153'), ('galaxy interactions', '0.0127'), ('galaxy kinematics', '0.0120'), ('active galaxies', '0.0085'), ('interstellar medium', '0.0061'), ('galaxy winds', '0.0045'), ('seyfert galaxies', '0.0020'), ('hydrodynamical simulations', '0.0018'), ('compact galaxies', '0.0018')]\n",
      "\n",
      "\n",
      "BIBCODE: 2023ApJ...958...52T\n",
      "\tTITLE: X-Ray Spectral Variations of Circinus X-1 Observed with NICER throughout an Entire Orbital Cycle\n",
      "\tABSTRACT: Circinus X-1 (Cir X-1) is a neutron star binary with an elliptical orbit of 16.6 days. The source is unique for its extreme youth, providing a key to understanding early binary evolution. However, its X-ray variability is too complex to reach a clear interpretation. We conducted the first high-cadence (every 4 hr, on average) observations covering one entire orbit using the NICER X-ray telescope. The X-ray flux behavior can be divided into stable, dip, and flaring phases. The X-ray spectra in all phases can be described by a common model consisting of a partially covered disk blackbody emission and the line features from a highly ionized photoionized plasma. The spectral change over the orbit is attributable to rapid changes of the partial covering medium in the line of sight and gradual changes of the disk blackbody emission. Emission lines of H- and He-like Mg, Si, S, and Fe are detected, most prominently in the dip phase. The Fe emission lines change to absorption in the course of the transition from the dip phase to the flaring phase. The estimated ionization degree indicates no significant changes, suggesting that the photoionized plasma is stable over the orbit. We propose a simple model in which the disk blackbody emission is partially blocked by a local medium in the line of sight that has spatial structures depending on the azimuth of the accretion disk. Emission lines upon the continuum emission are from the photoionized plasma located outside of the blocking material.\n",
      "\n",
      "AUTHOR ASSIGNED: ['x-ray binary stars', 'atomic spectroscopy', 'spectroscopy', 'ionization', 'plasma astrophysics', 'high energy astrophysics']\n",
      "MODEL ASSIGNED: [('x-ray astronomy', '0.9219'), ('high mass x-ray binary stars', '0.3562')]\n",
      "\n",
      "NEXT SCORES [('x-ray binary stars', '0.1496'), ('x-ray sources', '0.0848'), ('polarimetry', '0.0595'), ('black hole physics', '0.0335'), ('radio jets', '0.0133'), ('x-ray active galactic nuclei', '0.0117'), ('active galactic nuclei', '0.0066'), ('non-thermal radiation sources', '0.0048'), ('x-ray observatories', '0.0044'), ('x-ray detectors', '0.0037'), ('symbiotic binary stars', '0.0032'), ('extragalactic astronomy', '0.0031'), ('spectroscopy', '0.0026'), ('gamma-ray bursts', '0.0022'), ('high energy astrophysics', '0.0021'), ('quiet solar corona', '0.0020'), ('astrophysical black holes', '0.0017'), ('spectropolarimetry', '0.0015')]\n",
      "\n",
      "\n",
      "BIBCODE: 2024AJ....167...64B\n",
      "\tTITLE: VLTI/GRAVITY Provides Evidence the Young, Substellar Companion HD 136164 Ab Formed Like a \"Failed Star\"\n",
      "\tABSTRACT: Young, low-mass brown dwarfs orbiting early-type stars, with low mass ratios (q ≲ 0.01), appear to be intrinsically rare and present a formation dilemma: could a handful of these objects be the highest-mass outcomes of \"planetary\" formation channels (bottom up within a protoplanetary disk), or are they more representative of the lowest-mass \"failed binaries\" (formed via disk fragmentation or core fragmentation)? Additionally, their orbits can yield model-independent dynamical masses, and when paired with wide wavelength coverage and accurate system age estimates, can constrain evolutionary models in a regime where the models have a wide dispersion depending on the initial conditions. We present new interferometric observations of the 16 Myr substellar companion HD 136164 Ab (HIP 75056 Ab) made with the Very Large Telescope Interferometer (VLTI)/GRAVITY and an updated orbit fit including proper motion measurements from the Hipparcos-Gaia Catalog of Accelerations. We estimate a dynamical mass of 35 ± 10 M <SUB>J</SUB> (q ~ 0.02), making HD 136164 Ab the youngest substellar companion with a dynamical mass estimate. The new mass and newly constrained orbital eccentricity (e = 0.44 ± 0.03) and separation (22.5 ± 1 au) could indicate that the companion formed via the low-mass tail of the initial mass function. Our atmospheric fit to a SPHINX M-dwarf model grid suggests a subsolar C/O ratio of 0.45 and 3 × solar metallicity, which could indicate formation in a circumstellar disk via disk fragmentation. Either way, the revised mass estimate likely excludes bottom-up formation via core accretion in a circumstellar disk. HD 136164 Ab joins a select group of young substellar objects with dynamical mass estimates; epoch astrometry from future Gaia data releases will constrain the dynamical mass of this crucial object further.\n",
      "\n",
      "AUTHOR ASSIGNED: ['brown dwarfs', 'substellar companion stars', 'orbit determination', 'orbits']\n",
      "MODEL ASSIGNED: [('radial velocity', '0.8792'), ('exoplanets', '0.8334'), ('exoplanet detection methods', '0.5305'), ('exoplanet dynamics', '0.3328'), ('extrasolar gaseous giant planets', '0.2636')]\n",
      "\n",
      "NEXT SCORES [('direct imaging', '0.0761'), ('transit photometry', '0.0569'), ('exoplanet atmospheres', '0.0440'), ('exoplanet evolution', '0.0425'), ('exoplanet structure', '0.0157'), ('brown dwarfs', '0.0146'), ('exoplanet systems', '0.0138'), ('solar-terrestrial interactions', '0.0083'), ('atmospheric composition', '0.0079'), ('natural satellites (extrasolar)', '0.0056'), ('interferometers', '0.0056'), ('gaussian processes regression', '0.0049'), ('exoplanet atmospheric variability', '0.0043'), ('exoplanet atmospheric composition', '0.0028'), ('astrometry', '0.0023'), ('interplanetary magnetic fields', '0.0019'), ('solar analogs', '0.0016')]\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# MAIN DEMO \n",
    "# pick some samples from our dataset\n",
    "# this is a list of strings\n",
    "\n",
    "num_pred = 3\n",
    "start = 510\n",
    "\n",
    "temp_dataset = uat_dataset['val'][start:start+num_pred]\n",
    "sentences = [str(t)+' '+str(a) for t,a in zip(temp_dataset['title'],\n",
    "                                      temp_dataset['abstract'])\n",
    "             if t\n",
    "            ]\n",
    "\n",
    "# make predictions\n",
    "all_sentence_scores = pipe(sentences)\n",
    "\n",
    "# we need to change the output of the model to strings to make it compatible with the next version.\n",
    "# it's best to think of the outputs as labels anyways, not as integers\n",
    "all_sentence_scores = [[{'label':str(s['label']), 'score':s['score']} for s in sample_scores] for sample_scores in all_sentence_scores]\n",
    "\n",
    "# format for readability, and show top k scores\n",
    "threshold = 0.15\n",
    "top_sentence_scores = [[ {'label':uat_names[l['label']], 'score':l['score']} \n",
    "                        for l in top_k_scores(s, k=1000) if l['score']>=threshold] \n",
    "                       for s in all_sentence_scores]\n",
    "\n",
    "next_sentence_scores = [[ {'label':uat_names[l['label']], 'score':l['score']} \n",
    "                        for l in top_k_scores(s, k=1000) if l['score']<=threshold and  l['score']>=0.01*threshold] \n",
    "                       for s in all_sentence_scores]\n",
    "\n",
    "for i in range(min(10,num_pred)):\n",
    "    print('BIBCODE:', temp_dataset['bibcode'][i])\n",
    "    print('\\tTITLE:', temp_dataset['title'][i])\n",
    "    print('\\tABSTRACT:', temp_dataset['abstract'][i])\n",
    "    print()\n",
    "    print('AUTHOR ASSIGNED:', temp_dataset['verified_uat_labels'][i])\n",
    "\n",
    "    if len(top_sentence_scores[i])>0:\n",
    "        print('MODEL ASSIGNED:', [(x['label'], '{:.4f}'.format(x['score'])) for x in top_sentence_scores[i]] )\n",
    "        print()\n",
    "        print('NEXT SCORES', [(x['label'], '{:.4f}'.format(x['score'])) for x in next_sentence_scores[i]] )\n",
    "    \n",
    "    print() \n",
    "    print()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "9d90c6a5-7a0a-4762-ae5d-33523bd6bc9a",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Note: truncation is in effect and long sentences will only take into account the first 512 tokens\n",
    "sentences = [' '.join(['1' for i in range(j) ]) for j in range(505,515)]\n",
    "sentence_scores = pipe(sentences)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "e4b80a73-577a-464c-add2-2356c3adff18",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[(505, [{'label': 2189, 'score': 0.8530406355857849}]),\n",
       " (506, [{'label': 2189, 'score': 0.8583861589431763}]),\n",
       " (507, [{'label': 2189, 'score': 0.8545922040939331}]),\n",
       " (508, [{'label': 2189, 'score': 0.8484249114990234}]),\n",
       " (509, [{'label': 2189, 'score': 0.8524807095527649}]),\n",
       " (510, [{'label': 2189, 'score': 0.8559486269950867}]),\n",
       " (511, [{'label': 2189, 'score': 0.8559486865997314}]),\n",
       " (512, [{'label': 2189, 'score': 0.8559486269950867}]),\n",
       " (513, [{'label': 2189, 'score': 0.8559486269950867}]),\n",
       " (514, [{'label': 2189, 'score': 0.8559486865997314}])]"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "[(len(sent.split()), top_k_scores(scores, k=1)) for sent, scores in zip(sentences,sentence_scores)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "0cd3c521-b25e-4768-b3f8-14f28d9c9c48",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Demo with manually input  astro sentences\n",
    "sentences = ['This work discusses a junction-less nanowire tunnel field effect transistor (JLN-TFET) that combines the advantages of a junction-less field effect transistor (JLFET) and a tunnel field effect transistor (TFET). With a hetero-structure device made of silicon (Si) and germanium (Ge), an amalgamation of gate engineering and channel engineering is investigated. To eliminate junctions in the structure, a uniformly high dosage of doping (1019cm-3) has been employed throughout. In contrast to the source work function, which is set at 5.93 eV, the gate work function is set at 4.5 eV. When compared to junction less nanowire tunnel FET (JLTFET), the modified gate-all-around hetero junction less nanowire tunnel field effect transistor (GAA-H-JLNTFET) performs better. The proposed structure GAA-H-JLNTFET exhibits the ON current (ION) 6.5 × 10−5µA/m, the off current (IOFF) measures 2.97 × 10−20µA/m, the subthreshold slope (SS) is 12mV/Dec, and ION/IOFFis≈1015which makes them immune to short channel effect and suitable for low power application in Nano regime. Further, in this work, the proposed structure is utilized to implement the dielectric modulated low-power biosensor. The drain current is taken as the sensitivity parameter. Five different biomolecules sensitivity are measured and found better than the previous published results. For the simulation and analysis, the Silvaco Atlas 2D simulator with non-local band-to-band tunneling is used. ',\n",
    "             'We report observations from the Hubble Space Telescope (HST) of Cepheid variables in the host galaxies of 42 Type Ia supernovae (SNe Ia) used to calibrate the Hubble constant (H 0). These include the complete sample of all suitable SNe Ia discovered in the last four decades at redshift z ≤ 0.01, collected and calibrated from ≥1000 HST orbits, more than doubling the sample whose size limits the precision of the direct determination of H 0. The Cepheids are calibrated geometrically from Gaia EDR3 parallaxes, masers in NGC 4258 (here tripling that sample of Cepheids), and detached eclipsing binaries in the Large Magellanic Cloud. All Cepheids in these anchors and SN Ia hosts were measured with the same instrument (WFC3) and filters (F555W, F814W, F160W) to negate zero-point errors. We present multiple verifications of Cepheid photometry and six tests of background determinations that show Cepheid measurements are accurate in the presence of crowded backgrounds. The SNe Ia in these hosts calibrate the magnitude-redshift relation from the revised Pantheon+ compilation, accounting here for covariance between all SN data and with host properties and SN surveys matched throughout to negate systematics. We decrease the uncertainty in the local determination of H 0 to 1 km s-1 Mpc-1 including systematics. We present results for a comprehensive set of nearly 70 analysis variants to explore the sensitivity of H 0 to selections of anchors, SN surveys, redshift ranges, the treatment of Cepheid dust, metallicity, form of the period-luminosity relation, SN color, peculiar-velocity corrections, sample bifurcations, and simultaneous measurement of the expansion history. Our baseline result from the Cepheid-SN Ia sample is H 0 = 73.04 ± 1.04 km s-1 Mpc-1, which includes systematic uncertainties and lies near the median of all analysis variants. We demonstrate consistency with measures from HST of the TRGB between SN Ia hosts and NGC 4258, and include them simultaneously to yield 72.53 ± 0.99 km s-1 Mpc-1. The inclusion of high-redshift SNe Ia yields H 0 = 73.30 ± 1.04 km s-1 Mpc-1 and q 0 = -0.51 ± 0.024. We find a 5σ difference with the prediction of H 0 from Planck cosmic microwave background observations under ΛCDM, with no indication that the discrepancy arises from measurement uncertainties or analysis variations considered to date. The source of this now long-standing discrepancy between direct and cosmological routes to determining H 0 remains unknown.',\n",
    "             'We use archival COBE/DIRBE data to construct a map of polycyclic aromatic hydrocarbon (PAH) emission in the λ-Orionis region. The presence of the 3.3 μm PAH feature within the DIRBE 3.5 μm band and the corresponding lack of significant PAH spectral features in the adjacent DIRBE bands (1.25, 2.2, and 4.9 μm) enable estimation of the PAH contribution to the 3.5 μm data. Having the shortest wavelength of known PAH features, the 3.3 μm feature probes the smallest PAHs, which are also the leading candidates for carriers of anomalous microwave emission (AME). We use this map to investigate the association between the AME and the emission from PAH molecules. We find that the spatial correlation in λ-Orionis is higher between AME and far-infrared dust emission (as represented by the DIRBE 240 μm map) than it is between our PAH map and AME. This finding, in agreement with previous studies using PAH features at longer wavelengths, is in tension with the hypothesis that AME is due to spinning PAHs. However, the expected correlation between mid-infrared and microwave emission could potentially be degraded by different sensitivities of each emission mechanism to local environmental conditions even if PAHs are the carriers of both.',\n",
    "             'THis is a noew sentence with typoes and not really about astro anyways',\n",
    "            ]\n",
    "\n",
    "\n",
    "all_sentence_scores = pipe(sentences)\n",
    "# again convert to strings, to future-proof\n",
    "all_sentence_scores = [[{'label':str(s['label']), 'score':s['score']} for s in sample_scores] for sample_scores in all_sentence_scores]\n",
    "\n",
    "top_sentence_scores = [[ {'label':uat_names[l['label']], 'score':l['score']} \n",
    "                        for l in top_k_scores(s, k=3)] \n",
    "                       for s in all_sentence_scores]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "c7aea0c3-ae49-464a-bca9-ae46cbeb1cbc",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[[{'label': 'astronomical instrumentation', 'score': 0.9473740458488464},\n",
       "  {'label': 'astronomy data modeling', 'score': 0.49145984649658203},\n",
       "  {'label': 'space vehicle instruments', 'score': 0.39475584030151367}],\n",
       " [{'label': 'hubble constant', 'score': 0.993058443069458},\n",
       "  {'label': 'cosmology', 'score': 0.006782000884413719},\n",
       "  {'label': 'planetary nebulae', 'score': 0.001997443614527583}],\n",
       " [{'label': 'interstellar dust', 'score': 0.9986469149589539},\n",
       "  {'label': 'polycyclic aromatic hydrocarbons', 'score': 0.99810791015625},\n",
       "  {'label': 'interstellar medium', 'score': 0.9940189123153687}],\n",
       " [{'label': 'time series analysis', 'score': 0.2862895429134369},\n",
       "  {'label': 'astronomy data analysis', 'score': 0.08515171706676483},\n",
       "  {'label': 'optical telescopes', 'score': 0.04666740819811821}]]"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "top_sentence_scores"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "3a7b8b95-9917-4408-85fb-e74186f9ca7d",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'label': '0', 'score': 4.918351805827115e-07},\n",
       " {'label': '2', 'score': 2.4442993407092217e-08},\n",
       " {'label': '3', 'score': 1.2135334372942452e-06},\n",
       " {'label': '4', 'score': 7.317441941268044e-08},\n",
       " {'label': '5', 'score': 1.078589502867544e-05},\n",
       " {'label': '6', 'score': 2.913926877567974e-08},\n",
       " {'label': '7', 'score': 2.268499343927033e-07},\n",
       " {'label': '8', 'score': 2.4015573529823087e-08},\n",
       " {'label': '9', 'score': 2.046675717792823e-08},\n",
       " {'label': '10', 'score': 1.2331685006472526e-08}]"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# the score for every label is available\n",
    "all_sentence_scores[0][0:10]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "2cb6636d-7ccd-415f-b30e-68982ad9afbb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'label': '2363', 'score': 7.282670910768729e-09},\n",
       " {'label': '2364', 'score': 3.004765858349856e-07},\n",
       " {'label': '2365', 'score': 1.247675868398801e-06},\n",
       " {'label': '2366', 'score': 5.6775856904778266e-08},\n",
       " {'label': '2367', 'score': 3.278066174061678e-08},\n",
       " {'label': '2368', 'score': 1.39621681682911e-06},\n",
       " {'label': '2369', 'score': 0.00028976111207157373},\n",
       " {'label': '2370', 'score': 1.9995159163954668e-06},\n",
       " {'label': '2371', 'score': 2.577130089775892e-06},\n",
       " {'label': '2372', 'score': 3.240722179498334e-08}]"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "all_sentence_scores[0][-10:]"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.8",
   "language": "python",
   "name": "python3.8"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.18"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}