---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: fill-mask
tags:
- earth science
- climate
- biology
datasets:
- nasa-impact/nasa-smd-IR-benchmark
- nasa-impact/nasa-smd-qa-benchmark
- ibm/Climate-Change-NER
---
# Model Card for nasa-smd-ibm-v0.1 (Indus)
nasa-smd-ibm-v0.1 (currently named Indus) is a RoBERTa-based, encoder-only transformer model, domain-adapted for NASA Science Mission Directorate (SMD) applications. It is fine-tuned on scientific journals and articles relevant to NASA SMD, with the aim of improving natural language technologies such as information retrieval and intelligent search.
## Model Details
- **Base Model**: RoBERTa
- **Tokenizer**: Custom
- **Parameters**: 125M
- **Pretraining Strategy**: Masked Language Modeling (MLM)
- **Distilled Version**: A distilled version of the model (30M parameters) is available at https://huggingface.co/nasa-impact/nasa-smd-ibm-distil-v0.1
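A minimal sketch of querying the model through the `fill-mask` pipeline declared in the card metadata. The repository ID is taken from the URL in the citation section; the example sentence and the helper names `build_masked_sentence` and `top_fill_mask_predictions` are illustrative assumptions. Because the tokenizer is custom, the mask token is read from the tokenizer instead of being hard-coded.

```python
MODEL_ID = "nasa-impact/nasa-smd-ibm-v0.1"  # repository ID from the citation URL


def build_masked_sentence(template: str, mask_token: str) -> str:
    """Replace the first '{mask}' placeholder with the tokenizer's mask token."""
    if "{mask}" not in template:
        raise ValueError("template must contain a '{mask}' placeholder")
    return template.replace("{mask}", mask_token, 1)


def top_fill_mask_predictions(template: str, top_k: int = 5):
    """Return (token, score) pairs for the masked position.

    Requires `transformers` with a PyTorch backend, plus network access
    to download the weights on first use.
    """
    from transformers import pipeline  # imported lazily: heavy dependency

    fill = pipeline("fill-mask", model=MODEL_ID)
    sentence = build_masked_sentence(template, fill.tokenizer.mask_token)
    return [(p["token_str"].strip(), p["score"]) for p in fill(sentence, top_k=top_k)]


# Example call (downloads the model; the sentence is hypothetical):
# for token, score in top_fill_mask_predictions("Sea surface {mask} is rising."):
#     print(f"{token:>15s}  {score:.3f}")
```

The same code should work for the distilled checkpoint by changing `MODEL_ID`, assuming it exposes the same pipeline tag.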
## Training Data
- Wikipedia English (Feb 1, 2020)
- AGU Publications
- AMS Publications
- Scientific papers from the Astrophysics Data System (ADS)
- PubMed abstracts
- PubMed Central (PMC) (commercial license subset)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/61099e5d86580d4580767226/H0-q9N7IwXQqLdEaCCgm-.png)
## Training Procedure
- **Framework**: fairseq 0.12.1 with PyTorch 1.9.1
- **transformers Version**: 4.2.0
- **Strategy**: Masked Language Modeling (MLM)
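To make the MLM objective concrete, here is a small pure-Python sketch of token masking. The card only states that MLM was used; the 15% selection rate and the 80/10/10 replacement split are the conventional BERT/RoBERTa defaults and are assumptions here, as are the function name `mask_tokens` and the `<mask>` token string. The actual fairseq training operates on token ID tensors, not string lists.

```python
import random

MASK_TOKEN = "<mask>"  # RoBERTa-style mask token; the custom tokenizer may differ


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Apply BERT-style masking to a list of tokens.

    Returns (corrupted, labels). labels[i] holds the original token at
    positions the model must predict, and None everywhere else.
    """
    rng = random.Random(seed)
    corrupted = list(tokens)
    labels = [None] * len(tokens)
    for i, token in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = token
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK_TOKEN         # 80%: replace with the mask token
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return corrupted, labels
```

The loss is computed only at positions where `labels` is not `None`, so the model learns to reconstruct the selected tokens from their surrounding context.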
## Evaluation
- BLURB Benchmark
- Pruned SQuAD2.0 (SQ2) Benchmark (Amazon Rainforest, Oxygen, Geology and NASA ES QAs)
- NASA SMD Expert QA Benchmark (WIP)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/61099e5d86580d4580767226/EtCC3U_tMCv3bfLqQdqQm.png)
![Pruned SQ2 Benchmark](https://cdn-uploads.huggingface.co/production/uploads/61099e5d86580d4580767226/ruh6-IyiNlUiK21Ej4lDM.png)
Please refer to the following dataset cards for further benchmarks and evaluation:
- NASA IR Benchmark - https://huggingface.co/datasets/nasa-impact/nasa-smd-IR-benchmark
- NASA SMD Expert QA Benchmark - https://huggingface.co/datasets/nasa-impact/nasa-smd-qa-benchmark
- Climate Change Benchmark - https://huggingface.co/datasets/ibm/Climate-Change-NER
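Extractive QA benchmarks in the SQuAD family are conventionally scored with exact match and token-level F1. The card does not state which metrics its plots report, so the sketch below is an assumption about the scoring style, not a description of the authors' evaluation code. The function names are illustrative; the normalization steps follow the usual SQuAD convention (lowercase, strip punctuation and English articles, collapse whitespace).

```python
import re
import string
from collections import Counter

_PUNCTUATION = set(string.punctuation)


def normalize_answer(text):
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, reference):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def token_f1(prediction, reference):
    """Harmonic mean of token precision and recall after normalization."""
    pred = normalize_answer(prediction).split()
    ref = normalize_answer(reference).split()
    if not pred or not ref:
        # SQuAD2.0 convention: an unanswerable question has an empty
        # reference, and only an empty prediction scores 1.0.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For a question with several reference answers, the usual practice is to take the maximum score over the references and then average over questions.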
## Uses
- Named Entity Recognition (NER)
- Information Retrieval
- Sentence Transformers
- Extractive QA
These uses target NASA SMD-related scientific use cases.
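As a sketch of the information retrieval and sentence embedding uses listed above, the code below mean-pools the encoder's last hidden states into one vector per text and ranks documents by cosine similarity to a query. The pooling choice and the helper names `embed`, `cosine`, and `rank_documents` are assumptions for illustration; the card does not prescribe a pooling method.

```python
import math

MODEL_ID = "nasa-impact/nasa-smd-ibm-v0.1"  # repository ID from the citation URL


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_documents(query_vec, doc_vecs):
    """Return document indices sorted by descending cosine similarity."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


def embed(texts):
    """Mean-pool the encoder's last hidden states into one vector per text.

    Requires `torch` and `transformers`; downloads weights on first use.
    """
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModel.from_pretrained(MODEL_ID).eval()
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state          # (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1).float()   # zero out padding
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # masked mean
    return pooled.tolist()


# Example (downloads the model; the texts are hypothetical):
# vecs = embed(["ocean heat content", "Sea temperatures rose.", "Protein folding."])
# print(rank_documents(vecs[0], vecs[1:]))
```

Embeddings from a masked language model that has not been fine-tuned for retrieval are often weak, so in practice this encoder would likely serve as the starting point for a sentence-transformer or retriever fine-tune.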
## Note
The accompanying paper is available at https://arxiv.org/abs/2405.10725
## Citation
If you find this work useful, please cite it using the following BibTeX entry:
```bibtex
@misc {nasa-impact_2023,
author = {Masayasu Muraoka and Bishwaranjan Bhattacharjee and Muthukumaran Ramasubramanian and Iksha Gurung and Rahul Ramachandran and Manil Maskey and Kaylin Bugbee and Rong Zhang and Yousef El Kurdi and Bharath Dandala and Mike Little and Elizabeth Fancher and Lauren Sanders and Sylvain Costes and Sergi Blanco-Cuaresma and Kelly Lockhart and Thomas Allen and Felix Grazes and Megan Ansdell and Alberto Accomazzi and Sanaz Vahidinia and Ryan McGranaghan and Armin Mehrabian and Tsendgar Lee},
title = { nasa-smd-ibm-v0.1 (Revision f01d42f) },
year = 2023,
url = { https://huggingface.co/nasa-impact/nasa-smd-ibm-v0.1 },
doi = { 10.57967/hf/1429 },
publisher = { Hugging Face }
}
```
## Attribution
IBM Research
- Masayasu Muraoka
- Bishwaranjan Bhattacharjee
- Rong Zhang
- Yousef El Kurdi
- Bharath Dandala
NASA SMD
- Muthukumaran Ramasubramanian
- Iksha Gurung
- Rahul Ramachandran
- Manil Maskey
- Kaylin Bugbee
- Mike Little
- Elizabeth Fancher
- Lauren Sanders
- Sylvain Costes
- Sergi Blanco-Cuaresma
- Kelly Lockhart
- Thomas Allen
- Felix Grazes
- Megan Ansdell
- Alberto Accomazzi
- Sanaz Vahidinia
- Ryan McGranaghan
- Armin Mehrabian
- Tsendgar Lee
## Disclaimer
This encoder-only model is currently experimental. We are working to improve its capabilities and performance, and we invite the community to use the model, provide feedback, and contribute to its development.