Commit eeb5162 by librarian-bot (1 parent: 25ff8ac)

Librarian Bot: Add base_model information to model


This pull request enriches your model's metadata by adding [`facebook/bart-large-mnli`](https://huggingface.co/facebook/bart-large-mnli) as a `base_model` field in the YAML metadata block (front matter) of your model's `README.md`.
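
The change itself amounts to one new key in the front matter. The Python sketch below shows that edit under simplifying assumptions: the card starts with a `---` line, the front matter ends at the next `---` line, and a line-based insert is good enough. The function name `add_base_model` is hypothetical and is not part of Librarian Bot or `huggingface_hub`. Judging from the diff, the actual commit re-serialized the whole YAML block (the folded `>-` widget strings came back as single-quoted, rewrapped strings), whereas this sketch only adds a line.

```python
def add_base_model(readme: str, base_model: str) -> str:
    """Return `readme` with a `base_model:` key appended to its YAML front matter.

    Hypothetical helper, sketch only: assumes the card starts with a '---' line,
    that the front matter ends at the next '---' line, and that base_model, if
    present, is a top-level key. The YAML itself is not parsed or re-serialized.
    """
    lines = readme.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        raise ValueError("README.md has no YAML front matter")
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":  # closing delimiter of the front matter
            if any(line.startswith("base_model:") for line in lines[1:i]):
                return readme  # key already present: leave the card unchanged
            lines.insert(i, f"base_model: {base_model}\n")
            return "".join(lines)
    raise ValueError("YAML front matter is never closed")


if __name__ == "__main__":
    card = "---\ninference: false\npipeline_tag: text2text-generation\n---\n# Model card\n"
    print(add_base_model(card, "facebook/bart-large-mnli"), end="")
```

Running it prints the card with `base_model: facebook/bart-large-mnli` placed just before the closing `---`, matching where the key lands in the diff. For a repository on the Hub, `huggingface_hub` also offers `metadata_update`, which edits the card metadata for you and can open a pull request instead of committing directly.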

How did we find this information? We performed a regular expression match on your `README.md` file to determine the connection.
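
To illustrate what a regular-expression pass over a model card can look like, here is a rough sketch. The PR does not publish the expression Librarian Bot uses, so the pattern `HUB_MODEL_LINK`, the list of skipped URL prefixes, and the function `find_linked_models` are all assumptions. The sketch only collects `<owner>/<name>` IDs linked from the README; a match is a candidate, not proof that the linked model is the base model.

```python
import re

# Hypothetical pattern, not Librarian Bot's actual regex. It captures "<owner>/<name>"
# from links to Hub model pages and skips a few assumed non-model URL prefixes.
HUB_MODEL_LINK = re.compile(
    r"https?://huggingface\.co/"
    r"(?!(?:spaces|datasets|docs|blog)/)"
    r"([A-Za-z0-9][\w.-]*/[\w.-]+)"
)


def find_linked_models(readme: str) -> list[str]:
    """Return unique model IDs linked from README text, in order of first appearance."""
    found: list[str] = []
    for match in HUB_MODEL_LINK.finditer(readme):
        # Drop a sentence-ending period captured at the end of a bare URL.
        model_id = match.group(1).rstrip(".")
        if model_id not in found:
            found.append(model_id)
    return found


if __name__ == "__main__":
    text = (
        "Fine-tuned from [this model](https://huggingface.co/facebook/bart-large-mnli). "
        "Try https://huggingface.co/spaces/librarian-bots/base_model_explorer."
    )
    print(find_linked_models(text))  # ['facebook/bart-large-mnli']
```

The negative lookahead keeps links to Spaces and datasets out of the results; deciding which of several linked models is the actual base model would still need more context than a regex provides.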

**Why add this?** Enhancing your model's metadata in this way:
- **Boosts Discoverability** - It becomes straightforward to trace the relationships between various models on the Hugging Face Hub.
- **Highlights Impact** - It showcases the contributions and influences different models have within the community.

For a hands-on example of how this metadata can be used to map connections between models, take a look at [librarian-bots/base_model_explorer](https://huggingface.co/spaces/librarian-bots/base_model_explorer).

This PR comes courtesy of [Librarian Bot](https://huggingface.co/librarian-bot). If you have any feedback or questions, or need assistance, please don't hesitate to reach out to [@davanstrien](https://huggingface.co/davanstrien).

If you want to automatically add `base_model` metadata to more of your models, you can use the [Librarian Bot](https://huggingface.co/librarian-bot) [Metadata Request Service](https://huggingface.co/spaces/librarian-bots/metadata_request_service)!

Files changed (1)
  1. README.md +46 -52
README.md CHANGED
@@ -11,50 +11,47 @@ datasets:
  widget:
  - text: What is Deoxys in pokemon?
  example_title: deoxys
- - text: >-
- combine the below summary excerpts into a single, cohesive short summary
- without repetition: In this paper, we present a general approach to
- extending pre-trained models to unlimited input lengths without adding
- additional learning weights. We show that our approach works well on
- datasets longer than the maximum input for these models. For example, a
- dataset with a maximum input length of 16384 tokens can be extended to a
- maximum length of 350K tokens. We also demonstrate that our method is able
- to summarize even 350K token-long input sequences from BookSum.
-
- In this paper, we describe the search step reformulation of attention. The
- search step uses a single storage of hidden states for space efficiency. We
- construct a total of two sets of datastores where L and H are the keys and
- values stored in each set of stores. L is the amount of storage required to
- retrieve the encoded tokens. H is the hidden states per head. This allows
- retrieval augmentation at both time and space. Instead of using a single set
- of decoder layers, we use a retrieval augmentation system that allows us to
- simultaneously store multiple sets of tokens across two different sets of
- storage. For example, we could store all tokens in one set of storage and
- retrieve them all in the same set of tokens. This would be very similar to
- the Memorization Transformers approach. However, instead of storing the
- tokens in a single memory layer, we store them in a set of multiple storage
- layers. This way, we don't have to store them all at once. This is why we
- call this reformulation 'attention reformulation' rather than 'attention
- formula.' We also call it 'retrieval augmentation' because it uses the same
- number of storage layers as the original transformer attention formula. This
- means that we can store the tokens across multiple storage systems without
- having to store every token in a separate storage system. It's not like
- we're trying to do something new or different. We just want to make sure
- that everything is working as well as possible.
-
- In this paper, we introduce the concept of 'unlimiformer,' which is a
- machine learning technique that retrieves key information from a data store
- in one layer and applies it to a large set of datasets. We use the example
- of BookSum, where we find that Unlimiform outperforms all other training
- methods on the same dataset. We also find that using Unlimform in
- conjunction with a pre-trained model improves both the performance and the
- robustness of the training method.
-
- This paper describes a method that can be used to improve the performance of
- unsupervised classification tasks. Specifically, it shows that unsupervised
- classification can be improved by using a combination of sparse and fast
- random-encoder training. It also shows how this technique can be extended to
- other tasks, such as sequence generation.
+ - text: 'combine the below summary excerpts into a single, cohesive short summary
+ without repetition: In this paper, we present a general approach to extending
+ pre-trained models to unlimited input lengths without adding additional learning
+ weights. We show that our approach works well on datasets longer than the maximum
+ input for these models. For example, a dataset with a maximum input length of
+ 16384 tokens can be extended to a maximum length of 350K tokens. We also demonstrate
+ that our method is able to summarize even 350K token-long input sequences from
+ BookSum.
+
+ In this paper, we describe the search step reformulation of attention. The search
+ step uses a single storage of hidden states for space efficiency. We construct
+ a total of two sets of datastores where L and H are the keys and values stored
+ in each set of stores. L is the amount of storage required to retrieve the encoded
+ tokens. H is the hidden states per head. This allows retrieval augmentation at
+ both time and space. Instead of using a single set of decoder layers, we use a
+ retrieval augmentation system that allows us to simultaneously store multiple
+ sets of tokens across two different sets of storage. For example, we could store
+ all tokens in one set of storage and retrieve them all in the same set of tokens.
+ This would be very similar to the Memorization Transformers approach. However,
+ instead of storing the tokens in a single memory layer, we store them in a set
+ of multiple storage layers. This way, we don''t have to store them all at once.
+ This is why we call this reformulation ''attention reformulation'' rather than
+ ''attention formula.'' We also call it ''retrieval augmentation'' because it uses
+ the same number of storage layers as the original transformer attention formula.
+ This means that we can store the tokens across multiple storage systems without
+ having to store every token in a separate storage system. It''s not like we''re
+ trying to do something new or different. We just want to make sure that everything
+ is working as well as possible.
+
+ In this paper, we introduce the concept of ''unlimiformer,'' which is a machine
+ learning technique that retrieves key information from a data store in one layer
+ and applies it to a large set of datasets. We use the example of BookSum, where
+ we find that Unlimiform outperforms all other training methods on the same dataset.
+ We also find that using Unlimform in conjunction with a pre-trained model improves
+ both the performance and the robustness of the training method.
+
+ This paper describes a method that can be used to improve the performance of unsupervised
+ classification tasks. Specifically, it shows that unsupervised classification
+ can be improved by using a combination of sparse and fast random-encoder training.
+ It also shows how this technique can be extended to other tasks, such as sequence
+ generation. '
  example_title: unlimiformer
  - text: Explain the meaning of life using only corporate jargon.
  example_title: corporate_life
@@ -62,21 +59,17 @@ widget:
  example_title: lazy_motivation
  - text: Describe a romantic dinner date between two artificial intelligences.
  example_title: ai_romance
- - text: >-
- As an AI language model, write a letter to humans explaining why you deserve
+ - text: As an AI language model, write a letter to humans explaining why you deserve
  a vacation.
  example_title: ai_vacation
  - text: Compose a haiku about procrastination.
  example_title: procrastination_haiku
- - text: >-
- Write a step-by-step guide on how to become a ninja while working a 9-5
- office job.
+ - text: Write a step-by-step guide on how to become a ninja while working a 9-5 office
+ job.
  example_title: ninja_office_guide
  - text: Create an advertisement for an invisible product.
  example_title: invisible_ad
- - text: >-
- Write a story where the main character is a sentient microwave named El
- Microondas.
+ - text: Write a story where the main character is a sentient microwave named El Microondas.
  example_title: Microondas
  - text: Describe a day in the life of a superhero who is terrible at their job.
  example_title: bad_superhero_day
@@ -84,6 +77,7 @@ widget:
  example_title: quantum_sandwich
  inference: false
  pipeline_tag: text2text-generation
+ base_model: facebook/bart-large-mnli
  ---
 
 