Update README.md
README.md
CHANGED
@@ -1,51 +1,41 @@
- Dynamic Convolutional Attention and Hybrid Attention

- Biased Attention and Augmented Memory

- Natural Synergy: Biased attention can prioritize important features, while augmented memory can store and retrieve long-term dependencies. Together, they can ensure that important features are not only highlighted but also remembered over long sequences.

- Integration Idea: Embed bias terms in the augmented memory retrieval process, allowing the model to focus on and recall important features over extended periods.

- Rotary and Learned Sinusoidal Embeddings with Hybrid Attention

- Natural Synergy: Rotary and learned sinusoidal embeddings enhance positional encoding, which can be crucial for hybrid attention mechanisms that need to maintain the order of information while attending to both local and global contexts.

- Integration Idea: Apply rotary and learned sinusoidal embeddings within the hybrid attention layers to improve positional awareness and ensure the model accurately captures the order and structure of the input data.

- Enhancements to Rotary Embeddings
- Rotary Embeddings: We incorporated rotary embeddings to encode positional information into the queries and keys, helping the model capture the relative positions of tokens within a sequence.

- Rotational Dynamics Layer: We integrated a custom rotation layer that applies orthogonal transformations to the embeddings. Rotating the embeddings helps reduce interference between sensory inputs and memory representations, which enhances the model's ability to process and differentiate sequential data.

- Dynamic Integration of Attention Mechanisms: Introduced a mechanism that dynamically weights and integrates the outputs of multiple attention mechanisms based on their relevance and performance. This ensures that the model leverages the most effective attention mechanism for a given input, leading to more contextually appropriate and robust processing.

+ Dynamic Base Adjustment
+ Self-Adjusting Parameters: The model dynamically adjusts the base parameter of its positional embeddings in response to the training loss, optimizing the positional encoding in real time. This adaptive mechanism lets the model fine-tune its positional representation as training progresses.
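The base-adjustment idea can be sketched in a few lines of PyTorch. This is a minimal illustration, not the repository's implementation: the class name `DynamicBaseRotary`, the multiplicative update factor, and the loss-comparison heuristic are all assumptions.

```python
import torch
import torch.nn as nn


class DynamicBaseRotary(nn.Module):
    """Rotary-style frequencies whose `base` is nudged by the training loss (sketch)."""

    def __init__(self, dim: int, base: float = 10000.0, factor: float = 1.01):
        super().__init__()
        self.dim = dim
        self.base = base
        self.factor = factor
        self.prev_loss = None

    def adjust_base(self, loss: float) -> None:
        # Hypothetical rule: shrink the base when the loss improved, grow it otherwise.
        if self.prev_loss is not None:
            self.base *= (1.0 / self.factor) if loss < self.prev_loss else self.factor
        self.prev_loss = loss

    def forward(self, positions: torch.Tensor) -> torch.Tensor:
        # Recompute inverse frequencies from the current base on every call.
        inv_freq = 1.0 / (self.base ** (torch.arange(0, self.dim, 2, dtype=torch.float32) / self.dim))
        freqs = positions.float()[:, None] * inv_freq[None, :]
        return torch.cat((freqs.cos(), freqs.sin()), dim=-1)
```

During training, `adjust_base(loss.item())` would be called after each optimization step so that the next forward pass uses the updated base.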
+ RotaryEmbeddingWithRotation
+ Orthogonally Initialized Rotation Matrix: This component combines rotary embeddings with an orthogonally initialized rotation matrix, providing robust and stable positional embeddings. This approach enhances the model's capacity to represent positional information effectively.
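A sketch of what such a module could look like, assuming the rotation is a single learned `dim x dim` matrix initialized with `nn.init.orthogonal_` and applied after the usual rotary mixing; the actual code may apply it per head or per layer.

```python
import torch
import torch.nn as nn


class RotaryEmbeddingWithRotation(nn.Module):
    """Rotary position mixing followed by a learned orthogonal rotation (sketch)."""

    def __init__(self, dim: int, base: float = 10000.0):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
        self.register_buffer("inv_freq", inv_freq)
        self.rotation = nn.Parameter(torch.empty(dim, dim))
        nn.init.orthogonal_(self.rotation)  # orthogonal initialization

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        pos = torch.arange(x.shape[1], device=x.device, dtype=torch.float32)
        freqs = pos[:, None] * self.inv_freq[None, :]  # (seq_len, dim/2)
        cos, sin = freqs.cos(), freqs.sin()
        x1, x2 = x[..., 0::2], x[..., 1::2]
        # Standard rotary mixing of paired channels.
        rotated = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1).flatten(-2)
        # Additional orthogonal rotation on top of the rotary encoding.
        return rotated @ self.rotation
```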
+ LearnedSinusoidalEmbeddings
+ Learned Sinusoidal Embeddings with Checkpointing: This integration of sinusoidal embeddings with optional checkpointing helps manage memory efficiently during training while maintaining stable embedding magnitudes through L2 normalization.
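One plausible reading of that description, for illustration only: a sinusoidally initialized but trainable table, L2-normalized on lookup, and wrapped in `torch.utils.checkpoint` when checkpointing is enabled.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint


class LearnedSinusoidalEmbeddings(nn.Module):
    """Trainable sinusoidal table with L2 normalization and optional checkpointing (sketch)."""

    def __init__(self, max_positions: int, dim: int, use_checkpoint: bool = False):
        super().__init__()
        self.use_checkpoint = use_checkpoint
        pos = torch.arange(max_positions, dtype=torch.float32)[:, None]
        inv_freq = 1.0 / (10000 ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
        table = torch.zeros(max_positions, dim)
        table[:, 0::2] = torch.sin(pos * inv_freq)
        table[:, 1::2] = torch.cos(pos * inv_freq)
        self.weight = nn.Parameter(table)  # sinusoidal start, learned thereafter

    def _embed(self, positions: torch.Tensor) -> torch.Tensor:
        # L2 normalization keeps embedding magnitudes stable.
        return F.normalize(self.weight[positions], p=2, dim=-1)

    def forward(self, positions: torch.Tensor) -> torch.Tensor:
        if self.use_checkpoint and self.training:
            return checkpoint(self._embed, positions, use_reentrant=False)
        return self._embed(positions)
```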
+ MultiHeadAttention
+ Dynamic Positional Bias: Supports rotary embeddings and includes relative positional bias, capturing dependencies effectively. The attention mechanism is finely tuned with a dynamically adjustable base parameter, providing flexibility and precision.
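A simplified sketch of attention with a learned relative positional bias; rotary application to the queries and keys and the dynamic base adjustment are omitted for brevity, and the clipping distance `max_rel_dist` is an assumed hyperparameter.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiHeadAttentionWithBias(nn.Module):
    """Multi-head attention plus a per-head relative positional bias (sketch)."""

    def __init__(self, d_model: int, n_heads: int, max_rel_dist: int = 128):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        self.max_rel_dist = max_rel_dist
        # One learned bias per head per clipped relative distance.
        self.rel_bias = nn.Embedding(2 * max_rel_dist - 1, n_heads)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (z.view(b, t, self.n_heads, self.head_dim).transpose(1, 2) for z in (q, k, v))
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5  # (b, h, t, t)
        rel = torch.arange(t, device=x.device)
        rel = (rel[None, :] - rel[:, None]).clamp(-self.max_rel_dist + 1, self.max_rel_dist - 1)
        scores = scores + self.rel_bias(rel + self.max_rel_dist - 1).permute(2, 0, 1)
        attn = F.softmax(scores, dim=-1)
        return self.out((attn @ v).transpose(1, 2).reshape(b, t, -1))
```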
+ HybridAttention
+ Combining Local and Global Attention: This component leverages both local and global attention mechanisms, ensuring that the model captures both fine-grained and broad context. The sliding window approach for local attention enhances its ability to process long sequences efficiently.
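A rough sketch of the combination, assuming a banded mask implements the sliding window and the two outputs are simply summed; the repository's window size and merge strategy may differ.

```python
import torch
import torch.nn as nn


class HybridAttention(nn.Module):
    """Sliding-window (local) attention plus full (global) attention, summed (sketch)."""

    def __init__(self, d_model: int, n_heads: int, window: int = 64):
        super().__init__()
        self.window = window
        self.local_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.global_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        t = x.shape[1]
        idx = torch.arange(t, device=x.device)
        # True entries are masked out: each query may only attend within +/- window positions.
        band_mask = (idx[None, :] - idx[:, None]).abs() > self.window
        local_out, _ = self.local_attn(x, x, x, attn_mask=band_mask)
        global_out, _ = self.global_attn(x, x, x)
        return local_out + global_out
```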
+ DynamicConvAttention
+ Integrating Convolution and Attention: This component enriches feature representation by combining convolutional layers with attention mechanisms, enabling the model to extract local context while attending to global information simultaneously.
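A sketch of one way to fuse the two paths, assuming a depthwise `Conv1d` for local context and standard self-attention for global context, merged by concatenation and a linear projection.

```python
import torch
import torch.nn as nn


class DynamicConvAttention(nn.Module):
    """Depthwise convolution (local) fused with self-attention (global) (sketch)."""

    def __init__(self, d_model: int, n_heads: int, kernel_size: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                              padding=kernel_size // 2, groups=d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.proj = nn.Linear(2 * d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Local features: convolve over the time axis.
        local = self.conv(x.transpose(1, 2)).transpose(1, 2)
        # Global features: full self-attention.
        global_ctx, _ = self.attn(x, x, x)
        return self.proj(torch.cat([local, global_ctx], dim=-1))
```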
+ Model Components
+ LayerNorm: Custom normalization with gamma and beta parameters.
+ Linear: Custom linear layer with batch normalization and various activation functions.
+ Conv1d: Custom 1D convolution layer with Kaiming initialization.
+ RotaryEmbeddingWithRotation: Orthogonally initialized rotary embeddings with dynamic base adjustment.
+ LearnedSinusoidalEmbeddings: Sinusoidal embeddings with optional checkpointing and L2 normalization.
+ MultiHeadAttention: Dynamic positional bias with rotary embeddings and optional caching.
+ ResidualAttentionBlock: Integrates self- and cross-attention with a GELU-activated MLP (see the sketch after this list).
+ AudioEncoder: Convolutional layers with learned sinusoidal embeddings and rotary embeddings.
+ TextDecoder: Token embeddings with rotary embeddings and cross-attention.
+ DynamicConvAttention: Combines convolution and attention for enriched feature extraction.
+ HybridAttention: Merges local and global attention mechanisms using sliding window and multi-head attention.
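As referenced in the component list, a residual block that integrates self-attention, optional cross-attention, and a GELU MLP might look roughly like this; the pre-norm layout and the 4x MLP width are conventional assumptions rather than the repository's exact choices.

```python
from typing import Optional

import torch
import torch.nn as nn


class ResidualAttentionBlock(nn.Module):
    """Pre-norm self-attention, optional cross-attention, and a GELU MLP (sketch)."""

    def __init__(self, d_model: int, n_heads: int, cross_attention: bool = False):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.attn_ln = nn.LayerNorm(d_model)
        self.cross_attn = (nn.MultiheadAttention(d_model, n_heads, batch_first=True)
                           if cross_attention else None)
        self.cross_ln = nn.LayerNorm(d_model) if cross_attention else None
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.mlp_ln = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor, xa: Optional[torch.Tensor] = None) -> torch.Tensor:
        h = self.attn_ln(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # residual self-attention
        if self.cross_attn is not None and xa is not None:
            h = self.cross_ln(x)
            x = x + self.cross_attn(h, xa, xa, need_weights=False)[0]  # attend to encoder states
        return x + self.mlp(self.mlp_ln(x))
```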