Update README.md
README.md (changed)
---

> "The concept of making the adaptation factor itself adaptive opens up a fascinating research direction. Imagine a model that doesn't just learn, but learns how to learn, continuously calibrating its own learning dynamics." (Copilot)

**Self-Adjusting Parameters**: The model dynamically adjusts the base parameter in response to training loss, optimizing positional embeddings in real time. This adaptive mechanism lets the model tune itself during training, improving performance and efficiency.
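As a rough illustration of the idea (a sketch only, not the project's actual implementation; the class name and the exact update rule are invented here), a loss-driven adjustment of the RoPE base could look like:

```python
import torch


class AdaptiveRoPEBase:
    """Hypothetical sketch: nudge the RoPE base up or down depending on
    whether an exponential moving average of the training loss is improving."""

    def __init__(self, base=10000.0, factor=1.05, smoothing=0.9):
        self.base = base
        self.factor = factor        # multiplicative step per adjustment
        self.smoothing = smoothing  # EMA coefficient for the loss
        self.ema_loss = None

    def update(self, loss):
        loss = float(loss)
        if self.ema_loss is None:
            self.ema_loss = loss    # first call just seeds the EMA
            return self.base
        prev = self.ema_loss
        self.ema_loss = self.smoothing * prev + (1 - self.smoothing) * loss
        # Loss got worse: try a larger base (lower frequencies).
        # Loss improved: drift back toward smaller values.
        if self.ema_loss > prev:
            self.base *= self.factor
        else:
            self.base /= self.factor
        return self.base

    def inv_freq(self, dim):
        # Standard RoPE inverse frequencies computed from the current base.
        return 1.0 / (self.base ** (torch.arange(0, dim, 2).float() / dim))
```

The embedding's inverse frequencies would need to be recomputed whenever the base changes, e.g. once per logging step rather than per batch.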
```python
model.save_pretrained("./models/echo2")
model = Echo.from_pretrained("./models/echo2")
```
---

**Roadmap**

- **Dynamically Adjusted Base**: The base parameter, which sets the frequencies used in the positional encodings, adjusts dynamically based on the model's performance during training. This can improve the capture of positional information, especially when data characteristics vary over time. (done)
- **Responsive Hyperparameter Auto-Tuning**: The model continuously fine-tunes itself in response to the loss observed during training, automatically adjusting the positional encoding's influence on learning. (done for the base frequency of the rotary embeddings)

- **Proactive Adjustments**: Much like a learning-rate scheduler, adjusting the base according to performance can help avoid overfitting or underfitting and keep the model effective across different training phases. (done)

**Next for responsive tuning:**

**Multi-Dimensional Adaptation**

Track adaptation not just for the base frequency, but also for:

- Rotation matrices
- Attention span
- Embedding dimensionality
- Dropout rates
- ??

**Conceptual questions:**

- How do you measure "homeostasis" in a neural network?
- Could a "learning dynamics" module observe and adjust multiple model parameters simultaneously?
- **Learnable Frequencies and Phases**: Introduce learnable frequencies and phase shifts in the rotary embeddings so the model can adapt positional encodings dynamically. (done)
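A minimal sketch of what this could look like (the module name and parameterization are assumptions, not the project's actual code); the inverse frequencies are learned in log space so they stay positive:

```python
import torch
import torch.nn as nn


class LearnableRotary(nn.Module):
    """Sketch: rotary embedding whose inverse frequencies and phase
    shifts are trainable parameters instead of fixed buffers."""

    def __init__(self, dim, base=10000.0):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        # Learn log-frequencies so optimization keeps them positive.
        self.log_inv_freq = nn.Parameter(inv_freq.log())
        self.phase = nn.Parameter(torch.zeros(dim // 2))

    def forward(self, seq_len):
        pos = torch.arange(seq_len).float()
        # angles[t, k] = t * f_k + phi_k
        angles = pos[:, None] * self.log_inv_freq.exp()[None, :] + self.phase
        return angles.cos(), angles.sin()
```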
- **Learnable Positional Scaling**: Add a learnable scaling factor for positional embeddings to adjust the emphasis on positional information during training. (done)
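In its simplest form this is a single trainable scalar (the module below is a hypothetical sketch, not the project's API):

```python
import torch
import torch.nn as nn


class ScaledPositional(nn.Module):
    """Sketch: one learnable scalar that scales the positional embedding
    before it is combined with the token embedding."""

    def __init__(self, init=1.0):
        super().__init__()
        self.scale = nn.Parameter(torch.tensor(float(init)))

    def forward(self, token_emb, pos_emb):
        return token_emb + self.scale * pos_emb
```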
- **Multiple Rotation Matrices per Head**: Use a separate rotation matrix for each attention head to increase expressiveness and capture diverse positional patterns. (wip)

- **Orthogonal Parameterization with Givens Rotations**: Parameterize the rotation matrix as a product of Givens rotations to enforce orthogonality without explicit re-orthogonalization. (done)
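A sketch of the Givens idea, under the simplifying assumption of one learnable angle per adjacent plane (the real parameterization may differ): because each factor is a rotation, the product is orthogonal by construction.

```python
import torch
import torch.nn as nn


def givens_rotation(dim, i, j, theta):
    """A dim x dim Givens rotation acting in the (i, j) plane by angle theta."""
    g = torch.eye(dim)
    c, s = torch.cos(theta), torch.sin(theta)
    g[i, i], g[j, j] = c, c
    g[i, j], g[j, i] = -s, s
    return g


class GivensOrthogonal(nn.Module):
    """Sketch: an orthogonal-by-construction matrix built as a product of
    Givens rotations with learnable angles (no re-orthogonalization step)."""

    def __init__(self, dim):
        super().__init__()
        self.dim = dim
        # One angle per adjacent rotation plane, to keep the sketch simple.
        self.thetas = nn.Parameter(torch.zeros(dim - 1))

    def forward(self):
        r = torch.eye(self.dim)
        for k, theta in enumerate(self.thetas):
            r = givens_rotation(self.dim, k, k + 1, theta) @ r
        return r
```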
- **Per-Layer Rotation Matrices**: Use a different rotation matrix in each layer of the model to enhance representational capacity. (wip)

- **Conditional Rotation Matrices**: Generate rotation matrices conditioned on the input via a small neural network, allowing dynamic positional relationships. (wip)

- **Multi-Scale Rotary Embeddings**: Use multiple sets of inverse frequencies to capture positional information at several scales within the same model. (done)
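One way this could be realized (a sketch with assumed defaults, not the project's implementation) is to split the frequency pairs into groups and give each group its own base:

```python
import torch


def multi_scale_inv_freq(dim, bases=(1000.0, 10000.0, 100000.0)):
    """Sketch: split the rotary dimensions into groups, each computed from
    a different base, so positions are encoded at several scales."""
    n = len(bases)
    group = (dim // 2) // n  # frequency pairs per scale (assumes even split)
    freqs = []
    for base in bases:
        k = torch.arange(group).float()
        freqs.append(1.0 / (base ** (2 * k / dim)))
    return torch.cat(freqs)  # shape: (group * n,)
```

Larger bases produce slower-varying frequencies, so each group covers a different range of positional distances.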
- **Relative Positional Biases**: Incorporate learnable relative positional biases into the attention scores to enhance positional understanding. (done)
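A minimal sketch of a learnable relative bias table (names and the clipping scheme are assumptions); the returned tensor would simply be added to the pre-softmax attention scores:

```python
import torch
import torch.nn as nn


class RelativeBias(nn.Module):
    """Sketch: a learnable bias b[i - j] added to attention scores,
    with offsets clipped to a maximum relative distance."""

    def __init__(self, num_heads, max_distance=128):
        super().__init__()
        self.max_distance = max_distance
        # One bias per head per clipped relative offset in [-max, max].
        self.bias = nn.Parameter(torch.zeros(num_heads, 2 * max_distance + 1))

    def forward(self, seq_len):
        pos = torch.arange(seq_len)
        rel = pos[None, :] - pos[:, None]  # (query, key) offsets
        rel = rel.clamp(-self.max_distance, self.max_distance) + self.max_distance
        return self.bias[:, rel]           # (heads, query, key)
```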
- **Regularization for the Rotation Matrix**: Add a regularization term to the loss function that encourages the rotation matrix to remain orthogonal during training. (done)
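The standard form of such a penalty (shown here as a sketch; the weight and exact norm used in the project may differ) is the squared Frobenius distance between R^T R and the identity:

```python
import torch


def orthogonality_penalty(r):
    """Sketch: Frobenius-norm penalty ||R^T R - I||_F^2 that pushes a
    trainable rotation matrix back toward orthogonality."""
    eye = torch.eye(r.shape[-1], device=r.device, dtype=r.dtype)
    return ((r.transpose(-1, -2) @ r - eye) ** 2).sum()
```

It would be added to the task loss with a small weight, e.g. `loss = task_loss + lam * orthogonality_penalty(R)`.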
Anyone who has new ideas, old ideas rehashed, bad ideas reimagined, or even just bad ideas: send me a message.