Sin2pi committed on
Commit
a79dda1
·
verified ·
1 Parent(s): a2346fd

Update README.md

  1. README.md +47 -1
README.md CHANGED
@@ -18,7 +18,8 @@ tags:
18
 
19
  ---
20
 
21
- Anyone who has new ideas, old ideas rehashed, bad ideas reimagined, or even just bad ideas: send me a message.
 
22
 
23
  **Self-Adjusting Parameters**: The model dynamically adjusts the base parameter in response to training loss, optimizing positional embeddings in real-time. This adaptive mechanism enhances the model's ability to fine-tune itself during training, ensuring better performance and efficiency.
24
 
@@ -95,3 +96,48 @@ model.apply_initialization()
95
  model.save_pretrained("./models/echo2")
96
 
97
  model = Echo.from_pretrained("./models/echo2")
18
 
19
  ---
20
 
21
+ "The concept of making the adaptation factor itself adaptive opens up a fascinating research direction. Imagine a model that doesn't just learn, but learns how to learn - continuously calibrating its own learning dynamics." (Copilot)
22
+
23
 
24
  **Self-Adjusting Parameters**: The model dynamically adjusts the base parameter in response to training loss, optimizing positional embeddings in real-time. This adaptive mechanism enhances the model's ability to fine-tune itself during training, ensuring better performance and efficiency.
25
 
 
96
  model.save_pretrained("./models/echo2")
97
 
98
  model = Echo.from_pretrained("./models/echo2")
99
+
100
+
101
+ #########
102
+ **Roadmap**
103
+
104
+ **Dynamically Adjusted Base**: The base parameter, which sets the frequencies used in the positional encodings, adjusts dynamically based on the model's performance during training. This can help the model capture positional information, especially when the data characteristics vary over time. (done)
105
+
106
+ **Responsive Hyperparameter Auto-Tuning**: The model continuously tunes itself in response to the loss observed during training, so it can potentially learn more effectively by automatically adjusting how much the positional encoding influences learning. (done for the base frequency of the RoPE embeddings)
107
+
108
+ **Proactive Adjustments**: As with learning-rate schedulers, adjusting the base according to performance can help avoid overfitting or underfitting and keep the model effective across training phases. (done)
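
The three items above describe a RoPE base that reacts to the training loss. The README does not state the update rule, so the following is only a minimal sketch under assumptions: the class name `AdaptiveBase`, the multiplicative `factor`, the clamp range, and the "grow while the loss improves, shrink otherwise" rule are all hypothetical and are not Echo's actual implementation. Only the inverse-frequency formula is the standard RoPE one.

```python
import math


def inverse_frequencies(base: float, dim: int) -> list[float]:
    """Standard RoPE inverse frequencies: base ** (-2i / dim) for each pair index i."""
    return [base ** (-2.0 * i / dim) for i in range(dim // 2)]


class AdaptiveBase:
    """Hypothetical loss-driven controller for the RoPE base (illustration only)."""

    def __init__(self, base: float = 10000.0, factor: float = 1.05,
                 min_base: float = 1000.0, max_base: float = 100000.0):
        self.base = base
        self.factor = factor
        self.min_base = min_base
        self.max_base = max_base
        self.best_loss = math.inf

    def step(self, loss: float) -> float:
        # Assumed rule: grow the base while the loss improves, shrink it otherwise.
        if loss < self.best_loss:
            self.best_loss = loss
            self.base *= self.factor
        else:
            self.base /= self.factor
        # Clamp so the frequencies stay within a fixed range.
        self.base = min(max(self.base, self.min_base), self.max_base)
        return self.base


controller = AdaptiveBase()
for loss in [2.0, 1.5, 1.7]:  # made-up loss values
    base = controller.step(loss)

# Recompute the frequencies whenever the base changes.
inv_freq = inverse_frequencies(base, dim=8)
```

A real training loop would call `step` once per evaluation interval and rebuild the rotary tables from the new frequencies.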
109
+
110
+ Next for responsive tuning:
111
+ **Multi-Dimensional Adaptation**
112
+
113
+ Track adaptation not just for the base frequency, but also for:
114
+ - Rotation matrices
115
+ - Attention span
116
+ - Embedding dimensionality
117
+ - Dropout rates
118
+ - ??
119
+
120
+ **Conceptual Question**
121
+ - How do you measure "homeostasis" in a neural network?
122
+ - Could a "learning dynamics" module observe and adjust multiple model parameters simultaneously?
123
+
124
+
125
+ **Learnable Frequencies and Phases**: Introduce learnable frequencies and phase shifts in the rotary embeddings so the model can adapt its positional encodings dynamically. (done)
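
As a rough illustration of the item above, the rotation angle for each feature pair can be written as `position * frequency + phase`. The sketch below shows only that forward computation in plain Python. The function name `rotate_pairs` and the sample values are assumptions, and in a real model `freqs` and `phases` would be trainable tensors rather than lists.

```python
import math


def rotate_pairs(x, position, freqs, phases):
    """Rotate each pair (x[2i], x[2i+1]) by the angle position * freqs[i] + phases[i]."""
    out = []
    for i, (freq, phase) in enumerate(zip(freqs, phases)):
        angle = position * freq + phase
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        a, b = x[2 * i], x[2 * i + 1]
        # Standard 2-D rotation of the pair (a, b).
        out += [a * cos_a - b * sin_a, a * sin_a + b * cos_a]
    return out


# Made-up example: one feature pair at position 3.
rotated = rotate_pairs([1.0, 0.0], position=3, freqs=[0.5], phases=[0.1])
```

Because each pair is only rotated, the vector norm is unchanged whatever values the frequencies and phases take.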
126
+
127
+ **Learnable Positional Scaling**: Add a learnable scaling factor for the positional embeddings to adjust the emphasis on positional information during training. (done)
128
+
129
+ **Multiple Rotation Matrices per Head**: Use a separate rotation matrix for each attention head to increase expressiveness and capture diverse positional patterns. (wip)
130
+
131
+ **Orthogonal Parameterization with Givens Rotations**: Parameterize the rotation matrix as a product of Givens rotations so that it stays orthogonal without explicit re-orthogonalization. (done)
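
The idea behind the Givens parameterization can be sketched as follows: each Givens rotation is orthogonal, and a product of orthogonal matrices is orthogonal, so the result is orthogonal for any choice of angles. The helper names (`givens`, `matmul`, `rotation_from_angles`) and the pair ordering are assumptions made for this example. In a real model the angles would be the trainable parameters.

```python
import math
from itertools import combinations


def givens(n, i, j, theta):
    """n x n identity with a 2-D rotation by theta embedded in the (i, j) plane."""
    g = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    g[i][i], g[i][j] = cos_t, -sin_t
    g[j][i], g[j][j] = sin_t, cos_t
    return g


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]


def rotation_from_angles(n, angles):
    """Multiply one Givens rotation per index pair (i, j) with i < j."""
    pairs = list(combinations(range(n), 2))
    assert len(angles) == len(pairs), "need n * (n - 1) / 2 angles"
    rot = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for (i, j), theta in zip(pairs, angles):
        rot = matmul(rot, givens(n, i, j, theta))
    return rot


# Made-up angles for a 3 x 3 rotation.
rot = rotation_from_angles(3, [0.3, -1.2, 0.7])
```

Since orthogonality holds by construction, a gradient step on the angles cannot push the matrix off the orthogonal group.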
132
+
133
+ **Per-Layer Rotation Matrices**: Use a different rotation matrix in each layer of the model to enhance representational capacity. (wip)
134
+
135
+ **Conditional Rotation Matrices**: Generate rotation matrices conditioned on the input with a small neural network, giving dynamic positional relationships. (wip)
136
+
137
+ **Multi-Scale Rotary Embeddings**: Use multiple sets of inverse frequencies to capture positional information at several scales within the same model. (done)
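
One way to picture "multiple sets of inverse frequencies" is to compute the usual RoPE frequencies once per base. The sketch below does that. The function names and the three base values are illustrative assumptions, and how the model combines the scales is not stated in the README, so that step is left out.

```python
def multi_scale_inv_freq(dim, bases=(100.0, 10000.0, 1000000.0)):
    """One set of RoPE inverse frequencies per base (the bases are illustrative)."""
    return [[base ** (-2.0 * i / dim) for i in range(dim // 2)] for base in bases]


def rotary_angles(position, inv_freq_sets):
    """Rotation angles for one position at every scale."""
    return [[position * freq for freq in inv_freq] for inv_freq in inv_freq_sets]


inv_freq_sets = multi_scale_inv_freq(dim=8)
angles = rotary_angles(position=5, inv_freq_sets=inv_freq_sets)
```

A smaller base gives higher frequencies, which resolve short-range offsets, while a larger base gives slower rotations that still distinguish positions far apart.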
138
+
139
+ **Relative Positional Biases**: Add learnable relative positional biases to the attention scores to enhance positional understanding. (done)
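
A common form of relative positional bias looks up one value per clipped query-to-key offset and adds it to the raw attention score before the softmax. The sketch below assumes that form. The function name `add_relative_bias`, the clipping scheme, and the table layout are hypothetical, and `bias_table` would be a trainable parameter in practice.

```python
def add_relative_bias(scores, bias_table, max_distance):
    """Return scores[q][k] plus the bias for the clipped offset k - q.

    bias_table holds 2 * max_distance + 1 values, indexed by offset + max_distance.
    """
    out = []
    for q, row in enumerate(scores):
        out_row = []
        for k, score in enumerate(row):
            # Clip the offset so distant positions share the outermost bias.
            offset = max(-max_distance, min(max_distance, k - q))
            out_row.append(score + bias_table[offset + max_distance])
        out.append(out_row)
    return out


# Made-up 3 x 3 score matrix and a bias table for offsets -1, 0, +1.
scores = [[0.0] * 3 for _ in range(3)]
biased = add_relative_bias(scores, bias_table=[-1.0, 0.0, 1.0], max_distance=1)
```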
140
+
141
+ **Regularization for Rotation Matrix**: Add a regularization term to the loss function that encourages the rotation matrix to remain orthogonal during training. (done)
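
A typical choice for such a term is the squared Frobenius norm of `R^T R - I`, which is zero exactly when `R` is orthogonal. The sketch below assumes that penalty. The README does not say which form Echo uses, and the names `orthogonality_penalty` and `reg_weight` and the sample loss values are hypothetical.

```python
import math


def orthogonality_penalty(rot):
    """Squared Frobenius norm of (R^T R - I) for a square matrix given as rows."""
    n = len(rot)
    penalty = 0.0
    for i in range(n):
        for j in range(n):
            gram_ij = sum(rot[k][i] * rot[k][j] for k in range(n))  # (R^T R)[i][j]
            target = 1.0 if i == j else 0.0
            penalty += (gram_ij - target) ** 2
    return penalty


# A true rotation gives a penalty of (numerically) zero.
theta = 0.4
rot = [[math.cos(theta), -math.sin(theta)],
       [math.sin(theta), math.cos(theta)]]

task_loss = 1.25   # made-up value
reg_weight = 0.01  # hypothetical regularization weight
total_loss = task_loss + reg_weight * orthogonality_penalty(rot)
```

Unlike the Givens parameterization, this term only encourages orthogonality, so the matrix can still drift slightly between updates.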
142
+
143
+ Anyone who has new ideas, old ideas rehashed, bad ideas reimagined, or even just bad ideas: send me a message.