Sin2pi committed on
Commit a29082e · verified · 1 Parent(s): 2e3a4de

Upload Echo

Files changed (4)
  1. README.md +21 -80
  2. config.json +61 -0
  3. generation_config.json +12 -0
  4. model.safetensors +3 -0
README.md CHANGED
@@ -1,38 +1,10 @@
- Untrained whisper-like model (Echo) with code and trainer (hf). ASR model for experimentation. Working. Initialized model is "medium":
-
- ""
- config = WhisperConfig(
-     n_mels=80,
-     n_audio_ctx=1500,
-     n_audio_state=1024,
-     n_audio_head=16,
-     n_audio_layer=24,
-     vocab_size=51865,
-     n_text_ctx=448,
-     n_text_state=1024,
-     n_text_head=16,
-     n_text_layer=20,
-     max_rel_dist=15,
-     cross_attention=True,
-     checkpointing=True,
-     base=10000,
-     bos_token_id=50257,
-     eos_token_id=50257,
-     pad_token_id=50257,
-     decoder_start_token_id=50258,
-     is_encoder_decoder=True,
-     init_std=0.02,
- )
-
- model = Echo(config).to(device)
- ""
-
-
-
  Dynamic Base Adjustment
  Self-Adjusting Parameters: The model dynamically adjusts the base parameter in response to training loss, optimizing positional embeddings in real-time. This adaptive mechanism enhances the model's ability to fine-tune itself during training, ensuring better performance and efficiency.
 
- Rotary embedding with ortho-rotation matrix + Givens rotation - Block
  Orthogonally Initialized Rotation Matrix: This component combines rotary embeddings with an orthogonally initialized rotation matrix, providing robust and stable positional embeddings. This novel approach enhances the model’s capacity to represent positional information effectively.
 
  LearnedSinusoidalEmbeddings
@@ -41,63 +13,32 @@ Learned Sinusoidal Embeddings with Checkpointing: This unique integration of sin
  MultiHeadAttention
  Dynamic Positional Bias: Supports rotary embeddings and includes relative positional bias, capturing dependencies effectively. The attention mechanism is finely tuned with a dynamically adjustable base parameter, providing flexibility and precision.
 
- HybridAttention - not included in initialized model
  Combining Local and Global Attention: This component leverages both local and global attention mechanisms, ensuring that the model captures both fine-grained and broad context. The sliding window approach for local attention enhances its ability to process long sequences efficiently.
 
- DynamicConvAttention - not included in initialized model
  Integrating Convolution and Attention: This component enriches feature representation by combining convolutional layers with attention mechanisms, enabling the model to extract local context while attending to global information simultaneously.
 
-
  LayerNorm: Custom normalization with gamma and beta parameters.
 
  Linear: Custom linear layer with batch normalization and various activation functions.
 
  Conv1d: Custom 1D convolution layer with Kaiming initialization.
 
- Part of rotation block: We initialize the model with:
- $$
- n_{\text{state}}, n_{\text{head}}, \text{num\_rotations}, \text{base}=10000, \text{checkpointing}=\text{False}
- $$
-
- The hidden dimension \( \text{h\_dim} \) is calculated as:
- $$
- \text{h\_dim} = \frac{n_{\text{state}}}{n_{\text{head}}}
- $$
-
- The parameters \texttt{thetas} and \texttt{rotation\_pairs} are initialized as:
- $$
- \texttt{thetas} = \mathbf{0}
- $$
- $$
- \texttt{rotation\_pairs} = \text{rand}(\text{num\_rotations}, 2) \times \text{h\_dim}
- $$
-
- The rotation matrix is an identity matrix:
- $$
- \texttt{rotation\_matrix} = \mathbf{I}_{\text{h\_dim}}
- $$
-
- The inverse frequency is computed as:
- $$
- \texttt{inv\_freq} = \frac{1.0}{\text{base}^{\frac{\text{torch.arange}(0, \text{h\_dim}, 2)}{\text{h\_dim}}}}
- $$
-
- The Givens rotation matrix \( G \) is defined as:
- $$
- G = \mathbf{I}_{n_{\text{state}}}
- $$
- $$
- G_{ii} = \cos(\theta), \quad G_{ij} = -\sin(\theta)
- $$
- $$
- G_{ji} = \sin(\theta), \quad G_{jj} = \cos(\theta)
- $$
-
- The rotary orthogonal matrix \( R \) used in the forward pass is computed as:
- $$
- R = \text{rotation\_matrix} \cdot G
- $$
-
- $$ \mathbf{x}_{\text{transformed}} = \mathbf{x} \cdot \left( \prod_{k=1}^{N} G_k \right) \cdot R $$
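
The removed formulas above describe how the rotation block is constructed. Below is a minimal PyTorch sketch, assuming an illustrative class name (GivensRotationBlock) and sizing the Givens matrix at h_dim x h_dim so that its product with rotation_matrix is defined; the repository's actual implementation may differ.

import torch
import torch.nn as nn

class GivensRotationBlock(nn.Module):
    # Illustrative sketch of the rotation block; not the repository's actual class.
    def __init__(self, n_state, n_head, num_rotations, base=10000):
        super().__init__()
        self.h_dim = n_state // n_head                           # per-head dimension
        self.thetas = nn.Parameter(torch.zeros(num_rotations))   # rotation angles, initialized to 0
        # random (i, j) index pairs drawn from the head dimension
        pairs = (torch.rand(num_rotations, 2) * self.h_dim).long()
        self.register_buffer("rotation_pairs", pairs)
        # orthogonal rotation matrix, initialized to the identity
        self.rotation_matrix = nn.Parameter(torch.eye(self.h_dim))
        # inv_freq = 1 / base^(arange(0, h_dim, 2) / h_dim), as in the formula above
        inv_freq = 1.0 / (base ** (torch.arange(0, self.h_dim, 2).float() / self.h_dim))
        self.register_buffer("inv_freq", inv_freq)

    def givens(self, i, j, theta):
        # Givens rotation: identity except G[i,i] = G[j,j] = cos, G[i,j] = -sin, G[j,i] = sin
        G = torch.eye(self.h_dim, dtype=theta.dtype, device=theta.device)
        G[i, i] = torch.cos(theta)
        G[j, j] = torch.cos(theta)
        G[i, j] = -torch.sin(theta)
        G[j, i] = torch.sin(theta)
        return G

    def forward(self, x):
        # x: (..., h_dim). Apply the product of Givens rotations, then the learned
        # orthogonal matrix (a simplified reading of x · (prod_k G_k) · R above).
        for k in range(self.thetas.shape[0]):
            i, j = self.rotation_pairs[k].tolist()
            x = x @ self.givens(i, j, self.thetas[k])
        return x @ self.rotation_matrix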
 
 
+ ---
+ {}
+ ---
  Dynamic Base Adjustment
  Self-Adjusting Parameters: The model dynamically adjusts the base parameter in response to training loss, optimizing positional embeddings in real-time. This adaptive mechanism enhances the model's ability to fine-tune itself during training, ensuring better performance and efficiency.
 
+ RotaryEmbeddingWithRotation
  Orthogonally Initialized Rotation Matrix: This component combines rotary embeddings with an orthogonally initialized rotation matrix, providing robust and stable positional embeddings. This novel approach enhances the model’s capacity to represent positional information effectively.
 
  LearnedSinusoidalEmbeddings
 
  MultiHeadAttention
  Dynamic Positional Bias: Supports rotary embeddings and includes relative positional bias, capturing dependencies effectively. The attention mechanism is finely tuned with a dynamically adjustable base parameter, providing flexibility and precision.
 
+ HybridAttention
  Combining Local and Global Attention: This component leverages both local and global attention mechanisms, ensuring that the model captures both fine-grained and broad context. The sliding window approach for local attention enhances its ability to process long sequences efficiently.
 
+ DynamicConvAttention
  Integrating Convolution and Attention: This component enriches feature representation by combining convolutional layers with attention mechanisms, enabling the model to extract local context while attending to global information simultaneously.
 
+ Model Components
  LayerNorm: Custom normalization with gamma and beta parameters.
 
  Linear: Custom linear layer with batch normalization and various activation functions.
 
  Conv1d: Custom 1D convolution layer with Kaiming initialization.
 
+ RotaryEmbeddingWithRotation: Orthogonally initialized rotary embeddings with dynamic base adjustment.
+
+ LearnedSinusoidalEmbeddings: Sinusoidal embeddings with optional checkpointing and L2 normalization.
+
+ MultiHeadAttention: Dynamic positional bias with rotary embeddings and optional caching.
+
+ ResidualAttentionBlock: Integrates self and cross-attention with GELU-activated MLP.
+
+ AudioEncoder: Convolutional layers with learned sinusoidal embeddings and rotary embeddings.
+
+ TextDecoder: Token embeddings with rotary embeddings and cross-attention.
+
+ DynamicConvAttention: Combines convolution and attention for enriched feature extraction.
 
+ HybridAttention: Merges local and global attention mechanisms using sliding window and multi-head attention.
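
As a rough illustration of the Dynamic Base Adjustment behaviour listed above, a loss-driven update of the rotary base might look like the sketch below. The function name, the update heuristic, and the attribute names (base, h_dim, inv_freq) are assumptions for illustration, not the repository's code.

import torch

def adjust_base(rotary, loss, prev_loss, factor=1.05):
    # Hypothetical heuristic: raise the rotary base when the loss improves,
    # lower it otherwise, then recompute inv_freq with the same formula the
    # rotation block uses (1 / base^(arange(0, h_dim, 2) / h_dim)).
    rotary.base = rotary.base * factor if loss < prev_loss else rotary.base / factor
    steps = torch.arange(0, rotary.h_dim, 2).float()
    rotary.inv_freq = 1.0 / (rotary.base ** (steps / rotary.h_dim))
    return rotary.base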
 
config.json ADDED
@@ -0,0 +1,61 @@
+ {
+ "activation_dropout": 0.0,
+ "activation_function": "gelu",
+ "apply_spec_augment": false,
+ "architectures": [
+ "Echo"
+ ],
+ "attention_dropout": 0.0,
+ "base": 10000,
+ "begin_suppress_tokens": [
+ 220,
+ 50256
+ ],
+ "bos_token_id": 50257,
+ "checkpointing": true,
+ "classifier_proj_size": 256,
+ "cross_attention": true,
+ "d_model": 384,
+ "decoder_attention_heads": 6,
+ "decoder_ffn_dim": 1536,
+ "decoder_layerdrop": 0.0,
+ "decoder_layers": 4,
+ "decoder_start_token_id": 50258,
+ "dropout": 0.0,
+ "encoder_attention_heads": 6,
+ "encoder_ffn_dim": 1536,
+ "encoder_layerdrop": 0.0,
+ "encoder_layers": 4,
+ "eos_token_id": 50257,
+ "init_std": 0.02,
+ "is_encoder_decoder": true,
+ "mask_feature_length": 10,
+ "mask_feature_min_masks": 0,
+ "mask_feature_prob": 0.0,
+ "mask_time_length": 10,
+ "mask_time_min_masks": 2,
+ "mask_time_prob": 0.05,
+ "max_rel_dist": 15,
+ "max_source_positions": 1500,
+ "max_target_positions": 448,
+ "median_filter_width": 7,
+ "model_type": "whisper",
+ "n_audio_ctx": 1500,
+ "n_audio_head": 16,
+ "n_audio_layer": 24,
+ "n_audio_state": 1024,
+ "n_mels": 80,
+ "n_text_ctx": 448,
+ "n_text_head": 16,
+ "n_text_layer": 20,
+ "n_text_state": 1024,
+ "num_hidden_layers": 4,
+ "num_mel_bins": 80,
+ "pad_token_id": 50257,
+ "scale_embedding": false,
+ "torch_dtype": "float32",
+ "transformers_version": "4.47.0",
+ "use_cache": true,
+ "use_weighted_layer_sum": false,
+ "vocab_size": 51865
+ }
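
Since config.json keeps model_type set to "whisper", it can be read back with the Hugging Face WhisperConfig class and handed to the repository's Echo class, mirroring the initialization shown in the old README. This is a hedged sketch only; the import path and the local path are placeholders, not part of this commit.

import torch
from transformers import WhisperConfig
# The Echo class lives in the repository's own code; this import path is a placeholder.
# from model import Echo

# "path/to/this/repo" is a placeholder for a local clone or the hub repo id.
config = WhisperConfig.from_pretrained("path/to/this/repo")
# Custom keys from config.json (n_audio_state, base, max_rel_dist, ...) ride along as attributes.
# model = Echo(config).to("cuda" if torch.cuda.is_available() else "cpu")
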
generation_config.json ADDED
@@ -0,0 +1,12 @@
+ {
+ "_from_model_config": true,
+ "begin_suppress_tokens": [
+ 220,
+ 50256
+ ],
+ "bos_token_id": 50257,
+ "decoder_start_token_id": 50258,
+ "eos_token_id": 50257,
+ "pad_token_id": 50257,
+ "transformers_version": "4.47.0"
+ }
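
For completeness, the generation settings above can be read back with transformers' GenerationConfig; a hedged example, with the path again a placeholder:

from transformers import GenerationConfig

# Reads back the generation_config.json added in this commit.
gen_config = GenerationConfig.from_pretrained("path/to/this/repo")
print(gen_config.decoder_start_token_id)  # 50258
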
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ead546823f5ff3feaccdc9e6d236f78a407f7954cbf4f383e164d84499b528d1
+ size 3204676144