Diffora Attention vs. Standard Multi-Head Attention

New Attention (0.16M params) vs. Multi-Head Attention (0.42M params)
🤖 Humanoid-v5
🎯 REINFORCE algorithm
New Attention — 0.16M parameters (2.6× fewer than MHA)
Multi-Head Attention (MHA) 🧠 — 0.42M parameters (standard transformer attention)
Parameter Efficiency 📊 — 2.6×: New Attention uses ~62% fewer parameters
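The 2.6× and 62% figures follow directly from the two parameter counts; a quick sanity check in Python:

```python
# Parameter counts from the two policies (in millions)
new_attn_params = 0.16
mha_params = 0.42

# MHA uses ~2.6x more parameters than the new attention policy
ratio = mha_params / new_attn_params
print(f"{ratio:.1f}x")  # 2.6x

# Equivalently, the new policy uses ~62% fewer parameters
savings = 1 - new_attn_params / mha_params
print(f"{savings:.0%} fewer")  # 62% fewer
```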

🏗️ Architecture Details

New Attention Policy (novel)
  • action_dim: 17
  • d_model: 256
  • d_k: 256
  • d_v: 256
  • N (heads): 1
  • Parameters: 0.16M
Param ratio: 38%
MHA Policy (baseline)
  • action_dim: 17
  • d_model: 256
  • d_k: 256
  • d_v: 256
  • N (heads): 1
  • Parameters: 0.42M
Param ratio: 100%
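The tables list only the attention hyperparameters, so the 0.16M and 0.42M totals presumably include the observation embedding and action head as well. As a rough sketch of where the MHA budget goes, here is the standard parameter count for a single-head attention layer (weight shapes assumed from the usual Q/K/V/output-projection layout, not taken from the actual implementation):

```python
def attention_params(d_model: int, d_k: int, d_v: int, bias: bool = True) -> int:
    """Parameter count of one single-head attention layer:
    Q, K, V input projections plus the output projection."""
    q = d_model * d_k + (d_k if bias else 0)
    k = d_model * d_k + (d_k if bias else 0)
    v = d_model * d_v + (d_v if bias else 0)
    out = d_v * d_model + (d_model if bias else 0)
    return q + k + v + out

# With the table's settings (d_model = d_k = d_v = 256, N = 1):
print(attention_params(256, 256, 256))  # 263168, i.e. ~0.26M
```

The gap between this ~0.26M and the reported 0.42M total would come from the layers around attention (e.g. the embedding and the head producing the 17-dim Humanoid-v5 action); the New Attention variant presumably slims or shares some of these projections to reach 0.16M.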

📈 Training Performance

Return — New Attention (reward per episode, steps 30–59)
Return — MHA (reward per episode, steps 0–29)
Loss — New Attention (policy loss, steps 30–59)
Loss — MHA (policy loss, steps 0–29)
Episode Count — New Attention (cumulative episodes, steps 30–59)
Episode Count — MHA (cumulative episodes, steps 0–29)

🖥️ System Metrics

Network Traffic: ~35 MB sent
Disk I/O: ~15 GB written
Disk Utilization: ~21.21 GB (~19.7%)
Process Memory Available: ~10 GB
Process Memory In Use: ~1,750 MB (~12%)
System Memory Utilization: ~20%
Process CPU Utilization: peak 100%