We propose Mixture-of-Head attention (MoH), a new architecture that treats attention heads as experts in the Mixture-of-Experts (MoE) mechanism. MoH has two significant advantages: First, MoH enables ...
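To make "attention heads as experts" concrete, here is a minimal sketch of the general idea. It is not MoH's actual routing scheme. Standard multi-head attention computes concat(h_1, ..., h_H) W_O, which equals the plain sum of each head's output after its own slice of W_O. An MoE-style variant instead lets a router pick a subset of heads per token and sums only their outputs with gate weights. Several choices here are illustrative assumptions: the per-token router logits, the top-k selection, the softmax renormalization over the selected heads, and the helper name `route_heads`. The per-head outputs are taken as given, so the attention computation itself is omitted.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_heads(head_outputs, router_logits, k):
    """Hypothetical top-k head routing for a single token.

    head_outputs: H vectors, one per head, each already multiplied by that
        head's slice of the output projection (so plain multi-head attention
        would simply sum them).
    router_logits: H scores from an assumed per-token router.
    k: number of heads activated for this token.
    Returns (combined output, selected head indices, gate weights).
    """
    if len(head_outputs) != len(router_logits):
        raise ValueError("need one router logit per head")
    if not 1 <= k <= len(head_outputs):
        raise ValueError("k must be between 1 and the number of heads")

    # Keep the k heads with the highest router scores.
    top = sorted(range(len(router_logits)),
                 key=lambda h: router_logits[h], reverse=True)[:k]
    # Renormalize gate weights over the selected heads only.
    gates = softmax([router_logits[h] for h in top])

    # Weighted sum of the selected heads' outputs replaces the plain sum.
    dim = len(head_outputs[0])
    out = [0.0] * dim
    for g, h in zip(gates, top):
        for i in range(dim):
            out[i] += g * head_outputs[h][i]
    return out, top, gates


if __name__ == "__main__":
    heads = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [5.0, 5.0]]
    logits = [0.5, 0.0, 2.0, -1.0]
    out, top, gates = route_heads(heads, logits, k=2)
    print(top, gates, out)  # only heads 2 and 0 contribute
```

With k equal to the number of heads and equal router logits, every head gets weight 1/H, so the result is the plain sum scaled by 1/H. Activating only k < H heads is what would let such a model skip computation for unselected heads.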
If you like our project, please give us a star ⭐ on GitHub for the latest updates. MARTI is an open-source framework for training LLM-based Multi-Agent Systems (MAS ...