
Cross-Entropy Loss Analysis in Transformer Networks
19 Jun 2025
An in-depth analysis of cross-entropy loss in Transformer networks, including its connection to attention, theoretical bounds, and empirical observations.

Modeling Transformer Layers: Majorization Minimization & Hopfield Networks
19 Jun 2025
Explore how the majorization minimization (MM) technique is used to adapt Hopfield network models to the multi-layered structure of Transformers.

New Energy Function for Transformers: No External Regularization
19 Jun 2025
Introducing a new energy function for Transformer models that operates without additional regularization, offering a simpler way to model attention.

Transformer Block Architecture: Attention and Feed-Forward Integration
18 Jun 2025
Explore the core components of a Transformer block, focusing on how multi-head attention and feed-forward layers can be conceptually integrated.

Associative Memories: Transformer Memorization & Performance Dynamics
18 Jun 2025
Empirical studies on large language models have shown that the larger they are, the more they tend to memorize training data.

Related Work: Scaling Laws and Hopfield Models in LLM Research
18 Jun 2025
Explore existing research on neural scaling laws in large language models and the evolution of Hopfield networks as associative memories.

Theoretical Framework: Transformer Memorization & Performance Dynamics
18 Jun 2025
This study sheds light on why larger Transformers don't always perform better by developing a theoretical account of the memorization process.

Rules, Exceptions, and Exploration: The Secret to EXPLORER’s Success
1 Apr 2025
EXPLORER outperforms baselines by combining neural exploration and symbolic reasoning, excelling in text-based games with unseen entities.

Beyond Seen Worlds: EXPLORER’s Journey into Generalized Reasoning
1 Apr 2025
EXPLORER dynamically generalizes symbolic rules using WordNet hypernyms to improve performance on unseen entities in text-based games.