Cross-Entropy Loss Analysis in Transformer Networks

19 Jun 2025

An in-depth analysis of cross-entropy loss in Transformer networks, including its connection to attention, theoretical bounds, and empirical observations.

Modeling Transformer Layers: Majorization Minimization & Hopfield Networks

19 Jun 2025

Explore how the majorization minimization (MM) technique is used to adapt Hopfield network models to the multi-layered structure of Transformers.

New Energy Function for Transformers: No External Regularization

19 Jun 2025

Introducing a new energy function for Transformer models that operates without additional regularization, offering a simpler way to model attention.

Transformer Block Architecture: Attention and Feed-Forward Integration

18 Jun 2025

Explore the core components of a Transformer block, focusing on how multi-head attention and feed-forward layers can be conceptually integrated.

Associative Memories: Transformer Memorization & Performance Dynamics

18 Jun 2025

Empirical studies on large language models have shown that the larger they are, the more they tend to memorize training data.

Related Work: Scaling Laws and Hopfield Models in LLM Research

18 Jun 2025

Explore existing research on neural scaling laws in large language models and the evolution of Hopfield networks as associative memories.

Theoretical Framework: Transformer Memorization & Performance Dynamics

18 Jun 2025

This study sheds light on why larger Transformers don't always perform better, by developing a theory of the memorization process.

Rules, Exceptions, and Exploration: The Secret to EXPLORER’s Success

1 Apr 2025

EXPLORER outperforms baselines by combining neural exploration and symbolic reasoning, excelling in text-based games with unseen entities.

Beyond Seen Worlds: EXPLORER’s Journey into Generalized Reasoning

1 Apr 2025

EXPLORER dynamically generalizes symbolic rules using WordNet hypernyms to improve performance on unseen entities in text-based games.