reinforcement
reinforcement's Blog
Reinforcement Technology Advancements

Transformer Performance: Hopfield Theory & Cross-Entropy Loss Data

24 Jun 2025

Table of Links

Abstract and 1 Introduction

2 Related Work

3 Model and 3.1 Associative memories

3.2 Transformer blocks

4 A New Energy Function

4.1 The layered structure

5 Cross-Entropy Loss

6 Empirical Results and 6.1 Empirical evaluation of the radius

6.2 Training GPT-2

6.3 Training Vanilla Transformers

7 Conclusion and Acknowledgments

Appendix A. Deferred Tables

Appendix B. Some Properties of the Energy Functions

Appendix C. Deferred Proofs from Section 5

Appendix D. Transformer Details: Using GPT-2 as an Example

References

Appendix A. Deferred Tables

Table 1: Selected related works on Hopfield networks, listing their domain, energy function, and memory capacity. In all entries, n is the dimension of the input vector, W is the outer product of the patterns, M is the matrix of patterns, r is the order of the polynomial F(·), d is the number of patterns, and c is a positive constant.
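
To make the notation in the caption concrete, the sketch below builds W as a sum of outer products of d stored patterns of dimension n and evaluates the classical Hopfield energy E(x) = -1/2 xᵀWx. This is only one standard instance of such an energy function and is not claimed to match any particular row of Table 1; the names `patterns`, `outer_product_sum`, and `energy` are hypothetical, and the patterns are toy values chosen for illustration.

```python
# Illustrative sketch only: classical Hopfield energy with bipolar (+1/-1) patterns.
# All names and values below are assumptions made for this example.

def outer_product_sum(patterns):
    """Return W = sum over stored patterns p of (p p^T), as an n x n nested list."""
    n = len(patterns[0])
    W = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                W[i][j] += p[i] * p[j]
    return W


def energy(W, x):
    """Classical Hopfield energy E(x) = -1/2 * x^T W x."""
    n = len(x)
    return -0.5 * sum(x[i] * W[i][j] * x[j] for i in range(n) for j in range(n))


# d = 2 stored patterns of dimension n = 4 (the rows play the role of M here).
patterns = [
    [1, -1, 1, -1],
    [1, 1, -1, -1],
]
W = outer_product_sum(patterns)

stored = patterns[0]
perturbed = [1, -1, 1, 1]  # the first stored pattern with its last entry flipped

e_stored = energy(W, stored)
e_perturbed = energy(W, perturbed)
print(e_stored, e_perturbed)
```

In this toy setting the stored pattern has lower energy than its perturbed copy, which is the sense in which stored patterns act as attractors of the energy landscape.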

Table 2: Large transformer-based language models and their reported cross-entropy loss.

Authors:

(1) Xueyan Niu, Theory Laboratory, Central Research Institute, 2012 Laboratories, Huawei Technologies Co., Ltd.;

(2) Bo Bai (baibo8@huawei.com);

(3) Lei Deng (deng.lei2@huawei.com);

(4) Wei Han (harvey.hanwei@huawei.com).


This paper is available on arXiv under the CC BY-NC-ND 4.0 DEED license.

