Published on Mar 7, 2024
Why can't language models learn as well in reverse? In this video, we look at the paper "Arrows of Time for Large Language Models", which studies this phenomenon and offers some theories to explain it.
Paper: https://arxiv.org/abs/2401.17505
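
As a quick sketch of what "forwards and backwards factorizations" refers to (the notation below is my own, not taken verbatim from the video): an autoregressive model can factor the joint probability of a token sequence either left-to-right or right-to-left, and by the chain rule the two factorizations are mathematically equivalent. The paper's observation is that trained models consistently reach slightly lower loss with the forward factorization than the backward one.

p(x_1, \dots, x_N) = \prod_{t=1}^{N} p(x_t \mid x_1, \dots, x_{t-1})    (forward)
p(x_1, \dots, x_N) = \prod_{t=1}^{N} p(x_t \mid x_{t+1}, \dots, x_N)    (backward)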
Timestamps:
0:00 - Sequential prediction in language models
0:43 - Does order matter?
1:42 - Forwards and backwards factorizations
2:47 - Loss
3:44 - Results on different languages
4:57 - Ablation: Context window size
5:44 - Ablation: Model size
6:06 - Toy experiment: Prime factorization
8:28 - Linear languages
9:58 - Thought experiment
10:57 - Open questions