Thread Reader

elvis (@omarsar0)
Mar 8, 2023 • 6 tweets
Interesting findings on how LLMs do in-context learning. TL;DR: with scale, LLMs can override semantic priors when presented with enough flipped labels; these models can also perform well when the original labels are replaced with semantically-unrelated targets. arxiv.org/abs/2303.03846

These are the different setups with examples: Regular ICL, Flipped-Label ICL, and Semantically-Unrelated Label ICL (SUL-ICL). A rough sketch of the three prompt formats is below.
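To make the three setups concrete, here is a minimal Python sketch of how the prompts differ. The sentiment task, example texts, and label words ("foo"/"bar") are made up for illustration and are not the paper's actual datasets or code.

```python
# Minimal sketch (not from the paper's code) of the three ICL prompt setups
# for a toy sentiment task. Example texts and label words are illustrative.

def build_prompt(exemplars, label_map, query):
    """Format few-shot exemplars as 'Input: ...' / 'Label: ...' pairs."""
    blocks = [
        f"Input: {text}\nLabel: {label_map[label]}"
        for text, label in exemplars
    ]
    blocks.append(f"Input: {query}\nLabel:")
    return "\n\n".join(blocks)

exemplars = [
    ("The movie was fantastic", "positive"),
    ("I hated every minute of it", "negative"),
]
query = "An absolute delight to watch"

# Regular ICL: label words keep their natural-language meaning.
regular = build_prompt(
    exemplars, {"positive": "positive", "negative": "negative"}, query)

# Flipped-Label ICL: label words are swapped, contradicting semantic priors.
flipped = build_prompt(
    exemplars, {"positive": "negative", "negative": "positive"}, query)

# SUL-ICL: labels are replaced with semantically-unrelated tokens, so the
# model must learn the input-label mapping from the exemplars alone.
sul_icl = build_prompt(
    exemplars, {"positive": "foo", "negative": "bar"}, query)

print(regular, flipped, sul_icl, sep="\n\n---\n\n")
```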
There are many cool results in the paper, but this one is interesting: instruction-tuned LMs perform better at learning input-label mappings than pretraining-only LMs. Generally, results also improve with bigger models and more exemplars per class.
The paper claims that in-context learning with semantically unrelated labels emerges with scale. You can see in the chart below that "performance decreases more for small models than for large models when using semantically-unrelated targets instead of natural language targets."
Some more interesting results for the SUL-ICL setup: (top) larger models benefit more from additional exemplars than smaller models; (bottom) SUL-ICL emerges with scale (using k=8 exemplars per class) for both PaLM and Codex models.
From a practical perspective, it's good to provide more exemplars when using in-context learning and to put effort into formatting them consistently. I am curious how the paper's findings generalize across different task categories, especially ones where semantic prior knowledge is not available and you instead rely on a large LM's ability to learn the input-label mappings in context; a rough sketch of this practical setup is below. Really cool research thread to follow. Adding some of these notes to the Prompt Engineering guide too.
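As a hedged illustration of that practical advice (more exemplars per class, consistent formatting), here is a small Python sketch; the toy dataset, k value, and prompt template are hypothetical placeholders, not anything from the paper.

```python
# Sketch of the practical takeaway: sample several exemplars per class and
# format every exemplar with the same template. Dataset and k are illustrative.
import random

def kshot_prompt(dataset, k, query, template="Input: {x}\nLabel: {y}"):
    """Build a k-shot-per-class prompt with one consistent exemplar format."""
    by_class = {}
    for x, y in dataset:
        by_class.setdefault(y, []).append(x)
    blocks = []
    for label, texts in by_class.items():
        for x in random.sample(texts, min(k, len(texts))):
            blocks.append(template.format(x=x, y=label))
    random.shuffle(blocks)  # interleave classes instead of grouping them
    blocks.append(template.format(x=query, y="").rstrip())
    return "\n\n".join(blocks)

toy_data = [
    ("Great acting and a tight plot", "positive"),
    ("A delight from start to finish", "positive"),
    ("Boring and far too long", "negative"),
    ("I walked out halfway through", "negative"),
]
print(kshot_prompt(toy_data, k=2, query="Surprisingly moving"))
```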