2303.03846

These are the different setups with examples: Regular ICL, Flipped-Label ICL, and Semantically-Unrelated Label ICL (SUL-ICL).
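To make the three setups concrete, here's a minimal sketch of how the prompts could be built. The exemplar sentences, the `Input:`/`Output:` template, the `Foo`/`Bar` labels, and the helper names (`relabel`, `build_prompt`) are my own assumptions for illustration, not the paper's exact data or code.

```python
# Hypothetical toy sentiment exemplars (illustrative only, not from the paper).
EXEMPLARS = [
    ("A delightful, moving film.", "Positive"),
    ("Dull and far too long.", "Negative"),
    ("I loved every minute.", "Positive"),
    ("The plot made no sense.", "Negative"),
]

# Flipped-Label ICL: every exemplar label is swapped for its opposite.
FLIP = {"Positive": "Negative", "Negative": "Positive"}

# SUL-ICL: labels are replaced with tokens that carry no task meaning.
# "Foo"/"Bar" are assumed placeholders.
UNRELATED = {"Positive": "Foo", "Negative": "Bar"}


def relabel(label, setup):
    """Map a natural-language label to the label shown in the prompt."""
    if setup == "regular":
        return label
    if setup == "flipped":
        return FLIP[label]
    if setup == "sul":
        return UNRELATED[label]
    raise ValueError(f"unknown setup: {setup}")


def build_prompt(exemplars, query, setup="regular"):
    """Build a few-shot prompt ending with an unlabeled query."""
    blocks = [
        f"Input: {text}\nOutput: {relabel(label, setup)}"
        for text, label in exemplars
    ]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    for setup in ("regular", "flipped", "sul"):
        print(f"--- {setup} ---")
        print(build_prompt(EXEMPLARS, "Great acting throughout.", setup))
        print()
```

In the flipped setup, a model that follows the exemplars should answer with the opposite of its semantic prior; in the SUL setup, the only way to get the answer right is to learn the input-label mapping from the exemplars.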

There are many cool results in the paper, but this one stands out: instruction-tuned LMs are better at learning input-label mappings than pretraining-only LMs. Results also generally improve with bigger models and more exemplars per class.

The paper claims that in-context learning with semantically unrelated labels emerges with scale. You can see in the chart below that "performance decreases more for small models than for large models when using semantically-unrelated targets instead of NL targets."

Some more interesting results for the SUL-ICL setup: (top) Larger models benefit more from additional exemplars than smaller models. (bottom) SUL-ICL emerges with scale (using k=8 exemplars per class) for both PaLM and Codex models.

From a practical perspective, it's good to provide more exemplars when using in-context learning and to put effort into formatting, etc. I am curious how the paper's findings generalize across different task categories, especially where semantic prior knowledge is not available and you instead rely on a large LM to do ICL from the input-label mappings alone. Really cool research thread to follow. Adding some of these notes to the Prompt Engineering guide too.

elvis

@omarsar0

Machine Learning & NLP Research • PhD • Building @dair_ai • Previously: Meta AI, Elastic

twitter-thread.com/t/1633299753877266432

Interesting findings on how LLMs do in-context learning. TL;DR: with scale, LLMs can override semantic priors when presented with enough flipped labels; these models can also perform well when replacing targets with semantically-unrelated targets. https://arxiv.org/abs/2303.03846
