Summary of what we learned during today's AMA hour with the OpenAI o1 team
Model Names and Reasoning Paradigm
- OpenAI o1 is named to represent a new level of AI capability; the counter is reset to 1
- "Preview" indicates it's an early version of the full model
- "Mini" means it's a smaller version of the o1 model, optimized for speed
- The "o" stands for OpenAI
- o1 is not a "system"; it's a model trained to generate long chains of thought before returning a final answer
- The icon of o1 is metaphorically an alien of extraordinary ability
Size and Performance of o1 Models
- o1-mini is much smaller and faster than o1-preview, so it will be offered to free users in the future
- o1-preview is an early checkpoint of the o1 model; it is neither bigger nor smaller than o1
- o1-mini performs better in STEM tasks, but has limited world knowledge
- o1-mini outperforms o1-preview on some tasks, especially code-related ones
- Input tokens for o1 are calculated the same way as GPT-4o, using the same tokenizer
- o1-mini can explore more thought chains compared to o1-preview
Input Token Context and Model Capabilities
- Larger input contexts are coming soon for o1 models
- o1 models can handle longer, more open-ended tasks with less need for chunking input compared to GPT-4o
- o1 can generate long chains of thought before providing an answer, unlike previous models
- There is no current way to pause inference during CoT to add more context, but this is being explored for future models
Tools, Functionality, and Upcoming Features
- o1-preview doesn't use tools yet, but support for function calling, code interpreter, and browsing is planned
- Tool support, structured outputs, and system prompts will be added in future updates
- Users may eventually get control over thinking time and token limits
- Streaming is planned for the API, and exposing reasoning progress is under consideration
- Multimodal capabilities are built into o1, aiming for state-of-the-art performance in tasks like MMMU
CoT (Chain of Thought) Reasoning
- o1 generates hidden chains of thought during reasoning
- There are no plans to reveal CoT tokens to API users or in ChatGPT
- CoT tokens are summarized, but there is no guarantee of faithfulness to the actual reasoning
- Instructions in prompts can influence how the model thinks about a problem
- Reinforcement learning (RL) is used to improve CoT in o1, and GPT-4o cannot match its CoT performance through prompting alone
- The thinking stage appears slower because the thought process is being summarized, even though answer generation is typically faster
API and Usage Limits
- o1-mini has a weekly rate limit of 50 prompts for ChatGPT Plus users
- All prompts count the same in ChatGPT
- More tiers of API access and higher rate limits will be rolled out over time
- Prompt caching in the API is a popular request, but no timeline is available yet
Pricing, Fine-tuning, and Scaling
- Pricing of o1 models is expected to follow the trend of price reductions every 1-2 years
- Batch API pricing will be supported once rate limits increase
- Fine-tuning is on the roadmap, but no timeline is available yet
- Scaling up o1 is bottlenecked by research and engineering talent
- New scaling paradigms for inference compute could bring significant gains in future generations of models
- Inverse scaling isn't significant yet, but personal writing prompts show o1-preview performing only slightly better than GPT-4o (or even slightly worse)
Model Development and Research Insights
- o1 was trained using reinforcement learning to achieve its reasoning performance
- The model demonstrates creative thinking and strong performance in lateral tasks like poetry
- o1's philosophical reasoning and ability to generalize, such as deciphering ciphers, are impressive
- o1 was used by researchers to create a GitHub bot that pings the right CODEOWNERS for review
- In internal tests, o1 quizzed itself on difficult problems to gauge its capabilities
- Broad world domain knowledge is being added and will improve with future versions
- Fresher data for o1-mini is planned for future iterations of the model (its data currently goes up to Oct 2023)
Prompting Techniques and Best Practices
- o1 benefits from prompting styles that provide edge cases or reasoning styles
- o1 models are more receptive to reasoning cues in prompts compared to earlier models
- Providing relevant context in retrieval-augmented generation (RAG) improves performance; irrelevant chunks may worsen reasoning
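To make the RAG point concrete, here is a minimal sketch of dropping irrelevant chunks before they reach the prompt. Nothing here comes from the AMA itself: the function names (`relevance`, `select_chunks`, `build_prompt`), the `min_score` and `max_chunks` parameters, and the word-overlap score are all assumptions for illustration. A real pipeline would most likely score chunks with embeddings or a reranker instead.

```python
from __future__ import annotations

import re


def tokenize(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for a real embedding model."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(query: str, chunk: str) -> float:
    """Jaccard overlap between query words and chunk words (0.0 to 1.0)."""
    q, c = tokenize(query), tokenize(chunk)
    if not q or not c:
        return 0.0
    return len(q & c) / len(q | c)


def select_chunks(
    query: str,
    chunks: list[str],
    min_score: float = 0.1,  # hypothetical threshold, tune for your data
    max_chunks: int = 3,     # hypothetical cap on context size
) -> list[str]:
    """Keep only chunks that clear the threshold, best first."""
    scored = [(relevance(query, ch), ch) for ch in chunks]
    kept = [pair for pair in scored if pair[0] >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [ch for _, ch in kept[:max_chunks]]


def build_prompt(query: str, chunks: list[str]) -> str:
    """Assemble a prompt that contains only the selected chunks."""
    context = "\n\n".join(select_chunks(query, chunks))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

The idea is simply to prefer fewer, on-topic chunks over a large context padded with loosely related text, since the summary above suggests irrelevant chunks can hurt reasoning.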
General Feedback and Future Enhancements
- Rate limits are low for o1-preview due to early-stage testing but will be increased
- Improvements in latency and inference times are actively being worked on
Remarkable Model Capabilities
- o1 can think through philosophical questions like "What is life?"
- Researchers found o1 impressive in its ability to handle complex tasks and generalize from limited instruction
- o1's creative reasoning abilities, such as quizzing itself to gauge its capabilities, showcase its high-level problem-solving
We’re hosting an AMA for developers from 10–11 AM PT today. Reply to this thread with any questions and the OpenAI o1 team will answer as many as they can.