Microsoft Research has published a very promising paper, *Orca: Progressive Learning from Complex Explanation Traces of GPT-4*. The most exciting part is that the team plans to release the model soon and is currently working with its legal team to publicly release a diff of the model weights, in accordance with LLaMA's release policy.
What is Orca LLM and why is it significant?
Recent research has focused on making smaller models more capable by training them on knowledge generated by larger models. This process faces several challenges:
- Limited imitation signals: The smaller models have little to learn from because the larger models typically expose only their final outputs, not the reasoning that produced them.
- Small-scale homogeneous training data: The training data for the smaller models is often limited in size and diversity, which caps how much they can learn.
- Lack of rigorous evaluation: Without thorough evaluation, the capabilities of smaller models tend to be overestimated; they often imitate the style of the larger models but fail to replicate their reasoning abilities.
To overcome these challenges, researchers have developed a new model called Orca.
Orca LLM is a 13-billion-parameter model designed to imitate the reasoning process of larger models. It learns from rich signals provided by GPT-4, including explanation traces, step-by-step thought processes, and other complex instructions, with additional guidance from ChatGPT acting as an intermediate teacher.
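To make "explanation traces" concrete, here is a minimal sketch of what a single training example might look like. The record structure, system message wording, and helper function are illustrative assumptions rather than the paper's exact format; the idea it captures is that the teacher model is prompted to explain its reasoning, and the student is trained on the full explanation rather than only the final answer.

```python
# Illustrative sketch only: the field names and system message are assumptions,
# not the paper's actual data format.
from dataclasses import dataclass

@dataclass
class ExplanationTraceExample:
    system_message: str    # instruction that elicits step-by-step reasoning
    user_query: str        # the task or question posed to the teacher
    teacher_response: str  # the teacher's full explanation trace (training target)

def build_example(user_query: str, teacher_response: str) -> ExplanationTraceExample:
    # A system message like this asks the teacher for its reasoning, so the
    # student sees *how* the answer was derived, not just the answer itself.
    system_message = (
        "You are a helpful assistant. Think step by step and justify "
        "each step of your answer before giving the final result."
    )
    return ExplanationTraceExample(system_message, user_query, teacher_response)

# Example usage (the query and response text are made up):
example = build_example(
    user_query="If a train travels 120 km in 2 hours, what is its average speed?",
    teacher_response=(
        "Step 1: Average speed is distance divided by time. "
        "Step 2: 120 km / 2 h = 60 km/h. Final answer: 60 km/h."
    ),
)
```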
To make the learning more effective, Orca LLM uses a diverse and extensive range of imitation data, with careful sampling and selection to ensure the model learns from a wide variety of examples, as sketched below.
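One way to picture that sampling step is stratified sampling from a large, heterogeneous pool of instructions, so that no single task type dominates the imitation data. The category names, quotas, and function below are hypothetical and only illustrate the general idea; they are not the paper's actual pipeline.

```python
# Illustrative sketch of stratified sampling across task categories, so the
# imitation data stays diverse rather than small and homogeneous.
# Category names and per-category quotas are hypothetical.
import random

def sample_imitation_tasks(task_pool: dict[str, list[str]],
                           per_category: int,
                           seed: int = 0) -> list[str]:
    """task_pool maps a task category to candidate instructions;
    draw up to `per_category` instructions from each category."""
    rng = random.Random(seed)
    sampled = []
    for category, instructions in task_pool.items():
        k = min(per_category, len(instructions))
        sampled.extend(rng.sample(instructions, k))
    rng.shuffle(sampled)  # mix categories so training batches are heterogeneous
    return sampled

# Hypothetical pool with a few task categories:
pool = {
    "reading_comprehension": ["Summarize the passage ...", "Answer based on ..."],
    "logical_reasoning": ["If all A are B and ...", "Which conclusion follows ..."],
    "math_word_problems": ["A train travels 120 km in 2 hours ..."],
}
tasks = sample_imitation_tasks(pool, per_category=2)
```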
The results have been impressive:
- Orca LLM outperforms state-of-the-art instruction-tuned models such as Vicuna-13B by more than 100% on challenging reasoning benchmarks like Big-Bench Hard (BBH) and by 42% on AGIEval.
- Moreover, Orca LLM performs at a level similar to ChatGPT on the BBH benchmark and shows competitive performance (a gap of only 4 points against an optimized system message) on professional and academic exams such as the SAT, LSAT, GRE, and GMAT. All of this is in a zero-shot setting, without any prior exposure to the specific questions or tasks.
- However, Orca LLM still trails GPT-4 in overall performance.
Overall, this research indicates that learning from step-by-step explanations, whether they come from humans or from more advanced AI models, is a promising direction for enhancing the capabilities and skills of models like Orca.