Breaking: Microsoft's most powerful open source small model, phi-4
Editor
2025-01-09 14:01


Image source: generated by Unbounded AI

Early this morning, Microsoft Research open sourced phi-4, its most powerful small-parameter model.

Microsoft first demonstrated phi-4 on December 12 last year. With only 14 billion parameters, it delivered remarkably strong performance, surpassing OpenAI's GPT-4o on the graduate-level GPQA and the MATH mathematics benchmarks, and also beating comparable top open source models such as Qwen 2.5-14B and Llama-3.3-70B.

On the American Mathematics Competition (AMC) tests, phi-4 scored 91.8, surpassing well-known open and closed source models such as Gemini Pro 1.5, GPT-4o, Claude 3.5 Sonnet, and Qwen 2.5, and its overall performance is even comparable to Llama-3.1 with 405 billion parameters.

At the time, many people hoped Microsoft would open source this remarkably capable small-parameter model, and some even uploaded pirated phi-4 weights to Hugging Face. Now it is finally open source, with commercial use permitted under the MIT license.

Open source address: https://huggingface.co/microsoft/phi-4/tree/main
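For readers who want to try it right away, here is a minimal loading sketch using the Hugging Face transformers library. The model id comes from the address above; the prompt and generation settings are purely illustrative, not official recommendations.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-4"  # the repository linked above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # assumes a GPU with bf16 support
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Prove that the sum of two even numbers is even."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))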

Even Hugging Face's official account came to offer congratulations, and the reception for phi-4 has been glowing:

A wonderful start to 2025! The best 14B model ever!!!

A 14B-parameter model scoring 84.8 on MMLU is crazy. Congratulations!

Thank you for the model and license changes! Awesome.

You are all heroes. Downloading it right away!

I'm looking forward to serverless deployment of phi-4 on Azure. When will it be available?

The small parameter model is very good.

The small Phi models are amazing for creative writing.

Wow, phi-4 runs smoothly on an Apple M4 Pro laptop at about 12 tokens per second. This is great, thank you!

A brief introduction to phi-4

How does phi-4 manage, with so few parameters, to beat well-known open and closed source models on so many benchmarks? High-quality synthetic data played an important role.

High-quality synthetic data has advantages over traditional organic data scraped from the web. It can provide structured, step-by-step learning material, allowing the model to learn the logic and reasoning patterns of language more efficiently. In mathematics, for example, synthetic data can present a solution step by step, helping the model better understand the problem's structure and the problem-solving approach.
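To make the contrast concrete, here is a rough sketch of what such a step-by-step synthetic sample might look like next to a raw web snippet. The content is invented for illustration and is not taken from phi-4's actual training data.

# Raw organic data often states only the answer.
organic_snippet = "btw the answer to that integral question is 1/3, took me a while lol"

# A synthetic sample can lay out the reasoning explicitly, step by step.
synthetic_sample = {
    "problem": "Evaluate the integral of x^2 from 0 to 1.",
    "solution_steps": [
        "The antiderivative of x^2 is x^3 / 3.",
        "Evaluate at the bounds: 1^3 / 3 - 0^3 / 3 = 1/3.",
    ],
    "final_answer": "1/3",
}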

In addition, synthetic data aligns better with the model's inference context and is closer to the output format the model needs to produce in real applications, which helps the model adapt to real-world demands already during pre-training. For example, factual information from online forums can be rewritten into a style resembling large-model interactions, so that it appears more natural and coherent in the dialogue the model generates.
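As a rough illustration of that rewriting step, the sketch below sends a forum snippet to a chat model and asks for a question-and-answer version. The use of the OpenAI client, the teacher model name, and the prompt wording are our own stand-ins, not Microsoft's actual pipeline.

from openai import OpenAI

client = OpenAI()  # any capable chat model can act as the rewriting "teacher"

forum_snippet = (
    "re: boiling point -- it drops roughly 1 C for every 300 m of altitude, "
    "that's why pasta takes longer to cook in the mountains"
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder teacher model
    messages=[
        {"role": "system", "content": (
            "Rewrite the snippet as a natural user question followed by a clear, "
            "step-by-step assistant answer. Preserve every fact."
        )},
        {"role": "user", "content": forum_snippet},
    ],
)
print(response.choices[0].message.content)  # one rewritten synthetic sample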

phi-4's synthetic data generation follows the principles of diversity, nuance and complexity, accuracy, and chain of reasoning. It covers more than 50 different types of synthetic datasets, producing approximately 400 billion unweighted tokens through methods such as multi-stage prompting, seed curation, rewriting and augmentation, and self-revision.
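The following sketch shows, in heavily simplified form, how a seed-to-sample loop with self-revision could be wired together. The generate() helper is a hypothetical stand-in for a call to a teacher model, and the prompts are ours, not the ones Microsoft used.

def generate(prompt: str) -> str:
    """Hypothetical call to a teacher model; plug in a real client here."""
    raise NotImplementedError

def synthesize_from_seed(seed_document: str) -> str:
    # Stage 1: draft an exercise with a worked solution from a curated seed.
    draft = generate(
        "Write an exercise with a step-by-step solution based on:\n" + seed_document
    )
    # Stage 2: have the model critique its own draft.
    critique = generate("List any factual or logical errors in this solution:\n" + draft)
    # Stage 3: self-revision guided by the critique.
    revised = generate(
        "Rewrite the solution, fixing the issues below.\nIssues:\n"
        + critique + "\nSolution:\n" + draft
    )
    return revised  # one synthetic sample; the loop is repeated across dataset types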

In addition to synthetic data, phi-4 also applies strict screening and filtering to organic data. The research team collected data from multiple sources such as web content, licensed books, and code repositories, and used a two-stage filtering process to extract seed data with high educational value and reasoning depth.

These seed data provide the basis for synthetic data generation and are also used directly in pre-training, further enriching the model's knowledge. During screening, Microsoft used filters based on small classifiers to select high-quality documents from large-scale web data, and applied special processing to multilingual data so that the model can handle languages including German, Spanish, French, Portuguese, Italian, Hindi, and Japanese.
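As an illustration of what a small-classifier quality filter can look like, here is a toy sketch using TF-IDF features and logistic regression. The actual classifier, features, and labeled seed data Microsoft used are not reproduced here; everything below is an assumed stand-in.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny labeled seed set: 1 = high educational value, 0 = low value.
labeled_docs = [
    ("Proof: assume n and m are even, so n = 2a and m = 2b ...", 1),
    ("CLICK HERE for today's best deals!!! limited time only", 0),
]
texts, labels = zip(*labeled_docs)

quality_clf = make_pipeline(
    TfidfVectorizer(max_features=50_000),
    LogisticRegression(max_iter=1000),
)
quality_clf.fit(texts, labels)

def keep(document: str, threshold: float = 0.8) -> bool:
    # Keep only documents the classifier scores as educational enough.
    return quality_clf.predict_proba([document])[0][1] >= threshold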

In pre-training, phi-4 relies mainly on synthetic data, supplemented by a small amount of high-quality organic data. This data mixing strategy lets the model absorb rich knowledge while learning reasoning and problem-solving abilities.

In the mid-training phase, phi-4 extends the context length from 4096 to 16384 tokens to improve its ability to handle long texts. To support this, training on long-text data is increased, including samples longer than 8K context filtered from high-quality non-synthetic datasets, as well as newly created synthetic datasets that satisfy the 4K sequence-length requirement.
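A minimal sketch of the long-sample selection described above might look like the following; the tokenizer id and the corpus variable are placeholders, not part of Microsoft's published tooling.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-4")

def is_long_sample(text: str, min_tokens: int = 8192) -> bool:
    # Keep documents whose token count exceeds the 8K threshold.
    return len(tokenizer.encode(text)) > min_tokens

corpus = ["...placeholder documents from the non-synthetic datasets..."]
long_context_subset = [doc for doc in corpus if is_long_sample(doc)]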

The post-training phase is key to optimizing phi-4. Microsoft uses supervised fine-tuning (SFT) and direct preference optimization (DPO). In the SFT stage, the pre-trained model was fine-tuned on roughly 8B tokens of high-quality data from different domains at a learning rate of 1e-6, with multilingual data covering 40 languages added, all in chatml format.
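To make the SFT setup concrete, here is a rough sketch using the TRL library with the learning rate quoted above. This is not Microsoft's training code; chat_dataset is a placeholder for conversations already converted to chat format.

from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# Placeholder dataset: records with a "messages" list in chat format.
chat_dataset = Dataset.from_list([
    {"messages": [
        {"role": "user", "content": "Solve 2x + 3 = 11."},
        {"role": "assistant", "content": "Subtract 3: 2x = 8. Divide by 2: x = 4."},
    ]},
])

training_args = SFTConfig(
    output_dir="phi4-sft-sketch",
    learning_rate=1e-6,              # the rate cited in the article
    per_device_train_batch_size=1,
    num_train_epochs=1,
)

trainer = SFTTrainer(
    model="microsoft/phi-4",         # TRL loads the model from its hub id
    args=training_args,
    train_dataset=chat_dataset,
)
trainer.train()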

DPO adjusts the model's output using generated preference data to bring it more in line with human preferences. Microsoft also introduced Pivotal Token Search (PTS) to generate DPO pairs: it identifies the key tokens that have a significant impact on the correctness of a model's answer and creates preference data targeted at those tokens, thereby improving the model's performance on reasoning tasks.
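As a simplified illustration of the kind of preference pair PTS aims to produce, the example below shows a chosen and a rejected completion that share a prefix and diverge at the single token that determines correctness. The arithmetic content is invented for illustration.

dpo_pair = {
    "prompt": "What is 17 * 24? Think step by step.",
    "chosen": "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.",
    "rejected": "17 * 24 = 17 * 20 + 17 * 4 = 340 + 64 = 404.",
}
# The two completions agree up to "340 + " and diverge at the pivotal token
# ("68" vs "64"), which is exactly what decides whether the answer is correct.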

To evaluate phi-4's performance, Microsoft tested it on multiple benchmarks. On academic benchmarks such as MMLU, GPQA, MATH, and HumanEval, phi-4 performs well.

In the MMLU test, phi-4 achieved a high score of 84.8. In the GPQA and MATH tests, it even surpassed GPT-4o, showing strong reasoning ability on mathematics-competition-related tasks. Compared with other models of similar and larger size, phi-4 outperformed the comparable open source model Qwen2.5-14B-Instruct on 9 of 12 benchmarks.
