<div class="content">
<div>
<p style="margin-left:0px; margin-right:0px; text-align:center"><img alt="jahb.png" src="https://intranetproxy.alipay.com/skylark/lark/0/2024/png/98756342/1706606628857-bd1dbf7f-50d0-4b4e-a704-2f26ae5a3814.png?x-oss-process=image%2Fresize%2Cw_900%2Climit_0" width="900" referrerpolicy="no-referrer"></p>
<p style="margin-left:0; margin-right:0"><strong><span>使用多任务高效微调框架MFTCoder,以DeepSeek-Coder-33b模型为底座,微调获得的CodeFuse-DeepSeek-33b模型在</span></strong><strong><span style="color:#1f2937">Big<span> </span></span></strong><strong><span style="color:#e6b800">Code</span></strong><strong><span style="color:#1f2937"><span> </span>Models<span> </span></span></strong><strong><span style="color:#e6b800">Leaderboard</span></strong><strong><span>代码大模型榜单上以43.58% WinRate成为新晋榜首,同时模型在NLP任务上也取得了很好的表现。本文我们将介绍该模型的得来和使用,包括训练数据、训练超参设置、模型评测效果以及如何获取该模型和基于它继续微调。我们已经在HuggingFace和ModelScope开放了模型下载(下载地址在文末),并同步提供了4bit量化版本供大家直接部署到生产环境。</span></strong></p>
<p style="margin-left:0; margin-right:0"> </p>
<p style="margin-left:0; margin-right:0; text-align:center"><img src="https://intranetproxy.alipay.com/skylark/lark/0/2024/png/347737/1706581702847-565b14ab-7236-4bab-b74f-b8ce640e2eb9.png" width="1734" referrerpolicy="no-referrer"></p>
<p style="margin-left:0; margin-right:0; text-align:center"><strong><span>图1: Big Code Models LeaderBoard榜单截图(截取时间2024-01-30)。</span></strong><strong><span style="color:#1f2937">Big<span> </span></span></strong><strong><span style="color:#e6b800">Code</span></strong><strong><span style="color:#1f2937"><span> </span>Models<span> </span></span></strong><strong><span style="color:#e6b800">Leaderboard</span></strong><strong><span>(</span></strong><a href="https://www.oschina.net/action/GoToLink?url=https%3A%2F%2Fhuggingface.co%2Fspaces%2Fbigcode%2Fbigcode-models-leaderboard" target="_blank"><strong><span>https://huggingface.co/spaces/bigcode/bigcode-models-leaderboard</span></strong></a><strong><span>)是由HuggingFace BigCode团队维护的代码大模型榜单,是代码大模型领域比较权威的评测榜单。</span></strong></p>
<p style="margin-left:0; margin-right:0"> </p>
<span id="OSC_h1_1"></span>
<h1><span>多任务微调MFT</span></h1>
<p style="margin-left:0; margin-right:0"><span>我们选择以DeepSeek-Coder-33b模型为底座,使用多任务微调框架MFTCoder对5个下游任务数据进行微调,得到CodeFuse-DeepSeek-33b模型。以下将更为详细地进行介绍。</span></p>
<span id="OSC_h2_2"></span>
<h2><span>训练数据</span></h2>
<p style="margin-left:0; margin-right:0"><span>本次训练我们设置了5个下游任务,如下表1所示,包括代码补全任务、文本生成代码任务、单测生成任务、自然语言表述对齐任务和代码练习题任务,共约168万样本数据。得益于我们开源的多任务微调框架MFTCoder,这些下游任务能一定程度上相互促进,比直接将所有任务数据混合为一后微调表现更优。</span></p>
<p style="margin-left:0; margin-right:0"> </p>
<p style="margin-left:0; margin-right:0; text-align:center"><strong><span>表1: 下游任务训练数据统计</span></strong></p>
<table border="1" cellspacing="0" style="border-collapse:collapse; border:1px solid #d9d9d9; table-layout:fixed; width:700px">
<tbody>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0; text-align:center"><span><strong><span>序号</span></strong></span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0; text-align:center"><span><strong><span>MFT下游任务</span></strong></span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span><span>任务能力</span></span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0; text-align:center"><span><strong><span>#Samples<span> </span></span></strong></span></p> </td>
</tr>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span><strong><span>1</span></strong></span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span><strong><span>单测用例生成</span></strong></span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span><span>给定函数级代码生成单元测试用例</span></span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span><span>390,393</span></span></p> </td>
</tr>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span><strong><span>2</span></strong></span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span><strong><span>代码补全</span></strong></span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span><span>根据前文补全代码(方法级)</span></span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span><span>192,547</span></span></p> </td>
</tr>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span><strong><span>3</span></strong></span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span><strong><span>文本生成代码</span></strong></span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span><span>基于文本描述生成功能代码</span></span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span><span>66,862</span></span></p> </td>
</tr>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span><strong><span>4</span></strong></span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span><strong><span>NLP表述对齐</span></strong></span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span><span>增强NLP理解能力</span></span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span><span>951,278</span></span></p> </td>
</tr>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span><strong><span>5</span></strong></span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span><strong><span>代码练习题 (JAVA/CPP/GO)</span></strong></span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span><span>基于文本描述生成基础功能代码</span></span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span><span>82,603</span></span></p> </td>
</tr>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span><strong><span>#Total</span></strong></span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span><span style="color:#000000">1,683,683</span></span></p> </td>
</tr>
</tbody>
</table>
<span id="OSC_h2_3"></span>
<h2><span>关键超参设置</span></h2>
<p style="margin-left:0; margin-right:0"><span>本次微调使用的是我们已经开源的多任务微调框架MFTCoder</span><strong><span>(</span></strong><a href="https://www.oschina.net/action/GoToLink?url=https%3A%2F%2Fgithub.com%2Fcodefuse-ai%2FMFTCoder%2Ftree%2Fmain%2Fmftcoder_accelerate" target="_blank"><strong><span>https://github.com/codefuse-ai/MFTCoder/tree/main/mftcoder_accelerate</span></strong></a><strong><span>)</span></strong><span>,MFTCoder支持多模型适配(包括Llama 1/2、CodeLlama、Qwen、Baichuan 2、ChatGLM 2/3、CodeGeex 2、GPT-NEOX、Mistral、DeepSeek等)、多任务并行、多种均衡Loss设计、PEFT(Lora和QLora)高效微调,此前已被采纳为Qwen Code AI竞赛初赛推荐微调框架(</span><a href="https://www.oschina.net/action/GoToLink?url=https%3A%2F%2Ftianchi.aliyun.com%2Fcompetition%2Fentrance%2F532169%2Finformation" target="_blank"><span>https://tianchi.aliyun.com/competition/entrance/532169/information</span></a><span>)。本次训练使用的关键超参设置如下表2所示,更多详细的参数说明可参考</span><a href="https://www.oschina.net/action/GoToLink?url=https%3A%2F%2Fgithub.com%2Fcodefuse-ai%2FMFTCoder%2Ftree%2Fmain%2Fmft_peft_hf%2332-loraqlora" target="_blank"><span>https://github.com/codefuse-ai/MFTCoder/tree/main/mft_peft_hf#32-loraqlora</span></a></p>
<p style="margin-left:0; margin-right:0"> </p>
<p style="margin-left:0; margin-right:0; text-align:center"><strong><span>表2: MFTCoder微调关键超参设置及解释</span></strong></p>
<table border="1" cellspacing="0" style="border-collapse:collapse; border:1px solid #d9d9d9; table-layout:fixed; width:734px">
<tbody>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:37px"> <p style="margin-left:0; margin-right:0"><strong><span>参数名称</span></strong></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:37px"> <p style="margin-left:0; margin-right:0"><strong><span>参数值</span></strong></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:37px"> <p style="margin-left:0; margin-right:0"><strong><span>简要解释</span></strong></p> </td>
</tr>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:37px"> <p style="margin-left:0; margin-right:0"><span>data_split</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:37px"> <p style="margin-left:0; margin-right:0"><span>"98,2,0"</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:37px"> <p style="margin-left:0; margin-right:0"><span>98%数据用于训练,2%用于验证</span></p> </td>
</tr>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:37px"> <p style="margin-left:0; margin-right:0"><span>padding_mode</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:37px"> <p style="margin-left:0; margin-right:0"><span>"padding"</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:37px"> <p style="margin-left:0; margin-right:0"><span>使用动态填充模式,即每张卡每个batch大小是由每次其中的最长者动态决定而不是固定大小。另一种可选数据模式是"pack"。</span></p> </td>
</tr>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:48px"> <p style="margin-left:0; margin-right:0"><span>dynamic_padding</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:48px"> <p style="margin-left:0; margin-right:0"><span>True</span></p> </td>
</tr>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>weighted_loss_mode</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>"case3"</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>使用数据均衡Loss函数,更多细节可见论文</span><a href="https://www.oschina.net/action/GoToLink?url=https%3A%2F%2Farxiv.org%2Fabs%2F2311.02303" target="_blank"><span>https://arxiv.org/abs/2311.02303</span></a></p> </td>
</tr>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>peft_type</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>"qlora"</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>采取QLora 4bit量化微调模式</span></p> </td>
</tr>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>quantization</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>"4bit"</span></p> </td>
</tr>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>lora_rank</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>192</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>决定可训练参数比例</span></p> </td>
</tr>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>lora_alpha</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>32</span></p> </td>
</tr>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>per_device_train_batch_size</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>4</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>训练时单卡batch大小</span></p> </td>
</tr>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>per_device_eval_batch_size</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>4</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>验证时单卡batch大小</span></p> </td>
</tr>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>learning_rate</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>5e-5</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>初始学习率</span></p> </td>
</tr>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>min_lr</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>1e-6</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>最小学习率</span></p> </td>
</tr>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:37px"> <p style="margin-left:0; margin-right:0"><span>gradient_accumulation_steps</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:37px"> <p style="margin-left:0; margin-right:0"><span>1</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:37px"> <p style="margin-left:0; margin-right:0"><span>梯度累积步数,如果为2,则每累积2步再更新参数,资源不足是一种间接增加global batch size的方式</span></p> </td>
</tr>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>world_size</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>64</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>GPU卡数,使用64张A100/A100卡</span></p> </td>
</tr>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>evalation_steps</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>500</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>每500步验证一次</span></p> </td>
</tr>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>checkpointing_steps</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>500</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>每500步保存一次检查点</span></p> </td>
</tr>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>num_train_epochs</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>10</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>最大训练轮数,最大10轮</span></p> </td>
</tr>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:37px"> <p style="margin-left:0; margin-right:0"><span>early_stopping</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:37px"> <p style="margin-left:0; margin-right:0"><span>True</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:37px"> <p style="margin-left:0; margin-right:0"><span>开启early-stopping机制,即当连续3个检查点的eval loss均比倒数第4个检查点的eval loss大时终止训练</span></p> </td>
</tr>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>early_stopping_stall_num</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>3</span></p> </td>
</tr>
</tbody>
</table>
<p style="margin-left:0; margin-right:0"> </p>
<p style="margin-left:0; margin-right:0"><span>使用前述训练数据和配置,经过156.5小时,模型在完成5.09 Epochs训练后触发Early-Stopping策略后终止。</span></p>
<p style="margin-left:0; margin-right:0"> </p>
<span id="OSC_h1_4"></span>
<h1><span>模型效果</span></h1>
<p style="margin-left:0; margin-right:0"><span>我们从代码能力和NLP能力两个方面对训练获得的CodeFuse-DeepSeek-33b进行了测试,pass@1测试均采用greedy解码模式(即</span><span style="background-color:#d8dad9">doSample=False, num_beams=1, num_return_sequences=1</span><span>)。</span></p>
<span id="OSC_h2_5"></span>
<h2><span>代码能力</span></h2>
<p style="margin-left:0; margin-right:0"><span>我们选取了常用的代码评测集对模型进行评测,首先我们使用自己的CodeFuse-Evaluation评测框架(</span><a href="https://www.oschina.net/action/GoToLink?url=https%3A%2F%2Fgithub.com%2Fcodefuse-ai%2Fcodefuse-evaluation" target="_blank"><span>https://github.com/codefuse-ai/codefuse-evaluation</span></a><span>)对模型在HumanEval-X(含HumanEval)和MBPP测试集上的表现进行了测试并与CodeFus此前微调过的模型进行了比较,如下表3和表4所示。</span></p>
<p style="margin-left:0; margin-right:0"><strong><span>CodeFuse-DeepSeek-33b在HumanEval上pass@1指标值为78.65%、在MBPP上为71%(zero-shot),两项平均为74.83%,略高于DeepSeek-Coder-Instruct-33B</span></strong><span>。</span></p>
<p style="margin-left:0; margin-right:0"><strong><span>CodeFuse-DeepSeek-33b在多语言评测集HumanEval-X上pass@1指标值平均为67.07%,比此前我们开放的CodeFuse-CodeLlama-34b模型高6.69%,在具体各种语言上高出3.48%~12.19%不等</span></strong><span>。</span></p>
<p style="margin-left:0; margin-right:0"> </p>
<p style="margin-left:0; margin-right:0; text-align:center"><strong><span>表3: CodeFuse-DeepSeek-33b模型与其他开源底座模型及微调模型在HumanEval和MBPP上的对比</span></strong></p>
<p style="margin-left:0; margin-right:0; text-align:center"><img src="https://intranetproxy.alipay.com/skylark/lark/0/2024/png/347737/1704200172671-e3b1411a-19eb-4305-878d-99f7f95945e5.png" width="636" referrerpolicy="no-referrer"></p>
<p style="margin-left:0; margin-right:0"> </p>
<p style="margin-left:0; margin-right:0; text-align:center"><strong><span>表4: CodeFuse-DeepSeek-33b模型与其他开源底座模型及MFT微调模型在HumanEval-X上的对比</span></strong></p>
<p style="margin-left:0; margin-right:0; text-align:center"><img src="https://intranetproxy.alipay.com/skylark/lark/0/2024/png/347737/1704200193212-a4061f95-d847-46d6-853b-2f0848c16050.png" width="719" referrerpolicy="no-referrer"></p>
<p style="margin-left:0; margin-right:0"> </p>
<p style="margin-left:0; margin-right:0"><span>由于不同评测框架在代码后处理和生成终止条件(Stop Words)等方面常存在差异,除了用我们自己的CodeFuse-Evaluation评测框架,我们也用代码大模型榜单Big Code Models LeaderBoard所用的开源评测框架bigcode-evaluation-harness (</span><a href="https://www.oschina.net/action/GoToLink?url=https%3A%2F%2Fgithub.com%2Fbigcode-project%2Fbigcode-evaluation-harness" target="_blank"><span>https://github.com/bigcode-project/bigcode-evaluation-harness</span></a><span>)进行了评测,并与榜单上的模型进行了比较。榜单会测试模型在Python代码补全测试集HumenEval和多语言代码补全测试集MultiPL-E共12种语言上的表现,并根据各语言表现进行WinRate排序。(结果复现代码地址:</span><a href="https://www.oschina.net/action/GoToLink?url=https%3A%2F%2Fgithub.com%2Ftwelveand0%2Fbigcode-evaluation-harness" target="_blank"><span>https://github.com/twelveand0/bigcode-evaluation-harness</span></a><span>)</span></p>
<p style="margin-left:0; margin-right:0"> </p>
<p style="margin-left:0; margin-right:0; text-align:center"><strong><span>表5: 采用bigcode-evaluation-harness评测CodeFuse-DeepSeek-33b模型后的新榜单</span></strong><img src="https://intranetproxy.alipay.com/skylark/lark/0/2024/png/347737/1706582301694-43665f48-da68-422d-ad5a-86454c8f627f.png" width="1400" referrerpolicy="no-referrer"></p>
<p style="margin-left:0; margin-right:0"> </p>
<p style="margin-left:0; margin-right:0"><span>如表5所示,</span><strong><span>CodeFuse-DeepSeek-33b模型的WinRate为43.58%,超过原榜首DeepSeek-Coder-33b-instruct。在HumanEval评测集上,CodeFuse-DeepSeek-33b表现不如DeepSeek-Coder-33b-instruct,但在其他8种语言(包括Java和JS等)上超过后者,均值(Average Score)亦超过后者1.7%</span></strong><span>。</span></p>
<span id="OSC_h2_6"></span>
<h2><span>NLP通用能力</span></h2>
<p style="margin-left:0; margin-right:0"><span>对于NLP通用能力测试,我们参照OpenCompass选择了18个评测集,包括语言能力(AFQMC、CHID、Wic、WSC)、推理能力(COPA、CMNLI、OCNLI、Ax-b、Ax-g、RTE)、理解能力(CSL、C3、EPRSTMT)、学科综合能力(MMLU、C-Eval、ARC-c)、代码能力(HumanEval、MBPP)。对于每个模型,我们会使用生成式和PPL方式计算每个指标,并在每个维度上选取两种方式中较高的值作为指标值。</span></p>
<p style="margin-left:0px; margin-right:0px; text-align:center"><img src="https://intranetproxy.alipay.com/skylark/lark/0/2024/png/347737/1704260089384-cb664a24-5d58-4a7d-9081-a61159a469e3.png" width="1230" referrerpolicy="no-referrer"></p>
<p style="margin-left:0; margin-right:0; text-align:center"><strong><span>图2: CodeFuse-DeepSeek-33b NLP通用能力雷达图</span></strong></p>
<p style="margin-left:0; margin-right:0"> </p>
<p style="margin-left:0; margin-right:0"><span>CodeFuse-DeepSeek-33b模型的评测结果如图3雷达图所示,我们将其与底座模型DeepSeek-Coder-33b和DeepSeek通用模型DeepSeek-67b-Chat进行了对比。</span><strong><span>从图中可以看出,相较于底座模型DeepSeek-Coder-33b,CodeFuse-DeepSeek-33b在所有维度上均有正向提升;相较于我们此前开源的CodeFuse-CodeLlama-34b,CodeFuse-DeepSeek-33b在绝大多数维度上表现更优;相较于通用模型DeepSeek-67b-Chat,CodeFuse-DeepSeek-33b在语言能力、代码能力和理解能力上整体表现更优,在推理能力上表现稍差,在学科综合能力上差距较大。考虑到模型参数规模差距和底座目标功能类型差异,我们认为CodeFuse-DeepSeek-33b已经表现很好。</span></strong></p>
<p style="margin-left:0; margin-right:0"> </p>
<span id="OSC_h1_7"></span>
<h1><span>模型INT4量化</span></h1>
<p style="margin-left:0; margin-right:0"><span>为了便于直接部署投入生产,我们同步提供了CodeFuse-DeepSeek-33b-INT4量化版本。对于量化后的模型,我们测试了它的代码能力,如表5所示,量化后模型在代码补全任务上只有微弱降幅。</span></p>
<p style="margin-left:0; margin-right:0"> </p>
<p style="margin-left:0; margin-right:0; text-align:center"><strong><span>表5:模型量化前后在HumanEval-X和MBPP上的指标对比</span></strong></p>
<table border="1" cellspacing="0" style="border-collapse:collapse; border:1px solid #d9d9d9; table-layout:fixed; width:749px">
<tbody>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0; text-align:center"><strong><span>Model</span></strong></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0; text-align:center"><strong><span>HumanEval-X</span></strong></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0; text-align:center"><strong><span>MBPP</span></strong></p> </td>
</tr>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0; text-align:center"><strong><span>Python</span></strong></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0; text-align:center"><strong><span>Java</span></strong></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0; text-align:center"><strong><span>C++</span></strong></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0; text-align:center"><strong><span>JS</span></strong></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0; text-align:center"><strong><span>Go</span></strong></p> </td>
</tr>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:37px"> <p style="margin-left:0; margin-right:0"><span>CodeFuse-DeepSeek-33b</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:37px"> <p style="margin-left:0; margin-right:0"><span>78.65%</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:37px"> <p style="margin-left:0; margin-right:0"><span>67.68%</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:37px"> <p style="margin-left:0; margin-right:0"><span>65.85%</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:37px"> <p style="margin-left:0; margin-right:0"><span>67.07%</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:37px"> <p style="margin-left:0; margin-right:0"><span>56.10%</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:37px"> <p style="margin-left:0; margin-right:0"><span>71.0%</span></p> </td>
</tr>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>CodeFuse-DeepSeek-33b-INT4</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>78.05%</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>68.29%</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>62.19%</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>64.63%</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>55.49%</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> </td>
</tr>
</tbody>
</table>
<p style="margin-left:0; margin-right:0"><span>此外,我们测试了该模型实际部署后的性能。测试环境为单张A10(24G显存)、部署框架为NVIDIA开源的tensorRT。测试结果具体如表6所示:</span></p>
<p style="margin-left:0; margin-right:0"> </p>
<p style="margin-left:0; margin-right:0; text-align:center"><strong><span>表6: CodeFuse-DeepSeek-33b-INT4在单张A10的推理性能</span></strong></p>
<table border="1" cellspacing="0" style="border-collapse:collapse; border:1px solid #d9d9d9; table-layout:fixed; width:640px">
<tbody>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0; text-align:center"><span>模型版本</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0; text-align:center"><span>CodeFuse-DeepSeek-33b</span></p> </td>
</tr>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:37px"> <p style="margin-left:0; margin-right:0; text-align:center"><span>推理速度指标</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:37px"> <p style="margin-left:0; margin-right:0; text-align:center"><span>Tokens/s</span></p> </td>
</tr>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0; text-align:center"><span>模型并行/gpu型号</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0; text-align:center"> </p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0; text-align:center"><span>单卡A10</span></p> </td>
</tr>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0; text-align:center"><span>量化格式</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0; text-align:center"> </p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0; text-align:center"><span>int4 </span></p> </td>
</tr>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:37px"> <p style="margin-left:0; margin-right:0"> </p> <p style="margin-left:0; margin-right:0"><span>输入/输出长度</span><br> <span>(batch_size=1)</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:37px"> <p style="margin-left:0; margin-right:0; text-align:center"><span>16/8</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:37px"> <p style="margin-left:0; margin-right:0; text-align:center"><span>21.7</span></p> </td>
</tr>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:40px"> <p style="margin-left:0; margin-right:0; text-align:center"><span>64/32</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:40px"> <p style="margin-left:0; margin-right:0; text-align:center"><span>21.5</span></p> </td>
</tr>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:38px"> <p style="margin-left:0; margin-right:0; text-align:center"><span>256/128</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:38px"> <p style="margin-left:0; margin-right:0; text-align:center"><span>21.1</span></p> </td>
</tr>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:36px"> <p style="margin-left:0; margin-right:0; text-align:center"><span>1024/512</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:36px"> <p style="margin-left:0; margin-right:0; text-align:center"><span>20.5</span></p> </td>
</tr>
</tbody>
</table>
<span id="OSC_h1_8"></span>
<h1> </h1>
<span id="OSC_h1_9"></span>
<h1><span>模型下载试用</span></h1>
<p style="margin-left:0; margin-right:0"><span>我们开放了量化前后2个模型的下载,提供了推理格式和推理示例,并说明了如何在此基础上继续微调。</span></p>
<span id="OSC_h2_10"></span>
<h2><span>下载</span></h2>
<p style="margin-left:0; margin-right:0"><span>我们已经将2个模型(CodeFuse-DeepSeek-33b和CodeFuse-DeepSeek-33b-INT4)发布到HuggingFace和ModelScope社区,大家可以选择通过以下链接下载:</span></p>
<table border="1" cellspacing="0" style="border-collapse:collapse; border:1px solid #d9d9d9; table-layout:fixed; width:726px">
<tbody>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0; text-align:center"><strong><span>Model</span></strong></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0; text-align:center"><strong><span>HuggingFace</span></strong></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0; text-align:center"><strong><span>ModelScope</span></strong></p> </td>
</tr>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>CodeFuse-DeepSeek-33b</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><a href="https://www.oschina.net/action/GoToLink?url=https%3A%2F%2Fhuggingface.co%2Fcodefuse-ai%2FCodeFuse-DeepSeek-33B" target="_blank"><span>https://huggingface.co/codefuse-ai/CodeFuse-DeepSeek-33B</span></a></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><a href="https://www.oschina.net/action/GoToLink?url=https%3A%2F%2Fmodelscope.cn%2Fmodels%2Fcodefuse-ai%2FCodeFuse-DeepSeek-33B%2Fsummary" target="_blank"><span>https://modelscope.cn/models/codefuse-ai/CodeFuse-DeepSeek-33B/summary</span></a></p> </td>
</tr>
<tr>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><span>CodeFuse-DeepSeek-33b-4bits</span></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><a href="https://www.oschina.net/action/GoToLink?url=https%3A%2F%2Fhuggingface.co%2Fcodefuse-ai%2FCodeFuse-DeepSeek-33B-4bits" target="_blank"><span>https://huggingface.co/codefuse-ai/CodeFuse-DeepSeek-33B-4bits</span></a></p> </td>
<td style="border-color:#d9d9d9; border-style:solid; border-width:1px; height:33px"> <p style="margin-left:0; margin-right:0"><a href="https://www.oschina.net/action/GoToLink?url=https%3A%2F%2Fmodelscope.cn%2Fmodels%2Fcodefuse-ai%2FCodeFuse-DeepSeek-33B-4bits%2Fsummary" target="_blank"><span>https://modelscope.cn/models/codefuse-ai/CodeFuse-DeepSeek-33B-4bits/summary</span></a></p> </td>
</tr>
</tbody>
</table>
<span id="OSC_h2_11"></span>
<h2><span>推理格式与代码</span></h2>
<p style="margin-left:0; margin-right:0"><span>CodeFuse-DeepSeek-33b使用的推理格式如下:</span></p>
<pre><code><s>system
{THIS IS YOUR SYSTEM PROMPT}
<s>human
{THIS IS YOUR PROMPT of ROUND 1}
<s>bot
{THIS IS MODEL's RESPONSE of ROUND 1}
<s>human
{THIS IS YOUR PROMPT of ROUND 2}
<s>bot</code></pre>
<p style="margin-left:0; margin-right:0"><span>在这个格式中,sysem部分是可选的,对话轮数可以是多轮的也可以是单轮的。下面是一些具体的案例展示:</span></p>
<pre><code><s>human
请写一个快排程序
#Python
<s>bot
</code></pre>
<pre><code><s>system
你是CodeFuse,你会被给定一个任务,你必须按照用户的要求完成任务。
<s>human
请写一个快排程序
#Python
<s>bot
下面是一个使用Python编写的快速排序算法的示例代码:
```python
def quick_sort(arr):
if len(arr) <= 1:
return arr
pivot = arr[len(arr) // 2]
left = [x for x in arr if x < pivot]
middle = [x for x in arr if x == pivot]
right = [x for x in arr if x > pivot]
return quick_sort(left) + middle + quick_sort(right)
# 测试快速排序算法
arr = [3, 6, 8, 10, 1, 2, 1]
print(quick_sort(arr))
```
这段代码定义了一个`quick_sort`函数,它接受一个列表作为参数,并返回排序后的列表。在函数内部,首先判断列表的长度是否小于等于1,如果是,则直接返回列表本身。否则,选择列表中间的元素作为基准值,并将列表分成三个部分:小于基准值的部分、等于基准值的部分和大于基准值的部分。然后,对这三个部分分别递归调用`quick_sort`函数进行排序,最后将排序后的三个部分合并并返回。
通过这个示例,你可以了解到如何使用Python编写快速排序算法,并对其进行测试。
<s>human
请为以上代码生成单测用例
<s>bot
</code></pre>
<p style="margin-left:0; margin-right:0"> </p>
<p style="margin-left:0; margin-right:0"><span>下面是一段示例的推理代码:</span></p>
<pre><code>import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
model_dir = "codefuse-ai/CodeFuse-DeepSeek-33B"
def load_model_tokenizer(model_path):
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
tokenizer.eos_token = "<|end▁of▁sentence|>"
tokenizer.pad_token = "<|end▁of▁sentence|>"
tokenizer.eos_token_id = tokenizer.convert_tokens_to_ids(tokenizer.eos_token)
tokenizer.pad_token_id = tokenizer.convert_tokens_to_ids(tokenizer.pad_token)
tokenizer.padding_side = "left"
model = AutoModelForCausalLM.from_pretrained(model_path, device_map='auto',torch_dtype=torch.bfloat16, trust_remote_code=True)
return model, tokenizer
HUMAN_ROLE_START_TAG = "<s>human\n"
BOT_ROLE_START_TAG = "<s>bot\n"
text_list = [f'{HUMAN_ROLE_START_TAG}Write a QuickSort program\n#Python\n{BOT_ROLE_START_TAG}']
model, tokenizer = load_model_tokenizer(model_dir)
inputs = tokenizer(text_list, return_tensors='pt', padding=True, add_special_tokens=False).to('cuda')
input_ids = inputs["input_ids"]
attention_mask = inputs["attention_mask"]
generation_config = GenerationConfig(
eos_token_id=tokenizer.eos_token_id,
pad_token_id=tokenizer.pad_token_id,
temperature=0.1,
max_new_tokens=512,
num_return_sequences=1,
num_beams=1,
top_p=0.95,
do_sample=False
)
outputs = model.generate(
inputs= input_ids,
attention_mask=attention_mask,
**generation_config.to_dict()
)
gen_text = tokenizer.batch_decode(outputs[:, input_ids.shape[1]:], skip_special_tokens=True)
print(gen_text[0])</code></pre>
<span id="OSC_h2_12"></span>
<h2><span>继续微调</span></h2>
<p style="margin-left:0; margin-right:0"><span>如果你想在这两个模型基础上继续微调,欢迎使用我们开源的多任务高效微调框架MFTCoder(</span><a href="https://www.oschina.net/action/GoToLink?url=https%3A%2F%2Fgithub.com%2Fcodefuse-ai%2FMFTCoder%2Ftree%2Fmain%2Fmftcoder_accelerate" target="_blank"><span>https://github.com/codefuse-ai/MFTCoder/tree/main/mftcoder_accelerate</span></a><span>)。要继续微调,你需要准备好训练数据集(CodeFuse-ChatML格式)、设置训练配置文件、设置运行配置文件并启动训练。这里提供一个对Qwen-1.8模型用MFTCoder进行微调的案例供参考:</span><a href="https://www.oschina.net/action/GoToLink?url=https%3A%2F%2Fgithub.com%2Fcodefuse-ai%2FMFTCoder%2Ftree%2Fcodeqwen_competition%2Fmft_peft_hf" target="_blank"><span>https://github.com/codefuse-ai/MFTCoder/tree/codeqwen_competition/mft_peft_hf</span></a><span>。</span></p>
<p style="margin-left:0; margin-right:0"> </p>
<span id="OSC_h1_13"></span>
<h1><span>联系我们</span></h1>
<p style="margin-left:0; margin-right:0"><span>MFTCoder已经开源,本文中提到的模型和数据集也在陆续开源中,如果您喜欢我们的工作,欢迎试用、指正错误和贡献代码,可以的话请给我们的项目增加Star以支持我们。</span></p>
<ul>
<li><span>GitHub项目主页:</span><a href="https://www.oschina.net/action/GoToLink?url=https%3A%2F%2Fgithub.com%2Fcodefuse-ai%2FMFTCoder" target="_blank"><span>https://github.com/codefuse-ai/MFTCoder</span></a></li>
<li><span>HuggingFace主页:</span><a href="https://www.oschina.net/action/GoToLink?url=https%3A%2F%2Fhuggingface.co%2Fcodefuse-ai" target="_blank"><span>https://huggingface.co/codefuse-ai</span></a></li>
<li><span>魔搭社区主页:</span><a href="https://www.oschina.net/action/GoToLink?url=https%3A%2F%2Fmodelscope.cn%2Forganization%2Fcodefuse-ai" target="_blank"><span>https://modelscope.cn/organization/codefuse-ai</span></a></li>
</ul>
</div>
</div>