Intel Delivers Day-0 On-Device Deployment of the Open-Source ERNIE 4.5 Model Family

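[ZOL (Zhongguancun Online) original news] Today, Baidu officially released the open-source ERNIE 4.5 model family. Intel's OpenVINO and Baidu's PaddlePaddle have worked closely together for years. For this release, Intel used OpenVINO to adapt the ERNIE on-device model and deploy it on the Intel Core Ultra platform on day zero, the same day the models were published.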

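The OpenVINO toolkit is an open-source toolkit developed by Intel to optimize and accelerate deep learning inference. It supports cross-platform deployment and makes full use of Intel hardware. OpenVINO improves the performance of a wide range of state-of-the-art models on Intel AI products and solutions, and is used in AI PCs, edge AI, and other AI scenarios.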

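Since 2021, Baidu PaddlePaddle and Intel OpenVINO have cooperated closely, adapting their stacks to each other to give developers a more efficient and convenient AI toolchain. Many of the jointly adapted models, such as PaddleOCR, PaddleSeg, and PaddleDetection, are widely used in finance, healthcare, smart manufacturing, and other fields. Developers can run and deploy PaddlePaddle models directly with OpenVINO, or first convert them to the IR format with OpenVINO's Model Optimizer and then deploy and run them.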

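Today, drawing on years of accumulated AI expertise, Baidu brought the open-source ERNIE 4.5 model family to the industry. Intel announced that OpenVINO has been adapted to the 0.3B-parameter dense model, which has been deployed on the Intel Core Ultra platform with excellent inference performance.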

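Intel supported the debut of Baidu's ERNIE models, and together the two companies are bringing a new AI experience to the industry. Going forward, Intel will continue to work closely with Baidu, adapt more models in the ERNIE family, and jointly push the boundaries of AI technology.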

Quick Start Guide (Get Started)

Step 1: Environment setup

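The following commands set up the Python environment for the model deployment task (the activation path shown is the Windows virtual-environment layout).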

python -m venv py_venv

py_venv\Scripts\activate.bat

pip install --pre -U openvino-genai --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly

pip install nncf

pip install git+https://github.com/openvino-dev-samples/optimum-intel.git@ernie
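
Before moving on, it can help to confirm that the packages installed above are actually importable. The snippet below is a minimal sketch using only the standard library. The top-level module names (`openvino`, `openvino_genai`, `nncf`, `optimum`) are assumed from the pip package names, and `missing_packages` is a hypothetical helper, not part of any of these libraries.

```python
import importlib.util

# Top-level module names assumed to correspond to the pip packages installed above.
REQUIRED = ["openvino", "openvino_genai", "nncf", "optimum"]


def missing_packages(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Run it inside the activated virtual environment; any name it reports as missing points to a pip command that did not complete.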

Step 2: Model download and conversion

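Before deploying the model, we first need to convert the original PyTorch model to OpenVINO's IR static-graph format and compress it, for a lighter deployment and the best performance. With optimum-cli, the command-line tool provided by Optimum, format conversion and weight quantization can be done in a single command: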

optimum-cli export openvino --model baidu/ERNIE-4.5-0.3B-PT --task text-generation-with-past --weight-format fp16 --trust-remote-code ERNIE-4.5-0.3B-PT-OV

Developers can adjust the quantization parameters based on the quality of the model's output, including:

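--model: the model id on Hugging Face. You can also download the original model in advance and replace the model id with the local path to it. For developers in mainland China, ModelScope is recommended as the download source for the original model; see the official ModelScope guide for how to load it: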
https://www.modelscope.cn/docs/models/download

--weight-format: quantization precision; one of fp32, fp16, int8, int4, int4_sym_g128, int4_asym_g128, int4_sym_g64, int4_asym_g64

--group-size: the number of channels in the weights that share quantization parameters

--ratio: the int4/int8 weight ratio; the default is 1.0, and 0.6 means 60% of the weights are represented in int4 and 40% in int8

--sym: whether to enable symmetric quantization
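
To make the group-size and symmetric/asymmetric options more concrete, here is a conceptual sketch of group-wise 4-bit weight quantization in plain Python. It is an illustration under simplifying assumptions, not the algorithm NNCF or optimum-cli actually runs. All function names are hypothetical, the asymmetric range is widened to include zero, and each group stores one floating-point scale plus one zero point.

```python
def quantize_group(values, bits=4, symmetric=True):
    """Quantize one group of weights that shares a scale and zero point."""
    if symmetric:
        # Signed range centred on zero, e.g. -8..7 for 4 bits.
        qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
        max_abs = max(abs(v) for v in values)
        scale = max_abs / qmax if max_abs > 0 else 1.0
        zero_point = 0
    else:
        # Unsigned range, e.g. 0..15 for 4 bits, shifted by a zero point.
        qmin, qmax = 0, 2 ** bits - 1
        lo = min(min(values), 0.0)  # assumption: keep 0.0 representable
        hi = max(max(values), 0.0)
        scale = (hi - lo) / qmax if hi > lo else 1.0
        zero_point = round(-lo / scale)
    codes = [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point


def dequantize_group(codes, scale, zero_point):
    """Map integer codes back to approximate floating-point weights."""
    return [(c - zero_point) * scale for c in codes]


def mean_error(row, group_size, symmetric=True):
    """Mean absolute round-trip error when `row` is quantized in groups."""
    restored = []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        codes, scale, zero_point = quantize_group(group, symmetric=symmetric)
        restored.extend(dequantize_group(codes, scale, zero_point))
    return sum(abs(a - b) for a, b in zip(row, restored)) / len(row)


if __name__ == "__main__":
    # One outlier (8.0) stretches the scale of every weight in its group.
    row = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, 8.0]
    for size in (8, 4):
        print("group size", size, "mean abs error", mean_error(row, size))
```

Smaller groups confine an outlier's effect to fewer weights, so they generally track the original values more closely, at the cost of storing more scales. That is the trade-off behind choices such as g64 versus g128.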

Step 3: Model deployment

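For the text-generation models in the ERNIE-4.5 family, we can use Optimum-Intel for deployment and acceleration. Optimum-Intel calls the OpenVINO runtime backend to optimize performance on Intel CPU and GPU platforms. Because it is compatible with the Transformers library, the official example can be migrated to Optimum-Intel directly.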

from transformers import AutoTokenizer
from optimum.intel import OVModelForCausalLM

model_path = "ERNIE-4.5-0.3B-PT-OV"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = OVModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
# render the chat messages into a single prompt string
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], add_special_tokens=False, return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=1024
)
# keep only the newly generated tokens (drop the prompt prefix)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# decode the generated ids
generate_text = tokenizer.decode(output_ids, skip_special_tokens=True).strip(" ")
print("generate_text:", generate_text)
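
The script above loads the model on the default device. Since the text mentions both Intel CPU and GPU targets, the following is a hedged sketch of choosing a device explicitly. `pick_device` and `load_model` are hypothetical helpers, and the GPU-first preference order is an assumption. The sketch relies on `openvino.Core().available_devices` and on the `device` argument of `OVModelForCausalLM.from_pretrained`; check both against the versions you have installed.

```python
def pick_device(available, preference=("GPU", "CPU")):
    """Pick the first preferred device family present in `available`.

    OpenVINO reports devices as strings such as "CPU", "GPU", or "GPU.0".
    """
    for family in preference:
        for name in available:
            if name == family or name.startswith(family + "."):
                return name
    return "CPU"  # assumption: fall back to CPU when nothing matches


def load_model(model_path, device=None):
    """Load the exported model on the chosen device.

    Imports are deferred so that pick_device can be used (and tested)
    without OpenVINO or Optimum-Intel installed.
    """
    import openvino as ov
    from optimum.intel import OVModelForCausalLM

    if device is None:
        device = pick_device(ov.Core().available_devices)
    return OVModelForCausalLM.from_pretrained(
        model_path, device=device, trust_remote_code=True
    )
```

For example, `model = load_model("ERNIE-4.5-0.3B-PT-OV")` could replace the `from_pretrained` call in the script, and passing `device="CPU"` forces CPU execution.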

Sample output:

generate_text: "Large Language Models (LLMs) are AI-powered tools that use natural language processing (NLP) techniques to generate human-like text, answer questions, and perform reasoning tasks. They leverage massive datasets, advanced algorithms, and computational power to process, analyze, and understand human language, enabling conversational AI that can understand, interpret, and respond to a wide range of inputs. Their applications range from customer support to academic research, from language translation to creative content generation."
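
The article reports inference performance only qualitatively. To put a rough number on a run like the one above, a small timing wrapper can help. This is a minimal sketch, and `tokens_per_second` is a hypothetical helper. It measures end-to-end time, so prompt processing is included, and a single run is only an approximate indicator.

```python
import time


def tokens_per_second(generate_fn, prompt_len, clock=time.perf_counter):
    """Time one call to `generate_fn` and return (new_tokens, tokens_per_second).

    `generate_fn` takes no arguments and returns the full sequence of token
    ids (prompt plus completion), like `model.generate(...)[0]` in the script.
    """
    start = clock()
    sequence = generate_fn()
    elapsed = clock() - start
    new_tokens = len(sequence) - prompt_len
    rate = new_tokens / elapsed if elapsed > 0 else float("inf")
    return new_tokens, rate


# Usage with the objects from the deployment script (not executed here):
# n, rate = tokens_per_second(
#     lambda: model.generate(model_inputs.input_ids, max_new_tokens=1024)[0],
#     prompt_len=len(model_inputs.input_ids[0]),
# )
# print(f"{n} new tokens at {rate:.1f} tokens/s")
```

Running it a few times and discarding the first call gives a steadier figure, because the first call typically includes one-time warm-up work.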
