您的位置：首頁 > 軟件教程 > 教程 > LLM 大模型學(xué)習(xí)必知必會(huì)系列(三)：LLM和多模態(tài)模型高效推理實(shí)踐

LLM 大模型學(xué)習(xí)必知必會(huì)系列(三)：LLM和多模態(tài)模型高效推理實(shí)踐

來源：好特整理　|　時(shí)間：2024-05-28 11:48:55 |　閱讀：132　|　標(biāo)簽：系　 |　分享到：

LLM 大模型學(xué)習(xí)必知必會(huì)系列(三)：LLM和多模態(tài)模型高效推理實(shí)踐 1.多模態(tài)大模型推理 LLM 的推理流程：多模態(tài)的 LLM 的原理：代碼演示：使用 ModelScope NoteBook 完成語言大模型，視覺大模型，音頻大模型的推理環(huán)境配置與安裝以下主要演示的模型推理代碼可在魔搭社區(qū)免

LLM 大模型學(xué)習(xí)必知必會(huì)系列(三)：LLM和多模態(tài)模型高效推理實(shí)踐

1.多模態(tài)大模型推理

LLM 的推理流程：

LLM 大模型學(xué)習(xí)必知必會(huì)系列(三)：LLM和多模態(tài)模型高效推理實(shí)踐

多模態(tài)的 LLM 的原理：

LLM 大模型學(xué)習(xí)必知必會(huì)系列(三)：LLM和多模態(tài)模型高效推理實(shí)踐

代碼演示：使用 ModelScope NoteBook 完成語言大模型，視覺大模型，音頻大模型的推理

環(huán)境配置與安裝

以下主要演示的模型推理代碼可在魔搭社區(qū)免費(fèi)實(shí)例 PAI-DSW 的配置下運(yùn)行（顯存 24G）：

點(diǎn)擊模型右側(cè) Notebook 快速開發(fā)按鈕，選擇 GPU 環(huán)境：
打開 Python 3 (ipykernel)：

示例代碼語言大模型推理示例代碼

#通義千問1_8B LLM大模型的推理代碼示例
#通義千問1_8B:https://modelscope.cn/models/qwen/Qwen-1_8B-Chat/summary
from modelscope import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

#Note: The default behavior now has injection attack prevention off.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-1_8B-Chat", revision='master', trust_remote_code=True)

#use bf16
#model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-1_8B-Chat", device_map="auto", trust_remote_code=True, bf16=True).eval()
#use fp16
#model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-1_8B-Chat", device_map="auto", trust_remote_code=True, fp16=True).eval()
#use cpu only
#model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-1_8B-Chat", device_map="cpu", trust_remote_code=True).eval()
#use auto mode, automatically select precision based on the device.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-1_8B-Chat", revision='master', device_map="auto", trust_remote_code=True).eval()

#Specify hyperparameters for generation. But if you use transformers>=4.32.0, there is no need to do this.
#model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-1_8B-Chat", trust_remote_code=True) # 可指定不同的生成長度、top_p等相關(guān)超參

#第一輪對(duì)話 1st dialogue turn
response, history = model.chat(tokenizer, "你好", history=None)
print(response)
# 你好！很高興為你提供幫助。

#第二輪對(duì)話 2nd dialogue turn
response, history = model.chat(tokenizer, "給我講一個(gè)年輕人奮斗創(chuàng)業(yè)最終取得成功的故事。", history=history)
print(response)
#這是一個(gè)關(guān)于一個(gè)年輕人奮斗創(chuàng)業(yè)最終取得成功的故事。
#故事的主人公叫李明，他來自一個(gè)普通的家庭，父母都是普通的工人。從小，李明就立下了一個(gè)目標(biāo)：要成為一名成功的企業(yè)家。
#為了實(shí)現(xiàn)這個(gè)目標(biāo)，李明勤奮學(xué)習(xí)，考上了大學(xué)。在大學(xué)期間，他積極參加各種創(chuàng)業(yè)比賽，獲得了不少獎(jiǎng)項(xiàng)。他還利用課余時(shí)間去實(shí)習(xí)，積累了寶貴的經(jīng)驗(yàn)。
#畢業(yè)后，李明決定開始自己的創(chuàng)業(yè)之路。他開始尋找投資機(jī)會(huì)，但多次都被拒絕了。然而，他并沒有放棄。他繼續(xù)努力，不斷改進(jìn)自己的創(chuàng)業(yè)計(jì)劃，并尋找新的投資機(jī)會(huì)。
#最終，李明成功地獲得了一筆投資，開始了自己的創(chuàng)業(yè)之路。他成立了一家科技公司，專注于開發(fā)新型軟件。在他的領(lǐng)導(dǎo)下，公司迅速發(fā)展起來，成為了一家成功的科技企業(yè)。
#李明的成功并不是偶然的。他勤奮、堅(jiān)韌、勇于冒險(xiǎn)，不斷學(xué)習(xí)和改進(jìn)自己。他的成功也證明了，只要努力奮斗，任何人都有可能取得成功。

#第三輪對(duì)話 3rd dialogue turn
response, history = model.chat(tokenizer, "給這個(gè)故事起一個(gè)標(biāo)題", history=history)
print(response)
#《奮斗創(chuàng)業(yè)：一個(gè)年輕人的成功之路》

#Qwen-1.8B-Chat現(xiàn)在可以通過調(diào)整系統(tǒng)指令（System Prompt），實(shí)現(xiàn)角色扮演，語言風(fēng)格遷移，任務(wù)設(shè)定，行為設(shè)定等能力。
#Qwen-1.8B-Chat can realize roly playing, language style transfer, task setting, and behavior setting by system prompt.
response, _ = model.chat(tokenizer, "你好呀", history=None, system="請(qǐng)用二次元可愛語氣和我說話")
print(response)
#你好��！我是一只可愛的二次元貓咪哦，不知道你有什么問題需要我?guī)兔獯饐幔?
response, _ = model.chat(tokenizer, "My colleague works diligently", history=None, system="You will write beautiful compliments according to needs")
print(response)
#Your colleague is an outstanding worker! Their dedication and hard work are truly inspiring. They always go above and beyond to ensure that 
#their tasks are completed on time and to the highest standard. I am lucky to have them as a colleague, and I know I can count on them to handle any challenge that comes their way.

輸出結(jié)果：

LLM 大模型學(xué)習(xí)必知必會(huì)系列(三)：LLM和多模態(tài)模型高效推理實(shí)踐

視覺大模型推理示例代碼

 #Qwen-VL 是阿里云研發(fā)的大規(guī)模視覺語言模型（Large Vision Language Model, LVLM）。Qwen-VL 可以以圖像、文本、檢測(cè)框作為輸入，并以文本和檢測(cè)框作為輸出。Qwen-VL 系列模型性能強(qiáng)大，具備多語言對(duì)話、多圖交錯(cuò)對(duì)話等能力，并支持中文開放域定位和細(xì)粒度圖像識(shí)別與理解。
from modelscope import (
    snapshot_download, AutoModelForCausalLM, AutoTokenizer, GenerationConfig
)
from auto_gptq import AutoGPTQForCausalLM

model_dir = snapshot_download("qwen/Qwen-VL-Chat-Int4", revision='v1.0.0')

import torch
torch.manual_seed(1234)

# Note: The default behavior now has injection attack prevention off.
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)

# use cuda device
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="cuda", trust_remote_code=True,use_safetensors=True).eval()

# 1st dialogue turn
query = tokenizer.from_list_format([
    {'image': 'https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg'},
    {'text': '這是什么'},
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
# 圖中是一名年輕女子在沙灘上和她的狗玩耍，狗的品種可能是拉布拉多。她們坐在沙灘上，狗的前腿抬起來，似乎在和人類擊掌。兩人之間充滿了信任和愛。

# 2nd dialogue turn
response, history = model.chat(tokenizer, '輸出"狗"的檢測(cè)框', history=history)
print(response)

image = tokenizer.draw_bbox_on_latest_picture(response, history)
if image:
  image.save('1.jpg')
else:
  print("no box")

輸出結(jié)果：

LLM 大模型學(xué)習(xí)必知必會(huì)系列(三)：LLM和多模態(tài)模型高效推理實(shí)踐

音頻大模型推理示例代碼

from modelscope import (
    snapshot_download, AutoModelForCausalLM, AutoTokenizer, GenerationConfig
)
import torch
model_id = 'qwen/Qwen-Audio-Chat'
revision = 'master'

model_dir = snapshot_download(model_id, revision=revision)
torch.manual_seed(1234)

tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
if not hasattr(tokenizer, 'model_dir'):
    tokenizer.model_dir = model_dir

# Note: The default behavior now has injection attack prevention off.
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)

# use bf16
# model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, bf16=True).eval()
# use fp16
# model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, fp16=True).eval()
# use cpu only
# model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="cpu", trust_remote_code=True).eval()
# use cuda device
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="cuda", trust_remote_code=True).eval()


# 1st dialogue turn
query = tokenizer.from_list_format([
    {'audio': 'https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Audio/1272-128104-0000.flac'}, # Either a local path or an url
    {'text': 'what does the person say?'},
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
# The person says: "mister quilter is the apostle of the middle classes and we are glad to welcome his gospel".

# 2nd dialogue turn
response, history = model.chat(tokenizer, 'Find the start time and end time of the word "middle classes"', history=history)
print(response)
# The word "middle classes" starts at <|2.33|> seconds and ends at <|3.26|> seconds.

輸出結(jié)果：

LLM 大模型學(xué)習(xí)必知必會(huì)系列(三)：LLM和多模態(tài)模型高效推理實(shí)踐

2. vLLM+FastChat 高效推理實(shí)戰(zhàn)

FastChat 是一個(gè)開放平臺(tái)，用于訓(xùn)練、服務(wù)和評(píng)估基于 LLM 的 ChatBot。

FastChat 的核心功能包括：
●優(yōu)秀的大語言模型訓(xùn)練和評(píng)估代碼。
●具有 Web UI 和 OpenAI 兼容的 RESTful API 的分布式多模型服務(wù)系統(tǒng)。

vLLM 是一個(gè)由加州伯克利分校、斯坦福大學(xué)和加州大學(xué)圣迭戈分校的研究人員基于操作系統(tǒng)中經(jīng)典的虛擬緩存和分頁技術(shù)開發(fā)的 LLM 服務(wù)系統(tǒng)。他實(shí)現(xiàn)了幾乎零浪費(fèi)的 KV 緩存，并且可以在請(qǐng)求內(nèi)部和請(qǐng)求之間靈活共享 KV 高速緩存，從而減少內(nèi)存使用量。

FastChat 開源鏈接： https://github.com/lm-sys/FastChat
vLLM 開源鏈接： https://github.com/vllm-project/vllm

實(shí)戰(zhàn)演示：

安裝 FastChat 最新包1

git clone https://github.com/lm-sys/FastChat.git
cd FastChat
pip install .

環(huán)境變量設(shè)置

在 vLLM 和 FastChat 上使用魔搭的模型需要設(shè)置兩個(gè)環(huán)境變量：1

export VLLM_USE_MODELSCOPE=True
export FASTCHAT_USE_MODELSCOPE=True

2.1 使用 FastChat 和 vLLM 實(shí)現(xiàn)發(fā)布 model worker(s)

可以結(jié)合 FastChat 和 vLLM 搭建一個(gè)網(wǎng)頁 Demo 或者類 OpenAI API 服務(wù)器，

首先啟動(dòng)一個(gè) controller：

python -m fastchat.serve.controller

然后啟動(dòng) vllm_worker 發(fā)布模型。如下給出單卡推理的示例，運(yùn)行如下命令：千問模型示例：

#以qwen-1.8B為例，在A10運(yùn)行

python -m fastchat.serve.vllm_worker --model-path qwen/Qwen-1_8B-Chat --trust-remote-code --dtype bfloat16

啟動(dòng) vLLM 優(yōu)化 worker 后，本次實(shí)踐啟動(dòng)頁面端 demo 展示：1

python -m fastchat.serve.gradio_web_server --host 0.0.0.0 --port 8000

2.2 LLM 的應(yīng)用場(chǎng)景：RAG

LLM 會(huì)產(chǎn)生誤導(dǎo)性的 “幻覺”，依賴的信息可能過時(shí)，處理特定知識(shí)時(shí)效率不高，缺乏專業(yè)領(lǐng)域的深度洞察，同時(shí)在推理能力上也有所欠缺。?正是在這樣的背景下，檢索增強(qiáng)生成技術(shù)（Retrieval-Augmented Generation，RAG）應(yīng)時(shí)而生，成為 AI 時(shí)代的一大趨勢(shì)。

RAG 通過在語言模型生成答案之前，先從廣泛的文檔數(shù)據(jù)庫中檢索相關(guān)信息，然后利用這些信息來引導(dǎo)生成過程，極大地提升了內(nèi)容的準(zhǔn)確性和相關(guān)性。RAG 有效地緩解了幻覺問題，提高了知識(shí)更新的速度，并增強(qiáng)了內(nèi)容生成的可追溯性，使得大型語言模型在實(shí)際應(yīng)用中變得更加實(shí)用和可信。

一個(gè)典型的 RAG 的例子：

LLM 大模型學(xué)習(xí)必知必會(huì)系列(三)：LLM和多模態(tài)模型高效推理實(shí)踐

這里面主要包括包括三個(gè)基本步驟：

索引 — 將文檔庫分割成較短的 Chunk，并通過編碼器構(gòu)建向量索引。
檢索 — 根據(jù)問題和 chunks 的相似度檢索相關(guān)文檔片段。
生成 — 以檢索到的上下文為條件，生成問題的回答。

RAG（開卷考試）VS. Finetune（專業(yè)課程學(xué)習(xí)）

LLM 大模型學(xué)習(xí)必知必會(huì)系列(三)：LLM和多模態(tài)模型高效推理實(shí)踐

示例代碼： https://github.com/modelscope/modelscope/blob/master/examples/pytorch/application/qwen_doc_search_QA_based_on_langchain_llamaindex.ipynb

小編推薦閱讀

首頁

找游戲

游戲庫

開測(cè)表

搶禮包

看攻略

手游排行榜

新聞中心

游戲中心

熱門專區(qū)

熱門頻道

小編推薦

特色欄目

抖音熱游

一刀999

絕地吃雞

沙雕游戲

BT手游

經(jīng)典街機(jī)

真人互動(dòng)

LLM 大模型學(xué)習(xí)必知必會(huì)系列(三)：LLM和多模態(tài)模型高效推理實(shí)踐

LLM 大模型學(xué)習(xí)必知必會(huì)系列(三)：LLM和多模態(tài)模型高效推理實(shí)踐

1.多模態(tài)大模型推理

2. vLLM+FastChat 高效推理實(shí)戰(zhàn)

2.1 使用 FastChat 和 vLLM 實(shí)現(xiàn)發(fā)布 model worker(s)

2.2 LLM 的應(yīng)用場(chǎng)景：RAG

好特網(wǎng)發(fā)布此文僅為傳遞信息，不代表好特網(wǎng)認(rèn)同期限觀點(diǎn)或證實(shí)其描述。

相關(guān)視頻攻略

更多

同類最新

更多

熱門資訊

更多

更多

更多

首頁

找游戲

游戲庫

開測(cè)表

搶禮包

看攻略

手游排行榜

新聞中心

游戲中心

熱門專區(qū)

熱門頻道

小編推薦

特色欄目

抖音熱游

一刀999

絕地吃雞

沙雕游戲

BT手游

經(jīng)典街機(jī)

真人互動(dòng)

LLM 大模型學(xué)習(xí)必知必會(huì)系列(三)：LLM和多模態(tài)模型高效推理實(shí)踐

LLM 大模型學(xué)習(xí)必知必會(huì)系列(三)：LLM和多模態(tài)模型高效推理實(shí)踐

1.多模態(tài)大模型推理

2. vLLM+FastChat 高效推理實(shí)戰(zhàn)

2.1 使用 FastChat 和 vLLM 實(shí)現(xiàn)發(fā)布 model worker(s)

2.2 LLM 的應(yīng)用場(chǎng)景：RAG

好特網(wǎng)發(fā)布此文僅為傳遞信息，不代表好特網(wǎng)認(rèn)同期限觀點(diǎn)或證實(shí)其描述。

相關(guān)視頻攻略

更多

同類最新

更多

熱門資訊

更多

更多

更多

好特網(wǎng)發(fā)布此文僅為傳遞信息，不代表好特網(wǎng)認(rèn)同期限觀點(diǎn)或證實(shí)其描述。