1. vLLM can significantly accelerate inference for large language models. Set up a conda environment:
conda create -n vllm python=3.10
conda activate vllm
conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=12.1 -c pytorch -c nvidia
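The commands above install PyTorch but not vLLM itself, which would typically be installed into the same environment (for example with `pip install vllm`, picking a release compatible with CUDA 12.1). As a minimal sanity check that the new environment sees a CUDA-enabled PyTorch build:

```python
# Quick sanity check inside the "vllm" conda environment:
# confirm the PyTorch version and that CUDA is visible.
import torch

print(torch.__version__)          # expected: 2.1.0
print(torch.cuda.is_available())  # should print True if the CUDA 12.1 build is working
```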
2. Start the inference server
python -m vllm.entrypoints.openai.api_server --tensor-parallel-size=1 --trust-remote-code --max-model-len 1024 --model THUDM/chatglm3-6b
To specify the IP and port, add --host 127.0.0.1 --port 8101:
python -m vllm.entrypoints.openai.api_server --port 8101 --tensor-parallel-size=1 --trust-remote-code --max-model-len 1024 --model THUDM/chatglm3-6b
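Once started, the server exposes an OpenAI-compatible API; its /v1/models endpoint can be used as a readiness check. A minimal sketch in Python, assuming the --port 8101 launch above and that the `requests` package is available:

```python
# Readiness check: list the models served by the local vLLM OpenAI-compatible server.
import requests

resp = requests.get("http://127.0.0.1:8101/v1/models", timeout=10)
resp.raise_for_status()
for model in resp.json()["data"]:
    print(model["id"])  # should include THUDM/chatglm3-6b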
3. Test the API
curl http://127.0.0.1:8101/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "THUDM/chatglm3-6b",
"prompt": "请用20字内回复我,你今年多大了",
"max_tokens": 20,
"temperature": 0
}'
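The same request can be sent with the official `openai` Python client (v1.x) pointed at the local server; the api_key value below is a placeholder, since vLLM only enforces a key if the server was started with one:

```python
# Text completion against the local vLLM server, mirroring the curl example above.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8101/v1", api_key="EMPTY")

completion = client.completions.create(
    model="THUDM/chatglm3-6b",
    prompt="请用20字内回复我,你今年多大了",
    max_tokens=20,
    temperature=0,
)
print(completion.choices[0].text)
```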
Multi-turn conversation (passing a history field to /v1/completions):
curl -X POST "http://127.0.0.1:8101/v1/completions" \
-H "Content-Type: application/json" \
-d "{\"model\": \"THUDM/chatglm3-6b\",\"prompt\": \"你叫什么名字\", \"history\": [{\"role\": \"user\", \"content\": \"你出生在哪里.\"}, {\"role\": \"assistant\", \"content\": \"出生在北京\"}]}"
Multi-turn conversation (via the /v1/chat/completions endpoint with a messages list):
curl -X POST "http://127.0.0.1:8101/v1/chat/completions" \
-H "Content-Type: application/json" \
-d "{\"model\": \"THUDM/chatglm3-6b\", \"messages\": [{\"role\": \"system\", \"content\": \"You are ChatGLM3, a large language model trained by Zhipu.AI. Follow the user's instructions carefully. Respond using markdown.\"}, {\"role\": \"user\", \"content\": \"你好给我讲一个故事大概100字\"}], \"stream\": false, \"max_tokens\": 100, \"temperature\": 0.8, \"top_p\": 0.8}"
4. Start the web front end (Nginx)
docker run -d \
--network=host \
--name nginx2 --restart=always \
-v $PWD/nginx/conf/nginx.conf:/etc/nginx/nginx.conf \
-v $PWD/nginx/html:/usr/share/nginx/html \
-v $PWD/nginx/logs:/var/log/nginx \
--privileged=true \
nginx
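The container shares the host network, so if the mounted nginx.conf keeps Nginx listening on port 80 and serving the static front end from /usr/share/nginx/html (both assumptions; that config file is not shown here), the deployment can be smoke-tested from Python:

```python
# Smoke test: confirm the Nginx container is serving the front-end page.
import requests

resp = requests.get("http://127.0.0.1/", timeout=10)
print(resp.status_code)               # 200 if the front end is reachable
print(resp.headers.get("Server", ""))  # typically reports nginx
```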
Reference documentation: https://docs.vllm.ai/en/latest/