Integration: vLLM Invocation Layer

Use the vLLM inference engine with Haystack

Authors
Lukas Kreussel


Use vLLM in your Haystack pipeline to utilize fast, self-hosted LLMs.


Table of Contents

  • Overview
  • Haystack 2.x
    • Installation
    • Usage
  • Haystack 1.x
    • Installation (1.x)
    • Usage (1.x)

Overview

vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. It is an open-source project that allows serving open models in production, when you have GPU resources available.

For Haystack 1.x, the integration is available as a separate package; for Haystack 2.x, it works out of the box.

Haystack 2.x

vLLM can be deployed as a server that implements the OpenAI API protocol. This allows vLLM to be used with the OpenAIGenerator and OpenAIChatGenerator components in Haystack.

For an end-to-end example of vLLM + Haystack 2.x, see this notebook.

Installation

vLLM must be installed first.

  • you can use pip: pip install vllm (see the vLLM documentation for more information)
  • for production use cases, there are many other options, including Docker (docs)

Usage

You first need to run a vLLM OpenAI-compatible server. You can do that using Python or Docker.
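For example, assuming the vllm package is installed and a supported GPU is available, the server can be started with vLLM's OpenAI-compatible entrypoint (the model name is illustrative):

```shell
# Start a vLLM server that speaks the OpenAI API at http://localhost:8000/v1
python -m vllm.entrypoints.openai.api_server \
    --model mistralai/Mistral-7B-Instruct-v0.1 \
    --port 8000
```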

Then, you can use the OpenAIGenerator and OpenAIChatGenerator components in Haystack to query the vLLM server.

from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack.utils import Secret

generator = OpenAIChatGenerator(
    # for compatibility with the OpenAI API, a placeholder api_key is needed
    api_key=Secret.from_token("VLLM-PLACEHOLDER-API-KEY"),
    model="mistralai/Mistral-7B-Instruct-v0.1",
    api_base_url="http://localhost:8000/v1",
    generation_kwargs={"max_tokens": 512},
)

response = generator.run(messages=[ChatMessage.from_user("Hi. Can you help me plan my next trip to Italy?")])
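Because vLLM implements the OpenAI API protocol, the generator is essentially posting a standard chat-completions request body to api_base_url. A minimal sketch of that request shape (field names follow the OpenAI chat completions API; the model name and message are illustrative):

```python
import json

# The JSON body an OpenAI-compatible client POSTs to <api_base_url>/chat/completions
payload = {
    "model": "mistralai/Mistral-7B-Instruct-v0.1",
    "messages": [
        {"role": "user", "content": "Hi. Can you help me plan my next trip to Italy?"}
    ],
    "max_tokens": 512,  # passed through from generation_kwargs
}
body = json.dumps(payload)
print(body)
```

This is why no vLLM-specific Haystack component is needed in 2.x: any client that speaks this protocol works against the server.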

Haystack 1.x

Installation (1.x)

Install the wrapper via pip: pip install vllm-haystack

Usage (1.x)

This integration provides two invocation layers:

  • vLLMInvocationLayer: To use models hosted on a vLLM server
  • vLLMLocalInvocationLayer: To use locally hosted vLLM models

Use a Model Hosted on a vLLM Server

To query a model hosted on a vLLM server, use the vLLMInvocationLayer.

Here is a simple example of how a PromptNode can be created with the wrapper.

from haystack.nodes import PromptNode, PromptModel
from vllm_haystack import vLLMInvocationLayer


model = PromptModel(
    model_name_or_path="",
    invocation_layer_class=vLLMInvocationLayer,
    max_length=256,
    api_key="EMPTY",
    model_kwargs={
        "api_base": API,  # replace this with your API URL
        "maximum_context_length": 2048,
    },
)

prompt_node = PromptNode(model_name_or_path=model, top_k=1, max_length=256)

The model name is inferred automatically from the model served on the vLLM server. For more configuration examples, take a look at the unit tests.

Hosting a vLLM Server

To create an OpenAI-Compatible Server via vLLM you can follow the steps in the Quickstart section of their documentation.
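For instance, vLLM publishes an official Docker image (vllm/vllm-openai); assuming Docker with GPU support is available, the server could be started roughly like this (model name illustrative):

```shell
# Run the OpenAI-compatible vLLM server in Docker, exposing port 8000
docker run --gpus all -p 8000:8000 \
    vllm/vllm-openai \
    --model mistralai/Mistral-7B-Instruct-v0.1
```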

Use a Model Hosted Locally

⚠️ To run vLLM locally, you need to have vllm installed and a supported GPU.

If you don’t want to use an API server, this wrapper also provides a vLLMLocalInvocationLayer, which runs vLLM on the same machine Haystack is running on.

Here is a simple example of how a PromptNode can be created with the vLLMLocalInvocationLayer.

from haystack.nodes import PromptNode, PromptModel
from vllm_haystack import vLLMLocalInvocationLayer

model = PromptModel(
    model_name_or_path=MODEL,
    invocation_layer_class=vLLMLocalInvocationLayer,
    max_length=256,
    model_kwargs={
        "maximum_context_length": 2048,
    },
)

prompt_node = PromptNode(model_name_or_path=model, top_k=1, max_length=256)

