Adding scripts & Readme steps for vLLM based workloads over IBM LSF #23
base: master
@@ -0,0 +1,270 @@
IBM LSF for vLLM Persistent Inference Service
=============================================

Overview
--------
In this repository we demonstrate how to deploy a large-language model inference service on an LSF cluster using vLLM. The service exposes an OpenAI-compatible API. We show how various clients can use the model for interactive or batch inference.

What this implementation demonstrates
-------------------------------------
- IBM LSF launching and managing a persistent inference runtime as a service job
- vLLM exposing an OpenAI-compatible endpoint
- endpoint discovery through a small registry file written by the service job
- interactive validation using curl and Jupyter
- downstream reuse through a separate IBM LSF batch job

Repository layout
-----------------
- scripts/start_vllm_lsf.sh
  Starts the vLLM container, waits for readiness, writes the registry file, and keeps the service attached to the IBM LSF job lifecycle (a minimal sketch of such a script appears after this section).
- scripts/resolve_endpoint.py
  Reads the registry file for a given IBM LSF job ID and prints the resolved base URL.
- scripts/batch_client.py
  Reads a prompt corpus and sends requests to the registered vLLM service.
- notebook/LSF_vLLM_Client.ipynb
  Jupyter notebook for interactive validation against the IBM LSF-managed runtime.
Review comment: There is no notebook subdirectory.
Author: Addressed in the latest commit.
- corpus/prompts.txt
  Sample prompt corpus for downstream batch validation.
Review comment: There is no corpus subdirectory.
Author: Addressed in the latest commit.
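The README describes what scripts/start_vllm_lsf.sh does without showing it. The sketch below is only an illustration of that behavior, not the script from this PR: the container image, the port mapping, the registry field names, and the readiness probe are all assumptions and may differ from the actual implementation.

```bash
#!/bin/bash
# Minimal sketch of a service launcher along the lines described above.
# Assumptions (not taken from this PR): the vllm/vllm-openai image, a registry
# JSON with host/port/base_url fields, and a /v1/models readiness probe.
set -euo pipefail

MODEL="${MODEL:-Qwen/Qwen3-0.6B}"
PORT="${PORT:-8001}"
API_KEY="${API_KEY:-local-vllm-key}"
BASE_DIR="${HOME}/lsf_vllm_poc"
NAME="vllm-job-${LSB_JOBID}"            # LSB_JOBID is set by LSF inside the job

# Start the vLLM OpenAI-compatible server in a container.
podman run -d --name "${NAME}" -p "${PORT}:8000" \
  -v "${BASE_DIR}/cache:/root/.cache/huggingface" \
  docker.io/vllm/vllm-openai:latest \
  --model "${MODEL}" --api-key "${API_KEY}"

# Wait until the endpoint answers before registering it.
until curl -sf -H "Authorization: Bearer ${API_KEY}" \
    "http://127.0.0.1:${PORT}/v1/models" > /dev/null; do
  sleep 5
done

# Write a small registry file keyed by the LSF job ID.
cat > "${BASE_DIR}/registry/${LSB_JOBID}.json" <<EOF
{"job_id": "${LSB_JOBID}", "host": "$(hostname -f)", "port": ${PORT},
 "base_url": "http://$(hostname -f):${PORT}/v1"}
EOF

# Keep the job attached to the container so that bkill tears the service down.
trap 'podman rm -f "${NAME}"' EXIT
podman wait "${NAME}"
```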
Prerequisites
-------------
- IBM LSF installed and operational
- podman installed
Review comment: I guess this must be installed on all compute nodes of the cluster, right? Not sure whether we need to use the LSF podman integration? I guess likely not (which is fine).
Author: Yeah, we don't need the LSF podman integration.
- python3 installed
- curl installed
- network access from the execution host to pull the vLLM image and model
- a single-node IBM LSF setup is sufficient for this implementation
Review comment: I think we also require a shared $HOME directory, correct? That is not strictly necessary for LSF, but it is a common deployment.
Author: Added this in the latest README.
- shared $HOME directory across the cluster
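A quick way to confirm the command-line prerequisites on the LSF host is shown below; this check is only a convenience and is not part of the scripts in this PR.

```bash
# Verify that the required tools are on PATH on the submission/execution host.
command -v bsub bjobs bkill podman python3 curl
```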
Note:
Replace "your-host" with the hostname or IP address of the system where the vLLM service is running.

The examples below assume you are running as the same user for all steps.
Get the repository and move into it
-----------------------------------

```bash
git clone https://github.com/IBMSpectrumComputing/lsf-integrations.git
cd lsf-integrations/LSF-vLLM
```

After this, follow the step-by-step instructions below.
Part 1: Deploy the LLM
======================

Step 1: Create the working directories
--------------------------------------

```bash
mkdir -p ~/lsf_vllm_poc/{logs,registry,cache,corpus,results,notebook}
```

```bash
cp corpus/prompts.txt ~/lsf_vllm_poc/corpus/prompts.txt
```
Review comment: I suggest including some lines that say to clone this repo and cd into the base directory. Just make it easy for people to cut and paste lines so that they can reproduce this without having to think too much. Also, corpus needs to be updated to scripts.
Author: I have addressed it and added the below; hope this is fine. Please verify:
git clone https://github.com/IBMSpectrumComputing/lsf-integrations.git
cd lsf-integrations/LSF-vLLM
After this, follow the step-by-step instructions below.
```bash
cp scripts/start_vllm_lsf.sh ~/lsf_vllm_poc/
cp scripts/resolve_endpoint.py ~/lsf_vllm_poc/
cp scripts/batch_client.py ~/lsf_vllm_poc/
```

```bash
chmod +x ~/lsf_vllm_poc/start_vllm_lsf.sh
chmod +x ~/lsf_vllm_poc/resolve_endpoint.py
chmod +x ~/lsf_vllm_poc/batch_client.py
```

Step 2: Review the service script defaults
------------------------------------------

The service script ships with the following defaults:

```bash
MODEL=Qwen/Qwen3-0.6B PORT=8001 API_KEY=local-vllm-key
```
Review comment: How does one do this? Grep a line in one of the config files? It sounds like the step should be to update API_KEY. Where do users get this key from? Should this be a prerequisite?
Author: I have added the following note in the updated README: "The service script uses this value unless API_KEY is explicitly set before submission."
NOTE:
Default demo API key: local-vllm-key

The service script uses this value unless API_KEY is explicitly set before submission.
If you choose a different value, update the curl commands, notebook cells, and batch client inputs accordingly.
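For example, to override the demo defaults, export them in the submission shell before running the bsub command in Step 3. The variable names below mirror the defaults shown in Step 2; adjust them if your copy of start_vllm_lsf.sh expects different names.

```bash
# Optional: export overrides before submitting the service job.
export MODEL="Qwen/Qwen3-0.6B"
export PORT=8001
export API_KEY="my-own-secret-key"   # hypothetical value; choose your own
```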
Step 3: Submit the persistent service job
-----------------------------------------

```bash
JOBID=$(
bsub -J vllm_service -q normal -n 1 -R 'rusage[mem=12GB]' -oo ~/lsf_vllm_poc/logs/vllm.%J.out -eo ~/lsf_vllm_poc/logs/vllm.%J.err ~/lsf_vllm_poc/start_vllm_lsf.sh | awk '{print $2}' | tr -d '<>'
)

echo "Submitted service JOBID=$JOBID"
```
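If the submission fails, JOBID will be empty. A small guard (not part of the original scripts) makes that obvious before you continue:

```bash
# Fail fast if the job ID was not captured from the bsub output.
[[ -n "${JOBID}" ]] || echo "bsub submission failed; check the queue and bsub output" >&2
```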
Step 4: Monitor the service startup
-----------------------------------

```bash
bjobs
bjobs -l ${JOBID}
bpeek ${JOBID}
```

```bash
podman ps -a | grep vllm-job-${JOBID}
podman logs -f vllm-job-${JOBID}
```
Step 5: Wait for the registry file
----------------------------------

```bash
until [[ -f ~/lsf_vllm_poc/registry/${JOBID}.json ]]; do
  sleep 2
done

cat ~/lsf_vllm_poc/registry/${JOBID}.json
```
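The loop above waits indefinitely. If you prefer to give up after a fixed time, for example while the model is still downloading, a bounded variant looks like this:

```bash
# Wait up to ~10 minutes (300 * 2 s) for the registry file to appear.
for _ in $(seq 1 300); do
  [[ -f ~/lsf_vllm_poc/registry/${JOBID}.json ]] && break
  sleep 2
done
```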
Step 6: Resolve the endpoint
----------------------------

```bash
python3 ~/lsf_vllm_poc/resolve_endpoint.py ${JOBID}
```

```bash
ENDPOINT=$(python3 ~/lsf_vllm_poc/resolve_endpoint.py ${JOBID})
echo "${ENDPOINT}"
```
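If you want to look inside the registry file without the helper script, something like the following works, assuming the registry JSON exposes a base_url field; that field name is an assumption about start_vllm_lsf.sh's output format, not something stated in this README.

```bash
# Manual lookup of the base URL straight from the registry file.
ENDPOINT=$(python3 -c 'import json,sys; print(json.load(open(sys.argv[1]))["base_url"])' \
  ~/lsf_vllm_poc/registry/${JOBID}.json)
echo "${ENDPOINT}"
```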
Kill the LLM service
--------------------

Run this only when you are finished using the service (see Part 2 below).

```bash
bkill ${JOBID}
```
Part 2: Use the LLM
===================

Use the LLM with curl
---------------------

```bash
curl -sS "${ENDPOINT}/models" -H "Authorization: Bearer local-vllm-key"
```

```bash
curl -sS "${ENDPOINT}/chat/completions" -H "Content-Type: application/json" -H "Authorization: Bearer local-vllm-key" -d '{
  "model": "Qwen/Qwen3-0.6B",
  "messages": [
    {"role": "user", "content": "Explain the top 5 deserts in the world."}
  ],
  "temperature": 0,
  "max_tokens": 120
}'
```
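To print only the generated text rather than the full JSON response, the same request can be piped through jq. Note that jq is not listed in the prerequisites, so install it first or skip this step.

```bash
# Same chat request as above, printing only the assistant's reply.
curl -sS "${ENDPOINT}/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer local-vllm-key" \
  -d '{"model": "Qwen/Qwen3-0.6B", "messages": [{"role": "user", "content": "Explain the top 5 deserts in the world."}], "temperature": 0, "max_tokens": 120}' \
  | jq -r '.choices[0].message.content'
```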
Use the LLM from Jupyter
------------------------

Additional prerequisite:
- You must have SSH access from your laptop to the IBM LSF host where Jupyter will run.

Run the following commands on the IBM LSF host / cluster node:

```bash
python3 -m venv ~/lsf_vllm_poc/notebook/.venv
source ~/lsf_vllm_poc/notebook/.venv/bin/activate
pip install --upgrade pip
pip install notebook jupyterlab requests openai ipykernel
python -m ipykernel install --user --name lsf-vllm --display-name "Python (lsf-vllm)"
```
Start Jupyter on the IBM LSF host / cluster node:

```bash
jupyter notebook --no-browser --ip=0.0.0.0 --port 8888 --allow-root
```

Jupyter will print a URL containing a token. Keep that terminal running.

Run the following command on your laptop to create an SSH tunnel:

```bash
ssh -L 8888:127.0.0.1:8888 user@your-host
```

Open the following URL in a web browser on your laptop:

```
http://127.0.0.1:8888
```

When prompted, use the token printed by Jupyter on the IBM LSF host.

If the notebook kernel is running on the same IBM LSF host as the vLLM service, use the following base URL inside the notebook:

```
http://127.0.0.1:8001/v1
```
Review comment: For this one, it looks like you are starting the notebook on the cluster node and then connecting from the laptop through an SSH tunnel. You should mention which host each command gets run on (laptop vs. LSF compute host), and also, for the URL, that it is the one to open in the web browser. Also mention that a prerequisite for this is to have SSH access to a cluster node.
Author: Updated the README with these steps explaining where to run the commands.
In this flow, the browser runs on the laptop, but the notebook kernel runs on the IBM LSF host. That is why the notebook can access the local vLLM endpoint at `http://127.0.0.1:8001/v1`.
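If you would rather run a client directly on your laptop, you can also forward the vLLM port through the same SSH tunnel; this variant is not part of the original walkthrough.

```bash
# Run on your laptop: forward both Jupyter (8888) and the vLLM port (8001).
ssh -L 8888:127.0.0.1:8888 -L 8001:127.0.0.1:8001 user@your-host
```

With this tunnel in place, http://127.0.0.1:8001/v1 also works as the base URL for a client running on the laptop.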
Use the LLM from an IBM LSF batch job
-------------------------------------

```bash
python3 ~/lsf_vllm_poc/batch_client.py ${JOBID} ~/lsf_vllm_poc/corpus/prompts.txt
```

```bash
cat ~/lsf_vllm_poc/results/batch_${JOBID}.jsonl
```

```bash
BATCH_JOBID=$(
bsub -J vllm_batch -q normal -n 1 -R 'rusage[mem=1GB]' -oo ~/lsf_vllm_poc/logs/batch.%J.out -eo ~/lsf_vllm_poc/logs/batch.%J.err "python3 ~/lsf_vllm_poc/batch_client.py ${JOBID} ~/lsf_vllm_poc/corpus/prompts.txt" | awk '{print $2}' | tr -d '<>'
)

echo "Submitted batch JOBID=$BATCH_JOBID"
```

```bash
bjobs
bpeek ${BATCH_JOBID}
cat ~/lsf_vllm_poc/results/batch_${JOBID}.jsonl
```
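To block until the batch job finishes before reading its results, recent LSF releases provide bwait; if your LSF version does not include it, poll bjobs instead.

```bash
# Wait for the batch job to reach DONE state, then inspect the output.
bwait -w "done(${BATCH_JOBID})"
cat ~/lsf_vllm_poc/results/batch_${JOBID}.jsonl
```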
Review comment: Overall, I suggest breaking this into a few sections: (1) Deploy the LLM
Author: Please review the new restructured README file.
Use the LLM with Open WebUI (Linux)
-----------------------------------

```bash
podman run -d -p 3000:8080 -e OPENAI_API_BASE_URL=http://your-host:8001/v1 -e OPENAI_API_KEY=local-vllm-key -e WEBUI_SECRET_KEY=my-openwebui-secret -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:main
```

```bash
podman ps
podman logs -f open-webui
```

Open http://localhost:3000 in a web browser, then configure the connection:

- Settings → Connections → OpenAI
- Base URL: http://your-host:8001/v1
- API Key: local-vllm-key

Model:
Qwen/Qwen3-0.6B

Test:
Say one short line about LSF-managed model serving.

Cleanup
-------

```bash
bkill ${BATCH_JOBID}
bkill ${JOBID}
```
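If you started the optional Open WebUI container above, it is not managed by LSF, so stop and remove it separately:

```bash
# Remove the Open WebUI container; also remove its volume if you want to drop stored chats.
podman rm -f open-webui
# podman volume rm open-webui
```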
@@ -0,0 +1,3 @@
Why do rivers flow towards the sea?
What is the difference between a lion and a tiger?
How do fish survive in water without breathing air like humans?
@@ -0,0 +1,95 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# IBM LSF + vLLM Notebook Validation\n",
    "Run cells top to bottom. Update values if your endpoint differs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "base_url = 'http://127.0.0.1:8001/v1'\n",
    "api_key = 'local-vllm-key'\n",
    "model = 'Qwen/Qwen3-0.6B'\n",
    "print(base_url)\n",
    "print(model)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "resp = requests.get(\n",
    "    f'{base_url}/models',\n",
    "    headers={'Authorization': f'Bearer {api_key}'},\n",
    "    timeout=60,\n",
    ")\n",
    "print(resp.status_code)\n",
    "print(resp.json())\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "payload = {\n",
    "    'model': model,\n",
    "    'messages': [{'role': 'user', 'content': 'Explain the top 5 deserts in the world.'}],\n",
    "    'temperature': 0,\n",
    "    'max_tokens': 120,\n",
    "    'chat_template_kwargs': {'enable_thinking': False},\n",
    "}\n",
    "resp = requests.post(\n",
    "    f'{base_url}/chat/completions',\n",
    "    headers={'Authorization': f'Bearer {api_key}', 'Content-Type': 'application/json'},\n",
    "    json=payload,\n",
    "    timeout=120,\n",
    ")\n",
    "print(resp.status_code)\n",
    "data = resp.json()\n",
    "print(data['choices'][0]['message']['content'])\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from openai import OpenAI\n",
    "client = OpenAI(base_url=base_url, api_key=api_key)\n",
    "resp = client.chat.completions.create(\n",
    "    model=model,\n",
    "    messages=[{'role': 'user', 'content': 'Explain the top 5 deserts in the world.'}],\n",
    "    temperature=0,\n",
    "    max_tokens=120,\n",
    ")\n",
    "print(resp.choices[0].message.content)\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python (lsf-vllm)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
Review comment: A little wordsmithing:
"In this repository we demonstrate how to deploy a large-language model inference service on an LSF cluster using vLLM. The service exposes an OpenAI-compatible API. We show how various clients can use the model for interactive or batch inference."
Author: Addressed in the latest commit.