270 changes: 270 additions & 0 deletions LSF-vLLM/README.md
IBM LSF for vLLM Persistent Inference Service
==========================================

Overview
--------
In this repository we demonstrate how to deploy a large-language model inference service on an LSF cluster using vLLM. The service exposes an OpenAI-compatible API. We show how various clients can use the model for interactive or batch inference.

> **Review comment:** A little wordsmithing: "In this repository we demonstrate how to deploy a large-language model inference service on an LSF cluster using vLLM. The service exposes an OpenAI-compatible API. We show how various clients can use the model for interactive or batch inference."
>
> **Author:** Addressed in the latest commit.

What this implementation demonstrates
-------------------------------------
- IBM LSF launching and managing a persistent inference runtime as a service job
- vLLM exposing an OpenAI-compatible endpoint
- endpoint discovery through a small registry file written by the service job
- interactive validation using curl and Jupyter
- downstream reuse through a separate IBM LSF batch job
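
The discovery step above hinges on a small registry file keyed by the LSF job ID. A minimal sketch of the idea, assuming a simple one-object JSON schema (the field names here are illustrative assumptions; the real schema is whatever start_vllm_lsf.sh actually writes):

```bash
# Simulate the service job writing a registry entry, then a client reading it.
# Field names ("host", "port", "base_url") are hypothetical.
mkdir -p /tmp/vllm_registry_demo
printf '{"host": "node01", "port": 8001, "base_url": "http://node01:8001/v1"}\n' \
  > /tmp/vllm_registry_demo/12345.json

# A client resolves the endpoint by reading the file for its job ID:
python3 -c 'import json; print(json.load(open("/tmp/vllm_registry_demo/12345.json"))["base_url"])'
# prints: http://node01:8001/v1
```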

Repository layout
-----------------
- scripts/start_vllm_lsf.sh
Starts the vLLM container, waits for readiness, writes the registry file, and keeps the
service attached to the IBM LSF job lifecycle.
- scripts/resolve_endpoint.py
Reads the registry file for a given IBM LSF job ID and prints the resolved base URL.
- scripts/batch_client.py
Reads a prompt corpus and sends requests to the registered vLLM service.
- notebook/LSF_vLLM_Client.ipynb
Jupyter notebook for interactive validation against the IBM LSF-managed runtime.
> **Review comment:** There is no notebook subdirectory.
>
> **Author:** Addressed in the latest commit.

- corpus/prompts.txt
Sample prompt corpus for downstream batch validation.
> **Review comment:** There is no corpus subdirectory.
>
> **Author:** Addressed in the latest commit.


Prerequisites
-------------
- IBM LSF installed and operational
- podman installed
> **Review comment:** I guess this must be installed on all compute nodes of the cluster, right? I'm not sure whether we need to use the LSF podman integration; I guess likely not (which is fine).
>
> **Author:** Yes; we don't need the LSF podman integration.

- python3 installed
- curl installed
- network access from the execution host to pull the vLLM image and model
- a single-node IBM LSF setup is sufficient for this implementation
> **Review comment (@michaelspriggs, Apr 20, 2026):** I think we also require a shared $HOME directory, correct? That is not strictly necessary for LSF, but is a common deployment.
>
> **Author:** Added this in the latest README.

- shared $HOME directory across the cluster

Note:
Replace "your-host" with the hostname or IP address of the system where the vLLM service is running.

The examples below assume you are running as the same user for all steps.
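
If you are running on the machine that hosts the service, you can capture the hostname once instead of editing each command by hand (a small convenience sketch; 8001 is the default port used throughout this README):

```bash
# Build the base URL from the current host once and reuse it.
VLLM_HOST=$(hostname)
ENDPOINT_BASE="http://${VLLM_HOST}:8001/v1"
echo "${ENDPOINT_BASE}"
```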

Get the repository and move into it
-----------------------------------

```bash
git clone https://github.com/IBMSpectrumComputing/lsf-integrations.git
cd lsf-integrations/LSF-vLLM
```
After this, follow the step-by-step instructions below.

Part 1: Deploy the LLM
======================

Step 1: Create the working directories
--------------------------------------

```bash
mkdir -p ~/lsf_vllm_poc/{logs,registry,cache,corpus,results,notebook}
```

```bash
cp corpus/prompts.txt ~/lsf_vllm_poc/corpus/prompts.txt
```
> **Review comment:** I suggest including some lines saying to clone this repo and cd into the base directory. Just make it easy for people to cut and paste lines so that they can reproduce this without having to think too much. Also, need to update corpus -> scripts.
>
> **Author:** I have addressed this and added the clone instructions; please verify.


```bash
cp scripts/start_vllm_lsf.sh ~/lsf_vllm_poc/
cp scripts/resolve_endpoint.py ~/lsf_vllm_poc/
cp scripts/batch_client.py ~/lsf_vllm_poc/
```

```bash
chmod +x ~/lsf_vllm_poc/start_vllm_lsf.sh
chmod +x ~/lsf_vllm_poc/resolve_endpoint.py
chmod +x ~/lsf_vllm_poc/batch_client.py
```

Step 2: Review the service script defaults
------------------------------------------

The service script ships with the following defaults:

```bash
MODEL=Qwen/Qwen3-0.6B PORT=8001 API_KEY=local-vllm-key
```
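
To change a default, set the variable in your environment before submitting the job. That the script picks these up from the environment follows from the note below ("unless API_KEY is explicitly set before submission"); still, check start_vllm_lsf.sh before relying on it:

```bash
# Override the defaults for this shell session before running bsub.
# Assumption: start_vllm_lsf.sh reads MODEL, PORT, and API_KEY from the
# environment; verify against the script itself.
export MODEL=Qwen/Qwen3-0.6B
export PORT=8001
export API_KEY=local-vllm-key
echo "MODEL=$MODEL PORT=$PORT"
```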
> **Review comment (@michaelspriggs, Apr 20, 2026):** How do you do this? Grep a line in one of the config files? It sounds like the step should be to update the API_KEY. Where do users get this key from? Should this be a prerequisite?
>
> **Author:** I have added the note below in the updated README.

NOTE:
Default demo API key: local-vllm-key

The service script uses this value unless API_KEY is explicitly set before submission.
If you choose a different value, update the curl commands, notebook cells, and batch client inputs accordingly.

Step 3: Submit the persistent service job
-----------------------------------------

```bash
JOBID=$(
  bsub -J vllm_service -q normal -n 1 -R 'rusage[mem=12GB]' \
       -oo ~/lsf_vllm_poc/logs/vllm.%J.out \
       -eo ~/lsf_vllm_poc/logs/vllm.%J.err \
       ~/lsf_vllm_poc/start_vllm_lsf.sh | awk '{print $2}' | tr -d '<>'
)

echo "Submitted service JOBID=$JOBID"
```
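
The command substitution above parses bsub's confirmation message. As a standalone illustration of that parsing step (the sample line mirrors LSF's usual "Job <id> is submitted" output):

```bash
# bsub prints a confirmation like: Job <12345> is submitted to queue <normal>.
# awk takes the second whitespace-separated field ("<12345>") and tr strips
# the angle brackets, leaving the bare job ID.
line='Job <12345> is submitted to queue <normal>.'
jobid=$(printf '%s\n' "$line" | awk '{print $2}' | tr -d '<>')
echo "$jobid"   # prints 12345
```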

Step 4: Monitor the service startup
-----------------------------------

```bash
bjobs
bjobs -l ${JOBID}
bpeek ${JOBID}
```

```bash
podman ps -a | grep vllm-job-${JOBID}
podman logs -f vllm-job-${JOBID}
```

Step 5: Wait for the registry file
----------------------------------

```bash
until [[ -f ~/lsf_vllm_poc/registry/${JOBID}.json ]]; do
  sleep 2
done

cat ~/lsf_vllm_poc/registry/${JOBID}.json
```
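
The loop above waits indefinitely. If the service job fails to start, a bounded variant may be safer (a sketch; `wait_for_file` is a hypothetical helper, not part of this repository, and the attempt count is arbitrary):

```bash
# Poll for a file, giving up after a fixed number of attempts
# (150 attempts at 2-second intervals is roughly five minutes).
wait_for_file () {
  file=$1
  attempts=$2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    [ -f "$file" ] && return 0
    i=$((i + 1))
    sleep 2
  done
  return 1
}

# Usage against the registry file:
# wait_for_file ~/lsf_vllm_poc/registry/${JOBID}.json 150 || echo "timed out" >&2
```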

Step 6: Resolve the endpoint
----------------------------

```bash
python3 ~/lsf_vllm_poc/resolve_endpoint.py ${JOBID}
```

```bash
ENDPOINT=$(python3 ~/lsf_vllm_poc/resolve_endpoint.py ${JOBID})
echo "${ENDPOINT}"
```

Kill the LLM service
--------------------

```bash
bkill ${JOBID}
```

Part 2: Use the LLM
===================

Use the LLM with curl
---------------------

```bash
curl -sS "${ENDPOINT}/models" -H "Authorization: Bearer local-vllm-key"
```

```bash
curl -sS "${ENDPOINT}/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer local-vllm-key" \
  -d '{
    "model": "Qwen/Qwen3-0.6B",
    "messages": [
      {"role": "user", "content": "Explain the top 5 deserts in the world."}
    ],
    "temperature": 0,
    "max_tokens": 120
  }'
```
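
The response comes back as a JSON document. If jq is not installed, python3 (already a prerequisite) can pull out just the assistant text; the sample response below is a trimmed stand-in for the real payload shape, not actual server output:

```bash
# Extract choices[0].message.content from an OpenAI-style response.
response='{"choices": [{"message": {"content": "Hello from vLLM"}}]}'
printf '%s' "$response" | python3 -c 'import json, sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
# prints: Hello from vLLM
```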

Use the LLM from Jupyter
------------------------

Additional prerequisite:
- You must have SSH access from your laptop to the IBM LSF host where Jupyter will run.

Run the following commands on the IBM LSF host / cluster node:

```bash
python3 -m venv ~/lsf_vllm_poc/notebook/.venv
source ~/lsf_vllm_poc/notebook/.venv/bin/activate
pip install --upgrade pip
pip install notebook jupyterlab requests openai ipykernel
python -m ipykernel install --user --name lsf-vllm --display-name "Python (lsf-vllm)"
```

Start Jupyter on the IBM LSF host / cluster node:

```bash
jupyter notebook --no-browser --ip=0.0.0.0 --port 8888 --allow-root
```

Jupyter will print a URL containing a token. Keep that terminal running.

Run the following command on your laptop to create an SSH tunnel:

```bash
ssh -L 8888:127.0.0.1:8888 user@your-host
```

Open the following URL in a web browser on your laptop:

```
http://127.0.0.1:8888
```

When prompted, use the token printed by Jupyter on the IBM LSF host.

If the notebook kernel is running on the same IBM LSF host as the vLLM service, use the following base URL inside the notebook:

```
http://127.0.0.1:8001/v1
```
> **Review comment (@michaelspriggs, Apr 20, 2026):** For this one, it looks like you are starting the notebook on the cluster node and then connecting from the laptop through an SSH tunnel. You should mention which host each command gets run on (laptop vs. LSF compute host), and also which URL to use in the web browser. Also mention that a prerequisite for this is to have SSH access to a cluster node.
>
> **Author:** Updated the README with these steps explaining where to run the commands.


In this flow, the browser runs on the laptop, but the notebook kernel runs on the IBM LSF host. That is why the notebook can access the local vLLM endpoint at `http://127.0.0.1:8001/v1`.

Use the LLM from an IBM LSF batch job
-------------------------------------

```bash
python3 ~/lsf_vllm_poc/batch_client.py ${JOBID} ~/lsf_vllm_poc/corpus/prompts.txt
```

```bash
cat ~/lsf_vllm_poc/results/batch_${JOBID}.jsonl
```

```bash
BATCH_JOBID=$(
  bsub -J vllm_batch -q normal -n 1 -R 'rusage[mem=1GB]' \
       -oo ~/lsf_vllm_poc/logs/batch.%J.out \
       -eo ~/lsf_vllm_poc/logs/batch.%J.err \
       "python3 ~/lsf_vllm_poc/batch_client.py ${JOBID} ~/lsf_vllm_poc/corpus/prompts.txt" | awk '{print $2}' | tr -d '<>'
)

echo "Submitted batch JOBID=$BATCH_JOBID"
```

```bash
bjobs
bpeek ${BATCH_JOBID}
cat ~/lsf_vllm_poc/results/batch_${JOBID}.jsonl
```
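
Each line of the results file is a standalone JSON object, so it can be processed one record at a time. A sketch, using hypothetical field names ("prompt" and "response" are assumptions; the real keys are whatever batch_client.py writes):

```bash
# Walk a JSONL stream line by line and print one field per record.
# The sample line below uses assumed field names; adjust to match
# batch_client.py's actual output schema.
printf '%s\n' '{"prompt": "Why do rivers flow towards the sea?", "response": "Gravity."}' |
while IFS= read -r line; do
  printf '%s\n' "$line" | python3 -c 'import json, sys; print(json.load(sys.stdin)["prompt"])'
done
```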
> **Review comment:** Overall, I suggest breaking this into a few sections:
>
> 1. Deploy the LLM: deploy, monitor, kill
> 2. Use the LLM: curl, Jupyter, LSF job
>
> **Author:** Please review the restructured README.


Use the LLM with Open WebUI (Linux)
-----------------------------------

```bash
podman run -d -p 3000:8080 \
  -e OPENAI_API_BASE_URL=http://your-host:8001/v1 \
  -e OPENAI_API_KEY=local-vllm-key \
  -e WEBUI_SECRET_KEY=my-openwebui-secret \
  -v open-webui:/app/backend/data \
  --name open-webui ghcr.io/open-webui/open-webui:main
```

```bash
podman ps
podman logs -f open-webui
```

Open http://localhost:3000 in a web browser, then configure the connection:

- Settings → Connections → OpenAI
- Base URL: http://your-host:8001/v1
- API Key: local-vllm-key

Select the model Qwen/Qwen3-0.6B and send a test prompt, for example: "Say one short line about LSF-managed model serving."

Cleanup
-------

```bash
bkill ${BATCH_JOBID}
bkill ${JOBID}
```
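
The bkill commands stop the LSF jobs. If you also started the Open WebUI container in the previous section, remove it separately (a sketch; the `|| true` keeps the commands safe to rerun even if the container was never created):

```bash
# Stop and remove the Open WebUI container, ignoring errors if it
# does not exist.
podman stop open-webui 2>/dev/null || true
podman rm open-webui 2>/dev/null || true
```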
3 changes: 3 additions & 0 deletions LSF-vLLM/corpus/prompts.txt
Why do rivers flow towards the sea?
What is the difference between a lion and a tiger?
How do fish survive in water without breathing air like humans?
95 changes: 95 additions & 0 deletions LSF-vLLM/notebook/LSF_vLLM_Client.ipynb
{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# IBM LSF + vLLM Notebook Validation\n",
        "Run cells top to bottom. Update values if your endpoint differs."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import requests\n",
        "base_url = 'http://127.0.0.1:8001/v1'\n",
        "api_key = 'local-vllm-key'\n",
        "model = 'Qwen/Qwen3-0.6B'\n",
        "print(base_url)\n",
        "print(model)\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "resp = requests.get(\n",
        "    f'{base_url}/models',\n",
        "    headers={'Authorization': f'Bearer {api_key}'},\n",
        "    timeout=60,\n",
        ")\n",
        "print(resp.status_code)\n",
        "print(resp.json())\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "payload = {\n",
        "    'model': model,\n",
        "    'messages': [{'role': 'user', 'content': 'Explain the top 5 deserts in the world.'}],\n",
        "    'temperature': 0,\n",
        "    'max_tokens': 120,\n",
        "    'chat_template_kwargs': {'enable_thinking': False},\n",
        "}\n",
        "resp = requests.post(\n",
        "    f'{base_url}/chat/completions',\n",
        "    headers={'Authorization': f'Bearer {api_key}', 'Content-Type': 'application/json'},\n",
        "    json=payload,\n",
        "    timeout=120,\n",
        ")\n",
        "print(resp.status_code)\n",
        "data = resp.json()\n",
        "print(data['choices'][0]['message']['content'])\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from openai import OpenAI\n",
        "client = OpenAI(base_url=base_url, api_key=api_key)\n",
        "resp = client.chat.completions.create(\n",
        "    model=model,\n",
        "    messages=[{'role': 'user', 'content': 'Explain the top 5 deserts in the world.'}],\n",
        "    temperature=0,\n",
        "    max_tokens=120,\n",
        ")\n",
        "print(resp.choices[0].message.content)\n"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python (lsf-vllm)",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "name": "python",
      "version": "3.9"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}