Open WebUI support

I could connect to Open WebUI, but it returns errors:  

To run the server: 

```bash
python run_inference_server.py -m models/Falcon3-10B-Instruct-1.58bit/ggml-model-i2_s.gguf --host 127.0.0.1 --port 8080
``` 

Connect on Open WebUI as OpenAI API with the URL `http://127.0.0.1:8080/v1`, use a bearer token, and type anything as the token.

The model should be available for you on chats already, but when trying to talk I receive:

`Response payload is not completed: <TransferEncodingError: 400, message='Not enough data to satisfy transfer length header.'>`

And the server dies with the error:

```
Error occurred while running command: Command '['build/bin/llama-server', '-m', 'models/Falcon3-10B-Instruct-1.58bit/ggml-model-i2_s.gguf', '-c', '2048', '-t', '2', '-n', '4096', '-ngl', '0', '--temp', '0.8', '--host', '127.0.0.1', '--port', '8080', '--no-mmap', '-np', '1', '-b', '1', '-nocb']' died with <Signals.SIGBUS: 10>.
```



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Open WebUI support #527

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Open WebUI support #527

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions