Skip to content

Open WebUI support #527

@arlm

Description

@arlm

I could connect to Open WebUI, but it returns errors:

To run the server:

python run_inference_server.py -m models/Falcon3-10B-Instruct-1.58bit/ggml-model-i2_s.gguf --host 127.0.0.1 --port 8080

Connect on Open WebUI as OpenAI API with the URL http://127.0.0.1:8080/v1, use a bearer token, and type anything as the token.

The model should be available for you on chats already, but when trying to talk I receive:

Response payload is not completed: <TransferEncodingError: 400, message='Not enough data to satisfy transfer length header.'>

And the server dies with the error:

Error occurred while running command: Command '['build/bin/llama-server', '-m', 'models/Falcon3-10B-Instruct-1.58bit/ggml-model-i2_s.gguf', '-c', '2048', '-t', '2', '-n', '4096', '-ngl', '0', '--temp', '0.8', '--host', '127.0.0.1', '--port', '8080', '--no-mmap', '-np', '1', '-b', '1', '-nocb']' died with <Signals.SIGBUS: 10>.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions