
feat: Add support for OpenAI-compatible LLM endpoints (Groq, Ollama, Azure OpenAI, etc.) #1198

@Shaamam

Description


🔴 Required Information

Is your feature request related to a specific problem?

Yes. ADK-Java currently supports only Gemini and Claude models out of the box. Developers cannot easily use other LLM providers that implement the OpenAI Chat Completions API format, such as:

  • Groq (fast inference with open models)
  • Ollama (local models for offline development)
  • Azure OpenAI (enterprise deployments)
  • Perplexity (search-augmented responses)
  • Any custom OpenAI-compatible endpoint

Python ADK supports this through its LiteLLM integration, but Java ADK has no native equivalent. This creates friction for developers who want to:

  • Experiment with different model providers
  • Reduce inference costs by using alternative providers
  • Develop/test locally without external API calls
  • Support enterprise requirements (Azure OpenAI)
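For reference, "OpenAI-compatible" here means the provider accepts a POST to `{baseUrl}chat/completions` with a JSON body containing a `model` id and a list of `messages`; only the base URL and the auth header differ between Groq, Ollama, and Azure OpenAI. The sketch below builds such a request with the JDK's `java.net.http` API. It is a hypothetical helper for illustration (the class and method names are not from ADK), it hand-builds JSON only to stay dependency-free, and it assumes the base URL ends with `/` so that `chat/completions` resolves underneath it.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;
import java.util.Map;

/** Hypothetical helper showing the shape of a non-streaming Chat Completions request. */
public final class ChatCompletionsRequestSketch {

  /** Builds a POST to {baseUrl}chat/completions; baseUrl must end with "/" to resolve correctly. */
  public static HttpRequest buildRequest(
      String baseUrl, Map<String, String> headers, String model, String userMessage) {
    // Minimal request body: the model id plus a list of role/content messages.
    String body =
        "{\"model\":\"" + escape(model) + "\","
            + "\"messages\":[{\"role\":\"user\",\"content\":\"" + escape(userMessage) + "\"}],"
            + "\"stream\":false}";
    HttpRequest.Builder builder =
        HttpRequest.newBuilder(URI.create(baseUrl).resolve("chat/completions"))
            .timeout(Duration.ofSeconds(30))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body));
    // Provider-specific auth travels in headers: "Authorization: Bearer ..." for Groq,
    // "api-key: ..." for Azure OpenAI, and none for a local Ollama server.
    headers.forEach(builder::header);
    return builder.build();
  }

  /** Escapes only backslashes and double quotes; a real client should use a JSON library. */
  private static String escape(String s) {
    return s.replace("\\", "\\\\").replace("\"", "\\\"");
  }

  public static void main(String[] args) {
    HttpRequest request =
        buildRequest("http://localhost:11434/v1/", Map.of(), "llama2", "Hello!");
    // Sending is omitted here; HttpClient.newHttpClient().send(...) would perform the call.
    System.out.println(request.method() + " " + request.uri());
  }
}
```

Running `main` prints the method and resolved URL for a local Ollama server, which shows why a single client can serve every provider once the base URL and headers are configurable.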

Describe the Solution You'd Like

Add a new OpenAiCompatibleLlm class that wraps the existing ChatCompletionsHttpClient to support any OpenAI-compatible endpoint with a simple builder pattern.
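A minimal sketch of the kind of validation such a builder could perform is shown below. It uses a hypothetical `OpenAiCompatibleConfig` record with the same fields as the proposed builder (`baseUrl`, `headers`, `modelName`, `timeoutMillis`); the 60-second default timeout and the trailing-slash normalization are assumptions for illustration, not behavior stated in the proposal.

```java
import java.util.Map;
import java.util.Objects;

/** Hypothetical settings object for an OpenAI-compatible endpoint (sketch, not the real class). */
public record OpenAiCompatibleConfig(
    String baseUrl, Map<String, String> headers, String modelName, int timeoutMillis) {

  public static Builder builder() {
    return new Builder();
  }

  public static final class Builder {
    private String baseUrl;
    private Map<String, String> headers = Map.of(); // no auth headers by default (e.g. local Ollama)
    private String modelName;
    private int timeoutMillis = 60_000; // assumed default; the proposal does not specify one

    public Builder baseUrl(String baseUrl) {
      this.baseUrl = baseUrl;
      return this;
    }

    public Builder headers(Map<String, String> headers) {
      this.headers = Objects.requireNonNull(headers, "headers");
      return this;
    }

    public Builder modelName(String modelName) {
      this.modelName = modelName;
      return this;
    }

    public Builder timeoutMillis(int timeoutMillis) {
      this.timeoutMillis = timeoutMillis;
      return this;
    }

    public OpenAiCompatibleConfig build() {
      if (baseUrl == null || baseUrl.isBlank()) {
        throw new IllegalStateException("baseUrl is required");
      }
      if (modelName == null || modelName.isBlank()) {
        throw new IllegalStateException("modelName is required");
      }
      if (timeoutMillis <= 0) {
        throw new IllegalStateException("timeoutMillis must be positive");
      }
      // Normalize to a trailing slash so relative paths like "chat/completions" resolve under it.
      String normalized = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
      // Defensive copy so later changes to the caller's map cannot leak into the config.
      return new OpenAiCompatibleConfig(normalized, Map.copyOf(headers), modelName, timeoutMillis);
    }
  }
}
```

Failing fast in `build()` keeps misconfiguration (a missing base URL, a zero timeout) out of the request path, which is the kind of case the builder-validation unit tests would cover.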

Key capabilities:

  1. Builder pattern for easy configuration (baseUrl, headers, timeout)
  2. Pattern-based registry integration (e.g., groq-.*, ollama-.*)
  3. Reuses existing ChatCompletionsHttpClient infrastructure
  4. Non-streaming requests (matches current ChatCompletionsHttpClient capabilities)
  5. Comprehensive tests (unit + integration + manual verification)
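The pattern-based registry integration in item 2 can be illustrated with a small, self-contained stand-in. This is a hypothetical class, not ADK's `LlmRegistry`: it maps regexes such as `groq-.*` to factories, uses full-string matching, and returns the first registered match. Whether the real factory strips the routing prefix (`groq-`) before calling the provider is an assumption made here only to show the idea.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;

/** Hypothetical stand-in for pattern-based model resolution; not ADK's LlmRegistry. */
public final class PatternRegistrySketch<T> {
  // LinkedHashMap keeps registration order, so the first matching pattern wins.
  private final Map<Pattern, Function<String, T>> factories = new LinkedHashMap<>();

  /** Associates a regex (e.g. "groq-.*") with a factory that builds a model for a given name. */
  public void register(String regex, Function<String, T> factory) {
    factories.put(Pattern.compile(regex), factory);
  }

  /** Builds a model for the first pattern that matches the entire model name. */
  public T resolve(String modelName) {
    for (Map.Entry<Pattern, Function<String, T>> entry : factories.entrySet()) {
      if (entry.getKey().matcher(modelName).matches()) {
        return entry.getValue().apply(modelName);
      }
    }
    throw new IllegalArgumentException("No model registered for: " + modelName);
  }

  public static void main(String[] args) {
    PatternRegistrySketch<String> registry = new PatternRegistrySketch<>();
    // Assumption for illustration: each factory strips its routing prefix from the model name.
    registry.register("groq-.*", name -> "groq:" + name.substring("groq-".length()));
    registry.register("ollama-.*", name -> "ollama:" + name.substring("ollama-".length()));
    System.out.println(registry.resolve("groq-llama-3.3-70b-versatile"));
  }
}
```

Because each provider owns a distinct prefix, several OpenAI-compatible endpoints can be registered side by side, and an agent selects one purely through its model string.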

Impact on your work

Impact Level: High

This feature is critical for:

  • My current project: Building a multi-model agent system that needs to switch between Gemini (complex reasoning) and Groq
    (fast responses) based on task type
  • Cost optimization: Groq is 10x cheaper than Gemini for simple tasks
  • Local development: Need Ollama support for offline testing without API costs

Timeline: I would like this in the next release if possible. I am currently using a workaround based on direct HTTP calls, but it is not maintainable.

Willingness to contribute

Yes. I have already implemented this feature, including:

  • ✅ Core OpenAiCompatibleLlm class with builder pattern
  • ✅ 13 unit tests (builder validation, registry integration, error handling)
  • ✅ Integration tests with Ollama
  • ✅ Manual verification with Groq API (tested successfully)
  • ✅ README documentation with examples
  • ✅ Follows Google Java Style Guide (google-java-format applied)
  • ✅ Aligned with CONTRIBUTING.md requirements

I can submit a PR immediately if the maintainers approve this approach.


🟡 Recommended Information

Describe Alternatives You've Considered

  1. Direct HTTP calls: Manually building requests to OpenAI-compatible endpoints

    • ❌ Doesn't integrate with LlmRegistry
    • ❌ No pattern matching for model resolution
    • ❌ Duplicates HTTP client logic
    • ❌ Not maintainable
  2. Separate library wrapper: Create external library for each provider

    • ❌ Fragments ecosystem
    • ❌ Each provider needs separate maintenance
    • ❌ Doesn't leverage existing ADK infrastructure
  3. Use Python ADK instead: Switch to Python for LiteLLM support

    • ❌ Not viable for Java-based projects
    • ❌ Team expertise is in Java
  4. Wait for official provider support: Ask Google to add each provider individually

    • ❌ Slow (requires coordination with each provider)
    • ❌ Doesn't scale to custom/internal endpoints

Why the proposed solution is better: a single implementation supports all OpenAI-compatible providers, reuses existing infrastructure, and allows custom endpoints.

Proposed API / Implementation

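import com.google.adk.agents.LlmAgent;
import com.google.adk.events.Event;
import com.google.adk.models.OpenAiCompatibleLlm; // proposed class
import com.google.common.collect.ImmutableMap;

// Assumes apiKey, azureApiKey, and invocationContext are already in scope.

// Example 1: Groq (fast inference)
OpenAiCompatibleLlm groq = OpenAiCompatibleLlm.builder()
    .baseUrl("https://api.groq.com/openai/v1/")
    .headers(ImmutableMap.of("Authorization", "Bearer " + apiKey))
    .modelName("llama-3.3-70b-versatile")
    .timeoutMillis(30_000)
    .build();

// Register pattern for model resolution
groq.registerWithPattern("groq-.*");

// Use with LlmAgent (ADK agents require a name)
LlmAgent agent = LlmAgent.builder()
    .name("assistant")
    .model("groq-llama-3.3-70b-versatile")
    .instruction("You are a helpful assistant.")
    .build();

// runAsync emits a Flowable<Event>, not a String
Event firstEvent = agent.runAsync(invocationContext).blockingFirst();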

// Example 2: Ollama (local models)
OpenAiCompatibleLlm ollama = OpenAiCompatibleLlm.builder()
    .baseUrl("http://localhost:11434/v1/")
    .headers(ImmutableMap.of())  // No auth for local
    .modelName("llama2")  // the model tag as Ollama knows it
    .build();

ollama.registerWithPattern("ollama-.*");

// Example 3: Azure OpenAI (enterprise)
OpenAiCompatibleLlm azure = OpenAiCompatibleLlm.builder()
    .baseUrl("https://<resource>.openai.azure.com/openai/deployments/<deployment>/")
    .headers(ImmutableMap.of("api-key", azureApiKey))
    .modelName("azure-gpt-4")
    .build();

azure.registerWithPattern("azure-.*");

Implementation Architecture:

  • OpenAiCompatibleLlm extends BaseLlm
  • Wraps existing ChatCompletionsHttpClient (no code duplication)
  • Uses LlmRegistry.registerLlm(pattern, factory) for pattern matching
  • Throws UnsupportedOperationException for live connections (OpenAI API limitation)
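The delegation structure described above can be sketched without ADK dependencies. Every type below is a simplified, hypothetical stand-in: `SimpleBaseLlm` plays the role of `BaseLlm`, a `Function<String, String>` plays the role of `ChatCompletionsHttpClient`, and a `List` replaces ADK's reactive stream of responses. Falling back to a single complete reply when streaming is requested is an assumption; the proposal only says requests are non-streaming.

```java
import java.util.List;
import java.util.function.Function;

/** Structural sketch only: every type here is a simplified stand-in, not an ADK class. */
public final class DelegationSketch {

  /** Stand-in for BaseLlm: a model name, a request/response call, and a live-connection hook. */
  public abstract static class SimpleBaseLlm {
    private final String model;

    protected SimpleBaseLlm(String model) {
      this.model = model;
    }

    public String model() {
      return model;
    }

    /** Returns the response(s) for a prompt; a list stands in for a reactive stream. */
    public abstract List<String> generateContent(String prompt, boolean stream);

    /** Opens a bidirectional session. */
    public abstract AutoCloseable connect(String prompt);
  }

  /** Stand-in for the proposed class: delegates every request to an injected client. */
  public static final class SimpleOpenAiCompatibleLlm extends SimpleBaseLlm {
    // Stand-in for ChatCompletionsHttpClient: one blocking request, one complete reply.
    private final Function<String, String> client;

    public SimpleOpenAiCompatibleLlm(String model, Function<String, String> client) {
      super(model);
      this.client = client;
    }

    @Override
    public List<String> generateContent(String prompt, boolean stream) {
      // Non-streaming only. Assumption: a streaming request falls back to one full reply.
      return List.of(client.apply(prompt));
    }

    @Override
    public AutoCloseable connect(String prompt) {
      // Chat Completions has no bidirectional live mode, so refuse explicitly.
      throw new UnsupportedOperationException(
          "Live connections are not supported by the Chat Completions API");
    }
  }

  public static void main(String[] args) {
    SimpleBaseLlm llm = new SimpleOpenAiCompatibleLlm("llama2", prompt -> "echo: " + prompt);
    System.out.println(llm.generateContent("Hello!", false));
  }
}
```

Injecting the client rather than constructing it inside the class keeps the wrapper thin and lets unit tests substitute a fake, which matches the goal of reusing the existing HTTP client instead of duplicating it.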

Files to be added/modified:
M README.md
A core/src/main/java/com/google/adk/models/OpenAiCompatibleLlm.java
A core/src/test/java/com/google/adk/models/OpenAiCompatibleLlmTest.java
A core/src/test/java/com/google/adk/models/OpenAiCompatibleLlmIntegrationTest.java
A core/src/test/java/com/google/adk/models/ManualGroqTest.java

Additional Context

Testing completed:

  • ✅ Unit tests: 13 tests, all passing
  • ✅ Integration tests: Ollama-based tests for real endpoints
  • ✅ Manual verification: Successfully tested with Groq API (llama-3.3-70b-versatile)
    • Simple completion test
    • Registry integration test
    • Multi-turn conversation with context retention

Design decisions:

  • Non-streaming only: Matches ChatCompletionsHttpClient capabilities (streaming can be added in future PR)
  • Pattern-based registration: Allows multiple providers with different prefixes (e.g., groq-.*, ollama-.*, azure-.*)
  • No live connections: OpenAI Chat Completions API doesn't support bidirectional live connections

Questions for maintainers:

  1. Should we add streaming support in a follow-up PR, or include it now?
  2. Any concerns with the pattern-based registration approach?
  3. Should integration tests be tagged/skipped in CI (currently requires manual Ollama setup)?

Related: Provides functionality similar to Python ADK's LiteLLM integration, but as a native Java implementation that reuses existing ADK infrastructure.


Implementation ready: I have a fully tested, working implementation and can submit a PR immediately upon approval of this approach.
