feat: Add support for OpenAI-compatible LLM endpoints (Groq, Ollama, Azure OpenAI, etc.)

## 🔴 Required Information

  ### Is your feature request related to a specific problem?

  Yes. Currently, ADK-Java only supports Google's Gemini and Anthropic's Claude models out of the box. Developers cannot easily use other LLM
  providers that implement the OpenAI Chat Completions API format, such as:

  - **Groq** (fast inference with open models)
  - **Ollama** (local models for offline development)
  - **Azure OpenAI** (enterprise deployments)
  - **Perplexity** (search-augmented responses)
  - Any custom OpenAI-compatible endpoint

  Python ADK supports this via LiteLLM integration, but Java ADK lacks a native solution. This creates friction for developers who
  want to:
  - Experiment with different model providers
  - Reduce inference costs by using alternative providers
  - Develop/test locally without external API calls
  - Support enterprise requirements (Azure OpenAI)

  ### Describe the Solution You'd Like

  Add a new `OpenAiCompatibleLlm` class that wraps the existing `ChatCompletionsHttpClient` to support any OpenAI-compatible endpoint
   with a simple builder pattern.

  **Key capabilities:**
  1. Builder pattern for easy configuration (baseUrl, headers, timeout)
  2. Pattern-based registry integration (e.g., `groq-.*`, `ollama-.*`)
  3. Reuses existing `ChatCompletionsHttpClient` infrastructure
  4. Non-streaming requests (matches current `ChatCompletionsHttpClient` capabilities)
  5. Comprehensive tests (unit + integration + manual verification)
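
To make the builder capability concrete, here is a minimal, self-contained sketch of the kind of configuration and validation such a builder might perform. `EndpointConfig`, its fields, the 30-second default timeout, and the trailing-slash normalization are all hypothetical and chosen for illustration; they are not the actual `OpenAiCompatibleLlm` implementation.

```java
import java.time.Duration;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for the settings the proposed builder would collect. */
final class EndpointConfig {
  final String baseUrl;
  final Map<String, String> headers;
  final Duration timeout;

  private EndpointConfig(String baseUrl, Map<String, String> headers, Duration timeout) {
    this.baseUrl = baseUrl;
    this.headers = headers;
    this.timeout = timeout;
  }

  static Builder builder() {
    return new Builder();
  }

  static final class Builder {
    private String baseUrl;
    private Map<String, String> headers = Map.of(); // no auth headers by default
    private Duration timeout = Duration.ofSeconds(30); // assumed default, not from the proposal

    Builder baseUrl(String baseUrl) {
      this.baseUrl = baseUrl;
      return this;
    }

    Builder headers(Map<String, String> headers) {
      this.headers = Objects.requireNonNull(headers, "headers");
      return this;
    }

    Builder timeoutMillis(long millis) {
      this.timeout = Duration.ofMillis(millis);
      return this;
    }

    EndpointConfig build() {
      // Fail fast on missing or invalid settings instead of at request time.
      if (baseUrl == null || baseUrl.isBlank()) {
        throw new IllegalStateException("baseUrl is required");
      }
      if (timeout.isZero() || timeout.isNegative()) {
        throw new IllegalStateException("timeout must be positive");
      }
      // Normalize so that relative paths such as "chat/completions" resolve under baseUrl.
      String normalized = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
      return new EndpointConfig(normalized, Map.copyOf(headers), timeout);
    }
  }
}
```

Validating in `build()` keeps misconfiguration errors close to the call site, which is also what makes builder validation straightforward to unit test.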

  ### Impact on your work

  **Impact Level**: High

  This feature is critical for:
  - **My current project**: Building a multi-model agent system that needs to switch between Gemini (complex reasoning) and Groq
  (fast responses) based on task type
  - **Cost optimization**: Groq is 10x cheaper than Gemini for simple tasks
  - **Local development**: Need Ollama support for offline testing without API costs

  **Timeline**: Would like this in the next release if possible. Currently using a workaround with direct HTTP calls, but it's not
  maintainable.

  ### Willingness to contribute

  **Yes** - I have already implemented this feature with:
  - ✅ Core `OpenAiCompatibleLlm` class with builder pattern
  - ✅ 13 unit tests (builder validation, registry integration, error handling)
  - ✅ Integration tests with Ollama
  - ✅ Manual verification with Groq API (tested successfully)
  - ✅ README documentation with examples
  - ✅ Follows Google Java Style Guide (google-java-format applied)
  - ✅ Aligned with CONTRIBUTING.md requirements

  **Can submit PR immediately if maintainers approve this approach.**

  ---

  ## 🟡 Recommended Information

  ### Describe Alternatives You've Considered

  1. **Direct HTTP calls**: Manually building requests to OpenAI-compatible endpoints
     - ❌ Doesn't integrate with `LlmRegistry`
     - ❌ No pattern matching for model resolution
     - ❌ Duplicates HTTP client logic
     - ❌ Not maintainable

  2. **Separate library wrapper**: Create external library for each provider
     - ❌ Fragments ecosystem
     - ❌ Each provider needs separate maintenance
     - ❌ Doesn't leverage existing ADK infrastructure

  3. **Use Python ADK instead**: Switch to Python for LiteLLM support
     - ❌ Not viable for Java-based projects
     - ❌ Team expertise is in Java

  4. **Wait for official provider support**: Request Google add each provider individually
     - ❌ Slow (requires coordination with each provider)
     - ❌ Doesn't scale to custom/internal endpoints

  **Why the proposed solution is better**: Single implementation supports all OpenAI-compatible providers, reuses existing
  infrastructure, and allows custom endpoints.
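
For comparison, alternative 1 (direct HTTP calls) looks roughly like the sketch below, using only the JDK's `java.net.http` API. The class name, the hand-rolled `jsonEscape` helper, and the 30-second timeout are assumptions for illustration; a real workaround would use a JSON library and would also need response parsing, error handling, and retries, which is the duplicated logic the proposal aims to avoid.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

/** Illustrative only: builds a Chat Completions request by hand. */
final class DirectChatCompletionsCall {

  /** Minimal escaping for this sketch; not a complete JSON string encoder. */
  static String jsonEscape(String s) {
    return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
  }

  /** Builds (but does not send) a single-turn, non-streaming request. */
  static HttpRequest buildRequest(
      String baseUrl, String apiKey, String model, String userMessage) {
    String body =
        "{\"model\":\"" + jsonEscape(model) + "\","
            + "\"messages\":[{\"role\":\"user\",\"content\":\""
            + jsonEscape(userMessage)
            + "\"}]}";
    // baseUrl is expected to end with "/", e.g. https://api.groq.com/openai/v1/
    return HttpRequest.newBuilder(URI.create(baseUrl).resolve("chat/completions"))
        .timeout(Duration.ofSeconds(30))
        .header("Content-Type", "application/json")
        .header("Authorization", "Bearer " + apiKey)
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();
  }
}
```

Sending the request would be `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, followed by manual parsing of the JSON response. None of this participates in `LlmRegistry` model resolution.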

  ### Proposed API / Implementation

  ```java
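  import com.google.adk.agents.LlmAgent;
  import com.google.adk.events.Event;
  import com.google.adk.models.OpenAiCompatibleLlm; // proposed class, does not exist yet
  import com.google.common.collect.ImmutableMap;

  // Example 1: Groq (fast inference)
  OpenAiCompatibleLlm groq = OpenAiCompatibleLlm.builder()
      .baseUrl("https://api.groq.com/openai/v1/")
      .headers(ImmutableMap.of("Authorization", "Bearer " + apiKey))
      .modelName("llama-3.3-70b-versatile")
      .timeoutMillis(30_000)
      .build();

  // Register pattern for model resolution
  groq.registerWithPattern("groq-.*");

  // Use with LlmAgent
  LlmAgent agent = LlmAgent.builder()
      .model("groq-llama-3.3-70b-versatile")
      .instruction("You are a helpful assistant.")
      .build();

  // runAsync emits a stream of events rather than a String; take the first one here.
  Event firstEvent = agent.runAsync(invocationContext).blockingFirst();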

  // Example 2: Ollama (local models)
  OpenAiCompatibleLlm ollama = OpenAiCompatibleLlm.builder()
      .baseUrl("http://localhost:11434/v1/")
      .headers(ImmutableMap.of())  // No auth for local
      .modelName("llama2")  // Provider-side model name, as in the Groq example
      .build();

  ollama.registerWithPattern("ollama-.*");

  // Example 3: Azure OpenAI (enterprise)
  OpenAiCompatibleLlm azure = OpenAiCompatibleLlm.builder()
      .baseUrl("https://<resource>.openai.azure.com/openai/deployments/<deployment>/")
      .headers(ImmutableMap.of("api-key", azureApiKey))
      .modelName("azure-gpt-4")
      .build();

  azure.registerWithPattern("azure-.*");
```
  **Implementation architecture:**
  - `OpenAiCompatibleLlm` extends `BaseLlm`
  - Wraps the existing `ChatCompletionsHttpClient` (no code duplication)
  - Uses `LlmRegistry.registerLlm(pattern, factory)` for pattern matching
  - Throws `UnsupportedOperationException` for live connections (OpenAI API limitation)
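
The pattern-matching step can be illustrated with a small standalone sketch. `PatternRegistrySketch` is a hypothetical stand-in, not ADK's `LlmRegistry`; it assumes patterns are Java regular expressions matched against the full model name and that the first registered match wins, which may differ from ADK's actual behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;

/** Hypothetical sketch of regex-based model resolution; not ADK's LlmRegistry. */
final class PatternRegistrySketch<T> {
  // A list keeps registration order, so the first matching pattern wins.
  private final List<Map.Entry<Pattern, Function<String, T>>> entries = new ArrayList<>();

  /** Associates a regex with a factory that builds a value for a matching model name. */
  void register(String regex, Function<String, T> factory) {
    entries.add(Map.entry(Pattern.compile(regex), factory));
  }

  /** Returns the value built by the first factory whose pattern matches the whole name. */
  T resolve(String modelName) {
    for (Map.Entry<Pattern, Function<String, T>> entry : entries) {
      if (entry.getKey().matcher(modelName).matches()) {
        return entry.getValue().apply(modelName);
      }
    }
    throw new IllegalArgumentException("No provider registered for model: " + modelName);
  }
}
```

With `register("groq-.*", ...)` and `register("ollama-.*", ...)`, a name such as `groq-llama-3.3-70b-versatile` reaches the Groq factory, while an unregistered name such as `gemini-2.0-flash` is rejected by this sketch. The factory is also a natural place to strip the provider prefix before the name is sent to the endpoint.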

  **Files to be added/modified:**
  - Modified: `README.md`
  - Added: `core/src/main/java/com/google/adk/models/OpenAiCompatibleLlm.java`
  - Added: `core/src/test/java/com/google/adk/models/OpenAiCompatibleLlmTest.java`
  - Added: `core/src/test/java/com/google/adk/models/OpenAiCompatibleLlmIntegrationTest.java`
  - Added: `core/src/test/java/com/google/adk/models/ManualGroqTest.java`

  ### Additional Context

  **Testing completed:**
  - ✅ Unit tests: 13 tests, all passing
  - ✅ Integration tests: Ollama-based tests against a real endpoint
  - ✅ Manual verification: Successfully tested with the Groq API (`llama-3.3-70b-versatile`)
    - Simple completion test
    - Registry integration test
    - Multi-turn conversation with context retention

  **Design decisions:**
  - **Non-streaming only**: Matches `ChatCompletionsHttpClient` capabilities (streaming can be added in a future PR)
  - **Pattern-based registration**: Allows multiple providers with different prefixes (e.g., `groq-.*`, `ollama-.*`, `azure-.*`)
  - **No live connections**: The OpenAI Chat Completions API doesn't support bidirectional live connections

  **Questions for maintainers:**
  1. Should we add streaming support in a follow-up PR, or include it now?
  2. Any concerns with the pattern-based registration approach?
  3. Should integration tests be tagged/skipped in CI (currently requires manual Ollama setup)?
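
On question 3, one possible approach is to skip the integration tests when no Ollama server is listening. The sketch below is a plain-JDK reachability probe; `EndpointProbe` and `isReachable` are hypothetical names, and wiring the result into the test framework is left as an assumption.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: checks whether something accepts TCP connections at host:port. */
final class EndpointProbe {

  /** Returns true if a TCP connection succeeds within the timeout, false otherwise. */
  static boolean isReachable(String host, int port, int timeoutMillis) {
    try (Socket socket = new Socket()) {
      socket.connect(new InetSocketAddress(host, port), timeoutMillis);
      return true;
    } catch (IOException e) {
      // Connection refused, timed out, or host unresolved: treat as unavailable.
      return false;
    }
  }
}
```

A test setup method could pass `EndpointProbe.isReachable("localhost", 11434, 500)` to JUnit's `assumeTrue`, so the Ollama tests are reported as skipped rather than failed when the server is absent. Note that this only confirms that a port is open, not that the expected model has been pulled.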

  **Related**: This provides functionality similar to Python ADK's LiteLLM integration, but as a native Java implementation that reuses
  existing ADK infrastructure.

  ---
  Implementation ready: I have a fully tested, working implementation and can submit a PR immediately upon approval of this approach.
