Skip to content

SOLR-18130: rename parameter "zkHost" to "solrCloud" in "solrj-streaming"#4320

Draft
vyatkinv wants to merge 3 commits intoapache:mainfrom
vyatkinv:SOLR-18130-solrj-streaming-solr-cloud-parameter
Draft

SOLR-18130: rename parameter "zkHost" to "solrCloud" in "solrj-streaming"#4320
vyatkinv wants to merge 3 commits intoapache:mainfrom
vyatkinv:SOLR-18130-solrj-streaming-solr-cloud-parameter

Conversation

@vyatkinv
Copy link
Copy Markdown
Contributor

@vyatkinv vyatkinv commented Apr 22, 2026

https://issues.apache.org/jira/browse/SOLR-18130

Description

This PR updates the solrj-streaming module by replacing all usages of zkHost with solrCloud to enable support for HTTP-based quorum configurations.

Solution

Parameters, fields and variables in solrj-streaming, named as zkHost renamed to solrCloud
For backward compatibility, specifying zkHost is still supported.
A shared method has been introduced in an abstract class to resolve the effective solrCloud value using a priority-based approach (e.g., explicit parameter → legacy zkHost → default Zookeeper host).

Tests

To Be Done

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide
  • I have added a changelog entry for my change

Planned follow-ups:

  • Perform more thorough testing of solrj-streaming with both Zookeeper-based and HTTP-based quorum configurations
  • Add unit tests for HTTP quorum support
  • Review and improve error messages and java-doc, where zookeeper still mentioned
  • Add documentation and changelog entry

Open Questions / Discussion Points

I would appreciate feedback on the following topics (also noted as TODOs in the code):

1. Mandatory solrCloud parameter

I made solrCloud required in all stream handlers (an error is thrown if it cannot be resolved from parameters or the default Zookeeper host).
This caused two tests to fail:

  • StreamExpressionTest#testCloudSolrStream

  • StreamExpressionTest#testStatsStream

These tests did not provide any valid host.
I fixed them by adding .withDefaultZkHost(cluster.getZkServer().getZkAddress())

Question:

  • Is this a test issue or a flaw in the implementation?

  • Can such a situation occur in real-world usage?

2. toExpression behavior and backward compatibility

Currently, toExpression reconstructs the expression and always includes solrCloud, even if the user originally provided the legacy zkHost parameter.
As a result, a user who runs an expression with zkHost and then calls explain() will see solrCloud in the output.

Question:

  • Should the original parameter name (zkHost) be preserved?
  • Or is it acceptable to normalize everything to solrCloud?

3. Potential issue with gather parameter
In:
org/apache/solr/client/solrj/io/graph/GatherNodesStream.java:392
there is the following line:
expression.addParameter(new StreamExpressionNamedParameter("gather", solrCloud));

Question:

  • Is this a bug?
  • Or is solrCloud (previously zkHost) actually intended to be passed as gather?

UPDATE: I looked into the context more deeply and yes, it looks like this is a defect that affects explain(). I fixed it as part of my pull request.

4. Inconsistent parameter handling
Some stream handlers use Map<String, String> for parameters, while others use ModifiableSolrParams.

Question:

Would it make sense to standardize on a single approach?

5. ScoreNodesStream limitation
ScoreNodesStream can only obtain zkHost / solrCloud via: factory.getDefaultZkHost();

Question:

  • Should this be improved as part of this PR?
  • Or would it be better to handle it in a separate issue?

@vyatkinv vyatkinv marked this pull request as draft April 22, 2026 11:15
@vyatkinv vyatkinv changed the title [WIP] SOLR-18130: rename parameter "zkHost" to "solrCloud" in "solrj-streaming" WIP: SOLR-18130: rename parameter "zkHost" to "solrCloud" in "solrj-streaming" Apr 22, 2026
@vyatkinv vyatkinv changed the title WIP: SOLR-18130: rename parameter "zkHost" to "solrCloud" in "solrj-streaming" SOLR-18130: rename parameter "zkHost" to "solrCloud" in "solrj-streaming" Apr 22, 2026
@vyatkinv vyatkinv force-pushed the SOLR-18130-solrj-streaming-solr-cloud-parameter branch from 27f4ae5 to c877f2a Compare April 22, 2026 13:36
@epugh
Copy link
Copy Markdown
Contributor

epugh commented Apr 22, 2026

Shouldn't solrCloud be connectionString? I was hoping down the road we could put more into the connection string like username password or other details that are needed....

@vyatkinv
Copy link
Copy Markdown
Contributor Author

vyatkinv commented Apr 23, 2026

Shouldn't solrCloud be connectionString? I was hoping down the road we could put more into the connection string like username password or other details that are needed....

@dsmiley previously suggested naming this parameter solrCloud in the jira ticket comments. However, it looks like there might be some disagreement here, so it would be good to reach a consensus before finalizing the name.

@epugh
Copy link
Copy Markdown
Contributor

epugh commented Apr 23, 2026

Shouldn't solrCloud be connectionString? I was hoping down the road we could put more into the connection string like username password or other details that are needed....

@dsmiley previously suggested naming this parameter solrCloud in the jira ticket comments. However, it looks like there might be some disagreement here, so it would be good to reach a consensus before finalizing the name.

Agreed... I believe the intent is to specify how you connect to your running Solr? In the first PR I believe it was connectionString which makes sense as "hey, I'm going to have to parse this thing, and it tells me how to connect". @dsmiley ?

Copy link
Copy Markdown
Contributor

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work here! Clearly this was a bit more involved than a rename ;-)

"connectionString" is somewhat long and it's ambiguous as to what we're even connecting to. But it's not bad. How about "solrCloudString"? If you don't like that, I'll capitulate and be satisfied with "connectionString".

Incorporating auth is out-of-scope but I'll say now that I'm suspicious that it makes sense to embed secret credentials into this. A conversation for another time.

I imagine there should be some ref-guide updates on this in addition to major-changes-in-solr-10.adoc.

I can see the potential of a follow-on issue to update CLIUtils.getZkHost (and thus CLI commands that call it) to instead resolve a SolrCloud connection string that is not necessarily ZK.

Answers to the PR questions:

  1. It's a test issue -- good catch. Glad you made resolving this mandatory. I wouldn't call this "mandatory SolrCloud parameter" since the param can be blank when it's used inside Solr, defaulting to the server's connection.
  2. Normalize; do not round-trip with "zkHost".
  3. IMO SolrParams should be used for "parameters" like this, not Map. Perhaps too much scope creep here but I leave that to you.
  4. Let's not add support for that to ScoreNodes now, albeit a single comment where we initialize it would be nice. Like "for now limited to the current cluster but we could expand support if needed".

I suggest that you change StreamFactory to not mention "ZK" whatsoever, instead using the name we choose (e.g. getDefaultCloudString). For backwards-compatibility, however, the older method names should exist as deprecated.

@@ -0,0 +1,7 @@
title: Parameter 'zkHost' renamed to 'solrCloud' in solrj-streaming. Parameter 'zkHost' is still supported.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Useful / fine but omits a bigger feature/point in this PR: this parameter now supports HTTP based SolrCloud connection strings (ZK access no longer required).

collectionName));
}
ModifiableSolrParams params =
getModifiableSolrParamsWithExclusions(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ModifiableSolrParams is a specific impl of SolrParams. I don't think this method name should contain the word "Modifiable"; it's a detail.


public ShortestPathStream(
String zkHost,
String solrCloud,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use that nice record you made previously for the connection string instead of String? To be used not just right here but really everywhere in this PR that needs to refer to the connection. It would be so nice to have a real type vs String which is used for anything and everything, defeating the typed nature of Java.
If not, fine.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

String can be replaced with record in private methods, but I'm not sure about public constructors. Even if they are not used within the project, they can still be used by consumers. We can declare them deprecated and introduce new ones with Record

Comment on lines +110 to 111
// we don't
// need to count it twice
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reflow newline

Comment on lines +105 to 106
// don't
// need to count it twice
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reflow newline

return shards;
}

public static String getSolrCloud(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I know I suggested calling the param "solrCloud" because in context to where a user types it (a streaming expression string), it seems a reasonable choice. But at a code level like here and all the var names, I find it... not so good. getCloudConnectionString would be better IMO and maybe used the record type you made.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW I love this refactoring overall; it cuts down on repeated code.

return solrCloud;
}

public static ModifiableSolrParams getModifiableSolrParamsWithExclusions(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest the name buildSolrParams or buildSolrParamsExcept.
"Modifiable" is overly specific to the SolrParams type.
"get" is overused in java... suggests we retrieve something when in fact we are creating something here.
Also IMO the varargs should be a Set that the caller can easily pass via Set.of

return mParams;
}

public static Map<String, String> getMapWithExclusions(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the above method uses MapSolrParams instead of ModifiableSolrParams, then we don't even need this version here since the (few?) callers using it can call MapSolrParams.getMap().

Copy link
Copy Markdown
Contributor Author

@vyatkinv vyatkinv Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ModifiableSolrParams.getMap() returns Map<String, String[]> but we needs Map<String, String>

That's why I asked if it was possible to convert all streams to ModifiableSolrParams :)

but I'll think about it some more, maybe I can use generics or a common interface to make a single universal method


private void init(
String collectionName, TupleStream tupleSource, String zkHost, int updateBatchSize) {
String collectionName, TupleStream tupleSource, String solrCloud, int updateBatchSize) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

param order

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's happening here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these same streams (gatherNodes and nodes) are already added two lines above, and I removed the duplicates

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants