SOLR-18130: rename parameter "zkHost" to "solrCloud" in "solrj-streaming"#4320
SOLR-18130: rename parameter "zkHost" to "solrCloud" in "solrj-streaming"#4320vyatkinv wants to merge 3 commits intoapache:mainfrom
Conversation
27f4ae5 to
c877f2a
Compare
|
Shouldn't |
@dsmiley previously suggested naming this parameter |
Agreed... I believe the intent is to specify how you connect to your running Solr? In the first PR I believe it was |
dsmiley
left a comment
There was a problem hiding this comment.
Great work here! Clearly this was a bit more involved than a rename ;-)
"connectionString" is somewhat long and it's ambiguous as to what we're even connecting to. But it's not bad. How about "solrCloudString"? If you don't like that, I'll capitulate and be satisfied with "connectionString".
Incorporating auth is out-of-scope but I'll say now that I'm suspicious that it makes sense to embed secret credentials into this. A conversation for another time.
I imagine there should be some ref-guide updates on this in addition to major-changes-in-solr-10.adoc.
I can see the potential of a follow-on issue to update CLIUtils.getZkHost (and thus CLI commands that call it) to instead resolve a SolrCloud connection string that is not necessarily ZK.
Answers to the PR questions:
- It's a test issue -- good catch. Glad you made resolving this mandatory. I wouldn't call this "mandatory SolrCloud parameter" since the param can be blank when it's used inside Solr, defaulting to the server's connection.
- Normalize; do not round-trip with "zkHost".
- IMO SolrParams should be used for "parameters" like this, not Map. Perhaps too much scope creep here but I leave that to you.
- Let's not add support for that to ScoreNodes now, albeit a single comment where we initialize it would be nice. Like "for now limited to the current cluster but we could expand support if needed".
I suggest that you change StreamFactory to not mention "ZK" whatsoever, instead using the name we choose (e.g. getDefaultCloudString). For backwards-compatibility, however, the older method names should exist as deprecated.
| @@ -0,0 +1,7 @@ | |||
| title: Parameter 'zkHost' renamed to 'solrCloud' in solrj-streaming. Parameter 'zkHost' is still supported. | |||
There was a problem hiding this comment.
Useful / fine but omits a bigger feature/point in this PR: this parameter now supports HTTP based SolrCloud connection strings (ZK access no longer required).
| collectionName)); | ||
| } | ||
| ModifiableSolrParams params = | ||
| getModifiableSolrParamsWithExclusions( |
There was a problem hiding this comment.
ModifiableSolrParams is a specific impl of SolrParams. I don't think this method name should contain the word "Modifiable"; it's a detail.
|
|
||
| public ShortestPathStream( | ||
| String zkHost, | ||
| String solrCloud, |
There was a problem hiding this comment.
Can we use that nice record you made previously for the connection string instead of String? To be used not just right here but really everywhere in this PR that needs to refer to the connection. It would be so nice to have a real type vs String which is used for anything and everything, defeating the typed nature of Java.
If not, fine.
There was a problem hiding this comment.
String can be replaced with record in private methods, but I'm not sure about public constructors. Even if they are not used within the project, they can still be used by consumers. We can declare them deprecated and introduce new ones with Record
| // we don't | ||
| // need to count it twice |
| // don't | ||
| // need to count it twice |
| return shards; | ||
| } | ||
|
|
||
| public static String getSolrCloud( |
There was a problem hiding this comment.
So I know I suggested calling the param "solrCloud" because in context to where a user types it (a streaming expression string), it seems a reasonable choice. But at a code level like here and all the var names, I find it... not so good. getCloudConnectionString would be better IMO and maybe used the record type you made.
There was a problem hiding this comment.
BTW I love this refactoring overall; it cuts down on repeated code.
| return solrCloud; | ||
| } | ||
|
|
||
| public static ModifiableSolrParams getModifiableSolrParamsWithExclusions( |
There was a problem hiding this comment.
I suggest the name buildSolrParams or buildSolrParamsExcept.
"Modifiable" is overly specific to the SolrParams type.
"get" is overused in java... suggests we retrieve something when in fact we are creating something here.
Also IMO the varargs should be a Set that the caller can easily pass via Set.of
| return mParams; | ||
| } | ||
|
|
||
| public static Map<String, String> getMapWithExclusions( |
There was a problem hiding this comment.
if the above method uses MapSolrParams instead of ModifiableSolrParams, then we don't even need this version here since the (few?) callers using it can call MapSolrParams.getMap().
There was a problem hiding this comment.
ModifiableSolrParams.getMap() returns Map<String, String[]> but we needs Map<String, String>
That's why I asked if it was possible to convert all streams to ModifiableSolrParams :)
but I'll think about it some more, maybe I can use generics or a common interface to make a single universal method
|
|
||
| private void init( | ||
| String collectionName, TupleStream tupleSource, String zkHost, int updateBatchSize) { | ||
| String collectionName, TupleStream tupleSource, String solrCloud, int updateBatchSize) { |
There was a problem hiding this comment.
these same streams (gatherNodes and nodes) are already added two lines above, and I removed the duplicates
https://issues.apache.org/jira/browse/SOLR-18130
Description
This PR updates the solrj-streaming module by replacing all usages of zkHost with solrCloud to enable support for HTTP-based quorum configurations.
Solution
Parameters, fields and variables in solrj-streaming, named as
zkHostrenamed tosolrCloudFor backward compatibility, specifying zkHost is still supported.
A shared method has been introduced in an abstract class to resolve the effective solrCloud value using a priority-based approach (e.g., explicit parameter → legacy zkHost → default Zookeeper host).
Tests
To Be Done
Checklist
Please review the following and check all that apply:
mainbranch../gradlew check.Planned follow-ups:
Open Questions / Discussion Points
I would appreciate feedback on the following topics (also noted as TODOs in the code):
1. Mandatory solrCloud parameter
I made solrCloud required in all stream handlers (an error is thrown if it cannot be resolved from parameters or the default Zookeeper host).
This caused two tests to fail:
StreamExpressionTest#testCloudSolrStreamStreamExpressionTest#testStatsStreamThese tests did not provide any valid host.
I fixed them by adding
.withDefaultZkHost(cluster.getZkServer().getZkAddress())Question:
Is this a test issue or a flaw in the implementation?
Can such a situation occur in real-world usage?
2. toExpression behavior and backward compatibility
Currently, toExpression reconstructs the expression and always includes solrCloud, even if the user originally provided the legacy zkHost parameter.
As a result, a user who runs an expression with zkHost and then calls explain() will see solrCloud in the output.
Question:
3. Potential issue with gather parameter
In:
org/apache/solr/client/solrj/io/graph/GatherNodesStream.java:392there is the following line:
expression.addParameter(new StreamExpressionNamedParameter("gather", solrCloud));Question:
UPDATE: I looked into the context more deeply and yes, it looks like this is a defect that affects explain(). I fixed it as part of my pull request.
4. Inconsistent parameter handling
Some stream handlers use Map<String, String> for parameters, while others use ModifiableSolrParams.
Question:
Would it make sense to standardize on a single approach?
5. ScoreNodesStream limitation
ScoreNodesStreamcan only obtain zkHost / solrCloud via:factory.getDefaultZkHost();Question: