Host equality/hash depend on mutable endpoint #867

@dkropachev

Description

Problem

Host.__eq__ and Host.__hash__ currently use Host.endpoint as identity:

  • Host == Host compares endpoints.
  • Host == "address" is supported for backward compatibility.
  • hash(Host) delegates to hash(endpoint).

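This is unsafe because Host.endpoint can be mutated during topology refresh or IP-change handling. Once a Host is stored in a dict or set, changing its endpoint changes its hash, and the container may no longer find the entry. It also violates Python's hash/equality contract for the string-compatibility path: host == "127.0.0.1" can be true while hash(host) != hash("127.0.0.1").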

Relevant code:

  • cassandra/pool.py: Host.__eq__, Host.__hash__, Host.__lt__
  • cassandra/cluster.py: control connection mutates host.endpoint when a node keeps the same host_id but changes endpoint
  • cassandra/cluster.py: Session._pools is keyed by Host
  • cassandra/policies.py: load-balancing policies store hosts in frozenset/tuples and use equality for dedup/removal

Impact

Internal structures can lose or duplicate entries after endpoint changes:

  • Session._pools lookup/removal can fail after endpoint mutation.
  • Query execution can report "Host has been marked down or removed" even though the host object still exists.
  • Load-balancing policy host sets can fail to remove/re-add the intended host.
  • Token-aware routing and replica filtering use Host equality for membership checks.
  • Metrics/Insights consume pool state keyed by Host.
  • Custom load-balancing policies may rely on legacy Host == address behavior.

Reproduction

import uuid
from cassandra.pool import Host
from cassandra.policies import SimpleConvictionPolicy
from cassandra.connection import DefaultEndPoint

hosts = [
    Host(DefaultEndPoint("127.0.0.%d" % i), SimpleConvictionPolicy, host_id=uuid.uuid4())
    for i in range(64)
]
target = hosts[32]
pools = {host: "pool" for host in hosts}

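# Simulate an IP change for the same node; this also changes hash(target).
target.endpoint = DefaultEndPoint("10.0.0.250")

# The same object can no longer be found under its own key.
assert pools.get(target) is None
assert target not in pools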

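String compatibility also violates the hash contract (this snippet reuses the imports above):

host = Host("127.0.0.1", SimpleConvictionPolicy, host_id=uuid.uuid4())

# Equality holds in both directions...
assert host == "127.0.0.1"
assert "127.0.0.1" == host
# ...but the hashes differ, so hashed containers treat the two as distinct.
assert host not in {"127.0.0.1"}
assert "127.0.0.1" not in {host}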

Proposed Direction

Define stable Host identity before removing topology mutability:

  • Use host_id as Host identity for __eq__ and __hash__, or use object identity internally and expose explicit host-id comparison.
  • Stop using endpoint/address equality for Host.__eq__; replace with explicit checks such as host.address == addr or metadata.get_host(addr).
  • Keep endpoint lookup in metadata via _host_id_by_endpoint, but update it through one controlled topology-update path.
  • Keep health mutable (is_up, conviction policy, reconnection handler) while making topology fields replaceable/immutable.
  • Revisit __lt__; endpoint-only ordering is inconsistent if equality becomes host-id based.
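
A minimal sketch of the first option (host_id as identity), to make the intended semantics concrete. `StableHost` and `has_address` are hypothetical names used only for illustration; this is not the driver's `Host` class, and it assumes host_id never changes for a given node.

```python
import uuid


class StableHost:
    """Hypothetical stand-in for cassandra.pool.Host; not the driver's class."""

    def __init__(self, address, host_id):
        self.address = address   # mutable: may change on topology refresh
        self.host_id = host_id   # assumed stable for the lifetime of the node

    def __eq__(self, other):
        # Only another host with the same host_id is equal; strings never are.
        if isinstance(other, StableHost):
            return self.host_id == other.host_id
        return NotImplemented

    def __hash__(self):
        # The hash depends only on the stable identifier.
        return hash(self.host_id)

    def __lt__(self, other):
        # Ordering uses the same key as equality, so the two stay consistent.
        if isinstance(other, StableHost):
            return self.host_id < other.host_id
        return NotImplemented

    def has_address(self, address):
        # Explicit replacement for the legacy `host == "address"` comparison.
        return self.address == address


host = StableHost("127.0.0.1", uuid.uuid4())
pools = {host: "pool"}
live_hosts = frozenset([host])
hash_before = hash(host)

host.address = "10.0.0.250"  # simulate an IP change for the same node

assert hash(host) == hash_before
assert pools[host] == "pool"
assert live_hosts.difference((host,)) == frozenset()
assert host != "10.0.0.250" and host.has_address("10.0.0.250")
```

With this shape, address comparison becomes an explicit call rather than a side effect of `==`, which is what keeps the hash/equality contract intact.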

Compatibility Risk

The proposed change would break tests, and possibly user code, that expect:

  • two Host instances with the same endpoint but different host_id to be equal;
  • Host == "address" to be true;
  • sets of Host to deduplicate by endpoint.

This likely needs a major-version change or a deprecation phase.

Acceptance Criteria

  • hash(host) remains stable across endpoint/topology updates.
  • Equal objects have equal hashes.
  • Session pool lookup/removal works after IP change.
  • Load-balancing policies remove/re-add the intended host after endpoint or dc/rack changes.
  • Tests are updated for new identity semantics.
  • Generated C artifacts are updated for cassandra/pool.py.
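
The first three criteria could be checked with a small reusable helper along these lines. This is only a sketch: `identity_violations` and `IdKeyedHost` are hypothetical names, and the helper assumes the object under test exposes a writable `endpoint` attribute.

```python
import uuid


def identity_violations(host, new_endpoint):
    """Return which identity criteria `host` breaks across an endpoint change.

    `host` is any hashable object with a writable `endpoint` attribute.
    The host is left with `new_endpoint` assigned.
    """
    hash_before = hash(host)
    pools = {host: "pool"}   # stands in for Session._pools
    members = {host}         # stands in for a policy's host set

    host.endpoint = new_endpoint  # simulate the topology update

    violations = []
    if hash(host) != hash_before:
        violations.append("hash changed")
    if pools.get(host) != "pool":
        violations.append("dict lookup failed")
    if host not in members:
        violations.append("set membership failed")
    return violations


class IdKeyedHost:
    """Hypothetical host whose identity is its host_id."""

    def __init__(self, endpoint, host_id):
        self.endpoint = endpoint
        self.host_id = host_id

    def __eq__(self, other):
        if isinstance(other, IdKeyedHost):
            return self.host_id == other.host_id
        return NotImplemented

    def __hash__(self):
        return hash(self.host_id)


assert identity_violations(IdKeyedHost("127.0.0.1", uuid.uuid4()), "10.0.0.250") == []
```

A real test suite would run the same check against the driver's `Host` once its identity semantics are changed.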
