Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds a new dist_matrix classmethod to the ECTResult class for computing distance matrices between multiple ECT results, and updates the tutorial to demonstrate its usage. The PR also changes the default distance metric from "cityblock" to "frobenius" in the dist method and exports ECTResult in the package's __init__.py.
Changes:
- Added
dist_matrixclassmethod to compute pairwise distance matrices between ECT results - Changed default distance metric from "cityblock" to "frobenius" in the
distmethod - Updated tutorial (example_matisse.ipynb) to demonstrate the new
dist_matrixfunctionality - Exported
ECTResultin__init__.pyfor easier imports - Incremented version from 1.2.3 to 1.2.4
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/ect/results.py | Added dist_matrix classmethod, changed default metric to frobenius, implemented frobenius distance computation |
| tests/test_ect_result.py | Updated tests to reflect the default metric change from cityblock to frobenius |
| src/ect/init.py | Exported ECTResult class |
| pyproject.toml | Incremented version from 1.2.3 to 1.2.4 |
Comments suppressed due to low confidence (4)
src/ect/results.py:402
- The implementation of the Frobenius distance for the dist_matrix method is inefficient. For each row i, it calls results[i].dist(results, metric="frobenius"), which computes distances to all results including itself. This results in O(n²) distance computations even though a distance matrix is symmetric. A more efficient approach would be to compute only the upper triangular portion and mirror it, or use vectorized operations on the stacked data similar to how the string metrics are handled with pdist.
if isinstance(metric, str) and metric.lower() in ("frobenius", "fro"):
return np.vstack([results[i].dist(results, metric="frobenius") for i in range(len(results))])
src/ect/results.py:411
- The new dist_matrix classmethod lacks test coverage. No tests have been added to verify its functionality, including edge cases such as empty input, single element lists, shape mismatches, different metrics (frobenius, cityblock, euclidean, custom callables), and that it produces a proper symmetric distance matrix with zeros on the diagonal. Given that other methods in this file have comprehensive test coverage in test_ect_result.py, this new method should follow the same pattern.
@classmethod
def dist_matrix(
cls,
results: List["ECTResult"],
metric: Union[str, Callable] = "frobenius",
**kwargs,
) -> np.ndarray:
if not results:
return np.empty((0, 0), dtype=np.float64)
shape0 = results[0].shape
for i, r in enumerate(results):
if r.shape != shape0:
raise ValueError(f"Shape mismatch at index {i}: {shape0} vs {r.shape}")
if isinstance(metric, str) and metric.lower() in ("frobenius", "fro"):
return np.vstack([results[i].dist(results, metric="frobenius") for i in range(len(results))])
if isinstance(metric, str):
X = np.stack([np.asarray(r, dtype=np.float64).ravel() for r in results], axis=0)
try:
return squareform(pdist(X, metric=metric, **kwargs))
except TypeError:
return cdist(X, X, metric=metric, **kwargs)
return np.vstack([results[i].dist(results, metric=metric, **kwargs) for i in range(len(results))])
src/ect/results.py:411
- The Frobenius distance implementation for callable metrics (line 411) may have inconsistent behavior. When the metric is a callable, it calls results[i].dist(results, metric=metric, **kwargs) for each i, which will pass through to the cdist code path that uses raveled 1D arrays. However, for Frobenius distance specifically, you're computing the matrix norm on 2D arrays (lines 368-375). This means custom callable distance functions will operate on flattened data while Frobenius operates on the full matrix structure. Consider whether this is the intended behavior or if there should be a note in the documentation about this distinction.
return np.vstack([results[i].dist(results, metric=metric, **kwargs) for i in range(len(results))])
src/ect/results.py:411
- The dist_matrix method is missing a docstring. This is a public classmethod that should have comprehensive documentation explaining its purpose, parameters, return value, and usage examples. The documentation should follow the same style as the dist method above, including details about the metric parameter, **kwargs, error conditions, and example usage.
@classmethod
def dist_matrix(
cls,
results: List["ECTResult"],
metric: Union[str, Callable] = "frobenius",
**kwargs,
) -> np.ndarray:
if not results:
return np.empty((0, 0), dtype=np.float64)
shape0 = results[0].shape
for i, r in enumerate(results):
if r.shape != shape0:
raise ValueError(f"Shape mismatch at index {i}: {shape0} vs {r.shape}")
if isinstance(metric, str) and metric.lower() in ("frobenius", "fro"):
return np.vstack([results[i].dist(results, metric="frobenius") for i in range(len(results))])
if isinstance(metric, str):
X = np.stack([np.asarray(r, dtype=np.float64).ravel() for r in results], axis=0)
try:
return squareform(pdist(X, metric=metric, **kwargs))
except TypeError:
return cdist(X, X, metric=metric, **kwargs)
return np.vstack([results[i].dist(results, metric=metric, **kwargs) for i in range(len(results))])
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| self, | ||
| other: Union["ECTResult", List["ECTResult"]], | ||
| metric: Union[str, Callable] = "cityblock", | ||
| metric: Union[str, Callable] = "frobenius", |
There was a problem hiding this comment.
Changing the default metric from "cityblock" to "frobenius" is a breaking change that will alter results for any existing code that calls the dist method without explicitly specifying a metric. This could break existing workflows, analyses, or saved results that depend on the previous default behavior. Consider whether this breaking change is documented in release notes or whether backward compatibility should be maintained (e.g., by keeping cityblock as default or adding a deprecation warning).
| metric: Union[str, Callable] = "frobenius", | |
| metric: Union[str, Callable] = "cityblock", |
Pull Request
Summary
add dist matrix function and update tutorial
Description
uses previous dist command to automatically create a distance matrix
Motivation and Context
protein stuff!
How has this been tested?
ran make tests!
Types of changes
Checklist
make clean)pyproject.tomlfile (if applicable).make tests)pyproject.toml.Additional Notes
Screenshots (if applicable)