Implementation of DBSCAN model#75
Implementation of DBSCAN model#75Kelang-Tian wants to merge 9 commits intosql-machine-learning:developfrom
Conversation
tests/test_dbscan.py
Outdated
| iris_target = iris.target | ||
|
|
||
|
|
||
| def check_model_exist(path): |
There was a problem hiding this comment.
It seems that this function is not used?
|
|
||
| install-requirements: | ||
| pip install -U -e . | ||
| pip install --default-timeout=1000 -U -e . |
There was a problem hiding this comment.
Why increasing the timeout threshold ?
There was a problem hiding this comment.
Timeout occurred while installing the packages because of network instability.
| import pandas as pd | ||
| from sklearn import datasets, metrics | ||
|
|
||
| def optimizer(): |
There was a problem hiding this comment.
It seems that it's not a NN model, should we remove this function?
There was a problem hiding this comment.
Yes, there is no need to set optimizer.
sqlflow_models/dbscan.py
Outdated
|
|
||
| return self | ||
|
|
||
| def _split_dataset(self, dataset): |
There was a problem hiding this comment.
Do we actually need this function?
There was a problem hiding this comment.
Replace this function with _read_Dataset_data, which split tf.dataset type data into features and labels(If it exists).
sqlflow_models/dbscan.py
Outdated
| cluster_labels[self.clusters[i][j]] = i | ||
| return cluster_labels | ||
|
|
||
| def fit(self, X): |
There was a problem hiding this comment.
Instead of overriding fit function, we can implement sqlflow_train_loop to define the custom train loop logic. You may find that sqlflow_train_loop is not the standard API of Keras model, it's just let SQLFlow runtime to know to run the custom train loop.
There was a problem hiding this comment.
Enables sqlflow_train_loop function to handle tf.dataset type of data, because sqlflow server invokes tf.dataset.
https://github.com/sql-machine-learning/sqlflow/blob/483b8676cf93f373d5073d84b0bee311bb122012/python/runtime/tensorflow/input_fn.py#L72
sqlflow_models/dbscan.py
Outdated
| purity_score(y_df, self.labels_)) | ||
| ''' | ||
| if __name__ == '__main__': | ||
| from sklearn.datasets.samples_generator import make_blobs |
There was a problem hiding this comment.
Can we remove the comment code and testing this model in the test_db_scan.py? A referenced test script: https://github.com/sql-machine-learning/models/blob/develop/tests/test_arima_with_stl_decomposition.py
There was a problem hiding this comment.
Comment code removed and rename the test_db_scan.py.
Implementation of the clustering model.
The test process will be submitted next time.