Feat/mth ids paper reproduction #10
Open
limaraujo wants to merge 22 commits into
- Added phase 2: Sampling with MiniBatchKMeans and Label Encoding.
- Added phase 3: Stratified train-test split on the sampled dataset.
- Added phase 4: Feature selection using Information Gain and FCBF.
- Added phase 5: SMOTE oversampling on the training set.
- Added phase 6: Training of supervised models (XGBoost, RF, DT, ET) with optional hyperparameter optimization and stacking.
- Added phase 7: Generation of anomaly datasets with binary labels.
- Added phase 8: Normalization, feature selection, and Kernel PCA for anomaly datasets.
- Added phase 9: SMOTE and clustering for anomaly detection using CL-k-means.
- Created a run_all script to orchestrate the entire pipeline.
- Added requirements.txt for necessary dependencies.
Co-authored-by: Copilot <copilot@github.com>
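Phases 2 and 3 in the commit message above (cluster-based sampling, label encoding, stratified split) can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the code from this PR: the helper name `sample_by_cluster`, the class names, the cluster count, and the sampling fraction are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder


def sample_by_cluster(X, y, n_clusters=10, frac=0.1, seed=0):
    """Hypothetical helper: keep roughly `frac` of the rows of every cluster."""
    rng = np.random.default_rng(seed)
    clusters = MiniBatchKMeans(
        n_clusters=n_clusters, n_init=3, random_state=seed
    ).fit_predict(X)
    keep = []
    for c in np.unique(clusters):
        idx = np.flatnonzero(clusters == c)
        # Always keep at least one row so no cluster disappears entirely.
        n_keep = max(1, int(round(frac * idx.size)))
        keep.append(rng.choice(idx, size=n_keep, replace=False))
    keep = np.sort(np.concatenate(keep))
    return X[keep], y[keep]


# Synthetic stand-in data (assumption): 1000 rows, 5 features, 3 string labels.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))
labels = rng.choice(["Normal", "DoS", "Fuzzy"], size=1000, p=[0.7, 0.2, 0.1])

# Label encoding: string class names become integer codes.
y = LabelEncoder().fit_transform(labels)

# Cluster-based sampling, then a stratified split of the sampled rows.
X_s, y_s = sample_by_cluster(X, y, n_clusters=10, frac=0.2)
X_train, X_test, y_train, y_test = train_test_split(
    X_s, y_s, test_size=0.3, stratify=y_s, random_state=0
)
```

Sampling per cluster rather than uniformly is meant to keep the reduced dataset representative of the feature space, and `stratify=y_s` keeps the class proportions similar in both halves of the split.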
…s, fixed run_all.py
…atasets into a single .csv file. Note: download the datasets and extract them to `/data` directory before running the script, and rename them by prepending "CAN_" in their names.
…ity. Modularize phases 1-2 and add core modules (clustering, biased B1/B2, validation). Extend anomaly branch with BO-GP for k (phase 10), biased trees with auto B1 selection (phase 11), and leave-one-attack-out runner (phase 12). Add experiment_runner, paper-protocol flags, and methodological reproduction docs.
Co-authored-by: Cursor <cursoragent@cursor.com>
Supervised defaults follow MTH_IDS_IoTJ.ipynb: k-means sampling, IG+FCBF with re-split, SMOTE {2,4}->1000, HPO on hold-out, and XGBoost stacking. Skip orphan phase 3, fix Z-score for pandas string dtypes, and keep paper/LOAO options available via config flags.
Co-authored-by: Cursor <cursoragent@cursor.com>
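The commit above mentions fixing Z-score normalization for pandas string dtypes without showing the change. One possible guard, sketched here under the assumption that the failure came from non-numeric columns reaching the arithmetic, is to scale only the columns pandas reports as numeric. The function name `zscore_numeric` and the column names are hypothetical.

```python
import pandas as pd


def zscore_numeric(df, label_col="Label"):
    """Hypothetical sketch: Z-score numeric feature columns, leave the rest alone."""
    out = df.copy()
    num_cols = [
        c for c in out.columns
        if c != label_col and pd.api.types.is_numeric_dtype(out[c])
    ]
    values = out[num_cols].astype("float64")
    mean = values.mean()
    # Population std; constant columns get std 1 to avoid division by zero.
    std = values.std(ddof=0).replace(0, 1.0)
    out[num_cols] = (values - mean) / std
    return out


# Toy frame (assumption): two numeric features, one string column, one label.
df = pd.DataFrame(
    {
        "a": [1, 2, 3, 4],
        "const": [5, 5, 5, 5],
        "proto": pd.array(["tcp", "udp", "tcp", "udp"], dtype="string"),
        "Label": [0, 1, 0, 1],
    }
)
scaled = zscore_numeric(df)
```

The string column and the label pass through unchanged, so any later encoding step still sees the original values.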
…ements.
- Added validation checks for zero-day labels in `validate_loao_partition`.
- Updated `main` functions in multiple phases to include label counts and row statistics for training and testing datasets.
- Integrated logging of LOAO partition details across phases 10, 11, and 12 for better traceability.
Co-authored-by: Cursor <cursoragent@cursor.com>
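The kind of check a leave-one-attack-out (LOAO) partition needs can be illustrated as below. This is a hedged sketch and not the PR's `validate_loao_partition`: the function name `check_loao_partition`, its signature, and the label names are assumptions. The idea is that the held-out ("zero-day") attack must be absent from training and present in testing.

```python
from collections import Counter


def check_loao_partition(train_labels, test_labels, held_out):
    """Hypothetical check: the held-out attack may appear only in the test set."""
    train_counts = Counter(train_labels)
    test_counts = Counter(test_labels)
    if train_counts[held_out] > 0:
        raise ValueError(
            f"{held_out!r} leaked into training ({train_counts[held_out]} rows)"
        )
    if test_counts[held_out] == 0:
        raise ValueError(f"{held_out!r} is missing from the test set")
    # Label counts per split, in the spirit of the statistics the commit logs.
    return {"train": dict(train_counts), "test": dict(test_counts)}


# Made-up labels for illustration only.
train = ["Normal", "Normal", "DoS", "DoS", "Normal"]
test = ["Normal", "Fuzzy", "Fuzzy", "DoS"]
stats = check_loao_partition(train, test, held_out="Fuzzy")
```

Returning the per-split counts makes it straightforward to log them alongside the run, so a leaked or missing attack class is visible in the output rather than silently skewing the results.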
…de files
- Added entries for Python bytecode files (*.py[cod], *.pyo) and cache directories (__pycache__, .pytest_cache/, .mypy_cache/).
- Retained existing data directory exclusion and removed specific __pycache__ entry for consistency.
Reorganize modules for clearer separation of ML logic, I/O, and phase scripts; add paper protocol support, refresh docs, and remove tracked bytecode.
Co-authored-by: Cursor <cursoragent@cursor.com>
…ture and usage instructions. Added commands for generating merged CSV profiles and clarified pipeline artifact organization.
…eline.
- Enhanced README with detailed dataset paths and commands for generating merged CSV profiles.
- Clarified logging structure for supervised runs, including the creation of `supervised_run.log` to capture phase execution details.
- Updated architecture and phase documentation to reflect changes in logging and phase descriptions.
- Adjusted minority label definitions to include ultra-rare attacks in the fine profile.
These changes improve clarity and usability for users working with the MTH-IDS pipeline.
…functionality.
- Updated `GUIA_ARQUITETURA_MTH_IDS.md` to include new scripts for global anomaly detection and evaluation phases.
- Added `MERGED_VS_FINE_E_TABELAS.md` as a comprehensive guide for selecting label profiles and understanding table outputs.
- Introduced `run_global_anomaly.py` and `run_eval.py` scripts for executing global anomaly detection and full system evaluation.
- Revised existing documentation to clarify the roles of merged and fine profiles, including updates to the README and other related documents.
These changes improve the usability and clarity of the MTH-IDS pipeline, facilitating better understanding and execution of the anomaly detection processes.
No description provided.