Feat/mth ids paper reproduction #10

Open
limaraujo wants to merge 22 commits into Western-OC2-Lab:main from limaraujo:feat/mth-ids-paper-reproduction

@limaraujo

No description provided.

limaraujo and others added 22 commits May 18, 2026 11:28
- Added phase 2: Sampling with MiniBatchKMeans and Label Encoding.
- Added phase 3: Stratified train-test split on the sampled dataset.
- Added phase 4: Feature selection using Information Gain and FCBF.
- Added phase 5: SMOTE oversampling on the training set.
- Added phase 6: Training of supervised models (XGBoost, RF, DT, ET) with optional hyperparameter optimization and stacking.
- Added phase 7: Generation of anomaly datasets with binary labels.
- Added phase 8: Normalization, feature selection, and Kernel PCA for anomaly datasets.
- Added phase 9: SMOTE and clustering for anomaly detection using CL-k-means.
- Created a run_all script to orchestrate the entire pipeline.
- Added requirements.txt for necessary dependencies.
Co-authored-by: Copilot <copilot@github.com>
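Phase 4 ranks features by Information Gain before FCBF prunes redundant ones. As a rough illustration only, the sketch below computes IG(Y; X) = H(Y) - H(Y | X) for already-discretized feature columns in plain Python. The names `entropy`, `information_gain`, and `rank_features` are hypothetical and not taken from this PR, and continuous CAN features would need binning first (an assumption, since the commit does not say how the pipeline estimates IG).

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy H(Y), in bits, of a discrete label sequence."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """IG(Y; X) = H(Y) - H(Y | X) for a discrete (or pre-binned) feature X."""
    if len(feature) != len(labels):
        raise ValueError("feature and labels must have the same length")
    n = len(labels)
    # Group the labels by the value the feature takes on each row.
    groups: Dict[Hashable, List[Hashable]] = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    # H(Y | X): entropy of Y within each feature value, weighted by its frequency.
    h_cond = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(labels) - h_cond


def rank_features(
    columns: Dict[str, Sequence[Hashable]], labels: Sequence[Hashable]
) -> List[Tuple[str, float]]:
    """Return (column name, IG) pairs sorted from most to least informative."""
    scores = [(name, information_gain(col, labels)) for name, col in columns.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    y = ["normal", "normal", "dos", "dos"]
    cols = {"id_bucket": [0, 0, 1, 1], "noise": [0, 1, 0, 1]}
    print(rank_features(cols, y))  # → [('id_bucket', 1.0), ('noise', 0.0)]
```

A feature that perfectly separates the classes recovers the full label entropy (1 bit here), while one independent of the label scores zero.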
…atasets into a single .csv file. Note: download the datasets and extract them to the `/data` directory before running the script, and rename them by prepending "CAN_" to their names.
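The merge step described in this commit can be approximated with the standard library alone. This is a minimal sketch assuming every `CAN_*.csv` file in the data directory has a header row and that all headers match; the function name `merge_can_csvs` and the extra `source_file` provenance column are hypothetical additions, not something the commit states.

```python
import csv
import tempfile
from pathlib import Path


def merge_can_csvs(data_dir: str, out_path: str) -> int:
    """Concatenate every CAN_*.csv in data_dir into out_path; return data rows written.

    Assumes every input has a header row and that all headers are identical.
    """
    files = sorted(Path(data_dir).glob("CAN_*.csv"))
    if not files:
        raise FileNotFoundError(f"no CAN_*.csv files found in {data_dir}")
    header = None
    rows_written = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path in files:
            with open(path, newline="", encoding="utf-8") as f:
                reader = csv.reader(f)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # empty file: nothing to merge
                if header is None:
                    header = file_header
                    # Hypothetical provenance column: which source file a row came from.
                    writer.writerow(header + ["source_file"])
                elif file_header != header:
                    raise ValueError(f"header mismatch in {path.name}")
                for row in reader:
                    writer.writerow(row + [path.name])
                    rows_written += 1
    return rows_written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "data"
        data.mkdir()
        (data / "CAN_DoS.csv").write_text("ts,can_id,label\n1,0x10,DoS\n")
        (data / "CAN_Fuzzy.csv").write_text("ts,can_id,label\n2,0x11,Fuzzy\n")
        print(merge_can_csvs(str(data), str(Path(tmp) / "merged.csv")))  # → 2
```

The `CAN_` prefix acts as the file filter, which is why the renaming step in the note matters: files without it are silently ignored.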
…ity.

Modularize phases 1-2 and add core modules (clustering, biased B1/B2, validation).
Extend anomaly branch with BO-GP for k (phase 10), biased trees with auto B1 selection (phase 11), and leave-one-attack-out runner (phase 12).
Add experiment_runner, paper-protocol flags, and methodological reproduction docs.

Co-authored-by: Cursor <cursoragent@cursor.com>
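The leave-one-attack-out protocol (phase 12) keeps one attack class out of training entirely so that it behaves as a zero-day attack at evaluation time. A minimal plain-Python sketch follows, assuming rows are `(features, label)` pairs and that the benign class is labelled `"normal"`; both are assumptions, the generator name `loao_splits` is hypothetical, and the PR's actual runner may partition benign traffic differently.

```python
from typing import Iterator, List, Sequence, Tuple

# (feature vector, original multi-class label)
Row = Tuple[Sequence[float], str]


def loao_splits(
    train: List[Row], test: List[Row], benign: str = "normal"
) -> Iterator[Tuple[str, List[Row], List[Row]]]:
    """Yield (held_out_attack, train_fold, test_fold), one fold per attack label.

    The held-out attack is removed from the training rows only, so during
    evaluation it is an unseen ("zero-day") class. Binarizing labels into
    benign vs. attack is left to the caller.
    """
    attacks = sorted({label for _, label in train + test if label != benign})
    for attack in attacks:
        if not any(label == attack for _, label in test):
            continue  # no test rows for this attack, so no zero-day to evaluate
        train_fold = [row for row in train if row[1] != attack]
        yield attack, train_fold, list(test)


if __name__ == "__main__":
    train = [([0.1], "normal"), ([0.2], "DoS"), ([0.3], "Fuzzy")]
    test = [([0.4], "normal"), ([0.5], "DoS"), ([0.6], "Fuzzy")]
    for attack, tr, te in loao_splits(train, test):
        print(attack, len(tr), len(te))
```

Filtering only the training side keeps the test split identical across folds, so per-fold scores differ only in which attack the model has never seen.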
Supervised defaults follow MTH_IDS_IoTJ.ipynb: k-means sampling, IG+FCBF with re-split, SMOTE {2,4}->1000, HPO on hold-out, and XGBoost stacking. Skip orphan phase 3, fix Z-score for pandas string dtypes, and keep paper/LOAO options available via config flags.

Co-authored-by: Cursor <cursoragent@cursor.com>
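The `SMOTE {2,4}->1000` default reads as: oversample the label-encoded classes 2 and 4 until each has 1000 training rows (our interpretation of the shorthand). To show the interpolation SMOTE performs, here is a small plain-Python sketch. It is not the imbalanced-learn implementation the pipeline presumably relies on (an assumption, since `requirements.txt` is not shown here), and `smote_to_target`, `k`, and `seed` are hypothetical names.

```python
import random
from typing import Dict, List, Sequence, Tuple


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_to_target(
    X: List[List[float]],
    y: List[int],
    targets: Dict[int, int],
    k: int = 5,
    seed: int = 0,
) -> Tuple[List[List[float]], List[int]]:
    """Oversample each class in `targets` up to its target row count.

    Each synthetic row is base + gap * (neighbour - base), where base is a random
    sample of the class, neighbour is one of its k nearest same-class samples and
    gap is uniform in [0, 1). Original rows are kept unchanged and come first.
    """
    rng = random.Random(seed)
    X_new = [list(row) for row in X]
    y_new = list(y)
    for label, target in targets.items():
        members = [row for row, lab in zip(X, y) if lab == label]
        needed = target - len(members)
        if needed <= 0:
            continue  # already at or above target: SMOTE only adds rows
        if len(members) < 2:
            raise ValueError(f"class {label} needs at least 2 samples to interpolate")
        for _ in range(needed):
            i = rng.randrange(len(members))
            base = members[i]
            others = [m for j, m in enumerate(members) if j != i]
            # k nearest same-class neighbours of the base sample (itself excluded).
            neighbours = sorted(others, key=lambda m: _sq_dist(base, m))[:k]
            nn = rng.choice(neighbours)
            gap = rng.random()
            X_new.append([b + gap * (n - b) for b, n in zip(base, nn)])
            y_new.append(label)
    return X_new, y_new


if __name__ == "__main__":
    X = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9], [4.9, 5.1]]
    y = [0, 0, 2, 2, 2]
    # Grow class 2 to 10 rows (the commit's setting grows classes 2 and 4 to 1000).
    X_res, y_res = smote_to_target(X, y, {2: 10}, k=2)
    print(y_res.count(2))  # → 10
```

Because every synthetic point lies on a segment between two real samples of the same class, SMOTE is applied to the training split only, which matches the phase 5 description.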
…ements.

- Added validation checks for zero-day labels in `validate_loao_partition`.
- Updated `main` functions in multiple phases to include label counts and row statistics for training and testing datasets.
- Integrated logging of LOAO partition details across phases 10, 11, and 12 for better traceability.

Co-authored-by: Cursor <cursoragent@cursor.com>
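The PR's `validate_loao_partition` is not shown on this page. The sketch below is a hypothetical stand-in, named `check_loao_partition` to avoid implying it is the real function, illustrating the kind of zero-day checks and label/row statistics this commit describes. The benign label `"normal"` and the exact set of checks are assumptions.

```python
from collections import Counter
from typing import Dict, Sequence


def check_loao_partition(
    train_labels: Sequence[str],
    test_labels: Sequence[str],
    zero_day: str,
    benign: str = "normal",
) -> Dict[str, object]:
    """Raise ValueError if a LOAO fold is malformed; otherwise return its statistics."""
    train_counts = Counter(train_labels)
    test_counts = Counter(test_labels)
    if zero_day == benign:
        raise ValueError("the benign class cannot be held out as a zero-day attack")
    if train_counts[zero_day]:
        raise ValueError(
            f"zero-day label {zero_day!r} leaked into training ({train_counts[zero_day]} rows)"
        )
    if not test_counts[zero_day]:
        raise ValueError(f"zero-day label {zero_day!r} is missing from the test split")
    if not train_counts[benign]:
        raise ValueError("training split contains no benign rows")
    # Row totals and per-label counts, suitable for logging alongside each phase.
    return {
        "train_rows": len(train_labels),
        "test_rows": len(test_labels),
        "train_labels": dict(train_counts),
        "test_labels": dict(test_counts),
    }


if __name__ == "__main__":
    print(check_loao_partition(["normal", "DoS"], ["normal", "Fuzzy"], zero_day="Fuzzy"))
```

Failing fast on a leaked zero-day label is the important check: a single held-out row in training would silently turn the experiment into a seen-attack evaluation.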
…de files

- Added entries for Python bytecode files (*.py[cod], *.pyo) and cache directories (__pycache__, .pytest_cache/, .mypy_cache/).
- Retained existing data directory exclusion and removed specific __pycache__ entry for consistency.
Reorganize modules for clearer separation of ML logic, I/O, and phase scripts; add paper protocol support, refresh docs, and remove tracked bytecode.

Co-authored-by: Cursor <cursoragent@cursor.com>
…ture and usage instructions. Added commands for generating merged CSV profiles and clarified pipeline artifact organization.
…eline.

- Enhanced README with detailed dataset paths and commands for generating merged CSV profiles.
- Clarified logging structure for supervised runs, including the creation of `supervised_run.log` to capture phase execution details.
- Updated architecture and phase documentation to reflect changes in logging and phase descriptions.
- Adjusted minority label definitions to include ultra-rare attacks in the fine profile.

These changes improve clarity and usability for users working with the MTH-IDS pipeline.
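The README changes mention a `supervised_run.log` that captures phase execution details. One way to get that behaviour with Python's standard `logging` module is sketched below. The run-directory layout, the logger name, and the helpers `setup_run_logger`, `close_run_logger`, and `log_split_stats` are assumptions for illustration, not the pipeline's actual code; the last helper logs the per-split label counts and row totals that an earlier commit added to the phase `main` functions.

```python
import logging
import tempfile
from collections import Counter
from pathlib import Path
from typing import Sequence


def close_run_logger(logger: logging.Logger) -> None:
    """Detach and close every handler so the log file is flushed and released."""
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()


def setup_run_logger(run_dir: str, name: str = "mth_ids.supervised") -> logging.Logger:
    """Log to <run_dir>/supervised_run.log and to the console (stderr)."""
    Path(run_dir).mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    close_run_logger(logger)  # avoid duplicate handlers if called twice
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    file_handler = logging.FileHandler(Path(run_dir) / "supervised_run.log", encoding="utf-8")
    for handler in (file_handler, logging.StreamHandler()):
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger


def log_split_stats(logger: logging.Logger, split: str, labels: Sequence) -> None:
    """Record the row count and per-label counts of one data split."""
    counts = Counter(labels)
    logger.info("%s: %d rows, labels=%s", split, len(labels), dict(sorted(counts.items())))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = setup_run_logger(tmp)
        log_split_stats(log, "train", ["normal", "normal", "DoS"])
        close_run_logger(log)
        print((Path(tmp) / "supervised_run.log").read_text(encoding="utf-8"))
```

Writing to both a file and the console keeps interactive runs readable while leaving a persistent record per run directory.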
…functionality.

- Updated `GUIA_ARQUITETURA_MTH_IDS.md` to include new scripts for global anomaly detection and evaluation phases.
- Added `MERGED_VS_FINE_E_TABELAS.md` as a comprehensive guide for selecting label profiles and understanding table outputs.
- Introduced `run_global_anomaly.py` and `run_eval.py` scripts for executing global anomaly detection and full system evaluation.
- Revised existing documentation to clarify the roles of merged and fine profiles, including updates to the README and other related documents.

These changes improve the usability and clarity of the MTH-IDS pipeline, facilitating better understanding and execution of the anomaly detection processes.