# Assignment 4: Discussion (Jonathan Marien)

## What challenges came up when selecting features?

- Rate-based fields (e.g., Flow Bytes/s) contain Infinity/-Infinity values, which required careful imputation or filtering.
- High collinearity across derived counters (forward/backward bytes/packets, rates) made feature rankings unstable across selection methods.
- Class imbalance between BENIGN and specific attacks (e.g., PortScan) required stratified sampling to preserve the minority-class signal.
- Some features have narrow ranges or sentinel values (e.g., -1) that complicate chi-square scoring; mutual information was more robust.

## Would your features work well at a place like Bell or CSE?

- Yes. The top features (ports, byte/packet rates, flow duration, flag counts) align with the scalable, interpretable telemetry used in large SOCs.
- They provide quick triage cues for reconnaissance (scans), brute-force/abuse (abnormal ports), and volumetric anomalies (bytes/s).
- Feature simplicity supports explainability in escalations and runbooks.

## How would adversaries try to bypass your feature-based model?

- Low-and-slow tactics that stay near baseline thresholds (e.g., spreading scans over hours to reduce rate spikes).
- Using common service ports (80/443/53) to blend in and evade port-based heuristics.
- Tuning packet sizes and timings to mimic benign distributions.
- Encrypted tunnels that obfuscate payloads while keeping flow metadata similar to benign traffic.

## Notes on Week 10 emphasis

- Prioritize interpretable features and lean models to reduce alert fatigue.
- Report macro-averaged metrics for multi-class settings and include confusion matrices to visualize error trade-offs.
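
To make the macro-averaging note concrete, here is a minimal sketch in plain Python. It is not the code used for the assignment: the toy labels, the `OTHER_ATTACK` class name, and the helper functions `confusion_matrix`, `per_class_scores`, and `macro_average` are all made up for illustration. It builds a confusion matrix (rows are true classes, columns are predicted classes) and averages precision, recall, and F1 with equal weight per class, so a rare attack class counts as much as BENIGN.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count predictions; rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def per_class_scores(matrix):
    """Return a (precision, recall, f1) tuple for each class in matrix order."""
    n = len(matrix)
    scores = []
    for k in range(n):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                       # missed instances of class k
        fp = sum(matrix[r][k] for r in range(n)) - tp  # other classes predicted as k
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores.append((precision, recall, f1))
    return scores


def macro_average(scores):
    """Unweighted mean over classes: every class counts equally."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))


# Toy, made-up labels (assumption): an imbalanced three-class problem.
labels = ["BENIGN", "PortScan", "OTHER_ATTACK"]
y_true = ["BENIGN"] * 6 + ["PortScan"] * 2 + ["OTHER_ATTACK"] * 2
y_pred = (["BENIGN"] * 5 + ["PortScan"]      # one benign flow flagged as a scan
          + ["PortScan", "BENIGN"]           # one scan missed
          + ["OTHER_ATTACK", "BENIGN"])      # one other attack missed

matrix = confusion_matrix(y_true, y_pred, labels)
macro_p, macro_r, macro_f1 = macro_average(per_class_scores(matrix))
accuracy = sum(matrix[k][k] for k in range(len(labels))) / len(y_true)

for label, row in zip(labels, matrix):
    print(f"{label:>12}: {row}")
print(f"accuracy={accuracy:.3f}  macro_recall={macro_r:.3f}  macro_f1={macro_f1:.3f}")
```

On this toy data, accuracy is 0.700 while macro recall is about 0.611, because half of each attack class is missed even though most BENIGN flows are classified correctly. That gap is the reason to report macro-averaged metrics alongside the confusion matrix when classes are imbalanced.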