# Assignment 4 — Discussion (Jonathan Marien)
## What challenges came up when selecting features?
- Rate-based fields (e.g., Flow Bytes/s) contained Infinity/-Infinity values that had to be imputed or filtered out before any feature scoring could run.
- High collinearity across derived counters (forward/backward bytes/packets, rates) made rankings unstable across methods.
- Class imbalance between BENIGN and specific attacks (e.g., PortScan) required stratified sampling so that minority attack classes stayed represented in the sample.
- Some features have narrow ranges or sentinel values (e.g., -1), which complicate chi-square scoring because it assumes non-negative inputs; mutual information was more robust.
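
Two of the steps above, handling non-finite rate values and stratified sampling, can be sketched with the standard library alone. This is a minimal illustration, not the pipeline used in the assignment: the helper names `impute_non_finite` and `stratified_sample`, the median fill strategy, and the toy values are all assumptions.

```python
import math
import random
from collections import defaultdict
from statistics import median


def impute_non_finite(values):
    """Replace inf, -inf, and NaN with the median of the finite values."""
    finite = [v for v in values if math.isfinite(v)]
    if not finite:
        raise ValueError("column has no finite values to impute from")
    fill = median(finite)
    return [v if math.isfinite(v) else fill for v in values]


def stratified_sample(rows, label_key, fraction, seed=0):
    """Draw the same fraction from every class, keeping at least one row per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)
    sample = []
    for group in by_class.values():
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample


# Toy values, made up for illustration only.
flow_bytes_per_s = [1200.0, float("inf"), 800.0, float("-inf"), 1000.0]
cleaned = impute_non_finite(flow_bytes_per_s)

# Hypothetical 90/10 split between BENIGN and PortScan flows.
rows = [{"Label": "BENIGN"} for _ in range(90)] + [{"Label": "PortScan"} for _ in range(10)]
subset = stratified_sample(rows, "Label", fraction=0.2)
```

Sampling per class keeps the 90/10 ratio in the subset, whereas a plain random draw of 20 rows could by chance contain very few PortScan rows or none.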
## Would your features work well at a place like Bell or CSE?
- Yes, the top features (ports, byte/packet rates, flow duration, flag counts) align with scalable, interpretable telemetry used in large SOCs.
- They provide quick triage cues for reconnaissance (scans), brute-force/abuse (abnormal ports), and volumetric anomalies (bytes/s).
- Feature simplicity supports explainability in escalations and runbooks.
## How would adversaries try to bypass your feature-based model?
- Low-and-slow tactics that stay under baseline thresholds (e.g., spreading a scan over hours so that rate features never spike).
- Using common service ports (80/443/53) to blend in and evade port-based heuristics.
- Tuning packet sizes/timings to mimic benign distributions.
- Encrypted tunnels that obfuscate payloads while keeping flow metadata similar to benign traffic.
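
The rate-dilution tactic can be made concrete with a little arithmetic. The sketch below is hypothetical: the threshold of 5 probes per second, the 1000-probe scan, and the helper names are assumptions chosen for illustration, not values from the assignment.

```python
RATE_THRESHOLD = 5.0  # hypothetical alert threshold, in probes per second


def probes_per_second(num_probes, duration_s):
    """Average probe rate over the whole scan."""
    return num_probes / duration_s


def exceeds(rate, threshold=RATE_THRESHOLD):
    """True when a rate-based rule with this threshold would fire."""
    return rate > threshold


# The same 1000-probe scan, run in 10 seconds versus spread over 6 hours.
fast_scan = probes_per_second(1000, 10)
slow_scan = probes_per_second(1000, 6 * 3600)

print(f"fast: {fast_scan:.3f} probes/s, alert={exceeds(fast_scan)}")
print(f"slow: {slow_scan:.3f} probes/s, alert={exceeds(slow_scan)}")
```

Both scans touch the same number of ports, but only the fast one crosses the threshold, which is why rate features on their own are easy to evade.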
## Notes on Week 10 emphasis
- Prioritize interpretable features and lean models to reduce alert fatigue.
- Report macro-averaged metrics for multi-class settings, so that rare attack classes count as much as BENIGN, and include confusion matrices to visualize error trade-offs.
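
As a rough illustration of the reporting point, the sketch below builds a confusion matrix and a macro-averaged F1 score from scratch. The labels and predictions are toy data invented for the example (the `DDoS` class is only a stand-in for a third class), and the function names are assumptions.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def macro_f1(matrix):
    """Unweighted mean of per-class F1, so every class counts equally."""
    n = len(matrix)
    f1_scores = []
    for i in range(n):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp
        fp = sum(matrix[r][i] for r in range(n)) - tp
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / n


# Toy labels and predictions, made up for illustration only.
labels = ["BENIGN", "PortScan", "DDoS"]
y_true = ["BENIGN"] * 6 + ["PortScan"] * 2 + ["DDoS"] * 2
y_pred = ["BENIGN"] * 5 + ["PortScan"] + ["PortScan", "BENIGN"] + ["DDoS"] * 2

matrix = confusion_matrix(y_true, y_pred, labels)
score = macro_f1(matrix)
accuracy = sum(matrix[i][i] for i in range(len(labels))) / len(y_true)
```

On this toy data accuracy is 0.8 while macro F1 is about 0.78, because the weak PortScan class (F1 of 0.5) pulls the macro average down even though it contributes only two samples.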