Publications
- CTIBench: A Benchmark for Evaluating LLMs in Cyber Threat Intelligence — Advances in Neural Information Processing Systems (NeurIPS 2024), vol. 37, Spotlight
- Sphinx: Visual Perception and Reasoning Gym — Multimodal Algorithmic Reasoning (MAR) Workshop at NeurIPS 2025
- Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning — MATH-AI Workshop at NeurIPS 2025
- ADAPT: A Pseudo-labeling Approach to Combat Concept Drift in Malware Detection — Proceedings of the 28th International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2025)
- R+R: Revisiting Static Feature-Based Android Malware Detection using Machine Learning — Annual Computer Security Applications Conference (ACSAC 2025)
- AthenaBench: A Dynamic Benchmark for Evaluating LLMs in Cyber Threat Intelligence — WAITI 2025
- SECURE: Benchmarking Generative Large Language Models for Cybersecurity Advisory — Annual Computer Security Applications Conference (ACSAC 2024)
- Looking beyond IoCs: Automatically Extracting Attack Patterns from External CTI — Proceedings of the 26th International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2023), pp. 92–108
- Assessing Effective Token Length of Multimodal Models for Text-to-Image Retrieval — Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2025)
- Actionable Cyber Threat Intelligence Using Knowledge Graphs and Large Language Models — 2024 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW)
- PASA: Attack Agnostic Unsupervised Adversarial Detection using Prediction & Attribution Sensitivity Analysis — 9th IEEE European Symposium on Security and Privacy (EuroS&P 2024)
- Towards Understanding Self-play for LLM Reasoning — MATH-AI Workshop at NeurIPS 2025
- Punctuation Restoration using Transformer Models for High- and Low-Resource Languages — Proceedings of the 6th Workshop on Noisy User-generated Text (W-NUT 2020) @ EMNLP
- Deep Learning Benchmarks and Datasets for Social Media Image Classification for Disaster Response — International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2020)
- MEDIC: A Multi-task Learning Dataset for Disaster Image Classification — Neural Computing and Applications
- Lightweight CNN for Robust Voice Activity Detection — International Conference on Speech and Computer (SPECOM 2020)