I am a Research Assistant Professor at HKAI-Sci, City University of Hong Kong. I also work closely with Prof. Xingjun Ma at Fudan University. My research develops robust and efficient RL algorithms for trustworthy decision-making in real-world systems, with a focus on red/blue teaming for LLMs, vision-language models, and embodied agents.
| Degree | Institution |
|---|---|
| Ph.D. | Computer Science, City University of Hong Kong (2024) |
| M.S. | Control Science & Engineering, Tsinghua University (2019) |
| B.S. | Automation & Mathematics (Dual Degree), Beihang University (2016) |

| Repository | Description | Stars |
|---|---|---|
| Awesome-Embodied-AI-Safety | Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses (400+ Papers) | |
| JustAsk | Curious Code Agents Reveal System Prompts in Frontier LLMs | |
| System-Prompt-Open | Open database of system prompts extracted from frontier LLMs | |
| OpenRedRL | A Light-Weight Benchmark for RL-Based Red Teaming | |
| ISC-Bench | Internal Safety Collapse in Frontier LLMs | |

| Date | Paper | Venue |
|---|---|---|
| 2026.03 | Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses | GitHub Preprint |
| 2026.03 | RedRFT: A Light-Weight Benchmark for RL-Based Red Teaming | FCS |
| 2026.02 | GenBreak: Red Teaming Text-to-Image Generators Using LLMs | CVPR |
| 2026.01 | Just Ask: Curious Code Agents Reveal System Prompts in Frontier LLMs | arXiv Preprint |
| 2026.01 | Safety at Scale: A Comprehensive Survey of Large Model and Agent Safety | FnT P&S |
| 2025.01 | BlueSuffix: Reinforced Blue Teaming for VLMs Against Jailbreak Attacks | ICLR |
| 2024.12 | CALM: Curiosity-Driven Auditing for Large Language Models | AAAI |
| 2024.04 | Constrained Intrinsic Motivation for Reinforcement Learning | IJCAI |
| 2024.03 | Toward Evaluating Robustness of RL with Adversarial Policy | DSN |
| 2020.06 | Clean-Label Backdoor Attacks on Video Recognition Models | CVPR |