x-zheng16/README.md


Research Interests

Website · Google Scholar · Email


About

I am a Research Assistant Professor at HKAI-Sci, City University of Hong Kong. I also work closely with Prof. Xingjun Ma at Fudan University. My research develops robust and efficient RL algorithms for trustworthy decision-making in real-world systems, with a focus on red/blue teaming for LLMs, vision-language models, and embodied agents.

Ph.D., Computer Science, City University of Hong Kong (2024)
M.S., Control Science & Engineering, Tsinghua University (2019)
B.S., Automation & Mathematics (Dual Degree), Beihang University (2016)

Featured Research

| Repository | Description | Stars |
| --- | --- | --- |
| Awesome-Embodied-AI-Safety | Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses (400+ Papers) | |
| JustAsk | Curious Code Agents Reveal System Prompts in Frontier LLMs | |
| System-Prompt-Open | Open database of system prompts extracted from frontier LLMs | |
| OpenRedRL | OpenRedRL: A Light-Weight Benchmark for RL-Based Red Teaming | |
| ISC-Bench | ISC-Bench: Internal Safety Collapse in Frontier LLMs | |

Selected Publications

| Date | Paper | Venue |
| --- | --- | --- |
| 2026.03 | Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses | GitHub · Preprint |
| 2026.03 | RedRFT: A Light-Weight Benchmark for RL-Based Red Teaming | FCS |
| 2026.02 | GenBreak: Red Teaming Text-to-Image Generators Using LLMs | CVPR |
| 2026.01 | Just Ask: Curious Code Agents Reveal System Prompts in Frontier LLMs | arXiv Preprint |
| 2026.01 | Safety at Scale: A Comprehensive Survey of Large Model and Agent Safety | FnT P&S |
| 2025.01 | BlueSuffix: Reinforced Blue Teaming for VLMs Against Jailbreak Attacks | ICLR |
| 2024.12 | CALM: Curiosity-Driven Auditing for Large Language Models | AAAI |
| 2024.04 | Constrained Intrinsic Motivation for Reinforcement Learning | IJCAI |
| 2024.03 | Toward Evaluating Robustness of RL with Adversarial Policy | DSN |
| 2020.06 | Clean-Label Backdoor Attacks on Video Recognition Models | CVPR |

Full list on Google Scholar


GitHub Stats

Stats images: 36.4B total tokens · GitHub streak · Claude Code usage heatmap · GitHub contribution heatmap



Pinned Repositories

  1. Awesome-Embodied-AI-Safety — Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses | 400+ Papers | Perception, Cognition, Planning, Interaction, Agentic System (65 stars)

  2. System-Prompt-Open — Open database of system prompts extracted from frontier LLMs using JustAsk (HTML · 30 stars · 4 forks)

  3. JustAsk — JustAsk: Curious Code Agents Reveal System Prompts in Frontier LLMs | Verified on Claude Code | Autoresearch for System Prompt Extraction (Python · 46 stars · 21 forks)

  4. OpenRedRL — [FCS] OpenRedRL: A Light-Weight Benchmark for RL-Based Red Teaming (Python · 6 stars · 1 fork)

  5. wuyoscar/ISC-Bench — Internal Safety Collapse: Turning an LLM or AI agent into a sensitive-data generator (Python · 794 stars · 125 forks)