Yang Sheng
I am a Research Engineer at Z.ai, working on post-training methods for Vision-Language Models (VLMs), with a focus on reward modeling, reinforcement learning, and multimodal reasoning. As a core contributor, I have contributed to GLM-4.1V, GLM-4.5V, GLM-4.6V, GLM-5V-Turbo, and GLM-OCR. My research interests include scalable VLM post-training, model alignment, and robust multimodal understanding. I received my Master’s degree from Tsinghua University and my Bachelor’s degree from Huazhong University of Science and Technology (HUST).
selected publications
- GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents. arXiv preprint arXiv:2604.26752, 2026
- GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning. arXiv preprint arXiv:2507.01006, 2025
- Not all prompts are secure: A switchable backdoor attack against pre-trained vision transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
- Backdoor defense via suppressing model shortcuts. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023