Yang Sheng

I am a Research Engineer at Z.ai, working on post-training methods for Vision-Language Models (VLMs), with a focus on reward modeling, reinforcement learning, and multimodal reasoning. I am a core contributor to GLM-4.1V, GLM-4.5V, GLM-4.6V, GLM-5V-Turbo, and GLM-OCR. My research interests include scalable VLM post-training, model alignment, and robust multimodal understanding. I received my Master’s degree from Tsinghua University and my Bachelor’s degree from Huazhong University of Science and Technology (HUST).

selected publications

  1. GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents
    V Team, Wenyi Hong, Xiaotao Gu, and 15 more authors
    arXiv preprint arXiv:2604.26752, 2026
  2. GLM-OCR Technical Report
    Shuaiqi Duan, Yadong Xue, Weihan Wang, and 20 more authors
    arXiv preprint arXiv:2603.10910, 2026
  3. GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
    Wenyi Hong, Wenmeng Yu, Xiaotao Gu, and 8 more authors
    arXiv preprint arXiv:2507.01006, 2025
  4. Not All Prompts Are Secure: A Switchable Backdoor Attack Against Pre-trained Vision Transformers
    Sheng Yang, Jiawang Bai, Kuofeng Gao, and 3 more authors
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
  5. Backdoor Defense via Suppressing Model Shortcuts
    Sheng Yang, Yiming Li, Yong Jiang, and 1 more author
    In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023