Publications | Xiangyue Zhang (章湘粤) | Embodied AI, Motion Generation

All publications

Open-Source Systems

Technical Report

Deep Researcher Agent: An Autonomous Framework for 24/7 Deep Learning Experimentation with Zero-Cost Monitoring

Xiangyue Zhang

arXiv preprint arXiv:2604.05854, 2026

We present Deep Researcher Agent, an open-source framework that enables large language model agents to autonomously conduct deep learning experiments around the clock, spanning the full lifecycle from hypothesis formation and code implementation to result analysis and iterative refinement.

The paper introduces zero-cost monitoring during training, a two-tier constant-size memory capped at roughly 5K characters, and a minimal-toolset leader-worker architecture for lower token overhead in long-running experiment automation.

2026

🔥 ECCV 2026

StreamTalk: Streaming Co-Speech Gesture Generation with Key-Pose Anchoring

Xiangyue Zhang, Jianfang Li, Jiaxu Zhang, Kaixing Yang, and Steven Hoi

European Conference on Computer Vision (ECCV), 2026

arXiv and code coming soon.

@inproceedings{zhang2026streamtalk,
  title={StreamTalk: Streaming Co-Speech Gesture Generation with Key-Pose Anchoring},
  author={Zhang, Xiangyue and Li, Jianfang and Zhang, Jiaxu and Yang, Kaixing and Hoi, Steven},
  booktitle={European Conference on Computer Vision},
  year={2026}
}

🔥 SIGGRAPH 2026

MACE-Dance: Motion-Appearance Cascaded Experts for Music-Driven Dance Video Generation

Kaixing Yang, Jiashu Zhu, Xulong Tang, Ziqiao Peng, Xiangyue Zhang, Puwei Wang, Jiahong Wu, Xiangxiang Chu, Hongyan Liu, and Jun He

ACM Transactions on Graphics (SIGGRAPH), 2026

@article{yang2026macedance,
  title={MACE-Dance: Motion-Appearance Cascaded Experts for Music-Driven Dance Video Generation},
  author={Yang, Kaixing and Zhu, Jiashu and Tang, Xulong and Peng, Ziqiao and Zhang, Xiangyue and Wang, Puwei and Wu, Jiahong and Chu, Xiangxiang and Liu, Hongyan and He, Jun},
  journal={ACM Transactions on Graphics (SIGGRAPH)},
  year={2026},
  eprint={2512.18181},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2512.18181}
}

🔥 ECCV 2026

OmniDance: Multimodal Driven Dance Video Generation with Large-scale Internet Data

Kaixing Yang, Jiashu Zhu, Xulong Tang, Ziqiao Peng, Xiangyue Zhang, Chubin Chen, Puwei Wang, Jiahong Wu, Xiangxiang Chu, Hongyan Liu, and Jun He

European Conference on Computer Vision (ECCV), 2026

@inproceedings{yang2026omnidance,
  title={OmniDance: Multimodal Driven Dance Video Generation with Large-scale Internet Data},
  author={Yang, Kaixing and Zhu, Jiashu and Tang, Xulong and Peng, Ziqiao and Zhang, Xiangyue and Chen, Chubin and Wang, Puwei and Wu, Jiahong and Chu, Xiangxiang and Liu, Hongyan and He, Jun},
  booktitle={European Conference on Computer Vision},
  year={2026},
  url={https://sun-happy-ykx.github.io/OmniDance/}
}

arXiv 2026

PersonaGesture: Single-Reference Co-Speech Gesture Personalization for Unseen Speakers

Xiangyue Zhang, Yiyi Cai, Kunhang Li, Kaixing Yang, You Zhou, Zhengqing Li, Xuangeng Chu, Jiaxu Zhang, and Haiyang Liu

arXiv preprint arXiv:2605.06064, 2026

@misc{zhang2026personagesture,
  title={PersonaGesture: Single-Reference Co-Speech Gesture Personalization for Unseen Speakers},
  author={Zhang, Xiangyue and Cai, Yiyi and Li, Kunhang and Yang, Kaixing and Zhou, You and Li, Zhengqing and Chu, Xuangeng and Zhang, Jiaxu and Liu, Haiyang},
  year={2026},
  eprint={2605.06064},
  archivePrefix={arXiv},
  url={https://arxiv.org/abs/2605.06064}
}

arXiv 2026

Not All Frames Are Equal: Complexity-Aware Masked Motion Generation via Motion Spectral Descriptors

Pengfei Zhou, Xiangyue Zhang, Xukun Shen, and Yong Hu

arXiv preprint arXiv:2603.29655, 2026

@misc{zhou2026framesequalcomplexityawaremasked,
  title={Not All Frames Are Equal: Complexity-Aware Masked Motion Generation via Motion Spectral Descriptors},
  author={Zhou, Pengfei and Zhang, Xiangyue and Shen, Xukun and Hu, Yong},
  year={2026},
  eprint={2603.29655},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2603.29655}
}

🔥 AAAI 2026

Mitigating Error Accumulation in Co-Speech Motion Generation via Global Rotation Diffusion and Multi-Level Constraints

Xiangyue Zhang, Jianfang Li, Jianqiang Ren, and Jiaxu Zhang

Annual AAAI Conference on Artificial Intelligence (AAAI), 2026

@misc{zhang2025mitigatingerroraccumulationcospeech,
  title={Mitigating Error Accumulation in Co-Speech Motion Generation via Global Rotation Diffusion and Multi-Level Constraints},
  author={Zhang, Xiangyue and Li, Jianfang and Ren, Jianqiang and Zhang, Jiaxu},
  year={2025},
  eprint={2511.10076},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2511.10076}
}

🔥 ECCV 2026

FlowerDance: MeanFlow for Efficient and Refined 3D Dance Generation

Kaixing Yang, Xulong Tang, Ziqiao Peng, Xiangyue Zhang, Puwei Wang, Jun He, and Hongyan Liu

European Conference on Computer Vision (ECCV), 2026

@inproceedings{yang2026flowerdance,
  title={FlowerDance: MeanFlow for Efficient and Refined 3D Dance Generation},
  author={Yang, Kaixing and Tang, Xulong and Peng, Ziqiao and Zhang, Xiangyue and Wang, Puwei and He, Jun and Liu, Hongyan},
  booktitle={European Conference on Computer Vision},
  year={2026},
  eprint={2511.21029},
  archivePrefix={arXiv},
  url={https://arxiv.org/abs/2511.21029}
}

2025

🔥 ICCV 2025

SemTalk: Holistic Co-speech Motion Generation with Frame-level Semantic Emphasis

Xiangyue Zhang, Jianfang Li, Jiaxu Zhang, Ziqiang Dang, Jianqiang Ren, Liefeng Bo, and Zhigang Tu

International Conference on Computer Vision (ICCV), 2025

@inproceedings{zhang2025semtalk,
  title={SemTalk: Holistic Co-speech Motion Generation with Frame-level Semantic Emphasis},
  author={Zhang, Xiangyue and Li, Jianfang and Zhang, Jiaxu and Dang, Ziqiang and Ren, Jianqiang and Bo, Liefeng and Tu, Zhigang},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={13761--13771},
  year={2025}
}

🔥 ACM MM 2025

EchoMask: Speech-Queried Attention-based Mask Modeling for Holistic Co-Speech Motion Generation

Xiangyue Zhang, Jianfang Li, Jiaxu Zhang, Jianqiang Ren, Liefeng Bo, and Zhigang Tu

ACM International Conference on Multimedia (ACM MM), 2025

@inproceedings{zhang2025echomask,
  title={EchoMask: Speech-Queried Attention-based Mask Modeling for Holistic Co-Speech Motion Generation},
  author={Zhang, Xiangyue and Li, Jianfang and Zhang, Jiaxu and Ren, Jianqiang and Bo, Liefeng and Tu, Zhigang},
  booktitle={Proceedings of the 33rd ACM International Conference on Multimedia},
  pages={10827--10836},
  year={2025}
}

🔥 T-CSVT 2025

Robust 2D Skeleton Action Recognition via Decoupling and Distilling 3D Latent Features

Xiangyue Zhang, Yifan Jia, Jiaxu Zhang, Yijie Yang, and Zhigang Tu

IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT), 2025