Yuankai Luo (罗元凯)
Assistant Professor at Nanjing University
[Google Scholar] [GitHub] [Email: yuankailuo@nju.edu.cn]
I am currently an Assistant Professor at the School of Artificial Intelligence, Nanjing University (NJU). I received my Ph.D. from the School of Computer Science and Engineering at Beihang University, where I was supervised by Prof. Lei Shi, and I was jointly trained at The Hong Kong Polytechnic University under the supervision of Prof. Xiao-Ming Wu. Before that, I conducted research under the supervision of Veronika Thost.
My research asks a central question: how can we make models for complex structured data simpler, more efficient, and more reliable? Driven by the philosophy of “Simplicity,” my work has evolved from efficient structural representation and architecture optimization toward real-world embodied intelligence, where lightweight deployment is essential.
1. Embodied Intelligence: Lightweight On-Device Deployment
- VLA / VLN: our primary focus is building lightweight and scalable embodied agents for robotic manipulation and navigation. SimVLA provides a streamlined VLA baseline by decoupling perception from control and using a standard training recipe, flow matching, and self-attention. CORAL extends this foundation to scalable multi-task deployment through lightweight, parameter-isolated LoRA experts and dynamic instruction-based routing, with minimal storage cost and zero additional inference overhead [IROS 2026].
- Generation-Aware On-Device Detection and Segmentation: we develop unified models that combine visual generation with perception to better handle rare and challenging corner cases. In particular, we explore jointly performing image inpainting and detection or segmentation, allowing models to recover missing visual evidence while recognizing targets efficiently on drones and robotic platforms.
- Multimodal Reasoning and Real-Time Video Understanding: we study efficient multimodal reasoning over continuous video streams from surveillance cameras, drones, and embodied agents. Our goal is to enable real-time understanding of evolving scenes, events, and agent interactions from both third-person and egocentric perspectives.
2. Efficient Structural Representation and Architecture Optimization
- Systematic Architecture Refinement (ModernGNN / GNN+): revisited classic GNNs through a systematic integration of message passing, residual connections, normalization, and regularization, establishing strong and simple baselines for node- and graph-level learning [NeurIPS 2024, ICML 2025, ICLR 2025].
- Structural Encoding and Graph Transformers: developed compact, discrete, and interpretable Node IDs through vector quantization [ICLR 2025], and designed Graph Transformers for complex topologies, including DAGs [NeurIPS 2023] and multi-level structures [NeurIPS 2024]. These methods have been applied to molecular property prediction [NeurIPS 2023] and scholarly impact profiling [KDD 2023].
- Graph Generation: proposed SimGFM, a simplified discrete flow matching framework that uses endpoint-focused scheduling and safe projection to reduce graph generation from hundreds of sampling steps to fewer than ten [ICML 2026].
Recent Publications
Academic Services
Conference Reviewer:
- WSDM 2023/2024, ICML 2024/2025/2026, NeurIPS 2024 (Top Reviewer Award)/2025, ACL ARR 2024/2025, ICLR 2025/2026, AAAI 2025