MotionLLM: Multimodal Motion-Language Learning with Large Language Models

Recent advances in Multimodal Large Language Models (MM-LLMs) have shown promising generalization and robustness across different modalities. Previous works have already achieved 3D human motion generation with various approaches, including language modeling, but they mostly rely on specialized architectures and are restricted to single-human motion generation. Inspired by the success of MM-LLMs, we propose MotionLLM, a simple and general framework that achieves single-human motion generation, multi-human motion generation, and motion captioning by fine-tuning pre-trained LLMs. Specifically, we encode and quantize motions into discrete tokens that an LLM can process, which yields a unified vocabulary consisting of both motion and text tokens. With only 1-3% of the LLM's parameters trained through adapters, our single-human motion generation achieves results comparable to those of diffusion models and of transformer-based models trained from scratch. Additionally, we show that our approach is scalable and flexible: it extends easily to multi-human motion generation through autoregressive generation of single-human motions.
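To make the unified vocabulary concrete, the following is a minimal sketch of the quantization step, not the MotionLLM implementation. It assumes a nearest-neighbor codebook lookup (the usual vector-quantization rule) and assumes that motion tokens are appended after the text vocabulary; the function names, the toy 2-D codebook, and the text vocabulary size of 32000 are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def nearest_code(vec: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook entry closest to vec (squared L2 distance)."""
    best_index = 0
    best_dist = float("inf")
    for index, code in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(vec, code))
        if dist < best_dist:
            best_dist = dist
            best_index = index
    return best_index


def motion_to_token_ids(
    latents: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
    text_vocab_size: int,
) -> List[int]:
    """Quantize each latent vector and shift its code index past the text vocabulary.

    Assumption: motion codes occupy the ID range
    [text_vocab_size, text_vocab_size + len(codebook)).
    """
    return [text_vocab_size + nearest_code(vec, codebook) for vec in latents]


def token_ids_to_codes(token_ids: Sequence[int], text_vocab_size: int) -> List[int]:
    """Recover codebook indices from a mixed sequence, skipping text tokens."""
    return [tid - text_vocab_size for tid in token_ids if tid >= text_vocab_size]


# Toy setup: hypothetical sizes and values, chosen only for illustration.
TEXT_VOCAB_SIZE = 32000
CODEBOOK = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
LATENTS = [[0.1, -0.1], [0.9, 0.2], [0.8, 0.9]]  # stand-ins for encoder outputs

motion_ids = motion_to_token_ids(LATENTS, CODEBOOK, TEXT_VOCAB_SIZE)
print(motion_ids)  # [32000, 32001, 32003]
```

Because motion IDs and text IDs share one index space in this sketch, a single sequence can mix a text prompt with motion tokens, and `token_ids_to_codes` separates the motion part again before it is passed to a motion decoder.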

Single-Human Motion Generation

a person performs a backflip

a man is walking as if to be a zombie

a person is doing rope skipping exercise in the park

a person walks forward, turns around, and walks back the way he came

Single-Human Motion Generation Comparison

Ours: A man kneels down and proposes marriage

MoMask: A man kneels down and proposes marriage

MotionGPT: A man kneels down and proposes marriage

T2M-GPT: A man kneels down and proposes marriage

More Comparisons Between MotionLLM and MoMask

Ours: A man stands motionless and then take one steps backwards to the left

MoMask: A man stands motionless and then take one steps backwards to the left

Ours: A person jumps and spins in the air 360 degrees counterclockwise

MoMask: A person jumps and spins in the air 360 degrees counterclockwise