Based on 86 core empirical studies (2021-2025) focusing on LLM applications in HRI
Study Period
2021-2025
LLM-HRI research boom period
Total Literature
86 papers
Core empirical studies
Research Focus
9 categories
Morphology/method/evaluation
Coverage Areas
8 domains
Healthcare/education/domestic, etc.
Note: 2025 data are included through September 10, 2025.
RQ1: How do LLMs transform the foundational capabilities of HRI?
LLMs are fundamentally transforming the HRI research domain by shifting its paradigm from static, pre-programmed interaction pipelines to adaptive, generative, and socially grounded embodied intelligence. Our synthesis identifies a reconceptualization of HRI processes through a Sense-Interaction-Alignment framework. LLMs enable robots to move beyond basic environmental perception (Sense) to achieve contextual and social understanding. They redefine robot action as collaborative, proactive, and agentic Interaction, capable of generative social communication and task co-creation. Finally, they introduce a continuous Alignment phase, emphasizing long-term personalization, memory, and multi-level repair mechanisms to ensure behaviors remain safe, ethical, and congruent with human expectations over time. This transformation positions LLMs not merely as tools but as core cognitive engines that facilitate more natural, flexible, and context-aware human-robot collaboration.
RQ2: How are LLMs integrated into HRI system design?
The integration of LLMs into HRI systems is multifaceted, involving deliberate design choices across modality, morphology, and autonomy levels, which are deployed across diverse application domains. Integration is primarily achieved by embedding LLMs as central reasoning and dialogue engines within a robot's architecture. This involves: 1) Selecting interaction modalities (text, voice, visuals, motion, hybrid) where LLMs process inputs and generate outputs; 2) Choosing robot morphologies (humanoid, functional, etc.) that align with the intended social or task-oriented application; and 3) Determining the level of autonomy (teleoperation, semi-autonomy, full autonomy), which defines the balance between LLM-driven agency and human oversight.
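The design choices above can be illustrated with a minimal sketch of an HRI control loop. This is a hypothetical illustration, not an architecture from the reviewed studies: the `Observation` type, the stubbed `llm_reason` function, and the confidence threshold in `autonomy_gate` are all assumptions introduced for clarity. A real system would replace the stub with a call to an actual language model.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    modality: str   # e.g. "voice", "text", "vision" (design choice 1)
    content: str

def llm_reason(observation: Observation) -> dict:
    """Stub for the LLM as central reasoning engine (design choice 2's
    morphology is orthogonal to this logic): maps an observation to a
    proposed robot action plus a confidence estimate."""
    if "hand me" in observation.content.lower():
        return {"action": "fetch_object", "confidence": 0.9}
    return {"action": "ask_clarification", "confidence": 0.4}

def autonomy_gate(decision: dict, threshold: float = 0.7) -> str:
    """Semi-autonomy (design choice 3): low-confidence plans are
    deferred to a human operator instead of being executed."""
    if decision["confidence"] >= threshold:
        return decision["action"]
    return "defer_to_operator"

obs = Observation(modality="voice", content="Could you hand me the cup?")
print(autonomy_gate(llm_reason(obs)))  # fetch_object
```

The gate shows why the autonomy level matters as an explicit design parameter: raising or lowering the threshold shifts the balance between LLM-driven agency and human oversight without touching the reasoning component.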
RQ3: How can LLM-driven HRI systems be evaluated?
The evaluation of LLM-driven HRI systems employs a mixed-methods approach that combines established HRI methodologies with new metrics tailored to assess LLM-specific capabilities. Methodologically, research relies heavily on controlled laboratory experiments and, increasingly, on field deployments to test robustness in real-world settings. These are complemented by qualitative methods (interviews, case studies) and quantitative tools (questionnaires, standardized scales). The Wizard-of-Oz technique remains prevalent to simulate advanced capabilities during development. Evaluation strategies have evolved to include both objective metrics (e.g., task completion time, accuracy, LLM response latency, code execution success) and expanded subjective metrics. The latter now assess not only traditional constructs like usability and perceived safety but also LLM-influenced dimensions such as perceived intelligence (including theory-of-mind and emotional intelligence), quality of relational experience, dialogue fluency, and anthropomorphism tied to conversational competence.
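As a concrete illustration of the objective metrics mentioned above, the following sketch aggregates per-trial logs into a summary. The log schema (`completed`, `duration_s`, `llm_latency_s`) and the sample values are hypothetical, introduced only to show the computation; they are not drawn from the reviewed studies.

```python
from statistics import mean

# Illustrative per-trial logs (field names and values are assumptions).
trials = [
    {"completed": True,  "duration_s": 42.0, "llm_latency_s": 1.2},
    {"completed": True,  "duration_s": 55.5, "llm_latency_s": 0.9},
    {"completed": False, "duration_s": 90.0, "llm_latency_s": 2.4},
]

def summarize(trials):
    """Compute common objective HRI metrics: task success rate,
    mean completion time over successful trials, and mean LLM
    response latency over all trials."""
    done = [t for t in trials if t["completed"]]
    return {
        "success_rate": len(done) / len(trials),
        "mean_completion_time_s": mean(t["duration_s"] for t in done),
        "mean_llm_latency_s": mean(t["llm_latency_s"] for t in trials),
    }

print(summarize(trials))
```

Subjective constructs such as perceived intelligence or dialogue fluency cannot be computed this way; they require the questionnaires and standardized scales noted above, which is why mixed-methods designs pair both kinds of measures.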
RQ4: What are the opportunities and challenges for future research?
LLMs present significant opportunities to advance HRI by enabling more natural communication, robust contextual reasoning, and longitudinal adaptation. However, our review synthesizes eleven key challenges that delineate critical future research directions, organized within the Sense-Interaction-Alignment framework.
Collectively, these challenges highlight the need for future work to focus on robustness, ethical design, longitudinal evaluation, and human-centered alignment to realize the safe and effective integration of LLMs into embodied HRI systems.