01. What It Is
Embodied AI is the branch of AI research concerned with agents that perceive and act through a body, in either a physical or a richly simulated environment. Its core claim is that intelligence cannot be fully separated from physical interaction with the world: perception, action, and cognition are deeply linked.
This contrasts with purely linguistic AI, where the system processes text or other digital tokens. An embodied agent must interpret sensory streams (cameras, lidar, force sensors, microphones), plan in physical space, and execute precise motor commands in real time, often on hardware with strict latency requirements.
Classical (pre-deep-learning) robotics relied on explicit models: hand-engineered kinematics, pre-defined object models, and rule-based planning. This produced reliable, repeatable behavior in structured environments such as factory assembly lines. It broke down in unstructured environments, because the approach requires every object and situation to be anticipated in advance, and open-ended settings make that impossible.
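To make the classical approach concrete, here is a minimal sketch of its two ingredients: closed-form kinematics and a rule-based planner backed by pre-defined object models. The two-link planar arm, the link lengths, the object names, and the `plan_grasp` helper are all hypothetical and chosen only for illustration. They are not taken from any particular robot or library.

```python
import math

# Hypothetical link lengths (metres) for an illustrative two-link planar arm.
LINK_1 = 0.5
LINK_2 = 0.3

def forward_kinematics(theta1, theta2):
    """Return the end-effector (x, y) for joint angles given in radians."""
    x = LINK_1 * math.cos(theta1) + LINK_2 * math.cos(theta1 + theta2)
    y = LINK_1 * math.sin(theta1) + LINK_2 * math.sin(theta1 + theta2)
    return x, y

# Hypothetical pre-defined object models: object name -> width in metres.
OBJECT_MODELS = {"bolt_m8": 0.013, "bracket_a": 0.040}

def plan_grasp(object_name, clearance=0.005):
    """Rule-based planner: works only for objects modelled in advance."""
    if object_name not in OBJECT_MODELS:
        raise KeyError(f"no model for object {object_name!r}")
    return {"gripper_width": OBJECT_MODELS[object_name] + clearance}
```

Calling `plan_grasp("bolt_m8")` returns a gripper width, while `plan_grasp("mug")` raises `KeyError`. That failure is the unstructured-environment problem in miniature: anything absent from the hand-built model table cannot be handled.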
Learning-based robotics uses neural networks trained on data to map observations to actions. The shift began in earnest around 2016 to 2018 and has accelerated significantly since 2022 with the adoption of large-scale transformer architectures.
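The observation-to-action mapping can be sketched as a very small neural network policy. This is only an assumption-laden illustration. The `MLPPolicy` class, the dimensions, and the meaning of the observation and action vectors are hypothetical. The weights are random and untrained, whereas a real policy would learn them from data and would typically be far larger.

```python
import math
import random

class MLPPolicy:
    """Tiny two-layer network mapping an observation vector to an action vector."""

    def __init__(self, obs_dim, hidden_dim, act_dim, seed=0):
        rng = random.Random(seed)
        # Random, untrained weights; training would replace these with learned values.
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(obs_dim)]
                   for _ in range(hidden_dim)]
        self.w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden_dim)]
                   for _ in range(act_dim)]

    def act(self, obs):
        """Forward pass: observation in, action out."""
        hidden = [math.tanh(sum(w * o for w, o in zip(row, obs)))
                  for row in self.w1]
        # tanh bounds each action component to [-1, 1],
        # e.g. normalised joint velocity commands (an assumption).
        return [math.tanh(sum(w * h for w, h in zip(row, hidden)))
                for row in self.w2]

# Hypothetical 4-dimensional observation and 2-dimensional action.
policy = MLPPolicy(obs_dim=4, hidden_dim=8, act_dim=2)
action = policy.act([0.1, -0.2, 0.05, 0.0])
```

Unlike the explicit-model approach, nothing here names objects or encodes rules. All behavior lives in the weights, which is why the quality and coverage of the training data matter so much.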