01. What It Is
Edge AI is the practice of running trained ML models on the device where data is generated, rather than transmitting that data to a cloud server for processing. "The edge" refers to any compute resource outside a centralized data center: a smartphone, a microcontroller, a browser tab, an IoT sensor, an in-car computer, or a hospital workstation operating in an air-gapped network.
The distinction is architectural. In cloud inference, the user's device sends raw data (an audio clip, a photo, a sensor reading) over a network connection; a large model runs in the cloud and returns a result. In on-device inference, a model resident on the device processes the data locally, and the result is produced without a network round-trip.
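The two data paths can be sketched as a toy simulation. This is a minimal illustration rather than a real deployment: `tiny_model`, `cloud_inference`, `on_device_inference`, and `InferenceResult` are hypothetical names introduced here, the "model" is a stand-in threshold rule, and the network hop is simulated by serializing the samples to bytes instead of opening a connection.

```python
import struct
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class InferenceResult:
    label: str
    bytes_sent_over_network: int
    network_round_trips: int


def tiny_model(samples: List[float]) -> str:
    """Hypothetical stand-in for a trained model: thresholds the mean value."""
    mean = sum(samples) / len(samples)
    return "loud" if mean > 0.5 else "quiet"


def cloud_inference(samples: List[float],
                    model: Callable[[List[float]], str]) -> InferenceResult:
    """Simulated cloud path: the raw data leaves the device before the model runs."""
    # Device side: serialize the raw samples as 32-bit little-endian floats.
    payload = struct.pack(f"<{len(samples)}f", *samples)
    # Server side (simulated in-process): decode the payload and run the model.
    received = list(struct.unpack(f"<{len(payload) // 4}f", payload))
    label = model(received)
    return InferenceResult(label, bytes_sent_over_network=len(payload),
                           network_round_trips=1)


def on_device_inference(samples: List[float],
                        model: Callable[[List[float]], str]) -> InferenceResult:
    """On-device path: the model runs where the data is; nothing is transmitted."""
    return InferenceResult(model(samples), bytes_sent_over_network=0,
                           network_round_trips=0)


if __name__ == "__main__":
    readings = [0.9, 0.8, 0.7]  # made-up sensor readings for illustration
    print(cloud_inference(readings, tiny_model))
    print(on_device_inference(readings, tiny_model))
```

Both paths return the same label for the same input. What differs is where the raw data travels: the simulated cloud path transmits every sample, while the on-device path transmits none.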