
Voice Assistants in the LLM Era


In Short

The voice assistants you already talk to, Alexa, Siri, and Google Assistant, are being rebuilt on large language models. The new versions hold a conversation and can take multi-step actions, but they are also less predictable and sometimes worse at simple tasks they used to handle. The three to know in mid-2026 are Amazon's Alexa+ (shipped, free with Prime or $19.99 a month), Apple's Siri (still a ChatGPT add-on today, with a Gemini-powered rebuild expected later in 2026), and Google's Gemini replacing Assistant on phones and in the home. Almost all of the work happens in the cloud, which raises the same privacy questions as any cloud AI.

Snapshot caveat: These products are mid-transition, so dates and features shift. Reflects June 2026. Re-verify on each provider's official page.

01. What It Is

A voice assistant is the software you talk to on a speaker, phone, or display. The three most common are Amazon's Alexa, Apple's Siri, and Google Assistant.

Until 2024 these ran on command-and-control software that matched your words against a fixed list of "intents" and needed near-exact phrasing, so they were reliable but rigid.
Since 2024 all three have been in the middle of a rebuild on large language models, the same technology behind the chat apps in how-to-use-an-llm. The new versions understand natural, half-formed, follow-up speech and can chain steps, which also makes them less predictable. A useful mental model is that the old assistant was a vending machine with set buttons, and the new one is a chatty helper who can also push them.

02. Why It Matters

These are the AI products most non-technical people actually touch every day, and Amazon alone says there are over 600 million Alexa devices in the world.

The upgrade often arrives automatically, and sometimes you cannot opt out. Google is replacing Assistant with Gemini rather than offering Gemini as an option alongside it, and once a home speaker is upgraded you cannot switch it back. So millions of people get a different product without choosing it, which is why it helps to know what changed.

03. How It Works

Old command-based versus new LLM-powered

The old assistants worked by pattern-matching. You said one of a set of phrases, the assistant recognized the intent, and it ran a fixed action. Step outside the expected wording and it failed.
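
The pattern-matching described above can be sketched in a few lines. This is a toy illustration, not any vendor's actual implementation. The intent names, the phrase lists, and the `match_intent` helper are all hypothetical.

```python
from typing import Optional

# Hypothetical intents and phrases, invented for illustration only.
INTENTS = {
    "set_timer": ["set a timer for ten minutes", "start a ten minute timer"],
    "lights_on": ["turn on the lights", "lights on"],
}

def normalize(utterance: str) -> str:
    """Lowercase and drop punctuation so trivially different inputs still match."""
    kept = "".join(ch for ch in utterance.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())

def match_intent(utterance: str) -> Optional[str]:
    """Return the intent whose phrase list contains the utterance, else None."""
    text = normalize(utterance)
    for intent, phrases in INTENTS.items():
        if text in phrases:
            return intent
    return None

# An expected phrase matches; a natural rewording of the same wish does not.
print(match_intent("Turn on the lights!"))
print(match_intent("Could you make it brighter in here?"))
```

The first call matches a listed phrase and returns an intent. The second asks for the same thing in different words and returns nothing, which is the rigidity the old assistants were known for.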

The new assistants send your words to a large language model that interprets meaning, holds short-term context, and decides what to do. Google says Gemini for Home "understands natural, non-specific language, similar to a person" and "maintains short-term context" so you can ask follow-up questions.
The speech-to-text and text-to-speech pipeline underneath is covered in speech-and-audio-ai.

The three big ones in mid-2026

Amazon Alexa+:
Amazon announced Alexa+ on February 26, 2025 as its first generative-AI rebuild of Alexa, and made it available to everyone in the US on February 4, 2026. It costs $19.99 a month or is free for Prime members, with a separate free, usage-limited text-chat version at Alexa.com. It is "agentic," meaning it can carry out multi-step actions across services, such as booking through OpenTable or ordering groceries, rather than just answering. It does not rely on a single model. Each request is routed to a model picked from Amazon Nova and Anthropic's Claude, served through Amazon Bedrock. Reported figures put Nova at over 70% of conversations in a recent four-week window, with Claude handling most complex tasks, but that split is a moving snapshot.

Apple Siri:
As of mid-2026 Siri works in two layers. Apple's own Siri handles device tasks, and an opt-in ChatGPT hand-off, shipped in iOS 18.2 on December 11, 2024, takes certain hard questions. That hand-off is off by default and works without an account, and Siri always asks before sending a photo or file. Apple's privacy pitch is on-device first, sending only the needed data to its Private Cloud Compute servers when more power is required. Apple announced on March 7, 2025 that the deeper "more personal Siri" rebuild was delayed, and it has not shipped as of June 2026. It is now expected later in 2026 and, as reported, will run on Google's Gemini, under a deal widely reported at about $1 billion a year that Apple has not confirmed. This timing is the most volatile item here, so re-check it.

Google Gemini:
Google is replacing Google Assistant with Gemini. On phones the switch slipped from late 2025 into 2026, with new phones like the Pixel 10 shipping with Gemini only. Gemini for Home, announced August 20, 2025, replaces Assistant on speakers and displays made since 2016 at no extra cost, still using "Hey Google." Once a device is upgraded you cannot switch it back. The free-flowing conversation mode, Gemini Live, needs a paid Google Home Premium subscription, though the basic voice assistant is free.

Where they still fail

The new capabilities are real. All three keep context for follow-up questions, summarize documents or emails, and chain steps, such as taking an emailed school schedule and adding the dates to a calendar.

The failures are also real. Reviews of Alexa+ through 2025 reportedly found misquoted prices and dropped alarm and timer commands, with delays running to tens of seconds. Gemini for Home users reportedly found it could no longer reliably set alarms or control lights, tasks the old Assistant handled easily. The cause is structural. A language model is probabilistic and improvises, while home control wants predictable, deterministic behavior, so a task that used to be a fixed command can now be guessed wrong.
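
The deterministic-versus-probabilistic contrast can be made concrete with a toy simulation. This is only a sketch under loud assumptions: a real language model is not a weighted dice roll, and the action names, candidate list, and weights below are invented for illustration.

```python
import random

CANDIDATES = ["alarm_set_07:00", "alarm_set_19:00", "timer_7_min"]  # hypothetical actions
WEIGHTS = [0.90, 0.07, 0.03]                                        # invented probabilities

def fixed_command(utterance: str) -> str:
    """Old style: a known phrase maps to exactly one action, every time."""
    table = {"set an alarm for 7": "alarm_set_07:00"}
    return table.get(utterance, "not_understood")

def sampled_command(utterance: str, rng: random.Random) -> str:
    """Toy stand-in for a language model: samples one action from weighted guesses."""
    return rng.choices(CANDIDATES, weights=WEIGHTS, k=1)[0]

rng = random.Random(0)  # seeded so the run is repeatable
old = {fixed_command("set an alarm for 7") for _ in range(1000)}
new = {sampled_command("set an alarm for 7", rng) for _ in range(1000)}

print("fixed command produced:", old)
print("sampled command produced:", new)
```

Across a thousand identical requests the fixed table yields one action, while the sampler mostly picks the right one but occasionally picks a neighbor. That occasional miss is the kind of error that matters for an alarm.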

Privacy, from wake word to cloud

"Always recording" is a myth. The device runs local keyword spotting that listens for "Alexa" or "Hey Google" without sending audio anywhere. Only when it thinks it heard the wake word does it start streaming audio to the cloud. Alexa then runs a second check on the wake word in the cloud.

The real shift is where the understanding happens. With LLM assistants, almost all of the processing is in the cloud. Tied to that, Amazon removed the on-device "Do Not Send Voice Recordings" option on March 28, 2025, because Alexa+ relies on cloud processing. Affected users were moved to "Don't Save Recordings," which deletes audio after processing but disables Voice ID personalization.
The retention and human-review questions match any cloud AI, covered in ai-privacy-and-your-data.
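
The wake-word gating described in this section can be sketched as a small state machine. This is a minimal illustration, not how any real device is built. Audio frames are stand-in strings, the `Device` class and its keyword check are hypothetical, and it assumes the frame containing the wake word is itself sent so the cloud can re-check it.

```python
from dataclasses import dataclass, field

WAKE_WORDS = ("alexa", "hey google")  # the examples named in the text

@dataclass
class Device:
    """Toy speaker: frames are strings standing in for chunks of audio."""
    sent_to_cloud: list = field(default_factory=list)
    streaming: bool = False

    def local_keyword_spotter(self, frame: str) -> bool:
        # Hypothetical stand-in for an on-device keyword model.
        return any(word in frame.lower() for word in WAKE_WORDS)

    def hear(self, frame: str) -> None:
        if self.streaming:
            if frame == "<silence>":
                self.streaming = False            # request over, back to local listening
            else:
                self.sent_to_cloud.append(frame)  # only now does audio leave the device
        elif self.local_keyword_spotter(frame):
            self.streaming = True
            # Assumption: the wake-word frame is sent too, for the cloud-side check.
            self.sent_to_cloud.append(frame)

device = Device()
for frame in ["private chat", "Alexa", "what's the weather", "<silence>", "more private chat"]:
    device.hear(frame)
print(device.sent_to_cloud)
```

In this sketch the two frames of private conversation never reach `sent_to_cloud`. Only the wake word and the request that follows it are streamed.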

04. Key Terms

Term | Plain meaning
Command-and-control assistant | The old style. Matches your speech to a fixed list of commands and needs near-exact phrasing. Reliable but rigid.
LLM-powered assistant | The new style. A large language model interprets natural speech, holds context, and can chain steps. Flexible but less predictable.
Wake word | The trigger phrase ("Alexa," "Hey Siri," "Hey Google") the device listens for locally. Only after it is detected does audio go to the cloud.
Agentic / taking action | The assistant does not just answer, it carries out a task across services such as booking or ordering. The same idea as agentic-browsers-and-computer-use.
Hand-off / fallback | When one assistant passes a hard request to another model, as today's Siri hands certain questions to ChatGPT.
On-device vs cloud | Whether your request is handled on the gadget or sent to company servers. Wake-word spotting is on-device, the LLM understanding is in the cloud.
Hallucination | A confident, wrong answer such as a made-up price. The main reason the new assistants can be less trustworthy than the old ones on specifics.

05. Examples

  • A follow-up conversation. Ask a question, then ask "what about tomorrow?" without repeating the whole request. The assistant keeps short-term context.
  • A multi-step chore. Alexa+ can take an emailed school schedule and add the dates to your calendar, or walk you through a recipe and shop for missing ingredients.
  • Still do this yourself. For a critical alarm, timer, or smart-home action, keep your old habit and confirm it worked. Reviewers reportedly found the new assistants dropping exactly these basic commands.

06. Common Misconceptions

"The new AI assistant is just a smarter version of the old one."
It is a different kind of software underneath. The old one followed fixed rules and was predictable. The new one runs a language model that improvises, so it can get formerly reliable tasks like alarms and light switches wrong.

"My speaker is always recording everything I say."
The device listens locally for the wake word and does not stream audio until it thinks it heard it. The real change is that, once activated, almost all processing now happens in the company's cloud, and Amazon removed the option to keep it local.

"Siri is already a ChatGPT-level assistant."
As of mid-2026 the Siri you use still runs Apple's own system for device tasks and only hands certain questions to ChatGPT, through a hand-off that is off by default. The deeply rebuilt Siri, now planned to run on Gemini, has not shipped and has been delayed more than once.

"If I do not like the new one, the old assistant will still be there."
These are replacements, not additions. Gemini for Home does not let you switch a speaker back, and Google is retiring the classic Assistant on phones. The change can arrive automatically.

"An assistant that can book and order things is pure convenience."
Once an assistant takes real actions, a wrong or hallucinated step has real consequences, like a wrong booking or the wrong smart-home action.
That is the same supervision-and-trust issue as agentic-browsers-and-computer-use, now pointed at your home and accounts.

Verified against primary sources

Every claim traces to a cited source below.

Key terms

Voice assistant
Software you talk to on a speaker, phone, or display.
Intent matching
The old command-and-control approach needing near-exact phrasing.
LLM rebuild
The shift to LLM-based assistants that converse and chain steps.

