Skip to content

How to Use an LLM

Using AI 6 min read

In Short

There are three ways to use a large language model. You can subscribe to a chat app like ChatGPT or Claude and use it in a browser or phone app with zero setup. You can use a developer API, where software sends text to the model and you pay per use. Or you can install an open model on your own computer and run it offline. For almost everyone starting out, a chat app is the right answer. The other two paths matter only when you are building something or when privacy and offline use are the priority.

01. What It Is

Using a large language model means sending it text and getting text back. The model itself is the same kind of thing in every case, a trained neural network that predicts the next words. What differs is where the model runs and how you reach it. That single difference (whose computer does the work) is what separates the three paths and drives every practical tradeoff: cost, privacy, convenience, and how capable the model is.

The three paths are:

  1. Chat app or subscription. You use a finished product (ChatGPT, Claude, Gemini, Grok, Copilot, and others) through a website or a phone app. The company runs the model on its own powerful servers. You just type.
  2. API (Application Programming Interface). A way for software to talk to the model. There is no chat window. A program sends text to the provider and gets an answer back, and you pay for the amount of text processed. This is the path for developers and for tools that build AI features on top of a model.
  3. Local install. You download an open model (such as Llama, Gemma, or Qwen) and run it on your own PC or Mac. Nothing leaves your machine. This needs capable hardware and a bit of setup.

02. Why It Matters

Picking the wrong path wastes money or time. Someone who only wants to draft emails and ask questions does not need to buy a graphics card and install software, a chat subscription does the job for a fixed monthly fee. Someone building an app that summarises thousands of documents should not be copying and pasting into a chat window, they need the API. Someone who must keep sensitive data on their own machine cannot use a cloud chat app at all and has to run a model locally. Knowing which path fits the goal is the first decision, before any talk of which specific model is "best."

03. How It Works: The Three Paths Compared

1. Chat app (subscription)

You open a browser tab or install an app, sign in, and start typing. The model runs in the provider's data centre on hardware you could never fit at home. You get the company's strongest models, a polished interface, and extra features bolted on (web search, file uploads, image generation, voice).

  • Cost: a flat monthly fee. Most have a free tier with limits, and a paid tier around 20 US dollars a month.
    See chat-apps-and-subscriptions.
  • Setup: none. Works on any device with a browser.
  • Privacy: your conversations are sent to the company and may be used to improve their models unless you opt out in settings.
  • Capability: the highest available. Chat apps run the frontier models.

This is the path the vast majority of people should use.

2. API (for developers and tools)

An API is plumbing, not a product. Instead of a chat window, a piece of software makes a request to the provider's servers, includes your text, and receives the model's answer. You need either some code or a third-party tool that is built to talk to APIs.

  • Cost: pay-as-you-go, measured in tokens (chunks of text, roughly three-quarters of a word). You pay for both the text you send and the text you get back. Cheap at low volume, and it can scale to large bills at high volume.
    See tokens-and-tokenization and cost-latency-deployment.
  • Setup: requires technical work or a tool built on top of it.
  • Privacy: data still goes to the provider's servers, though business and "zero-retention" tiers exist.
  • Capability: the same frontier models as the chat apps, plus full control over how they are used.

A non-coder rarely touches an API directly. It is named here so the landscape is complete, and because many of the tools people do use are API clients underneath.

3. Local install (run it on your own machine)

You download an open-weight model (one whose internals are released publicly) and run it with a free program like Ollama or LM Studio. The model lives on your hard drive and runs on your own processor or graphics card.

  • Cost: free to run. The cost is the hardware you already own, plus electricity.
  • Setup: install a program, download a model file (several gigabytes), and you are running. Modern tools have made this much easier than it used to be.
    See running-llms-locally.
  • Privacy: the strongest. Your prompts never leave your computer. It works with no internet once the model is downloaded.
  • Capability: lower than the cloud frontier. You can run small and mid-size models, not the hundreds-of-billions-parameter giants.
    See hardware-and-performance.

This path is for privacy, offline use, tinkering, and avoiding subscription fees, accepting a capable but not top-tier model in return.

04. Key Terms

Term Plain meaning
Chat app A finished product (ChatGPT, Claude, Gemini) you use in a browser or app. The model runs on the company's servers.
Subscription A flat monthly fee for a chat app's paid tier. Usually around 20 US dollars a month.
API A way for software to talk to a model. No chat window. Billed per token. For developers and tools.
Token A chunk of text, roughly three-quarters of a word. The unit AI usage is measured and billed in.
Local / on-device Running a model on your own computer, offline, with nothing sent to a company.
Open-weight model A model whose trained internals are released publicly so anyone can download and run it.
Frontier model The most capable models available at any given time, run by the big providers in the cloud.

05. Examples

  • Drafting a cover letter, asking everyday questions, summarising an article you paste in. Use a chat app. The free tier of ChatGPT, Claude, or Gemini handles this. Upgrade to a paid tier only if you hit the usage limits.
  • A small business that wants an AI assistant inside its own software. Use an API, or hire someone who can. The model answers through the company's product, not a chat window.
  • A lawyer or doctor who cannot send client data to an outside company. Run a model locally so the data never leaves the office machine.
  • A hobbyist who wants to experiment without paying every month. Install Ollama or LM Studio and run an open model for free.

06. Common Pitfalls / Misconceptions

"I need to install something to use AI."
No. The easiest and most capable path, a chat app, requires no installation at all. Local install is the advanced, optional route.

"The API is the cheap version of ChatGPT."
They are different things. The API has no interface and is billed per token. For light personal use a flat subscription is usually simpler and often cheaper than metered API billing, and far easier to use.

"Running it locally gives me the same model as ChatGPT."
It does not. Local models are open-weight models that are smaller and generally less capable than the frontier models behind the big chat apps. What you gain is privacy and offline use, not equal capability.

"Free chat apps are private."
Free tiers are the most likely to use your conversations for training. If privacy matters, read the provider's settings and data policy, or run a model locally.

"There is one best way."
There is only the best way for a given goal. Most people want a chat app. Builders want an API. The privacy-conscious and the tinkerers want local.
See cloud-vs-local-which-to-choose for the decision in detail.

Verified against primary sources

Every claim traces to a cited source below.

Key terms

Chat app
A finished product like ChatGPT run on the company's servers, zero setup.
API
A way for software to send text to a model and pay per use.
Local install
Running an open model on your own computer, offline.

Tags

#llm #chat-apps #api #local-models #getting-started

More in Getting Started