Ollama is the easiest way to run a real AI model on your Mac. Free, open source, no account, no subscription, no internet connection needed after setup. From a fresh Mac to a working local AI takes about ten minutes — and most of that is the model download. This guide walks through the whole setup, with the commands you actually need and the troubleshooting you might run into.

What You'll Need

An Apple Silicon Mac (M1, M2, M3, M4, or M5). Intel Macs are technically supported but the experience is poor enough we don't recommend it. See our hardware requirements guide for full details.

macOS 14 (Sonoma) or later. Earlier versions work but lack full Metal GPU acceleration.

16 GB of RAM (recommended). 8 GB Macs can run small models; 16 GB is the practical professional minimum.

10 GB free disk space. Enough for the Ollama install and a typical first model.

Step 1: Install Ollama

Two ways. Pick whichever you prefer.

Option A: Download the .dmg (easier)

Go to ollama.com and click the macOS download. Open the .dmg, drag Ollama to Applications, launch it. The first launch installs the command-line tool and starts the background server.

Option B: Install via Homebrew (faster for terminal users)

If you have Homebrew installed:

brew install ollama brew services start ollama

Verify it's running

Open Terminal and run:

ollama --version

You should see a version number. If you get “command not found,” the install didn't complete or your shell can't find Ollama's binary. The Ollama app should have placed it in /usr/local/bin/— restart Terminal and try again.

Step 2: Pull Your First Model

Ollama downloads models from its own model registry. The simplest first model for most professionals is Qwen2.5-7B:

ollama pull qwen2.5:7b

This downloads about 4.5 GB. On a fast connection it takes 2–5 minutes; on slower connections it can take much longer. Grab a coffee.

Once it finishes, verify by listing your installed models:

ollama list

Step 3: Talk to It

Run an interactive chat session in the terminal:

ollama run qwen2.5:7b

You'll get a prompt. Type a question:

>>> Explain the difference between an LLC and an S-Corp in plain English.

You should see the model start responding within about 50 milliseconds. Type/bye to exit the chat session.

That's it. If the model responded, your local AI is working. Everything else is configuration — connecting it to your editor, choosing different models, building applications on top.

Step 4: Try a Bigger Model (Optional)

If 7B isn't enough quality for your work and you have 24+ GB of RAM, try Qwen3-14B:

ollama pull qwen3:14b ollama run qwen3:14b

See our best local LLMs roundupfor which model is right for which task.

Step 5: Connect to Your Editor (Developers)

If you're using Ollama for coding work, the standard setup is to pair it with Continue.dev — an open-source VS Code/JetBrains extension that provides a Copilot-style experience using your local Ollama server.

Install Continue from the VS Code extension marketplace. Open its config (~/.continue/config.json) and add Ollama as a provider:

{
  "models": [
    {
      "title": "Qwen Coder",
      "provider": "ollama",
      "model": "qwen2.5-coder:7b",
      "apiBase": "http://localhost:11434"
    }
  ]
}

Restart VS Code. Hit Cmd+L for the chat panel. You now have local code completion and chat-in-editor with no data leaving your machine.

Step 6: Use It With an Application

Ollama exposes an HTTP API on localhost:11434 that other apps can connect to. This is how local AI tools work — they don't reinvent model serving, they just talk to Ollama.

Most professional users don't need to interact with the API directly. For a packaged experience without configuration, Hey Eduardobundles its own model and provides a screen-aware chat interface — no Ollama installation needed.

Common Problems and Fixes

“ollama: command not found”

The Ollama binary isn't in your shell's PATH. Restart Terminal. If that doesn't help, the install may have failed — re-run the installer or run brew install ollama.

Model is slow / responses lag

Most often a RAM issue. Run ollama ps to see what's loaded. If the model is bigger than your free RAM, macOS swaps to disk and everything crawls. Try a smaller model, or close other apps to free memory.

“Error: pull request failed”

Network issue or temporary registry outage. Wait a minute, try again. If persistent, check that nothing on your network blocks the Ollama domain.

Ollama is using too much disk space

Models accumulate. Check what's installed with ollama list, remove ones you're not using:

ollama rm qwen2.5:7b

Model output is poor quality

Try a bigger model (14B or 30B if your RAM allows it), or a higher-quality quantization (Q5_K_M instead of the default Q4_K_M):

ollama pull qwen2.5:7b-q5_K_M

Where to Go Next

Once you have Ollama working, the natural next steps:

For developers: Wire it into your editor with Continue.dev (above), then pair with Hey Eduardo for non-coding chat work. See the developer AI privacy guide for the full recommended stack.

For professional users: If you want a polished app that handles screen capture, voice input, document analysis, and the visual chat experience without configuring anything, Hey Eduardo is purpose-built for that.

For tinkerers: Build something with the Ollama API. It speaks OpenAI-compatible JSON, so most ChatGPT-API-based tutorials work with minor URL changes.


Part of our On-Device AI cluster: See the pillar guide for the full picture, the model recommendations, the hardware requirements, or our cost analysis.

Sources & Citations

  1. Ollama. Official documentation. ollama.com
  2. Continue.dev. Documentation and setup guide. continue.dev
  3. SitePoint. “Run Local LLMs 2026 Complete Developer Guide.” sitepoint.com
  4. Dev.to. “How to Run DeepSeek Locally in 2026.” dev.to