In this post, we will explore how to run a local Large Language Model (LLM) on macOS using Docker and Ollama. This setup allows you to leverage the power of LLMs without needing extensive hardware resources.

First, why run an LLM locally?

  • Privacy: Your data remains on your machine, reducing the risk of data leaks.
  • Cost: You avoid the cloud costs associated with API calls.
  • Speed: For small-scale use, local inference can be faster than making API calls.

Prerequisites

Everything I’m showing here was tested on macOS 15.x with an M4 Pro chip, but it should work on other versions as well.

  • Homebrew: Ensure you have Homebrew installed. If not, you can install it by running the following command in your terminal:

    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
    
  • Docker: Ensure you have Docker installed and running on your Mac. You can download it from Docker’s official site.

    • Docker Compose: It ships with Docker Desktop; you can confirm everything is in place with the quick check after this list.
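
A quick check that the tooling is available (version numbers will vary, and depending on your Docker setup the Compose command may be docker compose or docker-compose):

brew --version
docker --version
docker compose version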

Get started

First of all, at the time of writing, Docker for macOS does not support GPU acceleration. Because I want to leverage the GPU capabilities of my M4 chip, I’ll be running the Ollama server outside of Docker and using Docker only for OpenWebUI, a web-based interface for interacting with LLMs.

Install Ollama

What is Ollama? You can think of Ollama as a package manager for LLMs: it lets you run and manage models locally, and provides a simple command-line interface to download, run, and interact with them.

To install Ollama, you can use Homebrew. Open your terminal and run the following command:

brew install ollama

Once installed, you can verify the installation by running:

ollama --version

This should display the version of Ollama you have installed.

To verify that Ollama is running, you can check the status of the service:

brew services info ollama

You should see that the service is running, similar to the output below:

ollama (homebrew.mxcl.ollama)
Running: ✔
Loaded: ✔
Schedulable: ✘
User: ferbass
PID: 75692

If the service is not running, you can start it with:

brew services start ollama
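
If you prefer to run the server in the foreground rather than as a background service, you can also start it manually with the command below; it listens on port 11434 by default and stops when you close the terminal.

ollama serve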

Ok, now that you have Ollama installed and running, we can download our first model to test. For this example I’ll be using qwen3:4b, a 4-billion-parameter model that should run smoothly on M4 chips.

To download the model, run the following command:

ollama run qwen3:4b

This command will download the Qwen3 model and run it right away. Alternatively, you can use the ollama pull command to download the model without running it immediately.
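
For example, to download the model without starting a chat session and confirm it is available locally:

ollama pull qwen3:4b
ollama list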

After downloading and running the model, you should see an input prompt in your terminal where you can interact with it. Go ahead and try a few prompts to see how it responds. For example, you can ask it:

❯ ollama run qwen3:4b
>>> What is the capital of France?
Thinking...
Okay, the user is asking for the capital of France. Let me start by recalling what I know. France is a country in Europe, and I remember that its capital is Paris. But wait, I should
make sure I'm not confusing it with another city. Let me think... Paris is definitely the capital. I think it's also the largest city in France. But maybe I should double-check.
Sometimes people might confuse other major cities like Lyon or Marseille, but those are not the capitals. The capital is Paris, right? Yes, that's correct. So the answer is Paris. I
should present that clearly.
...done thinking.

The capital of France is **Paris**.

>>>
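
To leave the interactive prompt, type /bye (or press Ctrl+D).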

Cool, now that we have Ollama running and a model downloaded, we can move on to the next step: setting up OpenWebUI.

Install OpenWebUI

OpenWebUI is a self-hosted, open-source chat interface for LLMs. It gives you a web-based experience similar to ChatGPT, but fully local, private, and customizable. It’s designed to work out-of-the-box with backends like Ollama, LM Studio, or OpenAI-compatible APIs, making it perfect for macOS setups running LLMs natively. I won’t go into detail about OpenWebUI in this post, but you can check their GitHub repository for more information.

To run OpenWebUI, we will use Docker. Make sure Docker is running, then create a docker-compose.yml file with the following content:

services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    volumes:
      - ${HOME}/open-webui-data:/app/backend/data
    restart: unless-stopped

Then, from the directory where you saved the file, run:

docker-compose up -d

This command will start the OpenWebUI service in detached mode.
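
To confirm the container came up cleanly, you can check its status and tail its logs (the container name open-webui comes from the compose file above):

docker ps
docker logs -f open-webui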

If everything goes well, you should be able to access http://localhost:3000 and interact with OpenWebUI in your web browser using the model you downloaded earlier.
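
If OpenWebUI loads but no models show up, a quick sanity check is to confirm the Ollama API is reachable from the host; this should return a JSON list of the models you have pulled:

curl http://localhost:11434/api/tags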

What’s next

Now that you have a local LLM running with Ollama and OpenWebUI, I encourage you to explore more models and settings and see which one works best for you. You can find more models in the Ollama model library.

So far, in my tests on a MacBook Pro with an M4 Pro chip and 48GB of RAM, I have been able to run models of up to 8B parameters without any issues.
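
If you want to try a larger model, you can pull one and, while it is loaded, check its memory footprint with ollama ps from another terminal (the 8B tag below is just an example; check the Ollama library for current tags):

ollama pull llama3.1:8b
ollama run llama3.1:8b
ollama ps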

Try out different models and settings, and play around with the Knowledge Base, Tools, and Functions features, which I will explore in future posts.

Thanks