Local LLM Setup Guide
Set up a local AI engine for Privy apps. This guide covers
Ollama, LM Studio, and
Rapid-MLX, using Google Gemma 3 4B
(gemma3:4b) or a comparable local model as an example.
Use Ollama or LM Studio for broad platform support, and Rapid-MLX
when you want an OpenAI-compatible local server optimized for
Apple Silicon Macs.
⚡ One-Click Automated Installer (macOS)
Interactive installer for Ollama, LM Studio, or Rapid-MLX on macOS, with 16GB-friendly model picks (4B–7B). Run in Terminal:
curl -fsSL https://privy.kenelite.com/engine/install.sh | bash
Engines, model scenarios, API endpoints, and integration examples are on the Quick Start guide.
1. Ollama
Ollama provides a CLI and local API. Default port is 11434.
1.1 Install
- macOS: Download from ollama.com/download/mac; the app auto-updates.
- Windows: Download OllamaSetup.exe from ollama.com/download, or run in PowerShell:
irm https://ollama.com/install.ps1 | iex
Requires Windows 10 or later.
- Linux: Run in a terminal:
curl -fsSL https://ollama.com/install.sh | sh
1.2 Pull a model (e.g. Gemma 3 4B)
In Ollama this model is named gemma3:4b (Hugging Face: google/gemma-3-4b).
ollama pull gemma3:4b
Then run a chat:
ollama run gemma3:4b
1.3 Allow LAN access
By default Ollama listens on 127.0.0.1. To allow other devices on your network, set OLLAMA_HOST=0.0.0.0.
- One-off (current terminal):
OLLAMA_HOST=0.0.0.0 ollama serve
If the Ollama app is already running, quit it first, then run the command above in a terminal.
- macOS (persistent): Edit Ollama’s launchd plist (e.g. under ~/Library/LaunchAgents/ or Homebrew’s plist) and add inside the <dict>:
<key>EnvironmentVariables</key>
<dict>
  <key>OLLAMA_HOST</key>
  <string>0.0.0.0</string>
</dict>
Restart Ollama after saving. Alternatively, skip editing the plist and run OLLAMA_HOST=0.0.0.0 ollama serve in a terminal when you need LAN access.
- Windows: Add a user or system environment variable OLLAMA_HOST=0.0.0.0, then restart the Ollama app/service.
- Linux (systemd): Edit /etc/systemd/system/ollama.service (or equivalent), add under [Service]:
Environment="OLLAMA_HOST=0.0.0.0"
Then run:
sudo systemctl daemon-reload
sudo systemctl restart ollama
For browser or cross-origin clients you may also set OLLAMA_ORIGINS=* (recommended only on a trusted LAN).
Other devices on the LAN can then use http://<your-machine-IP>:11434 (e.g. http://192.168.1.100:11434).
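To confirm from another device that the LAN setting took effect, a quick probe of the server can help. The sketch below is a minimal example, assuming curl is installed; the function names ollama_base_url and ollama_reachable are illustrative and not part of Ollama. It uses GET /api/tags, the Ollama endpoint that lists local models, as a cheap health check.

```shell
#!/usr/bin/env bash
# Build the Ollama base URL for a host (11434 is Ollama's default port).
# Function names here are illustrative helpers, not Ollama commands.
ollama_base_url() {
  local host="${1:-127.0.0.1}"
  local port="${2:-11434}"
  printf 'http://%s:%s\n' "$host" "$port"
}

# Return 0 if an Ollama server answers at the given host, non-zero otherwise.
# GET /api/tags lists the models available on that server.
ollama_reachable() {
  curl -fsS --max-time 5 "$(ollama_base_url "$@")/api/tags" >/dev/null 2>&1
}

# 192.168.1.100 is the example address from the text; substitute your own.
ollama_base_url 192.168.1.100
```

Calling ollama_reachable 192.168.1.100 from a second machine returns success only if the server is listening on the LAN address and no firewall blocks port 11434.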
2. LM Studio
LM Studio offers a GUI and an OpenAI-compatible local API. It includes the lms CLI for downloading models, loading them, and running the server from the terminal. See LM Studio CLI docs for the full reference.
2.1 Install
- macOS: Download from lmstudio.ai/download (Apple Silicon only), or:
curl -fsSL https://lmstudio.ai/install.sh | bash
- Windows: Download the installer from the same page, or PowerShell:
irm https://lmstudio.ai/install.ps1 | iex
- Linux: Download the AppImage or use the install script from the official site.
16GB+ RAM is recommended; on Windows, 4GB+ dedicated VRAM is recommended. You must run LM Studio at least once before the lms CLI is available.
2.2 Download and load a model (e.g. Gemma 3 4B)
GUI: Open LM Studio, search for Gemma 3 4B or google/gemma-3-4b in the discovery view, choose a quantization (e.g. Q4_K_M), and download. Then load the model in the Local Server / Developer tab.
CLI: Use lms get to search and download models, lms ls to list models on disk, and lms load to load a model (e.g. with --gpu=max or --context-length=8192). Example:
lms get google/gemma-3-4b
lms load google/gemma-3-4b --identifier="gemma3-4b"
Start the server with lms server start; stop it with lms server stop. Custom port: lms server start --port 3000. For web or cross-origin clients, add --cors (use only on a trusted network).
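Once the model is loaded and the server is running, a request to the OpenAI-compatible chat endpoint is a simple way to check the setup end to end. The following is a rough sketch, assuming curl and sed are available and that the server listens on port 1234. The helper names chat_payload and lmstudio_chat and the LMSTUDIO_BASE_URL variable are hypothetical, introduced only for this example; gemma3-4b is the identifier set with --identifier above.

```shell
#!/usr/bin/env bash
# Build an OpenAI-style chat request body for a single user message.
# Backslashes and double quotes in the prompt are escaped; the prompt is
# assumed to be a single line (newlines are not escaped here).
chat_payload() {
  local model="$1"
  local prompt
  prompt=$(printf '%s' "$2" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}]}\n' "$model" "$prompt"
}

# POST the body to LM Studio's OpenAI-compatible chat endpoint.
# LMSTUDIO_BASE_URL is a hypothetical override; 1234 is the usual default port.
lmstudio_chat() {
  local base_url="${LMSTUDIO_BASE_URL:-http://localhost:1234/v1}"
  curl -fsS "$base_url/chat/completions" \
    -H "Content-Type: application/json" \
    -d "$(chat_payload "$1" "$2")"
}

# Print the request body without contacting the server.
chat_payload gemma3-4b "Say hello"
```

With the server started, lmstudio_chat gemma3-4b "Say hello" sends the request. Keeping the payload builder separate makes it possible to inspect the JSON before anything goes over the network.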
2.3 Allow LAN access
- GUI: In LM Studio’s server settings, enable “Serve on Local Network”. The server will bind to your machine’s LAN IP so other devices on the same network can reach it. See Serve on Local Network.
- CLI: Bind to all interfaces so the server is reachable on the LAN:
lms server start --bind 0.0.0.0
Or set the environment variable LMS_SERVER_HOST=0.0.0.0 before starting the server.
Default port is usually 1234 (or the last used port). Use http://<your-machine-IP>:1234 as the API base URL from other devices on the LAN.
3. Rapid-MLX
Rapid-MLX
is a local AI engine for Apple Silicon Macs. It exposes an
OpenAI-compatible API, so apps that support a custom OpenAI base
URL can point to http://localhost:8000/v1. Use it
when you are running Privy apps on or near a Mac with Apple Silicon
and want a fast local model server.
3.1 Install
- macOS Apple Silicon: Homebrew is the recommended install path:
brew install raullenchai/rapid-mlx/rapid-mlx
- pip: Requires Python 3.10 or later:
pip install rapid-mlx
- One-line installer: Auto-setup script from the Rapid-MLX project:
curl -fsSL https://raullenchai.github.io/Rapid-MLX/install.sh | bash
3.2 Serve a model
Rapid-MLX uses short model aliases. A practical starting point on
a 16 GB Apple Silicon Mac is qwen3.5-4b; run
rapid-mlx models to list available aliases.
rapid-mlx serve qwen3.5-4b
The first run downloads the model. When the server is ready, use
http://localhost:8000/v1 as the OpenAI-compatible
base URL, with default as the model name.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"default","messages":[{"role":"user","content":"Say hello"}]}'
3.3 LAN access
Rapid-MLX serves on port 8000 by default and its
server flag reference lists --host with default
0.0.0.0. On a trusted LAN, other devices can use
http://<your-machine-IP>:8000/v1 if the macOS
firewall allows inbound access.
For a stricter local-only setup, bind to localhost:
rapid-mlx serve qwen3.5-4b --host 127.0.0.1 --port 8000
If you expose Rapid-MLX beyond your own Mac, consider setting an API key and limiting access to trusted devices only.
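Because the first run downloads the model (section 3.2), a script that starts the server and calls it straight away may fail. One possible approach is to poll until the server answers. The sketch below assumes curl is installed; wait_for_server is an illustrative helper, not a Rapid-MLX command. It also assumes the server exposes the OpenAI-style GET /v1/models route, which this guide does not confirm.

```shell
#!/usr/bin/env bash
# Poll an HTTP endpoint until it answers or the attempts run out.
# Arguments: URL, number of attempts (default 30), seconds between attempts (default 2).
# wait_for_server is an illustrative helper, not part of Rapid-MLX.
wait_for_server() {
  local url="$1"
  local attempts="${2:-30}"
  local delay="${3:-2}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl fail on HTTP errors; --max-time bounds each probe.
    if curl -fsS --max-time 5 "$url" >/dev/null 2>&1; then
      return 0
    fi
    # Pause between probes, but not after the final one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, wait_for_server http://localhost:8000/v1/models 60 5 waits up to about five minutes. If that route turns out not to exist on your version, point the helper at any URL the server is known to answer.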
4. Summary
| Item | Ollama | LM Studio | Rapid-MLX |
|---|---|---|---|
| Example model | gemma3:4b | Gemma 3 4B (Hugging Face) | qwen3.5-4b or another Rapid-MLX alias |
| Pull / download | ollama pull gemma3:4b | GUI or lms get; load with lms load | rapid-mlx serve qwen3.5-4b |
| Default port | 11434 | 1234 | 8000 |
| LAN access | OLLAMA_HOST=0.0.0.0 | Enable “Serve on Local Network” or --bind 0.0.0.0 | Default host is 0.0.0.0; use --host 127.0.0.1 for local-only |
| Best fit | Simple cross-platform local LLM server | Desktop GUI with OpenAI-compatible local API | Fast Apple Silicon OpenAI-compatible server |
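The default ports in the table can be folded into a small lookup when a script needs to switch between engines. This is only a sketch: the function name engine_base_url and the lowercase engine keys are made up for this example. It relies on all three engines serving an OpenAI-compatible API under /v1 on their default ports.

```shell
#!/usr/bin/env bash
# Map an engine name to its OpenAI-compatible base URL, using the default
# ports from the summary table. The function name and the engine keys are
# illustrative, not part of any of the three tools.
engine_base_url() {
  local engine="$1"
  local host="${2:-localhost}"
  case "$engine" in
    ollama)    printf 'http://%s:11434/v1\n' "$host" ;;
    lmstudio)  printf 'http://%s:1234/v1\n' "$host" ;;
    rapid-mlx) printf 'http://%s:8000/v1\n' "$host" ;;
    *)         printf 'unknown engine: %s\n' "$engine" >&2; return 1 ;;
  esac
}

engine_base_url ollama                  # local Ollama
engine_base_url lmstudio 192.168.1.100  # LM Studio on another LAN machine
```

If a server was started on a custom port (for example with lms server start --port 3000), the defaults above no longer apply and the URL needs to be set by hand.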
Related links
- Ollama — Run LLMs locally
- Ollama: gemma3:4b
- LM Studio — Desktop local LLM and API
- LM Studio: lms CLI — Command reference (lms get, lms load, lms server start, etc.)
- LM Studio: lms server start — Port, CORS, and server options
- LM Studio: Serve on Local Network
- Rapid-MLX — Apple Silicon local AI engine with OpenAI-compatible API
This page is a public LLM setup reference from the Privy product site, for use with PrivyPDF, PrivyFeed, PrivaTranslate, PrivyApiStudio, and other apps that use a local LLM.