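There are already plenty of articles about installing and running a local LLM on a Mac mini, but my setup differs slightly, so here are my notes:
The starting point is installing Homebrew: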
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
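On Apple Silicon the installer places Homebrew under /opt/homebrew, which is not on PATH in a fresh shell, so `brew` may not be found right after installation. A minimal sketch for loading it into the current shell; the helper name `brew_prefix_for` is made up for illustration:

```shell
# Pick the default Homebrew prefix for a given CPU architecture.
# arm64 (Apple Silicon) uses /opt/homebrew; everything else is assumed to use /usr/local.
brew_prefix_for() {
  case "$1" in
    arm64) echo /opt/homebrew ;;
    *)     echo /usr/local ;;
  esac
}

prefix="$(brew_prefix_for "$(uname -m)")"

# Only load the environment if brew is actually installed there.
if [ -x "$prefix/bin/brew" ]; then
  eval "$("$prefix/bin/brew" shellenv)"
fi
```

To make this permanent, the `eval` line can go into ~/.zprofile.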
Then install a few essential tools:
brew install uv
brew install procps
brew install htop
Install the build tools:
brew install cmake curl
Build llama.cpp:
# Clone llama.cpp
git clone https://github.com/ggml-org/llama.cpp
# Configure build with Metal acceleration
cmake llama.cpp -B llama.cpp/build \
-DBUILD_SHARED_LIBS=OFF \
-DGGML_METAL=ON \
-DGGML_CUDA=OFF
# Build
cmake --build llama.cpp/build \
--config Release \
-j$(sysctl -n hw.ncpu) \
--clean-first \
--target llama-cli llama-mtmd-cli llama-server llama-gguf-split
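Before moving on, it can help to confirm that all four targets were actually built. A small sketch, assuming the binaries land in llama.cpp/build/bin (the usual output directory for this CMake setup); `check_binaries` is a hypothetical helper:

```shell
# Print the name of every expected binary that is missing (or not executable)
# in the given directory. Returns non-zero if at least one is missing.
check_binaries() {
  dir="$1"; shift
  missing=0
  for name in "$@"; do
    if [ ! -x "$dir/$name" ]; then
      echo "$name"
      missing=1
    fi
  done
  return "$missing"
}

# Assumed output directory of the build above.
check_binaries llama.cpp/build/bin \
  llama-cli llama-mtmd-cli llama-server llama-gguf-split \
  || echo "some targets were not built" >&2
```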
Download the model file:
# Download the model (create the target directory first)
mkdir -p models && \
curl -L -o models/Qwen3.5-9B-UD-Q4_K_XL.gguf \
"https://huggingface.co/unsloth/Qwen3.5-9B-MTP-GGUF/resolve/main/Qwen3.5-9B-UD-Q4_K_XL.gguf?download=true"
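Download the chat template. Note: if the model will be used with OpenClaw, you need an agent-compatible template; otherwise it will not work.
mkdir -p templates && \
curl -L -o templates/qwen35.jinja \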
"https://huggingface.co/froggeric/Qwen-Fixed-Chat-Templates/resolve/main/chat_template.jinja"
Start it once as a quick test (the build above leaves the binaries in llama.cpp/build/bin):
./llama.cpp/build/bin/llama-server \
-m models/Qwen3.5-9B-UD-Q4_K_XL.gguf \
--chat-template-file templates/qwen35.jinja \
--temp 0.7 \
--top-p 0.9 \
--top-k 20 \
-c 64000 \
-ngl 20 \
--host 127.0.0.1 \
--port 8080
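If the server does not come up, two things are worth checking: that the downloaded file really is a GGUF model (a failed download can leave an HTML error page behind), and that the HTTP endpoint answers. A rough sketch using llama-server's `/health` and OpenAI-compatible `/v1/chat/completions` endpoints; the function names and the `BASE_URL` and `WAIT_SECONDS` variables are my own:

```shell
BASE_URL="${BASE_URL:-http://127.0.0.1:8080}"

# Succeeds only if the file exists and starts with the 4-byte magic "GGUF".
is_gguf() {
  [ -f "$1" ] && [ "$(head -c 4 "$1")" = "GGUF" ]
}

# Poll /health until it returns success or the attempts run out.
# The server answers with an error status while the model is still loading.
wait_for_server() {
  tries="${1:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -fsS "$BASE_URL/health" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "${WAIT_SECONDS:-2}"
  done
  return 1
}

# Send one user message and print the raw JSON reply.
# The message is pasted into the JSON as-is, so avoid quotes and backslashes in it.
chat_once() {
  curl -fsS "$BASE_URL/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d "{\"messages\":[{\"role\":\"user\",\"content\":\"$1\"}]}"
}
```

Typical use, from a second terminal: `is_gguf models/Qwen3.5-9B-UD-Q4_K_XL.gguf && wait_for_server 30 && chat_once "Reply with exactly: pong"`.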
Create the macOS counterpart of a systemd unit (a launchd plist); it really is long-winded:
sudo vi /Library/LaunchDaemons/com.openclaw.llama-server.plist
# Contents below; remember to replace YOUR_USERNAME
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>com.openclaw.llama-server</string>
<key>UserName</key>
<string>YOUR_USERNAME</string>
<key>ProgramArguments</key>
<array>
<string>/Users/YOUR_USERNAME/llama.cpp/build/bin/llama-server</string>
<string>-m</string>
<string>/Users/YOUR_USERNAME/models/Qwen3.5-9B-UD-Q4_K_XL.gguf</string>
<string>--chat-template-file</string>
<string>/Users/YOUR_USERNAME/templates/qwen35.jinja</string>
<string>--temp</string>
<string>0.7</string>
<string>--top-p</string>
<string>0.9</string>
<string>--top-k</string>
<string>20</string>
<string>-c</string>
<string>64000</string>
<string>-ngl</string>
<string>20</string>
<string>--host</string>
<string>127.0.0.1</string>
<string>--port</string>
<string>8080</string>
</array>
<key>WorkingDirectory</key>
<string>/Users/YOUR_USERNAME</string>
<key>RunAtLoad</key>
<true/>
<key>KeepAlive</key>
<true/>
<key>StandardOutPath</key>
<string>/tmp/llama-server.log</string>
<key>StandardErrorPath</key>
<string>/tmp/llama-server.err</string>
</dict>
</plist>
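Replacing YOUR_USERNAME by hand in five places is easy to get wrong. One possible shortcut, assuming the XML above is first saved as a template file (the file name in the usage note and the `render_plist` helper are hypothetical):

```shell
# Print the template with every YOUR_USERNAME placeholder replaced.
# The second argument is optional and defaults to the current login name.
render_plist() {
  template="$1"
  user="${2:-$(id -un)}"
  sed "s/YOUR_USERNAME/$user/g" "$template"
}
```

For example: `render_plist llama-server.plist.template | sudo tee /Library/LaunchDaemons/com.openclaw.llama-server.plist >/dev/null`, followed by `plutil -lint` on the result to catch XML mistakes.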
Load the plist:
sudo chown root:wheel /Library/LaunchDaemons/com.openclaw.llama-server.plist && \
sudo chmod 644 /Library/LaunchDaemons/com.openclaw.llama-server.plist && \
sudo launchctl bootstrap system /Library/LaunchDaemons/com.openclaw.llama-server.plist
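Once the daemon is loaded, a few wrappers around `launchctl` make it easier to see whether it is running and to reload it after editing the plist. This is only a sketch; the function names are mine, and the label and log paths are taken from the plist above:

```shell
LABEL="com.openclaw.llama-server"
PLIST="/Library/LaunchDaemons/$LABEL.plist"

# Show launchd's view of the service (state, PID, last exit status).
service_status() {
  sudo launchctl print "system/$LABEL"
}

# Unload the daemon if it is loaded, then load it again.
service_reload() {
  sudo launchctl bootout "system/$LABEL" 2>/dev/null || true
  sudo launchctl bootstrap system "$PLIST"
}

# Show the last lines of both log files named in the plist.
service_logs() {
  tail -n "${1:-20}" /tmp/llama-server.log /tmp/llama-server.err
}
```

Since KeepAlive is set, launchd restarts the server whenever it exits, so a crash loop usually shows up first in `service_logs`.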
Edit the OpenClaw configuration: add a models block to .openclaw/openclaw.json:
{
"models": {
"providers": {
"local": {
"baseUrl": "http://127.0.0.1:8080/v1",
"apiKey": "sk-local",
"api": "openai-completions",
"models": [
{
"id": "qwen3-9b",
"name": "Qwen3.5 9B Local",
"contextWindow": 64000,
"maxTokens": 8192
}
]
}
/* REMOVE THIS COMMENT */
/* you may add additional providers, like anthropic here */
}
}
}
Set the default model for agents:
"agents": {
"defaults": {
"model": {
"primary": "local/qwen3-9b"
},
"models": {
"local/qwen3-9b": {}
}
}
}
Test it with OpenClaw (nicknamed "the lobster"):
openclaw config validate
openclaw gateway restart
openclaw models list --provider local
openclaw infer model run \
--model local/qwen3-9b \
--prompt "Reply with exactly: pong" \
--json
Create a skill to try it out:
mkdir -p ~/.openclaw/workspace/skills/python-calc
cat << 'EOF' > ~/.openclaw/workspace/skills/python-calc/SKILL.md
---
name: python-calc
description: A tool that evaluates mathematical expressions by executing a Python one-liner.
version: 1.0.0
---
## Instructions
1. Extract the exact mathematical expression the user wants to calculate.
2. Use your built-in shell tool to run this exact command, replacing `<expr>` with the expression: `python3 -c "print(<expr>)"`
3. Wait for the shell tool to return the stdout output.
4. You MUST generate a final conversational response to the user containing the exact numeric result returned by the script.
EOF
Invoke it:
openclaw agent --local --agent main --verbose on --thinking high --message \
"Use the python-calc skill to calculate 8664 multiplied by 222.
Do not use skill_workshop. Tell me the final answer."
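To judge the agent's answer, it helps to know the expected result up front. The sketch below runs the same one-liner the skill instructs the model to use; `calc` is just a convenience wrapper of my own:

```shell
# Evaluate an arithmetic expression the same way the skill's one-liner does.
# The expression is pasted straight into Python, so only pass trusted input.
calc() {
  python3 -c "print($1)"
}

calc "8664 * 222"   # prints 1923408
```

If the agent reports anything other than 1923408, it most likely did the multiplication itself instead of calling the shell tool.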