Macmini 下安装并使用本地大模型的文章已经满天飞了,记录一下,还是稍有不同的:

起点就是先安装homebrew

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

然后装一些必备工具

brew install uv
brew install procps

brew install htop

装编译工具

brew install cmake curl

编译llama.cpp

# Clone llama.cpp
git clone https://github.com/ggml-org/llama.cpp

# Configure build with Metal acceleration
cmake llama.cpp -B llama.cpp/build \
    -DBUILD_SHARED_LIBS=OFF \
    -DGGML_METAL=ON \
    -DGGML_CUDA=OFF

# Build
cmake --build llama.cpp/build \
    --config Release \
    -j$(sysctl -n hw.ncpu) \
    --clean-first \
    --target llama-cli llama-mtmd-cli llama-server llama-gguf-split

下载模型文件

# download model
curl -L -o models/Qwen3.5-9B-UD-Q4_K_KL.gguf \
"https://huggingface.co/unsloth/Qwen3.5-9B-MTP-GGUF/resolve/main/Qwen3.5-9B-UD-Q4_K_XL.gguf?download=true"

下载模板,注意:如果模型用来跑openclaw,需要下载agent兼容得模板,否则没用

mkdir templates && \
curl -o templates/qwen35.jinja \
"https://huggingface.co/froggeric/Qwen-Fixed-Chat-Templates/resolve/main/chat_template.jinja"

启动一下,测测

 ./llama.cpp/llama-server \
  -m models/Qwen3.5-9B-UD-Q4_K_XL.gguf \
  --chat-template-file templates/qwen35.jinja \
  --temp 0.7 \
  --top-p 0.9 \
  --top-k 20 \
  -c 64000 \
  -ngl 20 \
  --host 127.0.0.1 \
  --port 8080

建立mac系统的所谓systemd,真是臭长

sudo vi /Library/LaunchDaemons/com.openclaw.llama-server.plist

# 内容如下,注意替换掉 YOUR_USERNAME
<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">

<plist version="1.0">
<dict>

<key>Label</key>
<string>com.openclaw.llama-server</string>

<key>UserName</key>
<string>YOUR_USERNAME</string>

<key>ProgramArguments</key>
<array>
    <string>/Users/YOUR_USERNAME/llama.cpp/llama-server</string>

    <string>-m</string>
    <string>/Users/YOUR_USERNAME/models/Qwen3.5-9B-UD-Q4_K_XL.gguf</string>

    <string>--chat-template-file</string>
    <string>/Users/YOUR_USERNAME/templates/qwen35.jinja</string>

    <string>--temp</string>
    <string>0.7</string>

    <string>--top-p</string>
    <string>0.9</string>

    <string>--top-k</string>
    <string>20</string>

    <string>-c</string>
    <string>64000</string>

    <string>-ngl</string>
    <string>20</string>

    <string>--host</string>
    <string>127.0.0.1</string>

    <string>--port</string>
    <string>8080</string>
</array>

<key>WorkingDirectory</key>
<string>/Users/YOUR_USERNAME</string>

<key>RunAtLoad</key>
<true/>

<key>KeepAlive</key>
<true/>

<key>StandardOutPath</key>
<string>/tmp/llama-server.log</string>

<key>StandardErrorPath</key>
<string>/tmp/llama-server.err</string>

</dict>
</plist>

启动plist

sudo chown root:wheel /Library/LaunchDaemons/com.openclaw.llama-server.plist && \
sudo chmod 644 /Library/LaunchDaemons/com.openclaw.llama-server.plist && \
sudo launchctl bootstrap system /Library/LaunchDaemons/com.openclaw.llama-server.plist

编辑openclaw,在.openclaw/openclaw.json文件中新增models配置块

{
  "models": {
    "providers": {
      "local": {
        "baseUrl": "http://127.0.0.1:8080/v1",
        "apiKey": "sk-local",
        "api": "openai-completions",
        "models": [
          {
            "id": "qwen3-9b",
            "name": "Qwen3.5 9B Local",
            "contextWindow": 64000,
            "maxTokens": 8192
          }
        ]
      }
    /* REMOVE THIS COMMENT */
    /* you may add additional providers, like anthropic here */ 
    }
  }
}

设置agents的缺省模型:

"agents": {
    "defaults": {
      "model": {
        "primary": "local/qwen3-9b"
      },
      "models": {
        "local/qwen3-9b": {}
      }
 }

用龙虾测试一下:

openclaw config validate

openclaw gateway restart

openclaw models list --provider local

openclaw infer model run \
  --model local/qwen3-9b \
  --prompt "Reply with exactly: pong" \
  --json

做个skill来试试

mkdir -p ~/.openclaw/workspace/skills/python-calc

cat << 'EOF' > ~/.openclaw/workspace/skills/python-calc/SKILL.md
---
name: python-calc
description: A tool that evaluates mathematical expressions by executing a Python one-liner.
version: 1.0.0
---
## Instructions
1. Extract the exact mathematical expression the user wants to calculate.
2. Use your built-in shell tool to run this exact command, replacing `<expr>` with the expression: `python3 -c "print(<expr>)"`
3. Wait for the shell tool to return the stdout output.
4. You MUST generate a final conversational response to the user containing the exact numeric result returned by the script.
EOF

调用:

openclaw agent --local --agent main --verbose on --thinking high --message \
"Use the python-calc skill to calculate 8664 multiplied by 222. 
Do not use skill_workshop. Tell me the final answer."