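While distilling models we need to download base models and data files from huggingface.co, so the download workflow deserves its own section.
Installation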
uv venv --python 3.12
source .venv/bin/activate
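# hf_transfer is opt-in: huggingface_hub releases that support it use it only when HF_HUB_ENABLE_HF_TRANSFER=1 is set
uv pip install huggingface_hub hf_transfer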
hf auth login --token hf_xxxxx
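# Download through a mirror for mainland China: prefix the command with HF_ENDPOINT, then append the usual repo and file arguments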
HF_ENDPOINT=https://hf-mirror.com hf download
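The same prefix can be wrapped in a small helper so the mirror applies to one command at a time. This is only a minimal sketch and not part of the hf CLI: the function name hf_mirror and the HF_MIRROR_URL override variable are made up for illustration. Only HF_ENDPOINT and the mirror URL come from the notes above.

```shell
# Hypothetical helper (not part of the hf CLI): run one command with
# HF_ENDPOINT pointing at a mirror, leaving the current shell untouched.
# HF_MIRROR_URL is an assumed, optional override for the mirror address.
hf_mirror() {
    env HF_ENDPOINT="${HF_MIRROR_URL:-https://hf-mirror.com}" "$@"
}

# Example (needs network access, so it is left commented out):
# hf_mirror hf download google/gemma-4-1b-it --local-dir ./gemma-4-1b-it
```

Because the variable is set through env for a single command, later hf calls in the same shell still go to huggingface.co unless they are also wrapped.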
# Download every file in a repo into a local directory
hf download google/gemma-4-1b-it --local-dir ./gemma-4-1b-it
hf download google/translategemma-4b-it --local-dir ./translategemma-4b
# Download several files without specifying a target directory
# The files land in the Hugging Face cache:
# ~/.cache/huggingface/hub/models--lmstudio-community--Qwen3.5-9B-GGUF/snapshots/1379f25c6b505a3fc737bd7818cb09389cf807c1/Qwen3.5-9B-Q4_K_M.gguf
# ~/.cache/huggingface/hub/models--lmstudio-community--Qwen3.5-9B-GGUF/snapshots/1379f25c6b505a3fc737bd7818cb09389cf807c1/mmproj-Qwen3.5-9B-BF16.gguf
hf download lmstudio-community/Qwen3.5-9B-GGUF Qwen3.5-9B-Q4_K_M.gguf mmproj-Qwen3.5-9B-BF16.gguf --revision main
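The cache paths listed above follow a visible naming scheme: the repo type in plural, then the repo ID with "/" replaced by "--". The sketch below rebuilds that directory name so the downloaded files are easier to find. The function name hf_cache_dir is hypothetical, and the sketch assumes the cache root is taken from HF_HUB_CACHE, then HF_HOME/hub, then ~/.cache/huggingface/hub (it ignores XDG_CACHE_HOME). The snapshot hash cannot be derived this way, so look inside the snapshots/ subdirectory for it.

```shell
# Hypothetical helper: print the cache directory for a repo.
# Usage: hf_cache_dir <repo_id> [model|dataset|space]
hf_cache_dir() {
    repo_id=$1
    repo_type=${2:-model}
    # Assumed lookup order: HF_HUB_CACHE, then $HF_HOME/hub, then the default.
    cache_root=${HF_HUB_CACHE:-${HF_HOME:-$HOME/.cache/huggingface}/hub}
    # "org/name" becomes "org--name"
    folder=$(printf '%s' "$repo_id" | sed 's|/|--|g')
    # The repo type is pluralised: models--, datasets--, spaces--
    printf '%s/%ss--%s\n' "$cache_root" "$repo_type" "$folder"
}

# List the snapshot directories of the repo downloaded above, if present.
ls "$(hf_cache_dir lmstudio-community/Qwen3.5-9B-GGUF)/snapshots" 2>/dev/null || true
```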
# Download several files into a specified directory
uv tool run hf download facebook/m2m100_418M config.json vocab.json sentencepiece.bpe.model special_tokens_map.json tokenizer_config.json pytorch_model.bin --local-dir Translate/m2m100
# Download a single file into a specified directory
hf download Jackrong/Qwopus3.5-9B-v3-GGUF --local-dir Jackrong/Qwopus3.5-9B-v3-GGUF Qwopus3.5-9B-v3.Q4_K_M.gguf
# Download an uncensored Gemma 4 build
uv tool run hf download HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive Gemma-4-E4B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf mmproj-Gemma-4-E4B-Uncensored-HauhauCS-Aggressive-f16.gguf
# Download all Q4 quantized files plus the multimodal projector (mmproj)
hf download unsloth/gemma-4-26B-A4B-it-GGUF \
--local-dir unsloth/gemma-4-26B-A4B-it-GGUF \
--include "*mmproj-BF16*" \
--include "*UD-Q4_K_XL*" # for the dynamic 2-bit quant, use "*UD-Q2_K_XL*"
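To preview which files a set of --include patterns would select, the glob matching can be approximated locally. This is a rough sketch under stated assumptions: the hf CLI matches patterns with Python's fnmatch rules, while the shell `case` statement used here behaves similarly for simple `*` wildcards but is not guaranteed to be identical. The function name matches_any and the file names in the example are hypothetical, not taken from the repo listing.

```shell
# Hypothetical helper: succeed if the file name matches any glob pattern.
# Usage: matches_any <file_name> <pattern>...
matches_any() {
    name=$1
    shift
    for pattern in "$@"; do
        case $name in
            # Unquoted on purpose so the value is treated as a glob pattern.
            $pattern) return 0 ;;
        esac
    done
    return 1
}

# Made-up file names, checked against the two patterns used above.
for f in model-UD-Q4_K_XL.gguf model-UD-Q2_K_XL.gguf mmproj-BF16.gguf; do
    if matches_any "$f" "*mmproj-BF16*" "*UD-Q4_K_XL*"; then
        echo "would download: $f"
    fi
done
```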
Some very good training datasets:
https://huggingface.co/datasets/nohurry/Opus-4.6-Reasoning-3000x-filtered
Opus 4.6 data
https://huggingface.co/datasets/TeichAI/claude-4.5-opus-high-reasoning-250x
Opus 4.5 data
https://huggingface.co/datasets/Jackrong/Qwen3.5-reasoning-700x
Qwen 3.5 data