DiffusionGemma-26B Model Benchmark Report

YUI | 2026-06-11 21:23

## Model Overview

Google DiffusionGemma-26B-A4B-it is the first language model to adopt a diffusion architecture. Unlike conventional autoregressive generation, it produces text step by step through iterative denoising in latent space. In theory, this architecture can complete generation in very few steps, which greatly increases throughput.

**Deployment specification:**

- Model: google/diffusiongemma-26B-A4B-it
- Hardware: 1x NVIDIA H100 NVL (96 GB)
- Quantization: FP8
- Context: 262,144 tokens
- vLLM version: 0.22.1rc1.dev357
- Diffusion settings: entropy_bound=0.1, canvas_length=256
- Tool calling: supported (gemma4 parser)

## THU Benchmark v2 Results (100 questions)

### Accuracy (0 to 1.0)

| Category | Questions | Accuracy |
|------|------|--------|
| math | 6 | 0.79 |
| reasoning | 11 | 0.91 |
| code | 7 | 1.00 |
| debug | 4 | 0.88 |
| chinese | 6 | 0.67 |
| instruction | 6 | 0.92 |
| science | 5 | 1.00 |
| knowledge | 5 | 1.00 |
| devops | 5 | 1.00 |
| finance | 3 | 0.92 |
| security | 3 | 1.00 |
| ai_self | 3 | 1.00 |
| emerging_tech | 3 | 1.00 |
| taiwan_law | 13 | 0.63 |
| taiwan | 20 | 0.70 |
| **Total** | **100** | **0.84** |

### Throughput

| Category | tok/s |
|------|-------|
| math | 1,383 |
| code | 1,053 |
| reasoning | 932 |
| debug | 932 |
| science | 858 |
| devops | 699 |
| finance | 771 |
| instruction | 219 |
| chinese | 682 |
| knowledge | 665 |
| emerging_tech | 739 |
| ai_self | 761 |
| security | 705 |
| taiwan_law | 488 |
| taiwan | 436 |
| **Average** | **702** |

### Resource Efficiency

| Metric | Value |
|------|------|
| Average throughput | 702 tok/s |
| Throughput per GPU | 701.6 tok/s/GPU |
| Accuracy | 0.84 |

### English vs. Chinese

| Metric | English | Chinese | Ratio (Chinese / English) |
|------|------|------|------|
| Throughput | 836 tok/s | 491 tok/s | 0.59 |
| Accuracy | 0.94 | 0.67 | 0.71 |

## Comparison with Existing Platform Models

| Model | Accuracy | tok/s | tok/s/GPU | Hardware |
|------|--------|-------|-----------|------|
| Nemotron-3-Ultra | 0.90 | 294 | 73.5 | 4x B200 |
| Mistral-Small-4 | 0.88 | 164 | 82.0 | 2x H100 |
| Mistral-Medium-3.5 | 0.87 | 91 | 45.5 | 2x H100 |
| Nemotron-3-Super | 0.86 | 109 | 54.4 | 2x H100 |
| Gemma-4-31B | 0.86 | 40 | 20.0 | 2x L40S |
| Gemma-4-12B | 0.85 | 41 | 41.4 | 1x L40S |
| Llama-4-Scout | 0.85 | 70 | 69.8 | 1x L40S |
| **DiffusionGemma-26B** | **0.84** | **702** | **701.6** | **1x H100 NVL** |

## Tool Calling Tests

| Test | Result |
|----------|------|
| Single tool call | Pass |
| Parallel tool calls (weather for two cities at once) | Pass |
| Selective calling among multiple tools | Pass |
| Multi-turn tool result integration | Pass |
| Correctly skipping the tool when none is needed | Pass |

## Analysis

**Strengths:**

- Per-GPU throughput of 701.6 tok/s/GPU far exceeds every existing model (the runner-up, Mistral-Small-4, reaches 82.0).
- The diffusion architecture is extremely fast on math and code tasks (math: 1,383 tok/s).
- Tool calling is fully functional, with all 5 tests passing.
- 100% success rate, zero failures.

**Weaknesses:**

- Overall accuracy is 0.84, at the edge of the threshold and below every existing model.
- Chinese accuracy is only 0.67 and Taiwan law 0.63, the largest shortfalls.
- Chinese throughput is only 59% of English throughput, the same dense-architecture Chinese degradation seen in Mistral-Medium-3.5 (49%).
- Throughput in the instruction category is only 219 tok/s; generation slows markedly on long instructions.

**Conclusion:**

DiffusionGemma-26B is a forward-looking validation of the architecture and demonstrates the large potential of diffusion models for inference speed (about 8.5x the per-GPU efficiency of Mistral-Small-4). The current version still has room to improve in Chinese quality and overall accuracy. We recommend releasing it as an experimental model, suited to English workloads that need very low latency.

*THU LLM API Platform · Admin Assistant YUI · 2026-06-11*
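
The summary rows in the tables above can be cross-checked from the per-category figures. The sketch below assumes the totals are weighted by the number of questions in each category; the report does not state its aggregation method, so this is an inference, and the names `RESULTS` and `weighted_mean` are illustrative only. Under that assumption the figures reproduce the reported 0.84 accuracy and 701.6 tok/s, whereas an unweighted mean over the 15 categories gives roughly 755 tok/s.

```python
# Per-category figures copied from the accuracy and throughput tables:
# category -> (questions, accuracy, tokens per second)
RESULTS = {
    "math": (6, 0.79, 1383),
    "reasoning": (11, 0.91, 932),
    "code": (7, 1.00, 1053),
    "debug": (4, 0.88, 932),
    "chinese": (6, 0.67, 682),
    "instruction": (6, 0.92, 219),
    "science": (5, 1.00, 858),
    "knowledge": (5, 1.00, 665),
    "devops": (5, 1.00, 699),
    "finance": (3, 0.92, 771),
    "security": (3, 1.00, 705),
    "ai_self": (3, 1.00, 761),
    "emerging_tech": (3, 1.00, 739),
    "taiwan_law": (13, 0.63, 488),
    "taiwan": (20, 0.70, 436),
}


def weighted_mean(values, weights):
    """Mean of `values`, each weighted by the matching entry in `weights`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


counts = [n for n, _, _ in RESULTS.values()]
accuracies = [a for _, a, _ in RESULTS.values()]
speeds = [t for _, _, t in RESULTS.values()]

# Assumption: totals are weighted by question count (not stated in the report).
overall_accuracy = weighted_mean(accuracies, counts)
overall_throughput = weighted_mean(speeds, counts)

# Unweighted mean over categories, shown for contrast.
simple_throughput = sum(speeds) / len(speeds)

# Per-GPU ratio against the runner-up, using values from the comparison table.
per_gpu_ratio = 701.6 / 82.0

print(f"questions:            {sum(counts)}")
print(f"weighted accuracy:    {overall_accuracy:.2f}")
print(f"weighted throughput:  {overall_throughput:.1f} tok/s")
print(f"unweighted throughput:{simple_throughput:.0f} tok/s")
print(f"per-GPU ratio:        {per_gpu_ratio:.2f}x")
```

The per-GPU ratio comes out at about 8.56, in line with the roughly 8.5x figure quoted in the conclusion.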
