Available Models

Model Hub

Explore all available models and view detailed documentation and API usage.

API Call Method

Specify model ID in the model field

BAAI BGE-M3 多語嵌入模型，採用 dual-encoder 架構，支援超過 100 種語言的跨語言檢索與語義匹配。輸出 1024 維向量，適合向量資料庫、RAG 檢索pipeline、相似度搜尋與多語內容索引。規格：1024 維、multi-lingual、Cross-lingual、可搭配 bge-m3-reranker 二次排序提升品質。

bge-m3-embedding

多語

bge-m3-reranker

BAAI BGE-Reranker v2 M3 跨語言重排序模型（cross-encoder），在候選結果基礎上進行深度語義打分，顯著提升排序品質。適合長文本段落匹配、精準查詢與 RAG pipeline 最後一步的重新排序。規格：Cross-encoder、Multi-lingual、支援 query-doc 配對打分、可與 bge-m3-embedding 構成完整檢索流程：Embedding → 語意搜尋 → Rerank。

bge-m3-reranker

Cross-encoder

DiffusionGemma-26B

首款擴散架構語言模型，單 GPU 702 tok/s，262K 上下文，支援 Tool Calling

diffusiongemma-26b

262K Tool Calling

Gemma-4-31B-QAT

Gemma-4-31B QAT w4a16 量化版本，支援 Function Calling、Vision 及 262K 上下文，Prefix Caching 已啟用

vibecode

262K Vision Function Calling Prefix Caching

Gemma-4-31B-QAT

Gemma-4-31B QAT w4a16 量化版本，支援 Function Calling 與 Vision，262K 上下文，單 GPU 高效率推理

coder

262K Vision Function Calling

gpt-oss-120b (Mistral Small 4)

Mistral Small 4 119B MoE model. Legacy alias gpt-oss-120b, now pointing to Mistral Small 4 backend. Supports 262K context, Function Calling and Vision.

gpt-oss-120b

262K Vision Function Calling MoE

Llama-4-Scout-17B-16E-Instruct-FP8

NVIDIA 發布的 Llama-4 Scout 多模態推理模型，17B 活躍專家、16 專家路由、256K 超長上下文。支援文字與圖像理解，適合長文件分析、多輪對話、Agent 規劃與跨文件推理。規格：FP8 量化、256K 上下文、16 專家路由、Instruct-tuned。

llama4scout

256K FP8

Mistral Small 4 (119B)

Mistral Small 4 119B 參數 MoE 模型，採用 NVFP4 量化，支援 262K 上下文、Function Calling 與推理模式。已啟用 Prefix Caching 加速重複前綴請求的處理速度。自 2026 年 3 月起提供服務。

mistral-small-4

262K Function Calling NVFP4 MoE Prefix Caching

Mistral Small 4 (119B)

Mistral Small 4 119B MoE model with NVFP4 quantization, supporting 262K context, Function Calling and Vision. Alias: mistral-medium-35

mistral-medium-35

262K Vision Function Calling NVFP4 MoE

Nemotron 3 Ultra 550B

NVIDIA 旗艦級 MoE 推理模型，550B 總參數、55B 活躍參數（A55B），採用 NVFP4 量化在保持高品質的同時最大化吞吐量。512 個專家中選路 64 個活躍專家，支援 262K 超長上下文，適合高品質推理、長文件處理、RAG 檢索增強、複雜指令遵循與高併發場景。規格：550B MoE（A55B）、NVFP4 量化、262144 上下文、Prefix Caching、Function Calling、4x B200 GPU。Day-0 支援。

nemotron-3-ultra

262K Function Calling NVFP4 MoE Prefix Caching

vibe

高效能通用對話模型，以 NVIDIA Nemotron-3-Super-120B 為骨幹，採用 NVFP4 量化在保持品質的同時最大化 throughput。適合大量文字生成、客服機器人、內容摘要與多輪對話。規格：120B 參數、NVFP4 量化、吞吐量約 150 tokens/s、多語言支援。

vibe

NVFP4 多語

VibeVoice ASR

Speech recognition service supporting Chinese and English transcription with speaker identification. Based on Whisper architecture.

vibevoice-asr

whisper-large-v3

OpenAI Whisper Large v3 加速版，專精多語言語音辨識與長音訊處理，涵蓋中文方言、專業術語與嘈雜環境。適合會議記錄、podcast 字幕生成、語音分析與無障礙服務。規格：多語言支援、強制時間戳（timestamps）、Word-level timestamps 可選、翻譯模式。

whisper-large-v3

多語

No matching models found. Try a different keyword.