27B-parameter model, a more advanced finetune of Gemma 3 27B for roleplay and storytelling.
Context: 32.8K
$0.30/1M / $0.30/1M
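Throughout this list, the two price figures appear to follow the usual input/output convention: cost per million prompt tokens, then per million completion tokens. A minimal cost estimate, assuming that convention (the function name and example token counts are illustrative, not from the source):

```python
def estimate_cost(input_tokens, output_tokens, in_price, out_price):
    """Estimate one request's cost in USD, given per-1M-token prices."""
    return (input_tokens / 1_000_000) * in_price + (output_tokens / 1_000_000) * out_price

# e.g. a 10K-token prompt and 2K-token completion at $0.30/1M each side:
cost = estimate_cost(10_000, 2_000, 0.30, 0.30)
print(f"${cost:.4f}")  # → $0.0036
```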
Llama 3.3 finetune for roleplaying.
Context: 65.5K
$0.31/1M / $0.31/1M
Llama 3.3 finetune for roleplaying, updated to v1.1 with improved reasoning.
Context: 131.1K
$0.31/1M / $0.31/1M
32B-parameter reasoning model from DeepCogito (Qwen backbone), with strong general reasoning and coding at a low price.
Context: 128K
$1.80/1M / $1.80/1M
Damascus-R1 builds on the Nevoria foundation with the DeepSeek R1 Distill base Hydroblated-R1-V3.
Context: 65.5K
$0.31/1M / $0.31/1M
Code-reasoning LLM fine-tuned from DeepSeek-R1-Distill-Qwen-14B. Scores 60.6% Pass@1 on LiveCodeBench v5.
Context: 128K
$0.15/1M / $0.15/1M
24B-parameter Mistral model fine-tuned by NousResearch for balanced reasoning and creativity.
Context: 128K
$0.30/1M / $0.30/1M
DeepSeek's March 2025 V3 model, optimized for general-purpose tasks. Quantized at FP8.
Context: 128K
$0.25/1M / $0.70/1M
DeepSeek's R1 thinking model; scores strongly across benchmarks at low cost.
Context: 128K
$0.40/1M / $1.70/1M
DeepSeek-R1, open source and rivaling OpenAI's o1.
Context: 64K
$0.40/1M / $1.70/1M
Hybrid model supporting thinking and non-thinking modes. Better at tool calling and agent tasks. Quantized at FP8.
Context: 128K
$0.20/1M / $0.70/1M
Latest update with language consistency improvements, stronger Code/Search Agents. FP8.
Context: 128K
$0.25/1M / $0.70/1M
Latest flagship model by DeepSeek. Far better performance, especially on longer contexts. FP8.
Context: 163.8K
$0.28/1M / $0.42/1M
Thinking version of DeepSeek's latest flagship model. FP8.
Context: 163.8K
$0.28/1M / $0.42/1M
The new (May 28th) DeepSeek R1 model with enhanced reasoning.
Context: 128K
$0.40/1M / $1.70/1M
Distilled version of R1 0528: far cheaper and faster, yet still highly performant.
Context: 128K
$0.10/1M / $0.20/1M
Uncensored version of the Deepseek R1 Llama 70B model.
Context: 16.4K
$0.70/1M / $0.70/1M
Uncensored version of the Deepseek R1 Qwen 32B model.
Context: 16.4K
$0.70/1M / $0.70/1M
Most uncensored model yet, built on Qwen's 72B model.
Context: 8.2K
$0.31/1M / $0.31/1M
RP/storywriting specialist model, full-parameter finetune of Llama-3.3-70B-Instruct.
Context: 16.4K
$2.01/1M / $2.01/1M
Latest flagship foundation model, purpose-built for agent-based applications. MoE with 355B total / 32B active parameters.
Context: 128K
$0.20/1M / $0.20/1M
106B total / 12B active parameter model for reasoning, coding, and agentic capabilities.
Context: 128K
$0.10/1M / $0.10/1M
Latest GLM series chat model with strong general performance. Quantized at FP8.
Context: 200K
$0.40/1M / $1.50/1M
Thinking version of the latest GLM series. Quantized at FP8.
Context: 200K
$0.40/1M / $1.50/1M
117B MoE model (5.1B active) optimized for reasoning, agentic, and production use. Runs on a single H100.
Context: 128K
$0.05/1M / $0.25/1M
21B parameter MoE model (3.6B active) optimized for lower-latency inference.
Context: 128K
$0.04/1M / $0.15/1M
Gemma 3 with 128K context window and multilingual support in 140+ languages.
Context: 128K
$0.30/1M / $0.30/1M
Advanced reasoning model built on Llama-3.1-405B with hybrid thinking modes.
Context: 128K
$0.30/1M / $1.20/1M
Efficient reasoning model based on Llama-3.1-70B with strong performance in math and code.
Context: 128K
$0.20/1M / $0.40/1M
1T total / 32B active MoE model with exceptional coding and agent capabilities.
Context: 256K
$0.40/1M / $2.00/1M
Abliterated (restrictions removed) version of Llama 3.1 70B.
Context: 8.2K
$0.70/1M / $0.70/1M
70B-parameter model from Sao10K, based on Llama 3.1 70B, for quality text generation.
Context: 20.5K
$0.31/1M / $0.36/1M
Optimized for multilingual dialogue, outperforms many open and closed chat models.
Context: 131.1K
$0.05/1M / $0.23/1M
17B active parameter model with 128 experts. Best multimodal model in its class; beats GPT-4o.
Context: 1M
$0.18/1M / $0.80/1M
17B active with 16 experts. Best multimodal in its class, with native support for a 10M-token context window.
Context: 328K
$0.09/1M / $0.46/1M
MiniMax's flagship model with a 1M token context window.
Context: 1M
$0.14/1M / $1.12/1M
Enhanced reasoning and strong general performance. Optimized for coding and agentic workflows.
Context: 200K
$0.17/1M / $1.53/1M
Nvidia's latest Llama fine-tune optimized for instruction following.
Context: 16.4K
$0.36/1M / $0.41/1M
Small multilingual model by Microsoft.
Context: 128K
$0.17/1M / $0.68/1M
235B model with 22B active parameters. Supports a thinking toggle via the /think command.
Context: 41K
$0.30/1M / $0.50/1M
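The /think toggle above is typically a soft switch appended to the user turn (with /no_think as its counterpart). A minimal sketch of building a chat message with the toggle, assuming an OpenAI-style message format; the helper name is illustrative:

```python
def with_think_toggle(prompt, thinking=True):
    """Append the Qwen3-style soft switch (/think or /no_think) to a user prompt."""
    return f"{prompt} {'/think' if thinking else '/no_think'}"

# Build a chat-completions-style message list with thinking enabled:
messages = [{"role": "user", "content": with_think_toggle("Solve 17 * 24.", thinking=True)}]
```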
32B model supporting both thinking and non-thinking modes.
Context: 41K
$0.10/1M / $0.30/1M
480B total (35B active) coding specialist; performs similarly to Claude 4 Sonnet in coding.
Context: 262K
$0.13/1M / $0.50/1M
Experimental reasoning model. Great at coding and math.
Context: 32.8K
$0.20/1M / $0.20/1M
Strong multilingual support and solid mathematics and coding; well suited to roleplay and chatbots.
Context: 131.1K
$0.36/1M / $0.41/1M