ID: soob3123/amoral-gemma3-27B-v2
Context: 32.8K tokens
27B parameter model, an advanced variant of Gemma 3 27B tuned for roleplay and storytelling.
Input: $0.30/1M, Output: $0.30/1M
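Per-token pricing like the above converts directly into a per-request cost estimate. A minimal sketch (the token counts in the example are illustrative; the rates are the $0.30/1M figures listed for this model):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Estimate request cost in dollars; rates are $ per 1M tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Example: 8,000 prompt tokens and 1,000 completion tokens
# at the $0.30/1M input and output rates listed above.
cost = request_cost(8_000, 1_000, 0.30, 0.30)  # 0.0027 dollars
```

The same function applies to every entry below; only the two rates change.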
ID: TheDrummer/Anubis-70B-v1
Context: 65.5K tokens
Llama 3.3 finetune for roleplaying.
Input: $0.31/1M, Output: $0.31/1M
ID: TheDrummer/Anubis-70B-v1.1
Context: 131.1K tokens
Llama 3.3 finetune for roleplaying; v1.1 update with improved reasoning.
Input: $0.31/1M, Output: $0.31/1M
ID: deepcogito/cogito-v1-preview-qwen-32B
Context: 128K tokens
32B parameter reasoning model from DeepCogito (Qwen backbone) with strong general reasoning and coding at a low price.
Input: $1.80/1M, Output: $1.80/1M
ID: Steelskull/L3.3-Damascus-R1
Context: 65.5K tokens
Damascus-R1 builds on the Nevoria foundation with a DeepSeek R1 Distill base (Hydroblated-R1-V3).
Input: $0.31/1M, Output: $0.31/1M
ID: agentica-org/DeepCoder-14B-Preview
Context: 128K tokens
Code reasoning LLM fine-tuned from DeepSeek-R1-Distilled-Qwen-14B. 60.6% Pass@1 on LiveCodeBench v5.
Input: $0.15/1M, Output: $0.15/1M
ID: NousResearch/DeepHermes-3-Mistral-24B-Preview
Context: 128K tokens
24B parameter Mistral model fine-tuned by NousResearch for balanced reasoning and creativity.
Input: $0.30/1M, Output: $0.30/1M
ID: deepseek-v3-0324
Context: 128K tokens
DeepSeek's March 2025 V3 model, optimized for general-purpose tasks. Quantized at FP8.
Input: $0.25/1M, Output: $0.70/1M
ID: deepseek-r1
Context: 128K tokens
DeepSeek's R1 thinking model, with strong benchmark scores at low cost.
Input: $0.40/1M, Output: $1.70/1M
ID: deepseek-reasoner
Context: 64K tokens
DeepSeek-R1, open source and rivaling OpenAI's o1.
Input: $0.40/1M, Output: $1.70/1M
ID: deepseek-ai/DeepSeek-V3.1
Context: 128K tokens
Hybrid model supporting thinking and non-thinking modes. Better at tool calling and agent tasks. Quantized at FP8.
Input: $0.20/1M, Output: $0.70/1M
ID: deepseek-ai/DeepSeek-V3.1-Terminus
Context: 128K tokens
Updated release with language-consistency improvements and stronger Code/Search agents. FP8.
Input: $0.25/1M, Output: $0.70/1M
ID: deepseek-ai/deepseek-v3.2-exp
Context: 163.8K tokens
DeepSeek's latest flagship model, with notably better performance, especially on longer contexts. FP8.
Input: $0.28/1M, Output: $0.42/1M
ID: deepseek-ai/deepseek-v3.2-exp-thinking
Context: 163.8K tokens
Thinking version of DeepSeek's latest flagship model. FP8.
Input: $0.28/1M, Output: $0.42/1M
ID: deepseek-ai/DeepSeek-R1-0528
Context: 128K tokens
Updated (May 28) DeepSeek R1 model with enhanced reasoning.
Input: $0.40/1M, Output: $1.70/1M
ID: deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
Context: 128K tokens
Distilled version of R1-0528: much cheaper and faster, yet still highly performant.
Input: $0.10/1M, Output: $0.20/1M
ID: huihui-ai/DeepSeek-R1-Distill-Llama-70B-abliterated
Context: 16.4K tokens
Uncensored version of the DeepSeek R1 Llama 70B model.
Input: $0.70/1M, Output: $0.70/1M
ID: huihui-ai/DeepSeek-R1-Distill-Qwen-32B-abliterated
Context: 16.4K tokens
Uncensored version of the DeepSeek R1 Qwen 32B model.
Input: $0.70/1M, Output: $0.70/1M
ID: cognitivecomputations/dolphin-2.9.2-qwen2-72b
Context: 8.2K tokens
Highly uncensored model built on Qwen2 72B.
Input: $0.31/1M, Output: $0.31/1M
ID: EVA-UNIT-01/EVA-LLaMA-3.33-70B-v0.0
Context: 16.4K tokens
RP/storywriting specialist model, full-parameter finetune of Llama-3.3-70B-Instruct.
Input: $2.01/1M, Output: $2.01/1M
ID: zai-org/GLM-4.5-FP8
Context: 128K tokens
Flagship foundation model purpose-built for agent-based applications. MoE with 355B total / 32B active params.
Input: $0.20/1M, Output: $0.20/1M
ID: zai-org/GLM-4.5-Air
Context: 128K tokens
106B total / 12B active parameter model for reasoning, coding, and agentic capabilities.
Input: $0.10/1M, Output: $0.10/1M
ID: z-ai/glm-4.6
Context: 200K tokens
Latest GLM series chat model with strong general performance. Quantized at FP8.
Input: $0.40/1M, Output: $1.50/1M
ID: z-ai/glm-4.6:thinking
Context: 200K tokens
Thinking version of the latest GLM series. Quantized at FP8.
Input: $0.40/1M, Output: $1.50/1M
ID: openai/gpt-oss-120b
Context: 128K tokens
117B MoE model (5.1B active) optimized for reasoning, agentic, and production use. Runs on a single H100.
Input: $0.05/1M, Output: $0.25/1M
ID: openai/gpt-oss-20b
Context: 128K tokens
21B parameter MoE model (3.6B active) optimized for lower-latency inference.
Input: $0.04/1M, Output: $0.15/1M
ID: unsloth/gemma-3-27b-it
Context: 128K tokens
Gemma 3 with 128K context window and multilingual support in 140+ languages.
Input: $0.30/1M, Output: $0.30/1M
ID: nousresearch/hermes-4-405b
Context: 128K tokens
Advanced reasoning model built on Llama-3.1-405B with hybrid thinking modes.
Input: $0.30/1M, Output: $1.20/1M
ID: nousresearch/hermes-4-70b
Context: 128K tokens
Efficient reasoning model based on Llama-3.1-70B with strong performance in math and code.
Input: $0.20/1M, Output: $0.40/1M
ID: moonshotai/Kimi-K2-Instruct-0905
Context: 256K tokens
1T total / 32B active MoE model with exceptional coding and agent capabilities.
Input: $0.40/1M, Output: $2.00/1M
ID: failspy/Meta-Llama-3-70B-Instruct-abliterated-v3.5
Context: 8.2K tokens
Abliterated (restrictions removed) version of Llama 3 70B.
Input: $0.70/1M, Output: $0.70/1M
ID: Sao10K/L3.1-70B-Euryale-v2.2
Context: 20.5K tokens
70B parameter model from Sao10K, based on Llama 3.1 70B, for quality text generation.
Input: $0.31/1M, Output: $0.36/1M
ID: meta-llama/llama-3.3-70b-instruct
Context: 131.1K tokens
Optimized for multilingual dialogue, outperforms many open and closed chat models.
Input: $0.05/1M, Output: $0.23/1M
ID: meta-llama/llama-4-maverick
Context: 1M tokens
17B active parameter model with 128 experts; strong multimodal performance, competitive with GPT-4o.
Input: $0.18/1M, Output: $0.80/1M
ID: meta-llama/llama-4-scout
Context: 328K tokens
17B active parameters with 16 experts; strong multimodal performance. Natively supports up to 10M context (served here at 328K).
Input: $0.09/1M, Output: $0.46/1M
ID: minimax/minimax-01
Context: 1M tokens
MiniMax's flagship model with a 1M token context window.
Input: $0.14/1M, Output: $1.12/1M
ID: MiniMax-M2
Context: 200K tokens
Enhanced reasoning and strong general performance. Optimized for coding and agentic workflows.
Input: $0.17/1M, Output: $1.53/1M
ID: nvidia/Llama-3.1-Nemotron-70B-Instruct-HF
Context: 16.4K tokens
Nvidia's latest Llama fine-tune optimized for instruction following.
Input: $0.36/1M, Output: $0.41/1M
ID: phi-4-mini-instruct
Context: 128K tokens
Small multilingual model by Microsoft.
Input: $0.17/1M, Output: $0.68/1M
ID: Qwen/Qwen3-235B-A22B
Context: 41K tokens
235B model with 22B active parameters. Supports thinking toggle with /think command.
Input: $0.30/1M, Output: $0.50/1M
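The thinking toggle noted above is applied as a soft switch appended to the user message. A minimal sketch that builds the chat payload (the helper name is my own; it assumes the documented Qwen3 /think and /no_think switches, and the actual request to the provider is omitted):

```python
def with_thinking_toggle(prompt: str, think: bool) -> list:
    """Build a chat messages list, appending Qwen3's soft switch
    (/think or /no_think) to the end of the user turn."""
    switch = "/think" if think else "/no_think"
    return [{"role": "user", "content": f"{prompt} {switch}"}]

messages = with_thinking_toggle("Prove that 17 is prime.", think=True)
```

The resulting `messages` list can then be sent as-is in an OpenAI-compatible chat-completions request body.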
ID: qwen/qwen3-32b
Context: 41K tokens
32B model supporting both thinking and non-thinking modes.
Input: $0.10/1M, Output: $0.30/1M
ID: qwen/qwen3-coder
Context: 262K tokens
480B total (35B active) coding specialist. Performs similarly to Claude Sonnet 4 on coding tasks.
Input: $0.13/1M, Output: $0.50/1M
ID: qwen/qwq-32b-preview
Context: 32.8K tokens
Experimental reasoning model. Great at coding and math.
Input: $0.20/1M, Output: $0.20/1M
ID: qwen/qwen-2.5-72b-instruct
Context: 131.1K tokens
Strong multilingual support, strong at mathematics and coding; well suited to roleplay and chatbots.
Input: $0.36/1M, Output: $0.41/1M
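A listing like this is easy to query programmatically, e.g. to pick the cheapest model whose context window covers a given prompt. A minimal sketch with a few entries transcribed from above (the field names and selection criterion, output price, are my own choices):

```python
# A few entries transcribed from the listing; prices are $ per 1M tokens.
MODELS = [
    {"id": "openai/gpt-oss-120b", "context": 128_000, "in": 0.05, "out": 0.25},
    {"id": "qwen/qwen3-coder",    "context": 262_000, "in": 0.13, "out": 0.50},
    {"id": "z-ai/glm-4.6",        "context": 200_000, "in": 0.40, "out": 1.50},
]

def cheapest_with_context(models, min_context):
    """Return the eligible model with the lowest output price,
    or None if no model's window covers min_context."""
    eligible = [m for m in models if m["context"] >= min_context]
    return min(eligible, key=lambda m: m["out"]) if eligible else None

pick = cheapest_with_context(MODELS, 150_000)  # qwen3-coder: 262K window, $0.50/1M out
```

Swapping the key to `m["in"]` (or a blended rate) selects by input or total cost instead.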