
Best LLM Models


| Model Name | Size / Type | Description / Key Strengths | Context Window (approx.) | Notes / Variants |
|---|---|---|---|---|
| all-minilm-l6-v2-vllm | Small embedding | Sentence-transformers model mapping sentences & paragraphs to 384-dim vectors | — | Embedding model |
| deepseek-r1-distill-llama | Distilled LLaMA | Fast, optimized for real-world tasks | Medium-High | Distilled version |
| deepseek-v3.2-vllm | Large | Improved efficiency, reasoning, DSA, agentic capabilities | High | — |
| devstral-small | 24B | Agentic coding LLM fine-tuned from Mistral-Small-3.1 | 128K | Original version |
| devstral-small-2 | 24B (FP8) | Agentic SWE tasks, codebase tooling, SWE-bench; supports vision | 256K | Top recommendation for coding |
| embeddinggemma | Embedding | State-of-the-art text embedding from Google DeepMind | — | — |
| functiongemma | 270M | Offline function-calling agents on small devices | Low-Medium | Very lightweight |
| gemma3 | Small / Medium | Google’s latest Gemma: small yet strong for chat & generation | Medium-High | Includes QAT variant |
| gemma3n | Efficient multimodal | Text, image, audio, video on low-resource devices | Medium | Multimodal edge |
| gemma4 | 2B–31B (incl. 26B MoE) | Multimodal, optimized for reasoning, coding, long context | High | Multiple sizes (E2B, E4B, 31B, 26B A4B) |
| glm-4.7-flash | ~30B-A3B MoE | Balances strong performance with efficient deployment | High | Flash variant |
| glm-5-safetensors | 744B MoE (40B active) | Reasoning, coding, agentic tasks (FP8) | High | Very large MoE |
| gpt-oss | Varies | OpenAI’s open-weight models for powerful reasoning & agentic tasks | High | Includes safeguard variant |
| granite-4.0-h-micro | 3B | Long-context instruct with RL, tool calling, enterprise readiness | Long | Micro variant |
| granite-4.0-h-nano | Lightweight | Lightweight instruct via SFT, RL, merging | Medium | Nano variant |
| granite-4.0-h-small | 32B | Long-context instruct with RL, tool use, enterprise optimization | Long | Small variant |
| granite-4.0-h-tiny | 7B | Long-context instruct with RL, tool use, enterprise optimization | Long | Tiny variant |
| granite-4.0-micro | 3B | Long-context instruct with RL, tool use, enterprise optimization | Long | — |
| granite-4.0-nano | Lightweight | Lightweight instruct via SFT, RL, merging | Medium | — |
| granite-docling | Multimodal | Efficient document conversion | Medium | Document-focused |
| granite-embedding-multilingual | 278M | Encoder-only multilingual embedding (XLM-RoBERTa style) | — | Multilingual embedding |
| kimi-k2 | Varies | Open-source agent with deep reasoning, stable tool use, fast INT4 | 256K | Thinking model |
| llama3.2 | Medium | Reliable for coding, chat, Q&A | High | — |
| llama3.3 | Medium-Large | Improved reasoning and generation quality | High | Newest LLaMA 3 |
| magistral-small-3.2 | 24B multimodal | Tuned for accuracy, tool use, fewer repeats | High | Mistral AI |
| ministral3 | ~24B performance | Compact vision-enabled model optimized for local edge use | Medium-High | Vision-enabled |
| mistral | Efficient | Top-tier performance and fast inference | High | General purpose |
| moondream2 | Small VLM | Fast visual language model for image interpretation via text prompts | Medium | Vision-focused |
| nomic-embed-text-v1.5 | Embedding | Open-source, fully auditable text embedding model | — | Auditable embedding |
| phi4 | Compact | Surprisingly capable at reasoning and code | Medium | Microsoft compact |
| qwen3 | Large | Top-tier coding, math, reasoning, language tasks | Very High | Latest Qwen LLM |
| qwen3-coder | Coding series | Dedicated coding agent models | High | — |
| qwen3-coder-next | 80B MoE (3B active) | Advanced coding agent for code generation, debugging, agentic tasks | 256K | Highly efficient MoE |
| qwen3-embedding | Embedding | Multilingual for retrieval, ranking, clustering | — | Text embedding |
| qwen3-reranker | Reranker | Multilingual reranking for text retrieval (119 languages) | — | Reranking model |
| qwen3-vl | Advanced multimodal | Major gains in text, vision, video, reasoning | High | Vision-language |
| qwen3.5 | 397B MoE (17B active) | Multimodal with 262K context, 201 languages, reasoning/coding/agents | 262K | Flagship MoE |
| qwq | Experimental lean | Fast, lean experimental Qwen variant | Medium-High | Experimental |
| seed-oss | Varies | Reasoning, agent, general capabilities, developer-friendly | High | — |
| smollm2 | Tiny | Built for speed, edge devices, local development (includes SmolVLM multimodal) | Medium | Lightweight |
| smollm3 | 3.1B | Efficient on-device use with strong chat performance | Medium | On-device |
| smolvlm | Lightweight multimodal | Video, image, text analysis optimized for devices | Medium | Multimodal edge |
| snowflake-arctic-embed-l-v2-vllm | Embedding | Boosts multilingual retrieval and efficiency | — | Multilingual embedding |
| stable-diffusion | Diffusion | Image generation (base latent diffusion + refiner) | — | Image gen (not LLM) |
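Several entries above (all-minilm-l6-v2-vllm, embeddinggemma, qwen3-embedding, snowflake-arctic-embed-l-v2-vllm) are embedding models: they map text to fixed-size vectors, and retrieval then ranks candidate documents by cosine similarity to a query vector. A minimal sketch of that ranking step in plain NumPy, using random 384-dim stand-in vectors (the dimension all-minilm-l6-v2-vllm produces) instead of real model output:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)

# Stand-ins for real sentence embeddings. In a real pipeline these would
# come from an embedding model (e.g. a 384-dim MiniLM-style encoder).
query = rng.standard_normal(384)
docs = rng.standard_normal((5, 384))

# Score each candidate document against the query, then rank high-to-low.
scores = [cosine_similarity(query, d) for d in docs]
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print("best match: doc", ranking[0])
```

A reranker such as qwen3-reranker would typically re-score only the top few documents from this first-pass similarity ranking, trading speed for accuracy.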