NVIDIA NIM (Inference Microservices)

Type: Platform Tags: NVIDIA, inference, microservices, LLM, AI, REST API, containers, production deployment, OpenAI-compatible Related: NIM-for-Large-Language-Models, NIM-for-LLM-Benchmarking-Guide, NVIDIA-AIPerf, NVIDIA-GenAI-Perf, NVIDIA-NIM-Operator, NVIDIA-NIM-on-GKE, NVIDIA-NIM-on-WSL2, Red-Hat-AI-Factory-with-NVIDIA, NeMo-Retriever-Embedding-NIM, Llama-Nemotron-Embed-1B-v2, Llama-Nemotron-Embed-VL-1B-v2, NeMo-Retriever-Reranking-NIM, Llama-Nemotron-Rerank-1B-v2, Llama-Nemotron-Rerank-VL-1B-v2, NIM-for-Image-OCR, NIM-for-Object-Detection, NIM-for-NV-CLIP, NIM-for-Cosmos-WFM, NIM-for-Cosmos-Embed1, NIM-for-Earth-2-CorrDiff, NIM-for-Earth-2-FourCastNet, NIM-for-DoMINO-Automotive-Aero, NIM-for-Vision-Language-Models, Ising-Calibration-1-35B-A3B, Nemotron-3-Nano, Nemotron-3-Super, Nemotron-3-Nano-Omni, Nemotron-Parse, NIM-for-Visual-Generative-AI, NVIDIA-Speech-NIM-Microservices, NVIDIA-ASR-NIM, Nemotron-ASR-Streaming, Nemotron-3-VoiceChat, NVIDIA-TTS-NIM, NVIDIA-NMT-NIM, NVIDIA-Background-Noise-Removal-NIM, NIM-for-Maxine-Studio-Voice, NIM-for-Maxine-Audio2Face-2D, NIM-for-Maxine-Eye-Contact, NIM-for-Maxine-Active-Speaker-Detection, NIM-for-Audio2Face-3D, NVIDIA-NemoGuard-NIMs, Nemotron-3-Content-Safety, Nemotron-Content-Safety-Reasoning-4B-Experimental-NIM, Llama-3.1-Nemotron-Safety-Guard-8B-NIM, Llama-3.1-NemoGuard-8B-TopicControl-NIM, Llama-3.1-NemoGuard-8B-ContentSafety-NIM, NVIDIA-NemoGuard-JailbreakDetect-NIM, NIM-for-Multimodal-Safety, NIM-for-MAISI, NIM-for-VISTA-3D, NIM-for-AlphaFold2, NIM-for-AlphaFold2-Multimer, NIM-for-OpenFold2, NIM-for-OpenFold3, NIM-for-Boltz2, NIM-for-Evo-2, NIM-for-MSA-Search, NIM-for-ProteinMPNN, NIM-for-RFdiffusion, NIM-for-MolMIM, NIM-for-GenMol, NIM-for-DiffDock, NIM-for-ALCHEMI-Batched-Geometry-Relaxation, NIM-for-ALCHEMI-Batched-Molecular-Dynamics, NVIDIA-AI-Enterprise, NGC, NVIDIA-NGC-Catalog, NVIDIA-AI-Blueprints, NVIDIA-RAG-Blueprint, NVIDIA-AI-Q-Blueprint, NVIDIA-Data-Flywheel-Blueprint, 
NVIDIA-Video-Search-and-Summarization-Blueprint, NVIDIA-Tokkio-Digital-Human-Blueprint, NVIDIA-AI-Data-Platform, NVIDIA-API-Documentation, LLM-Inference-Quick-Start-Recipes, NVIDIA-Brev, NVIDIA-Cloud-Accelerator-NCX, TensorRT-LLM, Triton-Inference-Server, NVIDIA-NeMo, NeMo-Platform, NeMo-Data-Designer, NeMo-Customizer, NeMo-Evaluator, NeMo-Auditor, NeMo-AutoModel, NeMo-RL, NeMo-Megatron-Bridge, NeMo-Export-Deploy, NeMo-Retriever, NVIDIA-BioNeMo, NVIDIA-Cosmos, Earth-2, PhysicsNeMo, NVLM, NVIDIA-Riva, NVIDIA-Maxine, NVIDIA-ACE, NVIDIA-Clara, NVIDIA-MONAI-Toolkit, NVIDIA-Dynamo, NVIDIA-CMX, NIXL, Nemotron, Nsight-Copilot, ComputeEval Sources: https://docs.nvidia.com/nim/index.html, https://docs.nvidia.com/nim-operator/latest/index.html, https://docs.nvidia.com/nim/large-language-models/latest/about-nim-llm/overview.html, https://docs.nvidia.com/nim/benchmarking/llm/latest/overview.html, https://docs.nvidia.com/nim/benchmarking/llm/latest/step-by-step.html, https://docs.nvidia.com/aiperf/welcome-to-ai-perf-documentation, https://docs.nvidia.com/nim/nemo-retriever/text-embedding/latest/overview.html, https://docs.nvidia.com/nim/nemo-retriever/text-reranking/latest/overview.html, https://docs.nvidia.com/nim/ingestion/image-ocr/latest/overview.html, https://docs.nvidia.com/nim/ingestion/object-detection/latest/overview.html, https://docs.nvidia.com/nim/nvclip/latest/introduction.html, https://docs.nvidia.com/nim/cosmos/latest/introduction.html, https://docs.nvidia.com/nim/cosmos-embed1/latest/introduction.html, https://docs.nvidia.com/nim/earth-2/corrdiff/latest/overview.html, https://docs.nvidia.com/nim/earth-2/fourcastnet/latest/overview.html, https://docs.nvidia.com/nim/physicsnemo/domino-automotive-aero/latest/overview.html, https://docs.nvidia.com/nim/vision-language-models/latest/introduction.html, https://docs.nvidia.com/nim/vision-language-models/latest/release-notes.html, 
https://docs.nvidia.com/nim/vision-language-models/latest/examples/nemotron-3-nano-omni-30b-a3b-reasoning/api.html, https://docs.nvidia.com/nim/vision-language-models/latest/examples/nemotron-parse/api.html, https://docs.nvidia.com/nim/visual-genai/latest/overview.html, https://docs.nvidia.com/nim/speech/latest/index.html, https://docs.nvidia.com/nim/speech/latest/asr/index.html, https://docs.nvidia.com/nim/speech/latest/tts/index.html, https://docs.nvidia.com/nim/speech/latest/nmt/index.html, https://docs.nvidia.com/nim/maxine/bnr/latest/overview.html, https://docs.nvidia.com/nim/maxine/studio-voice/latest/overview.html, https://docs.nvidia.com/nim/maxine/audio2face-2d/latest/overview.html, https://docs.nvidia.com/nim/maxine/eye-contact/latest/overview.html, https://docs.nvidia.com/nim/maxine/active-speaker-detection/latest/overview.html, https://docs.nvidia.com/nim/digital-human/a2f-3d/latest/index.html, https://docs.nvidia.com/rag/latest/, https://docs.nvidia.com/vss/latest/, https://docs.nvidia.com/ace/tokkio/latest/overview/overview.html, https://docs.nvidia.com/nim/llama-3-1-nemotron-safety-guard-8b/latest/index.html, https://docs.nvidia.com/nim/llama-3-1-nemoguard-8b-topiccontrol/latest/index.html, https://docs.nvidia.com/nim/llama-3-1-nemoguard-8b-contentsafety/latest/index.html, https://docs.nvidia.com/nim/nemoguard-jailbreakdetect/latest/index.html, https://docs.nvidia.com/nim/multimodal-safety/latest/overview.html, https://docs.nvidia.com/nim/medical/maisi/latest/overview.html, https://docs.nvidia.com/nim/medical/vista3d/latest/overview.html, https://docs.nvidia.com/nim/bionemo/alphafold2/latest/overview.html, https://docs.nvidia.com/nim/bionemo/openfold3/latest/overview.html, https://docs.nvidia.com/nim/bionemo/boltz2/latest/overview.html, https://docs.nvidia.com/nim/bionemo/evo2/latest/overview.html, https://docs.nvidia.com/nim/bionemo/msa-search/latest/overview.html, https://docs.nvidia.com/nim/bionemo/proteinmpnn/latest/overview.html, 
https://docs.nvidia.com/nim/bionemo/rfdiffusion/latest/overview.html, https://docs.nvidia.com/nim/bionemo/molmim/latest/overview.html, https://docs.nvidia.com/nim/bionemo/genmol/latest/overview.html, https://docs.nvidia.com/nim/bionemo/diffdock/latest/overview.html, https://docs.nvidia.com/nim/alchemi/alchemi-bgr/latest/overview.html, https://docs.nvidia.com/nim/alchemi/alchemi-bmd/latest/overview.html, https://build.nvidia.com/models, https://build.nvidia.com/blueprints, https://build.nvidia.com/nvidia/nemotron-3-super-120b-a12b/modelcard, https://build.nvidia.com/nvidia/nemotron-3-nano-30b-a3b/modelcard, https://docs.nvidia.com/nemo/microservices/latest/index.html, https://docs.nvidia.com/nemo/microservices/latest/data-designer/index.html, https://docs.nvidia.com/nemo/microservices/latest/customizer/index.html, https://docs.nvidia.com/nemo/microservices/latest/evaluator/index.html, https://docs.nvidia.com/nemo/microservices/latest/audit/index.html, https://docs.nvidia.com/nemo/automodel/latest/index.html, https://docs.nvidia.com/nemo/rl/latest/about/overview.html, https://docs.nvidia.com/nemo/megatron-bridge/latest/index.html, https://docs.nvidia.com/nemo/export-deploy/latest/index.html, https://docs.nvidia.com/ai-enterprise/deployment/red-hat-ai-factory/latest/deploy-ai-workloads-nim-operator.html Last Updated: 2026-04-29

Summary

NVIDIA NIM (NVIDIA Inference Microservices) is NVIDIA’s containerized inference microservice layer for deploying foundation models across cloud, data center, and self-hosted GPU infrastructure. Current NVIDIA docs position NIM as part of NVIDIA-AI-Enterprise: production-grade runtimes, ongoing security updates, hosted API access via build.nvidia.com, and integration with the broader NVIDIA-NeMo agent lifecycle stack.

Detail

Purpose

NIM packages model-specific inference runtimes, APIs, containers, and deployment guidance so teams can move from model selection to production serving without rebuilding the entire inference stack. The same model capability may appear as a hosted build.nvidia.com API, a downloadable NIM, an NGC artifact, or a Kubernetes deployment.
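For LLM NIMs, the served API is OpenAI-compatible, so the same request body works against a hosted build.nvidia.com endpoint or a self-hosted container. A minimal sketch of assembling such a request; the base URL, model ID, and `build_chat_request` helper are illustrative assumptions, not part of any NIM SDK:

```python
import json

# Assumed local endpoint for a self-hosted NIM container; a hosted
# build.nvidia.com endpoint would use a different base URL and an API key.
NIM_BASE_URL = "http://localhost:8000/v1"

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completions payload for a NIM endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.2,
    }

# Model ID is illustrative; use the ID the deployed NIM actually serves.
payload = build_chat_request("meta/llama-3.1-8b-instruct",
                             "Summarize NIM in one sentence.")

# To send it against a running NIM (requires the `requests` package):
#   requests.post(f"{NIM_BASE_URL}/chat/completions", json=payload, timeout=60)
print(json.dumps(payload, indent=2))
```

Because the schema matches the OpenAI Chat Completions API, existing OpenAI client libraries can be pointed at a NIM by overriding the base URL.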

Current scope

Representative model families

NVIDIA context

NIM is the practical deployment boundary between NVIDIA’s model catalog and production applications. It links model development in NVIDIA-NeMo, inference optimization in TensorRT-LLM, serving in Triton-Inference-Server, catalog distribution in NGC, and enterprise support in NVIDIA-AI-Enterprise.
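As a concrete sketch of that boundary, the common self-hosted path pulls a NIM container from NGC and serves it on a GPU host. The image tag, cache path, and port below are representative assumptions; consult the specific microservice’s documentation for its actual values:

```shell
# Authenticate to NGC, then run an LLM NIM container (image tag illustrative).
export NGC_API_KEY=<your-key>
echo "$NGC_API_KEY" | docker login nvcr.io -u '$oauthtoken' --password-stdin

docker run --rm --gpus all \
  -e NGC_API_KEY \
  -v "$HOME/.cache/nim:/opt/nim/.cache" \
  -p 8000:8000 \
  nvcr.io/nim/meta/llama-3.1-8b-instruct:latest

# Once the service reports healthy, the OpenAI-compatible API is available:
curl -s http://localhost:8000/v1/models
```

The same model is also reachable without any local deployment through the hosted build.nvidia.com API, which is how the catalog-to-production handoff typically starts.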

Connections

Source Excerpts

  • NVIDIA NIM docs describe NIM microservices as part of NVIDIA AI Enterprise for deploying foundation models on cloud or data center infrastructure.
  • build.nvidia.com lists NVIDIA-published models, downloadable artifacts, free endpoints, and NIM API experiences.

Resources