NVIDIA Video Search and Summarization Blueprint

Type: Platform
Tags: NVIDIA, AI Blueprint, VSS, video search, video summarization, vision agents, video analytics, Metropolis, DeepStream, Cosmos
Related: NVIDIA-AI-Blueprints, NVIDIA-Metropolis, NVIDIA-DeepStream, NVIDIA-Cosmos, NIM-for-Cosmos-Embed1, NIM-for-Vision-Language-Models, NVIDIA-AI-Data-Platform, NeMo-Retriever, NVIDIA-NIM, Triton-Inference-Server, TensorRT, NVIDIA-Jetson-Platform, NVIDIA-AI-Enterprise
Sources: https://docs.nvidia.com/vss/latest/, https://docs.nvidia.com/vss/latest/adding-workflows.html, https://docs.nvidia.com/vss/latest/vss-agent/VSS-Agent-Overview.html
Last Updated: 2026-04-29

Summary

NVIDIA Video Search and Summarization (VSS) Blueprint is NVIDIA’s current blueprint for building vision agents and AI-powered video analytics applications. The latest VSS docs describe reference architectures that combine real-time vision microservices, vision language models (VLMs), large language models (LLMs), video embeddings, and downstream analytics to support video search, long-video summarization, and alert verification, with industry examples for smart city and warehouse operations.

Detail

Purpose

Video analytics systems need to process live or archived video, extract searchable visual features, reason over incidents, summarize long recordings, and answer user questions. VSS provides an NVIDIA-authored blueprint that composes real-time video intelligence, analytics microservices, middleware, and an agent layer into deployable workflows.

Current scope

  • Real-time video intelligence with object detection/tracking, real-time VLM captioning/anomaly detection, and real-time video embedding generation.
  • Downstream analytics for behavior analytics, alert verification, message brokers, video IO/storage, Elasticsearch-backed incident records, and observability.
  • Agent workflows for Q&A and report generation, alert verification, real-time alerts, natural-language video search, and long-video summarization.
  • Two VSS Agent modes: production blueprint deployments that work through Video Analytics MCP, and direct video analysis for developer profiles.
  • Adjacent models: Nemotron for reasoning and report generation, Cosmos VLMs for video understanding, NIM-for-Cosmos-Embed1 for video-text embeddings, and NIM-for-Vision-Language-Models for visual reasoning.
  • Industry-specific examples for smart city and warehouse operations without creating separate wiki pages for every sub-blueprint.
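
To make the natural-language video search item above more concrete, the following is a minimal sketch of embedding-based clip retrieval. It is not the VSS API: the `Clip` record, `cosine_similarity`, and `search_clips` are hypothetical names, and the three-dimensional vectors are toy stand-ins for the video-text embeddings that a model such as Cosmos-Embed1 would produce. It assumes only that clips and text queries are embedded into a shared vector space and ranked by cosine similarity, which is a common pattern for this kind of search.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Clip:
    """Hypothetical record for one indexed video segment."""
    clip_id: str
    start_s: float          # segment start, seconds
    end_s: float            # segment end, seconds
    embedding: List[float]  # toy stand-in for a video embedding


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_clips(query_embedding: List[float],
                 clips: List[Clip],
                 top_k: int = 3) -> List[Tuple[float, Clip]]:
    """Return the top_k (score, clip) pairs, highest similarity first."""
    scored = [(cosine_similarity(query_embedding, c.embedding), c) for c in clips]
    # Sort on the score only, so Clip objects never need to be comparable.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy index: in a real deployment the embeddings would come from an
    # embedding model, and the query vector from the same model's text encoder.
    index = [
        Clip("cam1-0000", 0.0, 10.0, [1.0, 0.0, 0.0]),
        Clip("cam1-0010", 10.0, 20.0, [0.0, 1.0, 0.0]),
        Clip("cam2-0000", 0.0, 10.0, [1.0, 1.0, 0.0]),
    ]
    for score, clip in search_clips([1.0, 0.0, 0.0], index, top_k=2):
        print(f"{clip.clip_id} [{clip.start_s}-{clip.end_s}s] score={score:.3f}")
```

A linear scan like this is only reasonable for small indexes; a production system would more plausibly delegate the ranking step to a vector store or search backend.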

NVIDIA context

VSS belongs in the wiki graph as the video/vision-agent blueprint that connects NVIDIA-Metropolis, NVIDIA-DeepStream, NVIDIA-Cosmos, NVIDIA-AI-Data-Platform, and NVIDIA-AI-Blueprints. It is broader than a single model or NIM because it defines an application architecture across video ingestion, analytics, storage, embeddings, agents, and deployment.

Connections

Source Excerpts

  • NVIDIA docs describe VSS as reference architectures for building vision agents and AI-powered video analytics applications.
  • Current VSS docs list video retrieval, VLM-based Q&A, alert verification, natural-language video search, and long-video summarization workflows.

Resources