cuVS

Type: Technology
Tags: CUDA, NVIDIA, GPU, Vector Search, Approximate Nearest Neighbor, RAPIDS, AI, Open Source
Related: NVIDIA-RAPIDS, cuDF, cuML, cuGraph, NVIDIA-Merlin, TensorRT, cuBLAS, NeMo-Retriever, NeMo-Retriever-Embedding-NIM, NeMo-Retriever-Reranking-NIM, NVIDIA-AI-Data-Platform
Sources: NVIDIA official documentation (RAPIDS), https://www.nvidia.com/en-us/data-center/ai-data-platform/
Last Updated: 2026-04-30

Summary

cuVS is NVIDIA’s open-source, GPU-accelerated library for vector search and clustering. Its best-known algorithm is CAGRA, a graph-based method for approximate nearest neighbor (ANN) search. Part of NVIDIA-RAPIDS, it accelerates the vector search operations behind retrieval-augmented generation (RAG), recommendation systems, and semantic search at scale. It provides Python, C++, C, and Rust APIs.

Detail

Purpose

Vector search (finding the vectors in a large database that are most similar to a query vector) is foundational to modern AI applications: RAG pipelines, recommendation systems, semantic search, and image retrieval all depend on fast ANN search. cuVS runs both index construction and querying on the GPU, which can make them substantially faster than CPU-only ANN implementations such as HNSW (for example hnswlib) or the CPU indexes in FAISS. The size of the speedup depends on the dataset, index type, batch size, and target recall.
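
As a point of reference for what ANN indexes approximate, the following is a minimal NumPy sketch of exact (brute-force) k-nearest-neighbor search by squared Euclidean distance. It does not use cuVS; the function name `exact_knn` and the synthetic data are illustrative assumptions, not part of any library.

```python
import numpy as np

def exact_knn(dataset, queries, k):
    """Return (squared distances, indices) of the k nearest dataset rows per query."""
    # Pairwise squared L2 distances, shape (n_queries, n_dataset)
    diff = queries[:, None, :] - dataset[None, :, :]
    d2 = np.einsum("qnd,qnd->qn", diff, diff)
    # Select the k smallest per row (unordered), then sort just those k
    idx = np.argpartition(d2, k - 1, axis=1)[:, :k]
    part = np.take_along_axis(d2, idx, axis=1)
    order = np.argsort(part, axis=1)
    idx = np.take_along_axis(idx, order, axis=1)
    return np.take_along_axis(d2, idx, axis=1), idx

# Synthetic data: each query is a slightly perturbed copy of one of the first 5 rows
rng = np.random.default_rng(0)
dataset = rng.standard_normal((1000, 32)).astype(np.float32)
queries = dataset[:5] + 0.01 * rng.standard_normal((5, 32)).astype(np.float32)

distances, neighbors = exact_knn(dataset, queries, k=3)
print(neighbors[:, 0])  # nearest neighbor of each query
```

The cost of this scan grows linearly with the number of stored vectors for every query. ANN indexes trade a small loss in recall to avoid it.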

Key Features

  • High-throughput ANN search via CAGRA, a graph-based algorithm designed for GPUs
  • GPU-accelerated index build and query phases
  • Multiple index types: CAGRA (graph-based), IVF-Flat, IVF-PQ, brute-force
  • Multi-GPU support for large-scale vector databases
  • Integration with popular ANN benchmarks (ANN-Benchmarks)
  • Python, C++, C, and Rust APIs for broad ecosystem compatibility
  • cuDF integration for DataFrame-native vector search pipelines
  • Used as the backend for several vector database products
  • Referenced in current NVIDIA-AI-Data-Platform material as the GPU-accelerated vector search and data clustering layer for semantic search workloads
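
To make the IVF-Flat index type listed above more concrete, here is a simplified CPU sketch in NumPy. It illustrates the general inverted-file idea (cluster the vectors, then scan only the few clusters closest to the query) and is not cuVS's implementation. The helper names `sq_dists`, `build_ivf_flat`, and `search_ivf_flat` and all parameter values are hypothetical.

```python
import numpy as np

def sq_dists(a, b):
    """Squared L2 distances between rows of a and rows of b."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)

def build_ivf_flat(dataset, n_lists, n_iters=10, seed=0):
    """Run a few k-means iterations, then bucket row ids by nearest centroid."""
    rng = np.random.default_rng(seed)
    centroids = dataset[rng.choice(len(dataset), n_lists, replace=False)].copy()
    for _ in range(n_iters):
        labels = sq_dists(dataset, centroids).argmin(axis=1)
        for c in range(n_lists):
            members = dataset[labels == c]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[c] = members.mean(axis=0)
    labels = sq_dists(dataset, centroids).argmin(axis=1)
    lists = [np.flatnonzero(labels == c) for c in range(n_lists)]
    return centroids, lists

def search_ivf_flat(dataset, centroids, lists, query, k, n_probes):
    """Scan only the n_probes lists whose centroids are closest to the query."""
    probe = np.argsort(sq_dists(query[None, :], centroids)[0])[:n_probes]
    candidates = np.concatenate([lists[c] for c in probe])
    d2 = sq_dists(query[None, :], dataset[candidates])[0]
    order = np.argsort(d2)[:k]
    return candidates[order], d2[order]

rng = np.random.default_rng(1)
dataset = rng.standard_normal((2000, 16)).astype(np.float32)
query = rng.standard_normal(16).astype(np.float32)
centroids, lists = build_ivf_flat(dataset, n_lists=20)

# Ground truth from a full scan, then approximate and full-probe searches
exact_ids = np.argsort(sq_dists(query[None, :], dataset)[0])[:10]
approx_ids, _ = search_ivf_flat(dataset, centroids, lists, query, k=10, n_probes=4)
full_ids, _ = search_ivf_flat(dataset, centroids, lists, query, k=10, n_probes=20)

recall = len(set(approx_ids.tolist()) & set(exact_ids.tolist())) / 10
print(f"recall@10 with 4 of 20 lists probed: {recall:.2f}")
```

Probing every list reproduces the exact result, while probing fewer lists scans less data at the cost of recall. That trade-off between recall and throughput is what ANN benchmarks typically plot.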

Use Cases

  • Retrieval-Augmented Generation (RAG) for LLMs
  • Semantic search over embedding databases
  • Recommendation systems (item-to-item, user-to-item similarity)
  • Image and video similarity search
  • Drug discovery (molecular similarity search)
  • Genomics (sequence similarity)
  • Fraud detection via behavioral embedding similarity

Hardware Requirements

  • NVIDIA GPU, Volta or newer (Ampere/Hopper recommended for peak performance)
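  • CUDA 11.x or 12.x (supported CUDA versions vary by release; check the RAPIDS installation guide for the version in use)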
  • Linux (primary supported OS)
  • Part of RAPIDS ecosystem

Language Bindings

  • Python (primary)
  • C++
  • C
  • Rust
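
For the Python binding, a minimal sketch of building and querying a CAGRA index might look like the following. The calls (`cagra.IndexParams`, `cagra.build`, `cagra.SearchParams`, `cagra.search`) follow the cuVS Python documentation as understood at the time of writing and may differ between releases. The wrapper name `cagra_search` is hypothetical, and the code assumes `cupy` and `cuvs` are installed on a machine with a supported NVIDIA GPU.

```python
def cagra_search(dataset, queries, k=10):
    """Build a CAGRA index on the GPU and return (distances, neighbors) as CuPy arrays.

    Assumes float32-compatible inputs of shape (n_rows, dim). The imports are
    done lazily so this module can be loaded on machines without a GPU.
    """
    import cupy as cp
    from cuvs.neighbors import cagra

    # Move inputs to device memory; cuVS accepts CUDA-array-interface objects
    dataset = cp.asarray(dataset, dtype=cp.float32)
    queries = cp.asarray(queries, dtype=cp.float32)

    # Index build and search both run on the GPU
    index = cagra.build(cagra.IndexParams(metric="sqeuclidean"), dataset)
    distances, neighbors = cagra.search(cagra.SearchParams(), index, queries, k)

    # Wrap the returned device arrays as CuPy arrays for downstream use
    return cp.asarray(distances), cp.asarray(neighbors)
```

On a GPU machine, a call such as `cagra_search(embeddings, query_embeddings, k=10)` with two NumPy or CuPy float32 matrices of matching width would return the ten approximate nearest neighbors per query. Default index and search parameters are used here, and tuning them is the usual way to trade recall against speed.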

Connections

  • NVIDIA-RAPIDS — cuVS is the vector search library in NVIDIA’s CUDA-X data science stack
  • cuDF — cuVS integrates with cuDF for end-to-end GPU vector search pipelines
  • cuML — cuML uses cuVS for K-nearest neighbors (KNN) operations
  • cuGraph — cuVS and cuGraph are both used in recommendation and retrieval pipelines
  • NVIDIA-Merlin — recommender systems often pair vector retrieval/search with ranking and feature pipelines
  • TensorRT — embedding models optimized and served with TensorRT produce the vectors that cuVS then indexes and searches
  • cuBLAS — cuVS uses BLAS routines internally for distance computation
  • NeMo-Retriever — retrieval stacks use vector search and indexing to connect enterprise data to agents
  • NeMo-Retriever-Embedding-NIM — produces embeddings that vector search systems index and query
  • NeMo-Retriever-Reranking-NIM — reranks candidates returned from vector or hybrid retrieval
  • NVIDIA-AI-Data-Platform — AI Data Platform uses cuVS in its accelerated data retrieval and semantic search story
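
The cuBLAS connection above can be illustrated with the standard identity ||q - x||^2 = ||q||^2 + ||x||^2 - 2 q·x, which turns a batch of squared Euclidean distances into one matrix multiplication plus two vectors of norms. The NumPy sketch below shows the idea on the CPU. Whether cuVS uses exactly this formulation for a given index type is an assumption here, and the function name `pairwise_sq_l2_gemm` is hypothetical.

```python
import numpy as np

def pairwise_sq_l2_gemm(queries, dataset):
    """Squared L2 distances via the norm expansion and a single matrix multiply."""
    q_norms = (queries ** 2).sum(axis=1)[:, None]   # shape (n_queries, 1)
    x_norms = (dataset ** 2).sum(axis=1)[None, :]   # shape (1, n_dataset)
    cross = queries @ dataset.T                     # the GEMM step
    d2 = q_norms + x_norms - 2.0 * cross
    return np.maximum(d2, 0.0)                      # clamp tiny negatives from rounding

rng = np.random.default_rng(2)
queries = rng.standard_normal((4, 8))
dataset = rng.standard_normal((100, 8))

d2_gemm = pairwise_sq_l2_gemm(queries, dataset)
# Direct computation for comparison
d2_direct = ((queries[:, None, :] - dataset[None, :, :]) ** 2).sum(axis=2)
print(np.allclose(d2_gemm, d2_direct))
```

The matrix multiply dominates the cost, which is why a tuned BLAS library is a natural building block for brute-force distance computation.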

Resources