cuVS

Type: Technology Tags: CUDA, NVIDIA, GPU, Vector Search, Approximate Nearest Neighbor, RAPIDS, AI, Open Source Related: NVIDIA-RAPIDS, cuDF, cuML, cuGraph, NVIDIA-Merlin, TensorRT, cuBLAS, NeMo-Retriever, NeMo-Retriever-Embedding-NIM, NeMo-Retriever-Reranking-NIM, NVIDIA-AI-Data-Platform Sources: NVIDIA official documentation (RAPIDS), https://www.nvidia.com/en-us/data-center/ai-data-platform/ Last Updated: 2026-04-30

Summary

cuVS is NVIDIA’s GPU-accelerated vector search library providing world-class performance for approximate nearest neighbor (ANN) search via its CAGRA algorithm. Part of NVIDIA-RAPIDS, it accelerates vector search operations critical to retrieval-augmented generation (RAG), recommendation systems, and semantic search at scale. It supports Python, C++, C, and Rust APIs.

Detail

Purpose

Vector search (finding the most similar vectors in a large database) is foundational to modern AI applications — RAG pipelines, recommendation systems, semantic search, and image retrieval all depend on fast ANN search. cuVS accelerates this on GPU, enabling orders-of-magnitude faster indexing and querying compared to CPU-based ANN libraries like FAISS or HNSW.

Key Features

World-class ANN performance via the CAGRA graph-based algorithm
GPU-accelerated index build and query phases
Multiple index types: CAGRA (graph-based), IVF-Flat, IVF-PQ, brute-force
Multi-GPU support for large-scale vector databases
Integration with popular ANN benchmarks (ANN-Benchmarks)
Python, C++, C, and Rust APIs for broad ecosystem compatibility
cuDF integration for DataFrame-native vector search pipelines
Used as the backend for several vector database products
Referenced in current NVIDIA-AI-Data-Platform material as the GPU-accelerated vector search and data clustering layer for semantic search workloads

Use Cases

Retrieval-Augmented Generation (RAG) for LLMs
Semantic search over embedding databases
Recommendation systems (item-to-item, user-to-item similarity)
Image and video similarity search
Drug discovery (molecular similarity search)
Genomics (sequence similarity)
Fraud detection via behavioral embedding similarity

Hardware Requirements

NVIDIA GPU, Volta or newer (Ampere/Hopper recommended for peak performance)
CUDA 11.x or 12.x
Linux (primary supported OS)
Part of RAPIDS ecosystem

Language Bindings

Python (primary)
C++
C
Rust

Connections

NVIDIA-RAPIDS — cuVS is the vector search library in NVIDIA’s CUDA-X data science stack
cuDF — cuVS integrates with cuDF for end-to-end GPU vector search pipelines
cuML — cuML uses cuVS for K-nearest neighbors (KNN) operations
cuGraph — cuVS and cuGraph are both used in recommendation and retrieval pipelines
NVIDIA-Merlin — recommender systems often pair vector retrieval/search with ranking and feature pipelines
TensorRT — TensorRT generates embeddings that cuVS then indexes and searches
cuBLAS — cuVS uses BLAS routines internally for distance computation
NeMo-Retriever — retrieval stacks use vector search and indexing to connect enterprise data to agents
NeMo-Retriever-Embedding-NIM — produces embeddings that vector search systems index and query.
NeMo-Retriever-Reranking-NIM — reranks candidates returned from vector or hybrid retrieval.
NVIDIA-AI-Data-Platform — AI Data Platform uses cuVS in its accelerated data retrieval and semantic search story

AIPS BOOM

Explorer

cuVS

cuVS

Summary

Detail

Purpose

Key Features

Use Cases

Hardware Requirements

Language Bindings

Connections

Resources

Graph View

Table of Contents

Backlinks