NIM for Maxine Eye Contact

Type: Microservice Tags: NVIDIA, NIM, Maxine, eye contact, gaze correction, video conferencing, AR, digital engagement, CUDA, TensorRT, Triton Related: NVIDIA-NIM, NVIDIA-Maxine, NVIDIA-AI-for-Media-SDKs, NVIDIA-Augmented-Reality-SDK, NIM-for-Maxine-Active-Speaker-Detection, NVIDIA-Video-Effects-SDK, NVIDIA-AI-Enterprise, TensorRT, Triton-Inference-Server, NVIDIA-CUDA, NVIDIA-Video-Codec-SDK, NGC Sources: https://docs.nvidia.com/nim/maxine/eye-contact/latest/overview.html, https://docs.nvidia.com/nim/maxine/eye-contact/latest/support-matrix.html Last Updated: 2026-04-29

Summary

NVIDIA Eye Contact NIM is a Maxine NIM microservice that redirects a user’s gaze toward the camera to simulate natural eye contact in video. Current NVIDIA docs describe it as a CUDA, TensorRT, and Triton accelerated service for remote engagement, video conferencing, and video processing workflows.

Detail

Purpose

Eye Contact NIM solves a common remote-presence problem: users often look at screens, notes, or participants rather than the camera. The service modifies the eye region in video frames so the face appears to look forward while preserving the surrounding frame.

Current scope

  • Extracts an eye-region patch from each video frame using NVIDIA face tracking.
  • Uses 2D facial landmarks and 6DOF head pose as inputs to the eye-contact network.
  • Applies a disentangled encoder-decoder network to estimate gaze and redirect the eye patch.
  • Blends the corrected eye region back into the original frame with an inverse transform.
  • Supports transactional mode for complete video-file processing and streaming mode for streamable MP4 inputs.
  • Model ID in the current support matrix is maxine-eye-contact.
  • Current optimized configurations list FP16 profiles for T4, A2, A10, A16, A40, L4, L40, NVIDIA RTX PRO 6000 Blackwell Server Edition, RTX 4090, RTX 5090, and RTX 5080.
  • Requires GPUs with Tensor Cores plus NVENC/NVDEC media support; current docs call out A100, H100, and B100 as unsupported for this NIM.

NVIDIA context

Eye Contact NIM turns an NVIDIA-Augmented-Reality-SDK-style gaze feature into a self-hostable NVIDIA-NIM service. It sits alongside NIM-for-Maxine-Active-Speaker-Detection for real-time meeting/video intelligence and alongside NVIDIA-Video-Effects-SDK for video enhancement.

Connections

Source Excerpts

  • NVIDIA docs describe Eye Contact NIM as dynamically redirecting eye position toward the camera.
  • The current docs list transactional and streaming input modes and identify NVENC/NVDEC support as a requirement.

Resources