Deep dive

Jul 31, 2025
Securing Agentic AI: How Semantic Prompt Injections Bypass AI Guardrails
Prompt injection, where adversaries manipulate inputs to make large language models behave in unintended ways, has long posed a threat to AI systems since the...
8 MIN READ

Jul 30, 2025
Using CI/CD to Automate Network Configuration and Deployment
Continuous integration and continuous delivery/deployment (CI/CD) is a set of modern software development practices used for delivering code changes more...
6 MIN READ

Jul 23, 2025
Approaches to PDF Data Extraction for Information Retrieval
The PDF is among the most common file formats for sharing information such as financial reports, research papers, technical documents, and marketing materials....
11 MIN READ

Jul 22, 2025
Understanding NCCL Tuning to Accelerate GPU-to-GPU Communication
The NVIDIA Collective Communications Library (NCCL) is essential for fast GPU-to-GPU communication in AI workloads, using various optimizations and tuning to...
14 MIN READ

Jul 22, 2025
Building Robotic Mental Models with NVIDIA Warp and Gaussian Splatting
This post explores a promising direction for building dynamic digital representations of the physical world, a topic gaining increasing attention in recent...
4 MIN READ

Jul 21, 2025
Traditional RAG vs. Agentic RAG—Why AI Agents Need Dynamic Knowledge to Get Smarter
Ever relied on an old GPS that didn’t know about the new highway bypass, or a sudden road closure? It might get you to your destination, but not in the most...
8 MIN READ

Jul 17, 2025
Safeguard Agentic AI Systems with the NVIDIA Safety Recipe
As large language models (LLMs) power more agentic systems capable of performing autonomous actions, tool use, and reasoning, enterprises are drawn to their...
7 MIN READ

Jul 16, 2025
CUTLASS 3.x: Orthogonal, Reusable, and Composable Abstractions for GEMM Kernel Design
GEMM optimization on GPUs is a modular problem. Performant implementations need to specify hyperparameters such as tile shapes, math and copy instructions, and...
12 MIN READ

Jul 16, 2025
CUTLASS: Principled Abstractions for Handling Multidimensional Data Through Tensors and Spatial Microkernels
In the era of generative AI, utilizing GPUs to their maximum potential is essential to training better models and serving users at scale. Often, these models...
12 MIN READ

Jul 16, 2025
R²D²: Training Generalist Robots with NVIDIA Research Workflows and World Foundation Models
A major challenge in robotics is training robots to perform new tasks without the massive effort of collecting and labeling datasets for every new task and...
11 MIN READ

Jul 14, 2025
Enabling Fast Inference and Resilient Training with NCCL 2.27
As AI workloads scale, fast and reliable GPU communication becomes vital, not just for training, but increasingly for inference at scale. The NVIDIA Collective...
9 MIN READ

Jul 14, 2025
Enhancing Multilingual Human-Like Speech and Voice Cloning with NVIDIA Riva TTS
While speech AI is used to build digital assistants and voice agents, its impact extends far beyond these applications. Core technologies like text-to-speech...
10 MIN READ

Jul 11, 2025
Improving Synthetic Data Augmentation and Human Action Recognition with SynthDa
Human action recognition is a capability in AI systems designed for safety-critical applications, such as surveillance, eldercare, and industrial monitoring....
10 MIN READ

Jul 10, 2025
InfiniBand Multilayered Security Protects Data Centers and AI Workloads
In today’s data-driven world, security isn't just a feature—it's the foundation. With the exponential growth of AI, HPC, and hyperscale cloud computing, the...
6 MIN READ

Jul 02, 2025
Optimizing FLUX.1 Kontext for Image Editing with Low-Precision Quantization
FLUX.1 Kontext, the recently released model from Black Forest Labs, is a fascinating addition to the repertoire of community image generation models. The open...
10 MIN READ

Jul 01, 2025
Per-Tensor and Per-Block Scaling Strategies for Effective FP8 Training
In this blog post, we’ll break down the main FP8 scaling strategies—per-tensor scaling, delayed and current scaling, and per-block scaling (including the...
10 MIN READ