Deep dive

Jul 31, 2025

Securing Agentic AI: How Semantic Prompt Injections Bypass AI Guardrails

Prompt injection, where adversaries manipulate inputs to make large language models behave in unintended ways, has long posed a threat to AI systems since the...

8 MIN READ

Jul 30, 2025

Using CI/CD to Automate Network Configuration and Deployment

Continuous integration and continuous delivery/deployment (CI/CD) is a set of modern software development practices used for delivering code changes more...

6 MIN READ

Jul 23, 2025

Approaches to PDF Data Extraction for Information Retrieval

The PDF is among the most common file formats for sharing information such as financial reports, research papers, technical documents, and marketing materials....

11 MIN READ

Jul 22, 2025

Understanding NCCL Tuning to Accelerate GPU-to-GPU Communication

The NVIDIA Collective Communications Library (NCCL) is essential for fast GPU-to-GPU communication in AI workloads, using various optimizations and tuning to...

14 MIN READ

Jul 22, 2025

Building Robotic Mental Models with NVIDIA Warp and Gaussian Splatting

This post explores a promising direction for building dynamic digital representations of the physical world, a topic gaining increasing attention in recent...

4 MIN READ

Jul 21, 2025

Traditional RAG vs. Agentic RAG—Why AI Agents Need Dynamic Knowledge to Get Smarter

Ever relied on an old GPS that didn’t know about the new highway bypass, or a sudden road closure? It might get you to your destination, but not in the most...

8 MIN READ

Jul 17, 2025

Safeguard Agentic AI Systems with the NVIDIA Safety Recipe

As large language models (LLMs) power more agentic systems capable of performing autonomous actions, tool use, and reasoning, enterprises are drawn to their...

7 MIN READ

Jul 16, 2025

CUTLASS 3.x: Orthogonal, Reusable, and Composable Abstractions for GEMM Kernel Design

GEMM optimization on GPUs is a modular problem. Performant implementations need to specify hyperparameters such as tile shapes, math and copy instructions, and...

12 MIN READ

Jul 16, 2025

CUTLASS: Principled Abstractions for Handling Multidimensional Data Through Tensors and Spatial Microkernels

In the era of generative AI, utilizing GPUs to their maximum potential is essential to training better models and serving users at scale. Often, these models...

12 MIN READ

Jul 16, 2025

R²D²: Training Generalist Robots with NVIDIA Research Workflows and World Foundation Models

A major challenge in robotics is training robots to perform new tasks without the massive effort of collecting and labeling datasets for every new task and...

11 MIN READ

Jul 14, 2025

Enabling Fast Inference and Resilient Training with NCCL 2.27

As AI workloads scale, fast and reliable GPU communication becomes vital, not just for training, but increasingly for inference at scale. The NVIDIA Collective...

9 MIN READ

Jul 14, 2025

Enhancing Multilingual Human-Like Speech and Voice Cloning with NVIDIA Riva TTS

While speech AI is used to build digital assistants and voice agents, its impact extends far beyond these applications. Core technologies like text-to-speech...

10 MIN READ

Jul 11, 2025

Improving Synthetic Data Augmentation and Human Action Recognition with SynthDa

Human action recognition is a capability in AI systems designed for safety-critical applications, such as surveillance, eldercare, and industrial monitoring....

10 MIN READ

Jul 10, 2025

InfiniBand Multilayered Security Protects Data Centers and AI Workloads

In today’s data-driven world, security isn’t just a feature—it’s the foundation. With the exponential growth of AI, HPC, and hyperscale cloud computing, the...

6 MIN READ

Jul 02, 2025

Optimizing FLUX.1 Kontext for Image Editing with Low-Precision Quantization

FLUX.1 Kontext, the recently released model from Black Forest Labs, is a fascinating addition to the repertoire of community image generation models. The open...

10 MIN READ

Jul 01, 2025

Per-Tensor and Per-Block Scaling Strategies for Effective FP8 Training

In this blog post, we’ll break down the main FP8 scaling strategies—per-tensor scaling, delayed and current scaling, and per-block scaling (including the...

10 MIN READ