Dynamic memory allocation #3727

cehongwang · 2025-07-29T20:34:37Z

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

Fixes # (issue)

Type of change

Please delete options that are not relevant and/or add your own.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

Checklist:

My code follows the style guidelines of this project (You can use the linters)
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas and hacks
I have made corresponding changes to the documentation
I have added tests to verify my fix or my feature
New and existing unit tests pass locally with my changes
I have added the relevant labels to my PR so that relevant reviewers are notified

Co-authored-by: Adrian Wang <123616592+cehongwang@users.noreply.github.com>

github-actions

There are some changes that do not conform to C++ style guidelines:

diff --git a/home/runner/work/TensorRT/TensorRT/core/runtime/register_jit_hooks.cpp b/tmp/changes.txt
index 6d15bd8..b6f2d5b 100644
--- a/home/runner/work/TensorRT/TensorRT/core/runtime/register_jit_hooks.cpp
+++ b/tmp/changes.txt
@@ -109,7 +109,10 @@ static auto TORCHTRT_UNUSED TRTEngineTSRegistrtion =
            [](const c10::intrusive_ptr<TRTEngine>& self) -> std::vector<std::string> { return self->serialize(); },
            [](std::vector<std::string> serialized_info) -> c10::intrusive_ptr<TRTEngine> {
              serialized_info[ENGINE_IDX] = base64_decode(serialized_info[ENGINE_IDX]);
-              LOG_DEBUG("Deserialized resource allocation strategy: " << (static_cast<bool>(std::stoi(serialized_info[RESOURCE_ALLOCATION_STRATEGY_IDX])) ? "Dynamic" : "Static"));
+              LOG_DEBUG(
+                  "Deserialized resource allocation strategy: "
+                  << (static_cast<bool>(std::stoi(serialized_info[RESOURCE_ALLOCATION_STRATEGY_IDX])) ? "Dynamic"
+                                                                                                      : "Static"));
              TRTEngine::verify_serialization_fmt(serialized_info);
              return c10::make_intrusive<TRTEngine>(serialized_info);
            });
diff --git a/home/runner/work/TensorRT/TensorRT/core/runtime/TRTEngine.cpp b/tmp/changes.txt
index 253738b..de70331 100644
--- a/home/runner/work/TensorRT/TensorRT/core/runtime/TRTEngine.cpp
+++ b/tmp/changes.txt
@@ -86,7 +86,9 @@ TRTEngine::TRTEngine(std::vector<std::string> serialized_info)
          static_cast<bool>(std::stoi(serialized_info[HW_COMPATIBLE_IDX])),
          static_cast<bool>(std::stoi(serialized_info[REQUIRES_OUTPUT_ALLOCATOR_IDX])),
          serialized_info[SERIALIZED_METADATA_IDX],
-          (static_cast<bool>(std::stoi(serialized_info[RESOURCE_ALLOCATION_STRATEGY_IDX])) ? ResourceAllocationStrategy::kDynamic : ResourceAllocationStrategy::kStatic)) {}
+          (static_cast<bool>(std::stoi(serialized_info[RESOURCE_ALLOCATION_STRATEGY_IDX]))
+               ? ResourceAllocationStrategy::kDynamic
+               : ResourceAllocationStrategy::kStatic)) {}

TRTEngine::TRTEngine(
    const std::string& mod_name,
@@ -129,7 +131,9 @@ TRTEngine::TRTEngine(
  }

  this->resource_allocation_strategy = resource_allocation_strategy;
-  LOG_DEBUG("Resource allocation strategy: " << (this->resource_allocation_strategy == ResourceAllocationStrategy::kDynamic ? "Dynamic" : "Static"));
+  LOG_DEBUG(
+      "Resource allocation strategy: "
+      << (this->resource_allocation_strategy == ResourceAllocationStrategy::kDynamic ? "Dynamic" : "Static"));
  if (this->resource_allocation_strategy == ResourceAllocationStrategy::kDynamic) {
    this->exec_ctx =
        make_trt(cuda_engine->createExecutionContext(nvinfer1::ExecutionContextAllocationStrategy::kUSER_MANAGED));
@@ -472,7 +476,8 @@ std::vector<std::string> TRTEngine::serialize() {
  serialized_info[REQUIRES_OUTPUT_ALLOCATOR_IDX] = this->requires_output_allocator ? "1" : "0";
  serialized_info[SERIALIZED_METADATA_IDX] = this->serialized_metadata;
  serialized_info[TARGET_PLATFORM_IDX] = this->target_platform.serialize();
-  serialized_info[RESOURCE_ALLOCATION_STRATEGY_IDX] = this->resource_allocation_strategy == ResourceAllocationStrategy::kDynamic ? "1" : "0";
+  serialized_info[RESOURCE_ALLOCATION_STRATEGY_IDX] =
+      this->resource_allocation_strategy == ResourceAllocationStrategy::kDynamic ? "1" : "0";

  return serialized_info;
}
@@ -486,11 +491,11 @@ void TRTEngine::set_resource_allocation_strategy(TRTEngine::ResourceAllocationSt
    this->resource_allocation_strategy = new_strategy;
    if (this->resource_allocation_strategy == TRTEngine::ResourceAllocationStrategy::kDynamic) {
      LOG_DEBUG("Setting resource allocation strategy to dynamic");
-      this->exec_ctx = make_trt(cuda_engine->createExecutionContext(nvinfer1::ExecutionContextAllocationStrategy::kUSER_MANAGED));
+      this->exec_ctx =
+          make_trt(cuda_engine->createExecutionContext(nvinfer1::ExecutionContextAllocationStrategy::kUSER_MANAGED));
    } else {
      LOG_DEBUG("Setting resource allocation strategy to static");
-      this->exec_ctx = make_trt(
-          cuda_engine->createExecutionContext());
+      this->exec_ctx = make_trt(cuda_engine->createExecutionContext());
    }
  }
}
ERROR: Some files do not conform to style guidelines
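
The hunks above store the allocation strategy as "1" or "0" in the `RESOURCE_ALLOCATION_STRATEGY_IDX` slot and recover it with `std::stoi` followed by a `bool` cast. A minimal Python sketch of that round trip may help when reading the diff. The enum and function names below are illustrative stand-ins, not part of the Torch-TensorRT API.

```python
from enum import Enum


class ResourceAllocationStrategy(Enum):
    # Illustrative mirror of the C++ enum values kStatic / kDynamic.
    STATIC = 0
    DYNAMIC = 1


def serialize_strategy(strategy: ResourceAllocationStrategy) -> str:
    # Same encoding as the diff: dynamic -> "1", anything else -> "0".
    return "1" if strategy is ResourceAllocationStrategy.DYNAMIC else "0"


def deserialize_strategy(field: str) -> ResourceAllocationStrategy:
    # int() plays the role of std::stoi; any non-zero value maps to dynamic.
    # A non-numeric field raises ValueError here.
    if bool(int(field)):
        return ResourceAllocationStrategy.DYNAMIC
    return ResourceAllocationStrategy.STATIC
```

Because the field is parsed as an integer, an empty or non-numeric slot fails loudly rather than silently falling back to static, which is also how `std::stoi` behaves (it throws `std::invalid_argument`).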

github-actions

There are some changes that do not conform to Python style guidelines:

--- /home/runner/work/TensorRT/TensorRT/examples/dynamo/dynamic_memory_allocation.py	2025-07-29 23:34:54.135102+00:00
+++ /home/runner/work/TensorRT/TensorRT/examples/dynamo/dynamic_memory_allocation.py	2025-07-29 23:35:18.839735+00:00
@@ -14,21 +14,22 @@
    "ir": "dynamo",
    "use_python_runtime": False,
    "enabled_precisions": {torch.float32},
    "immutable_weights": False,
    "lazy_engine_init": True,
-    "dynamically_allocate_resources": True
-
+    "dynamically_allocate_resources": True,
}

model = models.resnet152(pretrained=True).eval().to("cuda")
compiled_module = torch_trt.compile(model, inputs=inputs, **settings)
print((torch.cuda.mem_get_info()[1] - torch.cuda.mem_get_info()[0]) / 1024**3)
compiled_module(*inputs)

time.sleep(30)
-with torch_trt.dynamo.runtime.ResourceAllocationStrategy(compiled_module, dynamically_allocate_resources=False):
+with torch_trt.dynamo.runtime.ResourceAllocationStrategy(
+    compiled_module, dynamically_allocate_resources=False
+):
    print(
        "Memory used (GB):",
        (torch.cuda.mem_get_info()[1] - torch.cuda.mem_get_info()[0]) / 1024**3,
    )
    compiled_module(*inputs)
--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/runtime/_ResourceAllocator.py	2025-07-29 23:34:54.152102+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/runtime/_ResourceAllocator.py	2025-07-29 23:35:20.711155+00:00
@@ -12,21 +12,25 @@
    """

    def __init__(
        self,
        compiled_module: torch.nn.Module,
-        dynamically_allocate_resources: bool = True
+        dynamically_allocate_resources: bool = True,
    ) -> None:
        super(ResourceAllocationStrategy, self).__init__()
        self.compiled_module = compiled_module
        self.dynamically_allocate_resources = dynamically_allocate_resources

    def __enter__(self) -> None:
        print("Entering resource allocator context")
        for name, submodule in self.compiled_module.named_modules():
            if "_run_on_acc" in name:
-                submodule.use_dynamically_allocated_resources(dynamically_allocate_resources=self.dynamically_allocate_resources)
+                submodule.use_dynamically_allocated_resources(
+                    dynamically_allocate_resources=self.dynamically_allocate_resources
+                )

    def __exit__(self, exc_type: Any, exc_value: Any, exc_tb: Any) -> None:
        for name, submodule in self.compiled_module.named_modules():
            if "_run_on_acc" in name:
-                submodule.use_dynamically_allocated_resources(dynamically_allocate_resources=self.dynamically_allocate_resources)
+                submodule.use_dynamically_allocated_resources(
+                    dynamically_allocate_resources=self.dynamically_allocate_resources
+                )
--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/runtime/_TorchTensorRTModule.py	2025-07-29 23:34:54.152102+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/runtime/_TorchTensorRTModule.py	2025-07-29 23:35:21.044122+00:00
@@ -186,11 +186,13 @@
        engine_info[SERIALIZED_METADATA_IDX] = self.encode_metadata(metadata)
        engine_info[TARGET_PLATFORM_IDX] = target_platform._to_serialized_rt_platform()
        engine_info[REQUIRES_OUTPUT_ALLOCATOR_IDX] = str(
            int(self.requires_output_allocator)
        )
-        print(f"PROVIDED RESOURCE ALLOCATION STRATEGY: {self.dynamically_allocate_resources}")
+        print(
+            f"PROVIDED RESOURCE ALLOCATION STRATEGY: {self.dynamically_allocate_resources}"
+        )
        engine_info[RESOURCE_ALLOCATION_STRATEGY_IDX] = str(
            int(self.dynamically_allocate_resources)
        )
        print(engine_info[RESOURCE_ALLOCATION_STRATEGY_IDX])

@@ -219,13 +221,17 @@
        return budget_bytes

    def _reset_captured_graph(self) -> None:
        self.engine.reset_captured_graph()

-    def use_dynamically_allocated_resources(self, dynamically_allocate_resources: bool = False) -> None:
+    def use_dynamically_allocated_resources(
+        self, dynamically_allocate_resources: bool = False
+    ) -> None:
        self.dynamically_allocate_resources = dynamically_allocate_resources
-        self.engine.use_dynamically_allocated_resources(self.dynamically_allocate_resources)
+        self.engine.use_dynamically_allocated_resources(
+            self.dynamically_allocate_resources
+        )

    def setup_engine(self) -> None:
        """
        Setup engine for a module which has deferred engine setup.
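
One thing the `_ResourceAllocator.py` hunk makes visible: `__exit__` applies the same `dynamically_allocate_resources` value that `__enter__` applied, so leaving the `with` block does not change anything. If the intent is for the context manager to restore whatever setting was active before (an assumption, since the PR description does not say), a sketch could look like the following. The `Fake*` classes are hypothetical stand-ins so the example runs without torch or TensorRT; only `named_modules`, `use_dynamically_allocated_resources`, and the `dynamically_allocate_resources` attribute are taken from the diff.

```python
from typing import Any, Dict, Iterator, Tuple


class FakeAcceleratedModule:
    """Hypothetical stand-in for a TRT submodule; mirrors the setter in the diff."""

    def __init__(self, dynamically_allocate_resources: bool = False) -> None:
        self.dynamically_allocate_resources = dynamically_allocate_resources

    def use_dynamically_allocated_resources(
        self, dynamically_allocate_resources: bool = False
    ) -> None:
        self.dynamically_allocate_resources = dynamically_allocate_resources


class FakeCompiledModule:
    """Hypothetical stand-in exposing named_modules() like torch.nn.Module."""

    def __init__(self, submodules: Dict[str, FakeAcceleratedModule]) -> None:
        self._submodules = submodules

    def named_modules(self) -> Iterator[Tuple[str, FakeAcceleratedModule]]:
        return iter(self._submodules.items())


class RestoringResourceAllocationStrategy:
    """Sketch: apply a strategy on enter, restore the previous one on exit."""

    def __init__(
        self, compiled_module: Any, dynamically_allocate_resources: bool = True
    ) -> None:
        self.compiled_module = compiled_module
        self.dynamically_allocate_resources = dynamically_allocate_resources
        self._previous: Dict[str, bool] = {}

    def __enter__(self) -> None:
        for name, submodule in self.compiled_module.named_modules():
            if "_run_on_acc" in name:
                # Remember the current setting before overriding it.
                self._previous[name] = submodule.dynamically_allocate_resources
                submodule.use_dynamically_allocated_resources(
                    dynamically_allocate_resources=self.dynamically_allocate_resources
                )

    def __exit__(self, exc_type: Any, exc_value: Any, exc_tb: Any) -> None:
        for name, submodule in self.compiled_module.named_modules():
            if name in self._previous:
                # Put back whatever was active before the with-block.
                submodule.use_dynamically_allocated_resources(
                    dynamically_allocate_resources=self._previous[name]
                )
        self._previous.clear()
```

With this shape, the example script's `with ...(compiled_module, dynamically_allocate_resources=False):` block would run statically and then return to dynamic allocation afterwards.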

narendasan · 2025-07-30T16:47:48Z

core/runtime/runtime.h

+std::string resource_allocation_strategy_to_string(TRTEngine::ResourceAllocationStrategy strategy);
+TRTEngine::ResourceAllocationStrategy resource_allocation_strategy_from_string(const std::string& str);


Forgot to delete these

narendasan · 2025-07-30T16:48:41Z

core/runtime/runtime.h

@@ -38,13 +38,17 @@ typedef enum {
  SERIALIZED_METADATA_IDX,
  TARGET_PLATFORM_IDX,
  REQUIRES_OUTPUT_ALLOCATOR_IDX,
+  RESOURCE_ALLOCATION_STRATEGY_IDX,


Make sure to bump the ABI version
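
For context on why the bump matters: appending `RESOURCE_ALLOCATION_STRATEGY_IDX` makes the serialized list one slot longer, so an engine saved before the change has no value at that index. A version check lets the loader reject such payloads with a clear message instead of reading past the end. The sketch below is a simplified Python illustration; the shortened layout, the `ABI_TARGET_IDX` position, and the version strings "1" and "2" are hypothetical, and `verify_serialization_fmt` here is a stand-in for the C++ function of the same name.

```python
from enum import IntEnum
from typing import List


class SerializedInfoIndex(IntEnum):
    # Shortened, hypothetical layout; the real enum in runtime.h has more fields.
    ABI_TARGET_IDX = 0
    REQUIRES_OUTPUT_ALLOCATOR_IDX = 1
    RESOURCE_ALLOCATION_STRATEGY_IDX = 2  # newly appended field
    SERIALIZATION_LEN = 3


ABI_VERSION = "2"  # hypothetical: bumped from "1" when the field was appended


def verify_serialization_fmt(serialized_info: List[str]) -> None:
    # Check the version first so old payloads get a meaningful error.
    if not serialized_info:
        raise ValueError("empty serialized engine info")
    found = serialized_info[SerializedInfoIndex.ABI_TARGET_IDX]
    if found != ABI_VERSION:
        raise ValueError(
            f"ABI version mismatch: expected {ABI_VERSION}, found {found}"
        )
    expected_len = int(SerializedInfoIndex.SERIALIZATION_LEN)
    if len(serialized_info) != expected_len:
        raise ValueError(
            f"expected {expected_len} fields, found {len(serialized_info)}"
        )


def uses_dynamic_allocation(serialized_info: List[str]) -> bool:
    verify_serialization_fmt(serialized_info)
    idx = SerializedInfoIndex.RESOURCE_ALLOCATION_STRATEGY_IDX
    return bool(int(serialized_info[idx]))
```

Without the bump, an old two-field payload would pass the version check and fail later on the index lookup, which is harder to diagnose.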

Added initial implementation

e22be64

meta-cla bot added the cla signed label Jul 29, 2025

cehongwang requested a review from narendasan July 29, 2025 20:34

github-actions bot added the component: core, component: api [Python], component: runtime, and component: dynamo labels Jul 29, 2025

github-actions bot requested a review from bowang007 July 29, 2025 20:34

cehongwang changed the title ~~Added initial implementation~~ Dynamic memory allocation Jul 29, 2025

Some fixes for the dynamic memory setting (#3729)

50678a5

Co-authored-by: Adrian Wang <123616592+cehongwang@users.noreply.github.com>

github-actions bot requested changes Jul 29, 2025

narendasan reviewed Jul 30, 2025
