Insights: pytorch/ao
Overview
30 Pull requests merged by 11 people
- [MoE training] torch.compile support for ScaledGroupedMMTensor (#2509, merged Aug 2, 2025)
- [ROCm CI] Migrate to MI325 Capacity (#2662, merged Aug 2, 2025)
- Make AWQ more general (#2400, merged Aug 1, 2025)
- backward pass for differentiable mxfp8 grouped gemm with dynamic quant (#2639, merged Aug 1, 2025)
- support for 2d-2d emulated mxfp8 grouped gemm (#2632, merged Aug 1, 2025)
- Add new ops to CMakeLists.txt (#2647, merged Aug 1, 2025)
- Add Aten operations (#2664, merged Aug 1, 2025)
- Add Float8BlockwiseLinear for training (#2618, merged Aug 1, 2025)
- [bc-breaking] Generalize FakeQuantizeConfig beyond intx (#2628, merged Aug 1, 2025)
- fix bc breakage flex path (#2652, merged Aug 1, 2025)
- Remove warnings in favor of skiptests for Moe code (#2654, merged Aug 1, 2025)
- skip rocm for moe training tests (#2646, merged Jul 31, 2025)
- Add op_exetorch (#2645, merged Jul 31, 2025)
- Clarifying the meaning of VERSION in AOBaseConfig (#2635, merged Jul 31, 2025)
- fix: improve formatting and resolve minor bug for better utility (#2634, merged Jul 31, 2025)
- Update test_dynamic_activation_lut.py (#2637, merged Jul 30, 2025)
- add differentiable mxfp8 grouped gemm with dynamic quant (forward pass) (#2627, merged Jul 30, 2025)
- Add tests (#2624, merged Jul 30, 2025)
- [Easy] Fix git repo url in citation (#2599, merged Jul 30, 2025)
- Add Triton kernels for fp8 blockwise quantization and GEMMs (#2617, merged Jul 30, 2025)
- mxfp8 emulated grouped gemm (#2626, merged Jul 30, 2025)
- update float8 readme with more recent performance numbers (#2580, merged Jul 30, 2025)
- Integrate PARQ with lowbit Arm CPU kernels (#2622, merged Jul 29, 2025)
- Update the op_-impl.h (#2621, merged Jul 29, 2025)
- Disable register_da8w4_concat_linear_cpu_pass (#2623, merged Jul 29, 2025)
- [OpenVINOQuantizer] Minor improvements (#2581, merged Jul 29, 2025)
- Update packed_weghts_header to include the new groupwise lowbit header. (#2582, merged Jul 28, 2025)
- reorganize MX inference code (#2616, merged Jul 28, 2025)
- delete outdated MX inference code (#2615, merged Jul 28, 2025)
- fix mx kernel tests (#2614, merged Jul 28, 2025)
23 Pull requests opened by 10 people
- mx: expose scaling calculation methods in training UX (#2620, opened Jul 28, 2025)
- New multi-step QAT API (#2629, opened Jul 29, 2025)
- Update test_bitpacking.cpp. Bug fix in test. (#2633, opened Jul 30, 2025)
- Deprecate old QAT APIs (#2641, opened Jul 30, 2025)
- Make scaling type configurable for MoE training (#2642, opened Jul 30, 2025)
- Convert model inference test from pytest to unittest (#2644, opened Jul 31, 2025)
- Update coreml codebook (#2648, opened Jul 31, 2025)
- Bump version for float8 dynamic quant and weight only quant configs (#2650, opened Aug 1, 2025)
- Check numerical equivalence / closeness between different kernel preferences (#2651, opened Aug 1, 2025)
- Migrate to unittest for files in test/float8 (#2655, opened Aug 1, 2025)
- Add gpu_name as a parameter in roofline estimate utils (#2657, opened Aug 1, 2025)
- Migrate to unittest for test files in quantization, sparsity (#2658, opened Aug 1, 2025)
- Convert SmoothQuant test to unittest (#2659, opened Aug 1, 2025)
- Replace `torch.norm` with `torch.linalg.vector_norm` for PyTorch future update (#2660, opened Aug 1, 2025; see the sketch after this list)
- mx: make CUDA kernel for dim1 cast in mxfp8_cublas recipe (#2661, opened Aug 1, 2025)
- [MoE training] Assert expert weights are column-major; preserve subclass with transpose (#2663, opened Aug 1, 2025)
- Update test scripts to include the new operation. (#2665, opened Aug 1, 2025)
- Add NVFP4 QAT (#2666, opened Aug 1, 2025)
- Add more metrics to dashboard (#2667, opened Aug 1, 2025)
- [moe training] use smaller block sizes for per group scaling kernels to improve perf (#2668, opened Aug 2, 2025)
- [moe training] add llama4 benchmarking script (#2669, opened Aug 2, 2025)
- [moe training] add benchmark script for moe layer (#2671, opened Aug 4, 2025)
- fix float8 rowwise inference perf with torch.compile (#2672, opened Aug 4, 2025)
1 Issue closed by 1 person
- What is the intention of "NF4WeightOnlyConfig" ? (#2631, closed Jul 31, 2025)
6 Issues opened by 4 people
- [moe training] fsdp2 bug for llama4 shared experts where num_experts=1 (#2673, opened Aug 4, 2025)
- Deprecation for Float8DynamicActivationFloat8WeightConfig and Float8WeightOnlyConfig and the models (#2649, opened Jul 31, 2025; see the sketch after this list)
- Refresh "Quantization Overview" docs page (#2643, opened Jul 30, 2025)
- Confusing behavior of `register_quantize_module_handler` (#2640, opened Jul 30, 2025)
- Deprecate and remove old QAT API (#2630, opened Jul 30, 2025)
- Int8QuantizedTrainingLinearWeight missing aten.cat support for distributed training (#2619, opened Jul 28, 2025)
20 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- Add Float8Tensor (#2463, commented on Aug 2, 2025; 31 new comments)
- Remove double baseline calculations for CI microbenchmarks (#2613, commented on Aug 4, 2025; 6 new comments)
- Add vLLM x TorchAO integration workflow (#2610, commented on Jul 29, 2025; 2 new comments)
- Migrate to unittest for files in test/dtypes (#2605, commented on Jul 31, 2025; 0 new comments)
- wip MoE refactor (#2600, commented on Aug 1, 2025; 0 new comments)
- Enable PyTorch/ao CI on Intel XPU with PyTorch Base Docker (#2584, commented on Aug 4, 2025; 0 new comments)
- [Inductor][float8] Support qlinear for float8 in inductor (#2565, commented on Aug 4, 2025; 0 new comments)
- Refactor Wanda for better readability (#2538, commented on Aug 1, 2025; 0 new comments)
- feat: RGS for wanda++ (#2537, commented on Jul 29, 2025; 0 new comments)
- [CPU] Add support for dynamic float8 act float8 weight on CPU (#2505, commented on Aug 1, 2025; 0 new comments)
- Add all fbgemm kernel Tensors into Int4WeightOnlyConfig and Float8DynamicActivationInt4WeightConfig (#2474, commented on Aug 2, 2025; 0 new comments)
- [DRAFT] Enable CPU data layout convert to XPU (#2441, commented on Jul 29, 2025; 0 new comments)
- [Inductor] Support scaled mm on inductor (#2411, commented on Aug 1, 2025; 0 new comments)
- Enables the per_tensor lowering patterns for weight per_packing (#2391, commented on Jul 31, 2025; 0 new comments)
- [roadmap/tracker] Low precision MoE training (#2147, commented on Aug 4, 2025; 0 new comments)
- [ROCm] torchao.float8 should work properly on ROCm (#1066, commented on Aug 4, 2025; 0 new comments)
- [feature request] np.packbits / np.unpackbits, general BitTensors (maybe can be just tensors with dtype torch.bits8 or have a new dtype torch.bits introduced) and bit packed tensors utilities for saving memory / accesses, support for BitTensors wherever BoolTensors are used (#292, commented on Aug 3, 2025; 0 new comments; see the sketch after this list)
- [Feature Req] Can you add *args and **kwargs to improve extensibility ? (#2496, commented on Aug 1, 2025; 0 new comments)
- bug: int8 w8a8 doesn't work on 5090 (#2376, commented on Jul 28, 2025; 0 new comments)
- Missing benchmark for `sparse24_sm90_sparsify` overhead (#2612, commented on Jul 28, 2025; 0 new comments)