Insights: pytorch/ao
Overview
30 Pull requests merged by 11 people
- [MoE training] torch.compile support for ScaledGroupedMMTensor (#2509, merged Aug 2, 2025)
- [ROCm CI] Migrate to MI325 Capacity (#2662, merged Aug 2, 2025)
- Make AWQ more general (#2400, merged Aug 1, 2025)
- backward pass for differentiable mxfp8 grouped gemm with dynamic quant (#2639, merged Aug 1, 2025)
- support for 2d-2d emulated mxfp8 grouped gemm (#2632, merged Aug 1, 2025)
- Add new ops to CMakeLists.txt (#2647, merged Aug 1, 2025)
- Add Aten operations (#2664, merged Aug 1, 2025)
- Add Float8BlockwiseLinear for training (#2618, merged Aug 1, 2025)
- [bc-breaking] Generalize FakeQuantizeConfig beyond intx (#2628, merged Aug 1, 2025)
- fix bc breakage flex path (#2652, merged Aug 1, 2025)
- Remove warnings in favor of skiptests for Moe code (#2654, merged Aug 1, 2025)
- skip rocm for moe training tests (#2646, merged Jul 31, 2025)
- Add op_exetorch (#2645, merged Jul 31, 2025)
- Clarifying the meaning of VERSION in AOBaseConfig (#2635, merged Jul 31, 2025)
- fix: improve formatting and resolve minor bug for better utility (#2634, merged Jul 31, 2025)
- Update test_dynamic_activation_lut.py (#2637, merged Jul 30, 2025)
- add differentiable mxfp8 grouped gemm with dynamic quant (forward pass) (#2627, merged Jul 30, 2025)
- Add tests (#2624, merged Jul 30, 2025)
- [Easy] Fix git repo url in citation (#2599, merged Jul 30, 2025)
- Add Triton kernels for fp8 blockwise quantization and GEMMs (#2617, merged Jul 30, 2025)
- mxfp8 emulated grouped gemm (#2626, merged Jul 30, 2025)
- update float8 readme with more recent performance numbers (#2580, merged Jul 30, 2025)
- Integrate PARQ with lowbit Arm CPU kernels (#2622, merged Jul 29, 2025)
- Update the op_-impl.h (#2621, merged Jul 29, 2025)
- Disable register_da8w4_concat_linear_cpu_pass (#2623, merged Jul 29, 2025)
- [OpenVINOQuantizer] Minor improvements (#2581, merged Jul 29, 2025)
- Update packed_weghts_header to include the new groupwise lowbit header. (#2582, merged Jul 28, 2025)
- reorganize MX inference code (#2616, merged Jul 28, 2025)
- delete outdated MX inference code (#2615, merged Jul 28, 2025)
- fix mx kernel tests (#2614, merged Jul 28, 2025)
23 Pull requests opened by 10 people
- mx: expose scaling calculation methods in training UX (#2620, opened Jul 28, 2025)
- New multi-step QAT API (#2629, opened Jul 29, 2025)
- Update test_bitpacking.cpp. Bug fix in test. (#2633, opened Jul 30, 2025)
- Deprecate old QAT APIs (#2641, opened Jul 30, 2025)
- Make scaling type configurable for MoE training (#2642, opened Jul 30, 2025)
- Convert model inference test from pytest to unittest (#2644, opened Jul 31, 2025)
- Update coreml codebook (#2648, opened Jul 31, 2025)
- Bump version for float8 dynamic quant and weight only quant configs (#2650, opened Aug 1, 2025)
- Check numerical equivalence / closeness between different kernel preferences (#2651, opened Aug 1, 2025)
- Migrate to unittest for files in test/float8 (#2655, opened Aug 1, 2025)
- Add gpu_name as a parameter in roofline estimate utils (#2657, opened Aug 1, 2025)
- Migrate to unittest for test files in quantization, sparsity (#2658, opened Aug 1, 2025)
- Convert SmoothQuant test to unittest (#2659, opened Aug 1, 2025)
- Replace `torch.norm` with `torch.linalg.vector_norm` for PyTorch future update (#2660, opened Aug 1, 2025; see the sketch after this list)
- mx: make CUDA kernel for dim1 cast in mxfp8_cublas recipe (#2661, opened Aug 1, 2025)
- [MoE training] Assert expert weights are column-major; preserve subclass with transpose (#2663, opened Aug 1, 2025)
- Update test scripts to include the new operation. (#2665, opened Aug 1, 2025)
- Add NVFP4 QAT (#2666, opened Aug 1, 2025)
- Add more metrics to dashboard (#2667, opened Aug 1, 2025)
- [moe training] use smaller block sizes for per group scaling kernels to improve perf (#2668, opened Aug 2, 2025)
- [moe training] add llama4 benchmarking script (#2669, opened Aug 2, 2025)
- [moe training] add benchmark script for moe layer (#2671, opened Aug 4, 2025)
- fix float8 rowwise inference perf with torch.compile (#2672, opened Aug 4, 2025)
1 Issue closed by 1 person
- What is the intention of "NF4WeightOnlyConfig" ? (#2631, closed Jul 31, 2025)
6 Issues opened by 4 people
- [moe training] fsdp2 bug for llama4 shared experts where num_experts=1 (#2673, opened Aug 4, 2025)
- Deprecation for Float8DynamicActivationFloat8WeightConfig and Float8WeightOnlyConfig and the models (#2649, opened Jul 31, 2025; see the sketch after this list)
- Refresh "Quantization Overview" docs page (#2643, opened Jul 30, 2025)
- Confusing behavior of `register_quantize_module_handler` (#2640, opened Jul 30, 2025)
- Deprecate and remove old QAT API (#2630, opened Jul 30, 2025)
- Int8QuantizedTrainingLinearWeight missing aten.cat support for distributed training (#2619, opened Jul 28, 2025)
20 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- Add Float8Tensor (#2463, commented on Aug 2, 2025; 31 new comments)
- Remove double baseline calculations for CI microbenchmarks (#2613, commented on Aug 4, 2025; 6 new comments)
- Add vLLM x TorchAO integration workflow (#2610, commented on Jul 29, 2025; 2 new comments)
- Migrate to unittest for files in test/dtypes (#2605, commented on Jul 31, 2025; 0 new comments)
- wip MoE refactor (#2600, commented on Aug 1, 2025; 0 new comments)
- Enable PyTorch/ao CI on Intel XPU with PyTorch Base Docker (#2584, commented on Aug 4, 2025; 0 new comments)
- [Inductor][float8] Support qlinear for float8 in inductor (#2565, commented on Aug 4, 2025; 0 new comments)
- Refactor Wanda for better readability (#2538, commented on Aug 1, 2025; 0 new comments)
- feat: RGS for wanda++ (#2537, commented on Jul 29, 2025; 0 new comments)
- [CPU] Add support for dynamic float8 act float8 weight on CPU (#2505, commented on Aug 1, 2025; 0 new comments)
- Add all fbgemm kernel Tensors into Int4WeightOnlyConfig and Float8DynamicActivationInt4WeightConfig (#2474, commented on Aug 2, 2025; 0 new comments)
- [DRAFT] Enable CPU data layout convert to XPU (#2441, commented on Jul 29, 2025; 0 new comments)
- [Inductor] Support scaled mm on inductor (#2411, commented on Aug 1, 2025; 0 new comments)
- Enables the per_tensor lowering patterns for weight per_packing (#2391, commented on Jul 31, 2025; 0 new comments)
- [roadmap/tracker] Low precision MoE training (#2147, commented on Aug 4, 2025; 0 new comments)
- [ROCm] torchao.float8 should work properly on ROCm (#1066, commented on Aug 4, 2025; 0 new comments)
- [feature request] np.packbits / np.unpackbits, general BitTensors (maybe can be just tensors with dtype torch.bits8 or have a new dtype torch.bits introduced) and bit packed tensors utilities for saving memory / accesses, support for BitTensors wherever BoolTensors are used (#292, commented on Aug 3, 2025; 0 new comments; see the sketch after this list)
- [Feature Req] Can you add *args and **kwargs to improve extensibility ? (#2496, commented on Aug 1, 2025; 0 new comments)
- bug: int8 w8a8 doesn't work on 5090 (#2376, commented on Jul 28, 2025; 0 new comments)
- Missing benchmark for `sparse24_sm90_sparsify` overhead (#2612, commented on Jul 28, 2025; 0 new comments)