Insights: Lightning-AI/pytorch-lightning
Overview
24 Pull requests merged by 9 people
-
build(deps): bump Lightning-AI/utilities from 0.14.3 to 0.15.0
#21010 merged
Aug 4, 2025 -
fix ci: progress bar console clearing for latest Rich release
#21016 merged
Aug 4, 2025 -
build(deps): update torchmetrics requirement from <1.8.0,>=0.10.0 to >=0.10.0,<1.9.0 in /requirements
#21006 merged
Aug 4, 2025 -
build(deps): update docutils requirement from <=0.19,>=0.18.1 to >=0.18.1,<=0.22 in /requirements
#21027 merged
Aug 4, 2025 -
build(deps): update awscli requirement from <1.42.0,>=1.30.0 to >=1.30.0,<1.43.0 in /requirements
#21026 merged
Aug 4, 2025 -
build(deps): bump mypy from 1.17.0 to 1.17.1 in /requirements
#21025 merged
Aug 4, 2025 -
build(deps): bump coverage from 7.9.2 to 7.10.2 in /requirements
#21024 merged
Aug 4, 2025 -
fix: rich progress bar error when resume training
#21000 merged
Jul 31, 2025 -
fix broken links to studios
#21014 merged
Jul 28, 2025 -
fix CI: awscli docutils version conflict
#20997 merged
Jul 26, 2025 -
docs: updating flaky links
#20980 merged
Jul 23, 2025 -
build(deps): update tensorboard requirement from <2.20.0,>=2.9.1 to >=2.9.1,<2.21.0 in /requirements
#20992 merged
Jul 22, 2025 -
build(deps): bump mypy from 1.16.1 to 1.17.0 in /requirements
#20991 merged
Jul 22, 2025 -
Allow `dataloader_idx_` in log names when `add_dataloader_idx=False`
#20987 merged
Jul 18, 2025 -
fix: failing markdown link test in ci
#20979 merged
Jul 14, 2025 -
docs: update ref to latest tutorials
#20977 merged
Jul 14, 2025 -
Add support for nvcr.io/nvidia/pytorch:25.06-py3
#20971 merged
Jul 11, 2025 -
Model checkpointing `save_on_train_epoch_end` default behavior documentation
#20931 merged
Jul 9, 2025 -
Fix: Allow trainer to accept CUDAAccelerator instance as accelerator with FSDP strategy
#20964 merged
Jul 9, 2025 -
Add dev env setup guide
#20961 merged
Jul 9, 2025 -
build(deps): bump coverage from 7.9.1 to 7.9.2 in /requirements
#20966 merged
Jul 7, 2025 -
build(deps): update awscli requirement from <1.41.0,>=1.30.0 to >=1.30.0,<1.42.0 in /requirements
#20965 merged
Jul 7, 2025
10 Pull requests opened by 6 people
-
[pre-commit.ci] pre-commit suggestions
#20968 opened
Jul 7, 2025 -
fix: remove extra parameter in accelerator registry decorator
#20975 opened
Jul 11, 2025 -
Fix MLFlowLogger.save_dir Windows file URI handling (Fixes #20972)
#20988 opened
Jul 19, 2025 -
nitpick: add make command to quickly setup the project on `lightning studio`
#20996 opened
Jul 23, 2025 -
docker: simplify the docker name with CUDA
#21001 opened
Jul 25, 2025 -
add/debug Lit CI [wip]
#21002 opened
Jul 25, 2025 -
docs: update mail to developer@lightning.ai
#21003 opened
Jul 26, 2025 -
Allow `training_step` in manual optimization to return general mappings
#21011 opened
Jul 28, 2025 -
Sync dist clarification and consistency
#21012 opened
Jul 28, 2025 -
Fix fabric examples and load_checkpoint hparams ref
#21013 opened
Jul 28, 2025
15 Issues closed by 4 people
-
bugs too many
#20875 closed
Aug 4, 2025 -
Documentation or main page is not loading [not available in your region]
#20989 closed
Aug 1, 2025 -
Rich progress_bar_id is None if restore training state from a step checkpoint
#21015 closed
Jul 31, 2025 -
Missing jsonargparse as dependency
#21018 closed
Jul 31, 2025 -
Rich progress bar error when resume training
#20976 closed
Jul 31, 2025 -
Spend a lot of time to load large ckpt
#21017 closed
Jul 31, 2025 -
on_validation_batch_end is not called when Loss is NaN
#20999 closed
Jul 30, 2025 -
Allow user to use `dataloader_idx` in log name in `LightningModule.log`
#20485 closed
Jul 18, 2025 -
`load_from_checkpoint` returns `None`
#20607 closed
Jul 17, 2025 -
Lightning is requiring packaging < 25.0
#20772 closed
Jul 14, 2025 -
`ModelCheckpoint`'s argument `save_on_train_epoch_end`'s documentation unclear when value is `None`
#20781 closed
Jul 9, 2025 -
Strategy `fsdp` requires a GPU accelerator, but got CUDAAccelerator
#20957 closed
Jul 9, 2025 -
Recommend dev setup / support uv
#20954 closed
Jul 9, 2025 -
lightning throws an exception on MacOS when the pytorch default device is set
#20696 closed
Jul 7, 2025
16 Issues opened by 15 people
-
DDP Strategy Does Not Automatically Shard Batch Sizes Despite Documentation Claims
#21023 opened
Aug 4, 2025 -
Trainer parameter limit-train-batches was meant to be per-worker
#21022 opened
Aug 3, 2025 -
The difference of Trainer.test with ddp strategy
#21004 opened
Jul 27, 2025 -
Remove an unnecessary TODO in `src/lightning/pytorch/loops/fit_loop.py`
#20998 opened
Jul 24, 2025 -
Changing `on_step` in `self.log` causes `batch_to_device` to change
#20995 opened
Jul 23, 2025 -
Support BatchSizeFinder in DDP
#20994 opened
Jul 22, 2025 -
Accept `TensorDict` (or more generally, dict-like's) as a `training_step` return type
#20993 opened
Jul 22, 2025 -
When the model is in an eval state before calling trainer.fit it should be moved to train state
#20986 opened
Jul 18, 2025 -
Doing full validation on step 0
#20985 opened
Jul 17, 2025 -
uv for CI
#20984 opened
Jul 16, 2025 -
MoE (mixture of experts) support for expert parallel
#20982 opened
Jul 15, 2025 -
Accelerator registry decorator usage fails with TypeError due to incorrect function signature
#20973 opened
Jul 11, 2025 -
MLFlowLogger.save_dir mishandles absolute file: URIs on Windows
#20972 opened
Jul 10, 2025 -
Proper way to use mixed precision with manual optimization
#20970 opened
Jul 9, 2025 -
Recommend uv commands for development scripts
#20969 opened
Jul 9, 2025 -
Improve Fault Tolerance via TorchFT
#20967 opened
Jul 7, 2025
72 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Fix double iteration bug when resumed from a checkpoint.
#20775 commented on
Aug 4, 2025 • 6 new comments -
Generic weight averaging callback that supports EMA
#20545 commented on
Jul 17, 2025 • 1 new comment -
Added warmup parameter to early stopping cb
#20778 commented on
Jul 19, 2025 • 0 new comments -
[Not finished] Allow customized parameter grouping for automatic optimizer configuration.
#20742 commented on
Jul 19, 2025 • 0 new comments -
fix: `overfit_batches` uses same batch for train and val
#20731 commented on
Jul 19, 2025 • 0 new comments -
Cpu memory accumulation bug
#20730 commented on
Jul 19, 2025 • 0 new comments -
docs(LightningModule): update docs for `.training` mode in loops
#20716 commented on
Jul 19, 2025 • 0 new comments -
FabricModule: wrap forward methods instead of monkeypatch-based redirect
#20711 commented on
Aug 4, 2025 • 0 new comments -
Added support for flushing Comet experiment data to the Comet after saving a checkpoint.
#20680 commented on
Jul 7, 2025 • 0 new comments -
feat[logger] update mlflow limit for parameters length log
#20636 commented on
Jul 16, 2025 • 0 new comments -
fix(mlflow): Enabling multiple callbacks for checkpoint reporting
#20585 commented on
Jul 19, 2025 • 0 new comments -
updated `ModelCheckpoint` to add the facility of retaining periodic checkpoints
#20547 commented on
Jul 10, 2025 • 0 new comments -
Add Deepspeed Zero 3 MiCS support (Issues #20378)
#20461 commented on
Jul 19, 2025 • 0 new comments -
Add `best_k_metrics` parameter to the `ModelCheckpoint`
#20457 commented on
Jul 19, 2025 • 0 new comments -
Call configure_module before freeze_before_training
#20428 commented on
Jul 19, 2025 • 0 new comments -
[Backend]: Support device backend registration for a wide range of third-party hardware
#20349 commented on
Jul 24, 2025 • 0 new comments -
Add compile_fn parameter for Trainer
#20269 commented on
Jul 19, 2025 • 0 new comments -
Feat: support reusable instance of `ModelCheckpoint`
#20202 commented on
Jul 19, 2025 • 0 new comments -
Fix `save_last` behavior in absence of validation
#20960 commented on
Jul 7, 2025 • 0 new comments -
Make asyncio checkpointing work if validate/fit is called more than once
#20952 commented on
Aug 4, 2025 • 0 new comments -
docs(csv_logs): Clarify CSV and YAML logging distinction and improve examples
#20951 commented on
Jul 14, 2025 • 0 new comments -
update ModelSummary
#20945 commented on
Jul 26, 2025 • 0 new comments -
Fix wrong behavior of `DDPStrategy` option with simple GAN training using DDP
#20936 commented on
Jul 19, 2025 • 0 new comments -
Fix: `no_grad` with AMP bug
#20921 commented on
Jul 7, 2025 • 0 new comments -
Add `save_on_exception` option to `ModelCheckpoint`
#20916 commented on
Aug 2, 2025 • 0 new comments -
feat: Default to RichProgressBar and RichModelSummary if rich is avai…
#20896 commented on
Aug 4, 2025 • 0 new comments -
DOC: Clarify DeviceStatsMonitor logged metrics
#20895 commented on
Jul 19, 2025 • 0 new comments -
bugfix: add support for `global_ordinal`, `local_ordinal`, `world_size` in xla
#20872 commented on
Aug 4, 2025 • 0 new comments -
PR: Fix Duplicate Metric Logging in MLFlowLogger to Prevent MLflow Database Errors
#20871 commented on
Jul 23, 2025 • 0 new comments -
Add documentation warning: Don’t use torch.profiler.profile context manager around Trainer methods
#20864 commented on
Jul 19, 2025 • 0 new comments -
Fix: Respect `required=False` in `add_lightning_class_args` when `subclass_mode=False`
#20856 commented on
Jul 19, 2025 • 0 new comments -
Add Callback for Opacus integration
#20853 commented on
Jul 19, 2025 • 0 new comments -
to_onnx return ONNXProgram
#20811 commented on
Aug 4, 2025 • 0 new comments -
Fix advanced profiler for python >=3.12
#20809 commented on
Aug 4, 2025 • 0 new comments -
Torch-Tensorrt Integration with LightningModule
#20808 commented on
Jul 23, 2025 • 0 new comments -
Support `grad_clip_norm_()` for FSDP
#20784 commented on
Jul 19, 2025 • 0 new comments -
Restoring Trainer State with Early Stop fails
#13225 commented on
Jul 19, 2025 • 0 new comments -
ReduceLROnPlateau within configure_optimizers behaves abnormally
#20829 commented on
Jul 19, 2025 • 0 new comments -
Support PyTorch/XLA 2.7
#20852 commented on
Jul 19, 2025 • 0 new comments -
The progress bar shows wrong length when using multiple dataloaders mixing dataset and iterable dataset
#20695 commented on
Jul 19, 2025 • 0 new comments -
Cannot call self.log in evaluation_hooks after using trainer.predict, even if using a new trainer object.
#19101 commented on
Jul 19, 2025 • 0 new comments -
Error when learning on tpu
#20891 commented on
Jul 19, 2025 • 0 new comments -
RuntimeError: Bad StatusOr access: UNKNOWN: TPU initialization failed: Invalid --2a886c8_slice_builder_worker_addresses specified. Expected 4 worker addresses, got 1.
#20244 commented on
Jul 19, 2025 • 0 new comments -
Warnings when learning on tpu
#20890 commented on
Jul 19, 2025 • 0 new comments -
Weird bug when setting `val_check_interval` dynamically in `setup()`
#20894 commented on
Jul 19, 2025 • 0 new comments -
Tqdm print multi lines with refresh
#20909 commented on
Jul 19, 2025 • 0 new comments -
Logging in `on_test_epoch_end` with multiple dataloaders
#20885 commented on
Jul 19, 2025 • 0 new comments -
Ignore Keyword Arguments Outside of Callback Signature During `Fabric.call`
#20915 commented on
Jul 19, 2025 • 0 new comments -
Global step reset when restoring checkpoints with trainer.validate
#17127 commented on
Jul 15, 2025 • 0 new comments -
`ModelCheckpoint` not saving best model
#20657 commented on
Jul 13, 2025 • 0 new comments -
MLFlow logger with remote tracking fails with CLI
#16310 commented on
Jul 11, 2025 • 0 new comments -
stateful dataloaders do not load their state_dict if self.trainer.estimated_stepping_batches called beforehand
#20550 commented on
Jul 9, 2025 • 0 new comments -
Inconsistency in loading from checkpoint in LightningCLI
#20801 commented on
Jul 9, 2025 • 0 new comments -
Metrics get mapped twice to the same epoch in MLflow logger
#20902 commented on
Jul 7, 2025 • 0 new comments -
docs: fixed the `init_module` and deepspeed
#20175 commented on
Jul 19, 2025 • 0 new comments -
Call `configure_model` from LightningCLI
#19111 commented on
Jul 19, 2025 • 0 new comments -
deprecation: Is `frequency` key necessary in `lr_scheduler_config`?
#20714 commented on
Aug 4, 2025 • 0 new comments -
PyTorchProfiler: not showing CPU memory used even with `profile_memory=True`
#20339 commented on
Aug 1, 2025 • 0 new comments -
Validation takes place every N time
#13324 commented on
Aug 1, 2025 • 0 new comments -
LearningRateMonitor broken on MPS backend with Apple silicon
#20250 commented on
Jul 31, 2025 • 0 new comments -
Resuming should allow to differentiate what to resume (steps/opti/weights)
#5339 commented on
Jul 29, 2025 • 0 new comments -
Model diverges or struggles to converge with complex-valued tensors in DDP
#20480 commented on
Jul 28, 2025 • 0 new comments -
Gradient accumulation calculation may be incorrect
#20350 commented on
Jul 28, 2025 • 0 new comments -
Error using wandb when learning on tpu
#20880 commented on
Jul 27, 2025 • 0 new comments -
Light / dark mode for documentation
#20396 commented on
Jul 23, 2025 • 0 new comments -
Parameters and Gradient is not logged by WandB under FSDP strategy
#17512 commented on
Jul 21, 2025 • 0 new comments -
deepspeed strategy can't save checkpoint, TypeError: cannot pickle `torch._C._distributed_c10d.ProcessGroup` object
#17369 commented on
Jul 19, 2025 • 0 new comments -
diff-svc (WinError 3 when the training starts)
#20849 commented on
Jul 19, 2025 • 0 new comments -
Fabric FSDP with bitsandbytes plugin is not supported
#20855 commented on
Jul 19, 2025 • 0 new comments -
`add_lightning_class_args` `required` argument ignored if not using subclass mode
#20851 commented on
Jul 19, 2025 • 0 new comments -
Mlflow logging LR duplicate key issue with PostgreSQL DB #190
#20865 commented on
Jul 19, 2025 • 0 new comments -
lightning.fabric.utilities.exceptions.MisconfigurationException: No supported gpu backend found! Maybe latest gpu compatibility issue..?
#20626 commented on
Jul 19, 2025 • 0 new comments