base repository: sql-machine-learning/elasticdl
base: develop
head repository: sql-machine-learning/elasticdl
compare: v0.2.0
- 6 commits
- 20 files changed
- 3 contributors
Commits on Jan 8, 2021
- d99a31d
- a4f6fb0
Commits on Jan 13, 2021
- c65bb9e
Commits on Jan 14, 2021
- c555da9: Implement the fail fast mechanism of master. (#2480) (#2482)
  * Add TFV1TrainLoopMonitorCallback.
  * Remove the parameter num_workers in TFV1TrainLoopMonitorCallback.is_critical_pod
  * Update comments.
  * Merge two if conditions into one.
  * Composite TFV1PSStrategyTrainLoopMonitorCallback if _is_tfv1_ps_strategy_training is True.
  * Set the exit_code according to the success parameter in Master.request_stop.
  * Add message content in request_stop.
  * Organize the master exit logic.
  * Resolve test failure.
  * Resolve typo.
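The fail-fast behavior described in this commit message can be illustrated with a minimal sketch. This is not ElasticDL's code: the `Master` and `TrainLoopMonitor` classes below are hypothetical stand-ins, the signatures of `request_stop` and `is_critical_pod` are guesses based only on the commit message, and treating PS pods as the critical ones is an assumption made for illustration.

```python
class Master:
    """Hypothetical stand-in for a training master (not ElasticDL's class)."""

    def __init__(self):
        self.exit_code = 0
        self.stop_requested = False
        self.stop_message = ""

    def request_stop(self, success, msg=""):
        # Assumed behavior: the first stop request wins, and the exit code
        # is derived from the `success` flag (0 on success, 1 on failure).
        if self.stop_requested:
            return
        self.stop_requested = True
        self.exit_code = 0 if success else 1
        self.stop_message = msg


class TrainLoopMonitor:
    """Hypothetical callback that stops the job when a critical pod fails."""

    def __init__(self, master, critical_pod_types=("ps",)):
        # Assumption: parameter-server pods are the critical ones.
        self._master = master
        self._critical_pod_types = set(critical_pod_types)

    def is_critical_pod(self, pod_type):
        return pod_type in self._critical_pod_types

    def on_pod_failed(self, pod_name, pod_type):
        # Fail fast: a critical pod failure ends the whole job immediately
        # instead of letting the remaining pods wait indefinitely.
        if self.is_critical_pod(pod_type):
            self._master.request_stop(
                success=False, msg=f"Critical pod {pod_name} failed"
            )


master = Master()
monitor = TrainLoopMonitor(master)
monitor.on_pod_failed("worker-1", "worker")  # non-critical: job keeps running
monitor.on_pod_failed("ps-0", "ps")          # critical: job is stopped
print(master.exit_code, master.stop_message)
```

In this sketch the master process would exit with `master.exit_code` once `stop_requested` is set, so an orchestrator can tell a failed job from a completed one.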
Commits on Jan 22, 2021
- 5b8c196: merge develop to v0.2.0 before disable go ps test (#2491)
  * Update the version releasing doc. (#2474)
    * Update the release step: cherry-pick the fix commit from develop to release branch
    * Fix typo
    * Update the versioning doc.
  * Add cluster_spec_json in EXCLUDE_PRINT_ARGS (#2479)
  * Implement the fail fast mechanism of master. (#2480)
    * Add TFV1TrainLoopMonitorCallback.
    * Remove the parameter num_workers in TFV1TrainLoopMonitorCallback.is_critical_pod
    * Update comments.
    * Merge two if conditions into one.
    * Composite TFV1PSStrategyTrainLoopMonitorCallback if _is_tfv1_ps_strategy_training is True.
    * Set the exit_code according to the success parameter in Master.request_stop.
    * Add message content in request_stop.
    * Organize the master exit logic.
    * Resolve test failure.
    * Resolve typo.
  * add pod status change log (#2483)
  * Fix model cuda (#2484)
  * Relaunch worker on failure (#2485)
    * relaunch worker on failure
    * only relaunch in PS strategy
  * Create an ElasticImageFolder for PyTorch. (#2486)
    * Develop an image folder dataset for PyTorch
    * Add docstring
  * Check whether to register hooks according to HOROVOD_ELASTIC (#2487)
    * Check whether to register hooks according to HOROVOD_ELASTIC
    * Register hooks
  * Remove "elasticdl-" prefix to ps/worker pod name (#2489)
    * remove prefix to ps/worker pod name
    * fix tests
    * fix black
  Co-authored-by: brightcoder01 <55301748+brightcoder01@users.noreply.github.com>
  Co-authored-by: Qinlong Wang <WangQL1201@outlook.com>
- 2d2547f
  * Update the version releasing doc. (#2474)
    * Update the release step: cherry-pick the fix commit from develop to release branch
    * Fix typo
    * Update the versioning doc.
  * Add cluster_spec_json in EXCLUDE_PRINT_ARGS (#2479)
  * Implement the fail fast mechanism of master. (#2480)
    * Add TFV1TrainLoopMonitorCallback.
    * Remove the parameter num_workers in TFV1TrainLoopMonitorCallback.is_critical_pod
    * Update comments.
    * Merge two if conditions into one.
    * Composite TFV1PSStrategyTrainLoopMonitorCallback if _is_tfv1_ps_strategy_training is True.
    * Set the exit_code according to the success parameter in Master.request_stop.
    * Add message content in request_stop.
    * Organize the master exit logic.
    * Resolve test failure.
    * Resolve typo.
  * add pod status change log (#2483)
  * Fix model cuda (#2484)
  * Relaunch worker on failure (#2485)
    * relaunch worker on failure
    * only relaunch in PS strategy
  * Create an ElasticImageFolder for PyTorch. (#2486)
    * Develop an image folder dataset for PyTorch
    * Add docstring
  * Check whether to register hooks according to HOROVOD_ELASTIC (#2487)
    * Check whether to register hooks according to HOROVOD_ELASTIC
    * Register hooks
  * Remove "elasticdl-" prefix to ps/worker pod name (#2489)
    * remove prefix to ps/worker pod name
    * fix tests
    * fix black
  * Develop an API to get training epoch (#2488)
    * Check whether to register hooks according to HOROVOD_ELASTIC
    * Develop an API to get training epoch
    * Register hooks
    * Add unittest
    * Fic by comments
    * Fix unittest
  Co-authored-by: brightcoder01 <55301748+brightcoder01@users.noreply.github.com>
  Co-authored-by: HT <tenn_2001c@yahoo.com>
This comparison is taking too long to generate.
Unfortunately it looks like we can’t render this comparison for you right now. It might be too big, or there might be something weird with your repository.
You can try running this command locally to see the comparison on your machine:
git diff develop...v0.2.0