Comparing changes

base repository: sql-machine-learning/elasticdl
base: develop
head repository: sql-machine-learning/elasticdl
compare: v0.2.0
  • 6 commits
  • 20 files changed
  • 3 contributors

Commits on Jan 8, 2021

  1. d99a31d
  2. Bump version (#2478)

    brightcoder01 authored Jan 8, 2021
    a4f6fb0

Commits on Jan 13, 2021

  1. c65bb9e

Commits on Jan 14, 2021

  1. Implement the fail fast mechanism of master. (#2480) (#2482)

    * Add TFV1TrainLoopMonitorCallback.
    
    * Remove the parameter num_workers in TFV1TrainLoopMonitorCallback.is_critical_pod
    
    * Update comments.
    
    * Merge two if conditions into one.
    
    * Composite TFV1PSStrategyTrainLoopMonitorCallback if _is_tfv1_ps_strategy_training is True.
    
    * Set the exit_code according to the success parameter in Master.request_stop.
    
    * Add message content in request_stop.
    
    * Organize the master exit logic.
    
    * Resolve test failure.
    
    * Resolve typo.
    brightcoder01 authored Jan 14, 2021
    c555da9
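
The fail-fast commit above mentions setting an exit code from the `success` parameter of `Master.request_stop`, attaching a message, and checking for critical pods. A minimal sketch of that idea follows. It is not ElasticDL's implementation: `SketchMaster`, `CriticalPodMonitor`, `on_pod_failed`, and the "first request wins" rule are hypothetical names and assumptions made only for illustration.

```python
import threading


class SketchMaster:
    """Hypothetical stand-in for a training master (not ElasticDL's class)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._stopped = threading.Event()
        self.exit_code = 0
        self.exit_message = ""

    def request_stop(self, success, msg=""):
        # Assumption of this sketch: only the first stop request is honored,
        # so a later "success" cannot mask an earlier failure.
        with self._lock:
            if self._stopped.is_set():
                return
            self.exit_code = 0 if success else 1
            self.exit_message = msg
            self._stopped.set()

    def run(self, poll_seconds=0.1):
        # Block until a stop is requested, then report the recorded exit code.
        while not self._stopped.wait(timeout=poll_seconds):
            pass
        return self.exit_code


class CriticalPodMonitor:
    """Hypothetical callback that stops the master when a critical pod fails."""

    def __init__(self, master, critical_pods):
        self._master = master
        self._critical_pods = set(critical_pods)

    def is_critical_pod(self, pod_name):
        return pod_name in self._critical_pods

    def on_pod_failed(self, pod_name):
        # Non-critical failures are ignored here; critical ones end the job.
        if self.is_critical_pod(pod_name):
            self._master.request_stop(
                success=False, msg=f"Critical pod {pod_name} failed"
            )
```

In this sketch a caller would pass the value returned by `run()` to `sys.exit`, so a failed critical pod ends the job with a non-zero status instead of leaving the master waiting.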

Commits on Jan 22, 2021

  1. merge develop to v0.2.0 before disable go ps test (#2491)

    * Update the version releasing doc. (#2474)
    
    * Update the release step: cherry-pick the fix commit from develop to release branch
    
    * Fix typo
    
    * Update the versioning doc.
    
    * Add cluster_spec_json in EXCLUDE_PRINT_ARGS (#2479)
    
    * Implement the fail fast mechanism of master. (#2480)
    
    * Add TFV1TrainLoopMonitorCallback.
    
    * Remove the parameter num_workers in TFV1TrainLoopMonitorCallback.is_critical_pod
    
    * Update comments.
    
    * Merge two if conditions into one.
    
    * Composite TFV1PSStrategyTrainLoopMonitorCallback if _is_tfv1_ps_strategy_training is True.
    
    * Set the exit_code according to the success parameter in Master.request_stop.
    
    * Add message content in request_stop.
    
    * Organize the master exit logic.
    
    * Resolve test failure.
    
    * Resolve typo.
    
    * add pod status change log (#2483)
    
    * Fix model cuda (#2484)
    
    * Relaunch worker on failure (#2485)
    
    * relaunch worker on failure
    
    * only relaunch in PS strategy
    
    * Create an ElasticImageFolder for PyTorch. (#2486)
    
    * Develop an image folder dataset for PyTorch
    
    * Add docstring
    
    * Check whether to register hooks according to HOROVOD_ELASTIC (#2487)
    
    * Check whether to register hooks according to HOROVOD_ELASTIC
    
    * Register hooks
    
    * Remove "elasticdl-" prefix to ps/worker pod name (#2489)
    
    * remove prefix to ps/worker pod name
    
    * fix tests
    
    * fix black
    
    Co-authored-by: brightcoder01 <55301748+brightcoder01@users.noreply.github.com>
    Co-authored-by: Qinlong Wang <WangQL1201@outlook.com>
    3 people authored Jan 22, 2021
    5b8c196
  2. Sync (#2493)

    * Update the version releasing doc. (#2474)
    
    * Update the release step: cherry-pick the fix commit from develop to release branch
    
    * Fix typo
    
    * Update the versioning doc.
    
    * Add cluster_spec_json in EXCLUDE_PRINT_ARGS (#2479)
    
    * Implement the fail fast mechanism of master. (#2480)
    
    * Add TFV1TrainLoopMonitorCallback.
    
    * Remove the parameter num_workers in TFV1TrainLoopMonitorCallback.is_critical_pod
    
    * Update comments.
    
    * Merge two if conditions into one.
    
    * Composite TFV1PSStrategyTrainLoopMonitorCallback if _is_tfv1_ps_strategy_training is True.
    
    * Set the exit_code according to the success parameter in Master.request_stop.
    
    * Add message content in request_stop.
    
    * Organize the master exit logic.
    
    * Resolve test failure.
    
    * Resolve typo.
    
    * add pod status change log (#2483)
    
    * Fix model cuda (#2484)
    
    * Relaunch worker on failure (#2485)
    
    * relaunch worker on failure
    
    * only relaunch in PS strategy
    
    * Create an ElasticImageFolder for PyTorch. (#2486)
    
    * Develop an image folder dataset for PyTorch
    
    * Add docstring
    
    * Check whether to register hooks according to HOROVOD_ELASTIC (#2487)
    
    * Check whether to register hooks according to HOROVOD_ELASTIC
    
    * Register hooks
    
    * Remove "elasticdl-" prefix to ps/worker pod name (#2489)
    
    * remove prefix to ps/worker pod name
    
    * fix tests
    
    * fix black
    
    * Develop an API to get training epoch (#2488)
    
    * Check whether to register hooks according to HOROVOD_ELASTIC
    
    * Develop an API to get training epoch
    
    * Register hooks
    
    * Add unittest
    
    * Fix by comments
    
    * Fix unittest
    
    Co-authored-by: brightcoder01 <55301748+brightcoder01@users.noreply.github.com>
    Co-authored-by: HT <tenn_2001c@yahoo.com>
    3 people authored Jan 22, 2021
    2d2547f
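
Several commits above mention checking `HOROVOD_ELASTIC` before registering hooks. The sketch below shows one way such a guard could look. It assumes the environment variable is set to "1" when elastic mode is on; `elastic_mode_enabled` and `maybe_register_hooks` are hypothetical helpers, not functions from ElasticDL or Horovod.

```python
import os


def elastic_mode_enabled(environ=None):
    """Return True when HOROVOD_ELASTIC is "1" (assumed convention)."""
    env = os.environ if environ is None else environ
    return env.get("HOROVOD_ELASTIC", "") == "1"


def maybe_register_hooks(register, environ=None):
    """Call `register` only in elastic mode; report whether it was called."""
    if elastic_mode_enabled(environ):
        register()
        return True
    return False
```

Passing the environment as an argument keeps the check easy to unit-test without mutating `os.environ`.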