-
Notifications
You must be signed in to change notification settings - Fork 55
feat(RHOAIENG-26487): Cluster lifecycling via RayJob #873
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: ray-jobs-feature
Are you sure you want to change the base?
feat(RHOAIENG-26487): Cluster lifecycling via RayJob #873
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## ray-jobs-feature #873 +/- ##
====================================================
+ Coverage 93.06% 93.27% +0.21%
====================================================
Files 28 28
Lines 1513 1561 +48
====================================================
+ Hits 1408 1456 +48
Misses 105 105 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
4cf653d
to
4b60884
Compare
Signed-off-by: Pat O'Connor <paoconno@redhat.com>
486a6bd
to
0add890
Compare
/hold |
9a01141
to
0add890
Compare
/retest |
I've verified this change works as described |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some requested changes for reference @chipspeak. I also left random reminders for myself when I finish this PR off next week.
cc: @LilyLinh
if cluster_config is not None: | ||
# Ensure cluster config has the same namespace as the job | ||
if cluster_config.namespace is None: | ||
cluster_config.namespace = namespace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are going to use ClusterConfiguration for creating the rayClusterSpec
, we should either:
- don't take a Namespace in the ClusterConfig as Kuberay will create the Cluster in the same namespace as the RayJob
or
- remove the requirement for
k8s_namespace
for the new RayJob object.
Lets put namespace in just one place and let the RayCluster inherit it from the job. I will add this next week when updating this PR
""" | ||
# Validate required parameters | ||
if not self.entrypoint: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo: add validation for runtime_env
nit: add helper functions to validate shape etc of runtime_env and entrypoint
temp_config.write_to_file = False | ||
|
||
# Create a minimal Cluster object for the build process | ||
from ..cluster.cluster import Cluster |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we move in-function imports to top of file? unsure of best python practice there
# Create a minimal Cluster object for the build process | ||
from ..cluster.cluster import Cluster | ||
|
||
temp_cluster = Cluster.__new__(Cluster) # Create without calling __init__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weird syntax here although it might just be me. I'll look into this
|
||
# Extract just the RayCluster spec - RayJob CRD doesn't support metadata in rayClusterSpec | ||
# Note: CodeFlare Operator should still create dashboard routes for the RayCluster | ||
ray_cluster_spec = ray_cluster_dict["spec"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
propogate the metadata from the ray job here. Anything we need for the cluster, add to the rayjob
Issue link
Jira
What changes have been made
Support has been added to submit a RayJob that will create and lifecycle its own cluster.
Verification steps
Prerequisites
oc login
Steps
poetry build
.Checks