Add reason to cancelation of a job#1074
Conversation
|
Thanks for the contribution. I have the following comments. Job is a kind of namespace for tasks. Persistent states such as WAITING/RUNNING/CANCELLED are states of tasks, not jobs. Jobs have only an inferred state (WAITING/RUNNING/CANCELLED) depending on the state of tasks within the job. Your change disrupts this, so now you can have a Job in a canceled state (your newly introduced state) while there are no canceled tasks within it. This is something unwanted. I suggest the following refactoring:
Additionally:
Just thinking: My original idea was to put the message into the task state (not the job state), because technically we can cancel different tasks for different reasons. But actually users usually have just a single reason to cancel a job, so it seems fine to have just one message for the whole job. |
|
Okey, will do :). |
|
I personally like If a job is cancelled multiple times (this doesn't really make sense, but someone could do it in theory), we can either error out or overwrite any previous cancel reason. |
|
It is tricky because the cancellation predates open jobs. But in the current version you can do this:
So we create an open job, submit something, cancel the job (=cancel all current tasks within the task) but job is still open, submit new tasks into the same job. And we can cancel these new jobs again. Open jobs makes all these things more complex, on the other hand, as HQ is optimized for having rather small number of jobs (and many tasks), we do not want encourage user to create a new job when they needs "just" canceling the tasks. In the mentioned case with multiple cancellation, I am totally ok with overwriting the old message. |
|
Side note: I just realized that it also needs some updates in the journal, to not lost the message in case of server crash. We can postpone it to an additional PR, since it is nontrivial as current cancellation event tracks individual tasks, not whole jobs. |
|
Okey, makes more sense now what the open really means :). I think the option of overwriting might be confusing at times, but as you mentioned, it doesn't happens that often so I think if it will be documented then it is not that big of a deal. Thanks for the feedback About the journaling, I can try to take a look at that while at it. But I will see how complex it will be :). |
Ok. I do not have strong opinion on this. |
|
One last question/idea before i jump into it. Maybe there could be like a param to the |
|
I have no problem with showing it in |
1165aa5 to
41e58c5
Compare
|
I have added the I have also updated the I have added 2 test suits to test the added functionality of the Documentation was also updated, to atleast mention the new additions. But let me know if this is enough :). I haven't yet looked into the journaling of the cancelation but during the implementation, i have come across Also seems like some of the tests have been broken, by the added changes, I will try to look at it asap. If you have some recomendations about it, it would sure help :). While I was testing it locally, I also ran into some troubles with the tests, but some of them were timeouts, so I thought, I was missing some crutial part locally, like pbs or slurm. But seems like that might not be the case |
41e58c5 to
6ac79eb
Compare
|
There were a small bug, due to the removal of is_open. That was fixed in this last push |
Adds a optional argument to
hq job cancel <ID_Selector> --reason <String>requested in #1001.Currently set to max 32 chars in the cli output mode.
With the change of the representation of the Job state, the json output has been also slightly changed to not just outputs if the Job is opened, but the state of the Job. I maybe find that the Job could have all of the states represented in the newly added JobState, and not transfered through the Status enum. But it is true the state might change quite often.
Preview of current state of the

hq job list --all