Proto commits in PKUHPC/CraneSched-FrontEnd

These are the commits in which the Protocol Buffers files changed (only the last 100 relevant commits are shown):

Commit:de0fce6
Author:Junlin Li
Committer:GitHub

feat(requeue): add frontend requeue support and follow-up fixes (#431)

The documentation is generated from this commit.

Commit:4f315c2
Author:junlinli
Committer:junlinli

feat(requeue): add frontend requeue support and follow-up fixes

The documentation is generated from this commit.

Commit:8a73029
Author:huerni
Committer:GitHub

feat: Add [account/user-partition] resource limit (#287)
* feat: add partition resource limits and modify account handling
* feat: add support for --show-partition flag in global flags
* fix: format ModifyAccountRequest initialization for consistency
* fix: standardize comment formatting in proto files for clarity
* fix: add missing newline at end of proto files
* fix: rename flag from --show-partition to --partition-limit for clarity
* feat: add partition-specific resource limit error codes and messages
* docs/test: Add partition resource limit help and test script
  - Update help.go to document new partition resource limit commands:
    - Add --partition-limit/-P flag description in show account/user
    - Add account partition resource limit options in modify section
    - Add user partition resource limit options in modify section
    - Add --partition-limit/-P to GLOBAL OPTIONS
  - Add scripts/test_partition_limit.sh:
    - Test setting account partition limits (maxJobs, maxSubmitJobs, maxTres, maxTresPerJob, maxWall, maxWallPerJob)
    - Test setting user partition limits (same fields)
    - Test show partition limits with --partition-limit and -P flags
    - Test error cases (missing partition in where clause)
    - Test job submission enforcement of partition limits
    - Test resetting partition limits
* fix: remove newline from error messages and add comments for maxWall limit clarification
* fix: add missing newlines to error messages for partition-specific resource limits
* fix: remove test script for partition-specific resource limits
* fix: update partition-specific resource limit error codes for consistency
* fix: update MaxTresPerJob and MaxWallDurationPerJob enum values for resource limits
* fix: add error message for PMIx error in error message ma

Commit:a42e507
Author:RileyWen

proto: sync frontend definitions with backend

Commit:9c3f389
Author:RileyWen

Merge remote-tracking branch 'origin/master' into feat/metrics
# Conflicts:
#   protos/Crane.proto
#   protos/PublicDefs.proto

Commit:f495dcb
Author:RileyWen

trace: add queue count and writer stats

Commit:776fdc4
Author:NamelessOIer
Committer:NamelessOIer

chore: sync step launch mode proto

Commit:93482b8
Author:huerni
Committer:huerni

feat: add partition-specific resource limit error codes and messages

Commit:24ab4db
Author:huerni
Committer:huerni

fix: standardize comment formatting in proto files for clarity

Commit:22a50c5
Author:huerni
Committer:huerni

fix: add missing newline at end of proto files

Commit:db8d8b7
Author:huerni
Committer:huerni

fix: update partition-specific resource limit error codes for consistency

Commit:6feb90a
Author:huerni
Committer:huerni

fix: update MaxTresPerJob and MaxWallDurationPerJob enum values for resource limits

Commit:6f99586
Author:huerni
Committer:huerni

feat: add partition resource limits and modify account handling

Commit:642c7c3
Author:huerni
Committer:GitHub

feat: cattach (#325)
* feat
* feat: cancel task when cfored restart
* feat: cattach recive msg
* feat: when task finish cattach close
* feat: cattach input
* fix some bug
* feat: add cattach history
* refactor
* feat: ctrl+c not input
* fix: multi node
* feat: --layout
* refactor
* merge master
* feat
* refactor ctldReplyChannelMapForCattachByStep
* feat: cattach connect step
* fix: execCranedIds is empty
* fix: uid
* refactor(cattach,cfored): rename task-based proto types to step-based
  Replace `TASK_CONNECT_REQUEST/REPLY` with `STEP_CONNECT_REQUEST/REPLY`, `TASK_COMPLETION_ACK_REPLY` with `STEP_COMPLETION_ACK_REPLY`, and `TASK_X11_FORWARD` with `STEP_X11_FORWARD` across cattach and cfored to align with updated proto definitions. Also remove redundant `TaskId` fields from IO and X11 forward request payloads, and fix stale log messages that referenced wrong component names (Crun instead of Cattach).
* feat(cattach): handle task exit status and step cancel request
  Add handling for `TASK_EXIT_STATUS` reply in the `StateForwarding` method to capture and report non-zero exit codes. If the task was signaled, a "Terminated" message is printed; otherwise, the specific exit code is reported. The exit code is stored in `m.err` for propagation.
  Also add handling for `STEP_CANCEL_REQUEST` to log a trace message and passively wait for `STEP_COMPLETION_ACK_REPLY` instead of actively closing the step.
* refactor(cfored): rename proto types to align with step/job semantics
  - Rename `StreamTaskIORequest` to `StreamStepIORequest` across cattach_server.go and server.go
  - Rename `TASK_X11_OUTPUT` to `STEP_X11_OUTPUT` and update X11 forward reply to include `CranedId` and `LocalId` fields
  - Rename `TASK_CANCEL_REQUEST` to `JOB_CANCEL_REQUEST` and `TASK_COMPLETION_REQUEST` to `JOB_COMPLETION_REQUEST` in ctld_client.go
  - Replace `InteractiveTaskType` with `InteractiveJobType` in completion request payloads
  - Clean up duplicate imports in cfored.go
* refactor(cfored): consolidate X11 I/O forwarding and fix stream error handling
  - Remove `forwardRemoteIoToCrun` function and replace its usage with the existing `forwardRemoteIoToFront` for X11-related messages, eliminating redundant forwarding logic
  - Fix misplaced `toSupervisorStream.Send` error handling block: move it from the `cattachReq` branch to the `crunReq` branch where it belongs
  - Fix indentation of `channel <- nil` in `SupervisorCrashAndRemoveAllChannel`
  - Reorder imports in `cfored.go` and `x11.go` to follow Go conventions (stdlib first, then internal, then third-party)
  - Remove stale TODO comment in `StepIOStream`
* fix(cfored): handle TASK_EXIT_STATUS and X11 msg types in cattach
  Previously, the default case would call `log.Fatalf` and crash cfored upon receiving unexpected message types. This change explicitly handles `TASK_EXIT_STATUS` by skipping it (completion is signaled via `JOB_COMPLETION_ACK_REPLY`), adds stub handling for `STEP_X11_CONN` and `STEP_X11_EOF` with TODO notes for future support, and replaces `log.Fatalf` with `log.Errorf` in the default case to avoid crashing cfored on truly unexpected message types.
* fix(cattach): handle step completion race before IO forwarding
  - Add early detection in CattachWaitIOForward to check if the crun channel is still active before attempting to wait for supervisor channels, preventing indefinite blocking when a step completes early
  - Add a non-blocking pre-forwarding check for JOB_COMPLETION_ACK_REPLY from ctld to handle the case where the step finishes between state transitions
  - Add a second crun channel liveness check after the readyChannel wait to catch completions that occur during supervisor setup
  - Remove unused `forwardEstablished` atomic bool that was never read
  - Simplify switch statement indentation in cattach StateForwarding and add a default warning log for unhandled message types
  - Remove TASK_EXIT_STATUS and STEP_CANCEL_REQUEST handling from cattach StateForwarding as exit status reporting is now handled elsewhere
* feat(cattach): add --label, --output-filter, --input-filter, and --quiet flags
  Introduce `TaskOutputMsg` to carry task ID alongside output data through internal channels, enabling per-task I/O control:
  - `--label`: prepend `[task_id]: ` to each output line via `applyLabel`
  - `--output-filter`: suppress output from tasks other than the specified one
  - `--input-filter`: direct stdin to a specific task instead of broadcasting
  - `--quiet`: suppress the "Task io forward ready" status message
* refactor(cfored): rename task states to step/job in cattach server
  Replace `CattachWaitTaskMeta` with `CattachWaitStepMeta` and `CattachWaitTaskComplete` with `CattachWaitJobComplete` to align state names with step/job terminology. Also replace incorrect `End` state transitions with `DeadCattach` for broken connection handling, and update log messages to reflect the new naming conventions.
* fix(cattach/cfored): fix goroutine leaks and missing cleanup on task end
  Call `taskFinishCb()` before transitioning to the End state in `StateForwarding` to ensure all IO goroutines (StdinReaderRoutine, StdoutWriterRoutine, input-forward) are signaled to exit cleanly and don't leak when the task finishes, encounters an error, or the Cfored connection is broken.
  Also add `broadcastStopWaiting()` in cfored's cattach and crun server state machines to wake up goroutines blocked in `waitSupervisorChannelsReady` so they can observe `stopWaiting == true` and exit promptly after receiving a completion/cancel reply from Ctld.
* fix(cattach/crun/cfored): prevent channel deadlock on Ctrl+C or slow terminal
  Replace bare channel sends in `StateForwarding` and `StateWaitAck` with `select` statements that also listen on a cancellation context (`taskFinishCtx`/`stopStepCtx`). Without this guard, a slow terminal fills `chanOutputFromRemote`, permanently blocking the forwarding loop and making SIGINT/Ctrl+C ineffective.
  Additional fixes:
  - Pre-initialize `stopStepCtx`/`stopStepCb` in `crun.Init()` to avoid nil-pointer dereferences when `StateWaitAck` is reached without ever entering `StateForwarding`.
  - Add a `globalCtx.Done()` case in cfored's `WAIT_TASK_IO_FORWARD` loop so cfored transitions to `CancelJobOfDeadCrun` immediately on shutdown instead of racing against a synthetic `JOB_CANCEL_REQUEST` that may not arrive within the 30-second window.
* fix(cfored): downgrade spurious "unknown crun/cattach" warning to debug
  Replace the Warning-level log with a Debug-level message when no front-end is connected for a step during IO forwarding. This condition is expected during the brief window between crun exiting and the supervisor stopping its output stream, so a Warning was misleading and noisy. The new message also better explains the situation.
* fix(cfored): use non-blocking send to prevent supervisor IO deadlock
  Replace the blocking channel send in `forwardRemoteIoToFront` with a non-blocking `select` to prevent deadlocks caused by slow front-end consumers (e.g. terminals). Previously, a backed-up front-end channel could block the supervisor IO stream, causing cfored, the supervisor stream, and crun/cattach to wait on each other indefinitely.
  The new behavior:
  - Delivers messages immediately if the channel has capacity
  - Drops messages when the front-end channel is full, logging a trace
  - Returns early if cfored is shutting down (globalCtx cancelled)
  Dropping output lines for slow interactive consumers (crun/cattach) is an acceptable trade-off to keep the supervisor stream flowing.
* refactor(cattach): replace X11 channels with X11SessionMgr
  Replace the single-channel X11 forwarding approach (`chanX11InputFromLocal`, `chanX11OutputFromRemote`) with an `X11SessionMgr` that handles per-session demultiplexing by `(CranedId, LocalId)` pairs. This mirrors the session manager used in crun and enables multi-task X11 jobs where each task can open concurrent X11 sessions. The old `StartX11ReaderWriterRoutine` is removed in favor of `X11SessionMgr.SessionMgrRoutine`, and reply routing now handles `STEP_X11_CONN`, `STEP_X11_FORWARD`, and `STEP_X11_EOF` messages through the manager.
* style: fix indentation and alignment in cattach state machine code
  Correct misaligned switch/case blocks, const declarations, and select statements in cattach.go and cattach_server.go to properly reflect their logical nesting level. No functional changes were made.
* refactor(cattach): remove X11 forwarding support from cattach
  Remove all X11 forwarding logic from the cattach state machine, including the X11SessionMgr field, X11 request/reply channel handling in StateForwarding, and X11 session initialization in StartIOForward. Also removes corresponding X11 message routing in cfored's cattach server (STEP_X11_FORWARD case) and cleans up related forwarding logic in forwardTaskMsgToCattach.
* style: fix indentation of switch cases in cattach_server.go
  Corrected improper indentation of `case` blocks within the `switch` statement in the `CforedCattachStateMachineLoop` function to align with Go formatting conventions.
* fix(cfored): correct map references, types, and IO routing in server
  - Fix wrong map reference from `taskIORequestChannelMap` to `stepIORequestChannelMap` in `setRemoteIoToFrontChannel`
  - Replace incorrect `StreamTaskIORequest` type with `StreamStepIORequest` in channel and buffer initialization
  - Remove duplicate `getStepDoneChannel` function definition
  - Fix `TASK_ERR_OUTPUT` to route via `forwardRemoteIoToFront` instead of `forwardRemoteIoToCrun`
  - Fix indentation in supervisor request switch statement
* feat(cattach): add stderr forwarding support for attached tasks
  - Add `chanErrOutputFromRemote` channel to `StateMachineOfCattach` for handling stderr output separately from stdout
  - Implement `StderrWriterRoutine` to write remote stderr output to local `os.Stderr`, with proper draining on task completion
  - Handle new `TASK_ERR_OUTPUT_FORWARD` reply type in `StateForwarding` to route stderr messages to the dedicated channel
  - Update `cfored` to forward `TASK_ERR_OUTPUT` messages from supervisor to cattach clients using the new proto message type
  - Add `TaskIOErrOutputForwardReply` proto message and `TASK_ERR_OUTPUT_FORWARD` enum value to `StreamCattachReply`
  This enables cattach to properly separate and display stderr output from remote tasks instead of mixing it with stdout.
* fix(crun): handle stopStepCtx and globalCtx in I/O forwarding states
  Add missing context cancellation cases in `StateForwarding` and `StateWaitAck` to prevent goroutine hangs when a job is terminated or cfored connection is lost:
  - In `StateForwarding`: transition to `JobKilling` when `stopStepCtx` is done during stdout/stderr forwarding
  - In `StateWaitAck`: silently drop output messages when `stopStepCtx` is done (allows draining replyChannel to receive `STEP_COMPLETION_ACK_REPLY`), and transition to `End` state when `globalCtx` is cancelled due to cfored shutdown or connection loss
* feat(cattach): add read-only mode when step has exclusive stdin routing
  When a job step is started with `crun --input=<task_id>`, the step's `IoMeta.InputTaskId` is set, meaning stdin is exclusively routed to one specific task. In this case, `cattach` now automatically enters read-only mode: it displays task output but does not forward any stdin from the local terminal.
  Key changes:
  - Add `FlagReadOnly` flag in `cmd.go`, set automatically during `StateWaitForward` by inspecting `IoMeta.InputTaskId` (non-nil and non-PTY implies read-only)
  - Skip `StdinReaderRoutine` and the stdin-forwarding goroutine in `StartIOForward`/`StateForwarding` when `FlagReadOnly` is true
  - PTY mode is explicitly excluded from read-only since it always requires full interactive control
* fix(cfored): fix channel leak and IO forwarding in cattach/supervisor
  - Remove ctldReplyChannelMapByPid entry when cattach disconnects before ctld replies, preventing a permanent channel leak that bypassed the normal cleanup path in the STEP_META_REPLY branch
  - Remove the non-blocking select default case in forwardRemoteIoToFront to stop silently dropping IO messages when the front-end channel is full
  - Delete the outer stepIORequestChannelMap entry when the last cattach disconnects so forwardRemoteIoToFront correctly detects no connected front-end instead of iterating over an empty map
* fix(cfored): prevent supervisor IO stream blocking on slow front-end consumers
  Add a `default` branch to the channel send select in `forwardRemoteIoToFront` so that a full or already-exited front-end IO channel never blocks the supervisor IO forwarding goroutine. Messages are dropped with a warning log when the channel is full, ensuring the supervisor stream remains unblocked.
* fix(cfored): reject cattach for non-interactive jobs
  Add a check for interactive metadata in the STEP_META_REPLY handler. If a job lacks interactive metadata (e.g., a batch job), cattach is now explicitly rejected with a clear failure reason instead of causing a potential nil pointer dereference. Also refactors the reply block and extracts `failureReason` earlier to be used consistently in all failure paths.
* fix(cfored): increase channel capacities and use dedicated cattach map
  Increase `ctldReplyChannel` buffer from 2→8 in both calloc and cattach servers to prevent overflow when `WaitAllFrontEnd` pre-sends cancel and completion ACK messages alongside pre-existing ctld messages.
  Increase `TaskIoRequestChannel` buffer from 2→64 in cattach to reduce message drops in `forwardRemoteIoToFront` when the supervisor produces output faster than cattach consumes it.
  Replace usage of `ctldReplyChannelMapByPid` with the dedicated `ctldReplyChannelMapForCattachByPid` in cattach so that `WaitAllFrontEnd` sends the correct termination message type (`JOB_COMPLETION_ACK_REPLY`) instead of the calloc/crun-specific `JOB_ID_REPLY`, and ensures proper cleanup on early disconnects to avoid channel leaks.
* fix(cattach): improve data handling in StateMachine and avoid potential leaks in cattach server
* fix(cfored): improve error handling in CattachStream and replace fatal logs with error logs
* fix(crun): remove unused stopStepCtx references and clean up StateForwarding logic
* Implement feature X to enhance user experience and optimize performance
* fix(cattach): align RootCmd variable declaration for improved readability
* fix(cattach): add licensing information and improve code comments for clarity
* fix(cattach): improve EOF handling in StateForwarding and enhance error messages for job and step IDs
  Co-authored-by: Copilot <copilot@github.com>
* fix(cattach): capture pre-registration history in setRemoteIoToFrontChannel to prevent message duplication
* fix(cattach): enhance crash signal handling and remove unused X11 forward messages from StreamCattachRequest
* fix(cattach): update copyright year to 2026 in cattach.go
* style: address PR #325 review comments from L-Xiafeng
  - LX1: add comment clarifying map key is cattach process PID
  - LX3: rename cattachRequestChannelMap -> cattachRequestChannel (the field is a chan, not a map; fix misleading name)
  - LX4: add Doxygen-style comments to TaskIOBuffer struct and its Push/GetHistory methods
  - LX7: rename getRemoteHistory(taskId) param to jobId for consistency with StepIdentifier.JobId and rest of the codebase
* fix(cfored): send STEP_META_REPLY failure instead of JOB_COMPLETION_ACK_REPLY to cattach on cfored disconnect
  When WaitAllFrontEnd notifies cattach clients waiting for STEP_META_REPLY, sending JOB_COMPLETION_ACK_REPLY was semantically incorrect: CattachWaitStepMeta would transition to DeadCattach and send STEP_COMPLETION_ACK_REPLY back to the cattach client, making it look like the step completed successfully with no error.
  Fix by sending STEP_META_REPLY{Ok: false} instead. This reuses the existing normal failure path in CattachWaitStepMeta, which correctly sends STEP_CONNECT_REPLY{Ok: false, FailureReason: "Cfored is not connected to CraneCtld."} back to the cattach client, causing it to print an error message and exit with a non-zero exit code.
* fix(cfored): update message handling to use STEP_META_REPLY instead of JOB_COMPLETION_ACK_REPLY for cattach sessions
* fix(cfored): update comments to clarify usage of job ID and process PID in cattach and Crun/Cattach mappings
* fix(cattach): clarify comment to specify removal of cattach by [jobid, stepId] in ctldReplyChannel
* fix(cattach, cfored): improve handling of channel sends to prevent blocking and ensure graceful exits
* refactor(cfored): generalize request forwarding to reduce code duplication
* feat(cattach): add craned_task_map to StreamCtldReply for task ID mapping
* refactor(cattach, cfored): replace StepToCtld with CattachStepInfo in StreamCtldReply and related structures
* feat(cattach): enhance PrintStepLayout to display per-node task breakdown and add craned_task_map to CattachStepInfo
* refactor(cfored): update StreamCtldReply to remove craned_task_map and adjust step_info usage
* feat(cattach, cfored): implement task ID validation and improve stdin handling for input filters
* refactor(cattach): improve error handling for non-interactive jobs in cattach
* feat(goreleaser): add cattach binary to package contents
---------
Co-authored-by: Copilot <copilot@github.com>

Commit:6a53f6b
Author:Zhang Yanwen
Committer:GitHub

feat: Add array (#422)
* feat: add array job support across CLI and protos
* fix: address review comments and bug fixes
* accept comments
* fix
* fix
* delete dead code
* fix
* accept comments
* cancel output
---------
Co-authored-by: crane-dev <crane-dev@local>

Commit:da739ba
Author:huerni
Committer:huerni

merge master

Commit:aa645d9
Author:huerni
Committer:huerni

feat: cattach recive msg

Commit:464c91b
Author:huerni
Committer:huerni

feat: cattach connect step

Commit:d55fce9
Author:huerni
Committer:huerni

refactor(cattach,cfored): rename task-based proto types to step-based
Replace `TASK_CONNECT_REQUEST/REPLY` with `STEP_CONNECT_REQUEST/REPLY`, `TASK_COMPLETION_ACK_REPLY` with `STEP_COMPLETION_ACK_REPLY`, and `TASK_X11_FORWARD` with `STEP_X11_FORWARD` across cattach and cfored to align with updated proto definitions. Also remove redundant `TaskId` fields from IO and X11 forward request payloads, and fix stale log messages that referenced wrong component names (Crun instead of Cattach).

Commit:9c11f09
Author:huerni
Committer:huerni

fix(cattach): enhance crash signal handling and remove unused X11 forward messages from StreamCattachRequest

Commit:1fb8fa1
Author:huerni
Committer:huerni

Implement feature X to enhance user experience and optimize performance

Commit:10e5fae
Author:huerni
Committer:huerni

fix(cattach): add licensing information and improve code comments for clarity

Commit:62b32b1
Author:huerni
Committer:huerni

feat(cattach): enhance PrintStepLayout to display per-node task breakdown and add craned_task_map to CattachStepInfo

Commit:9e7e971
Author:huerni
Committer:huerni

fix(cfored): correct map references, types, and IO routing in server
- Fix wrong map reference from `taskIORequestChannelMap` to `stepIORequestChannelMap` in `setRemoteIoToFrontChannel`
- Replace incorrect `StreamTaskIORequest` type with `StreamStepIORequest` in channel and buffer initialization
- Remove duplicate `getStepDoneChannel` function definition
- Fix `TASK_ERR_OUTPUT` to route via `forwardRemoteIoToFront` instead of `forwardRemoteIoToCrun`
- Fix indentation in supervisor request switch statement

Commit:38ea077
Author:huerni
Committer:huerni

feat(cattach): add craned_task_map to StreamCtldReply for task ID mapping

Commit:5519716
Author:huerni
Committer:huerni

feat(cattach): add --label, --output-filter, --input-filter, and --quiet flags
Introduce `TaskOutputMsg` to carry task ID alongside output data through internal channels, enabling per-task I/O control:
- `--label`: prepend `[task_id]: ` to each output line via `applyLabel`
- `--output-filter`: suppress output from tasks other than the specified one
- `--input-filter`: direct stdin to a specific task instead of broadcasting
- `--quiet`: suppress the "Task io forward ready" status message
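
The `--label` behavior described in this commit (prefixing each output line with `[task_id]: `) could look roughly like the Go sketch below. This is an illustration only: the function shares the name `applyLabel` with the commit message, but its signature, the `uint32` task ID, and the `string` payload are assumptions, not the repository's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// applyLabel is a hypothetical re-implementation: it prefixes every line in
// data with "[taskID]: ". The real function's signature is not shown in the
// commit message and may differ.
func applyLabel(taskID uint32, data string) string {
	if data == "" {
		return ""
	}
	prefix := fmt.Sprintf("[%d]: ", taskID)
	var b strings.Builder
	// SplitAfter keeps the trailing "\n" on each piece, so line endings
	// are preserved exactly as received.
	for _, line := range strings.SplitAfter(data, "\n") {
		if line == "" {
			continue // empty piece after a final newline
		}
		b.WriteString(prefix)
		b.WriteString(line)
	}
	return b.String()
}

func main() {
	fmt.Print(applyLabel(3, "hello\nworld\n"))
}
```

This sketch labels each chunk independently, so a line that arrives split across two chunks would receive two prefixes; a production version would need to track whether the previous chunk ended with a newline.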

Commit:918b55e
Author:huerni
Committer:huerni

refactor(cfored): update StreamCtldReply to remove craned_task_map and adjust step_info usage

Commit:7092998
Author:huerni
Committer:huerni

refactor(cattach, cfored): replace StepToCtld with CattachStepInfo in StreamCtldReply and related structures

Commit:10b09b2
Author:huerni
Committer:huerni

feat(cattach): add stderr forwarding support for attached tasks
- Add `chanErrOutputFromRemote` channel to `StateMachineOfCattach` for handling stderr output separately from stdout
- Implement `StderrWriterRoutine` to write remote stderr output to local `os.Stderr`, with proper draining on task completion
- Handle new `TASK_ERR_OUTPUT_FORWARD` reply type in `StateForwarding` to route stderr messages to the dedicated channel
- Update `cfored` to forward `TASK_ERR_OUTPUT` messages from supervisor to cattach clients using the new proto message type
- Add `TaskIOErrOutputForwardReply` proto message and `TASK_ERR_OUTPUT_FORWARD` enum value to `StreamCattachReply`
This enables cattach to properly separate and display stderr output from remote tasks instead of mixing it with stdout.

Commit:2a54365
Author:huerni
Committer:huerni

refactor(cfored): rename proto types to align with step/job semantics
- Rename `StreamTaskIORequest` to `StreamStepIORequest` across cattach_server.go and server.go
- Rename `TASK_X11_OUTPUT` to `STEP_X11_OUTPUT` and update X11 forward reply to include `CranedId` and `LocalId` fields
- Rename `TASK_CANCEL_REQUEST` to `JOB_CANCEL_REQUEST` and `TASK_COMPLETION_REQUEST` to `JOB_COMPLETION_REQUEST` in ctld_client.go
- Replace `InteractiveTaskType` with `InteractiveJobType` in completion request payloads
- Clean up duplicate imports in cfored.go

Commit:00cceea
Author:huerni
Committer:GitHub

feat: Add CpuTopology message and integrate CPU socket configuration (#437)
* feat: Update CPU topology definitions and add socket information
* feat: Update CPU topology structure to NodeTopoInfo and adjust references
* fix: Remove redundant comment in NodeTopoInfo message

Commit:94a0ebd
Author:huerni
Committer:GitHub

feat: pmix v2 (#421)
* feat
* feat: merge master
* style: fix alignment of FlagMpi variable declaration in calloc cmd.go
* fix(calloc): remove unused MPI flag from calloc command
* fix: add MpiTypePmix constant and validate MPI type in MainCrun function
* fix(crun): validate FlagMpi before comparing with MpiTypePmix in MainCrun function

Commit:61d62dd
Author:NamelessOIer
Committer:GitHub

feat: preempt_v2 (#429)
* feat: preempt_v2
* style: cleanup
* fix

Commit:7592f86
Author:huerni

feat: Add NodeTopoInfo to CranedInfo and implement topology formatting

Commit:00ef75f
Author:RileyWen

Merge branch 'master' into feat/metrics
Resolve proto conflicts:
- Supervisor.proto: keep tracing fields + master's TerminateStepRequest
- Crane.proto: keep Suspend/Resume/TerminateOrphaned RPCs from feature branch
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Commit:4337761
Author:huerni

refactor: Enhance error code documentation in Crane.proto

Commit:b05f276
Author:Junlin Li
Committer:GitHub

feat: Add completing status (#423)
* feat: Add completing status
* sync protos

Commit:d36b65c
Author:RileyWen

Merge branch 'master' into feat/metrics
Sync proto files with backend CraneSched feat/metrics branch. Resolve merge conflict in cacct.go by adopting new ResourceV3 API.

Commit:ea9b083
Author:RileyWen

feat: add service name and span status to SpanInfo in Plugin.proto

Commit:11a2ad7
Author:NamelessOIer
Committer:GitHub

resource v3 (#424)
* feat: resource v3
* fix

Commit:0c18873
Author:Zhang Yanwen
Committer:GitHub

feat: add TaskStatus Suspend (#335)
* feat: add suspend and resume job commands
* add suspended task status to TaskStatus enum
* ban cgroupv1
* fix
* add suspend in ccancel -t
* fix
* fix
* fix
* fix: correct JobStatus enum numbering for Suspended and Deadline
  Suspended = 10, Deadline = 11 to avoid proto enum collision. Matches backend proto definition.

Commit:5c82c70
Author:NamelessOIer

fix

Commit:cc5967d
Author:NamelessOIer
Committer:NamelessOIer

feat: resource v3

Commit:166380c
Author:edragain
Committer:GitHub

feat: cbatch --deadline (#341)
* feat: add cbatch --deadline (squash commits)
* fix fmt

Commit:89513ff
Author:Junlin Li
Committer:GitHub

feat: Batch input/output/error option with file pattern (#377)
* feat: Crun Cbatch input/output/err
  fix: Usage showed when crun cmd return exit code 2
  feat: Crun input output err
  feat: Crun varible from env
  feat: Batch input/output/error option with file pattern
  Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
  fix: crun input
  Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
  fix: Format
  Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
  chore: Log format
  Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
  fix: crun output
  Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
  style: format
  Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
  refactor: crun step res field, inherit job when not specified
  Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
  refactor: Update terminology from task to job and pass jobId to StartTerminal
  refactor: Remove pending reason for step
  Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
  fix: cacct elapsed time
  Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
  fix: Auth uid
  Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
  refactor step query
  Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
  docs: Add error string
  Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
  chore: Sync protos
  Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
  fix
  Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
  refactor
  feat: Crun step submit
  feat: Enhance job and step processing with unified data structure and parsing functions
  Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
  fix:step erro msg
  fix: x11 multi conn
  Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
  fix: X11 multi conn
  Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
  feat: X11 multi conn
  Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
  task exit status
  Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
  fix: Crun app args
  Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
  sync protos
  Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
  feat: Introcduce multi level cgrop for step/task
  Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
  format
  sync protos
  fix: Enhance error handling and cleanup in crun and supervisor channels (#419)
  docs: Add comments to clarify the purpose of stepDoneChannel in SupervisorChannelKeeper
  refactor: Improve IO handling with goroutines and context management in StateMachineOfCrun
  refactor: Improve context management for IO forwarding in StateMachineOfCrun
* format
* refactor: reorder fields in StepToD message for clarity

Commit:ed2090c
Author:RileyWen

Merge branch 'master' into feat/metrics

Commit:3ac9370
Author:junlinli

refactor: reorder fields in StepToD message for clarity

Commit:c9fa425
Author:Junlin Li
Committer:junlinli

feat: Crun Cbatch input/output/err
fix: Usage showed when crun cmd return exit code 2
feat: Crun input output err
feat: Crun varible from env
feat: Batch input/output/error option with file pattern
Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
fix: crun input
Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
fix: Format
Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
chore: Log format
Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
fix: crun output
Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
style: format
Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
refactor: crun step res field, inherit job when not specified
Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
refactor: Update terminology from task to job and pass jobId to StartTerminal
refactor: Remove pending reason for step
Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
fix: cacct elapsed time
Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
fix: Auth uid
Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
refactor step query
Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
docs: Add error string
Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
chore: Sync protos
Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
fix
Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
refactor
feat: Crun step submit
feat: Enhance job and step processing with unified data structure and parsing functions
Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
fix:step erro msg
fix: x11 multi conn
Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
fix: X11 multi conn
Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
feat: X11 multi conn
Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
task exit status
Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
fix: Crun app args
Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
sync protos
Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
feat: Introcduce multi level cgrop for step/task
Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
format
sync protos
fix: Enhance error handling and cleanup in crun and supervisor channels (#419)
docs: Add comments to clarify the purpose of stepDoneChannel in SupervisorChannelKeeper
refactor: Improve IO handling with goroutines and context management in StateMachineOfCrun
refactor: Improve context management for IO forwarding in StateMachineOfCrun

Commit:6a63477
Author:NamelessOIer
Committer:GitHub

refactor: job/step/task misuse (#418)
* fix job/step/task misuse
* bug fix
* fix
* format
* fix
* fix
* fix: x11 task to step
* fix
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

Commit:de618f6
Author:NamelessOIer

fix

Commit:71c2db5
Author:NamelessOIer

fix: x11 task to step

Commit:ad156a7
Author:NamelessOIer
Committer:NamelessOIer

fix job/step/task misuse

Commit:0565217
Author:huerni
Committer:GitHub

Feat: Add Qos tres fields and Qos tres limit (#286)
* feat: qos add max_jobs_per_account max_submit_jobs_per_user max_submit_jobs_per_account
* refactor qos table
* refactor
* feat: qos add max_jobs_per_account max_submit_jobs_per_user max_submit_jobs_per_account
* refactor qos table
* feat
* feat: qos tres modify
* feat: add qos flags DenyOnLimit
* feat: add ERR_MAX_TRES_PER_USER_BEYOND and ERR_MAX_TRES_PER_ACCOUNT_BEYOND
* fix: flag empty when add qos
* fix: 1. grpwall 2. mem bytes 3. CraneErrStr
* fix: err print
* refactor
* refactor
* refactor
* fix
* feat: modify add tres validate
* refactor err code name
* feat: add / parser
* feat: cacctmgr update help
* feat: refactor max memory
* add help
* fix flags format
* fix: parseTres
* fix: flags
* refactor ParseGresForQosLimit
* fix: max wall
* feat: merge master

Commit:c477f37
Author:huerni
Committer:huerni

refactor err code name

Commit:e7e4e27
Author:huerni
Committer:huerni

refactor

Commit:ef4e296
Author:huerni
Committer:huerni

refactor

Commit:9f15a33
Author:huerni
Committer:huerni

refactor

Commit:706bb41
Author:huerni
Committer:huerni

feat: add ERR_MAX_TRES_PER_USER_BEYOND and ERR_MAX_TRES_PER_ACCOUNT_BEYOND

Commit:b918cfb
Author:huerni
Committer:huerni

feat: add qos flags DenyOnLimit

Commit:9f83830
Author:huerni
Committer:huerni

feat: qos tres modify

Commit:3d3b0a0
Author:huerni
Committer:huerni

feat

Commit:be145ce
Author:Junlin Li
Committer:GitHub

feat: X11 multi connection and multi task for step (#371)
* fix: x11 multi conn
  fix: X11 multi conn
  feat: X11 multi conn
  task exit status
  fix: Crun app args
  sync protos
  feat: Introduce multi-level cgroup for step/task
  fix: step error msg
* feat: ntasks (#392)
* fix: set default CpusPerTask
* fix mem limit
* rename req_res_view to req_total_res_view
* fix: default NodeNum
* fix: Mem check
* more friendly error message
* fix: invalid value check
Signed-off-by: lijunlin <xiafeng.li@foxmail.com>
Co-authored-by: NamelessOIer <70872016+NamelessOIer@users.noreply.github.com>

Commit:a75bfb6
Author:zhansan114514
Committer:RileyWen

insert data into influxdb

Commit:a6a0213
Author:edragain
Committer:GitHub

feat: cinfo --list-reasons/-R (#405)
* cinfo -R
* add single node mode, currently used only by cinfo -R
* fix filter
* change style
* expand pattern
* fix format
* fix format(2)
* fix
* fix codestyle

Commit:2dc74cb
Author:RileyWen

chore: sync proto with backend.

Commit:a82174b
Author:edragain
Committer:GitHub

feat: add field SubmitNode to ccontrol show job (#410)
* feat: submit_node
* add print
* change name

Commit:1fc4dab
Author:RileyWen
Committer:RileyWen

feat: crun input/output/error redirection and batch file pattern support
- Add crun stdin/stdout/stderr redirection with file options
- Add batch input/output/error option with file pattern support
- Add crun environment variable passthrough from env
- Fix x11 multi connection handling
- Fix step error message display
- Fix usage shown when crun cmd returns exit code 2

Commit:6b4e7d0
Author:Riley W
Committer:GitHub

feat: reset task id counter (#413)
* feat: add ccontrol reset next-task-id / next-task-db-id commands
  Admin-only commands to reset task ID counters in ctld's embedded DB. Used by the test framework to reset state between test cases without restarting ctld.
  Usage:
    ccontrol reset next-task-id        # reset to 1
    ccontrol reset next-task-id 5      # reset to 5
    ccontrol reset next-task-db-id     # reset to 1
    ccontrol reset next-task-db-id 5   # reset to 5
* feat: add delete ALL --force and reset commands for all metadata
  Support bulk cleanup via CLI:
  - cacctmgr delete account/qos/wckey/resource ALL --force
  - ccontrol delete reservation ALL --force
  - ccontrol reset partition-acl (reload defaults from config)
  - ccontrol reset next-step-db-id
  Skip required params (user/server) when deleting ALL.
* feat: add ccontrol reset task-history command
  Purges all task/step history from ctld's embedded DB via RPC. Safe alternative to a file-level wipe; works regardless of ctld restart state.
* refactor: standardize formatting in EntityType struct
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Commit:b9385e8
Author:Riley W
Committer:GitHub

Merge branch 'master' into feat/reset-task-id-counter

Commit:8637e43
Author:1daidai1
Committer:GitHub

Add creport cmd (#357)
* add creport
* refactor
* fix: Remove creport activate
* refactor: gid parsing and remove unused functions
* chore: Remove debug output
* chore: Update json format
* set default time to UTC time
* refactor
Co-authored-by: junlinli <xiafeng.li@foxmail.com>

Commit:53b3d8c
Author:RileyWen

feat: add ccontrol reset task-history command
Purges all task/step history from ctld's embedded DB via RPC. Safe alternative to a file-level wipe; works regardless of ctld restart state.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Commit:185a5a3
Author:RileyWen

feat: add delete ALL --force and reset commands for all metadata
Support bulk cleanup via CLI:
- cacctmgr delete account/qos/wckey/resource ALL --force
- ccontrol delete reservation ALL --force
- ccontrol reset partition-acl (reload defaults from config)
- ccontrol reset next-step-db-id
Skip required params (user/server) when deleting ALL.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Commit:6a2aa78
Author:RileyWen
Committer:RileyWen

feat: add ccontrol reset next-task-id / next-task-db-id commands
Admin-only commands to reset task ID counters in ctld's embedded DB. Used by the test framework to reset state between test cases without restarting ctld.
Usage:
  ccontrol reset next-task-id        # reset to 1
  ccontrol reset next-task-id 5      # reset to 5
  ccontrol reset next-task-db-id     # reset to 1
  ccontrol reset next-task-db-id 5   # reset to 5
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Commit:079d5a7
Author:huerni
Committer:GitHub

feat: Add QOS fields: MaxJobsPerUser, MaxSubmitJobsPerUser, MaxJobsPerAccount, MaxSubmitJobsPerAccount (#252)
* feat: qos add max_jobs_per_account max_submit_jobs_per_user max_submit_jobs_per_account
* refactor
* refactor qos table
* refactor modify field array str
* feat: task account_chain
* refactor
* refactor
* fix fmt
* feat: update doc

Commit:440e743
Author:huerni
Committer:huerni

feat: task account_chain

Commit:31643a8
Author:huerni
Committer:huerni

refactor modify field array str

Commit:9a89113
Author:huerni
Committer:huerni

refactor

Commit:0350ae3
Author:huerni
Committer:huerni

feat: qos add max_jobs_per_account max_submit_jobs_per_user max_submit_jobs_per_account

Commit:1691cc8
Author:huerni
Committer:GitHub

feat: delete wckey add force (#404)

Commit:7d3a23b
Author:Yongkun Li
Committer:GitHub

feat: Add CNI for CoreDNS and DNS related options (#407)
* add annotations
* add ccon --dns
* rebase
* add cbatch --pod-dns and adjust format
* add cni plugin
* refactor: Refactor DNS related logic
* refactor: Refactor ipv4 checking
* fix: Align with backend
* fix: Fix comments
* fix: Fix validation bug in dns flags
Co-authored-by: edragain2nd <edragain@163.com>

Commit:98db4ad
Author:huerni

feat: delete wckey add force

Commit:fab2756
Author:huerni
Committer:GitHub

feat: cbatch signal (#394)
* add cbatch signal
* feat: add R|B and support multi signal
* refactor
* refactor
* fix format
* refactor
* fix step mod
* feat: crun and calloc not bash
* fix err output
Co-authored-by: db <1301189887@qq.com>

Commit:b477f85
Author:NamelessOIer
Committer:NamelessOIer

feat: ntasks

Commit:69a3048
Author:Junlin Li
Committer:junlinli

fix: x11 multi conn
fix: X11 multi conn
feat: X11 multi conn
task exit status
fix: Crun app args
sync protos
feat: Introduce multi-level cgroup for step/task
fix: step error msg
Signed-off-by: lijunlin <xiafeng.li@foxmail.com>

Commit:8f6fcab
Author:huerni
Committer:GitHub

feat: Crunprolog/epilog (#342)
* feat: crunprolog and crunepilog
* refactor
* refactor
* refactor
* feat: crun add --task_prolog and --task_epilog
* merge master
* fix and add timeout

Commit:e8b9a35
Author:huerni
Committer:huerni

merge master

Commit:b56f737
Author:huerni
Committer:huerni

feat: crun add --task_prolog and --task_epilog

Commit:e4aef92
Author:huerni
Committer:GitHub

feat: remote license (#372)
* feat: licenses
* feat: ccontrol show query for multiple specified licenses.
* refactor
* rebase master
* feat: show lic
* feat: show lic add json
* feat
* feat: add query resource
* feat
* refactor
* refactor
* feat: show lic add remote
* feat: last update show
* refactor modify to one request
* feat: add flag
* feat: add update
* refactor
* merge master
* feat: support parse foorba@db
* refactor
* refactor
* feat: add reserved last_deficit last_consumed in monitor
* refactor
* refactor
* fix pattern
* refactor
* fix
* fix timeStr and show lic
* refactor
* refactor

Commit:011ff31
Author:huerni
Committer:huerni

refactor

Commit:0a08b26
Author:huerni
Committer:huerni

refactor

Commit:cd04d05
Author:huerni
Committer:huerni

refactor

Commit:d5b5a47
Author:huerni
Committer:huerni

refactor

Commit:446da43
Author:huerni
Committer:huerni

feat: last update show

Commit:2da59b7
Author:huerni
Committer:huerni

refactor modify to one request

Commit:83650e1
Author:huerni
Committer:huerni

feat: add flag

Commit:768986c
Author:huerni
Committer:huerni

merge master

Commit:9c401ef
Author:huerni
Committer:huerni

feat: show lic add remote

Commit:c3ab77b
Author:huerni
Committer:huerni

feat: add query resource

Commit:d84dba8
Author:huerni
Committer:huerni

feat

Commit:23848c9
Author:edragain
Committer:GitHub

fix: duplicate modification message output (#367)
* fix issue
* add qos rich_error_list
* remove brace