[Test] Fix parsing bug with scontrol output for dumping JOB output and command timeouts#7373

Open
himani2411 wants to merge 2 commits into aws:develop from himani2411:develop-intge-test

@himani2411
Contributor

Description of changes

Validate each file path before printing or reading the file, so that the command does not hang when the path contains non-printable characters.

  • Use shlex.quote on every file path to prevent unexpected behavior caused by special characters in scontrol output
  • Check that the path starts with / (absolute path) and contains only printable characters. This catches empty strings, non-printable characters (null bytes, carriage returns), and relative/garbage paths.
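
The two checks above can be sketched as follows. This is a minimal illustration, not the code in this PR: the helper names `is_safe_path` and `build_cat_command` are hypothetical, and `cat` stands in for whatever command the test uses to dump the job output.

```python
import shlex


def is_safe_path(path: str) -> bool:
    """Return True only for absolute paths made of printable characters.

    An empty string fails the leading "/" check; null bytes and carriage
    returns fail str.isprintable().
    """
    return path.startswith("/") and path.isprintable()


def build_cat_command(path: str) -> str:
    """Build a shell command that reads `path`, or raise if the path looks wrong.

    Hypothetical helper: the real test may use a different command.
    """
    if not is_safe_path(path):
        raise ValueError(f"Refusing to read suspicious path: {path!r}")
    # shlex.quote keeps spaces and shell metacharacters from being interpreted.
    return f"cat {shlex.quote(path)}"
```

Rejecting the path before building the command means a garbled scontrol field never reaches the remote shell, and quoting covers paths that are valid but contain spaces or shell metacharacters.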

Add an OS-level timeout so that the test can escape if the PTY or Bash shell does not time out or close the session cleanly.

  • OS-level timeout command sends SIGTERM after 60 seconds, regardless of PTY/login shell/Fabric behavior
  • Fabric's SSH timeout set to 70 seconds as a backup, giving the OS timeout 10 seconds to kill the process before the SSH layer intervenes
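
A rough sketch of the layered timeouts described above. The helper names and the `run` parameter are hypothetical; `run` stands for any callable that accepts a command string and a `timeout` keyword, which is how a Fabric connection's run method is assumed to be used here.

```python
import shlex

OS_TIMEOUT_SECONDS = 60                        # coreutils `timeout` sends SIGTERM after this
SSH_TIMEOUT_SECONDS = OS_TIMEOUT_SECONDS + 10  # SSH-layer backup, 10 seconds later


def with_os_timeout(argv, seconds=OS_TIMEOUT_SECONDS):
    """Prefix a command (given as a list of arguments) with `timeout <seconds>`."""
    quoted = [shlex.quote(arg) for arg in argv]
    return " ".join(["timeout", str(seconds)] + quoted)


def run_with_timeouts(run, argv):
    """Run argv through `run` with both timeout layers applied.

    `run` is a hypothetical callable taking (command, timeout=...).
    """
    return run(with_os_timeout(argv), timeout=SSH_TIMEOUT_SECONDS)
```

Keeping the SSH timeout longer than the OS timeout means the remote process is normally killed first, and the SSH layer only intervenes if that does not happen.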

Tests

  • test_gb200

Checklist

  • Make sure you are pointing to the right branch.
  • If you're creating a patch for a branch other than develop add the branch name as prefix in the PR title (e.g. [release-3.6]).
  • Check all commits' messages are clear, describing what and why vs how.
  • Make sure to have added unit tests or integration tests to cover the new/modified code.
  • Check if documentation is impacted by this change.

Please review the guidelines for contributing and Pull Request Instructions.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Himani Anil Deshpande added 2 commits May 4, 2026 16:08
* Use `shlex.quote` on every file path to prevent unexpected behaviour caused by special characters in scontrol output
* Check that the path starts with `/` (absolute path) and contains only printable characters. This catches empty strings, non-printable characters (null bytes, carriage returns), and relative/garbage paths.
* OS-level timeout command sends SIGTERM after 60 seconds, regardless of PTY/login shell/Fabric behavior
* Fabric's SSH timeout set to 70 seconds as a backup, giving the OS timeout 10 seconds to kill the process before the SSH layer intervenes
@himani2411 requested review from a team as code owners May 4, 2026 20:28
@himani2411 added the 3.x and skip-changelog-update (disables the check that enforces changelog updates in PRs) labels May 4, 2026