Skip to content

Added raid mode#62

Open
dropthemic wants to merge 4 commits into
wernerfred:masterfrom
dropthemic:patch-3
Open

Added raid mode#62
dropthemic wants to merge 4 commits into
wernerfred:masterfrom
dropthemic:patch-3

Conversation

@dropthemic
Copy link
Copy Markdown
Contributor

The current script will not show errors when raid status is in bad state, only when the drives are bad. This pull request will add raid checks. Here are a few examples of the script running:

# raid is re-syncing
python check_synology.py host pass pass2 raid
WARNING - raid status: [Storage Pool 3] status=7 | "Volume 3"=7

# raid is degraed
CRITICAL - raid status: [Storage Pool 3] 1 - raid status: [Storage Pool 1] 11 | "Storage Pool 3"=11 "Storage Pool 1"=1

Added check for raid status
Added raid mode
@animaartificialis
Copy link
Copy Markdown

animaartificialis commented May 20, 2026

Thanks for adding the raid mode! A few issues I noticed while reading through the diff:

1. Severity downgrade in the loop

state is initialised to 'OK' at the top of the script and your loop only ever assigns to it without preserving previous severity. If the first storage returns status 11 (Degrade) and a later storage returns status 7 (RaidSyncing → WARNING), the final state becomes WARNING even though one pool is degraded.

2. Status 2 (Repairing) falls through to CRITICAL

Your inline comment lists Repairing(2) as one of the intermediate operations, but the WARNING tuple is ("3","4","5","6","7","8","9","10")"2" is missing, so it hits the elif != "1" branch and reports CRITICAL.

3. README ↔ code mismatch

The README change says "if status is 4-10 will trigger WARNING", but the code checks 3-10 (and the comment also lists 2). Behaviour for 11 and 12 isn't documented either way.

4. Degrade(11) classification

Degrade(11) and Crashed(12) are both serious enough to warrant CRITICAL — Degrade tolerates a disk loss but doesn't tolerate two, so it's an actionable alert, not just informational. The current code already treats it as CRITICAL via the elif != "1" branch, but explicit categorisation makes intent clearer (see fix below).

Suggested fix

if mode == 'raid':
    output = ''
    perfdata = '|'
    WARNING_STATUSES = {"2", "3", "4", "5", "6", "7", "8", "9", "10"}
    CRITICAL_STATUSES = {"11", "12"}  # 11 = Degrade, 12 = Crashed
    for item in snmpwalk('1.3.6.1.4.1.6574.3.1.1.1'):
        i = item.oid_index or item.oid.split('.')[-1]
        storage_name = snmpget('1.3.6.1.4.1.6574.3.1.1.2.' + str(i))
        raid_status = str(snmpget('1.3.6.1.4.1.6574.3.1.1.3.' + str(i)))
        if raid_status in CRITICAL_STATUSES:
            state = "CRITICAL"
        elif raid_status in WARNING_STATUSES and state != "CRITICAL":
            state = "WARNING"
        elif raid_status != "1" and state == "OK":
            state = "UNKNOWN"
        output += ' - raid status: [' + storage_name + '] status=' + raid_status
        perfdata += ' "' + storage_name + '"=' + raid_status
    print('%s%s %s' % (state, output, perfdata))
    exitCode()

Key changes:

  • Explicit WARNING_STATUSES / CRITICAL_STATUSES sets that match the docstring (status 2 in WARNING, 11 promoted to CRITICAL alongside 12).
  • state == "CRITICAL" is preserved across iterations (no severity downgrade).
  • Unknown values land in UNKNOWN rather than silent CRITICAL — usual Nagios convention.
  • str(snmpget(...)) evaluated once and reused.

Happy to send a follow-up PR with this if you'd prefer.

animaartificialis added a commit to animaartificialis/check_synology that referenced this pull request May 20, 2026
Adds a `raid` mode that surfaces Synology raid status (OID
1.3.6.1.4.1.6574.3.1.1.x) for every storage pool: Normal → OK,
intermediate states (Repairing/Migrating/Expanding/.../Canceling)
→ WARNING, Degrade and Crashed → CRITICAL, anything else → UNKNOWN.

The previous severity is preserved across iterations so a WARNING
pool can never downgrade a CRITICAL one. Status "2" (Repairing)
is included in the WARNING set. README updated with the new mode.

This is an alternative take on wernerfred#62 incorporating review feedback
from issuecomment-4502847249.
Fix RAID state escalation and document/test status mapping
@dropthemic
Copy link
Copy Markdown
Contributor Author

dropthemic commented May 21, 2026

Fixex RAID status handling so severity only escalates across pools.

Changes:

  • treat RAID statuses 2-11 as WARNING
  • treat RAID status 12 as CRITICAL
  • preserve CRITICAL once seen in the RAID loop
  • return UNKNOWN for unexpected non-1 RAID statuses
  • update README to match the actual status mapping
  • add focused regression tests for RAID state transitions
  • correct perfdata units by removing incorrect c usage where values are not counters
  • use GB for storage perfdata values that are already converted to gigabytes

@animaartificialis
Copy link
Copy Markdown

Thanks for iterating on this — the next_raid_state helper and the AST-based test file are great. Two quick questions:

  1. Status 11 (Degrade) sits in RAID_WARNING_STATUSES. I'd typically expect Degrade to land in CRITICAL since a degraded raid is one more disk failure away from Crashed — but I see your changelog explicitly lists it under WARNING, so I'd like to check: is that by design (e.g. your monitoring already separates "alert" and "page" tiers and Degrade lives in the alert tier), or did you happen to consider CRITICAL and prefer WARNING for a different reason? Happy to defer to whatever convention fits your operational model.

  2. The changelog mentions "update README to match the actual status mapping", but the README mode table in a86eca1 still reads if status is 4-10 will trigger WARNING, any other values will be CRITICAL — that doesn't match the new code (2-11 → WARNING with the current set, 12 → CRITICAL, others → UNKNOWN). Did the README change end up in another commit, or is that still pending?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants