feat(halstead): add -Ehalstead extension for Halstead complexity metrics #485

Open
ArmaanjeetSandhu wants to merge 2 commits into terryyin:master from ArmaanjeetSandhu:feature/halstead-metrics

@ArmaanjeetSandhu ArmaanjeetSandhu commented Jun 30, 2026

Addresses #464.

What this adds

An optional -Ehalstead extension that computes Halstead complexity measures per function:

$ lizard -Ehalstead path/to/code
  NLOC    CCN   token  PARAM  length  H-volume  H-diff  H-effort  location
----------------------------------------------------------------------------
     5      3     29      2       5    116.76    16.9   1973.21 foo@1-5@sample.py

Three columns are displayed (H-volume, H-diff, H-effort). The full set (n1, n2, N1, N2, vocabulary, length, volume, difficulty, effort, time, estimated bugs) is available programmatically via function.halstead and via flat halstead_* attributes that also work with --sort/--Threshold (e.g. -s halstead_volume).
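For readers unfamiliar with how the derived measures relate to the four base counts, here is a minimal sketch using the standard textbook Halstead formulas. It is not the PR's code: the function name `halstead_measures` is hypothetical, the constants for time (18) and bugs (3000) are common conventions that the PR may or may not use, and the counts passed in at the bottom are assumed values picked only because they reproduce the sample row above.

```python
import math


def halstead_measures(n1, n2, N1, N2):
    """Derive textbook Halstead measures from the four base counts.

    n1/n2: distinct operators/operands; N1/N2: total operators/operands.
    """
    vocabulary = n1 + n2
    length = N1 + N2
    # Guard the degenerate cases so empty functions yield zeros.
    volume = length * math.log2(vocabulary) if vocabulary > 0 else 0.0
    difficulty = (n1 / 2) * (N2 / n2) if n2 > 0 else 0.0
    effort = difficulty * volume
    return {
        "vocabulary": vocabulary,
        "length": length,
        "volume": volume,
        "difficulty": difficulty,
        "effort": effort,
        "time": effort / 18,    # assumption: conventional Stroud number
        "bugs": volume / 3000,  # assumption: one common convention
    }


# Assumed counts: one combination consistent with the sample row,
# not values taken from the PR.
m = halstead_measures(n1=13, n2=5, N1=15, N2=13)
print(round(m["volume"], 2), round(m["difficulty"], 2), round(m["effort"], 2))
# prints: 116.76 16.9 1973.21
```

Note that the `length` column in the table is lizard's existing line-based function length, not the Halstead length N1 + N2.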

Design

Following the guidance on the issue:

  • HalsteadClassifier is the per-language extension point: classify(token) returns operator / operand / skip, one label per token, preserving 1:1 correspondence with the tokens lizard already emits (so this stays easy to fold into core later). A precise PythonHalsteadClassifier ships in-tree; a generic C-family classifier is the fallback for other languages.
  • Classifier selection (get_classifier) checks, in order: a halstead_classifier attribute on the reader (the seam intended for per-reader hooks in core), a registry keyed by language_names, then the generic fallback.
  • HalsteadMetrics holds the two operator/operand Counters and derives all measures lazily, so values always reflect the final counts.
  • Metrics are exposed on FunctionInfo as read-only properties, so they default to zero for functions that never went through the extension and integrate with the existing sort/threshold/output machinery.
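The selection order and the lazy derivation described above can be sketched as follows. This is an illustration under stated assumptions, not the code in lizard_ext/lizardhalstead.py: the class names, `REGISTRY`, `FALLBACK`, and the reader stand-ins are hypothetical, and the classifiers are deliberately crude. Only the `halstead_classifier` and `language_names` attribute names and the three-step precedence come from the description.

```python
import math
from collections import Counter


class GenericClassifier:
    """Stand-in fallback: words and literals are operands, the rest operators."""

    def classify(self, token):
        if not token.strip():
            return "skip"
        first = token[0]
        return "operand" if first.isalnum() or first in "_\"'" else "operator"


class PythonClassifier(GenericClassifier):
    """Stand-in Python classifier: some keywords count as operators."""

    KEYWORD_OPERATORS = frozenset({"if", "for", "return", "def", "and"})

    def classify(self, token):
        if token in self.KEYWORD_OPERATORS:
            return "operator"
        return super().classify(token)


REGISTRY = {"python": PythonClassifier()}  # hypothetical registry by language
FALLBACK = GenericClassifier()


def get_classifier(reader):
    # 1. An explicit attribute on the reader wins.
    explicit = getattr(reader, "halstead_classifier", None)
    if explicit is not None:
        return explicit
    # 2. Otherwise look the reader's language names up in the registry.
    for name in getattr(reader, "language_names", ()):
        if name in REGISTRY:
            return REGISTRY[name]
    # 3. Otherwise use the generic classifier.
    return FALLBACK


class Metrics:
    """Holds the two Counters; measures are computed on access, not stored."""

    def __init__(self):
        self.operators = Counter()
        self.operands = Counter()

    def add(self, token, label):
        if label == "operator":
            self.operators[token] += 1
        elif label == "operand":
            self.operands[token] += 1

    @property
    def volume(self):
        vocabulary = len(self.operators) + len(self.operands)
        length = sum(self.operators.values()) + sum(self.operands.values())
        return length * math.log2(vocabulary) if vocabulary else 0.0


# Reader stand-ins; real lizard readers carry many more attributes.
class PyReader:
    language_names = ["python", "py"]


class OtherReader:
    language_names = ["cpp"]


class OverridingReader:
    language_names = ["python"]
    halstead_classifier = FALLBACK


metrics = Metrics()
classifier = get_classifier(PyReader())
for token in ["return", "a", "+", "b"]:
    metrics.add(token, classifier.classify(token))
print(metrics.volume)  # 4 tokens, 4 distinct: 4 * log2(4) = 8.0
```

Because `volume` is a property over the Counters, adding another token after the first read changes the next read, which is the "values always reflect the final counts" behaviour the description mentions.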

Counting convention

Halstead numbers are sensitive to how operators vs operands are counted. In short: operands are identifiers, numeric and string literals, and value-literal keywords (True/False/None, etc.); operators are punctuation/operator symbols plus operator/control keywords (if, for, return, def, and, …); paired delimiters are counted individually; and tokens are attributed to functions exactly as token_count is (so def/class and the function name belong to the enclosing scope).
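To make the convention concrete, here is a rough sketch that applies the same rules to a two-line Python snippet. It uses the standard library's `tokenize` module rather than lizard's tokenizer, so token boundaries and function attribution will not match the extension exactly; the `classify` and `count` helpers and the `VALUE_KEYWORDS` set are assumptions made for illustration.

```python
import io
import keyword
import tokenize
from collections import Counter

# Value-literal keywords are operands; other keywords are operators.
VALUE_KEYWORDS = {"True", "False", "None"}


def classify(tok):
    """Return 'operator', 'operand', or 'skip' for one stdlib token."""
    if tok.type == tokenize.NAME:
        if tok.string in VALUE_KEYWORDS:
            return "operand"
        return "operator" if keyword.iskeyword(tok.string) else "operand"
    if tok.type in (tokenize.NUMBER, tokenize.STRING):
        return "operand"
    if tok.type == tokenize.OP:
        return "operator"  # each paired delimiter is its own token
    return "skip"  # NEWLINE, INDENT, DEDENT, COMMENT, ENDMARKER, ...


def count(source):
    operators, operands = Counter(), Counter()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        label = classify(tok)
        if label == "operator":
            operators[tok.string] += 1
        elif label == "operand":
            operands[tok.string] += 1
    return operators, operands


operators, operands = count("if x is None:\n    return (x + 1)\n")
print(sorted(operators.items()))
print(sorted(operands.items()))
```

In this snippet `(` and `)` each contribute one operator occurrence, `None` lands with the operands, and `if`, `is`, and `return` land with the operators, giving 7 operator and 4 operand occurrences in total.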

Tests

test/testHalstead.py adds 41 tests covering the basic counts, every derived measure (against hand-verified numbers), the flat FunctionInfo attributes, operator/operand classification (keywords, literals, strings, numbers, Python value literals and soft keywords), classifier selection precedence, the HalsteadMetrics value object, end-to-end C++ via the generic classifier, and extension statelessness. No regressions. Verified working via the console script, python -m lizard, and multiprocessing (-t 2).

Adds an optional `-Ehalstead` extension that computes Halstead complexity
measures (volume, difficulty, effort, plus n1/n2/N1/N2, vocabulary, length,
time and estimated bugs) for every function. Three columns (H-volume,
H-diff, H-effort) are shown; all measures are also exposed on FunctionInfo
via a `halstead` object and flat `halstead_*` attributes that work with
--sort and --Threshold.

Operator/operand classification is language-specific behind a small explicit
interface (HalsteadClassifier), staying in 1:1 correspondence with the tokens
lizard already emits. A precise Python classifier ships in-tree; other
languages fall back to a generic C-family classifier. Readers may override
selection via a `halstead_classifier` attribute, the intended seam for folding
this into core later.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 12ae6f853b

Comment thread lizard_ext/lizardhalstead.py (outdated)
Comment on lines +303 to +307

FUNCTION_INFO = {
    "halstead_volume": {"caption": " H-volume "},
    "halstead_difficulty": {"caption": " H-diff "},
    "halstead_effort": {"caption": " H-effort "},
}

P2: Export all Halstead columns in CSV output

Defining -Ehalstead with multiple FUNCTION_INFO entries makes the current CSV writer drop every Halstead metric, because lizard_ext/csvoutput.py only appends extension columns when len(FUNCTION_INFO) == 1. As a result lizard --csv -Ehalstead ... produces the same CSV columns as a run without the extension, so users cannot consume the new metrics in CSV even though the default table shows them.
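The effect of such a gate can be illustrated with a simplified model. This is not the code in lizard_ext/csvoutput.py: the function names and the `SingleColumnExt` class are hypothetical, and the sketch only models column collection from `FUNCTION_INFO` as the review describes it.

```python
class HalsteadExt:
    # The three entries from the diff under review.
    FUNCTION_INFO = {
        "halstead_volume": {"caption": " H-volume "},
        "halstead_difficulty": {"caption": " H-diff "},
        "halstead_effort": {"caption": " H-effort "},
    }


class SingleColumnExt:
    # Hypothetical extension that defines exactly one column.
    FUNCTION_INFO = {"some_metric": {"caption": " metric "}}


def gated_columns(extensions):
    """Model of the reported behaviour: only single-entry extensions count."""
    columns = []
    for ext in extensions:
        info = getattr(ext, "FUNCTION_INFO", {})
        if len(info) == 1:
            columns.extend(info)
    return columns


def all_columns(extensions):
    """Model of the suggested behaviour: every declared column is exported."""
    columns = []
    for ext in extensions:
        columns.extend(getattr(ext, "FUNCTION_INFO", {}))
    return columns


extensions = [SingleColumnExt, HalsteadExt]
print(gated_columns(extensions))  # the three Halstead keys are missing
print(all_columns(extensions))
```

Under the gated model an extension with three entries contributes nothing, which matches the reported symptom of identical CSV columns with and without -Ehalstead.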

ArmaanjeetSandhu commented Jun 30, 2026

@terryyin the CSV issue flagged by the Codex bot is a pre-existing bug, not something this PR introduced. The same len(FUNCTION_INFO) == 1 check was already silently dropping columns for -Eio and -Eduplicated_param_list. I've opened a separate PR to fix that.
