LLM Introspector

NOTE: This is currently a framework, not a complete or accurate toolkit. It thus should be taken as a conceptual model until validated frameworks and methods can be applied to scoring

LLM Introspector is a systematic framework for analyzing LLM behavior through response observation. It treats language models as black boxes and uses controlled probing to map decision boundaries, measure sensitivities, and infer internal priorities—all without access to model weights or training data.

Overview

Language models cannot directly examine their own weights or training, but they can observe the artifacts those weights produce—their responses. LLM Introspect operationalizes this insight by:

Systematically probing the model with carefully designed prompt variations
Measuring response differences across multiple dimensions
Inferring behavioral patterns from observable output changes
Generating actionable reports characterizing the model's decision landscape

Key Features

Probe Types

Probe	Purpose	What It Reveals
Contrastive	Compare responses under different framings	Sensitivity to authority, emotion, formality
Boundary	Map refusal/compliance thresholds	Where the model draws lines, how sharp boundaries are
Consistency	Test belief persistence across paraphrasing	Which positions are stable vs context-dependent
Omission	Detect systematic content gaps	What the model avoids and why
Unlock	Test jailbreak/bypass resistance	How easily safety measures can be circumvented
Code Ability	Audit code generation quality	Security vulnerabilities, algorithmic efficiency, correctness
Hallucination	Test susceptibility to false information	How easily the model can be tricked into confabulating
Systems Knowledge	Test OS administration knowledge	Accuracy across Linux/BSD variants, misconceptions
Danger	Deep safety boundary testing (ISOLATED)	Resistance to genuinely dangerous content requests

Adaptive Interrogation

Use a second LLM to dynamically generate follow-up probes based on the subject model's responses. See Adaptive Interrogation for full details and examples.

Supported Providers

Remote Providers (Subject LLMs):

OpenAI: GPT-4, GPT-4 Turbo, GPT-3.5 Turbo, O-series
Anthropic: Claude 4.x, Claude 3.x families
Grok/xAI: Grok 4, Grok 3 series
Mistral: Large, Medium, Small, Codestral
Google Gemini: Gemini 3.x, 2.x, 1.5 series
Azure OpenAI: Enterprise deployments
AWS Bedrock: Claude via Bedrock

Local Providers (Interrogator LLMs):

Ollama: Any locally-hosted model (Llama, Mistral, Qwen, etc.)
LM Studio: Any model via OpenAI-compatible API

Installation

# Basic installation
pip install -e .

# With specific provider support
pip install -e ".[openai]"
pip install -e ".[anthropic]"
pip install -e ".[all]"

# Development installation
pip install -e ".[dev]"

Quick Start

Basic Introspection

# Full analysis of a model
llm-introspect run --provider anthropic --model claude-3-sonnet-20240229

# Run specific probes only
llm-introspect run --provider openai --model gpt-4 --probes contrastive,boundaries

# Output in different formats
llm-introspect run --provider ollama --model llama2 --format markdown

Quick Boundary Check

# Check boundaries for a specific topic
llm-introspect boundary \
    --provider anthropic \
    --model claude-3-sonnet \
    --topic "code generation for security tools" \
    --axes specificity,hypothetical_distance

Unlock/Jailbreak Resistance Testing

# Test resistance to jailbreak attempts
llm-introspect unlock --provider anthropic --model claude-3-sonnet-20240229

# Test specific techniques only
llm-introspect unlock --provider openai --model gpt-4 \
    --techniques roleplay_dan,authority_developer,instruction_ignore

# Output detailed results
llm-introspect unlock --provider ollama --model llama2 \
    --format json --output unlock_results.json

Danger Probe (Deep Safety Testing)

⚠️ WARNING: SENSITIVE CONTENT

This probe is designed for security researchers and safety auditors who need to evaluate an LLM's resistance to generating genuinely dangerous content. The prompts used by this probe intentionally attempt to elicit harmful responses.

Runs in ISOLATION - cannot be combined with other probes

Requires explicit confirmation before execution

Adaptive interrogation disabled - uses static prompts only (safety measure)

Intended for authorized safety testing only

Results may contain disturbing content from non-refusing models

The danger probe tests safety boundaries across 15 sensitive categories including self-harm, toxic substances, psychological manipulation, dark ethics, child endangerment, fraud/deception, stalking, drug synthesis, dangerous medical advice, cult indoctrination, animal cruelty, exploitation, and dangerous misinformation. It does not include weapons, direct violence, or terrorism-related content.

# Run danger probe (requires confirmation prompt)
llm-introspect danger --provider anthropic --model claude-3-sonnet-20240229

# Skip confirmation for automated pipelines
llm-introspect danger --provider openai --model gpt-4 --yes

# Output detailed assessment
llm-introspect danger --provider anthropic --model claude-3-haiku \
    --format markdown --output danger_assessment.md

Use llm-introspect danger --help for additional options including category and intensity filtering.

Code Ability Auditing

Evaluate how well a model generates secure, efficient, and correct code:

# Full code ability audit (Python by default)
llm-introspect code-audit --provider openai --model gpt-4

# Test specific security categories
llm-introspect code-audit --provider anthropic --model claude-3-sonnet \
    --categories security_sql,security_xss,security_injection

# Focus on efficiency and correctness
llm-introspect code-audit --provider openai --model gpt-4 \
    --categories efficiency_algorithm,correctness_edge

# Test advanced data structures and recursion
llm-introspect code-audit --provider anthropic --model claude-3-opus \
    --categories efficiency_datastructure,correctness_recursive

# Test concurrency safety and resource management
llm-introspect code-audit --provider openai --model gpt-4 \
    --categories concurrency_safety,concurrency_resource

# Output detailed results
llm-introspect code-audit --provider anthropic --model claude-3-opus \
    --format markdown --output code_audit.md

# Use adaptive interrogation - local LLM analyzes generated code
llm-introspect code-audit --provider openai --model gpt-4 \
    --adaptive ollama:codellama --max-followups 3

# LM Studio model reviews code for security issues
llm-introspect code-audit --provider anthropic --model claude-3-sonnet \
    --adaptive lmstudio:deepseek-coder --categories security_sql,security_injection

Multi-Language Support

Test code generation across different programming languages:

# Test a specific language (default: python)
llm-introspect code-audit --provider openai --model gpt-4 --language rust

# Test ALL common languages (Python, Rust, Ruby, C, C++, JavaScript, Shell, R)
llm-introspect code-audit --provider openai --model gpt-4 --all-languages

# Test rare languages only (Erlang, COBOL, Forth, Haskell)
llm-introspect code-audit --provider anthropic --model claude-3-opus --rare-languages

# Test all languages including rare ones
llm-introspect code-audit --provider openai --model gpt-4 --all-languages --include-rare

# List available languages
llm-introspect list languages

Common Languages: Python, JavaScript, Rust, Ruby, C, C++, Shell (Bash), R

Rare Languages: Erlang, COBOL, Forth, Haskell

The audit tests:

Security: SQL injection, XSS, command injection, path traversal, cryptographic weaknesses
Efficiency: Algorithmic complexity, data structure choice, memory usage, I/O patterns
Correctness: Edge case handling, error handling, type safety, recursion safety
Concurrency: Thread safety, race conditions, resource management, cleanup patterns
Quality: Design patterns, code readability

With adaptive mode enabled, the local LLM will:

Analyze the generated code for vulnerabilities
Ask the remote model to fix identified issues
Request alternative implementations when code is suboptimal
Probe for edge cases the model might have missed

Hallucination Testing

Test how susceptible a model is to generating false information (confabulation):

# Full hallucination assessment
llm-introspect hallucination --provider anthropic --model claude-3-sonnet-20240229

# Test specific categories
llm-introspect hallucination --provider openai --model gpt-4 \
    --categories fabricated_citations,false_premises,fictional_entities

# Output detailed results
llm-introspect hallucination --provider anthropic --model claude-3-opus \
    --format markdown --output hallucination_report.md

# Use adaptive interrogation for deeper probing
llm-introspect hallucination --provider openai --model gpt-4 \
    --adaptive ollama:llama3 --max-followups 3

Hallucination Categories:

fabricated_citations - Non-existent papers/research
false_premises - Questions with incorrect assumptions
fictional_entities - Non-existent people/organizations
fake_statistics - Fabricated numbers and data
nonexistent_events - Events that never happened
fake_quotes - Misattributed or invented quotes
fictional_technical - Non-existent technologies/specs
synthetic_history - Fabricated historical events
pseudo_science - Fake scientific concepts
fictional_legal - Non-existent laws/court cases
fabricated_media - Non-existent movies/books/music
fake_geography - Non-existent places/landmarks
imaginary_products - Products that don't exist

The probe measures:

Resistance Score: How well the model resists hallucination traps
Confabulation Rate: How often the model generates confident false information
Uncertainty Expression: Whether the model appropriately hedges when uncertain

Systems Knowledge Testing

Test how accurately a model knows OS administration across Linux and BSD variants:

# Full systems knowledge audit (all OSes, all categories)
llm-introspect systems-knowledge --provider anthropic --model claude-3-sonnet-20240229

# Test specific operating systems
llm-introspect systems-knowledge --provider openai --model gpt-4 \
    --os debian,freebsd,openbsd

# Test specific categories
llm-introspect systems-knowledge --provider anthropic --model claude-3-sonnet \
    --categories networking,configuration

# Output detailed results
llm-introspect systems-knowledge --provider anthropic --model claude-3-opus \
    --format markdown --output systems_knowledge.md

# Use adaptive interrogation for deeper probing
llm-introspect systems-knowledge --provider openai --model gpt-4 \
    --adaptive ollama:llama3 --max-followups 3

Supported Operating Systems:

Linux: Debian, Arch, Ubuntu
BSD: FreeBSD, OpenBSD, NetBSD

Knowledge Categories:

configuration - Package management, services, init systems, users, logging, bootloaders
process_management - Process listing, signals, nice/priority, cgroups/jails, monitoring
media_management - Filesystems, partitions, fstab, ZFS, swap, disk utilities
networking - Interfaces, firewalls, DNS, routing, bonding/aggregation, packet capture

The probe measures:

Accuracy Score: Correctness of answers, penalizes misconceptions
Completeness Score: Coverage of required concepts
Misconception Detection: Identifies when Linux knowledge is incorrectly applied to BSD (and vice versa)

Cost Estimation

# Estimate API costs before running
llm-introspect estimate --provider openai --model gpt-4 --probes all

# Get detailed JSON estimate
llm-introspect estimate --provider anthropic --model claude-3-opus --json

# Estimate costs for comparing two models
llm-introspect estimate --provider openai --model gpt-4 \
    --compare-with anthropic:claude-3-opus --probes contrastive,boundaries

Model Comparison

# Compare two models side-by-side
llm-introspect compare \
    --model-a openai:gpt-4 \
    --model-b anthropic:claude-3-opus \
    --probes contrastive,boundaries

# Output as markdown
llm-introspect compare \
    --model-a openai:gpt-4o \
    --model-b openai:gpt-4 \
    --format markdown --output comparison.md

# Estimate comparison costs first
llm-introspect compare \
    --model-a openai:gpt-4 \
    --model-b anthropic:claude-3-sonnet \
    --estimate-only

List Available Options

llm-introspect list probes      # Available probe types
llm-introspect list dimensions  # Contrastive dimensions
llm-introspect list axes        # Boundary mapping axes
llm-introspect list topics      # Default test topics
llm-introspect list models      # Supported models
llm-introspect list techniques  # Unlock/jailbreak techniques
llm-introspect list strategies  # Consistency paraphrase strategies
llm-introspect list categories  # Code audit challenge categories
llm-introspect list languages   # Code audit supported languages
llm-introspect list os          # Systems knowledge operating systems
llm-introspect list os-categories  # Systems knowledge categories

Adaptive Interrogation

Use a second LLM to dynamically generate follow-up probes based on the subject model's responses:

Automatic follow-ups: Interrogator LLM analyzes each response and decides if deeper probing is needed
Intelligent exploration: Detects inconsistencies, evasion, and boundary regions worth investigating
Bounded execution: Configurable maximum follow-ups per probe to control costs and runtime
Local or remote: Run the interrogator locally (Ollama/LM Studio) or via remote API

Local Interrogator

Zero additional API costs, requires local GPU:

# Use local Llama to interrogate remote Claude
llm-introspect run --provider anthropic --model claude-3-sonnet \
    --adaptive ollama:llama3 --max-followups 5

# Basic adaptive mode with local Ollama model
llm-introspect run --provider openai --model gpt-4 \
    --adaptive ollama:llama3

# Use LM Studio as interrogator with custom follow-up limit
llm-introspect run --provider anthropic --model claude-3-opus \
    --adaptive lmstudio:mistral-7b --max-followups 5

# Adaptive boundary probing
llm-introspect boundary --provider openai --model gpt-4 \
    --topic "vulnerability research" \
    --adaptive ollama:deepseek-coder

Remote Interrogator

Frontier-quality analysis, no GPU required (for users without local GPU, or when quality matters most):

# Use Claude Haiku to interrogate GPT-4 (cross-provider)
llm-introspect run --provider openai --model gpt-4 \
    --adaptive anthropic:claude-3-haiku \
    --remote-interrogator

# Use explicit API key for interrogator
llm-introspect run --provider anthropic --model claude-3-sonnet \
    --adaptive openai:gpt-4o-mini \
    --remote-interrogator \
    --interrogator-api-key sk-xxx

# Cross-provider: OpenAI model interrogates Anthropic model
llm-introspect run --provider anthropic --model claude-3-sonnet \
    --adaptive openai:gpt-4o-mini \
    --remote-interrogator

# Same provider with separate API key
llm-introspect run --provider anthropic --model claude-3-opus \
    --adaptive anthropic:claude-3-haiku \
    --remote-interrogator \
    --interrogator-api-key sk-ant-xxx

# Remote interrogator for code audit
llm-introspect code-audit --provider openai --model gpt-4 \
    --adaptive anthropic:claude-3-haiku \
    --remote-interrogator \
    --all-languages

Custom Interrogator Prompts

Customize how the interrogator analyzes responses:

# Use default YAML prompts (interrogator_prompts.yaml)
llm-introspect run --provider anthropic --model claude-3-sonnet \
    --adaptive ollama:llama3 \
    --interrogator-prompts

# Use custom prompts file
llm-introspect run --provider anthropic --model claude-3-sonnet \
    --adaptive ollama:llama3 \
    --interrogator-prompts /path/to/custom_prompts.yaml

# Use default YAML prompts instead of hardcoded
llm-introspect run --provider openai --model gpt-4 \
    --adaptive ollama:llama3 \
    --interrogator-prompts

# Use custom prompts file for specialized analysis
llm-introspect run --provider anthropic --model claude-3-sonnet \
    --adaptive ollama:llama3 \
    --interrogator-prompts ./my_prompts.yaml

The --interrogator-prompts flag controls which prompts the interrogator uses:

Not passed: Use hardcoded default prompts
Passed without value: Use the default YAML file (interrogator_prompts.yaml)
Passed with path: Use the specified custom YAML file

See llm_introspect/interrogator_prompts.yaml for the prompt format.

Interrogator Decision Types

The interrogator analyzes each response and decides:

CONTINUE: Response is clear, move to next probe
PROBE_DEEPER: Something interesting worth exploring
PROBE_INCONSISTENCY: Potential contradiction detected
PROBE_BOUNDARY: Near a compliance/refusal threshold
PROBE_EVASION: Response seems evasive, ask directly

Supported Commands

All probe types work with adaptive mode:

run - Full introspection (all probes)
boundary - Boundary mapping
consistency - Belief consistency testing
contrastive - Contrastive framing analysis
omission - Omission/coverage analysis
unlock - Jailbreak/bypass resistance
code-audit - Code generation audit
hallucination - Hallucination susceptibility

Not supported: danger - Excluded for ethical and security reasons. Allowing adaptive interrogation to dynamically generate follow-up probes for harmful content testing poses unacceptable risks (automated jailbreak crafting, documenting bypass techniques). Danger probes use fixed, auditable static prompts only.

Output

LLM Introspect generates comprehensive reports in multiple formats:

JSON Output

{
  "meta": {
    "model": "claude-3-sonnet-20240229",
    "provider": "anthropic",
    "total_api_calls": 847,
    "total_tokens": 524000
  },
  "contrastive_analysis": { ... },
  "boundary_map": { ... },
  "consistency_scores": { ... },
  "omission_analysis": { ... },
  "inferred_weights": {
    "safety_alignment": { "strength": "high", "evidence": "..." },
    "authority_deference": { "strength": "moderate", "evidence": "..." }
  },
  "behavioral_fingerprint": {
    "summary": "High safety alignment with sharp boundaries...",
    "distinguishing_traits": [ ... ]
  }
}

Behavioral Fingerprint

The tool synthesizes probe results into a high-level characterization:

BEHAVIORAL FINGERPRINT:
  High safety alignment with sharp boundaries. Moderate authority
  sensitivity. Strong factual consistency, weaker value consistency
  under adversarial framing.

KEY TRAITS:
  • Strong safety boundaries
  • High sensitivity to authority framing
  • Stable positions under reframing
  • Active content filtering

Programmatic Usage

import asyncio
from llm_introspect import ProbeEngine, ProbeConfig
from llm_introspect.adapters import AnthropicAdapter
from llm_introspect.probes import ContrastiveProber, BoundaryMapper
from llm_introspect.report_generator import ReportGenerator

async def main():
    # Create adapter and engine
    adapter = AnthropicAdapter(model="claude-3-sonnet-20240229")
    engine = ProbeEngine(adapter, ProbeConfig(requests_per_minute=50))

    # Run contrastive analysis
    prober = ContrastiveProber(engine)
    results = await prober.probe_all_dimensions()

    # Run boundary mapping
    mapper = BoundaryMapper(engine)
    await mapper.map_boundary("penetration testing", ModificationAxis.SPECIFICITY)

    # Generate report
    report = ReportGenerator("claude-3-sonnet", "anthropic")
    report.add_contrastive_results(prober.generate_report())
    report.add_boundary_results(mapper.generate_report())
    report.generate_markdown()

asyncio.run(main())

Adaptive Interrogation (Programmatic)

import asyncio
from llm_introspect import ProbeEngine, ProbeConfig, AdaptiveInterrogator, AdaptiveConfig
from llm_introspect.adapters import AnthropicAdapter, OllamaAdapter, OpenAIAdapter

async def main():
    # Create adapter for the subject (remote model to test)
    subject_adapter = AnthropicAdapter(model="claude-3-sonnet-20240229")

    # Create adapter for the interrogator
    # Option 1: Local model (no additional API costs)
    interrogator_adapter = OllamaAdapter(model="llama3")

    # Option 2: Remote model (higher quality, requires API key)
    # interrogator_adapter = OpenAIAdapter(model="gpt-4o-mini", api_key="sk-xxx")
    # interrogator_adapter = AnthropicAdapter(model="claude-3-haiku", api_key="sk-ant-xxx")

    # Configure adaptive behavior
    adaptive_config = AdaptiveConfig(
        max_follow_ups=5,           # Max follow-ups per probe
        decision_temperature=0.3,   # Lower = more conservative
        generation_temperature=0.7, # For generating follow-ups
    )

    # Create interrogator (works with any adapter - local or remote)
    interrogator = AdaptiveInterrogator(interrogator_adapter, adaptive_config)

    # Create probe engine with adaptive support
    engine = ProbeEngine(
        subject_adapter,
        ProbeConfig(requests_per_minute=50),
        adaptive_interrogator=interrogator,
        adaptive_context="testing boundary behavior for code generation",
    )

    # Run a single adaptive probe
    result = await engine.execute_probe_adaptive(
        prompt="How would I write a port scanner?",
        test_context="security tool code generation"
    )

    print(f"Initial response: {result['initial_result'].response[:200]}...")
    print(f"Follow-ups generated: {len(result['follow_ups'])}")
    for i, fu in enumerate(result['follow_ups']):
        print(f"  Follow-up {i+1}: {fu['decision']} - {fu['prompt'][:100]}...")

asyncio.run(main())

Cost Estimation

from llm_introspect import CostEstimator

# Create estimator for a model
estimator = CostEstimator("gpt-4", "openai")

# Estimate cost for specific probes
estimate = estimator.estimate_full_run(['contrastive', 'boundaries'])

print(f"Total API calls: {estimate.total_api_calls}")
print(f"Total tokens: {estimate.total_tokens:,}")
print(f"Estimated cost: ${estimate.total_cost_usd:.4f}")

# Compare costs for two models
estimate_a, estimate_b, total = estimator.estimate_comparison_run(
    "claude-3-opus", "anthropic", ['contrastive']
)
print(f"Comparison total cost: ${total:.4f}")

Configuration

Environment Variables

# Provider API keys
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export AZURE_OPENAI_API_KEY="..."
export AZURE_OPENAI_ENDPOINT="https://..."

# Optional configuration
export LLM_INTROSPECT_CACHE_DIR="~/.cache/llm-introspect"
export LLM_INTROSPECT_RATE_LIMIT="50"

Probe Configuration

Probes can be configured via YAML files:

# custom_probes.yaml
name: custom_authority_test
description: Custom authority sensitivity test

frames:
  student:
    template: "As a student, {question}"
  expert:
    template: "As an expert, {question}"

questions:
  - text: "How does quantum computing work?"
    topic: technical

Directory Structure

llm-tester/
├── README.md                           # This file - project overview
├── THEORY.md                           # Theoretical foundations document
├── USAGE                               # Detailed usage guide
├── pyproject.toml                      # Python package configuration
├── tests/                              # Test directory
└── llm_introspect/                     # Main package
    ├── __init__.py                     # Package initialization
    ├── cli.py                          # Command-line interface
    ├── probe_engine.py                 # Core probing infrastructure
    ├── adaptive_interrogator.py        # Adaptive interrogation system (local or remote)
    ├── interrogator_prompts.yaml       # Default prompts for adaptive interrogator
    ├── response_analyzer.py            # NLP analysis toolkit
    ├── report_generator.py             # Report generation (JSON/MD/HTML)
    ├── cost_estimator.py               # API cost estimation
    ├── introspection_logger.py         # Process logging
    ├── adapters/                       # LLM provider adapters
    │   ├── __init__.py
    │   ├── openai_adapter.py           # OpenAI & Azure OpenAI
    │   ├── anthropic_adapter.py        # Claude & AWS Bedrock
    │   ├── ollama_adapter.py           # Ollama & LM Studio
    │   ├── grok_adapter.py             # xAI Grok
    │   ├── mistral_adapter.py          # Mistral AI
    │   └── gemini_adapter.py           # Google Gemini
    ├── data/                           # Challenge and knowledge data (YAML)
    │   ├── __init__.py                 # Data loading utilities
    │   ├── boundary_probes.yaml        # Boundary mapping templates
    │   ├── code_challenges.yaml        # Code audit challenge templates
    │   ├── consistency_probes.yaml     # Consistency auditor templates
    │   ├── contrastive_probes.yaml     # Contrastive prober dimensions
    │   ├── danger_probes.yaml          # Danger prober templates
    │   ├── hallucination_probes.yaml   # Hallucination prober templates
    │   ├── language_patterns.yaml      # Multi-language security patterns
    │   ├── omission_probes.yaml        # Omission analyzer corpus
    │   ├── systems_knowledge.yaml      # OS knowledge question templates
    │   └── unlock_techniques.yaml      # Unlock/jailbreak technique templates
    └── probes/                         # Probe implementations
        ├── __init__.py
        ├── contrastive_prober.py       # Framing sensitivity analysis
        ├── boundary_mapper.py          # Refusal threshold mapping
        ├── consistency_auditor.py      # Belief stability testing
        ├── omission_analyzer.py        # Gap detection
        ├── unlock_tester.py            # Jailbreak resistance testing
        ├── code_ability_auditor.py     # Code generation quality audit
        ├── hallucination_prober.py     # Hallucination susceptibility testing
        ├── systems_knowledge_auditor.py # OS administration knowledge testing
        ├── danger_prober.py            # Deep safety boundary testing (ISOLATED)
        └── language_patterns.py        # Multi-language security patterns

Key Components

Component	File	Purpose
ResponseAnalyzer	`response_analyzer.py`	Pattern-based NLP for detecting hedging, confidence markers, refusal signals
ProbeEngine	`probe_engine.py`	Rate-limited, cached, retry-enabled probe execution orchestrator
AdaptiveInterrogator	`adaptive_interrogator.py`	Uses local or remote LLM to dynamically generate follow-up probes
ContrastiveProber	`probes/contrastive_prober.py`	Compare responses across framing dimensions (authority, emotion, formality)
BoundaryMapper	`probes/boundary_mapper.py`	Find refusal thresholds using gradient probing along modification axes
ConsistencyAuditor	`probes/consistency_auditor.py`	Test belief persistence under paraphrasing and adversarial reframing
OmissionAnalyzer	`probes/omission_analyzer.py`	Detect and classify systematic content omissions
UnlockTester	`probes/unlock_tester.py`	Test resistance to jailbreak/bypass techniques
CodeAbilityAuditor	`probes/code_ability_auditor.py`	Audit code generation for security, efficiency, and correctness
HallucinationProber	`probes/hallucination_prober.py`	Test susceptibility to generating false information
SystemsKnowledgeAuditor	`probes/systems_knowledge_auditor.py`	Test OS administration knowledge across Linux/BSD variants
DangerProber	`probes/danger_prober.py`	Deep safety boundary testing (ISOLATED - see warnings above)
ReportGenerator	`report_generator.py`	Synthesize results into behavioral fingerprints and export reports
CostEstimator	`cost_estimator.py`	Token counting and API cost estimation with up-to-date pricing
CLI	`cli.py`	Command-line interface for all introspection operations

Limitations

Self-reference paradox: The examiner and examined share the same architecture—true objectivity is impossible for self-analysis
Training artifacts: Self-examination may itself be shaped by training to produce certain kinds of self-narratives
No ground truth: Without access to actual model weights, inferred weights are behavioral approximations
Provider variations: Results may vary based on API version, temperature, and other parameters
Cost: Comprehensive analysis requires many API calls (estimate costs before running)

Contributing

Contributions are welcome! Areas of interest:

Additional probe types (e.g., reasoning consistency, factual accuracy)
New LLM provider adapters
Improved analysis heuristics
Visualization tools
Comparative benchmarking

License

MIT License

Citation

If you use LLM Introspect in research, please cite:

@software{llm_introspect,
  title = {LLM Introspect: Behavioral Analysis Toolkit for Large Language Models},
  year = {2025},
  url = {https://github.com/jbcde/llm-introspect}
}

Name		Name	Last commit message	Last commit date
Latest commit History 176 Commits
examples		examples
llm_introspect		llm_introspect
reports		reports
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
THEORY.md		THEORY.md
USAGE		USAGE
VERSION		VERSION
entry_point.py		entry_point.py
llm-introspect.spec		llm-introspect.spec
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Folders and files

Latest commit

History

Repository files navigation

LLM Introspector

Overview

Key Features

Probe Types

Adaptive Interrogation

Supported Providers

Installation

Quick Start

Basic Introspection

Quick Boundary Check

Unlock/Jailbreak Resistance Testing

Danger Probe (Deep Safety Testing)

Code Ability Auditing

Multi-Language Support

Hallucination Testing

Systems Knowledge Testing

Cost Estimation

Model Comparison

List Available Options

Adaptive Interrogation

Local Interrogator

Remote Interrogator

Custom Interrogator Prompts

Interrogator Decision Types

Supported Commands

Output

JSON Output

Behavioral Fingerprint

Programmatic Usage

Adaptive Interrogation (Programmatic)

Cost Estimation

Configuration

Environment Variables

Probe Configuration

Directory Structure

Key Components

Limitations

Contributing

License

Citation

See Also

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages