CLMN

Concept-based Language Models via Neural Symbolic Reasoning

Bridging Performance and Interpretability in NLP through Neural-Symbolic Reasoning

Deep learning models in NLP often function as "black boxes", which limits their adoption in high-stakes domains such as healthcare and finance, where transparency is essential. CLMN is a neural-symbolic framework that reconciles performance and interpretability: it achieves state-of-the-art accuracy while providing human-readable logical explanations for every prediction.

Architecture

The input sentence is encoded by a pretrained language model (PLM). A Concept Layer predicts the polarity of specific aspects (e.g., food, service). These concept predictions are passed to a Concept Reasoning Layer, which applies fuzzy logic, and its output is combined with the PLM's features for the final sentiment prediction.
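The data flow above can be sketched without any deep learning library. This is a toy illustration under stated assumptions, not the code in cbm_joint.py: the hidden size, the random weights, the concept list, reading P(Positive) as a fuzzy truth value, and the single hand-written rule are all hypothetical. A real implementation would put PyTorch layers on top of the PLM's sentence representation.

```python
import math
import random

# Hypothetical toy setup: four concepts, three polarity states, 5-star ratings.
CONCEPTS = ["food", "service", "ambiance", "noise"]
STATES = ["Positive", "Negative", "Unknown"]
NUM_CLASSES = 5
HIDDEN = 8  # stand-in for the PLM hidden size (768 for bert-base-uncased)

rng = random.Random(0)


def rand_matrix(rows, cols):
    """Random weights in place of trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def linear(x, w):
    """Bias-free matrix-vector product: one output per row of w."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


W_concept = {c: rand_matrix(len(STATES), HIDDEN) for c in CONCEPTS}
# The final head sees PLM features + one truth value per concept + one rule output.
W_out = rand_matrix(NUM_CLASSES, HIDDEN + len(CONCEPTS) + 1)


def forward(text_features):
    # 1. Concept Layer: a distribution over Positive/Negative/Unknown per concept.
    concept_probs = {c: softmax(linear(text_features, W_concept[c])) for c in CONCEPTS}
    # 2. Concept Reasoning Layer: read P(Positive) as a fuzzy truth value and
    #    evaluate one illustrative rule, "food is positive AND noise is not negative",
    #    with the product t-norm.
    truths = [concept_probs[c][0] for c in CONCEPTS]
    rule = concept_probs["food"][0] * (1.0 - concept_probs["noise"][1])
    # 3. Combine the PLM features with the concept-aware features.
    combined = list(text_features) + truths + [rule]
    return concept_probs, rule, softmax(linear(combined, W_out))


# Usage: random numbers stand in for the PLM's pooled sentence embedding.
features = [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN)]
concept_probs, rule, rating_probs = forward(features)
predicted_stars = rating_probs.index(max(rating_probs)) + 1
print(predicted_stars, round(rule, 3))
```

Concatenating the concept truth values with the raw features is what lets the final head use both signals; the rule value stays inspectable on its own.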

Key Features

Continuous Concept Embeddings

Projects concepts into an interpretable continuous space while preserving semantic information, avoiding the information loss caused by rigid binary concept bottlenecks.
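One well-known way to build continuous concept representations is the Concept Embedding Model (CEM) mixture: each concept owns a "positive" and a "negative" embedding, blended by the predicted probability. Whether CLMN uses exactly this formulation is an assumption; the sketch only shows why a continuous representation keeps information that a hard 0/1 bottleneck throws away. The embedding values are hypothetical.

```python
from typing import List


def mix_concept_embedding(p_positive: float, pos_emb: List[float], neg_emb: List[float]) -> List[float]:
    """Blend a concept's 'positive' and 'negative' embeddings by its predicted probability."""
    if not 0.0 <= p_positive <= 1.0:
        raise ValueError("p_positive must be in [0, 1]")
    return [p_positive * a + (1.0 - p_positive) * b for a, b in zip(pos_emb, neg_emb)]


def binary_bottleneck(p_positive: float) -> float:
    """A hard bottleneck keeps a single bit per concept."""
    return 1.0 if p_positive >= 0.5 else 0.0


# Hypothetical 4-dimensional embeddings for the "food" concept.
food_pos = [1.0, 0.0, 0.5, 0.2]
food_neg = [0.0, 1.0, -0.5, 0.2]

lukewarm = mix_concept_embedding(0.55, food_pos, food_neg)
glowing = mix_concept_embedding(0.95, food_pos, food_neg)

# Both reviews collapse to the same bit, but their continuous embeddings differ.
print(binary_bottleneck(0.55), binary_bottleneck(0.95))  # 1.0 1.0
print(lukewarm != glowing)  # True
```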

Neural-Symbolic Reasoning

Uses fuzzy-logic reasoning to model dynamic interactions between concepts, such as negation and contextual modification.
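Fuzzy logic replaces true/false with truth values in [0, 1], so logical connectives become differentiable. The sketch uses the product family (NOT a = 1 - a, a AND b = a * b, a OR b = a + b - a * b); which operators CLMN actually uses is an assumption here, and Gödel (min/max) or Łukasiewicz operators are common alternatives. The truth values are hypothetical.

```python
from functools import reduce


def f_not(a: float) -> float:
    """Standard fuzzy negation."""
    return 1.0 - a


def f_and(a: float, b: float) -> float:
    """Product t-norm."""
    return a * b


def f_or(a: float, b: float) -> float:
    """Probabilistic sum, the t-conorm dual to the product t-norm."""
    return a + b - a * b


def f_and_all(*truths: float) -> float:
    """Conjunction of any number of fuzzy truth values (empty conjunction is true)."""
    return reduce(f_and, truths, 1.0)


# Hypothetical truth values produced by a concept layer for one review.
food_good = 0.9  # "food was good"
noisy = 0.8      # "loud"

print(round(f_and(food_good, f_not(noisy)), 2))  # 0.18: good food, but the noise undercuts it
print(round(f_or(food_good, noisy), 2))          # 0.98
```

Negation is what lets a strongly "noisy" review pull down an otherwise positive conjunction, which a plain weighted sum of concept scores cannot express as directly.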

Joint Training

Supplements original text features with concept-aware representations to achieve superior performance without sacrificing interpretability.
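Joint training typically optimizes one weighted objective over the final prediction and the concept predictions. The configuration used in the paper lists concept_loss_weight = 100 (α₁) and y2_weight = 10 (α₂); the exact loss below is an assumption, and in particular treating y2 as a second, auxiliary rating prediction supervised with the same label is a guess, not documented behavior.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[float], target: int, eps: float = 1e-12) -> float:
    """Negative log-likelihood of the target class."""
    return -math.log(max(probs[target], eps))


def joint_loss(
    task_probs: Sequence[float],
    task_target: int,
    concept_probs: Sequence[Sequence[float]],
    concept_targets: Sequence[int],
    y2_probs: Sequence[float],
    alpha1: float = 100.0,  # concept_loss_weight
    alpha2: float = 10.0,   # y2_weight
) -> float:
    """Hypothetical objective: task loss + α₁ * concept loss + α₂ * auxiliary (y2) loss."""
    task = cross_entropy(task_probs, task_target)
    concept = sum(
        cross_entropy(p, t) for p, t in zip(concept_probs, concept_targets)
    ) / len(concept_targets)
    # Assumption: y2 is a second rating prediction supervised with the same label.
    aux = cross_entropy(y2_probs, task_target)
    return task + alpha1 * concept + alpha2 * aux


# Concepts and y2 are predicted perfectly; the task head is a coin flip between two classes.
loss = joint_loss(
    task_probs=[0.5, 0.5],
    task_target=0,
    concept_probs=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    concept_targets=[0, 1],
    y2_probs=[1.0, 0.0],
)
print(round(loss, 4))  # 0.6931, i.e. ln 2 from the task term alone
```

The large α₁ makes concept errors dominate the gradient, which pushes the concept layer to stay faithful to the annotations instead of drifting toward whatever helps the rating head.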

Quick Start

# Clone the repository
git clone https://github.com/YourUsername/CLMN.git
cd CLMN

# Install dependencies
pip install torch transformers gensim datasets scikit-learn pandas tqdm

# Train the model (joint mode with BERT backbone)
cd run_cebab
python cbm_joint.py

Dataset

The project uses an augmented version of the CEBaB dataset, referred to as aug-CEBaB-yelp.

Property	Details
Source	Human-annotated concepts: Food, Ambiance, Service, Noise
Augmentation	ChatGPT-generated concepts: Cleanliness, Price, Menu Variety, etc.
Labels	Each concept is labeled `Positive`, `Negative`, or `Unknown`
Base Data	Yelp restaurant reviews
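For training, each concept's three-way label has to become a class id. A minimal sketch, assuming one review is available as a dict from concept name to label; that record layout is hypothetical and the actual file format in dataset/ is not described here. Only the augmented concepts named in the table are listed, since the full set is not given.

```python
from typing import Dict, List

LABELS = ("Positive", "Negative", "Unknown")
LABEL_TO_ID = {name: i for i, name in enumerate(LABELS)}

HUMAN_CONCEPTS = ["Food", "Ambiance", "Service", "Noise"]
# Only the augmented concepts named in the table; the full list is not given there.
AUGMENTED_CONCEPTS = ["Cleanliness", "Price", "Menu Variety"]


def encode_concepts(record: Dict[str, str], concepts: List[str]) -> List[int]:
    """Map one review's concept labels to class ids; absent concepts count as Unknown."""
    ids = []
    for concept in concepts:
        label = record.get(concept, "Unknown")
        if label not in LABEL_TO_ID:
            raise ValueError(f"unexpected label {label!r} for concept {concept!r}")
        ids.append(LABEL_TO_ID[label])
    return ids


review = {"Food": "Positive", "Noise": "Negative"}  # hypothetical record layout
print(encode_concepts(review, HUMAN_CONCEPTS + AUGMENTED_CONCEPTS))
# [0, 2, 2, 1, 2, 2, 2]
```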

Usage

Configuration

Key hyperparameters used in the paper:

max_len             = 512
num_epochs          = 25
batch_size          = 8
concept_loss_weight = 100   # α₁
y2_weight           = 10    # α₂
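The script appears to set these as plain module-level variables. Grouping them in an immutable dataclass is a hypothetical refactor, not the repository's code; it makes the paper's settings explicit and lets you derive variants (for example another backbone) without editing them in place. The mode, data_type, and model_name fields mirror the variables set in the Training step.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class CLMNConfig:
    """Hypothetical container for the hyperparameters used in the paper."""
    max_len: int = 512
    num_epochs: int = 25
    batch_size: int = 8
    concept_loss_weight: float = 100.0  # α₁
    y2_weight: float = 10.0             # α₂
    mode: str = "joint"
    data_type: str = "aug_cebab_yelp"
    model_name: str = "bert-base-uncased"

    def __post_init__(self) -> None:
        for name in ("max_len", "num_epochs", "batch_size"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")


paper = CLMNConfig()
# Derive a variant without mutating the paper's settings.
roberta = replace(paper, model_name="roberta-base")
print(asdict(roberta)["model_name"])  # roberta-base
```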

Supported Backbones

Backbone	Model Name	Notes
BERT	`bert-base-uncased`	Default, best overall
RoBERTa	`roberta-base`	Highest original accuracy
GPT-2	`gpt2`	Autoregressive baseline
LSTM	`lstm`	Uses FastText embeddings
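Validating model_name before a long run catches typos early, and the three Transformer backbones can share one loading path through Hugging Face's Auto classes. A sketch under assumptions: the SUPPORTED_BACKBONES mapping and both helpers are hypothetical, and the LSTM/FastText path is not shown. GPT-2's tokenizer has no padding token by default, so reusing the EOS token is a common workaround for batched inputs.

```python
SUPPORTED_BACKBONES = {
    "bert-base-uncased": "transformer",
    "roberta-base": "transformer",
    "gpt2": "transformer",
    "lstm": "lstm",  # FastText embeddings; loading not shown here
}


def backbone_family(model_name: str) -> str:
    """Return 'transformer' or 'lstm', or raise for an unsupported name."""
    try:
        return SUPPORTED_BACKBONES[model_name]
    except KeyError:
        raise ValueError(
            f"unsupported backbone {model_name!r}; choose one of {sorted(SUPPORTED_BACKBONES)}"
        ) from None


def load_transformer(model_name: str):
    """Load tokenizer and encoder (downloads weights on first use)."""
    # Imported lazily so backbone_family works without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    if tokenizer.pad_token is None:
        # GPT-2 ships without a padding token; reuse EOS so batches can be padded.
        tokenizer.pad_token = tokenizer.eos_token
    model = AutoModel.from_pretrained(model_name)
    return tokenizer, model


print(backbone_family("gpt2"))  # transformer
```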

Training

# In run_cebab/cbm_joint.py, set:
mode = "joint"
data_type = "aug_cebab_yelp"
model_name = "bert-base-uncased"  # or "roberta-base", "gpt2", "lstm"

# Then launch training from the shell:
cd run_cebab
python cbm_joint.py

Results

CLMN demonstrates that interpretability does not require sacrificing accuracy. Extensive experiments show CLMN outperforms existing concept-based methods in both accuracy and explanation quality.

Performance on aug-CEBaB-yelp

Backbone	O-Acc	O-F1	C-Acc (Concept)	R-F1 (Reasoning)
BERT	69.49	79.72	85.85	76.49
RoBERTa	80.92	71.21	86.09	76.51
GPT-2	75.39	63.39	85.18	75.76
LSTM	65.65	47.54	66.60	57.10
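The README does not state which averaging the F1 columns use. The sketch assumes macro-averaging, which matches scikit-learn's f1_score(y_true, y_pred, average="macro"); with macro-averaging, F1 can fall above or below accuracy depending on how errors are spread across classes. The labels in the usage example are made up.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over every label seen in either sequence."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]
print(accuracy(y_true, y_pred), round(macro_f1(y_true, y_pred), 4))  # 0.8 0.8222
```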

Interpretability

CLMN provides transparent, human-readable explanations by explicitly deriving the logic behind every prediction:

Step 1  Concept Extraction    →  "food was good" (✅ Positive Food)
                                  "loud"          (❌ Negative Noise)

Step 2  Logical Reasoning     →  food ∧ ¬noise ∧ ¬price ...

Step 3  Final Prediction      →  ★★★★ (4/5 rating)

This derivation process allows users to verify why the model assigned a specific rating, addressing the trust issues inherent in black-box models.

Example of CLMN's interpretable reasoning pipeline on a restaurant review.
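The three-step derivation can be rendered from concept labels with a small helper. This is a hypothetical sketch, not CLMN's output code: it works on hard labels and simply joins the observed concepts into a conjunction, whereas CLMN's reasoning layer operates on continuous fuzzy truth values.

```python
from typing import Dict

SYMBOL = {"Positive": "✅", "Negative": "❌"}


def explain(concepts: Dict[str, str], stars: int) -> str:
    """Render concept labels and a rating as a human-readable derivation."""
    evidence, literals = [], []
    for name, label in concepts.items():
        if label == "Unknown":
            continue  # concepts the review does not mention are left out
        evidence.append(f"{SYMBOL[label]} {label} {name}")
        literal = name.lower()
        literals.append(literal if label == "Positive" else f"¬{literal}")
    return "\n".join(
        evidence
        + [f"Rule: {' ∧ '.join(literals)}", f"Prediction: {'★' * stars} ({stars}/5 rating)"]
    )


print(explain({"Food": "Positive", "Noise": "Negative", "Price": "Unknown"}, stars=4))
```

For the review in the example this prints the two evidence lines, the rule `food ∧ ¬noise`, and a four-star prediction.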

Citation

If you use this code or these findings in your research, please cite:

@article{yang2025clmn,
  title   = {CLMN: Concept based Language Models via Neural Symbolic Reasoning},
  author  = {Yang, Yibo},
  journal = {arXiv preprint arXiv:2510.10063},
  year    = {2025}
}

Made with passion for interpretable AI
