Bridging Performance and Interpretability in NLP through Neural-Symbolic Reasoning
Deep learning models in NLP often function as "black boxes", which limits their adoption in high-stakes domains such as healthcare and finance, where transparency is essential. CLMN (Concept based Language Models via Neural Symbolic Reasoning) is a neural-symbolic framework that reconciles performance and interpretability: it achieves state-of-the-art accuracy while providing a human-readable logical explanation for every prediction.
The input sentence is processed by a pre-trained language model (PLM). The Concept Layer predicts specific aspects (e.g., food, service), which are fed into a Concept Reasoning Layer based on fuzzy logic. The reasoning output is then combined with the PLM's features for the final sentiment prediction.

Projects concepts into an interpretable space while preserving semantic information, with no information loss from rigid binary bottlenecks.

Utilizes fuzzy logic-based reasoning to model dynamic concept interactions: negation, contextual modification, and more.

Supplements original text features with concept-aware representations to achieve superior performance without sacrificing interpretability.
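
To make the fuzzy-logic reasoning concrete, here is a minimal sketch. It assumes each concept carries a truth degree in [0, 1] (read as "this concept is positive"), uses standard negation `1 - x`, and uses the product t-norm for conjunction. These operator choices and all function names are illustrative assumptions, not taken from the CLMN code.

```python
from functools import reduce


def fuzzy_not(x: float) -> float:
    """Standard fuzzy negation: 1 - x."""
    return 1.0 - x


def fuzzy_and(*xs: float) -> float:
    """Product t-norm: multiply the truth degrees together."""
    return reduce(lambda a, b: a * b, xs, 1.0)


def fuzzy_or(*xs: float) -> float:
    """Probabilistic sum, the De Morgan dual of the product t-norm."""
    return fuzzy_not(fuzzy_and(*(fuzzy_not(x) for x in xs)))


# Hypothetical truth degrees for two concepts.
concepts = {"food": 0.9, "noise": 0.2}

# Degree to which the rule "food AND NOT noise" holds.
rule = fuzzy_and(concepts["food"], fuzzy_not(concepts["noise"]))
print(round(rule, 2))  # 0.72
```

Because every operator is differentiable, a rule like this can in principle be trained end to end together with the concept predictor, which is the usual motivation for fuzzy rather than Boolean logic in neural-symbolic models.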

    # Clone the repository
    git clone https://github.com/YourUsername/CLMN.git
    cd CLMN

    # Install dependencies
    pip install torch transformers gensim datasets scikit-learn pandas tqdm

    # Train the model (joint mode with BERT backbone)
    cd run_cebab
    python cbm_joint.py

The project utilizes an augmented version of the CEBaB dataset, referred to as aug-CEBaB-yelp.

| Property | Details |
|---|---|
| Source | Human-annotated concepts: Food, Ambiance, Service, Noise |
| Augmentation | ChatGPT-generated concepts: Cleanliness, Price, Menu Variety, etc. |
| Labels | Each concept classified as Positive, Negative, or Unknown |
| Base Data | Yelp restaurant reviews |
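
As an illustration of what one annotated review might look like in memory, the sketch below encodes the three-way concept labels as integers. The field names, the integer encoding, and the 1 to 5 rating range are assumptions made for this example; the actual column names in aug-CEBaB-yelp may differ.

```python
from dataclasses import dataclass
from typing import Dict, List

# Label names come from the table above; the integer ids are an assumption.
LABEL_TO_ID = {"Negative": 0, "Positive": 1, "Unknown": 2}

HUMAN_CONCEPTS = ["Food", "Ambiance", "Service", "Noise"]
# The source lists more augmented concepts ("etc."); only the named ones appear here.
AUGMENTED_CONCEPTS = ["Cleanliness", "Price", "Menu Variety"]
ALL_CONCEPTS = HUMAN_CONCEPTS + AUGMENTED_CONCEPTS


@dataclass
class Review:
    text: str
    rating: int               # assumed to be a 1-5 star rating
    concepts: Dict[str, str]  # concept name -> "Positive" / "Negative" / "Unknown"


def encode_concepts(review: Review) -> List[int]:
    """Return one label id per concept; concepts not mentioned count as Unknown."""
    return [LABEL_TO_ID[review.concepts.get(name, "Unknown")] for name in ALL_CONCEPTS]


example = Review(
    text="The food was good but it was loud.",
    rating=4,
    concepts={"Food": "Positive", "Noise": "Negative"},
)
print(encode_concepts(example))  # [1, 2, 2, 0, 2, 2, 2]
```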
Key hyperparameters used in the paper:

    max_len = 512
    num_epochs = 25
    batch_size = 8
    concept_loss_weight = 100  # α₁
    y2_weight = 10             # α₂

| Backbone | Model Name | Notes |
|---|---|---|
| BERT | bert-base-uncased | Default, best overall |
| RoBERTa | roberta-base | Highest original accuracy |
| GPT-2 | gpt2 | Autoregressive baseline |
| LSTM | lstm | Uses FastText embeddings |
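
The hyperparameters and backbone names above can be gathered into one validated configuration object, sketched below. The `TrainConfig` class and its `joint_loss` method are hypothetical. In particular, the weighted sum is only a guess at how α₁ and α₂ enter the objective, and reading `y2_weight` as the weight on a second prediction loss is an assumption; check `cbm_joint.py` for the actual formulation.

```python
from dataclasses import dataclass

# Model names listed in the backbone table.
SUPPORTED_BACKBONES = ("bert-base-uncased", "roberta-base", "gpt2", "lstm")


@dataclass(frozen=True)
class TrainConfig:
    model_name: str = "bert-base-uncased"
    max_len: int = 512
    num_epochs: int = 25
    batch_size: int = 8
    concept_loss_weight: float = 100.0  # α₁
    y2_weight: float = 10.0             # α₂

    def __post_init__(self) -> None:
        # Fail early on a backbone name the table does not list.
        if self.model_name not in SUPPORTED_BACKBONES:
            raise ValueError(
                f"unknown backbone {self.model_name!r}; "
                f"expected one of {SUPPORTED_BACKBONES}"
            )

    def joint_loss(self, task_loss: float, concept_loss: float, y2_loss: float) -> float:
        # Assumed form: L = L_task + α₁ * L_concept + α₂ * L_y2
        return (
            task_loss
            + self.concept_loss_weight * concept_loss
            + self.y2_weight * y2_loss
        )


cfg = TrainConfig(model_name="roberta-base")
print(cfg.joint_loss(task_loss=1.0, concept_loss=0.01, y2_loss=0.1))
```

Under this assumed form, a concept loss of 0.01 contributes as much to the total as a task loss of 1.0, which shows how strongly a weight of 100 would prioritize concept supervision.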

    # In the script, set:
    mode = 'joint'
    data_type = "aug_cebab_yelp"
    model_name = "bert-base-uncased"  # or roberta-base, gpt2, lstm

Then run the training script:

    cd run_cebab
    python cbm_joint.py

CLMN demonstrates that interpretability does not require sacrificing accuracy. Experiments show that CLMN outperforms existing concept-based methods in both accuracy and explanation quality.

| Backbone | O-Acc | O-F1 | C-Acc (Concept) | R-F1 (Reasoning) |
|---|---|---|---|---|
| BERT | 69.49 | 79.72 | 85.85 | 76.49 |
| RoBERTa | 80.92 | 71.21 | 86.09 | 76.51 |
| GPT-2 | 75.39 | 63.39 | 85.18 | 75.76 |
| LSTM | 65.65 | 47.54 | 66.60 | 57.10 |
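
F1 can sit well below accuracy when the rating classes are imbalanced. The sketch below shows this with plain-Python implementations of accuracy and macro-averaged F1 on made-up labels. Whether the reported F1 columns use macro averaging is an assumption here, and the toy data is purely illustrative.

```python
from typing import List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over every label seen in gold or predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    scores: List[float] = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy star ratings: the majority class (5) is predicted well, the rare ones are not.
y_true = [5, 5, 5, 5, 1, 3]
y_pred = [5, 5, 5, 5, 3, 5]

print(round(accuracy(y_true, y_pred), 3))  # 0.667
print(round(macro_f1(y_true, y_pred), 3))  # 0.296
```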
CLMN provides transparent, human-readable explanations by explicitly deriving the logic behind every prediction:

    Step 1  Concept Extraction → "food was good"  (✅ Positive Food)
                                 "loud"           (❌ Negative Noise)
    Step 2  Logical Reasoning  → food ∧ ¬noise ∧ ¬price ...
    Step 3  Final Prediction   → ★★★★ (4/5 rating)

This derivation process allows users to verify why the model assigned a specific rating, addressing the trust issues inherent in black-box models.
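
One simple way to turn predicted concept degrees into a readable rule string like the one in Step 2 is to threshold each degree and negate the concepts that fall below it. The helper below is a hypothetical sketch of that idea. The 0.5 threshold, the concept degrees, and the function name are all assumptions, and CLMN's own rule extraction may work differently.

```python
from typing import Dict


def render_rule(degrees: Dict[str, float], threshold: float = 0.5) -> str:
    """Join concepts into a conjunction, negating those below the threshold."""
    literals = [
        name if degree >= threshold else f"¬{name}"
        for name, degree in degrees.items()
    ]
    return " ∧ ".join(literals)


# Hypothetical degrees of "this concept is positive" for one review.
degrees = {"food": 0.92, "noise": 0.15, "price": 0.30}
print(render_rule(degrees))  # food ∧ ¬noise ∧ ¬price
```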
If you use this code or findings in your research, please cite:

    @article{yang2025clmn,
      title   = {CLMN: Concept based Language Models via Neural Symbolic Reasoning},
      author  = {Yang, Yibo},
      journal = {arXiv preprint arXiv:2510.10063},
      year    = {2025}
    }

Made with passion for interpretable AI
