NLP Pipeline for Support Ticket Routing and Summarization
Fast routing • Auto-summarization • Entity extraction
Live Demo — interactive demo on synthetic data
Support teams triage large volumes of unstructured tickets by hand: figuring out which team a ticket belongs to, reading long threads, and pulling out the key details. Ticket Intel automates the first pass — it routes a ticket to a category, produces an extractive summary, and extracts entities, keywords, and sentiment — using a classic, cheap, interpretable ML stack rather than a large language model.
Why not an LLM? For high-volume routing you rarely need GPT-class reasoning to know that "refund please" is a billing ticket. A TF-IDF + Naive Bayes pipeline trains in milliseconds, predicts in well under a millisecond per ticket, costs effectively nothing to run, and is fully interpretable. The router is abstracted so a transformer can be dropped in later (see Future Work).
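A minimal sketch of the kind of router described above, assuming scikit-learn is installed. The toy tickets, the step names `tfidf` and `nb`, and the `alpha` value are illustrative assumptions, not the project's actual configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy tickets, for illustration only.
texts = [
    "refund please, I was charged twice",
    "my invoice shows the wrong amount",
    "please refund the duplicate payment",
    "the app crashes when I open settings",
    "error 500 when saving my profile",
    "the export button throws an exception",
]
labels = ["Billing", "Billing", "Billing", "Bug", "Bug", "Bug"]

# Vectorizer and classifier share one Pipeline, so TF-IDF statistics are
# fitted only on the data passed to fit().
router = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),  # unigrams and bigrams
    ("nb", MultinomialNB(alpha=1.0)),                # assumed smoothing value
])
router.fit(texts, labels)

query = "I need a refund for a duplicate charge"
predicted = router.predict([query])[0]
confidence = float(router.predict_proba([query]).max())
print(predicted, round(confidence, 3))
```

Because both steps are linear and count-based, the learned per-class feature log-probabilities (`router.named_steps["nb"].feature_log_prob_`) can be inspected directly, which is one way to read "interpretable" here.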
- Real benchmark (Banking77): 13,083 real customer-banking queries labelled with 77 fine-grained intents (card_arrival, card_payment_fee_charged, pending_top_up, …). A widely cited intent-classification benchmark. This is what the reported results are measured on. It is not committed; see How to Run for the download. src/data/loader.py auto-detects the text/category columns.
- Zero-setup demo: a balanced synthetic set of 90 example tickets in src/models/router.py (DEMO_DATA), used automatically when no dataset is present so the API, dashboard, and tests run with no download. Six categories: Account, Billing, Bug, Feature Request, General, Performance.
| Component | Approach | Notes |
|---|---|---|
| Routing | TfidfVectorizer (1–2 grams) → MultinomialNB, wrapped in an sklearn Pipeline | Preprocessing lives inside the Pipeline, so there is no train/test leakage |
| Summarization | Extractive — word-frequency sentence scoring | No hallucination risk; pure stdlib |
| Entity extraction | Rule-based regex (email, money, date, version, error code, URL) | Deterministic, no model |
| Sentiment / keywords | Lexicon polarity + frequency | Lightweight heuristics |
| Evaluation | Stratified out-of-fold cross-validation + a most-frequent-class baseline | src/models/evaluate.py |
| API | FastAPI (async) | /route, /summarize, /insights, /batch, /health |
| UI | Streamlit | app.py (single-file demo) and src/ui/dashboard.py (full dashboard) |
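The summarization row in the table above can be sketched with the standard library alone. This is an illustration of word-frequency sentence scoring in general, not the code in summarizer.py; the stopword list, the tokenizer, and the length normalization are assumptions.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "a", "an", "is", "it", "to", "and", "of", "i", "my",
             "in", "on", "was", "for", "this", "that"}


def _tokens(sentence):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9']+", sentence.lower())
            if w not in STOPWORDS]


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip())
                 if s.strip()]
    if len(sentences) <= max_sentences:
        return " ".join(sentences)

    # Word frequencies over the whole ticket.
    freq = Counter(w for s in sentences for w in _tokens(s))

    def score(sentence):
        words = _tokens(sentence)
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


ticket = ("The payment page fails at checkout. "
          "Payment fails with error code 402 every time. "
          "I had coffee this morning. "
          "Please fix the payment error soon.")
summary = summarize(ticket, max_sentences=2)
print(summary)
```

Every output sentence is copied verbatim from the input, which is why an extractive approach carries no hallucination risk.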
Hyperparameters and the random seed live in src/config.py.
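The rule-based entity extraction listed in the table can be sketched as a dictionary of compiled patterns. The expressions below are hypothetical and cover only some of the listed entity types; the actual patterns in insights.py may differ.

```python
import re

# Hypothetical patterns, for illustration only.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "money": re.compile(r"[$€£]\s?\d+(?:,\d{3})*(?:\.\d{2})?"),
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),      # ISO dates only
    "version": re.compile(r"\bv\d+(?:\.\d+)+\b"),
    "url": re.compile(r"https?://[^\s,;)]+"),
}


def extract_entities(text):
    """Return {entity_type: [matches]} for every pattern that matched."""
    found = {}
    for name, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[name] = matches
    return found


sample = ("Contact jane.doe@example.com about the $1,249.99 charge "
          "on 2024-03-15 in v2.4.1, see https://example.com/help")
entities = extract_entities(sample)
print(entities)
```

The same input always yields the same output, and a missed or spurious match can be traced to a single pattern.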
Measured with 5-fold stratified out-of-fold cross-validation — every scored prediction is on data the model did not train on — against a most-frequent-class baseline. Reproduce with
python -m src.models.evaluate --input tickets.csv.
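For readers unfamiliar with out-of-fold scoring, the protocol can be sketched with scikit-learn's `cross_val_predict` and `DummyClassifier`. This is not the code in evaluate.py; the function name `evaluate_oof`, the seed, and the toy data are assumptions.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


def evaluate_oof(texts, labels, n_splits=5, seed=42):
    """Score out-of-fold predictions against a most-frequent-class baseline."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    model = Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
        ("nb", MultinomialNB()),
    ])
    # Each prediction comes from a model that never saw that row during fit.
    oof_pred = cross_val_predict(model, texts, labels, cv=cv)
    # The baseline ignores its inputs, so a placeholder matrix is enough.
    placeholder = np.zeros((len(labels), 1))
    base_pred = cross_val_predict(
        DummyClassifier(strategy="most_frequent"), placeholder, labels, cv=cv
    )
    return {
        "accuracy": accuracy_score(labels, oof_pred),
        "macro_f1": f1_score(labels, oof_pred, average="macro"),
        "weighted_f1": f1_score(labels, oof_pred, average="weighted"),
        "baseline_accuracy": accuracy_score(labels, base_pred),
    }


# Hypothetical toy data: two clearly separable classes, ten tickets each.
toy_texts = ([f"refund for invoice {i} please" for i in range(10)]
             + [f"app crash error on screen {i}" for i in range(10)])
toy_labels = ["Billing"] * 10 + ["Bug"] * 10
metrics = evaluate_oof(toy_texts, toy_labels)
print(metrics)
```

Fitting the vectorizer inside the Pipeline matters here: `cross_val_predict` refits the whole Pipeline per fold, so the held-out fold never contributes to the TF-IDF vocabulary or document frequencies.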
Banking77 — 77-intent classification (13,083 queries):
| Metric | Model | Most-frequent baseline |
|---|---|---|
| Accuracy | 0.820 | 0.017 |
| Macro F1 | 0.817 | — |
| Weighted F1 | 0.818 | — |
A ~47× lift over the baseline on a genuinely hard 77-class problem, from a plain
TF-IDF + Naive Bayes pipeline. Error analysis (per-class F1 in metrics.json):
the model nails lexically distinctive intents (passcode_forgotten 0.98,
apple_pay_or_google_pay 0.97) and struggles where intents overlap
(pending_top_up 0.54, top_up_failed 0.64) — a sensible, expected failure mode.
Transformer models reach ~0.93 on Banking77, which sets a concrete target for the
transformer upgrade noted in Future Work.
Zero-setup sanity check (synthetic DEMO_DATA, 90 examples): 0.53 accuracy
vs 0.17 baseline — confirms the pipeline learns signal with no download required
(python main.py evaluate).
git clone https://github.com/CCallahan308/ticket-intel.git
cd ticket-intel
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
python main.py train # train the router (uses synthetic demo data)
python main.py evaluate # cross-validate + write metrics.json
python main.py api # FastAPI server -> http://localhost:8000/docs
python main.py ui # full Streamlit dashboard
streamlit run app.py # single-file demo app
Train/evaluate on the real Banking77 dataset:
# Download the official train/test splits and combine into tickets.csv
B=https://raw.githubusercontent.com/PolyAI-LDN/task-specific-datasets/master/banking_data
curl -sL $B/train.csv -o b77_train.csv && curl -sL $B/test.csv -o b77_test.csv
python -c "import pandas as pd; pd.concat([pd.read_csv('b77_train.csv'), pd.read_csv('b77_test.csv')]).to_csv('tickets.csv', index=False)"
python src/models/train_router.py --input tickets.csv # train on real data
python -m src.models.evaluate --input tickets.csv # -> metrics.json
Run the tests:
pytest -q
docker build -t ticket-intel-api -f Dockerfile.api .
docker run -p 8000:8000 ticket-intel-api
docker build -t ticket-intel-ui -f Dockerfile.ui .
docker run -p 8501:8501 ticket-intel-ui
src/
├── config.py # paths, hyperparameters, random seed (single source of truth)
├── api/ # FastAPI routes + Pydantic schemas
├── models/
│ ├── router.py # TF-IDF + Naive Bayes pipeline (+ saved metadata)
│ ├── summarizer.py # extractive summarizer
│ ├── insights.py # regex entities, keywords, sentiment
│ ├── train_router.py # training CLI
│ ├── evaluate.py # cross-validation + baseline -> metrics.json
│ └── artifacts/ # saved model + metadata + metrics (git-ignored)
├── data/loader.py # dataset loading, column detection, text cleaning
└── ui/ # Streamlit dashboard components
app.py # single-file Streamlit demo (wired to the real model)
main.py # CLI entry point (api | ui | train | evaluate)
notebooks/01_EDA.ipynb # exploratory data analysis (run against tickets.csv)
- Transformer router. TicketRouter is abstracted so a fine-tuned DistilBERT/RoBERTa classifier could replace TF-IDF+NB when the accuracy budget justifies the compute. (Not yet implemented.)
- Hyperparameter search. The TF-IDF/NB params in config.py are sensible defaults; GridSearchCV on a real dataset would tune them.
- Class imbalance handling. Real ticket data is imbalanced; add class weighting (MultinomialNB has no class_weight parameter, so sample_weight or class_prior would be the nearest options) or resampling, and report weighted F1.
- Held-out test set + monitoring once trained on real data at scale.
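As a rough illustration of the hyperparameter-search item above, a `GridSearchCV` over the Pipeline might look like the following. The grid values, the `f1_macro` scoring choice, the helper name `tune_router`, and the toy data are all assumptions; nothing here is implemented in the project yet.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical search space; step names must match the Pipeline below.
PARAM_GRID = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "tfidf__sublinear_tf": [False, True],
    "nb__alpha": [0.1, 0.5, 1.0],
}


def tune_router(texts, labels, n_splits=3, seed=42):
    """Grid-search the TF-IDF + Naive Bayes Pipeline with stratified CV."""
    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("nb", MultinomialNB()),
    ])
    search = GridSearchCV(
        pipeline,
        PARAM_GRID,
        scoring="f1_macro",
        cv=StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed),
    )
    search.fit(texts, labels)
    return search


# Hypothetical toy data, large enough for 3-fold stratified CV.
toy_texts = ([f"refund for invoice {i} please" for i in range(10)]
             + [f"app crash error on screen {i}" for i in range(10)])
toy_labels = ["Billing"] * 10 + ["Bug"] * 10
search = tune_router(toy_texts, toy_labels)
print(search.best_params_, round(search.best_score_, 3))
```

If the tuned model's score is reported, it would need a separate held-out split or nested cross-validation, since `best_score_` is selected on the same folds it was measured on.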
MIT