Automated thesis/dissertation DOCX formatting workflow: extract formatting requirements from a template, audit your paper against them, and produce a corrected copy — without touching the original file.
| | Input | Output |
|---|---|---|
| Template | A .docx thesis template (school/department format standard) | requirements.json + requirements.md: every detectable formatting rule with evidence |
| Your paper | Your .docx thesis or paper | format_audit.csv + format_audit.md: per-item pass/fail report |
| Fix | - | paper_formatted.docx: a new corrected copy; changes.csv + changes.md detailing what was fixed, skipped, or left for manual review |
The original paper file is never modified. All corrections are written to a new copy.
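If you want to confirm this guarantee on your own machine, one option is to hash the original before and after a run. The sketch below is not part of the repository: `sha256_of` and `unchanged_after` are hypothetical helper names, and it relies only on the Python standard library.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unchanged_after(path, action):
    """Run action() and report whether the file at path kept the same hash.

    Hypothetical helper: `action` is any zero-argument callable, for
    example a function that invokes one of the scripts.
    """
    before = sha256_of(path)
    action()
    return sha256_of(path) == before
```

Calling `unchanged_after("paper.docx", some_callable)` returns `True` only if the bytes of `paper.docx` are identical after `some_callable` finishes.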
    flowchart LR
        A[Template DOCX] --> B[extract_requirements.py]
        B --> C{requirements.json}
        D[Your Paper DOCX] --> E[audit_format.py]
        C --> E
        E --> F{format_audit.csv}
        C --> G[apply_format.py]
        F --> G
        D --> G
        G --> H[paper_formatted.docx]
        G --> I[changes.csv]

Or in plain text:
Template → extract requirements → Paper → audit → Formatted copy + Change report
The extraction inspects all accessible DOCX parts: styles, numbering, headers, footers, page setup, sections, tables, captions, footnotes, endnotes, text boxes, fields, hyperlinks, and more.
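For orientation, a .docx file is a ZIP package in which most of the parts named above are separate XML files. The following is a minimal sketch of how such parts could be enumerated. It is not the repository's extraction code: `group_docx_parts` and the category labels are assumptions made for illustration, and only the part names follow the usual WordprocessingML layout.

```python
import zipfile
from collections import defaultdict

# Part-name prefixes follow the usual WordprocessingML package layout;
# the category labels on the right are illustrative only.
CATEGORIES = {
    "word/document.xml": "main document",
    "word/styles.xml": "styles",
    "word/numbering.xml": "numbering",
    "word/header": "headers",
    "word/footer": "footers",
    "word/footnotes.xml": "footnotes",
    "word/endnotes.xml": "endnotes",
}


def group_docx_parts(path):
    """Group the parts of a .docx package by rough category."""
    groups = defaultdict(list)
    with zipfile.ZipFile(path) as package:
        for name in package.namelist():
            for prefix, label in CATEGORIES.items():
                if name.startswith(prefix):
                    groups[label].append(name)
                    break
            else:
                # Anything not matched above (themes, settings, media, ...)
                groups["other"].append(name)
    return dict(groups)
```

Running `group_docx_parts("template.docx")` gives a quick view of which parts a given template actually contains before extraction.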
The built-in format checklist covers 200+ inspection points as a minimum baseline. Template-specific rules found in comments, examples, or inline instructions are added on top.
    git clone https://github.com/wksudud/format-thesis-docx.git
    cd format-thesis-docx

No external dependencies: all scripts use only the Python 3 standard library.
Run the three scripts in order:
    # Step 1: Extract formatting requirements from the template
    python scripts/extract_requirements.py --template "template.docx" --out "work/requirements"
    # Step 2: Audit your paper against the extracted requirements
    python scripts/audit_format.py --requirements "work/requirements/requirements.json" --paper "paper.docx" --out "work/audit"
    # Step 3: Create a formatted copy (original is left untouched)
    python scripts/apply_format.py --requirements "work/requirements/requirements.json" --audit "work/audit/format_audit.csv" --paper "paper.docx" --out "work/fixed"

| File | Description |
|---|---|
| requirements.json / requirements.md | All extracted template requirements with source evidence |
| format_audit.csv / format_audit.md | Every checked item, both passes and failures |
| paper_formatted.docx | New copy with automatic fixes applied |
| changes.csv / changes.md | Before/after changes, skipped items, unresolved tasks |
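If you prefer a single entry point, the three usage steps can be chained from a small driver. This is a sketch rather than a script shipped with the repository: `build_commands` and `run_pipeline` are hypothetical names, and the only flags used are the ones shown in the usage steps. It assumes you run it from the repository root.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(template, paper, workdir="work"):
    """Return the three command lines, using the flags from the usage steps."""
    work = Path(workdir)
    requirements = work / "requirements" / "requirements.json"
    audit = work / "audit" / "format_audit.csv"
    return [
        [sys.executable, "scripts/extract_requirements.py",
         "--template", str(template), "--out", str(work / "requirements")],
        [sys.executable, "scripts/audit_format.py",
         "--requirements", str(requirements), "--paper", str(paper),
         "--out", str(work / "audit")],
        [sys.executable, "scripts/apply_format.py",
         "--requirements", str(requirements), "--audit", str(audit),
         "--paper", str(paper), "--out", str(work / "fixed")],
    ]


def run_pipeline(template, paper, workdir="work"):
    """Run each step in order; check=True stops at the first failing step."""
    for command in build_commands(template, paper, workdir):
        subprocess.run(command, check=True)
```

With this in place, `run_pipeline("template.docx", "paper.docx")` runs extraction, audit, and fix in sequence and raises `subprocess.CalledProcessError` if any step exits with a non-zero code.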
    format-thesis-docx/
      README.md
      LICENSE
      SKILL.md                      # Skill definition (Claude Code / Multica)
      scripts/
        extract_requirements.py     # Extract formatting rules from template DOCX
        audit_format.py             # Audit paper against requirements
        apply_format.py             # Apply fixes and produce corrected copy
      references/
        docx-format-checklist.md    # Minimum anti-omission checklist (200+ items)
      agents/
        openai.yaml                 # Agent configuration

- DOCX only: .doc, .pdf, and WPS-only formats are not supported; convert to .docx first.
- Formatting only: the tool does not rewrite academic content, translate text, alter citations, or change figure numbering.
- Not fully automatic: some requirements (e.g., "caption must be above the figure") may involve judgment calls. When an automatic repair could damage content, the item is left unchanged and reported as a manual task.
- Review the output: always review changes.md to confirm what was changed and what needs your attention.
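Because only .docx input is supported, a quick pre-flight check can catch a renamed .doc or .pdf before you run the scripts. The sketch below is an assumption about how such a check might look, not something the repository provides. `looks_like_docx` is a hypothetical name, and the test is deliberately shallow: it verifies only that the file is a ZIP package containing `word/document.xml`.

```python
import zipfile


def looks_like_docx(path):
    """Return True if path is a ZIP package containing word/document.xml.

    Legacy .doc files and PDFs are not ZIP archives, so they fail the
    first check even when they carry a .docx extension.
    """
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as package:
        return "word/document.xml" in package.namelist()
```

A `False` result means the file should be re-saved as .docx in a word processor first. A `True` result does not guarantee the document is well-formed.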
This repository contains only code and documentation. It does not include real thesis data, sample papers with personal information, API keys, tokens, or credentials.
When using the tool, your paper and template remain on your local machine. The scripts do not upload anything.