
format-thesis-docx

Automated thesis/dissertation DOCX formatting workflow: extract formatting requirements from a template, audit your paper against them, and produce a corrected copy — without touching the original file.

Input / Output Contract

| Step | Input | Output |
| --- | --- | --- |
| Template | A .docx thesis template (school/department format standard) | requirements.json + requirements.md: every detectable formatting rule, with evidence |
| Your paper | Your .docx thesis or paper | format_audit.csv + format_audit.md: per-item pass/fail report |
| Fix | | paper_formatted.docx: a new corrected copy, plus changes.csv + changes.md detailing what was fixed, skipped, or left for manual review |

The original paper file is never modified. All corrections are written to a new copy.
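If you want to confirm this guarantee on your own files, one option is to hash the paper before and after a run. The following is a minimal sketch using only the standard library; the helper names `sha256_of` and `is_unchanged` are hypothetical and are not part of this repository.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, digest_before):
    """True if the file's current digest matches the one recorded earlier."""
    return sha256_of(path) == digest_before
```

Record `sha256_of("paper.docx")` before running `apply_format.py`, then call `is_unchanged("paper.docx", recorded_digest)` afterwards; it should return `True` if the original was left untouched.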

Workflow

flowchart LR
    A[Template DOCX] --> B[extract_requirements.py]
    B --> C{requirements.json}
    D[Your Paper DOCX] --> E[audit_format.py]
    C --> E
    E --> F{format_audit.csv}
    C --> G[apply_format.py]
    F --> G
    D --> G
    G --> H[paper_formatted.docx]
    G --> I[changes.csv]

Or in plain text:

Template → extract requirements → Paper → audit → Formatted copy + Change report

What It Checks

The extraction inspects all accessible DOCX parts: styles, numbering, headers, footers, page setup, sections, tables, captions, footnotes, endnotes, text boxes, fields, hyperlinks, and more.
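To see which parts a given file actually contains, note that a .docx is a ZIP package whose parts follow conventional names such as `word/styles.xml` and `word/header1.xml`. The sketch below groups those parts by kind; it is an illustration only and does not reflect how `extract_requirements.py` is implemented. The function name `group_docx_parts` and the `PART_PREFIXES` table are hypothetical.

```python
import zipfile

# Conventional part-name prefixes in a WordprocessingML package.
PART_PREFIXES = {
    "body": "word/document",
    "styles": "word/styles",
    "numbering": "word/numbering",
    "headers": "word/header",
    "footers": "word/footer",
    "footnotes": "word/footnotes",
    "endnotes": "word/endnotes",
}


def group_docx_parts(docx_file):
    """Group the XML parts of a DOCX package by kind.

    docx_file may be a path or a binary file-like object.
    """
    groups = {kind: [] for kind in PART_PREFIXES}
    with zipfile.ZipFile(docx_file) as package:
        for name in sorted(package.namelist()):
            if not name.endswith(".xml"):
                continue
            for kind, prefix in PART_PREFIXES.items():
                if name.startswith(prefix):
                    groups[kind].append(name)
    return groups
```

Calling `group_docx_parts("template.docx")` returns a dictionary of part names, which gives a quick view of whether a template has, for example, several header parts or no footnotes part at all.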

The built-in format checklist covers 200+ inspection points as a minimum baseline. Template-specific rules found in comments, examples, or inline instructions are added on top.

Installation

git clone https://github.com/wksudud/format-thesis-docx.git
cd format-thesis-docx

No external dependencies: all scripts use only the Python 3 standard library.

Usage

Run the three scripts in order:

# Step 1: Extract formatting requirements from the template
python scripts/extract_requirements.py --template "template.docx" --out "work/requirements"

# Step 2: Audit your paper against the extracted requirements
python scripts/audit_format.py --requirements "work/requirements/requirements.json" --paper "paper.docx" --out "work/audit"

# Step 3: Create a formatted copy (original is left untouched)
python scripts/apply_format.py --requirements "work/requirements/requirements.json" --audit "work/audit/format_audit.csv" --paper "paper.docx" --out "work/fixed"
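The three steps can also be chained from one small driver script. This is a sketch rather than something shipped with the repository: `build_commands` and `run_pipeline` are hypothetical names, the directory layout mirrors the commands above, and stopping on the first failure assumes the scripts exit with a non-zero status on error, which this README does not state.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(template, paper, work_dir="work"):
    """Return the three command lines, in order, as argument lists."""
    work = Path(work_dir)
    req_dir = work / "requirements"
    audit_dir = work / "audit"
    fixed_dir = work / "fixed"
    requirements = str(req_dir / "requirements.json")
    audit_csv = str(audit_dir / "format_audit.csv")
    py = sys.executable  # run with the same interpreter as this script
    return [
        [py, "scripts/extract_requirements.py",
         "--template", str(template), "--out", str(req_dir)],
        [py, "scripts/audit_format.py",
         "--requirements", requirements,
         "--paper", str(paper), "--out", str(audit_dir)],
        [py, "scripts/apply_format.py",
         "--requirements", requirements, "--audit", audit_csv,
         "--paper", str(paper), "--out", str(fixed_dir)],
    ]


def run_pipeline(template, paper, work_dir="work"):
    """Run the steps in order; check=True raises if a step exits non-zero."""
    for command in build_commands(template, paper, work_dir):
        subprocess.run(command, check=True)
```

Run it from the repository root, for example `run_pipeline("template.docx", "paper.docx")`, so that the relative `scripts/` paths resolve.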

Expected Outputs

| File | Description |
| --- | --- |
| requirements.json / requirements.md | All extracted template requirements with source evidence |
| format_audit.csv / format_audit.md | Every checked item, both passes and failures |
| paper_formatted.docx | New copy with automatic fixes applied |
| changes.csv / changes.md | Before/after changes, skipped items, unresolved tasks |
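After a run, a quick check that every expected file exists can catch a step that failed silently. The sketch below assumes the `work/requirements`, `work/audit`, and `work/fixed` directories from the usage commands, and further assumes that `changes.csv` and `changes.md` are written next to `paper_formatted.docx`; adjust the mapping if your layout differs. The name `missing_outputs` is hypothetical.

```python
from pathlib import Path

# Assumed layout: output directory name -> files expected inside it.
EXPECTED_OUTPUTS = {
    "requirements": ["requirements.json", "requirements.md"],
    "audit": ["format_audit.csv", "format_audit.md"],
    "fixed": ["paper_formatted.docx", "changes.csv", "changes.md"],
}


def missing_outputs(work_dir="work"):
    """Return the relative paths of expected output files that do not exist."""
    work = Path(work_dir)
    missing = []
    for subdir, names in EXPECTED_OUTPUTS.items():
        for name in names:
            if not (work / subdir / name).is_file():
                missing.append(f"{subdir}/{name}")
    return missing
```

An empty list from `missing_outputs("work")` means all seven files are present; it says nothing about their contents, so `changes.md` still needs to be read.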

Project Structure

format-thesis-docx/
  README.md
  LICENSE
  SKILL.md                              # Skill definition (Claude Code / Multica)
  scripts/
    extract_requirements.py             # Extract formatting rules from template DOCX
    audit_format.py                     # Audit paper against requirements
    apply_format.py                     # Apply fixes and produce corrected copy
  references/
    docx-format-checklist.md            # Minimum anti-omission checklist (200+ items)
  agents/
    openai.yaml                         # Agent configuration

Limitations

  • DOCX only: .doc, .pdf, and WPS-only formats are not supported; convert to .docx first.
  • Formatting only — the tool does not rewrite academic content, translate text, alter citations, or change figure numbering.
  • Not fully automatic — some requirements (e.g., "caption must be above the figure") may involve judgment calls. When an automatic repair could damage content, the item is left unchanged and reported as a manual task.
  • Review the output — always review changes.md to confirm what was changed and what needs your attention.

Privacy

This repository contains only code and documentation. It does not include real thesis data, sample papers with personal information, API keys, tokens, or credentials.

When using the tool, your paper and template remain on your local machine. The scripts do not upload anything.

License

MIT
