This repository contains Dockerized solutions for both Round 1A and Round 1B of the Adobe Hackathon 2025.
├── 1A/
│ ├── input/ # Input PDFs
│ ├── output/ # Output JSONs (outline extraction)
│ ├── Dockerfile # Docker setup for 1A
│ ├── process_pdfs.py # Extracts titles and hierarchical headings
│ └── run_all.bat # Script to run the 1A pipeline
│
├── 1B/
│ ├── docs/ # Contains container1/, container2/, ... with PDFs
│ ├── input/ # Input JSONs (e.g., input1.json, input2.json)
│ ├── output/ # Output JSONs (e.g., output1.json, output2.json)
│ ├── Dockerfile # Docker setup for 1B
│ ├── process_documents.py # Persona-driven document analysis
│ └── run_all.bat # Script to run all 1B test cases
- Extracts the title from the first page of each PDF
- Identifies and organizes hierarchical headings (H1, H2, H3)
- Outputs a structured JSON for each PDF
- Place your PDFs inside the 1A/input/ folder.
- Run the following command in your terminal:
run_all.bat
Output will be saved to 1A/output/ as one .json per .pdf.
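The one-to-one naming can be sketched as follows. This is a minimal example that assumes each output file reuses the PDF's base name; `output_path_for` and `plan_outputs` are hypothetical helpers, not functions from process_pdfs.py.

```python
from pathlib import Path


def output_path_for(pdf_path, output_dir):
    """Return the .json path that would correspond to one input PDF."""
    return Path(output_dir) / (Path(pdf_path).stem + ".json")


def plan_outputs(input_dir, output_dir):
    """Map every .pdf in input_dir to its assumed output path."""
    pdfs = sorted(Path(input_dir).glob("*.pdf"))
    return {pdf: output_path_for(pdf, output_dir) for pdf in pdfs}


print(output_path_for("1A/input/report.pdf", "1A/output"))
```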
- Given a persona and a job-to-be-done, the system:
- Selects the most relevant sections from multiple PDFs
- Ranks their importance
- Performs fine-grained sub-section analysis
- Outputs a structured output.json for each test case
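To make the select-and-rank idea concrete, here is a small sketch that scores sections by word overlap with the persona and job description. Word overlap is a stand-in chosen for brevity; it is not necessarily the method used in process_documents.py, and the names `rank_sections`, `score`, and `importance_rank` are assumptions for this example.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_sections(persona, job, sections):
    """Score each section against the persona and job, best first."""
    query = tokenize(persona) | tokenize(job)
    scored = []
    for section in sections:
        words = tokenize(section["title"] + " " + section["text"])
        # Fraction of query tokens that also appear in the section.
        score = len(query & words) / len(query) if query else 0.0
        scored.append({**section, "score": score})
    scored.sort(key=lambda s: s["score"], reverse=True)
    for rank, item in enumerate(scored, start=1):
        item["importance_rank"] = rank
    return scored


ranked = rank_sections(
    "Travel planner",
    "Plan a 4 day trip for college friends",
    [
        {"document": "b.pdf", "title": "History",
         "text": "The region was settled in antiquity"},
        {"document": "a.pdf", "title": "Nightlife for friends",
         "text": "Bars and clubs for a group trip"},
    ],
)
for item in ranked:
    print(item["importance_rank"], item["document"], item["title"])
```

A real implementation would likely need something stronger than token overlap (for instance an embedding model that fits the size limit), but the input and output shape would be similar.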
Each test case includes:
- docs/containerX/ # Folder with PDFs
- input/inputX.json # JSON describing the persona and task
- Example: docs/container1/ contains the relevant PDFs, and input/input1.json contains the persona and task definition.
- Place PDF sets in docs/container1/, docs/container2/, etc.
- Create matching input JSON files: input/input1.json, input/input2.json, etc.
- Run the following command in your terminal:
run_all.bat
Output will be saved in:
output/output1.json,
output/output2.json
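The numbering convention that ties each test case together can be sketched like this. The helper `plan_test_case` is hypothetical and only derives paths from the inputX / containerX / outputX pattern described above; it does not read or write any files.

```python
import re
from pathlib import Path


def plan_test_case(input_json, root="."):
    """Derive the docs folder and output file that match inputX.json."""
    name = Path(input_json).name
    match = re.fullmatch(r"input(\d+)\.json", name)
    if match is None:
        raise ValueError(f"expected a name like input1.json, got {name!r}")
    number = match.group(1)
    root = Path(root)
    return {
        "input": root / "input" / f"input{number}.json",
        "docs": root / "docs" / f"container{number}",
        "output": root / "output" / f"output{number}.json",
    }


case = plan_test_case("input/input2.json", root="1B")
print(case["docs"], case["output"])
```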
✅ CPU-only execution
✅ Model size < 1GB
✅ No internet access during execution
✅ Should process 3–5 documents in under 60 seconds
- Docker ensures platform-independent, reproducible builds.
- Running the container with --network none enforces offline execution.
- To build the Docker image manually:
docker build --platform linux/amd64 -t pdf-processor .
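Once the image is built, a container could be started offline along the lines sketched below. The sketch only assembles the `docker run` argument list so it can be inspected or passed to `subprocess.run`. The container-side mount points `/app/input` and `/app/output` are assumptions; check the Dockerfile and run_all.bat for the paths this project actually uses.

```python
import shlex
from pathlib import Path


def docker_run_command(project_dir, image="pdf-processor"):
    """Build a docker run argument list with networking disabled."""
    project = Path(project_dir).resolve()
    return [
        "docker", "run", "--rm",
        "--platform", "linux/amd64",
        "--network", "none",  # no network access inside the container
        # Assumed mount points; adjust to match the Dockerfile.
        "-v", f"{project / 'input'}:/app/input",
        "-v", f"{project / 'output'}:/app/output",
        image,
    ]


cmd = docker_run_command("1A")
print(shlex.join(cmd))
```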