Adobe Hackathon 2025 – Round 1A & 1B

This repository contains Dockerized solutions for both Round 1A and Round 1B of the Adobe Hackathon 2025.

Folder Structure

├── 1A/
│   ├── input/                 # Input PDFs
│   ├── output/                # Output JSONs (outline extraction)
│   ├── Dockerfile             # Docker setup for 1A
│   ├── process_pdfs.py        # Extracts titles and hierarchical headings
│   └── run_all.bat            # Script to run the 1A pipeline
│
├── 1B/
│   ├── docs/                  # Contains container1/, container2/, ... with PDFs
│   ├── input/                 # Input JSONs (e.g., input1.json, input2.json)
│   ├── output/                # Output JSONs (e.g., output1.json, output2.json)
│   ├── Dockerfile             # Docker setup for 1B
│   ├── process_documents.py   # Persona-driven document analysis
│   └── run_all.bat            # Script to run all 1B test cases

Round 1A – PDF Outline Extraction

What It Does

Extracts the title from the first page of each PDF
Identifies and organizes hierarchical headings (H1, H2, H3)
Outputs a structured JSON for each PDF
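
To illustrate the kind of output described above, here is a minimal sketch that assumes headings can be told apart by font size alone. The `Span` type, the `build_outline` function, and the JSON keys (`title`, `outline`, `level`, `text`, `page`) are hypothetical; the actual logic and schema in process_pdfs.py may differ.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """A run of text with its font size and page number (hypothetical input shape)."""
    text: str
    size: float
    page: int


def build_outline(spans: list[Span], body_size: float) -> dict:
    """Map the three largest font sizes above body_size to H1, H2, H3.

    Assumptions: the title is the largest text on page 1, and heading
    levels follow font size. Real PDFs usually need extra heuristics.
    """
    first_page = [s for s in spans if s.page == 1]
    title = max(first_page, key=lambda s: s.size).text if first_page else ""

    # Distinct sizes larger than body text, biggest first, capped at 3 levels.
    sizes = sorted(
        {s.size for s in spans if s.size > body_size and s.text != title},
        reverse=True,
    )[:3]
    level_of = {size: f"H{i + 1}" for i, size in enumerate(sizes)}

    outline = [
        {"level": level_of[s.size], "text": s.text, "page": s.page}
        for s in spans
        if s.size in level_of and s.text != title
    ]
    return {"title": title, "outline": outline}
```

The returned dictionary can be written out with `json.dump` to produce one file per PDF.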

How to Run (1A)

Place your PDFs inside the 1A/input/ folder.
Run the following command in your terminal:
run_all.bat

Output will be saved to 1A/output/ as one .json per .pdf.
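
The one-to-one mapping between input PDFs and output JSON files can be sketched as follows. It assumes each output keeps the input's file stem and swaps the extension; `plan_outputs` is a hypothetical helper, not a function from process_pdfs.py.

```python
from pathlib import Path


def plan_outputs(input_dir: Path, output_dir: Path) -> dict[Path, Path]:
    """Pair every .pdf in input_dir with a same-named .json in output_dir."""
    return {
        pdf: output_dir / (pdf.stem + ".json")
        for pdf in sorted(input_dir.glob("*.pdf"))
    }
```

For example, `plan_outputs(Path("1A/input"), Path("1A/output"))` would map `1A/input/report.pdf` to `1A/output/report.json`.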

Round 1B – Persona-Driven Document Intelligence

What It Does

Given a persona and a job-to-be-done, the system:
Selects the most relevant sections from multiple PDFs
Ranks their importance
Performs fine-grained sub-section analysis
Outputs a structured output.json for each test case
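
To make the select-and-rank step concrete, here is a minimal sketch that scores sections by word overlap with the persona and job description. Word overlap is a stand-in technique chosen for illustration, not the method used in process_documents.py; the `Section` type, the `rank_sections` function, and the output keys are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Section:
    document: str
    title: str
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_sections(persona: str, job: str, sections: list[Section],
                  top_k: int = 5) -> list[dict]:
    """Rank sections by the fraction of query words they contain."""
    query = tokenize(persona + " " + job)
    scored = []
    for section in sections:
        words = tokenize(section.title + " " + section.text)
        score = len(query & words) / len(query) if query else 0.0
        scored.append((score, section))
    # Highest score first; the sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [
        {"document": s.document, "section_title": s.title, "importance_rank": rank}
        for rank, (_, s) in enumerate(scored[:top_k], start=1)
    ]
```

A real system working within the constraints listed later in this README would likely replace the overlap score with something stronger (for instance TF-IDF or a small embedding model), while keeping the same rank-then-truncate shape.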

Input Format

Each test case includes:

docs/containerX/ # Folder with PDFs
input/inputX.json # JSON describing the persona and task
Example: docs/container1/ contains the relevant PDFs, and input/input1.json contains the persona and task definition.

How to Run (1B)

Place PDF sets in docs/container1/, docs/container2/, etc.
Create matching input JSON files: input/input1.json, input/input2.json, etc.
Run the following command in your terminal:
run_all.bat

Output will be saved to output/output1.json, output/output2.json, etc.
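
The naming convention above (containerN, inputN.json, outputN.json) can be resolved programmatically. The sketch below is an assumption about how the pieces line up; `find_test_cases` is a hypothetical helper and is not taken from run_all.bat or process_documents.py.

```python
import re
from pathlib import Path


def find_test_cases(root: Path) -> list[dict]:
    """Match input/inputN.json with docs/containerN/ and output/outputN.json."""
    cases = []
    for input_json in (root / "input").glob("input*.json"):
        match = re.fullmatch(r"input(\d+)", input_json.stem)
        if not match:
            continue  # ignore files that do not follow the inputN.json pattern
        num = match.group(1)
        docs_dir = root / "docs" / f"container{num}"
        if not docs_dir.is_dir():
            continue  # skip inputs with no matching PDF folder
        cases.append({
            "id": int(num),
            "input": input_json,
            "docs": docs_dir,
            "output": root / "output" / f"output{num}.json",
        })
    # Sort numerically so that case 10 comes after case 2.
    return sorted(cases, key=lambda case: case["id"])
```

Calling `find_test_cases(Path("1B"))` returns one entry per complete test case, which makes a missing container folder or a misnamed input file easy to spot before a run.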

Model Constraints

✅ CPU-only execution

✅ Model size < 1GB

✅ No internet access during execution

✅ Should process 3–5 documents in under 60 seconds

Docker Information

Docker ensures platform-independent, reproducible builds.
Running the container with --network none enforces offline execution.
To build the Docker image manually:
docker build --platform linux/amd64 -t pdf-processor .
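
For readers who prefer to drive Docker from a script rather than run_all.bat, the build and an offline run can be sketched in Python. The build arguments mirror the command above. The run arguments are an assumption: the container paths /app/input and /app/output are hypothetical and should be replaced with whatever the Dockerfile actually expects.

```python
import subprocess
from pathlib import Path


def build_command(context: Path, tag: str = "pdf-processor") -> list[str]:
    """Arguments for the manual build shown in this README."""
    return ["docker", "build", "--platform", "linux/amd64", "-t", tag, str(context)]


def run_command(workdir: Path, tag: str = "pdf-processor") -> list[str]:
    """Arguments for an offline run.

    Assumption: the image reads /app/input and writes /app/output.
    """
    host = workdir.resolve()  # bind mounts with -v need absolute host paths
    return [
        "docker", "run", "--rm",
        "-v", f"{host / 'input'}:/app/input",
        "-v", f"{host / 'output'}:/app/output",
        "--network", "none",  # no network access inside the container
        tag,
    ]


def build_and_run(workdir: Path) -> None:
    """Build the image from workdir, then run it once with networking disabled."""
    subprocess.run(build_command(workdir), check=True)
    subprocess.run(run_command(workdir), check=True)
```

Keeping the argument lists in separate functions makes it possible to inspect or log the exact command before anything is executed.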
