Adobe Hackathon 2025 – Round 1A & 1B

This repository contains Dockerized solutions for both Round 1A and Round 1B of the Adobe Hackathon 2025.


Folder Structure

├── 1A/
│   ├── input/                # Input PDFs
│   ├── output/               # Output JSONs (outline extraction)
│   ├── Dockerfile            # Docker setup for 1A
│   ├── process_pdfs.py       # Extracts titles and hierarchical headings
│   └── run_all.bat           # Script to run the 1A pipeline
└── 1B/
    ├── docs/                 # Contains container1/, container2/, ... with PDFs
    ├── input/                # Input JSONs (e.g., input1.json, input2.json)
    ├── output/               # Output JSONs (e.g., output1.json, output2.json)
    ├── Dockerfile            # Docker setup for 1B
    ├── process_documents.py  # Persona-driven document analysis
    └── run_all.bat           # Script to run all 1B test cases


Round 1A – PDF Outline Extraction

What It Does

  • Extracts the title from the first page of each PDF
  • Identifies and organizes hierarchical headings (H1, H2, H3)
  • Outputs a structured JSON for each PDF
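The steps above can be sketched with a simple font-size heuristic. This is only an illustration, not the logic in process_pdfs.py: it assumes the PDF text has already been extracted into lines carrying a font size and page number, and the function name `build_outline`, the input keys, and the output keys (`title`, `outline`, `level`, `text`, `page`) are all hypothetical.

```python
from collections import Counter


def build_outline(lines):
    """Build a title plus H1-H3 outline from pre-extracted text lines.

    `lines` is a list of dicts with keys 'text', 'size' and 'page'
    (a hypothetical input shape; the repository's extractor may differ).
    """
    if not lines:
        return {"title": "", "outline": []}

    # Assumption: body text is the font size covering the most characters.
    chars_per_size = Counter()
    for line in lines:
        chars_per_size[line["size"]] += len(line["text"])
    body_size = chars_per_size.most_common(1)[0][0]

    # Assumption: the title is the largest line on the first page.
    first_page = min(line["page"] for line in lines)
    title_line = max(
        (line for line in lines if line["page"] == first_page),
        key=lambda line: line["size"],
    )

    # Lines larger than body text (excluding the title) are heading candidates.
    candidates = [
        line for line in lines
        if line is not title_line and line["size"] > body_size
    ]

    # The three largest distinct candidate sizes map to H1, H2 and H3.
    sizes = sorted({line["size"] for line in candidates}, reverse=True)[:3]
    level_of = {size: f"H{rank}" for rank, size in enumerate(sizes, start=1)}

    outline = [
        {"level": level_of[line["size"]], "text": line["text"], "page": line["page"]}
        for line in candidates
        if line["size"] in level_of
    ]
    return {"title": title_line["text"], "outline": outline}
```

Font size alone is a weak signal in real documents (bold body text, decorative fonts, scanned pages), so a practical extractor would likely combine it with other cues such as numbering patterns or position on the page.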

How to Run (1A)

  1. Place your PDFs inside the 1A/input/ folder.
  2. Run the following command in your terminal: run_all.bat

Output will be saved to 1A/output/ as one .json per .pdf.
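The one-.json-per-.pdf mapping can be sketched as a small batch loop. This is a minimal illustration under stated assumptions: `process_folder` and the `extract` callable are hypothetical names, and the real process_pdfs.py may organize this differently.

```python
import json
from pathlib import Path


def process_folder(input_dir, output_dir, extract):
    """Write one JSON file per PDF found in `input_dir`.

    `extract` is a hypothetical callable that takes a PDF path and
    returns a JSON-serializable dict (for example, a title and outline).
    """
    in_dir, out_dir = Path(input_dir), Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    written = []
    # Note: this pattern matches lowercase ".pdf" only on case-sensitive filesystems.
    for pdf_path in sorted(in_dir.glob("*.pdf")):
        result = extract(pdf_path)
        # report.pdf -> report.json, keeping the same base name.
        out_path = out_dir / f"{pdf_path.stem}.json"
        out_path.write_text(
            json.dumps(result, indent=2, ensure_ascii=False), encoding="utf-8"
        )
        written.append(out_path)
    return written
```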

Round 1B – Persona-Driven Document Intelligence

What It Does

Given a persona and a job-to-be-done, the system:

  • Selects the most relevant sections from multiple PDFs
  • Ranks their importance
  • Performs fine-grained sub-section analysis
  • Outputs a structured JSON file (output1.json, output2.json, ...) for each test case
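The selection and ranking steps can be illustrated with a plain TF-IDF cosine score between the persona-plus-task text and each section. The README does not say which model or scoring method process_documents.py uses, so this is a stand-in technique chosen because it fits the CPU-only, offline setting; `rank_sections`, `tokenize`, and the `importance_rank` and `score` fields are hypothetical names.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_sections(persona, job, sections, top_k=5):
    """Rank sections by TF-IDF cosine similarity to the persona and job.

    `sections` is a list of dicts with 'title' and 'text' keys
    (a hypothetical shape, used only for this sketch).
    """
    query_tf = Counter(tokenize(f"{persona} {job}"))
    section_tfs = [Counter(tokenize(f"{s['title']} {s['text']}")) for s in sections]

    # Document frequency: in how many sections each term appears.
    n = len(section_tfs)
    doc_freq = Counter()
    for tf in section_tfs:
        doc_freq.update(tf.keys())

    def weights(tf):
        # Smoothed IDF, so terms unseen in any section still get a finite weight.
        return {
            term: count * (math.log((1 + n) / (1 + doc_freq[term])) + 1.0)
            for term, count in tf.items()
        }

    def cosine(a, b):
        dot = sum(w * b.get(term, 0.0) for term, w in a.items())
        norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
            sum(w * w for w in b.values())
        )
        return dot / norm if norm else 0.0

    query = weights(query_tf)
    scored = [(cosine(query, weights(tf)), i) for i, tf in enumerate(section_tfs)]
    # Highest score first; ties keep the original section order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))

    return [
        {**sections[i], "importance_rank": rank, "score": round(score, 4)}
        for rank, (score, i) in enumerate(scored[:top_k], start=1)
    ]
```

A lexical score like this misses synonyms and paraphrases; a small embedding model under the 1GB limit would be one way to address that, at the cost of extra processing time.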

Input Format

Each test case includes:

  • docs/containerX/: a folder with the PDFs
  • input/inputX.json: a JSON file describing the persona and task

Example: docs/container1/ contains the relevant PDFs, and input/input1.json contains the persona and task definition.
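The pairing between inputX.json, containerX/, and outputX.json can be sketched as a small discovery helper. This is an illustration of the naming convention only; `discover_test_cases` is a hypothetical function, and skipping inputs without a matching container is a design choice made for this sketch.

```python
import re
from pathlib import Path


def discover_test_cases(root):
    """Pair each input/inputX.json with docs/containerX/ and output/outputX.json.

    `root` is the 1B/ folder. Inputs without a matching container
    folder are skipped rather than raising an error.
    """
    root = Path(root)
    cases = []
    for input_path in (root / "input").glob("input*.json"):
        match = re.fullmatch(r"input(\d+)\.json", input_path.name)
        if match is None:
            continue  # e.g. input_backup.json does not follow the convention
        suffix = match.group(1)
        docs_dir = root / "docs" / f"container{suffix}"
        if not docs_dir.is_dir():
            continue
        cases.append(
            {
                "id": suffix,
                "input": input_path,
                "docs": docs_dir,
                "output": root / "output" / f"output{suffix}.json",
            }
        )
    # Sort numerically so that case 10 comes after case 2.
    return sorted(cases, key=lambda case: int(case["id"]))
```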

How to Run (1B)

  1. Place PDF sets in docs/container1/, docs/container2/, etc.
  2. Create matching input JSON files: input/input1.json, input/input2.json, etc.
  3. Run the following command in your terminal: run_all.bat

Output will be saved in output/output1.json, output/output2.json, etc.


Model Constraints

✅ CPU-only execution

✅ Model size < 1GB

✅ No internet access during execution

✅ Should process 3–5 documents in under 60 seconds

Docker Information

  • Docker ensures platform-independent, reproducible builds.
  • --network none enforces offline execution.
  • To build the Docker image manually: docker build --platform linux/amd64 -t pdf-processor .
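For reference, an offline container run could be assembled as below. This is a sketch, not the contents of run_all.bat: the container-side mount points /app/input and /app/output are assumptions about the Dockerfile, and `build_run_command` is a hypothetical helper. The `--rm`, `--network none`, and `-v` flags are standard `docker run` options.

```python
import subprocess
from pathlib import Path


def build_run_command(image, input_dir, output_dir):
    """Return the argument list for an offline `docker run` of `image`."""
    # Docker bind mounts need absolute host paths.
    host_in = Path(input_dir).resolve()
    host_out = Path(output_dir).resolve()
    return [
        "docker", "run", "--rm",
        "--network", "none",              # no network access inside the container
        "-v", f"{host_in}:/app/input",    # assumed container path
        "-v", f"{host_out}:/app/output",  # assumed container path
        image,
    ]


def run_offline(image, input_dir, output_dir):
    """Run the container and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_run_command(image, input_dir, output_dir), check=True)
```

Calling `run_offline("pdf-processor", "input", "output")` from inside 1A/ would then process that folder, provided the image was built with the tag shown above.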
