feat: allow file I/O from s3 #590

Draft

dionhaefner wants to merge 1 commit into main from dion/s3-inputs

@dionhaefner

Contributor

Relevant issue or PR

n/a

Description of changes

`InputPath` and `InputFileReference` now accept `s3://...` URLs, which are downloaded to a Tesseract-local temporary directory during validation. As a result, caller and callee no longer need to share a common file system to exchange files.
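
To illustrate the idea, here is a minimal sketch of resolving an `s3://` URL to a local temporary file at validation time. It is not the code in this PR: `resolve_input_path` and the `download` callback are hypothetical names chosen for illustration, and the sketch uses only the standard library so that the download backend stays pluggable.

```python
import tempfile
from pathlib import Path
from typing import Callable
from urllib.parse import urlparse

# Hypothetical callback signature: (bucket, key, destination) -> None
Downloader = Callable[[str, str, Path], None]


def resolve_input_path(value: str, download: Downloader) -> Path:
    """Return a local path for `value`, fetching s3:// URLs into a temp dir."""
    parsed = urlparse(value)
    if parsed.scheme != "s3":
        # Anything that is not an s3:// URL is treated as a local path.
        return Path(value)

    bucket, key = parsed.netloc, parsed.path.lstrip("/")
    if not bucket or not key:
        raise ValueError(f"Malformed S3 URL: {value!r}")

    # A fresh directory per file avoids name clashes between objects
    # that share the same basename.
    target_dir = Path(tempfile.mkdtemp(prefix="s3-input-"))
    target = target_dir / Path(key).name
    download(bucket, key, target)
    return target
```

In a real setup, `download` could wrap an S3 client call such as boto3's `download_file(Bucket, Key, Filename)`; whether the PR does it this way is an assumption, not something stated here.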

(I'm not sure yet what a similar solution would look like for output files, so I'm keeping this as a draft for now.)

Testing done

@codecov

codecov Bot commented May 15, 2026


Codecov Report

❌ Patch coverage is 96.77419% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 77.22%. Comparing base (ed7c578) to head (bea4204).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| tesseract_core/runtime/experimental.py | 95.65% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #590      +/-   ##
==========================================
+ Coverage   77.11%   77.22%   +0.10%     
==========================================
  Files          32       32              
  Lines        4488     4514      +26     
  Branches      738      740       +2     
==========================================
+ Hits         3461     3486      +25     
- Misses        724      725       +1     
  Partials      303      303              


@PasteurBot

Contributor

Benchmark Results

Benchmarks use a no-op Tesseract to measure pure framework overhead.

🚀 0 faster, ⚠️ 2 slower, ✅ 34 unchanged

Notable changes

| Benchmark | Baseline | Current | Change | Status |
|---|---|---|---|---|
| decoding/base64_1,000 | 0.025ms | 0.028ms | +14.7% | ⚠️ slower |
| decoding/binref_10,000,000 | 7.805ms | 8.636ms | +10.6% | ⚠️ slower |

Full results

| Benchmark | Baseline | Current | Change | Status |
|---|---|---|---|---|
| api/apply_1,000 | 0.497ms | 0.499ms | +0.4% | |
| api/apply_100,000 | 0.496ms | 0.502ms | +1.1% | |
| api/apply_10,000,000 | 0.501ms | 0.503ms | +0.4% | |
| cli/apply_1,000 | 1747.882ms | 1681.821ms | -3.8% | |
| cli/apply_100,000 | 1780.207ms | 1721.223ms | -3.3% | |
| cli/apply_10,000,000 | 1793.258ms | 1748.700ms | -2.5% | |
| decoding/base64_1,000 | 0.025ms | 0.028ms | +14.7% | ⚠️ slower |
| decoding/base64_100,000 | 0.613ms | 0.568ms | -7.5% | |
| decoding/base64_10,000,000 | 64.894ms | 65.689ms | +1.2% | |
| decoding/binref_1,000 | 0.185ms | 0.186ms | +0.7% | |
| decoding/binref_100,000 | 0.232ms | 0.236ms | +1.9% | |
| decoding/binref_10,000,000 | 7.805ms | 8.636ms | +10.6% | ⚠️ slower |
| decoding/json_1,000 | 0.098ms | 0.099ms | +1.2% | |
| decoding/json_100,000 | 9.280ms | 9.306ms | +0.3% | |
| decoding/json_10,000,000 | 1105.922ms | 1101.846ms | -0.4% | |
| encoding/base64_1,000 | 0.028ms | 0.028ms | +1.3% | |
| encoding/base64_100,000 | 0.144ms | 0.146ms | +1.4% | |
| encoding/base64_10,000,000 | 21.686ms | 21.180ms | -2.3% | |
| encoding/binref_1,000 | 0.275ms | 0.278ms | +1.1% | |
| encoding/binref_100,000 | 0.446ms | 0.444ms | -0.4% | |
| encoding/binref_10,000,000 | 16.637ms | 16.607ms | -0.2% | |
| encoding/json_1,000 | 0.140ms | 0.141ms | +0.7% | |
| encoding/json_100,000 | 14.283ms | 14.794ms | +3.6% | |
| encoding/json_10,000,000 | 1492.747ms | 1488.241ms | -0.3% | |
| http/apply_1,000 | 2.917ms | 2.892ms | -0.8% | |
| http/apply_100,000 | 9.240ms | 8.582ms | -7.1% | |
| http/apply_10,000,000 | 668.531ms | 670.165ms | +0.2% | |
| roundtrip/base64_1,000 | 0.058ms | 0.059ms | +1.6% | |
| roundtrip/base64_100,000 | 0.721ms | 0.772ms | +6.9% | |
| roundtrip/base64_10,000,000 | 87.528ms | 86.685ms | -1.0% | |
| roundtrip/binref_1,000 | 0.474ms | 0.476ms | +0.5% | |
| roundtrip/binref_100,000 | 0.688ms | 0.689ms | +0.2% | |
| roundtrip/binref_10,000,000 | 24.799ms | 25.570ms | +3.1% | |
| roundtrip/json_1,000 | 0.248ms | 0.245ms | -0.9% | |
| roundtrip/json_100,000 | 21.541ms | 20.638ms | -4.2% | |
| roundtrip/json_10,000,000 | 2589.194ms | 2586.968ms | -0.1% | |

  • Runner: Linux 6.17.0-1010-azure x86_64
