feat: add AddImageFromBytes API for in-memory image insertion#30
feat: add AddImageFromBytes API for in-memory image insertion#30mmonterroca wants to merge 4 commits intomasterfrom
Conversation
Add support for adding images from raw byte data without requiring file system access. This addresses the use case where users have limited access to a file system (issue #29). New public API methods on ParagraphBuilder: - AddImageFromBytes(data, format) - AddImageFromBytesWithSize(data, format, size) - AddImageFromBytesWithPosition(data, format, size, pos) New domain.Paragraph interface methods: - AddImageFromBytes(data, format) - AddImageFromBytesWithSize(data, format, size) - AddImageFromBytesWithPosition(data, format, size, pos) New internal constructors: - core.NewImageFromBytes(id, data, format) - core.NewImageFromBytesWithSize(id, data, format, size) - core.NewImageFromBytesWithPosition(id, data, format, size, pos) Includes unit tests and updated example 08_images. Closes #29
…around The CLI handler previously wrote base64 image data to a temporary file and then called AddImage(path). Now it uses AddImageFromBytes directly, eliminating unnecessary disk I/O and temp file cleanup.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ce026b2b52
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
internal/core/image.go
Outdated
| if format == "" { | ||
| return nil, errors.InvalidArgument("NewImageFromBytes", "format", format, "image format is required") | ||
| } |
There was a problem hiding this comment.
Validate format for AddImageFromBytes inputs
NewImageFromBytes only checks that format is non-empty, so any arbitrary value (for example "jepg") is accepted and flows into package generation. Because MediaManager.detectContentType falls back to application/octet-stream for unknown extensions, this produces DOCX media entries with unrecognized content types instead of failing fast. This is a regression for base64 CLI images now routed through this path (applyImage), where invalid formats previously errored via AddImage/detectImageFormat; please reject unsupported formats (or normalize/validate against known constants) before creating the image.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Code Review
This pull request introduces the ability to add images directly from byte data, bypassing the need for temporary files when handling base64-encoded images. It adds several new methods to the Paragraph interface and ParagraphBuilder, along with corresponding implementations, unit tests, and an example. The review feedback suggests improving error reporting consistency by passing the invalid value to the error constructor and using a setter method for image positioning instead of direct field access via type assertion.
internal/core/image.go
Outdated
| // NewImageFromBytes creates a new image from raw byte data. | ||
| func NewImageFromBytes(id string, data []byte, format domain.ImageFormat) (domain.Image, error) { | ||
| if len(data) == 0 { | ||
| return nil, errors.InvalidArgument("NewImageFromBytes", "data", nil, "image data cannot be empty") |
There was a problem hiding this comment.
The third argument to errors.InvalidArgument should ideally be the invalid value itself for better error reporting, consistent with the check for format below.
| return nil, errors.InvalidArgument("NewImageFromBytes", "data", nil, "image data cannot be empty") | |
| return nil, errors.InvalidArgument("NewImageFromBytes", "data", data, "image data cannot be empty") |
| docxImg := img.(*docxImage) | ||
| docxImg.position = pos |
There was a problem hiding this comment.
While direct field access via type assertion is consistent with existing code in this file, using the SetPosition method (if available on the interface) or checking the error from the setter would be more robust.
| docxImg := img.(*docxImage) | |
| docxImg.position = pos | |
| if err := img.SetPosition(pos); err != nil { | |
| return nil, err | |
| } |
There was a problem hiding this comment.
Pull request overview
Adds first-class support for embedding images from in-memory byte slices (no filesystem required), and updates the CLI to avoid temporary files when inserting base64 images—addressing the use case in #29.
Changes:
- Introduces
AddImageFromBytes*APIs ondomain.ParagraphandParagraphBuilder. - Adds
core.NewImageFromBytes*constructors plus unit tests for these constructors. - Optimizes CLI base64 image handling to call
AddImageFromBytes*directly (no temp file I/O), and updates the images example.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
internal/core/paragraph.go |
Implements paragraph-level AddImageFromBytes* methods and hooks them into image attachment. |
internal/core/image.go |
Adds new in-memory image constructors (NewImageFromBytes*). |
internal/core/image_test.go |
Adds unit tests covering the new constructors (valid bytes + error cases). |
domain/paragraph.go |
Extends the public Paragraph interface with AddImageFromBytes*. |
builder.go |
Adds builder chaining methods for AddImageFromBytes*. |
builder_test.go |
Adds builder tests for error propagation on empty image data. |
cmd/docxgo/handlers.go |
Updates CLI image handler to embed base64 images directly from bytes. |
examples/08_images/main.go |
Demonstrates creating/embedding a PNG entirely in memory using the new API. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| func NewImageFromBytes(id string, data []byte, format domain.ImageFormat) (domain.Image, error) { | ||
| if len(data) == 0 { | ||
| return nil, errors.InvalidArgument("NewImageFromBytes", "data", nil, "image data cannot be empty") | ||
| } | ||
| if format == "" { | ||
| return nil, errors.InvalidArgument("NewImageFromBytes", "format", format, "image format is required") | ||
| } | ||
|
|
There was a problem hiding this comment.
NewImageFromBytes only checks format != "" and then uses it to build the file extension/target. This allows unsupported/invalid formats (e.g. "xyz", ".png", uppercase) to flow through and produce media parts with application/octet-stream (or malformed names), which can lead to unreadable DOCX files. Consider normalizing (strings.ToLower, trim leading '.') and validating against the supported set (png/jpeg/gif/bmp/tiff/svg/webp, etc.), similar to detectImageFormat, and return InvalidArgument for unknown formats.
| copyData := make([]byte, len(data)) | ||
| copy(copyData, data) | ||
|
|
||
| target := fmt.Sprintf("media/image%s.%s", id, format) | ||
|
|
||
| return &docxImage{ | ||
| id: id, | ||
| format: format, | ||
| size: size, | ||
| originalSize: size, | ||
| data: copyData, | ||
| target: target, | ||
| description: "", | ||
| position: domain.DefaultImagePosition(), | ||
| }, nil |
There was a problem hiding this comment.
NewImageFromBytes clones data into copyData, but (*docxImage).Data() also returns a cloned slice. In the common flow (Paragraph.attachImage calls img.Data() and MediaManager.Add stores the passed slice), this results in an extra full-image copy vs ReadImageFromReader/NewImage and can double peak memory for large in-memory images. Consider adopting a single-copy strategy (e.g., store data directly like ReadImageFromReader does, or add an internal/raw accessor used by core/manager code paths) while still keeping external immutability guarantees.
| size, err := getImageDimensions(data) | ||
| if err != nil { | ||
| return nil, errors.Wrap(err, "NewImageFromBytes") | ||
| } |
There was a problem hiding this comment.
NewImageFromBytes calls getImageDimensions(data), which currently routes through strings.NewReader(string(data)) and retries decode; this forces an extra allocation proportional to image size and does unnecessary work for byte slices. Since this API is specifically for in-memory images, consider updating dimension detection to use bytes.NewReader(data) (and a single decode path) to avoid large allocations and improve performance.
| format := domain.ImageFormat(img.Format) | ||
| if format == "" { | ||
| format = domain.ImageFormatPNG | ||
| } |
There was a problem hiding this comment.
applyImage now accepts any non-empty img.Format string for base64 images and passes it to AddImageFromBytes*. Previously, unsupported formats would fail via NewImage()/detectImageFormat; now an invalid format like "xyz" will silently embed with an unknown extension/content type (likely producing a corrupted/unopenable DOCX). Please normalize/validate the format (lowercase, map jpg/jpeg, reject unknown) before calling the new APIs (or let AddImageFromBytes return a clear error).
internal/core/paragraph.go
Outdated
| if err := p.attachImage(img, fmt.Sprintf("image%s.%s", id, format)); err != nil { | ||
| return nil, err | ||
| } |
There was a problem hiding this comment.
These methods build sourceName from the input format (not from the created img). If NewImageFromBytes starts normalizing/validating formats (recommended), this can drift (e.g., input "JPG" normalized to "jpeg" but sourceName still uses "JPG"). Consider deriving the extension from img.Format() (or otherwise using the normalized value returned/used by the constructor) when calling attachImage, to ensure media naming/content type stays consistent.
- Validate format against known image formats, reject unsupported values - Normalize format (lowercase, trim dots, map jpg->jpeg, tiff->tif) - Extract shared normalizeImageFormat() used by both detectImageFormat and NewImageFromBytes - Fix getImageDimensions to use bytes.NewReader instead of strings.NewReader(string(data)) to avoid unnecessary allocation - Remove double copy in NewImageFromBytes (Data() already returns copy) - Use img.Format() for sourceName in paragraph methods to stay consistent after normalization - Fix error arg: pass data instead of nil in InvalidArgument - Add tests for invalid format rejection and format normalization
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
internal/core/paragraph.go
Outdated
| return nil, errors.Wrap(err, "Paragraph.AddImageFromBytesWithPosition") | ||
| } | ||
|
|
||
| if err := p.attachImage(img, fmt.Sprintf("image%s.%s", id, format)); err != nil { |
There was a problem hiding this comment.
In AddImageFromBytesWithPosition, the media filename is built using the raw format argument. This can diverge from the image’s normalized format (img.Format()), leading to inconsistent extensions (e.g., "JPG" vs "jpeg") or even double-dots if callers pass a leading ".". Use the created image’s format (or the normalized value returned by the constructor) when generating sourceName to keep content-type/extension consistent across the three AddImageFromBytes* methods.
| if err := p.attachImage(img, fmt.Sprintf("image%s.%s", id, format)); err != nil { | |
| if err := p.attachImage(img, fmt.Sprintf("image%s.%s", id, img.Format())); err != nil { |
| return &docxImage{ | ||
| id: id, | ||
| format: normalized, | ||
| size: size, | ||
| originalSize: size, | ||
| data: data, | ||
| target: target, | ||
| description: "", |
There was a problem hiding this comment.
NewImageFromBytes stores the caller-provided data slice directly in the docxImage. Because slices are references, callers can mutate the underlying bytes after insertion, which can corrupt the embedded image (and create data races if reused concurrently). Make a defensive copy of data (as NewImageFromPackage does) before storing it.
| t.Run("normalizes jpg to jpeg", func(t *testing.T) { | ||
| // Use PNG data but verify format normalization logic | ||
| data := createTestImageBytes(t, 50, 50) | ||
| img, err := NewImageFromBytes("img12d", data, "JPG") | ||
| if err != nil { | ||
| t.Fatalf("NewImageFromBytes() error = %v", err) | ||
| } | ||
| if img.Format() != domain.ImageFormatJPEG { | ||
| t.Errorf("Format() = %v, want jpeg", img.Format()) | ||
| } |
There was a problem hiding this comment.
The "normalizes jpg to jpeg" test passes PNG-encoded bytes while declaring the format as JPG. This can mask real-world failures because the produced DOCX would be labeled as JPEG even though the payload is PNG. Generate actual JPEG bytes for this case (or change the assertion to only test string normalization without constructing an image from real bytes).
- Restore defensive copy of data in NewImageFromBytes to prevent caller mutation (consistent with NewImageFromPackage) - Fix AddImageFromBytesWithPosition to use img.Format() instead of raw format parameter for sourceName consistency - Fix jpg normalization test to use actual JPEG-encoded bytes instead of PNG bytes with JPG format declaration
Summary
Add support for adding images from raw byte data without requiring file system access, addressing the use case described in #29.
Changes
New public API (
ParagraphBuilder)AddImageFromBytes(data []byte, format ImageFormat)— inline image from bytesAddImageFromBytesWithSize(data, format, size)— with custom dimensionsAddImageFromBytesWithPosition(data, format, size, pos)— floating with positioningNew domain interface methods (
Paragraph)AddImageFromBytes,AddImageFromBytesWithSize,AddImageFromBytesWithPositionInternal constructors (
core)NewImageFromBytes,NewImageFromBytesWithSize,NewImageFromBytesWithPositionCLI handler optimization
applyImage()now usesAddImageFromBytesdirectly for base64 images instead of writing to a temp file and callingAddImage(path). Eliminates unnecessary disk I/O.Tests & examples
examples/08_imageswith a new section demonstrating in-memory image creationUsage
Closes #29