Add --output-file flag for writing values to file #3410
Conversation
This allows writing the output to a file and introduces the options -o and --output-file, which take a filename as parameter. When -o is not specified, stdout is used for compatibility. This is helpful when calling jq inside a Docker context, since jq no longer has to be invoked from a shell with output redirection.
To enable instantiating multiple VMs, instead pass `ofile` when needed. There is only one use of `jv_dump` outside of tests, bytecode dumps, and tracing; pass `ofile` to it.
*Force-pushed c81d1e3 to 7bf1484.*
When --binary was given before --output-file, the file wasn't marked as binary, and the check for whether stdout is a TTY didn't account for --output-file.
*Force-pushed b0bfb27 to 7f6e10f.*
Well, that was a lot of back and forth with CI to identify the problem with Windows 😅. I'm confident in this now.

How can this be progressed?

@wader Any more thoughts on this?
```sh
mkdir $d/dir
! $JQ -o $d/dir -n . 2> $d/err
grep "jq: Could not open --output-file .*/dir" $d/err > /dev/null
rmdir $d/dir
```
Are there any concerns to think about if the output file is one of the input files? Will it get truncated before reading? What about streaming mode?
I don't think we can give a good experience for this, as the input files are opened after the output file is opened. Perhaps there's some way the file descriptors can be opened with copy-on-write semantics, so the input can still read the existing contents?
I should write a test for this.
I think that opening the file for reading, then opening it again for writing and truncating it, will allow the read fd to read the original contents. But we open them in the other order and inputs are opened lazily, so I don't know how to do this without eagerly opening all inputs, since they could all alias with sym/hardlinks. I'm skeptical it's worth it. Thoughts?
Yeap seems messy :( tiny bit worried someone will try to use this as a "--in-place" workaround and be disappointed :)
Let's just add --in-place. I wouldn't mind implementing that. No need to complicate this so much for a mistake.
What about storing the output file path and lazily opening the file when the full output is ready to be written?
> What about storing the output file path and lazily opening the file when the full output is ready to be written?

This approach would mean all output must be kept in a buffer before it can be written. As such, streaming becomes impossible, and the amount of data that can be written is constrained by the amount of memory on the machine. I do not think there will be a great approach here. I can see these options:
- do not stream, as @Pandapip1 has described (this has the above consequence)
- write to a temporary file instead and rename the file after opening the last input file (yikes)
- open all input files right at the start, before opening the output, as @thaliaarchi has described

I am not sure how much work that would be. Also, there is the workaround of sorting the input files in case writing to one of them is desired. It is an edge case; maybe documenting it for now could be good enough?

Edit: In case this did not come through, I share @thaliaarchi's skepticism.
Yes, this is a big deal. In general I expect tools that support -o FILE to also support using the same file as an input. Think of sed -i (-i means "in-place"). This is very tricky stuff to get right.
Sorry, I should have reviewed this earlier. Looks overall good 👍, had some questions. Would be good if some more maintainers could have a look.
```c
    options |= PROVIDE_NULL;
  } else if (isoption(&text, 'f', "from-file", is_short)) {
    options |= FROM_FILE;
  } else if (isoption(&text, 'o', "output-file", is_short)) {
```
what should happen if output file arg is used more than once?
The file from the first flag is opened, then closed when the second flag is parsed. I think this is fine.
No. Each file will be truncated, and only the last one will be written to (after being truncated). This seems like a footgun.
What should the user interface be in this case? Should `jq -o a -o b . inputfile` be allowed? Writing multiple -o options feels odd to me.

In any case, it seems to me that the fopen/fclose bit needs to move to after all options have been parsed (into the proximity of line 565), because even if specifying -o more than once is disallowed, the code can catch a violation of that rule only after the first output file has already been truncated, which could clobber inputs.

In case a temporary file is used as the output and renamed afterwards, not moving the fopen() until after all options are parsed would leave temporary files in the file system. So I think the fopen needs to move in all cases; I had not considered that in my original commit.
I'll add a test for the same file being used as input and output.
*Force-pushed 4383345 to c102d05.*
I've added those additional tests.
wader left a comment:
Looks good to me. Hope some more maintainers have thought about this.
> Write output values to the named file instead of standard out.

> The outputs from `--debug-dump-disasm` and `--debug-trace` are
Maybe these should always have gone to stderr, though, no? I would be in favor of making that change.
> * `-o` / `--output-file filename`:
>
>   Write output values to the named file instead of standard out.
What are the rename-into-place or truncate-and-rewrite semantics? I think that needs to be stated clearly, though it's true that many tools that have -o FILE options don't state it clearly either.
We had some lengthy discussion of this above. Basically, there are footguns here. Here's my take:
@wader thanks for reviewing this and pinging me. @thaliaarchi I think this is close, but rename-into-place semantics will be much safer, and should be a simple change.
Using a temporary file has pitfalls too. This write-up is for Java, but the principle applies here as well: https://www.javathinking.com/blog/what-is-a-safe-way-to-create-a-temp-file-in-java/
Since you're not trying to lock prior to truncation, you just give up on "names that cannot be guessed", and then you don't have to worry about "deletion on graceful exit and application crashes".
```sh
$ # what should happen here:
$ jq -f p -o a a
$ # ?
$ # Answer (a): a ends up empty
$ # Answer (b): a ends up having a transformation of its original content

$ # what should happen here:
$ jq -f p -o a a a
$ # ?
$ # Answer (a): a ends up empty
$ # Answer (b): a ends up having twice the transformation of its original content

$ # How is this:
$ jq -f p -o b a
$ # better than
$ jq -f p a > b
$ # ?
$ # Answer: well, if `set -o noclobber` were in effect then
$ # this allows one to avoid having to unset it,
$ # but then why set it?
```

I guess familiarity is the reason several PRs have been opened for this feature.
I forgot to ask earlier:

```sh
$ # What is the intent here:
$ jq -f p -o a -o b c
$ # ?
$ # Answer (a): `a` gets truncated, `b` gets truncated, `a` ends up
$ #   empty, `b` ends up with a transformation of `c`
$ # Answer (b): `a` and `b` end up with the same content (a
$ #   transformation of `c`)
$ # Answer (c): this should be an error (and `a` gets truncated)!
$ # Answer (d): this should be an error (and neither `a` nor `b` gets truncated)!
$ # Answer (e): something else?
```

Currently the answer would be (a). This is bound to be surprising to some users. Though:

```sh
$ date >a >b
$ file a b
a: empty
b: ASCII text
```

but that brings me back to asking: how is this better than the shell's I/O redirection? Why add `-o`? I'm not asking that to annoy you. I'm genuinely curious. My guess is: familiarity. Familiarity is a reasonable and sufficient reason for adding `-o`.
Also, as noted by @wader earlier,

But then I notice that it says singular "input" and "output". IMO we should prefer to add
Ok, thinking about it, here's my current take:

and:

@thaliaarchi @christf @wader thoughts?
FYI, doing some research on
Using a well-known temp-file naming scheme would be a problem if the file is in a world-writable directory. So it'd have to be
It is a great question and I am not at all annoyed by it. #3367 contains the gist. The motivation is to use the official jq Docker image, as defined in https://github.com/jqlang/jq/blob/master/Dockerfile, in a Tekton pipeline (a continuous-integration tool that runs on Kubernetes; its pipelines are described as steps, each executed in its own container, and data is passed between steps via files in an input or output area). As best practice dictates, the jq image published by this project is minimal, which means that in the Tekton use case there is "just jq" and no shell available. And of course familiarity is a factor: many tools support -o, so it feels natural to me if jq does so too. I may be overlooking something, but to me this looks like the most elegant approach for this use case. Edit: --in-place will not help in the above use case, as the input area and the output area are separated (they are separate workspaces/mounts).
Got it. This is basically the shell-less spawn/exec argument I mentioned. We should definitely add this then. To me this is a much stronger argument than the familiarity argument. So now let's settle the semantics of `-o`.
My two cents: if there is a case that requires using temporary files, we might as well lean on them to implement the feature consistently. (I remember acutely having said "yikes" to temp files above. I still do not like them: they increase memory usage in the general case just to implement a corner case correctly, and leaving them around after the application is killed is a problem. I disagree that we can assert jq is never killed; jq could be used to handle very large data, for example, and the OOM killer might come in. I think jq should try to clean up in every case but SIGKILL, which means another increase of the code base.) Would reverting 77dcaf3 be bad? It certainly was right to commit it at the time when it wasn't used.
Right! Every way I can think of to avoid needing a temp file ends up with some obnoxious pathology. I believe we can't avoid a temp file. Like you, I'd rather avoid that.

It's fine to revert it.
Ok, so here's my proposal:

EDIT: I've got Claude writing a commit for this. We'll see how it goes.
This reverts commit 77dcaf3.
@christf @thaliaarchi I've pushed two commits: 5be80fe, which restores

Please review!
*Force-pushed 84fc28e to ce5df61.*
Write output to a temporary file in the same directory as the destination, then rename into place on success. This makes it safe to use an input file as the output file (e.g., `jq '.x += 1' f -o f`). The predictable name `.jq-<base>.tmp` is tried first with `O_EXCL`; on `EEXIST` we fall back to `mkstemp()`. Signal handlers (SIGABRT, SIGHUP, SIGINT, SIGPIPE, SIGQUIT, SIGTERM) and `atexit()` clean up the temp file on abnormal exit.

Add `-O`/`--clobber-output` for the old immediate-truncate behavior. Defer OpenBSD `pledge()` to after option parsing so the promise set can include "cpath" when `-o` is used.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
*Force-pushed ce5df61 to 78314f5.*
You can see why 13 years ago I didn't care for
@thaliaarchi any particular reason this PR targets the 1.8.1 branch?

I'm not sure where you're seeing this. The merge base is
I notice you kept the behavior of my version, renamed as -O/--clobber-output. With the temp-then-rename approach, this doesn't seem very useful to keep, and the documented claim that it is faster isn't very convincing to me, as it only saves a few extra filesystem operations. Do you have a use case in mind? It might be useful so another tool can read the output file while jq is writing, but redirection can do that, and I doubt you'd have the Docker or other constraints and also want to do this.
What is the reason for keeping logic to have both a predictable and an unpredictable name?

I am happy to test my use case with both and will come back.
The "faster" claim is Claude's. Oops. But as to use-case and utility,
Just a thought I had: whenever the
Both -o and -O work for my use case with Tekton.

Should we delay the actual opening of the output file until after all options are parsed, to avoid truncating multiple files? It is a weird corner case, but when it occurs it will almost certainly be annoying for the user.
I'd rather remove -O/--clobber-output, keeping only the temp-then-rename -o/--output-file.

I agree. jq should settle on one implementation to keep the code base maintainable. Since there are inherent issues with -O, choosing -o seems prudent.
Hi everyone, jaq maintainer here. :) Just to give a bit more context: I think that if

This is a further development of @christf's #3367 (feat: allow writing to output file), handling some edge cases not previously considered.
Other uses of stdout, such as tests, fuzzers, disassembly, and debug traces, remain directed to stdout; only JSON values are written to the output file. Consequently, users of `--debug-dump-disasm` and `--debug-trace` can now separate the debug and JSON outputs. On Windows, the output file is marked as binary when `--binary` is passed.

This incorporates the commit from #3367, but with minor fixes squashed in.
Fixes #2418
Closes #3367