[#105] - add -o/--outfile option to allow writes to input file #2616
Hmmm, can you test this: … What does that do? Using … What I'd expect is that either we first process all the inputs and then truncate and rewrite the file, or that we write all the outputs to a temp file and then rename it into place. Of these two I generally prefer the latter because it's atomic.
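The basic hazard of writing to the same file you read from can be reproduced with plain shell redirection. This is a minimal sketch of the general problem, not of the PR's `-o` code path, and `sed` stands in for jq. The shell opens and truncates the output file before the command starts, so the command reads an input that is already empty.

```bash
#!/usr/bin/env bash
# Demonstrates why "cmd file > file" destroys the file: the shell opens
# (and truncates) the redirection target before cmd ever reads its input.
dir=$(mktemp -d)
printf '{"hello":"world"}\n' > "$dir/foo.json"

# sed stands in for jq; by the time it opens foo.json, the file is empty.
sed 's/world/foo/' "$dir/foo.json" > "$dir/foo.json"

size=$(wc -c < "$dir/foo.json" | tr -d ' ')
echo "foo.json is now $size bytes"
rm -rf "$dir"
```

Delaying the open of the output until after the first read narrows this window but does not close it. It only works when that first read happens to consume the whole input.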
@itchyny @pkoppstein @owenthereal please do not merge this yet. See questions and comments above.
You're right about "w+"; I forgot about that. It was left over from earlier attempts to open the file earlier in the process. Re temp files: yeah, that sounds good too. Would you have a preference in terms of retaining the inode? Writing to a temp file and then moving it would replace the original file, but writing the contents of the temp file back to the target doubles the I/O. I would need a pointer to an example of how to retain/inherit the file attributes if the file is overwritten. And would you prefer a writability test at the start, or should it just fail at the end? I would guess the former. Is an fopen/fclose in append mode enough to do that, or is there a more idiomatic way?
But if you open it with
You can have atomicity XOR keep the inode. There's no way to keep the inode AND have atomicity. If you try to keep the file identity, then any other process that has that file open for reading will get its toes stepped on (i.e., there will be races between reading and writing). On the other hand, changing the file identity is useful because it lets other processes detect that the file "has changed". That's the problem with "in-place" editing. For something like SQLite3, say, where the file contents are structured in a way that allows racing reads and writes, you naturally want to do in-place. But for JSON I don't think this is a good idea. That said, a file-identity-preserving in-place write mode that truncates after the last write would be racy but would leave the file uncorrupted at the end, and it's reasonable to believe that some will want that. But then we'd need to have two in-place update modes. Another issue is that … So if we really want to be serious about this, we might want: …
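The "atomicity XOR keep the inode" trade-off is easy to observe from a shell. This small sketch uses only `cat`, `mv` and `ls -i`; the file names are made up and nothing here is jq-specific. Copying new contents back keeps the inode. Renaming a new file into place changes it. A process that already had the old file open keeps reading the old contents.

```bash
#!/usr/bin/env bash
# Contrasts the two ways to "replace" a file's contents:
#   copy back -> same inode (file identity kept), not atomic
#   rename    -> new inode (identity changes), atomic for readers
inode() { ls -i "$1" | awk '{print $1}'; }

dir=$(mktemp -d)
f=$dir/f.json
printf '{"a":1}\n' > "$f"
before=$(inode "$f")

# 1. Copy back over the original: the path keeps its inode, but a concurrent
#    reader could observe the file truncated or half-written.
printf '{"a":2}\n' > "$dir/new"
cat "$dir/new" > "$f"
after_copy=$(inode "$f")

# A reader (think: another process) that already has the file open.
exec 3< "$f"

# 2. Rename a new file into place: a single rename(2), so readers see either
#    the old or the new file, but the path now refers to a different inode.
printf '{"a":3}\n' > "$dir/new2"
mv "$dir/new2" "$f"
after_rename=$(inode "$f")

old_view=$(cat <&3)   # the already-open descriptor still sees the old inode
exec 3<&-
new_view=$(cat "$f")

echo "inode: before=$before after_copy=$after_copy after_rename=$after_rename"
echo "open reader sees: $old_view; path now has: $new_view"
rm -rf "$dir"
```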
Now, the
With the rename approach I'd rather let it fail at the end. Windows will need special consideration: we'll need to always open these files in ways that do not preclude renaming new ones into place, which means on Windows we need to use …
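This bears on the "let it fail at the end" point. With the rename approach, an up-front check that the target file itself is writable, for example by opening it in append mode, tests the wrong thing. The final rename needs write permission on the containing directory, not on the file. A sketch with illustrative file names:

```bash
#!/usr/bin/env bash
# rename(2) replaces a directory entry, so it needs write (and search)
# permission on the directory, not on the file being replaced.
dir=$(mktemp -d)
printf 'old\n' > "$dir/f.json"
chmod 444 "$dir/f.json"            # the target itself is read-only

printf 'new\n' > "$dir/f.json.tmp"
# -f: don't prompt about replacing a read-only target
if mv -f "$dir/f.json.tmp" "$dir/f.json"; then
    result=$(cat "$dir/f.json")
else
    result="rename failed"
fi
echo "f.json now contains: $result"
rm -rf "$dir"
```

The replaced file also ends up with the temp file's mode (from the umask), not 444. This is one of the attribute-preservation issues discussed further down.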
This actually works without problems, even with …
Regarding the implementation, let me know what you decide on; I'd be happy to implement whatever you think is a good approach. I don't really mind too much about the internals.
Ah, right, I brain-o'ed. Yes, … (Also, I updated my previous reply, FYI.)
Since I've never needed this feature but you do, let's start with: what semantics do you think …
My use case semantics are: 'Change the value in this file to "x"', which I need quite often in a build/release process, for example for build-specific config files generated by a script. Multiple inputs don't really need to work at all for that. I get that they would be a nice-to-have for consistency's sake, but I'd be happy if jq only supported a single in-place option that ignores stdin, stdout, and file arguments entirely. I am pretty sure that most people who want this option are looking at jq for the exact same use case (based on my all-seeing Google attempts ;))
Does https://github.com/nicowilliams/inplace not help with the case where you're editing a single file in place? Is it just ugly or icky, or is it hard to remember that you have to use a second program for this? I do think it's probably hard to remember, especially when …
Right, but I'm trying to discover what the right semantics are; I'm not trying to give you a hard time. It's just that in this particular case there's a lot to think about, and that's really why we've never been keen to do anything about this request. But since you're willing to do the work, maybe we should be willing to take it once we discover what semantics we want.
I bet it would cover 90% of use cases, but it would be very surprising to 10% of users. Maybe that's good enough, but I'm not ready to reach that conclusion. Certainly, if we go with this, we'd have to make sure it's well documented.
Though, once again, one can just script that case. That's another thing: keeping …
Another thing is that, whatever we do, I don't want us to be sad about it later without being able to change it because we have to maintain backwards compatibility. Now, …
The BSD sed man page on macOS says this about …: … I guess this is the sed in-place implementation on FreeBSD and macOS: https://cgit.freebsd.org/src/tree/usr.bin/sed/main.c#n361 I haven't fiddled around much with syscall tracing on macOS, but I got dtruss to say this. Looks like it uses a temp file and does a rename: … Hope that helps... and now I also think I understand where jq got its behaviour of treating all input files as one continuous stream :)
NetBSD's …
Argh, this page hadn't shown your comment yet when I commented about NetBSD's …
This approach of using … Even if you delay opening the output file until the first output write (lazy open/truncation), it will either only appear to work (while not actually working correctly) for very small input files that … If the file is too long, for example, … There are UNIX programs with a … With … If … If you implement …:

```bash
jq -o foo.json -s '.[] | .hello = "foo"' foo.json
```

But it is not a great idea, since it will only work if you use … The only sensible way to do this type of "in-place editing" is to write the output to a temporary file, and then `mv` the temporary file to the path of the input file (on success; otherwise delete the temporary file and fail). This is already easily possible with a bit of scripting:

```bash
if jq .something file.json > file.json.tmp
then mv file.json.tmp file.json
else rm file.json.tmp
fi
```

Or even by using the `sponge` utility, which is packaged by many distributions and takes care of redirecting the output to a temporary file and `mv`ing it to file.json when the input is closed:

```bash
jq .something file.json | sponge file.json
```

But I understand we may want to implement a … This is just a matter of creating the temporary file, redirecting stdout to that file, and finally either deleting the file or moving it to the path of the input file, depending on whether the jq script executed successfully. For those who want to try to implement this, I will point out some subtle caveats of using a temporary file and renaming it to the path of the input file that you may want to consider. If, say, you are root and you don't own the file you are modifying in place, the temporary file you create will have a different user/group owner (and maybe different permissions) compared to the original file: it will be owned by root rather than the original user, and it may end up not being executable even though the original file was. If you are not root, but another user who doesn't own the input file yet has read permission on it and write permission on its directory, you may not have permission to change the ownership of the new file to the original owner, so it will be owned by you instead of the original user; GNU sed simply ignores the error in that case and leaves it that way.
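The temp-file-then-rename recipe above, together with its permission caveats, can be wrapped in a small shell function. This is a sketch under stated assumptions, not how GNU sed or this PR implement it. `inplace_edit` is a hypothetical name, and it assumes the command takes the file as its last argument. It uses `cp -p` to copy the mode (and owner/group, where permitted) onto the temp file before writing into it. ACLs and extended attributes are not handled.

```bash
#!/usr/bin/env bash
# inplace_edit FILE CMD [ARGS...]
# Hypothetical helper: runs `CMD ARGS... FILE` and replaces FILE with the
# output only if CMD succeeds. The original file is never truncated.
inplace_edit() {
    local file=$1; shift
    local dir tmp status
    dir=$(dirname "$file")

    # Create the temp file next to the target so the final mv is a rename(2)
    # within one filesystem, which is atomic.
    tmp=$(mktemp "$dir/.inplace.XXXXXX") || return 1

    # Copy mode bits, and owner/group if we are allowed to, onto the temp file.
    # Whether a failed ownership copy is an error or silently skipped depends
    # on the cp implementation and on our privileges.
    if ! cp -p "$file" "$tmp"; then
        rm -f "$tmp"
        return 1
    fi

    # '>' truncates the temp file but keeps the mode/owner set by cp -p.
    if "$@" "$file" > "$tmp"; then
        mv -f "$tmp" "$file" || { rm -f "$tmp"; return 1; }
    else
        status=$?            # exit status of the failed command
        rm -f "$tmp"
        return "$status"
    fi
}
```

For example, `inplace_edit config.json jq '.version = "1.2.3"'`. One more caveat: if FILE is a symlink, `mv` replaces the link itself with a regular file.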
Exactly. Thanks!
Yes. Exactly how
And not just root. On some systems other users can have this sort of privilege, so don't just check if … And not just on Unix-y systems, but also on Windows (though perhaps we wouldn't demand that).
+1
I had a look at how GNU sed does it. It has some chown error-fallback logic and also copies ACLs: https://git.savannah.gnu.org/cgit/sed.git/tree/sed/execute.c#n667
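One way to read the "chown error fallback" idea, together with the earlier advice not to key this on being root, is to attempt the change and check what took effect. Try to copy the owner and group, fall back to the group alone, and otherwise carry on. The sketch uses only `ls -ln`, `chown` and `chgrp`. `copy_owner` is a hypothetical name, and the owner-then-group fallback order is an assumption here, not a transcription of GNU sed's code.

```bash
#!/usr/bin/env bash
# copy_owner SRC DST
# Hypothetical helper: tries to give DST the same owner and group as SRC.
# Instead of checking whether we are root (other users may hold the needed
# privilege too), it simply attempts the change and reports what it got.
ids() { ls -ln "$1" | awk '{print $3, $4}'; }   # numeric "uid gid"

copy_owner() {
    local src=$1 dst=$2 uid gid
    read -r uid gid < <(ids "$src")

    if chown "$uid:$gid" "$dst" 2>/dev/null; then
        echo "owner+group"
    elif chgrp "$gid" "$dst" 2>/dev/null; then
        # Not allowed to give the file away; keeping the group still helps.
        echo "group-only"
    else
        # Like GNU sed, ignore the error and keep the current owner.
        echo "none"
    fi
}
```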
I get that, no worries. In return, I understand all the considerations that have been raised; I am not trying to ignore them, nor do I think they're invalid. I don't have any skin in the game besides making my own life easier. That's really all it is: convenience (not having to install multiple utilities on every machine I need it on, not having to think about escaping/scripting when feeding the command to an SSH connection or …). Based on everything said, I am inclined to conclude that using temp files in the way … So let's hash out the following first. I am basing this on the GNU sed implementation; I'm not sure whether BSD sed differs in any of the following: …
Let me know your thoughts. And feel free to add considerations if you have others.
One reason to want a … is that this:

```bash
for i in "${files[@]}"; do
    jq "${slurpfiles[@]}" "${args[@]}" "$i" > "${i}.tmp" &&
        mv "${i}.tmp" "$i"   # replace the original only if jq succeeded
done
```

can be replaced with this: … So I think a … [BTW, @stedolan has wanted to make it so that no new command-line options are needed. If we ever finish my and @leonid-s-usov's FFI/co-routines branch, we could just do all I/O and whatnot directly in jq-coded library functions using new C-coded built-ins. I think that's a worthy goal. It's not directly relevant to this PR because that work is kinda stuck at the moment, but it's worth keeping in mind.]
An initial implementation could just refuse to edit in place any files not owned by …
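The conservative first step suggested here can be sketched as a pre-flight check. The comment is cut off after "not owned by", so this sketch assumes it means the effective user running the command. `can_edit_in_place` is a hypothetical name. The extra checks (regular file, not a symlink, writable directory) are illustrative additions that also follow from the rename approach.

```bash
#!/usr/bin/env bash
# can_edit_in_place FILE
# Hypothetical pre-flight check for a conservative first implementation:
# refuse anything whose ownership or file type the temp-file-and-rename
# approach could silently change, rather than trying to handle it.
can_edit_in_place() {
    local f=$1 dir
    dir=$(dirname "$f")
    if [ -L "$f" ] || [ ! -f "$f" ]; then
        echo "refusing: $f is not a regular file" >&2
        return 1
    fi
    # ASSUMPTION: "owned by" means the effective user ID of this process.
    if [ ! -O "$f" ]; then
        echo "refusing: $f is not owned by $(id -un)" >&2
        return 1
    fi
    # The temp file is created next to the target, so the directory must be writable.
    if [ ! -w "$dir" ]; then
        echo "refusing: cannot create a temporary file in $dir" >&2
        return 1
    fi
}
```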
OK, I'll probably work on a preliminary version tomorrow or the day after. Let's pick it up from there.
Just a little heads-up, since I haven't posted any update on the topic: I got occupied with some other stuff, and I've solved my immediate problem with a bit of bash scripting. I am still willing to pick this up, but priorities have shifted a bit.
Is this still being worked on?
It was never working.
Any plans to add this feature?
Hi @drm, @nicowilliams, @wader, @emanuele6, @mailsanchu, I went ahead and "just did it", see #3488. I've also wanted this functionality for over a decade. #GoAi Please leave any comments on my PR.
This PR attempts to settle a very long-standing debate on "in-place" editing. The simple solution is to add an outfile option that specifies where the resulting output is written. The implementation makes sure to delay the writing until after the first input is read.
Will output:
In other (more complicated) situations, however, either all input must be buffered or copied, which is not desirable, or the output should be written to a temporary file, which has its own limitations. I think, though, that for the common use case this should suffice and is Good Enough(tm). Maybe the caveat on how this works should be documented more explicitly in the 'usage' output.
Happy to help if there's anything wrong with the implementation.