Persist zones#550
Conversation
bal-e
left a comment
There was a problem hiding this comment.
@ximon18 and I had a discussion while I was writing this review, and decided that I will add commits changing the PR to implement a new approach for the zone data storage state machine. I've left these comments here for us to come back once I'm done; I also didn't review everything, so there will be more to come.
|
|
||
| /// The index of the current loaded instance. | ||
| pub(super) curr_loaded_index: bool, | ||
|
|
||
| /// The index of the current signed instance. | ||
| pub(super) curr_signed_index: bool, |
There was a problem hiding this comment.
Are these fields necessary, if this state is only used at the very beginning (where they would both be 0/false)?
| pub(super) curr_signed_index: bool, | ||
| } | ||
|
|
||
| pub struct UninitializedStorage { |
There was a problem hiding this comment.
This type (and RestoringPersistedStorage below) need documentation, similar to the other types in this file. I'm happy to write it by adding to the PR.
| assert!( | ||
| old_reviewer.loaded_index == self.curr_loaded_index, | ||
| "'old_reviewer' does not point to the current instance", | ||
| "'old_reviewer' does not point to the current instance: {} vs:{}", |
| let viewer = unsafe { ZoneViewer::new(self.data.clone(), true, true) }; | ||
|
|
||
| let signed = unsafe { &*self.data.signed[0].get() }; | ||
| let serial = signed.soa.as_ref().unwrap().rdata.serial; |
There was a problem hiding this comment.
I don't think the serial number should be extracted here. Perhaps we need methods on LoadedZoneBuilt and SignedZoneBuilt that give you readers so you can observe the data, or just getters for the serial?
| assert!( | ||
| Arc::ptr_eq(&signed_built.data, &self.data), | ||
| "'built' is for a different zone" | ||
| ); |
There was a problem hiding this comment.
You should also check loaded_built. And the assert message should clarify which parameter it is referring to.
| let (transition, state) = if let ZoneDataStorage::Uninitialized(s) = state { | ||
| let s = s.initialize(); | ||
| t.move_to(ZoneDataStorage::Passive(s)); | ||
| transition(&mut self.state.storage.machine) | ||
| } else { | ||
| (t, state) | ||
| }; |
There was a problem hiding this comment.
I'd prefer the above commented-out initialize() function over this silent automatic conversion. Especially since it should only occur at startup and we can handle it outside this function.
There was a problem hiding this comment.
With the current setup, we might actually want to do this check in other places, but it is hard to tell.
| let (s, loaded_reviewer, signed_reviewer, viewer, serial) = | ||
| s.finish(loaded_built, signed_built); | ||
| self.state.storage.loaded_reviewer = loaded_reviewer; | ||
| self.state.storage.signed_reviewer = signed_reviewer; |
There was a problem hiding this comment.
This code does not correctly handle the reviewer types. They should be passed back into the storage state machine, so it can guarantee that viewers only exist for the permitted 0/1 instances. We should discuss what the right adjustments to the storage state machine's flow are.
| /// The zone is in an empty initial uninitialized state pending possible | ||
| /// restoration from persisted state or initialization directly to an | ||
| /// empty passive state. | ||
| Uninitialized(UninitializedStorage), |
There was a problem hiding this comment.
It might be better to omit this state entirely and only have a Restoring state, which can then decide to proceed with or skip restoring. Restoring might already need that functionality, e.g. if the previously persisted instances could not be found.
| Uninitialized(UninitializedStorage), | ||
|
|
||
| /// The zone is being restored from persisted state. | ||
| Restoring(RestoringPersistedStorage), |
There was a problem hiding this comment.
The enum variant name should match the underlying name, for consistency. I think the underlying type should be RestoringStorage.
| @@ -86,9 +94,6 @@ pub enum ZoneDataStorage { | |||
| impl ZoneDataStorage { | |||
| /// Construct a new [`ZoneDataStorage`]. | |||
| pub fn new() -> (Self, LoadedZoneReviewer, SignedZoneReviewer, ZoneViewer) { | |||
There was a problem hiding this comment.
Perhaps we want to change this function so it no longer returns reviewers? That might be easier all around.
Will need changing based on PR #589.
Currently fails due to an unrelated TSIG bug causing dig to fail.
- Merges in the ixfr-out and full-signer-update-state branches as these were needed for local testing, but should be obsoleted / synced with main when PRs #631 ("Save last serial and key tags in zone state")and #605 ("Add IXFR out support") get merged. - Extends the persist-restore system test to cover more cases. Should perhaps be split out into separate smaller tests. - Actually adds restored diffs to the zone storage to be served by IXFR out. Will need updating to match the changes/fixes that have since been made in PR #605. - Fixes an issue where persisted diffs from multiple SOA serials, e.g. 1..2..3 would be condensed on restore so that only a single IXFR diff from 1..3 would be available instead of two diffs from 1..2 and 2..3 being available. - Clears the set of known persisted data file paths for a zone if any of those files are missing or cannot be parsed during restoration.
As step.if doesn't seem to be honoured by nektos/act.
Not via TCP but via single SOA response, per RFC 1995. Also adds a system test using dig.
The signed diff is not available when the loaded diff is restored but should be stored with the corresponding loaded diff, so make the signed diff an Option to be made Some as soon as it is restored.
- Don't hold the zone state lock while restoring as it blocks the zone status command. - Log restoring started at INFO level as well as restoring complete. - Only log restoring complete if restoring actually happened. - Log the zone being restored at INFO level, not just TRACE level.
Otherwise zone status of a restored zone doesn't show that it is published.
…een extracted to PR #650.
Validated with the ixfr-out system test from PR #650.
Status
Zone edits (e.g. via file edit and
zone reloador due to changes received via XFR) are persisted both for loaded and signed zones. On application restart persisted zones are restored andcascade zone statusanddig AXFRappear to work as they did prior to application shutdown.The code has been initially reviewed and led to some wanted changes:
zonedatacrate interface to enable moving some of the changes to Cascade itself.save_now()instead ofmark_dirty(). This is because persisting zone data to disk not enough, as if there was a power outage between callingmark_dirty()and publication of the zone the paths to the persisted files (that are recorded in state) would not have been written to disk, thussave_now()should be invoked instead. We may actually want to invokesave_now()in the publication server itself to ensure all published zone related state is persisted, but that is out of scope for this PR. Update: It's not clear to me that this is actually needed.tl;dr
src/persistence/persist.rsto write snapshots and deltas to disk (currently in DNS AXFR/IXFR wire format).src/persistence/restore.rsto restore zones by loading the zone snapshot into a zone replacer and passing the zone diffs one at a time to a zone patcher.src/persistence/to store generated/restored diffs for use by IXFR out.ZoneStatewith two vecs of paths to persisted snapshot and delta files.cascade zone statusoutput.Known issues:
Alternative snapshot/delta storage formats
A choice would need to be driven by requirements:
Some ideas:
DiffDatato JSON/CBOR/whatever. Would tie the format to the internal format, changing the internal format would require writing additional code to handle storage format migration or support parsing old as well as new format. A PRO of this approach is being able to storage additional meta data with the snapshot/delta, if needed.Cargo.*,crates/,etc/,integration-tests/,src/):actthrough theact-wrapper(as described inTESTING.md)?