-
Notifications
You must be signed in to change notification settings - Fork 423
Clarify the state of an instance after a failed update or delete #570
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 5 commits
c6d21bf
0ce0e67
0c1308a
a8bc00f
550c6e5
549b855
546cd23
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -29,6 +29,7 @@ | |
| - [Fetching a Service Binding](#fetching-a-service-binding) | ||
| - [Unbinding](#unbinding) | ||
| - [Deprovisioning](#deprovisioning) | ||
| - [Corrupt Service Instances](#corrupt-service-instances) | ||
| - [Orphans](#orphans) | ||
|
|
||
| ## API Overview | ||
|
|
@@ -782,6 +783,7 @@ For success responses, the following fields are defined: | |
| | --- | --- | --- | | ||
| | state* | string | Valid values are `in progress`, `succeeded`, and `failed`. While `"state": "in progress"`, the Platform SHOULD continue polling. A response with `"state": "succeeded"` or `"state": "failed"` MUST cause the Platform to cease polling. | | ||
| | description | string | A user-facing message that can be used to tell the user details about the status of the operation. If present, MUST be a non-empty string. | | ||
| | instance_corrupt | boolean | For failed update and deprovisioning operations, this field indicates whether the instance is still usable or not. If the value is `true`, the Service Instance MUST be considered corrupt and the Platform SHOULD NOT allow the creation of new bindings. If the value is `false`, the Service Instance is in an unmodified and usable state. If not specified, the state of the resource is unspecified by this specification. | | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Are there invalid combinations of Can I update a corrupt instance?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I know this might sound kind of odd, but I thought I'd toss it out there... this "instance_corrupt" field is sort of the equivalent of returning a 5xx status code in the synchronous case. However, its not as useful because in the state=failed+instance_corrupt=false case we don't know whether its the equivalent of a 400 or 402. All we have is the description text, which is nice for humans but computers can't do a lot with it if its trying to make some decision based on 400 vs 402. The same applies to 5xx error, we don't know which one it is - if that's meaningful. Would it make more sense to make our sync and async flows more symmetrical and have a "status_code" field instead that's set to the HTTP status code that we would have returned if the flow was synchronous? Then we don't need to have special spec language to deal with two possible ways of getting back the status code from the op. The Platform side of it would be easier too because then there's just one type of field to check regardless of sync vs async.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. (I assume you mean 422, not 402.) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The "instance_corrupt" for failed update concerns me a lot. The expectations of the failed update is that the state of resource will be rolled back to the previous successful state. |
||
|
|
||
| \* Fields with an asterisk are REQUIRED. | ||
|
|
||
|
|
@@ -1190,9 +1192,13 @@ $ curl http://username:password@service-broker-url/v2/service_instances/:instanc | |
| | 400 Bad Request | MUST be returned if the request is malformed or missing mandatory data. | | ||
| | 422 Unprocessable entity | MUST be returned if the requested change is not supported or if the request cannot currently be fulfilled due to the state of the Service Instance (e.g. Service Instance utilization is over the quota of the requested plan). Additionally, a `422 Unprocessable Entity` MUST be returned if the Service Broker only supports asynchronous update for the requested plan and the request did not include `?accepts_incomplete=true`; in this case the response body MUST contain a error code `"AsyncRequired"` (see [Service Broker Errors](#service-broker-errors)). The error response MAY include a helpful error message in the `description` field such as `"This Service Plan requires client support for asynchronous service operations."`. | | ||
|
|
||
| Responses with any other status code MUST be interpreted as a failure. | ||
| When the response includes a 4xx status code, the Service Broker MUST NOT | ||
| apply any of the requested changes to the Service Instance. | ||
| apply any of the requested changes to the Service Instance and the | ||
| Service Instance MUST be in an unmodified and usable state. | ||
|
|
||
| Responses with any other status code MUST be interpreted as a failure. | ||
| The Service Instance MUST be considered corrupt and the Platform SHOULD NOT | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we should have a section on corrupt instances that explains:
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agreed.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This "Responses with any other status code..." is kind of worrisome. If it said "Any other 5xx error..." then I agree. But what about some 3xx status code? If people agree I can open a new PR to address this since this isn't really part of this PR
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @fmui did you want to address these comments?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @fmui ^^
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. actually, does "any other" mean non-4xx or does it mean "not mentioned above in the table or previous paragraph" ? I think we need to clarify this.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I worry that if a Service Broker has an intermittent failure, and returns 500s for a period of time, then instances that receive update calls during this time will be marked as corrupt, leaving them unusable. I would prefer the broker be explicit about when an instance is corrupt rather than inferring it. It seems kinda risky.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I completely agree with @Samze. I'm also starting to question how prevalent this problem is in the wild. Do service brokers ever leave instances as corrupt today? Do they allow this to happen or have they built preventative measures?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Doesn't this call into question the entire orphan mitigation strategy?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @Samze I agree with you, but that would be an incompatible change. The PR just clarifies the current spec, it doesn't change it. The current spec says, a status code 500 must be interpreted as a failure. Following the spec, orphan mitigation must kick in. Depending on the platform, there is shorter or longer period of time between the failure and orphan mitigation. Within this timeframe, the platform must already today consider the service instance as unusable. Otherwise, why should it trigger the orphan mitigation process later? So, the PR just spells out that platform should treat this instance as broken.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @mattmcneeney There are brokers that update in multiple steps. If one step fails, they can’t move forwards or backwards and then they wait for orphan mitigation. Sure, they could immediately clean up, but from a platform perspective this is not any different. |
||
| allow the creation of new bindings. | ||
|
|
||
| #### Body | ||
|
|
||
|
|
@@ -1672,8 +1678,13 @@ $ curl 'http://username:password@service-broker-url/v2/service_instances/:instan | |
| | 410 Gone | MUST be returned if the Service Instance does not exist. | | ||
| | 422 Unprocessable Entity | MUST be returned if the Service Broker only supports asynchronous deprovisioning for the requested plan and the request did not include `?accepts_incomplete=true`. The response body MUST contain error code `"AsyncRequired"` (see [Service Broker Errors](#service-broker-errors)). The error response MAY include a helpful error message in the `description` field such as `"This Service Plan requires client support for asynchronous service operations."`. | | ||
|
|
||
| When the response includes a 4xx status code other than 410 Gone, the | ||
| Service Instance MUST be in an unmodified and usable state. | ||
|
|
||
| Responses with any other status code MUST be interpreted as a failure and the | ||
| Platform MUST remember the Service Instance. | ||
| Platform MUST remember the Service Instance. The Service Instance MUST be | ||
| considered corrupt and the Platform SHOULD NOT allow the creation of | ||
| new bindings. | ||
|
|
||
| #### Body | ||
|
|
||
|
|
@@ -1689,6 +1700,18 @@ For success responses, the following fields are defined: | |
| } | ||
| ``` | ||
|
|
||
| ## Corrupt Service Instances | ||
|
|
||
| When an update or delete operation fails, the Service Instance MAY be corrupt. | ||
| A corrupt instance MAY be misconfigured, in an invalid state, not reachable, or | ||
| not working at all. | ||
| Platforms SHOULD NOT try to create bindings for this instance anymore. | ||
| Whether or not a corrupt instance can be repaired by, for example, updating it | ||
| again, is undefined. | ||
| Deprovisioning a corrupt instance SHOULD still be possible. A Platform MUST | ||
| remember the Service Instance until it is successfully deprovisioned or it has | ||
| been cleaned up as an orphan. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It really feels like we need some text here about the Platform performing orphan mitigation. What if we combine #598 into this PR so we get the text just right. |
||
|
|
||
| ## Orphans | ||
|
|
||
| The Platform is the source of truth for Service Instances and Service Bindings. | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not add
corruptto the state value list?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or a new
instance_state= string field?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What other states came to mind @n3wscott ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can add a new value to "state" because that would break existing implementations that are only expecting values from the list provided - unless we say that there are only 2 impls (CF and svccat). TBH I'm not sure we can really make that claim given I know of at least one other implementation of a platform that's not CF or SVCCAT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mattmcneeney
rolled_backSee #508