Skip to content

Latest commit

 

History

History
477 lines (375 loc) · 21.1 KB

0193.md

File metadata and controls

477 lines (375 loc) · 21.1 KB
id state created placement
193
approved
2019-07-26
category order
polish
30

Errors

Effective error communication is an important part of designing simple and intuitive APIs. Services returning standardized error responses enable API clients to construct centralized common error handling logic. This common logic simplifies API client applications and eliminates the need for cumbersome custom error handling code.

Guidance

Services must return a google.rpc.Status message when an API error occurs, and must use the canonical error codes defined in google.rpc.Code. More information about the particular codes is available in the gRPC status code documentation.

Error messages should help a reasonably technical user understand and resolve the issue, and should not assume that the user is an expert in your particular API. Additionally, error messages must not assume that the user will know anything about its underlying implementation.

Error messages should be brief but actionable. Any extra information should be provided in the details field. If even more information is necessary, you should provide a link where a reader can get more information or ask questions to help resolve the issue. It is also important to set the right tone when writing messages.

The following sections describe the fields of google.rpc.Status.

Status.message

The message field is a developer-facing, human-readable "debug message" which should be in English. (Localized messages are expressed using a LocalizedMessage within the details field. See LocalizedMessage for more details.) Any dynamic aspects of the message must be included as metadata within the ErrorInfo that appears in details.

The message is considered a problem description. It is intended for developers to understand the problem and is more detailed than ErrorInfo.reason, discussed later.

Messages should use simple descriptive language that is easy to understand (without technical jargon) to clearly state the problem that results in an error, and offer an actionable resolution to it.

For pre-existing (brownfield) APIs which have previously returned errors without machine-readable identifiers, the value of message must remain the same for any given error. For more information, see Changing Error Messages.

Status.code

The code field is the status code, which must be the numeric value of one of the elements of the google.rpc.Code enum.

For example, the value 5 is the numeric value of the NOT_FOUND enum element.

Status.details

The details field allows messages with additional error information to be included in the error response, each packed in a google.protobuf.Any message.

Google defines a set of standard detail payloads for error details, which cover most common needs for API errors. Services should use these standard detail payloads when feasible.

Each type of detail payload must be included at most once. For example, there must not be more than one BadRequest message in the details, but there may be a BadRequest and a PreconditionFailure.

All error responses must include an ErrorInfo within details. This provides machine-readable identifiers so that users can write code against specific aspects of the error.

The following sections describe the most common standard detail payloads.

ErrorInfo

The ErrorInfo message is the primary way to send a machine-readable identifier. Contextual information should be included in metadata in ErrorInfo and must be included if it appears within an error message.

The reason field is a short snake_case description of the cause of the error. Error reasons are unique within a particular domain of errors. The reason must be at most 63 characters and match a regular expression of [A-Z][A-Z0-9_]+[A-Z0-9]. (This is UPPER_SNAKE_CASE, without leading or trailing underscores, and without leading digits.)

The reason should be terse, but meaningful enough for a human reader to understand what the reason refers to.

Good examples:

  • CPU_AVAILABILITY
  • NO_STOCK
  • CHECKED_OUT
  • AVAILABILITY_ERROR

Bad examples:

  • THE_BOOK_YOU_WANT_IS_NOT_AVAILABLE (overly verbose)
  • ERROR (too general)

The domain field is the logical grouping to which the reason belongs. The domain must be a globally unique value, and is typically the name of the service that generated the error, e.g. pubsub.googleapis.com.

The (reason, domain) pair form a machine-readable way of identifying a particular error. Services must use the same (reason, domain) pair for the same error, and must not use the same (reason, domain) pair for logically different errors. The decision about whether two errors are "the same" or not is not always clear, but should generally be considered in terms of the expected action a client might take to resolve them.

The metadata field is a map of key/value pairs providing additional dynamic information as context. Each key within metadata must be at most 64 characters long, and conform to the regular expression [a-z][a-zA-Z0-9-_]+.

Any request-specific information which contributes to the Status.message or LocalizedMessage.message messages must be represented within metadata. This practice is critical so that machine actors do not need to parse error messages to extract information.

For example consider the following message:

An <e2-medium> VM instance with <local-ssd=3,nvidia-t4=2> is currently unavailable in the <us-east1-a> zone. Consider trying your request in the <us-central1-f,us-central1-c> zone(s), which currently has/have capacity to accommodate your request. Alternatively, you can try your request again with a different VM hardware configuration or at a later time. For more information, see the troubleshooting documentation.

The ErrorInfo.metadata map for the same error could be:

  • "zone": "us-east1-a"
  • "vmType": "e2-medium"
  • "attachment": "local-ssd=3,nvidia-t4=2"
  • "zonesWithCapacity": "us-central1-f,us-central1-c"

Additional contextual information that does not appear in an error message may also be included in metadata to allow programmatic use by the client.

The metadata included for any given (reason,domain) pair can evolve over time:

  • New keys may be included
  • All keys that have been included must continue to be included (but may have empty values)

In other words, once a user has observed a given key for a (reason, domain) pair, the service must allow them to rely on it continuing to be present in the future.

The set of keys provided in each (reason, domain) pair is independent from other pairs, but services should aim for consistent key naming. For example, two error reasons within the same domain should not use metadata keys of vmType and virtualMachineType.

LocalizedMessage

google.rpc.LocalizedMessage is used to provide an error message which should be localized to a user-specified locale where possible.

If the Status.message field has a sub-optimal value which cannot be changed due to the constraints in the Changing Error Messages section, LocalizedMessage may be used to provide a better error message even when no user-specified locale is available.

Regardless of how the locale for the message was determined, both the locale and message fields must be populated.

The locale field specifies the locale of the message, following IETF bcp47 (Tags for Identifying Languages). Example values: "en-US", "fr-CH", "es-MX".

The message field contains the localized text itself. This should include a brief description of the error and a call to action to resolve the error. The message should include contextual information to make the message as specific as possible. Any contextual information in the message must be included in ErrorInfo.metadata. See ErrorInfo for more details of how contextual information may be included in a message and the corresponding metadata.

The LocalizedMessage payload should contain the complete resolution to the error. If more information is needed than can reasonably fit in this payload, then additional resolution information must be provided in a Help payload. See the Help section for guidance.

Help

When other textual error messages (in Status.message or LocalizedMessage.message) don't provide the user sufficient context or actionable next steps, or if there are multiple points of failure that need to be considered in troubleshooting, a link to supplemental troubleshooting documentation must be provided in the Help payload.

Provide this information in addition to a clear problem definition and actionable resolution, not as an alternative to them. The linked documentation must clearly relate to the error. If a single page contains information about multiple errors, the ErrorInfo.reason value must be used to narrow down the relevant information.

The description field is a textual description of the linked information. This must be suitable to display to a user as text for a hyperlink. This must be plain text (not HTML, Markdown etc).

Example description value: "Troubleshooting documentation for STOCKOUT errors"

The url field is the URL to link to. This must be an absolute URL, including scheme.

Example url value: "https://cloud.google.com/compute/docs/resource-error"

For publicly-documented services, even those with access controls on actual usage, the linked content must be accessible without authentication.

For privately-documented services, the linked content may require authentication.

Error messages

Textual error messages can be present in both Status.message and LocalizedMessage.message fields. Messages should be succinct but actionable, with request-specific information (such as a resource name or region) providing precise details where appropriate. Any request-specific details must be present in ErrorInfo.metadata.

Changing error messages

Changing the content of Status.message over time must be done carefully, to avoid breaking clients who have previously had to rely on the message for all information. See the rationale section for more details.

For a given RPC:

  • If the RPC has always returned ErrorInfo with machine-readable information, the content of Status.message may change over time. (For example, the API producer may provide a clearer explanation, or more request-specific information.)
  • Otherwise, the content of Status.message must be stable, providing the same text with the same request-specific information. Instead of changing Status.message, the API should include a LocalizedMessage within Status.details.

Even if an RPC has always returned ErrorInfo, the API may keep the existing Status.message stable and add a LocalizedMessage within Status.details.

The content of LocalizedMessage.details may change over time.

Partial errors

APIs should not support partial errors. Partial errors add significant complexity for users, because they usually sidestep the use of error codes, or move those error codes into the response message, where the user must write specialized error handling logic to address the problem.

However, occasionally partial errors are necessary, particularly in bulk operations where it would be hostile to users to fail an entire large request because of a problem with a single entry.

Methods that require partial errors should use long-running operations, and the method should put partial failure information in the metadata message. The errors themselves must still be represented with a google.rpc.Status object.

Permission Denied

If the user does not have permission to access the resource or parent, regardless of whether or not it exists, the service must error with PERMISSION_DENIED (HTTP 403). Permission must be checked prior to checking if the resource or parent exists.

If the user does have proper permission, but the requested resource or parent does not exist, the service must error with NOT_FOUND (HTTP 404).

HTTP/1.1+JSON representation

When clients use HTTP/1.1 as per AIP-127, the error information is returned in the body of the response, as a JSON object. For backward compatibility reasons, this does not map precisely to google.rpc.Status, but contains the same core information. The schema is defined in the following proto:

message Error {
  message Status {
    // The HTTP status code that corresponds to `google.rpc.Status.code`.
    int32 code = 1;
    // This corresponds to `google.rpc.Status.message`.
    string message = 2;
    // This is the enum version for `google.rpc.Status.code`.
    google.rpc.Code status = 4;
    // This corresponds to `google.rpc.Status.details`.
    repeated google.protobuf.Any details = 5;
  }

  Status error = 1;
}

The most important difference is that the code field in the JSON is an HTTP status code, not the direct value of google.rpc.Status.code. For example, a google.rpc.Status message with a code value of 5 would be mapped to an object including the following code-related fields (as well as the message, details etc):

{
  "error": {
    "code": 404,          // The HTTP status code for "not found"
    "status": "NOT_FOUND" // The name in google.rpc.Code for value 5
  }
}

The following JSON shows a fully populated HTTP/1.1+JSON representation of an error response.

{
  "error": {
    "code": 429,
    "message": "The zone 'us-east1-a' does not have enough resources available to fulfill the request. Try a different zone, or try again later.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.ErrorInfo",
        "reason": "RESOURCE_AVAILABILITY",
        "domain": "compute.googleapis.com",
        "metadata": {
          "zone": "us-east1-a",
          "vmType": "e2-medium",
          "attachment": "local-ssd=3,nvidia-t4=2",
          "zonesWithCapacity": "us-central1-f,us-central1-c"
        }
      },
      {
        "@type": "type.googleapis.com/google.rpc.LocalizedMessage",
        "locale": "en-US",
        "message": "An <e2-medium> VM instance with <local-ssd=3,nvidia-t4=2> is currently unavailable in the <us-east1-a> zone. Consider trying your request in the <us-central1-f,us-central1-c> zone(s), which currently has/have capacity to accommodate your request. Alternatively, you can try your request again with a different VM hardware configuration or at a later time. For more information, see the troubleshooting documentation."
      },
      {
        "@type": "type.googleapis.com/google.rpc.Help",
        "links": [
          {
            "description": "Additional information on this error",
            "url": "https://cloud.google.com/compute/docs/resource-error"
          }
        ]
      }
    ]
  }
}

Rationale

Requiring ErrorInfo

ErrorInfo is required because it further identifies an error. With only approximately twenty available values for Status.status, it is difficult to disambiguate one error from another across an entire API Service.

Also, error messages often contain dynamic segments that express variable information, so there needs to be machine-readable component of every error response that enables clients to use such information programmatically.

Including LocalizedMessage

LocalizedMessage was selected as the location to present alternate error messages. While LocalizedMessage may use a locale specified in the request, a service may provide a LocalizedMessage even without a user-specified locale, typically to provide a better error message in situations where Status.message cannot be changed. Where the locale is not specified by the user, it should be en-US (US English).

A service may include LocalizedMessage even when the same message is provided in Status.message and when localization into a user-specified locale is not supported. Reasons for this include:

  • An intention to support user-specified localization in the near future, allowing clients to consistently use LocalizedMessage and not change their error-reporting code when the functionality is introduced.
  • Consistency across all RPCs within a service: if some RPCs include LocalizedMessage and some only use Status.message for error messages, clients have to be aware of which RPCs will do what, or implement a fall-back mechanism. Providing LocalizedMessage on all RPCs allows simple and consistent client code to be written.

Updating Status.message

If a client has ever observed an error with Status.message populated (which it always will be) but without ErrorInfo, the developer of that client may well have had to resort to parsing Status.message in order to find out information beyond just what Status.code conveys. That information may be found by matching specific text (e.g. "Connection closed with unknown cause") or by parsing the message to find out metadata values (e.g. a region with insufficient resources). At that point, Status.message is implicitly part of the API contract, so must not be updated - that would be a breaking change. This is one reason for introducing LocalizedMessage into the Status.details.

RPCs which have always included ErrorInfo are in a better position: the contract is then more about the stability of ErrorInfo for any given error. The reason and domain need to be consistent over time, and the metadata provided for any given (reason,domain) can only be expanded. It's still possible that clients could be parsing Status.message instead of using ErrorInfo, but they will always have had a more robust option available to them.

Further reading

  • For which error codes to retry, see [AIP-194][aip-194].
  • For how to retry errors in client libraries, see AIP-4221.

Changelog

  • 2024-10-18: Rewrite/restructure for clarity.
  • 2024-01-10: Incorporate guidance for writing effective messages.
  • 2023-05-17: Change the recommended language for Status.message to be the service's native language rather than English.
  • 2023-05-17: Specify requirements for changing error messages.
  • 2023-05-10: Require ErrorInfo for all error responses.
  • 2023-05-04: Require uniqueness by message type for error details.
  • 2022-11-04: Added guidance around PERMISSION_DENIED errors previously found in other AIPs.
  • 2022-08-12: Reworded/Simplified intro to add clarity to the intent.
  • 2020-01-22: Added a reference to the ErrorInfo message.
  • 2019-10-14: Added guidance restricting error message mutability to if there is a machine-readable identifier present.
  • 2019-09-23: Added guidance about error message strings being able to change.