Issue Triage Guidelines

These guidelines serve as a primary document for triaging incoming issues to Kubernetes. SIGs and projects are encouraged to use this guidance as a starting point, and customize to address specific triaging needs.

Scope

Note: These guidelines only apply to the Kubernetes repository. Usage for other Kubernetes-related GitHub repositories is TBD.

What Is Triaging?

Issue triage is a process by which a SIG intakes and reviews new GitHub issues and requests, and organizes them to be actioned—either by its own members, or by other SIGs. Triaging involves categorizing issues and pull requests based on factors such as priority/urgency, SIG ownership of the issue, and the issue kind (bug, feature, etc.).

Triage can happen asynchronously and continuously, or in regularly scheduled meetings. Several Kubernetes SIGs and projects have adopted their own approaches to triaging.

Why Is Triaging Beneficial?

SIGs who triage regularly say it offers a number of benefits, such as:

  • Speeding up issue management
  • Keeping contributors engaged by shortening response times
  • Preventing work from lingering endlessly
  • Replacing special requests and one-offs with a neutral process that acts like a boundary
  • Greater transparency, interesting discussions, and more collaborative, informed decision-making
  • Building prioritization, negotiation and decision-making skills, which are critical to most tech roles
  • Reinforcement of SIG community and culture

People who enjoy product management and iterating on processes tend to enjoy triaging because it empowers their SIGs to maintain a steady, continuous flow of work that is assessed and prioritized based on feedback and value.

How to Triage: A Step-by-Step Flow

This guide walks you through a standard triaging process, beginning with tools and tips.

The following tools can make your SIG’s triage process simpler, faster, and more efficient.

Permissions and the Bot

All contributors can open new issues and comment on other people’s issues. However, permission to assign specific labels (such as triage), change milestones, or close other contributors’ issues is granted only to the author of an issue, assignees, and organization members. For this reason, we use a bot to manage labelling and triaging. For a full list of the bot’s commands and permissions, see the Prow command reference page.

Triage Party

Triage Party is a tool for triaging incoming GitHub issues for large open-source projects, built with the GitHub API. Made public in April 2020, it facilitates “massively multi-player GitHub triage” and reduces contributor response latency.

Its features include:

  • Queries across multiple repositories
  • Queries that are not possible on GitHub:
    • conversation direction (tag: recv, tag: send)
    • duration (updated: +30d)
    • regexp (label: priority/.*)
    • reactions (reactions: >=5)
    • comment popularity (comments-per-month: >0.9)
  • Multiplayer mode: for simultaneous group triage of a pool of issues
  • Button to open issue groups as browser tabs (pop-ups must be allowed)
  • “Shift-Reload” for live data pull

GitHub Project Boards

GitHub offers project boards, set up like kanban boards, to help teams organize and track their workflow in order to get work done. The Release Team has come to depend on their project board for planning new Kubernetes releases; they also use it as an archive to show the work done for past releases.

Other SIGs are also using project boards.

We encourage more SIGs to use project boards to enhance visibility and tracking. If you’d like some help getting started, visit GitHub’s documentation or reach out to SIG Contributor Experience.

DevStats

The CNCF has created a suite of Grafana dashboards and charts for collecting metrics related to all the CNCF projects. The Kubernetes dashboard can be used to help SIGs view real-time metrics on many aspects of their workflow, including:

  • Issue Velocity: How quickly issues are resolved
  • PR Velocity: Including PR workload per SIG, PR time to approve and merge, and other data

Process Pointers and Advice from SIGs

Several SIGs consistently meet weekly or monthly to triage issues. Here are some details about their processes.

Running a Triage Meeting: Tips from api-machinery

The api-machinery SIG has found that triage meetings offer valuable opportunities for newcomers to listen, learn, and start contributing. The SIG holds triage meetings every Tuesday and Thursday and archives recordings in its YouTube playlist. Watch an example of one of their meetings.

In a typical triage meeting, api-machinery members sort through every issue that they haven’t triaged since the previous meeting, using a simple query and issue number to track open PRs and issues. They usually follow this process:

  1. Read through the comments and the code briefly to understand what the issue is about.
  2. Determine by consensus if it belongs to the api-machinery SIG or not. If not, remove the sig/api-machinery label.
  3. Label other SIGs, if appropriate.
  4. Briefly discuss the technical implications.
  5. Assign people with expertise in the domain to review, comment, reject, etc.

The api-machinery SIG has found that consistently meeting on a regular, fixed schedule is key to the success of a triaging effort. More frequent, small meetings are better than infrequent, large meetings. They also offer a few other pointers for successful triage meetings:

  • We try to balance the load, and ask people if they are okay taking on an issue before assigning it to them.
  • We skip issues that are closed.
  • We also skip cherrypicks, because we consider that the code change was reviewed in the original PR.
  • We ensure participation from the entire SIG and support company diversity.
  • We use this opportunity to add help wanted and good first issue labels.

Triage Guide by cluster-lifecycle

The cluster-lifecycle SIG has developed a triaging page detailing their process, including the Milestones stage. Here is a March 2020 presentation delivered to the SIG chairs and leads group on their process.

Step One: Review Newly Created Open Issues

The first step in a successful triage meeting is reviewing newly created open issues. Kubernetes issues are listed here. Labels are the primary tools for triaging. Here’s a comprehensive label list.

New issues are automatically assigned a needs-triage label, indicating that they are awaiting triage. After triaging an issue, the SIG that owns it uses the bot command /triage accepted. This command removes the needs-triage label and adds the triage/accepted label.

Note that adding labels requires Kubernetes GitHub org membership. If you are not an org member, you should add your triage findings as a comment.
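
The label transition described above can be pictured as a small state change on an issue’s label set. The following is only an illustrative model, not the bot’s implementation; the function name apply_triage_accepted and the use of a plain set of label strings are assumptions made for this sketch.

```python
def apply_triage_accepted(labels: set[str]) -> set[str]:
    """Model the effect of `/triage accepted` on an issue's labels.

    Hypothetical helper: it mirrors only the behavior described in the
    text (remove needs-triage, add triage/accepted).
    """
    updated = set(labels)              # copy so the caller's set is untouched
    updated.discard("needs-triage")    # no error if the label is absent
    updated.add("triage/accepted")
    return updated


# Example issue labels (made up for illustration).
new_issue = {"kind/bug", "needs-triage", "sig/network"}
print(sorted(apply_triage_accepted(new_issue)))
```

Using discard rather than remove keeps the sketch safe to run on issues that have already lost the needs-triage label.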

Conducting Searches

GitHub allows you to filter out types of issues and pull requests, which helps you discover items in need of triaging. This table includes some predetermined searches for convenience:

| Search | What it sorts |
| --- | --- |
| created-asc | Untriaged issues by age |
| needs-sig | Issues that need to be assigned to a SIG |
| is:open is:issue | Newest incoming issues |
| comments-desc | Busiest untriaged issues, sorted by # of comments |
| comments-asc | Issues that need more attention, based on # of comments |
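
As a rough sketch of how such searches can be built programmatically, the snippet below assembles GitHub issue-search URLs from query strings. The exact queries behind the searches in the table are not shown in this document, so the query strings here are assumptions composed from standard GitHub search qualifiers (is:, label:, sort:); the repository URL and the helper name search_url are likewise illustrative.

```python
from urllib.parse import urlencode

# Assumed location of the issue list for the Kubernetes repository.
ISSUES_URL = "https://github.com/kubernetes/kubernetes/issues"

# Assumed query strings: plausible reconstructions, not the original links.
SEARCHES = {
    "untriaged issues by age": "is:open is:issue label:needs-triage sort:created-asc",
    "issues that need a SIG": "is:open is:issue label:needs-sig",
    "busiest untriaged issues": "is:open is:issue label:needs-triage sort:comments-desc",
}


def search_url(query: str) -> str:
    """Return an issue-search URL with the query safely URL-encoded."""
    return f"{ISSUES_URL}?{urlencode({'q': query})}"


for name, query in SEARCHES.items():
    print(f"{name}: {search_url(query)}")
```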

We suggest preparing your triage by filtering for the oldest unlabelled issues and pull requests first.

Step Two: Triage Issues by Type

Use these triage/ and kind/support labels to find open issues that can be quickly closed. A triage engineer can add the appropriate labels.

Depending on your permissions, either close or comment on any issues that are identified as support requests, duplicates, or not-reproducible bugs, or that lack enough information from the reporter.

Support Requests

Some people mistakenly use GitHub issues to file support requests. Usually they are asking for help configuring some aspect of Kubernetes. To handle such an issue, direct the author to our support request channels. Then apply the kind/support label, which marks the issue as a support request, and close the issue with the /close command.

Please find more detailed information about Support Requests in the Further Notes section.

Abandoned or Wrongly Placed Issues

If an issue is abandoned or in the wrong place, either close or comment on it.

Needs More Information

The triage/needs-information label indicates an issue needs more information in order for work to continue; comment on or close it.

Bugs

First, validate if the problem is a bug by trying to reproduce it.

If you can reproduce it:

  • Define its priority.
  • Search for duplicates to see if the issue has been reported already. If a duplicate is found, let the issue reporter know, reference the original issue, and close the duplicate.

If you can’t reproduce it:

  • Contact the issue reporter with your findings.
  • Close the issue if both parties agree that it could not be reproduced.

If you need more information to further work on the issue:

  • Let the reporter know by adding an issue comment. Include /triage needs-information in the comment to apply the triage/needs-information label.

In all cases, if you do not get a response within 20 days, close the issue with an appropriate comment. If you have permission to close someone else’s issue, first /assign the issue to yourself, then /close it. If you do not, please leave a comment describing your findings.
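
The bug-handling rules above can be summarized as a small decision function. This is a minimal sketch under stated assumptions: the BugReport fields and the next_action helper are hypothetical names, the returned strings are paraphrases of the guidance, and "no response within 20 days" is interpreted as 20 or more days of waiting on the reporter.

```python
from dataclasses import dataclass
from typing import Optional

RESPONSE_DEADLINE_DAYS = 20  # taken from the guideline above


@dataclass
class BugReport:
    # True/False after a reproduction attempt; None if more info is needed.
    reproducible: Optional[bool]
    duplicate_of: Optional[int] = None      # number of the original issue, if any
    reporter_agrees: bool = False           # reporter agrees it cannot be reproduced
    days_waiting_for_reporter: int = 0      # days since you asked the reporter something


def next_action(report: BugReport) -> str:
    """Suggest the next triage step for a bug report."""
    # The no-response rule applies in all cases, so check it first.
    if report.days_waiting_for_reporter >= RESPONSE_DEADLINE_DAYS:
        return "close with an appropriate comment"
    if report.reproducible is None:
        return "comment with /triage needs-information"
    if report.reproducible:
        if report.duplicate_of is not None:
            return f"reference #{report.duplicate_of} and close the duplicate"
        return "define priority"
    if report.reporter_agrees:
        return "close the issue"
    return "contact the reporter with your findings"


print(next_action(BugReport(reproducible=True)))
print(next_action(BugReport(reproducible=None)))
```

Whether you can close the issue yourself still depends on your permissions, which this sketch does not model.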

Help Wanted/Good First Issues

To identify issues that are specifically groomed for new contributors, we use the help wanted and good first issue labels. To use these labels:

  • Review our specific guidelines for how to use them.
  • If the issue satisfies these guidelines, you can add the help wanted label with the /help command, and the good first issue label with the /good-first-issue command. Please note that adding the good first issue label will also automatically add the help wanted label.
  • If an issue has these labels but does not satisfy the guidelines, please ask for more details to be added to the issue or remove the labels using the /remove-help or /remove-good-first-issue commands.
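
To make the relationship between the two commands concrete, here is a small model of the labels each one adds. It covers only the behavior stated above (that /good-first-issue also adds help wanted); the table ADDED_LABELS and the function apply_help_command are hypothetical names, and the removal commands are deliberately left out because their full effect is not described here.

```python
# Labels added by each command, as described in the text.
ADDED_LABELS = {
    "/help": {"help wanted"},
    "/good-first-issue": {"good first issue", "help wanted"},
}


def apply_help_command(labels: set[str], command: str) -> set[str]:
    """Return the issue's labels after applying one of the two commands."""
    if command not in ADDED_LABELS:
        raise ValueError(f"unsupported command: {command}")
    return set(labels) | ADDED_LABELS[command]


print(sorted(apply_help_command({"kind/bug"}, "/good-first-issue")))
```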

Kind Labels

Usually the kind label is applied by the person submitting the issue. Issues that feature the wrong kind (for example, support requests labelled as bugs) can be corrected by someone triaging; double-checking is a good approach. Our issue templates aim to steer people to the right kind.

Step Three: Define Priority

We use GitHub labels for prioritization. If an issue lacks a priority label, this means it has not been reviewed and prioritized yet.

We aim for consistency across the entire project. However, if you notice an issue that you believe to be incorrectly prioritized, please leave a comment offering your counter-proposal and we will evaluate it.

| Priority label | What it means | Examples |
| --- | --- | --- |
| priority/critical-urgent | Team leaders are responsible for making sure that these issues (in their area) are being actively worked on; that is, drop what you’re doing. Stuff is burning. These should be fixed before the next release. | User-visible bugs in core features; broken builds; tests and critical security issues |
| priority/important-soon | Must be staffed and worked on either currently or very soon, ideally in time for the next release. Important, but wouldn’t block a release. | [XXXX] |
| priority/important-longterm | Important over the long term, but may not be currently staffed and/or may require multiple releases to complete. Wouldn’t block a release. | [XXXX] |
| priority/backlog | General agreement that this is a nice-to-have, but no one’s available to work on it anytime soon. Community contributions would be most welcome in the meantime, though it might take a while to get them reviewed if reviewers are fully occupied with higher-priority issues (for example, immediately before a release). | [XXXX] |
| priority/awaiting-more-evidence | Possibly useful, but not yet enough support to actually get it done. | Mostly placeholders for potentially good ideas, so that they don’t get completely forgotten, and can be referenced or deduped every time they come up |
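
As an illustration of how these labels might drive a triage queue, the sketch below orders issues by priority label and separates out those that have not been prioritized yet. It assumes the row order of the table reflects decreasing urgency; the issue numbers and the helper priority_rank are made up for the example.

```python
# Priority labels in the order they appear in the table (assumed to run
# from most to least urgent).
PRIORITY_ORDER = [
    "priority/critical-urgent",
    "priority/important-soon",
    "priority/important-longterm",
    "priority/backlog",
    "priority/awaiting-more-evidence",
]
UNPRIORITIZED = len(PRIORITY_ORDER)  # rank for issues with no priority label


def priority_rank(labels: set[str]) -> int:
    """Return 0 for the most urgent label; UNPRIORITIZED if none is set."""
    for rank, label in enumerate(PRIORITY_ORDER):
        if label in labels:
            return rank
    return UNPRIORITIZED


# Hypothetical issues for illustration only.
issues = [
    {"number": 1, "labels": {"priority/backlog"}},
    {"number": 2, "labels": {"kind/bug"}},
    {"number": 3, "labels": {"priority/critical-urgent", "kind/bug"}},
]

ordered = sorted(issues, key=lambda issue: priority_rank(issue["labels"]))
needs_priority = [i["number"] for i in issues if priority_rank(i["labels"]) == UNPRIORITIZED]

print([i["number"] for i in ordered])
print(needs_priority)
```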

Step Four: Find and Set the Right SIG(s) to Own an Issue

Components are divided among Special Interest Groups (SIGs). The bot assists in finding a proper SIG to own an issue.

  • For example, typing /sig network in a comment should add the sig/network label.
  • Multiword SIGs use dashes: for example, /sig cluster-lifecycle.
  • Keep in mind that these commands must be on their own lines, and at the front of the comment.
  • If you are not sure about who should own an issue, defer to the SIG label only.
  • If you feel an issue should warrant a notification, ping a team with an @ mention, in this format: @kubernetes/sig-<group-name>-<group-suffix>. Here, the <group-suffix> can be one of:
    • bugs
    • feature-requests
    • pr-reviews
    • test-failures
    • proposals

    For example: @kubernetes/sig-cluster-lifecycle-bugs, can you have a look at this?
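
The conventions above (commands on their own lines, dashes in multiword SIG names, and a fixed set of team suffixes) can be sketched as follows. This is a simplified model, not the bot’s parser: it handles only one SIG name per /sig line, and the names sig_labels and team_mention are hypothetical.

```python
import re

# Suffixes listed in the text for @kubernetes/sig-<group-name>-<group-suffix>.
GROUP_SUFFIXES = {"bugs", "feature-requests", "pr-reviews", "test-failures", "proposals"}

# A /sig command must occupy its own line; this simplified pattern accepts
# exactly one SIG name per line.
SIG_COMMAND = re.compile(r"^/sig[ \t]+([a-z0-9-]+)[ \t]*$", re.MULTILINE)


def sig_labels(comment: str) -> list[str]:
    """Return the sig/<name> labels requested by /sig lines in a comment."""
    return [f"sig/{name}" for name in SIG_COMMAND.findall(comment)]


def team_mention(group: str, suffix: str) -> str:
    """Build a team mention, rejecting suffixes not listed in the text."""
    if suffix not in GROUP_SUFFIXES:
        raise ValueError(f"unknown group suffix: {suffix}")
    return f"@kubernetes/sig-{group}-{suffix}"


comment = "/sig network\n/sig cluster-lifecycle\nDNS lookups fail after upgrade."
print(sig_labels(comment))
print(team_mention("cluster-lifecycle", "bugs"))
```

A command embedded in the middle of a sentence, such as "please /sig network", is ignored by this pattern, which matches the guidance that commands belong on their own lines.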

Self-Assigning

If you think you can fix the issue, assign it to yourself with just the /assign command. If you cannot self-assign for permissions-related reasons, leave a comment that you’d like to claim it and begin working on a PR.

When an issue already has an assignee, do not assign it to yourself or create a PR without talking to the existing assignee or going through the Follow Up steps as described in this document. Creating a PR when someone else is already working on an issue is not a good practice and is discouraged.

Step Five: Follow Up

If No PR is Created for an Issue Within the Current Release Cycle

If an issue is owned by a developer but a PR has not been created within 30 days, a triage engineer should contact the issue owner and ask them to either create a PR or release ownership.

If a SIG Label Is Assigned, but No Action Is Taken Within 30 Days

If you find an issue with a SIG label assigned, but there’s no evidence of movement or discussion within 30 days, then gently poke the SIG about this pending issue. Also, consider attending one of their meetings to bring up the issue.

If an Issue Has No Activity After 90 Days

When an issue goes 90 days without activity, the k8s-triage-robot adds the lifecycle/stale label to that issue and comments on it with additional details. You can prevent this by preemptively applying the lifecycle/frozen label with the /lifecycle frozen command, or remove the stale label with the /remove-lifecycle stale command. If you take neither step, the issue will eventually be auto-closed.
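
The follow-up thresholds in this step (30 days for a nudge, 90 days for the stale label) can be expressed as a simple check on the date of last activity. This is a sketch only: the follow_up helper and its return strings are hypothetical, and it assumes an issue carrying lifecycle/frozen is skipped by the robot but can still be nudged.

```python
from datetime import date

NUDGE_AFTER_DAYS = 30   # contact the assignee or owning SIG
STALE_AFTER_DAYS = 90   # k8s-triage-robot adds lifecycle/stale


def follow_up(last_activity: date, today: date, labels: set[str]) -> str:
    """Suggest a follow-up step based on how long an issue has been idle."""
    idle_days = (today - last_activity).days
    frozen = "lifecycle/frozen" in labels
    if idle_days >= STALE_AFTER_DAYS and not frozen:
        return "expect lifecycle/stale; use /remove-lifecycle stale if still relevant"
    if idle_days >= NUDGE_AFTER_DAYS:
        return "nudge the assignee or owning SIG"
    return "no follow-up needed yet"


print(follow_up(date(2024, 1, 1), date(2024, 2, 15), {"sig/network"}))
```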

Further Notes

Support Requests: Channels

These should be directed to the following:

User Support Response: Example

If you see support questions on a project mailing list or in issues asking for support, try to redirect them to Discuss. Here is an example response:

Please re-post your question to our Discussion Forums.

We are trying to consolidate the channels to which questions for help/support are posted so that we can improve our efficiency in responding to your requests, and to make it easier for you to find answers to frequently asked questions and how to address common use cases.

We regularly see messages posted in multiple forums, with the full response thread only in one place or, worse, spread across multiple forums. Also, the large volume of support issues on GitHub is making it difficult for us to use issues to identify real bugs.

Members of the Kubernetes community use Discussion Forums to field support requests. Before posting a new question, please search these for answers to similar questions, and also familiarize yourself with:

Again, thanks for using Kubernetes.

The Kubernetes Team