
Old version pods are created when deploying a new version with more replicas - Reopen old issue? #120157

Closed
nagygergo opened this issue Aug 24, 2023 · 8 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/apps Categorizes an issue or PR as relevant to SIG Apps.

Comments

@nagygergo

nagygergo commented Aug 24, 2023

What happened?

Same issue as #105395; I can't reopen it.

We hit this while changing the version of the deployed software and scaling out the deployment at the same time. Our data store configuration and database schema update scripts run as init containers. In this case we needed a non-backward-compatible configuration upgrade, so the old version of the service could not work with the new config, and the new version of the service could not work with the old config.
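
Roughly, our deployments have this shape (a simplified, hypothetical sketch; names and images are placeholders, not our actual manifests):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-service                      # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-service
  template:
    metadata:
      labels:
        app: example-service
    spec:
      initContainers:
        - name: schema-migration                  # runs the data store config / schema update for this version
          image: registry.example/migrator:v2     # placeholder image; v2 config is not backward compatible
      containers:
        - name: service
          image: registry.example/service:v2      # placeholder image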

The following happened:

  1. Updated the deployment to have 2 replicas instead of 1 (default maxUnavailable and maxSurge).
  2. A new ReplicaSet was created with 1 podv2; the old ReplicaSet was scaled up to 2 podv1s.
  3. podv2 updated the data store config.
  4. The new podv1 updated the data store config (writing the old format back).
  5. Eventually podv2 failed its liveness probe because it could no longer talk to the data store correctly.
  6. The RollingUpdate never finished.

What did you expect to happen?

The old ReplicaSet should not be scaled out; it should stay at its previous size. The current behaviour tries to saturate the new replica count during the rolling upgrade, even if that means adding old pods.
Since the old ReplicaSet reflects the deployment's previous intent, there is no guarantee that it can support scaling out to the size described in the new deployment.
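
For reference, here is the arithmetic for the repro below as I understand the rounding rules (a sketch, not an authoritative statement of the controller's logic):

# replicas: 3, maxSurge: 1, maxUnavailable: 30%
# maxUnavailable = floor(3 * 0.30) = 0   => at least 3 pods must stay available
# maxSurge       = 1                     => at most 3 + 1 = 4 pods may exist in total
# While the new pod is not yet available, the only way to keep 3 pods available is
# with old-template pods, so the old ReplicaSet gets scaled up past its previous size.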

How can we reproduce it (as minimally and precisely as possible)?

Taken from #105395.
#1. Provision a simple nginx deployment with 2 replicas using the manifest below.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    release: testdeploy
    app: nginx
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 30%
  selector:
    matchLabels:
      release: testdeploy
      app: nginx
  template:
    metadata:
      labels:
        release: testdeploy
        app: nginx
      annotations:
    spec:
      imagePullSecrets:
      tolerations:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              topologyKey: "kubernetes.io/hostname"
              labelSelector:
                matchLabels:
                  release: testdeploy
                  app: nginx
      containers:
        - name: web
          image: "nginx:1.20"

The deployment works as expected.

 kubectl -n test-elasticsearch get pods
NAME                     READY   STATUS    RESTARTS   AGE
nginx-844bf5fdb8-vn72z   1/1     Running   0          15s
nginx-844bf5fdb8-w45ww   1/1     Running   0          15s

#2. Update the deployment by changing only the image version and increasing the replicas to 3, as shown below:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    release: testdeploy
    app: nginx
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 30%
  selector:
    matchLabels:
      release: testdeploy
      app: nginx
  template:
    metadata:
      labels:
        release: testdeploy
        app: nginx
      annotations:
    spec:
      imagePullSecrets:
      tolerations:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              topologyKey: "kubernetes.io/hostname"
              labelSelector:
                matchLabels:
                  release: testdeploy
                  app: nginx
      containers:
        - name: web
          image: "nginx:1.21"

Anything else we need to know?

No response

Kubernetes version

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.4", GitCommit:"fa3d7990104d7c1f16943a67f11b154b71f6a132", GitTreeState:"clean", BuildDate:"2023-07-19T12:20:54Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.6", GitCommit:"11902a838028edef305dfe2f96be929bc4d114d8", GitTreeState:"clean", BuildDate:"2023-06-14T09:49:08Z", GoVersion:"go1.19.10", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider

K8s on bare metal, internally built

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

@nagygergo nagygergo added the kind/bug Categorizes issue or PR as related to a bug. label Aug 24, 2023
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 24, 2023
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@RyanAoh
Member

RyanAoh commented Aug 25, 2023

/sig app
cc @soltysh

@k8s-ci-robot
Contributor

@RyanAoh: The label(s) sig/app cannot be applied, because the repository doesn't have them.

In response to this:

/sig app
cc @soltysh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@RyanAoh
Member

RyanAoh commented Aug 25, 2023

/sig apps

@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 25, 2023
@github-project-automation github-project-automation bot moved this to Needs Triage in SIG Apps Sep 29, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 27, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 26, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot k8s-ci-robot closed this as not planned (Won't fix, can't repro, duplicate, stale) Mar 27, 2024
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@github-project-automation github-project-automation bot moved this from Needs Triage to Closed in SIG Apps Mar 27, 2024