
DNS #146

Closed · bgrant0607 opened this issue Jun 18, 2014 · 24 comments

Labels: area/downward-api, sig/network

Comments

@bgrant0607 (Member)

Provide DNS resolution for pod addresses.

@bgrant0607 (Member, Author)

We should also make it possible to plug in other naming/discovery mechanisms, such as etcd, Eureka, and Consul. One prerequisite is that it needs to be possible to get pod IP addresses (#385).

bgrant0607 changed the title from "Container DNS" to "Pod DNS" on Jul 11, 2014
@smarterclayton (Contributor)

It would also be nice if higher-level services could participate in DNS resolution and decorate lower-level services where necessary. Registration of DNS with external parties is also important in larger organizations, where important services must be registered in a corporate DNS. DNS also plays into a number of components, like protocol-aware load balancers where ports are shared and hosts determine routing; we'd want to be able to tie things together as CNAMEs as well.

@bgrant0607 (Member, Author)

@smarterclayton I'd be interested in hearing more about your requirements.

I'm currently thinking that, for the most part, pod DNS is not so useful, except as a shorthand for IPv6 addresses in debugging scenarios.

Except for possible future special cases, pods are relatively disposable and won't have stable addresses or even stable names. In particular, replicationControllers treat pods as fungible, and as discussed in #557 we don't currently plan to reschedule pods, in order to leave future flexibility for layers on top to manage rescheduling, migration, a variety of rolling update schemes, etc.

We could implement dynamic DNS, but we would need to hammer on DNS caching problems in Linux, in language platforms and libraries, in applications, etc.

So, I'm thinking we should support IP per service (I think I mentioned that in the new networking.md doc) and DNS for services.
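As a rough illustration of what "DNS for services" would buy application code, here is a minimal Go sketch of a client resolving a stable service name instead of tracking individual pod addresses; the name my-service and a working cluster resolver are assumptions for illustration, not part of this proposal.

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// With a per-service IP published in DNS, clients only ever see the
	// stable service name; pod churn behind it stays invisible to them.
	ips, err := net.LookupIP("my-service") // hypothetical service name
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	for _, ip := range ips {
		fmt.Println("service resolves to:", ip)
	}
}
```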

@smarterclayton (Contributor)

IP per service can get very expensive: today we run roughly one load balancer per container. So in some cases we may want services to be cheap (which might really be a different type of service). IP per service seems like it could be optional. I would argue that if you're doing local proxying, you'd be better off hiding the remote proxy port and IP from the containers anyway, to prevent people from baking in assumptions about the remote destination; and if you need a service at a known IP and port, you're typically talking about an edge service.

However, when you need it, it should be possible to do easily.

@smarterclayton (Contributor)

(The above is predicated on IPv4 and the current state of networking in a lot of environments.)

@smarterclayton (Contributor)

It's probably also worth distinguishing between details an API consumer is required to provide and those the infrastructure picks for her. For instance, when creating a service a consumer may omit the service port, but expect to get back a value for the port and a value for the IP, chosen by the infrastructure. My assumption is that software running in the infrastructure is generally flexible about remote ports (especially if they're injected), so not every consumer of a service needs an IP. However, if a specific port is requested, it may be at the infrastructure's discretion whether to satisfy that request or reject it.
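A hypothetical Go sketch of that defaulting behavior; the ServiceSpec type, the assignDefaults helper, and the port range are illustrative stand-ins, not the actual API:

```go
package main

import "fmt"

// ServiceSpec is an illustrative stand-in, not the real API object.
type ServiceSpec struct {
	Name string
	Port int // 0 means the consumer omitted it
}

// nextFreePort is an assumed infrastructure-managed range; purely illustrative.
var nextFreePort = 30000

// assignDefaults fills in anything the consumer omitted; the chosen values
// would be reported back in the creation response.
func assignDefaults(svc *ServiceSpec) {
	if svc.Port == 0 {
		svc.Port = nextFreePort
		nextFreePort++
	}
}

func main() {
	svc := ServiceSpec{Name: "frontend"} // port left unset by the consumer
	assignDefaults(&svc)
	fmt.Printf("service %q assigned port %d\n", svc.Name, svc.Port)
}
```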

@thockin (Member) commented Jul 25, 2014

I feel like there are decisions here I am not aware of and do not understand.

Pod DNS for non-replicated pods IS "IP per service". Maybe we don't want people to run non-replicated pods? Then we should not offer it.

Pods "won't have stable addresses or even stable names"? In identifiers.md we said:

4) Each pod instance on an apiserver has a PodID (a UUID) that is unique across space and time
   1) If not specified by the client, the apiserver will assign this identifier
   2) This identifier will persist for the lifetime of the pod, even if the pod is stopped and started or moved across hosts

"We don't currently plan to reschedule pods" - under what circumstances? If a minion dies, we won't trigger a reschedule? Or are we asserting that the "rescheduled" pod is a different pod, despite being the only one the user asked for?


@lavalamp (Member)

"we don't currently plan to reschedule pods" - under what circumstances?

+1; replication controllers start a new pod instead of waiting for the system to reschedule a pod. This is a nifty control-loop thingy, but I have not been convinced that the part of the system that makes it necessary (where we don't move pods off of damaged hosts) is a feature and not a bug. In fact, I consider it a bug at the moment.
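For context, a very rough Go sketch of the control-loop behavior being described; the helpers countRunningPods and createPod are hypothetical stand-ins for apiserver calls, and this is not the real controller code:

```go
package main

import "fmt"

type ReplicationController struct {
	Selector map[string]string
	Replicas int
}

// countRunningPods stands in for querying the apiserver for pods that match
// the selector; a pod lost with its host simply drops out of this count.
func countRunningPods(selector map[string]string) int {
	return 2
}

// createPod stands in for POSTing a brand-new pod built from the
// controller's template.
func createPod(labels map[string]string) {
	fmt.Println("creating replacement pod with labels", labels)
}

// reconcile creates replacements until the observed count matches the
// desired count; nothing is ever "moved" off a dead host.
func reconcile(rc ReplicationController) {
	for running := countRunningPods(rc.Selector); running < rc.Replicas; running++ {
		createPod(rc.Selector)
	}
}

func main() {
	reconcile(ReplicationController{
		Selector: map[string]string{"app": "frontend"},
		Replicas: 3,
	})
}
```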

@thockin (Member) commented Jul 25, 2014

As a naive user I would expect the Pod Create API to return me an ID that I can use forever and ever to address the Pod I just Created.


@lavalamp (Member)

As a naive user I would expect the Pod Create API to return me an ID that I can use forever and ever to address the Pod I just Created.

I believe that is the case: if you make a pod with a blank ID, the apiserver should fill in the ID field and it will appear in the response. Otherwise it's a bug.

@thockin (Member) commented Jul 25, 2014

So if the machine that pod was on disappears suddenly, and we re-schedule that pod, it should have the same ID. Thus it is the same pod.


@bgrant0607 (Member, Author)

@thockin @lavalamp

This is a question of the layer of abstraction provided by the apiserver vs. higher-level layers, which haven't been built yet. Single-use pods are a useful building block.

Currently a pod is scheduled only upon creation. If a user wants a pod or N pods to continue to exist indefinitely, they must create a replicationController. This mechanism is very robust and flexible. Pods may disappear for any reason, host death only being one, and they will be replaced by the replicationController.

This sort of gives us on-demand restarts and reschedules for free. And it allows for very flexible update policies, as discussed in #492. It also enables rescheduling policy, backup pods, migration, etc. to be built at a higher layer. And one could imagine it working even in the case where we want to move a pod from one apiserver "cell" to another (by decreasing the replica count in one cell and increasing it in another).

There are also cases where one wants to take a pod out of service for data recovery or debugging but needs to replace it in the serving set (aka zombies). And there are cases where we can't kill pods that we need to reschedule/replace (aka phantoms). These scenarios happen even with singletons.

Is a rescheduled pod the same pod? It would have a different address, given our current networking implementation. The service abstraction is the only thing we currently provide in k8s to deal with that. We could try to paper over that by updating DDNS, but we'd need to confront TTL/caching issues.

The pod would also lose its durable-ish local storage and everything else associated with its previous host. It would only be able to migrate its state in the case of remote storage, such as PD.

IMO, we need to be able to control dynamic reassignment of roles within the user's application via the API.

@bgrant0607 (Member, Author)

@smarterclayton Re: automatic port assignment for the existing service approach: agreed. See #390.

@lavalamp (Member)

@bgrant0607 OK I find that pretty convincing. Absolutely definitely do not want "backup" pods to ever be a thing in k8s.

It does mean that you basically always have to make a service & rep. controller even for a single pod, and we'll need to make service names DNS-y (it's no longer critical that pod names are DNS-y), which I think is a change from what @thockin and @smarterclayton were thinking when they hashed out the pod naming system.

(I think, in this case, apiserver should offer some DNS-like pod lookup for debugging purposes, but not for production purposes.)
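For reference, "DNS-y" here roughly means "usable as a DNS label"; a small Go sketch of one common reading (RFC 1035 labels), purely illustrative:

```go
package main

import (
	"fmt"
	"regexp"
)

// dnsLabel matches RFC 1035 labels: a letter, then lowercase alphanumerics
// and hyphens, ending with an alphanumeric; 63 characters at most.
var dnsLabel = regexp.MustCompile(`^[a-z]([-a-z0-9]*[a-z0-9])?$`)

func isDNSLabel(name string) bool {
	return len(name) <= 63 && dnsLabel.MatchString(name)
}

func main() {
	for _, name := range []string{"frontend", "my-service", "My_Service"} {
		fmt.Printf("%-12s DNS-y: %v\n", name, isDNSLabel(name))
	}
}
```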

@smarterclayton (Contributor)

To your last point, I think someone floated the idea of a "dns-service" that does nothing except expose the pod IPs matching a pod label query, which would be just as useful for availability as a proxy as long as rate_of_pod_failure * number_of_pods > average_latency_of_dns_propagate, especially for integrations with clients that already do some variation of this. But in that case the pod name still doesn't have to be DNS-y, and our previous pains to make names resolvable were in vain.
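A sketch of what the client side of such a "dns-service" might look like, assuming a hypothetical name frontend-pods that resolves directly to the matching pod IPs; the resolver behavior and the name are assumptions for illustration:

```go
package main

import (
	"fmt"
	"math/rand"
	"net"
)

func pickBackend(name string) (net.IP, error) {
	ips, err := net.LookupIP(name)
	if err != nil {
		return nil, err
	}
	if len(ips) == 0 {
		return nil, fmt.Errorf("no addresses for %s", name)
	}
	// Naive client-side balancing: pick a random A record. Stale records are
	// exactly the DNS-propagation concern raised above.
	return ips[rand.Intn(len(ips))], nil
}

func main() {
	ip, err := pickBackend("frontend-pods")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println("dialing pod at", ip)
}
```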

@erictune (Member)

While working on #170 I was thinking that it would be more consistent if we got rid of Pods as a create-able API object, and just had PodTemplates and ReplicationControllers. It sounds like that is in line with what @bgrant0607 said above.

So, then, if you can't create a pod, can you at least list them? Or is the Binding object that is being added in #592 sufficient to represent a (potentially) running pod?

@erictune (Member)

I guess we need to have a Pod object for the scheduler to find to Bind. But it would be nice to remove the duplication of data between a pod and a podTemplate (in the context of #170).

@smarterclayton (Contributor)

If the duplication is removed, doesn't that imply that a podTemplate cannot be removed from the system until all dependent pods are also deleted? And that a user cannot prevent templates from being used until that happens? Seems fraught with coupling issues to me (even with immutable templates, and if they're immutable how do you change who can create them?)

@erictune (Member)

Good points, Clayton. Also saw that Brian made the same point on #170 last night... just read it. Okay, suggestion withdrawn.


@thockin (Member) commented Jul 26, 2014

Hmm, I will have to digest this. It sounds good, but it does change some of how I pictured the whole stack.

It feels unfortunate that we say a replicationController becomes required for the vast majority of use cases. Do we need a simpler singleton controller that manages the "exactly one" semantic?

I don't think we should waive the name format requirements at lower levels, though. It is still useful to have a tight spec. Maybe we want to change them, e.g. to C tokens rather than DNS labels, but only if there is a good reason.

@lavalamp (Member)

... a simpler singleton controller that manages the "exactly one" semantic?

SGTM

@smarterclayton (Contributor)

Maybe I misread #170 and the discussion above, but I didn't see a change being proposed to the model after Eric's retraction. I got the following:

  • pod template contents can't be read by repl controller loop (eventually, for security)
  • repl controller loop only has permission to create pod from template id, and specify any overrides allowed by the template
  • binding represents tying a pod to a host - a pod gets bound iff a binding exists
  • repl controller with 1 replica functions as singleton controller as described in last two comments
  • a user can create a pod adhoc at any point without specifying a template
  • a template is required for a repl controller
  • pod template can be changed at any time - it's up to the user to decide whether doing so is a good idea (vs creating a new template and updating the controller).

Or did I miss something?
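As a rough illustration, the model summarized above could be sketched in types like the following; the type names and fields are purely illustrative, not the actual API objects:

```go
package main

// PodTemplate holds the pod definition a controller stamps copies out of.
type PodTemplate struct {
	ID     string
	Labels map[string]string
	// container spec and so on omitted
}

// ReplicationController references a template by ID (it cannot read the
// template contents directly) and says how many copies should exist.
type ReplicationController struct {
	TemplateID string
	Selector   map[string]string
	Replicas   int
}

// Binding ties a pod to a host; a pod is bound iff a Binding exists.
type Binding struct {
	PodID string
	Host  string
}

func main() {}
```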

@bgrant0607 (Member, Author)

@thockin @smarterclayton @erictune This is more the topic of #170, but, since the discussion is here...

I wouldn't be opposed to a simple singleton replicationController -- like a regenerating Pod. It would be nice if it could function anywhere a naked pod could.

If we wanted to get really slick, we could make replicationController capable of making replicas of anything. It just needs a URL to post to in order to create new replicas, a label selector to figure out how many replicas exist, and maybe some kind of health/liveness/readiness check for dynamic filtering of instances that aren't fully functional at the moment.

This would make our primitives more composable. For instance, if/when we were to add a MigratablePod, the replicationController could control a set of replicas of them. Maybe we could even have a replicationController control a set of replicationControllers. Once we added config generators, a replication controller could replicate replicationController/service pairs.
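A hedged sketch of that "replicate anything" idea in Go; the endpoint URL, the countMatching helper, and the empty payload are hypothetical, and a real implementation would also need the health/liveness/readiness filtering mentioned above:

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

type GenericReplicator struct {
	CreateURL string            // POST here to make one more replica
	Selector  map[string]string // used to count what already exists
	Replicas  int               // desired count
}

// countMatching stands in for a label-selector query against the apiserver,
// optionally filtered by a health/readiness check.
func countMatching(selector map[string]string) int {
	return 1
}

func (r GenericReplicator) reconcile() error {
	for existing := countMatching(r.Selector); existing < r.Replicas; existing++ {
		// The payload would be whatever object the URL creates: a pod, a
		// migratable pod, another replicationController, and so on.
		resp, err := http.Post(r.CreateURL, "application/json", bytes.NewBufferString(`{}`))
		if err != nil {
			return err
		}
		resp.Body.Close()
		fmt.Println("created replica, status:", resp.Status)
	}
	return nil
}

func main() {
	r := GenericReplicator{
		CreateURL: "http://apiserver.example/api/pods", // hypothetical endpoint
		Selector:  map[string]string{"app": "frontend"},
		Replicas:  3,
	}
	if err := r.reconcile(); err != nil {
		fmt.Println("reconcile failed:", err)
	}
}
```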

@thockin (Member) commented Oct 7, 2014

Closing in favor of #1261

thockin closed this as completed on Oct 7, 2014