DNS #146
Provide DNS resolution for pod addresses.
We should also make it possible to plug in other naming/discovery mechanisms, such as etcd, Eureka, and Consul. One prerequisite is that it needs to be possible to get pod IP addresses (#385).
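As a rough illustration of what such a plug-in point could look like (the interface and type names below are hypothetical, not anything that exists in the codebase), each backend (DNS, etcd, Eureka, Consul) would satisfy the same small resolution/registration contract, which only becomes implementable once pod IPs are obtainable (#385):

```go
package discovery

import "net"

// Resolver is a hypothetical plug-in point for naming/discovery backends.
// A DNS, etcd, Consul, or Eureka implementation would each satisfy it.
type Resolver interface {
	// Resolve maps a service name to the set of addresses currently backing it.
	Resolve(service string) ([]net.IP, error)
}

// Registry is the write side: backends that support registration
// (e.g. Consul, or a corporate DNS) would implement this as well.
type Registry interface {
	Register(service string, addr net.IP, port int) error
	Deregister(service string, addr net.IP) error
}

// dnsResolver is a trivial Resolver backed by plain DNS lookups.
type dnsResolver struct{ suffix string }

func (d dnsResolver) Resolve(service string) ([]net.IP, error) {
	// Delegate to the system resolver for "<service><suffix>".
	return net.LookupIP(service + d.suffix)
}
```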
It would also be nice if higher-level services could participate in DNS resolution and decorate lower-level services where necessary. Also, registration of DNS with external parties is important in larger organizations, where important services must be registered in a corporate DNS. DNS also plays into a number of components like protocol-aware load balancers, where ports are shared and hosts determine routing - we'd want to be able to tie things together as CNAMEs as well.
@smarterclayton I'd be interested in hearing more about your requirements. I'm currently thinking that, for the most part, pod DNS is not so useful, except as a shorthand for IPv6 addresses in debugging scenarios. Except for possible future special cases, pods are relatively disposable and won't have stable addresses or even stable names. In particular, replicationControllers treat pods as fungible, and as discussed in #557 we don't currently plan to reschedule pods, in order to leave future flexibility for layers on top to manage rescheduling, migration, a variety of rolling update schemes, etc. We could implement dynamic DNS, but we would need to hammer on DNS caching problems in Linux, in language platforms and libraries, in applications, etc. So, I'm thinking we should support IP per service (I think I mentioned that in the new networking.md doc) and DNS for services.
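For concreteness, here is a minimal sketch of what "DNS for services" looks like from a client's side, assuming some DNS name is published per service (the name and domain suffix below are placeholders, not a format anyone has settled on here):

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// Resolve a service name rather than a pod name. The pod set behind the
	// service can churn freely; only the service record needs to stay stable.
	// "my-service.example.local" is a made-up name for this sketch.
	addrs, err := net.LookupHost("my-service.example.local")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	for _, a := range addrs {
		fmt.Println("service resolves to", a)
	}
}
```

The caching caveat above applies equally here: any resolver, language runtime, or application that holds onto this answer past its TTL will keep returning stale addresses after the service moves.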
IP per service can get very expensive - we run today roughly 1 load balancer per container. So in some cases we may want services to be cheap (which might be a different type of service, really). IP per service seems like it could be optional - I would argue if you're doing local proxying you'd be better off hiding the remote proxy port and IP from the containers anyway, to prevent people from baking in assumptions about the remote destination, and if you need a service at a known IP and port you're typically talking more about an edge service. However, when you need it, it should be possible to do easily.
(The above is predicated on IPv4 and the current state of networking in a lot of environments.)
It's probably also worth distinguishing between details an API consumer is required to provide vs. those the infrastructure picks for her. For instance, when creating a service a consumer may omit the service port, but expect to get back a value for the port and a value for the IP, chosen by the infrastructure. My assumption is that software running in the infrastructure is generally flexible about remote ports (especially if they are injected), so not every consumer of a service needs an IP. However, if a specific port is requested, it may be at the discretion of the infrastructure whether to satisfy that request or reject it.
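A small sketch of that request/response contract, using made-up field names rather than the real API shape: the consumer leaves port and IP empty, and the infrastructure's choices come back in the created object.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ServiceSpec is an illustrative shape only, not the actual API object.
type ServiceSpec struct {
	Name string `json:"name"`
	Port int    `json:"port,omitempty"` // omitted: let the infrastructure choose
	IP   string `json:"ip,omitempty"`   // omitted: let the infrastructure choose
}

func main() {
	request := ServiceSpec{Name: "frontend"} // no port or IP requested

	// What the infrastructure might hand back after creation (values invented).
	response := ServiceSpec{Name: "frontend", Port: 31742, IP: "10.0.0.42"}

	reqJSON, _ := json.Marshal(request)
	respJSON, _ := json.Marshal(response)
	fmt.Println("request: ", string(reqJSON))
	fmt.Println("response:", string(respJSON))
}
```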
I feel like there are decisions here I am not aware of and do not understand. Pod DNS for non-replicated pods IS "IP per service". Maybe we don't want pods that "won't have stable addresses or even stable names"? In identifiers.md, names are unique across space and time. "we don't currently plan to reschedule pods" - under what circumstances?
+1; replication controllers start a new pod instead of waiting for the system to reschedule a pod. This is a nifty control loop thingy, but I have not been convinced that the part of the system that makes it necessary (the fact that we don't move pods off of damaged hosts) is a feature and not a bug. In fact I consider it a bug at the moment.
As a naive user I would expect the Pod Create API to return me an ID that I can use to refer to that pod later.
I believe that is the case: if you make a pod with a blank ID, the apiserver should fill in the ID field and it should appear in the response. Otherwise it's a bug.
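A hedged sketch of that flow (the endpoint path and field name are illustrative approximations of the v1beta1 API, not verified): POST a pod without an ID and read the server-assigned ID out of the response.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// Create a pod without specifying an ID; the apiserver is expected to
	// fill it in. Payload, path, and field names are for illustration only.
	body := bytes.NewBufferString(`{"desiredState": {"manifest": {"containers": [{"image": "nginx"}]}}}`)
	resp, err := http.Post("http://localhost:8080/api/v1beta1/pods", "application/json", body)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var created struct {
		ID string `json:"id"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&created); err != nil {
		panic(err)
	}
	fmt.Println("server-assigned pod ID:", created.ID)
}
```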
So if the machine that pod was on disappears suddenly, and we re-schedule it elsewhere, is it still the same pod with the same ID?
This is a question of the layer of abstraction provided by the apiserver vs. higher-level layers, which haven't been built yet. Single-use pods are a useful building block. Currently a pod is scheduled only upon creation.

If a user wants a pod or N pods to continue to exist indefinitely, they must create a replicationController. This mechanism is very robust and flexible. Pods may disappear for any reason, host death being only one, and they will be replaced by the replicationController. This sort of gives us on-demand restarts and reschedules for free. And it allows for very flexible update policies, as discussed in #492. It also enables rescheduling policy, backup pods, migration, etc. to be built at a higher layer. One could even imagine it working in the case where we want to move a pod from one apiserver "cell" to another (by decreasing the replica count in one cell and increasing it in another).

There are also cases where one wants to take a pod out of service for data recovery or debugging but needs to replace it in the serving set (aka zombies), and cases where we can't kill pods that we need to reschedule/replace (aka phantoms). These scenarios happen even with singletons.

Is a rescheduled pod the same pod? It would have a different address, given our current networking implementation. The service abstraction is the only thing we currently provide in k8s to deal with that. We could try to paper over that by updating DDNS, but we'd need to confront TTL/caching issues. The pod would also lose its durable-ish local storage and everything else associated with its previous host. It would only be able to migrate its state in the case of remote storage, such as PD.

IMO, we need to be able to control dynamic reassignment of roles within the user's application via the API.
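For readers following along, the control loop being described is small enough to sketch; the types and helper functions below are placeholders, not the actual replicationController code. The key point is that it only compares a desired count against a label-selected set of live pods, so host death, manual deletion, and scale-down all fall out of the same comparison.

```go
package main

import "time"

// Pod and the helpers below stand in for the real API objects and client
// calls; this is a sketch of the loop's shape, not the real implementation.
type Pod struct{ ID string }

func listPods(selector map[string]string) []Pod { return nil } // query current state
func createPod(template Pod)                    {}             // create a pod from the template
func deletePod(p Pod)                           {}             // remove a surplus pod

func reconcile(selector map[string]string, template Pod, desired int) {
	for {
		current := listPods(selector)
		switch {
		case len(current) < desired:
			// A pod vanished (host died, was killed, etc.): replace it.
			for i := len(current); i < desired; i++ {
				createPod(template)
			}
		case len(current) > desired:
			// Too many (e.g. the replica count was lowered): trim the excess.
			for _, p := range current[desired:] {
				deletePod(p)
			}
		}
		time.Sleep(10 * time.Second) // re-check on an interval (and ideally on watch events)
	}
}

func main() {
	reconcile(map[string]string{"name": "frontend"}, Pod{}, 3)
}
```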
@smarterclayton Re. automatic port assignment for the existing service approach: Agree. See #390.
@bgrant0607 OK, I find that pretty convincing. Absolutely, definitely do not want "backup" pods to ever be a thing in k8s. It does mean that you basically always have to make a service & rep. controller even for a single pod, and we'll need to make service names DNS-y--it's no longer critical that pod names are DNS-y--which I think is a change from what @thockin and @smarterclayton were thinking when they hashed out the pod naming system. (I think, in this case, apiserver should offer some DNS-like pod lookup for debugging purposes, but not for production purposes.)
To your last point, I think someone floated the idea of a "dns-service" that does nothing except expose the pod IPs of a pod label query; that would be just as useful for availability as a proxy, as long as rate_of_pod_failure * number_of_pods > average_latency_of_dns_propagate, especially for integrations with clients that already do some variation of this. But in that case the pod name still doesn't have to be DNS-y, and our previous pains to make names resolvable were in vain.
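A sketch of how little that "dns-service" would need to do (names invented for illustration): select pods by label, return their IPs, and let a DNS frontend serve those as A records.

```go
package main

import (
	"fmt"
	"net"
)

// PodInfo is a stand-in for whatever the apiserver returns for a pod.
type PodInfo struct {
	Labels map[string]string
	IP     net.IP
}

// podIPsForSelector is the whole substance of the proposed dns-service:
// no proxy, no stable VIP, just the current pod IPs behind a label query.
func podIPsForSelector(pods []PodInfo, selector map[string]string) []net.IP {
	var ips []net.IP
	for _, p := range pods {
		matches := true
		for k, v := range selector {
			if p.Labels[k] != v {
				matches = false
				break
			}
		}
		if matches {
			ips = append(ips, p.IP)
		}
	}
	return ips
}

func main() {
	pods := []PodInfo{
		{Labels: map[string]string{"name": "frontend"}, IP: net.ParseIP("10.1.0.4")},
		{Labels: map[string]string{"name": "frontend"}, IP: net.ParseIP("10.1.3.7")},
		{Labels: map[string]string{"name": "backend"}, IP: net.ParseIP("10.1.2.9")},
	}
	for _, ip := range podIPsForSelector(pods, map[string]string{"name": "frontend"}) {
		fmt.Println("A record ->", ip)
	}
}
```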
While working on #170 I was thinking that it would be more consistent if we got rid of Pods as a create-able API object, and just had PodTemplates and ReplicationControllers. It sounds like that is in line with what @bgrant0607 said above. So, then, if you can't create a pod, can you at least list them? Or is the Binding object that is being added in #592 sufficient to represent a (potentially) running pod?
I guess we need to have a Pod object for the scheduler to find and Bind. But it would be nice to remove the duplication of data between a Pod and a PodTemplate (in the context of #170).
If the duplication is removed, doesn't that imply that a PodTemplate cannot be removed from the system until all dependent pods are also deleted? And that a user cannot prevent templates from being used until that happens? Seems fraught with coupling issues to me (even with immutable templates; and if they're immutable, how do you change who can create them?).
Good points Clayton. Also saw that Brian made the same point on #170.
Hmm, I will have to digest this. It sounds good, but it does change some of the model. It feels unfortunate that we say a replication controller becomes required even for a single pod. I don't think we should waive the name format requirements at lower levels.
SGTM |
Maybe I misread 170 and above, but I didn't see a change being proposed to the model after Eric's retraction. I got the following:
Or did I miss something?
@thockin @smarterclayton @erictune This is more the topic of #170, but, since the discussion is here...

I wouldn't be opposed to a simple singleton replicationController -- like a regenerating Pod. It would be nice if it could function anywhere a naked pod could.

If we wanted to get really slick, we could make replicationController capable of making replicas of anything. It just needs a URL to post to in order to create new replicas, a label selector to figure out how many replicas exist, and maybe some kind of health/liveness/readiness check for dynamic filtering of instances that aren't fully functional at the moment.

This would make our primitives more composable. For instance, if/when we were to add a MigratablePod, the replicationController could control a set of replicas of them. Maybe we could even have a replicationController control a set of replicationControllers. Once we added config generators, a replication controller could replicate replicationController/service pairs.
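A hedged sketch of that "replicate anything" shape (all types and helpers below are hypothetical): the controller carries only a create URL, an opaque template, a label selector, and an optional readiness filter, so whether it replicates pods, MigratablePods, or other replicationControllers is invisible to it.

```go
package main

import (
	"bytes"
	"net/http"
)

// GenericReplicator knows nothing about what it replicates: just where to
// POST to make one more, how to count the ones that exist, and (optionally)
// how to decide whether an existing one is actually ready.
type GenericReplicator struct {
	CreateURL string            // endpoint to POST a new replica to
	Template  []byte            // opaque JSON body for that POST
	Selector  map[string]string // label query that counts existing replicas
	Desired   int
	Ready     func(labels map[string]string) bool // optional readiness filter
}

// countReady is assumed to query the apiserver with Selector and apply
// Ready; stubbed out here to keep the sketch self-contained.
func (r *GenericReplicator) countReady() int { return 0 }

// Reconcile closes the gap between desired and observed replicas.
func (r *GenericReplicator) Reconcile() error {
	for i := r.countReady(); i < r.Desired; i++ {
		resp, err := http.Post(r.CreateURL, "application/json", bytes.NewReader(r.Template))
		if err != nil {
			return err
		}
		resp.Body.Close()
	}
	return nil
}

func main() {
	r := &GenericReplicator{
		CreateURL: "http://localhost:8080/api/v1beta1/pods", // could equally point at another controller's endpoint
		Template:  []byte(`{}`),
		Selector:  map[string]string{"name": "frontend"},
		Desired:   3,
	}
	_ = r.Reconcile()
}
```

Replicating replicationControllers, or replicationController/service pairs, would then just be a matter of pointing CreateURL and Template at those objects instead.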
Closing in favor of #1261 |