Kubernetes News

The Kubernetes project blog
  1. When most people think of contributing to an open source project, I suspect they probably think of contributing code changes, new features, and bug fixes. As a software engineer and a long-time open source user and contributor, that's certainly what I thought. Although I have written a good quantity of documentation in different workflows, the massive size of the Kubernetes community was a new kind of "client." I just didn't know what to expect when Google asked my compatriots and me at Lion's Way to make much-needed updates to the Kubernetes Development Guide.

    This article originally appeared on the Kubernetes Contributor Community blog.

    The Delights of Working With a Community

    As professional writers, we are used to being hired to write very specific pieces. We specialize in marketing, training, and documentation for technical services and products, which can range anywhere from relatively fluffy marketing emails to deeply technical white papers targeted at IT and developers. With this kind of professional service, every deliverable tends to have a measurable return on investment. I knew this metric wouldn't be present when working on open source documentation, but I couldn't predict how it would change my relationship with the project.

    One of the primary traits of the relationship between our writing and our traditional clients is that we always have one or two primary points of contact inside a company. These contacts are responsible for reviewing our writing and making sure it matches the voice of the company and targets the audience they're looking for. It can be stressful -- which is why I'm so glad that my writing partner, eagle-eyed reviewer, and bloodthirsty editor Joel handles most of the client contact.

    I was surprised and delighted that all of the stress of client contact went out the window when working with the Kubernetes community.

    "How delicate do I have to be? What if I screw up? What if I make a developer angry? What if I make enemies?" These were all questions that raced through my mind and made me feel like I was approaching a field of eggshells when I first joined the #sig-contribex channel on the Kubernetes Slack and announced that I would be working on the Development Guide.

    "The Kubernetes Code of Conduct is in effect, so please be excellent to each other." — Jorge Castro, SIG ContribEx co-chair

    My fears were unfounded. Immediately, I felt welcome. I like to think this isn't just because I was working on a much-needed task, but rather because the Kubernetes community is filled with friendly, welcoming people. During the weekly SIG ContribEx meetings, our reports on progress with the Development Guide were included immediately. In addition, the leader of the meeting would always stress that the Kubernetes Code of Conduct was in effect, and that we should, like Bill and Ted, be excellent to each other.

    This Doesn't Mean It's All Easy

    The Development Guide needed a pretty serious overhaul. When we got our hands on it, it was already packed with information and lots of steps for new developers to go through, but it was getting dusty with age and neglect. Documentation can really require a global look, not just point fixes. As a result, I ended up submitting a gargantuan pull request to the Community repo: 267 additions and 88 deletions.

    The life cycle of a pull request requires a certain number of Kubernetes organization members to review and approve changes before they can be merged. This is a great practice, as it keeps both documentation and code in pretty good shape, but it can be tough to cajole the right people into taking the time for such a hefty review. As a result, that massive PR took 26 days from my first submission to final merge. But in the end, it was successful.

    Since Kubernetes is a pretty fast-moving project, and since developers typically aren't really excited about writing documentation, I also ran into the problem that sometimes, the secret jewels that describe the workings of a Kubernetes subsystem are buried deep within the labyrinthine mind of a brilliant engineer, and not in plain English in a Markdown file. I ran headlong into this issue when it came time to update the getting started documentation for end-to-end (e2e) testing.

    This portion of my journey took me out of documentation-writing territory and into the role of a brand new user of some unfinished software. I ended up working with one of the developers of the new kubetest2 framework to document the latest process of getting up-and-running for e2e testing, but it required a lot of head scratching on my part. You can judge the results for yourself by checking out my completed pull request.

    Nobody Is the Boss, and Everybody Gives Feedback

    But while I secretly expected chaos, the process of contributing to the Kubernetes Development Guide and interacting with the amazing Kubernetes community went incredibly smoothly. There was no contention. I made no enemies. Everybody was incredibly friendly and welcoming. It was enjoyable.

    With an open source project, there is no one boss. The Kubernetes project, which approaches being gargantuan, is split into many different special interest groups (SIGs), working groups, and communities. Each has its own regularly scheduled meetings, assigned duties, and elected chairpersons. My work intersected with the efforts of both SIG ContribEx (who watch over and seek to improve the contributor experience) and SIG Testing (who are in charge of testing). Both of these SIGs proved easy to work with, eager for contributions, and populated with incredibly friendly and welcoming people.

    In an active, living project like Kubernetes, documentation continues to need maintenance, revision, and testing alongside the code base. The Development Guide will continue to be crucial to onboarding new contributors to the Kubernetes code base, and as our efforts have shown, it is important that this guide keeps pace with the evolution of the Kubernetes project.

    Joel and I really enjoy interacting with the Kubernetes community and contributing to the Development Guide. I look forward not only to contributing more, but also to continuing to build the new friendships I've made in this vast open source community over the past few months.

  2. Author: Somtochi Onyekwere


    Google Summer of Code is a global program that is geared towards introducing students to open source. Students are matched with open-source organizations to work with them for three months during the summer.

    My name is Somtochi Onyekwere, and I am from the Federal University of Technology, Owerri (Nigeria). This year, I was given the opportunity to work with Kubernetes (under the CNCF organization), which led to an amazing summer spent learning, contributing, and interacting with the community.

    Specifically, I worked on the Cluster Addons: Package all the things! project. The project focused on building operators for better management of various cluster addons, extending the tooling for building these operators and making the creation of these operators a smooth process.


    Kubernetes has progressed greatly in the past few years with a flourishing community and a large number of contributors. The codebase is gradually moving away from the monolith structure where all the code resides in the kubernetes/kubernetes repository to being split into multiple sub-projects. Part of the focus of cluster-addons is to make some of these sub-projects work together in an easy to assemble, self-monitoring, self-healing and Kubernetes-native way. It enables them to work seamlessly without human intervention.

    The community is exploring the use of operators as a mechanism to monitor various resources in the cluster and properly manage these resources. In addition, operators provide self-healing, and they are a Kubernetes-native pattern that can encode how best these addons work and manage them properly.

    What are cluster addons? Cluster addons are a collection of resources (like Services and Deployments) that are used to give a Kubernetes cluster additional functionality. They range from things as simple as the Kubernetes dashboard (for visualization) to more complex ones like Calico (for networking). These addons are essential to different applications running in the cluster and to the cluster itself. The addon operator provides a nicer way of managing these addons and understanding the health and status of the various resources that comprise the addon. You can get a deeper overview in this article.

    Operators are custom controllers with custom resource definitions that encode application-specific knowledge and are used for managing complex stateful applications. It is a widely accepted pattern. Managing addons via operators, with these operators encoding knowledge of how best the addons work, introduces a lot of advantages while setting standards that will be easy to follow and scale. This article does a good job of explaining operators.

    The addon operators can solve a lot of problems, but they have their challenges. Those under the cluster-addons project had missing pieces and were still a proof of concept. Generating the RBAC configuration for the operators was a pain, and sometimes the operators were given too much privilege. The operators weren’t very extensible, as they only pulled manifests from local filesystems or HTTP(S) servers, and a lot of simple addons were generating the same code. I spent the summer working on these issues, looking at them with fresh eyes and coming up with solutions for both the known and unknown issues.

    Various additions to kubebuilder-declarative-pattern

    The kubebuilder-declarative-pattern (from here on referred to as KDP) repo is an extra layer of addon-specific tooling on top of the kubebuilder SDK. It is enabled by passing the experimental --pattern=addon flag to the kubebuilder create command. Together, they create the base code for the addon operator. During the internship, I worked on a couple of features in KDP and cluster-addons.

    Operator version checking

    Version checks make upgrading and downgrading between versions of an addon safer, even when the operator has complex logic. They match the version of an addon to a version of the operator that knows how to manage it well. Most addons have several versions, and these versions might need to be managed differently. This feature reads the addons.k8s.io/min-operator-version annotation on the custom resource, which states the minimum operator version needed to manage that version of the addon, and compares it against the version of the running operator. If the operator version is below the required minimum, the operator pauses with an error telling the user that the version of the operator is too low. This helps to ensure that the correct operator is being used for the addon.
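    The comparison itself is small enough to sketch. The following is only an illustration, not the code in KDP: the function names are hypothetical, and it assumes versions are plain dotted integers such as 1.2.0, with no "v" prefix or pre-release suffix.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Annotation named in the post; everything else in this sketch is illustrative.
const minOperatorVersionAnnotation = "addons.k8s.io/min-operator-version"

// parseVersion turns "1.2.0" into [1 2 0].
// Assumption: plain dotted integers, no "v" prefix or pre-release suffix.
func parseVersion(v string) ([]int, error) {
	parts := strings.Split(v, ".")
	nums := make([]int, 0, len(parts))
	for _, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return nil, fmt.Errorf("invalid version %q: %w", v, err)
		}
		nums = append(nums, n)
	}
	return nums, nil
}

// less reports whether version a is strictly lower than version b,
// treating missing components as zero.
func less(a, b []int) bool {
	for i := 0; i < len(a) || i < len(b); i++ {
		var x, y int
		if i < len(a) {
			x = a[i]
		}
		if i < len(b) {
			y = b[i]
		}
		if x != y {
			return x < y
		}
	}
	return false
}

// checkOperatorVersion returns an error when the operator is older than the
// minimum version declared on the custom resource.
func checkOperatorVersion(annotations map[string]string, operatorVersion string) error {
	required, ok := annotations[minOperatorVersionAnnotation]
	if !ok {
		return nil // the resource declares no minimum
	}
	want, err := parseVersion(required)
	if err != nil {
		return err
	}
	have, err := parseVersion(operatorVersion)
	if err != nil {
		return err
	}
	if less(have, want) {
		return fmt.Errorf("operator version %s is below the required minimum %s", operatorVersion, required)
	}
	return nil
}

func main() {
	annotations := map[string]string{minOperatorVersionAnnotation: "1.2.0"}
	fmt.Println(checkOperatorVersion(annotations, "1.1.9"))  // too old: an error
	fmt.Println(checkOperatorVersion(annotations, "1.10.0")) // new enough: no error
}
```

    Comparing component by component as integers matters here: a plain string comparison would wrongly rank 1.10.0 below 1.2.0.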

    Git repository for storing the manifests

    Previously, manifests could only be stored in local file directories or on HTTP(S) servers. During the internship, I extended this to include Git repositories. Giving creators of addon operators the ability to store manifests in a Git repository enables faster development and version control. When starting the controller, you can pass a flag to specify the location of your channels directory. The channels directory contains the manifests for different versions; the controller pulls the manifest from this directory and applies it to the cluster.

    Annotations to temporarily disable reconciliation

    The reconciliation loop that ensures that the desired state matches the actual state prevents modification of objects in the cluster. This makes it hard to experiment or investigate what might be wrong in the cluster as any changes made are promptly reverted. I resolved this by allowing users to place an addons.k8s.io/ignore annotation on the resource that they don’t want the controller to reconcile. The controller checks for this annotation and doesn’t reconcile that object. To resume reconciliation, the annotation can be removed from the resource.
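    As a rough sketch of that check (not the actual KDP code), a controller could skip annotated objects like this. The object type and function names are hypothetical stand-ins, and the sketch assumes that the mere presence of the annotation, whatever its value, disables reconciliation.

```go
package main

import "fmt"

// Annotation named in the post.
const ignoreAnnotation = "addons.k8s.io/ignore"

// object is a hypothetical stand-in for a Kubernetes object,
// reduced to the fields this sketch needs.
type object struct {
	Name        string
	Annotations map[string]string
}

// shouldReconcile reports whether the controller may touch the object.
// Assumption: any value of the annotation counts as "ignore".
func shouldReconcile(o object) bool {
	_, ignored := o.Annotations[ignoreAnnotation]
	return !ignored
}

// reconcileAll returns the names of the objects that were reconciled.
func reconcileAll(objs []object) []string {
	var reconciled []string
	for _, o := range objs {
		if !shouldReconcile(o) {
			continue // leave manual changes to this object in place
		}
		// A real controller would apply the desired state here.
		reconciled = append(reconciled, o.Name)
	}
	return reconciled
}

func main() {
	objs := []object{
		{Name: "dashboard"},
		{Name: "coredns", Annotations: map[string]string{ignoreAnnotation: "true"}},
	}
	fmt.Println(reconcileAll(objs))
}
```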

    Unstructured support in kubebuilder-declarative-pattern

    One of the operators that I worked on is a generic controller that can manage more than one cluster addon, as long as the addons do not require extra configuration. To do this, the operator couldn’t use a particular type and needed kubebuilder-declarative-pattern to support the unstructured.Unstructured type. Various functions in kubebuilder-declarative-pattern couldn’t handle this type and returned an error if the object passed in was not of type addonsv1alpha1.CommonObject. The functions were modified to handle both unstructured.Unstructured and addonsv1alpha1.CommonObject.

    Tools and CLI programs

    There were also some command-line programs I wrote that could be used to make working with addon operators easier. Most of them have uses outside the addon operators as they try to solve a specific problem that could surface anywhere while working with Kubernetes. I encourage you to check them out when you have the chance!

    RBAC Generator

    One of the biggest concerns with the operator was RBAC. You had to look through the manifest manually and add an RBAC rule for each resource, because the operator needs RBAC permissions to create, get, update, and delete the resources in the manifest when running in-cluster. The RBAC generator automates the process of writing the RBAC roles and role bindings. Its function is simple: it accepts the file name of the manifest as a flag, parses the manifest to get the API group and resource name of each resource, and adds them to a role. It writes the role and role binding to stdout, or to a file if the --out flag is passed.

    Additionally, the tool enables you to split the RBAC by separating the cluster roles in the manifest. This lessened the security concern of an operator being over-privileged as it needed to have all the permissions that the clusterrole has. If you want to apply the clusterrole yourself and not give the operator these permissions, you can pass in a --supervisory boolean flag so that the generator does not add these permissions to the role. The CLI program resides here.
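    The core step, collecting the API group and resource name of each object and turning them into role rules, can be sketched as follows. This is not the generator's real code: the types are hypothetical simplifications, and the manifest is assumed to have already been parsed into (group, resource) pairs.

```go
package main

import (
	"fmt"
	"sort"
)

// manifestObject is a hypothetical, already-parsed entry from a manifest.
type manifestObject struct {
	APIGroup string // "" is the core API group
	Resource string // plural resource name, e.g. "deployments"
}

// policyRule is a simplified stand-in for an RBAC rule.
type policyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// rulesFor builds one rule per API group, listing each resource once and
// granting the four verbs the post says the operator needs.
func rulesFor(objs []manifestObject) []policyRule {
	byGroup := map[string]map[string]bool{}
	for _, o := range objs {
		if byGroup[o.APIGroup] == nil {
			byGroup[o.APIGroup] = map[string]bool{}
		}
		byGroup[o.APIGroup][o.Resource] = true
	}

	groups := make([]string, 0, len(byGroup))
	for g := range byGroup {
		groups = append(groups, g)
	}
	sort.Strings(groups) // deterministic output

	rules := make([]policyRule, 0, len(groups))
	for _, g := range groups {
		resources := make([]string, 0, len(byGroup[g]))
		for r := range byGroup[g] {
			resources = append(resources, r)
		}
		sort.Strings(resources)
		rules = append(rules, policyRule{
			APIGroups: []string{g},
			Resources: resources,
			Verbs:     []string{"create", "get", "update", "delete"},
		})
	}
	return rules
}

func main() {
	objs := []manifestObject{
		{APIGroup: "apps", Resource: "deployments"},
		{APIGroup: "", Resource: "services"},
		{APIGroup: "", Resource: "configmaps"},
	}
	for _, r := range rulesFor(objs) {
		fmt.Printf("%+v\n", r)
	}
}
```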

    Kubectl Ownerref

    It is hard to find out at a glance which objects were created by an addon custom resource. This kubectl plugin alleviates that pain by displaying all the objects in the cluster that a resource has ownerrefs on. You simply pass the kind and the name of the resource as arguments to the program, and it checks the cluster for the objects and gives the kind, name, and namespace of each such object. It can be useful for getting a general overview of all the objects that the controller is reconciling: pass in the name and kind of the custom resource. The CLI program resides here.

    Addon Operators

    To fully understand addon operators and make changes to how they are being created, you have to try creating and using them. Part of the summer was spent building operators for some popular addons like the Kubernetes dashboard, flannel, NodeLocalDNS and so on. Please check the cluster-addons repository for the different addon operators. In this section, I will just highlight one that is a little different from the others.

    Generic Controller

    The generic controller can be shared between addons that don’t require much configuration. This minimizes resource consumption on the cluster, as it reduces the number of controllers that need to be run. Also, instead of building your own operator, you can just use the generic controller; whenever your needs grow and you need a more complex operator, you can scaffold the code with kubebuilder and continue from where the generic operator stopped. To use the generic controller, you can generate the CustomResourceDefinition (CRD) using this tool (generic-addon). You pass in the kind, group, and the location of your channels directory (it could be a Git repository too!). The tool generates the CRD, the RBAC manifests, and two custom resources for you.

    The process is as follows:

    This tool creates:

    1. The CRD for your addon
    2. The RBAC rules for the CustomResourceDefinitions
    3. The RBAC rules for applying the manifests
    4. The custom resource for your addon
    5. A Generic custom resource

    The Generic custom resource looks like this:


    Apply these manifests, making sure to apply the CRD before the CR. Then, run the Generic controller, either on your machine or in-cluster.

    If you are interested in building an operator, please check out this guide.

    Further Work

    A lot of work was definitely done on the cluster addons during the GSoC period. But we need more people building operators and using them in the cluster. We need wider adoption in the community. Build operators for your favourite addons and tell us how it went and if you had any issues. Check out this README.md to get started.


    I really want to thank my mentors Justin Santa Barbara (Google) and Leigh Capili (Weaveworks). My internship was awesome because they were awesome. They set a golden standard for what mentorship should be. They were accessible and always available to clear any confusion. I think what I liked best was that they didn’t just dish out tasks; instead, we had open discussions about what was wrong and what could be improved. They are really the best and I hope I get to work with them again! Also, I want to say a huge thanks to Lubomir I. Ivanov for reviewing this blog post!


    So far I have learnt a lot about Go, the internals of Kubernetes, and operators. I want to conclude by encouraging people to contribute to open-source (especially Kubernetes :)) regardless of your level of experience. It has been a well-rounded experience for me and I have come to love the community. It is a great initiative and it is a great way to learn and meet awesome people. Special shoutout to Google for organizing this program.

    If you are interested in cluster addons and want to find out more about addon operators, you are welcome to join the #cluster-addons channel on the Kubernetes Slack.

    Somtochi Onyekwere is a software engineer who loves contributing to open-source and exploring cloud native solutions.

  3. Authors: Marek Siarkowicz (Google), Nathan Beach (Google)

    Logs are an essential aspect of observability and a critical tool for debugging. But Kubernetes logs have traditionally been unstructured strings, making any automated parsing difficult and any downstream processing, analysis, or querying challenging to do reliably.

    In Kubernetes 1.19, we are adding support for structured logs, which natively support (key, value) pairs and object references. We have also updated many logging calls, so that over 99% of logging volume in a typical deployment is now migrated to the structured format.

    To maintain backwards compatibility, structured logs will still be output as a string, where the string contains representations of those "key"="value" pairs. Starting in alpha in 1.19, logs can also be output in JSON format using the --logging-format=json flag.

    Using Structured Logs

    We've added two new methods to the klog library: InfoS and ErrorS. For example, this invocation of InfoS:

    klog.InfoS("Pod status updated", "pod", klog.KObj(pod), "status", status)

    will result in this log:

    I1025 00:15:15.525108 1 controller_utils.go:116] "Pod status updated" pod="kube-system/kubedns" status="ready"

    Or, if the --logging-format=json flag is set, it will result in this output:

    {
      "ts": 1580306777.04728,
      "msg": "Pod status updated",
      "pod": {
        "name": "coredns",
        "namespace": "kube-system"
      },
      "status": "ready"
    }

    This means downstream logging tools can easily ingest structured logging data instead of using regular expressions to parse unstructured strings. This also makes processing logs easier, querying logs more robust, and analyzing logs much faster.

    With structured logs, all references to Kubernetes objects are structured the same way, so you can filter the output down to only the log entries referencing a particular pod. You can also find logs indicating how the scheduler was scheduling the pod, how the pod was created, the health probes of the pod, and all other changes in the lifecycle of the pod.

    Suppose you are debugging an issue with a pod. With structured logs, you can filter to only those log entries referencing the pod of interest, rather than needing to scan through potentially thousands of log lines to find the relevant ones.
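    As an illustration of that kind of filtering, here is a minimal sketch that reads JSON log lines and keeps only those referencing one pod. It assumes one JSON object per line with the "msg" and "pod" fields shown in the example above. Only the first sample entry comes from that example; the other two entries and the function names are made up for this sketch.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// sampleLogs holds one JSON log entry per line. The first entry matches the
// example in this post; the other two are invented for the sketch.
const sampleLogs = `{"ts":1580306777.04728,"msg":"Pod status updated","pod":{"name":"coredns","namespace":"kube-system"},"status":"ready"}
{"ts":1580306778.1,"msg":"Pod status updated","pod":{"name":"web","namespace":"default"},"status":"pending"}
{"ts":1580306779.2,"msg":"Probe succeeded","pod":{"name":"coredns","namespace":"kube-system"},"probe":"readiness"}`

// podRef mirrors the "pod" object reference in a structured log entry.
type podRef struct {
	Name      string `json:"name"`
	Namespace string `json:"namespace"`
}

// logEntry decodes only the fields this sketch needs.
type logEntry struct {
	Msg string  `json:"msg"`
	Pod *podRef `json:"pod"`
}

// filterByPod returns the messages of all entries that reference the pod.
// Lines that are not valid JSON objects of the expected shape are skipped.
func filterByPod(logs, namespace, name string) []string {
	var msgs []string
	sc := bufio.NewScanner(strings.NewReader(logs))
	for sc.Scan() {
		var e logEntry
		if err := json.Unmarshal(sc.Bytes(), &e); err != nil {
			continue
		}
		if e.Pod != nil && e.Pod.Namespace == namespace && e.Pod.Name == name {
			msgs = append(msgs, e.Msg)
		}
	}
	return msgs
}

func main() {
	for _, m := range filterByPod(sampleLogs, "kube-system", "coredns") {
		fmt.Println(m)
	}
}
```

    No regular expressions are involved: the pod reference is compared field by field, which is what makes this kind of query robust.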

    Not only are structured logs more useful when manually debugging issues, they also enable richer features like automated pattern recognition within logs or tighter correlation of log and trace data.

    Finally, structured logs can help reduce storage costs for logs because most storage systems are more efficiently able to compress structured key=value data than unstructured strings.

    Get Involved

    While we have updated over 99% of the log entries by log volume in a typical deployment, there are still thousands of logs to be updated. Pick a file or directory that you would like to improve and migrate existing log calls to use structured logs. It's a great and easy way to make your first contribution to Kubernetes!

  4. Author: Jordan Liggitt (Google)

    As Kubernetes maintainers, we're always looking for ways to improve usability while preserving compatibility. As we develop features, triage bugs, and answer support questions, we accumulate information that would be helpful for Kubernetes users to know. In the past, sharing that information was limited to out-of-band methods like release notes, announcement emails, documentation, and blog posts. Unless someone knew to seek out that information and managed to find it, they would not benefit from it.

    In Kubernetes v1.19, we added a feature that allows the Kubernetes API server to send warnings to API clients. The warning is sent using a standard Warning response header, so it does not change the status code or response body in any way. This allows the server to send warnings easily readable by any API client, while remaining compatible with previous client versions.

    Warnings are surfaced by kubectl v1.19+ in stderr output, and by the k8s.io/client-go client library v0.19.0+ in log output. The k8s.io/client-go behavior can be overridden per-process or per-client.
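    To make the mechanism concrete, here is a small in-process sketch of a server attaching a Warning header and a client reading it back, without the status code or body changing. It is not client-go code. The handler and function names are hypothetical, and the header value is assumed to have the form of warn-code 299, a "-" agent, and a quoted message.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
	"strings"
)

// deprecationText is the message quoted later in this post.
const deprecationText = "networking.k8s.io/v1beta1 Ingress is deprecated in v1.19+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress"

// handler stands in for an API server: it answers normally and attaches a warning.
// Assumption: the value has the form `299 - "message"`.
func handler(w http.ResponseWriter, r *http.Request) {
	w.Header().Add("Warning", fmt.Sprintf("299 - %q", deprecationText))
	w.WriteHeader(http.StatusOK)
	fmt.Fprint(w, `{"kind":"IngressList"}`)
}

// parseWarning extracts the code and message from a `code agent "message"` value.
// It ignores the optional trailing date that the header format allows.
func parseWarning(v string) (int, string, error) {
	parts := strings.SplitN(v, " ", 3)
	if len(parts) != 3 {
		return 0, "", fmt.Errorf("malformed warning %q", v)
	}
	code, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, "", err
	}
	text, err := strconv.Unquote(parts[2])
	if err != nil {
		return 0, "", err
	}
	return code, text, nil
}

// fetchWarnings calls the handler in-process and returns the response status
// together with the text of every warning it could parse.
func fetchWarnings() (int, []string) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/apis/networking.k8s.io/v1beta1/ingresses", nil)
	handler(rec, req)
	resp := rec.Result()
	defer resp.Body.Close()

	var texts []string
	for _, v := range resp.Header.Values("Warning") {
		if _, text, err := parseWarning(v); err == nil {
			texts = append(texts, text)
		}
	}
	return resp.StatusCode, texts
}

func main() {
	status, warnings := fetchWarnings()
	fmt.Println("status:", status)
	for _, w := range warnings {
		fmt.Println("Warning:", w)
	}
}
```

    A client that knows nothing about the header simply never reads it, which is why older clients keep working unchanged.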

    Deprecation Warnings

    The first way we are using this new capability is to send warnings for use of deprecated APIs.

    Kubernetes is a big, fast-moving project. Keeping up with the changes in each release can be daunting, even for people who work on the project full-time. One important type of change is API deprecations. As APIs in Kubernetes graduate to GA versions, pre-release API versions are deprecated and eventually removed.

    Even though there is an extended deprecation period, and deprecations are included in release notes, they can still be hard to track. During the deprecation period, the pre-release API remains functional, allowing several releases to transition to the stable API version. However, we have found that users often don't even realize they are depending on a deprecated API version until they upgrade to the release that stops serving it.

    Starting in v1.19, whenever a request is made to a deprecated REST API, a warning is returned along with the API response. This warning includes details about the release in which the API will no longer be available, and the replacement API version.

    Because the warning originates at the server, and is intercepted at the client level, it works for all kubectl commands, including high-level commands like kubectl apply, and low-level commands like kubectl get --raw:

    [Image: kubectl applying a manifest file, then displaying the warning message 'networking.k8s.io/v1beta1 Ingress is deprecated in v1.19+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress'.]

    This helps people affected by the deprecation to know the request they are making is deprecated, how long they have to address the issue, and what API they should use instead. This is especially helpful when the user is applying a manifest they didn't create, so they have time to reach out to the authors to ask for an updated version.

    We also realized that the person using a deprecated API is often not the same person responsible for upgrading the cluster, so we added two administrator-facing tools to help track use of deprecated APIs and determine when upgrades are safe.

    Metrics


    Starting in Kubernetes v1.19, when a request is made to a deprecated REST API endpoint, an apiserver_requested_deprecated_apis gauge metric is set to 1 in the kube-apiserver process. This metric has labels for the API group, version, resource, and subresource, and a removed_release label that indicates the Kubernetes release in which the API will no longer be served.

    This is an example query using kubectl, prom2json, and jq to determine which deprecated APIs have been requested from the current instance of the API server:

    kubectl get --raw /metrics | prom2json | jq '
      .[] | select(.name=="apiserver_requested_deprecated_apis").metrics[].labels
    '

    {
      "group": "extensions",
      "removed_release": "1.22",
      "resource": "ingresses",
      "subresource": "",
      "version": "v1beta1"
    }
    {
      "group": "rbac.authorization.k8s.io",
      "removed_release": "1.22",
      "resource": "clusterroles",
      "subresource": "",
      "version": "v1beta1"
    }

    This shows the deprecated extensions/v1beta1 Ingress and rbac.authorization.k8s.io/v1beta1 ClusterRole APIs have been requested on this server, and will be removed in v1.22.

    We can join that information with the apiserver_request_total metrics to get more details about the requests being made to these APIs:

    kubectl get --raw /metrics | prom2json | jq '
      # set $deprecated to a list of deprecated APIs
      [
        .[] |
        select(.name=="apiserver_requested_deprecated_apis").metrics[].labels |
        {group,version,resource}
      ] as $deprecated
      |
      # select apiserver_request_total metrics which are deprecated
      .[] | select(.name=="apiserver_request_total").metrics[] |
      select(.labels | {group,version,resource} as $key | $deprecated | index($key))
    '

    {
      "labels": {
        "code": "0",
        "component": "apiserver",
        "contentType": "application/vnd.kubernetes.protobuf;stream=watch",
        "dry_run": "",
        "group": "extensions",
        "resource": "ingresses",
        "scope": "cluster",
        "subresource": "",
        "verb": "WATCH",
        "version": "v1beta1"
      },
      "value": "21"
    }
    {
      "labels": {
        "code": "200",
        "component": "apiserver",
        "contentType": "application/vnd.kubernetes.protobuf",
        "dry_run": "",
        "group": "extensions",
        "resource": "ingresses",
        "scope": "cluster",
        "subresource": "",
        "verb": "LIST",
        "version": "v1beta1"
      },
      "value": "1"
    }
    {
      "labels": {
        "code": "200",
        "component": "apiserver",
        "contentType": "application/json",
        "dry_run": "",
        "group": "rbac.authorization.k8s.io",
        "resource": "clusterroles",
        "scope": "cluster",
        "subresource": "",
        "verb": "LIST",
        "version": "v1beta1"
      },
      "value": "1"
    }

    The output shows that only read requests are being made to these APIs, and that most of the requests have been made to watch the deprecated Ingress API.

    You can also find that information through the following Prometheus query, which returns information about requests made to deprecated APIs which will be removed in v1.22:

    apiserver_requested_deprecated_apis{removed_release="1.22"} * on(group,version,resource,subresource)
    group_right() apiserver_request_total

    Audit annotations

    Metrics are a fast way to check whether deprecated APIs are being used, and at what rate, but they don't include enough information to identify particular clients or API objects. Starting in Kubernetes v1.19, audit events for requests to deprecated APIs include an audit annotation of "k8s.io/deprecated":"true". Administrators can use those audit events to identify specific clients or objects that need to be updated.
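    As a sketch of how an administrator might use those events, the following filters a JSON-lines audit log down to the requests carrying that annotation and reports who made them. It assumes one audit event per line with the requestURI, user.username, userAgent, and annotations fields. The sample events, user names, and function names are made up for illustration.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// sampleAudit holds two invented audit events, one per line.
const sampleAudit = `{"requestURI":"/apis/extensions/v1beta1/ingresses","verb":"list","user":{"username":"system:serviceaccount:default:legacy-app"},"userAgent":"legacy-app/1.0","annotations":{"k8s.io/deprecated":"true"}}
{"requestURI":"/apis/networking.k8s.io/v1/ingresses","verb":"list","user":{"username":"alice"},"userAgent":"kubectl/v1.19.0","annotations":{"authorization.k8s.io/decision":"allow"}}`

// auditEvent decodes only the fields this sketch needs.
type auditEvent struct {
	RequestURI string `json:"requestURI"`
	UserAgent  string `json:"userAgent"`
	User       struct {
		Username string `json:"username"`
	} `json:"user"`
	Annotations map[string]string `json:"annotations"`
}

// deprecatedRequests returns the events annotated "k8s.io/deprecated":"true".
// Lines that fail to decode are skipped.
func deprecatedRequests(log string) []auditEvent {
	var hits []auditEvent
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		var e auditEvent
		if err := json.Unmarshal(sc.Bytes(), &e); err != nil {
			continue
		}
		if e.Annotations["k8s.io/deprecated"] == "true" {
			hits = append(hits, e)
		}
	}
	return hits
}

func main() {
	for _, e := range deprecatedRequests(sampleAudit) {
		fmt.Printf("%s (%s) requested %s\n", e.User.Username, e.UserAgent, e.RequestURI)
	}
}
```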

    Custom Resource Definitions

    Along with the API server ability to warn about deprecated API use, starting in v1.19, a CustomResourceDefinition can indicate a particular version of the resource it defines is deprecated. When API requests to a deprecated version of a custom resource are made, a warning message is returned, matching the behavior of built-in APIs.

    The author of the CustomResourceDefinition can also customize the warning for each version if they want to. This allows them to give a pointer to a migration guide or other information if needed.

    - name: v1alpha1
      # This indicates the v1alpha1 version of the custom resource is deprecated.
      # API requests to this version receive a warning in the server response.
      deprecated: true
      # This overrides the default warning returned to clients making v1alpha1 API requests.
      deprecationWarning: "example.com/v1alpha1 CronTab is deprecated; use example.com/v1 CronTab (see http://example.com/v1alpha1-v1)"
    - name: v1beta1
      # This indicates the v1beta1 version of the custom resource is deprecated.
      # API requests to this version receive a warning in the server response.
      # A default warning message is returned for this version.
      deprecated: true
    - name: v1

    Admission Webhooks

    Admission webhooks are the primary way to integrate custom policies or validation with Kubernetes. Starting in v1.19, admission webhooks can return warning messages that are passed along to the requesting API client. Warnings can be returned with allowed or rejected admission responses.

    As an example, to allow a request but warn about a configuration known not to work well, an admission webhook could send this response:

    {
      "apiVersion": "admission.k8s.io/v1",
      "kind": "AdmissionReview",
      "response": {
        "uid": "<value from request.uid>",
        "allowed": true,
        "warnings": [
          ".spec.memory: requests >1GB do not work on Fridays"
        ]
      }
    }

    If you are implementing a webhook that returns a warning message, here are some tips:

    • Don't include a "Warning:" prefix in the message (that is added by clients on output)
    • Use warning messages to describe problems the client making the API request should correct or be aware of
    • Be brief; limit warnings to 120 characters if possible
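
    A webhook written in Go might build that response and check its own messages against the tips above roughly as follows. This is a sketch only: the struct types are simplified stand-ins for the admission.k8s.io/v1 types, mirroring just the fields in the example response, and the helper names are hypothetical.

```go
package main

import (
	"encoding/json"
	"os"
	"strings"
	"unicode/utf8"
)

// admissionResponse and admissionReview mirror only the fields used in the
// example response; they are simplified stand-ins, not the real API types.
type admissionResponse struct {
	UID      string   `json:"uid"`
	Allowed  bool     `json:"allowed"`
	Warnings []string `json:"warnings,omitempty"`
}

type admissionReview struct {
	APIVersion string             `json:"apiVersion"`
	Kind       string             `json:"kind"`
	Response   *admissionResponse `json:"response"`
}

// allowWithWarnings allows the request while passing warnings to the client.
func allowWithWarnings(uid string, warnings ...string) admissionReview {
	return admissionReview{
		APIVersion: "admission.k8s.io/v1",
		Kind:       "AdmissionReview",
		Response:   &admissionResponse{UID: uid, Allowed: true, Warnings: warnings},
	}
}

// lintWarning returns the tips that a warning message violates.
func lintWarning(w string) []string {
	var problems []string
	if strings.HasPrefix(w, "Warning:") {
		problems = append(problems, `drop the "Warning:" prefix; clients add it on output`)
	}
	if utf8.RuneCountInString(w) > 120 {
		problems = append(problems, "longer than 120 characters")
	}
	return problems
}

func main() {
	review := allowWithWarnings("<value from request.uid>",
		".spec.memory: requests >1GB do not work on Fridays")

	enc := json.NewEncoder(os.Stdout)
	enc.SetEscapeHTML(false) // keep "<" and ">" readable in the output
	enc.SetIndent("", "  ")
	if err := enc.Encode(review); err != nil {
		panic(err)
	}
}
```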

    There are many ways admission webhooks could use this new feature, and I'm looking forward to seeing what people come up with. Here are a couple ideas to get you started:

    • webhook implementations adding a "complain" mode, where they return warnings instead of rejections, to allow trying out a policy to verify it is working as expected before starting to enforce it
    • "lint" or "vet"-style webhooks, inspecting objects and surfacing warnings when best practices are not followed

    Kubectl strict mode

    If you want to be sure you notice deprecations as soon as possible and get a jump start on addressing them, kubectl added a --warnings-as-errors option in v1.19. When invoked with this option, kubectl treats any warnings it receives from the server as errors and exits with a non-zero exit code:

    [Image: kubectl applying a manifest file with a --warnings-as-errors flag, displaying a warning message and exiting with a non-zero exit code.]

    This could be used in a CI job that applies manifests to a current server and must exit with a zero exit code in order for the CI job to succeed.

    Future Possibilities

    Now that we have a way to communicate helpful information to users in context, we're already considering other ways we can use this to improve people's experience with Kubernetes. A couple areas we're looking at next are warning about known problematic values we cannot reject outright for compatibility reasons, and warning about use of deprecated fields or field values (like selectors using beta os/arch node labels, deprecated in v1.14). I'm excited to see progress in this area, continuing to make it easier to use Kubernetes.

    Jordan Liggitt is a software engineer at Google, and helps lead Kubernetes authentication, authorization, and API efforts.

  5. Author: Rob Scott (Google)

    EndpointSlices are an exciting new API that provides a scalable and extensible alternative to the Endpoints API. EndpointSlices track IP addresses, ports, readiness, and topology information for Pods backing a Service.

    In Kubernetes 1.19 this feature is enabled by default with kube-proxy reading from EndpointSlices instead of Endpoints. Although this will mostly be an invisible change, it should result in noticeable scalability improvements in large clusters. It also enables significant new features in future Kubernetes releases like Topology Aware Routing.

    Scalability Limitations of the Endpoints API

    With the Endpoints API, there was only one Endpoints resource for a Service. That meant that it needed to be able to store IP addresses and ports (network endpoints) for every Pod that was backing the corresponding Service. This resulted in huge API resources. To compound this problem, kube-proxy was running on every node and watching for any updates to Endpoints resources. If even a single network endpoint changed in an Endpoints resource, the whole object would have to be sent to each of those instances of kube-proxy.

    A further limitation of the Endpoints API is that it limits the number of network endpoints that can be tracked for a Service. The default size limit for an object stored in etcd is 1.5MB. In some cases that can limit an Endpoints resource to 5,000 Pod IPs. This is not an issue for most users, but it becomes a significant problem for users with Services approaching this size.

    To show just how significant these issues become at scale, it helps to have a simple example. Consider a Service with 5,000 Pods: it might end up with a 1.5MB Endpoints resource. If even a single network endpoint in that list changes, the full Endpoints resource will need to be distributed to each Node in the cluster. This becomes quite an issue in a large cluster with 3,000 Nodes. Each update would involve sending 4.5GB of data (1.5MB Endpoints * 3,000 Nodes) across the cluster. That's nearly enough to fill up a DVD, and it would happen for each Endpoints change. Imagine a rolling update that results in all 5,000 Pods being replaced - that's more than 22TB (or 5,000 DVDs) worth of data transferred.
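
    The arithmetic in this example can be reproduced in a few lines. The sketch uses decimal units (1GB = 1,000MB) and assumes exactly one full Endpoints update per replaced Pod, which is a simplification; the function names are made up for illustration.

```python
ENDPOINTS_SIZE_MB = 1.5   # size of the Endpoints resource in the example
NODES = 3000              # each node runs a kube-proxy watching Endpoints
PODS_REPLACED = 5000      # Pods replaced during the rolling update


def per_update_gb(size_mb: float, nodes: int) -> float:
    """Data sent across the cluster for one Endpoints change, in GB."""
    return size_mb * nodes / 1000


def rolling_update_tb(size_mb: float, nodes: int, updates: int) -> float:
    """Data sent for a series of Endpoints changes, in TB."""
    return per_update_gb(size_mb, nodes) * updates / 1000


print(per_update_gb(ENDPOINTS_SIZE_MB, NODES))                     # 4.5
print(rolling_update_tb(ENDPOINTS_SIZE_MB, NODES, PODS_REPLACED))  # 22.5
```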

    Splitting endpoints up with the EndpointSlice API

    The EndpointSlice API was designed to address this issue with an approach similar to sharding. Instead of tracking all Pod IPs for a Service with a single Endpoints resource, we split them into multiple smaller EndpointSlices.

    Consider an example where a Service is backed by 15 pods. With the Endpoints API, we'd end up with a single Endpoints resource that tracked all of them. If EndpointSlices were configured to store 5 endpoints each, we'd end up with 3 different EndpointSlices.

    By default, EndpointSlices store as many as 100 endpoints each, though this can be configured with the --max-endpoints-per-slice flag on kube-controller-manager.
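
    The sharding idea can be illustrated with a simple chunking function. This is only a sketch of the concept: `split_into_slices` is a hypothetical helper, and the real EndpointSlice controller does not necessarily pack endpoints this evenly, since it also has to decide how to handle later additions and removals.

```python
from typing import List


def split_into_slices(endpoints: List[str], max_per_slice: int = 100) -> List[List[str]]:
    """Split a flat list of endpoint addresses into consecutive chunks."""
    if max_per_slice < 1:
        raise ValueError("max_per_slice must be at least 1")
    return [
        endpoints[i : i + max_per_slice]
        for i in range(0, len(endpoints), max_per_slice)
    ]


# The 15-Pod example with 5 endpoints per slice.
pod_ips = [f"10.0.0.{n}" for n in range(1, 16)]
slices = split_into_slices(pod_ips, max_per_slice=5)
print(len(slices))  # 3
```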

    EndpointSlices provide 10x scalability improvements

    This API dramatically improves networking scalability. Now when a Pod is added or removed, only 1 small EndpointSlice needs to be updated. This difference becomes quite noticeable when hundreds or thousands of Pods are backing a single Service.
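
    For a back-of-the-envelope feel for the difference, the sketch below compares the data sent per change under both APIs. It rests on stated assumptions rather than measurements: it reuses the earlier figures of a 1.5MB Endpoints resource for 5,000 Pods in a 3,000-node cluster, assumes object size is proportional to the number of endpoints, ignores per-object metadata, and assumes a full slice of 100 endpoints.

```python
ENDPOINTS_BYTES = 1_500_000   # 1.5MB Endpoints resource (decimal units)
TOTAL_ENDPOINTS = 5000
NODES = 3000
ENDPOINTS_PER_SLICE = 100     # default slice capacity

# Assumed average size of one endpoint entry, ignoring metadata overhead.
bytes_per_endpoint = ENDPOINTS_BYTES // TOTAL_ENDPOINTS

# One change with the Endpoints API: the whole object goes to every node.
endpoints_update_bytes = ENDPOINTS_BYTES * NODES

# One change with EndpointSlices: only the affected slice goes to every node.
slice_update_bytes = bytes_per_endpoint * ENDPOINTS_PER_SLICE * NODES

print(endpoints_update_bytes // 1_000_000)           # 4500 (MB)
print(slice_update_bytes // 1_000_000)               # 90 (MB)
print(endpoints_update_bytes // slice_update_bytes)  # 50
```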

    Potentially more significant, now that all Pod IPs for a Service don't need to be stored in a single resource, we don't have to worry about the size limit for objects stored in etcd. EndpointSlices have already been used to scale Services beyond 100,000 network endpoints.

    All of this is brought together with some significant performance improvements that have been made in kube-proxy. When using EndpointSlices at scale, significantly less data will be transferred for endpoints updates and kube-proxy should be faster to update iptables or ipvs rules. Beyond that, Services can now scale to at least 10 times beyond any previous limitations.

    EndpointSlices enable new functionality

    Introduced as an alpha feature in Kubernetes v1.16, EndpointSlices were built to enable some exciting new functionality in future Kubernetes releases. This could include dual-stack Services, topology aware routing, and endpoint subsetting.

    Dual-Stack Services are an exciting new feature that has been in development alongside EndpointSlices. They will utilize both IPv4 and IPv6 addresses for Services and rely on the addressType field on EndpointSlices to track these addresses by IP family.

    Topology aware routing will update kube-proxy to prefer routing requests within the same zone or region. This makes use of the topology fields stored for each endpoint in an EndpointSlice. As a further refinement of that, we're exploring the potential of endpoint subsetting. This would allow kube-proxy to only watch a subset of EndpointSlices. For example, this might be combined with topology aware routing so that kube-proxy would only need to watch EndpointSlices containing endpoints within the same zone. This would provide another very significant scalability improvement.
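
    To make these two ideas concrete, here is a hypothetical sketch, not kube-proxy's actual logic. Endpoints are modeled as plain dictionaries with a `topology` map keyed by the well-known `topology.kubernetes.io/zone` label; the function names and the fallback to all endpoints when no local one exists are assumptions made for illustration.

```python
from typing import Dict, List, Optional

ZONE_LABEL = "topology.kubernetes.io/zone"


def endpoint_zone(endpoint: Dict) -> Optional[str]:
    """Read the zone recorded in an endpoint's topology map, if any."""
    return endpoint.get("topology", {}).get(ZONE_LABEL)


def prefer_same_zone(endpoints: List[Dict], local_zone: str) -> List[Dict]:
    """Prefer endpoints in the local zone; fall back to all of them otherwise."""
    local = [e for e in endpoints if endpoint_zone(e) == local_zone]
    return local or list(endpoints)


def slices_to_watch(slices: List[Dict], local_zone: str) -> List[Dict]:
    """Subsetting: keep only slices holding at least one local-zone endpoint."""
    return [
        s for s in slices
        if any(endpoint_zone(e) == local_zone for e in s.get("endpoints", []))
    ]
```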

    What does this mean for the Endpoints API?

    Although the EndpointSlice API provides a newer and more scalable alternative to the Endpoints API, the Endpoints API will continue to be considered generally available and stable. The most significant change planned for the Endpoints API is to begin truncating Endpoints that would otherwise run into scalability issues.

    The Endpoints API is not going away, but many new features will rely on the EndpointSlice API. To take advantage of the new scalability and functionality that EndpointSlices provide, applications that currently consume Endpoints will likely want to consider supporting EndpointSlices in the future.