Best practices for building Kubernetes Operators and stateful apps

2018-10-20 | admin | GoogleCloud

Source: Best practices for building Kubernetes Operators and stateful apps from Google Cloud

Recently, the Kubernetes community has started to add support for running large stateful applications such as databases, analytics, and machine learning. For example, you can use the StatefulSet workload controller to maintain a stable identity for each pod, and use Persistent Volumes to persist data so that it survives a service restart. If your workload depends on local storage, you can use PersistentVolumes with Local SSDs, and you can also use an SSD persistent disk as the boot disk to improve performance for different kinds of workloads.

However, for many advanced use cases such as backup, restore, and high availability, these core Kubernetes primitives may not be sufficient. That’s where Kubernetes Operators come in. They provide a way to extend Kubernetes functionality with application-specific logic using custom resources and custom controllers. With the Operator pattern, you can encode domain knowledge of a specific application into a Kubernetes API extension. You can then create, access, and manage applications with kubectl, just as you do for built-in resources such as Pods.

At Google Cloud, we’ve used Operators to better support different applications on Kubernetes. For example, we have Operators for running and managing Spark and Airflow applications in a Kubernetes-native way. We’ve also made these Operators available on the GCP Marketplace for an easy click-to-deploy experience. The Spark Operator automatically runs spark-submit on behalf of users, provides cron support for running Spark jobs on a schedule, supports automatic application restarts and retries, and enables mounting data from local Hadoop configuration as well as Google Cloud Storage. The Airflow Operator creates and manages the necessary Kubernetes resources for an Airflow deployment and supports the creation of Airflow schedulers with different Executors.

As developers, we learned a lot building these Operators. If you’re writing your own operator to manage a Kubernetes application, here are some best practices we recommend.

1. Develop one Operator per application

An Operator can automate various features of an application, but it should be specific to a single application. For example, Airflow is normally used with MySQL and Redis. You should develop an operator for each application (i.e., three operators) rather than a single operator that covers all three of them. This provides better separation of concerns with respect to domain expertise of each application.

2. Use an SDK like Kubebuilder

Kubebuilder is a comprehensive development kit for building and publishing Kubernetes APIs and Controllers using CRDs. With Kubebuilder, you can write Operators without having to learn all the low-level details of how the Kubernetes libraries are implemented. To learn more, check out the Kubebuilder book.

3. Use declarative APIs

Design declarative APIs for operators, not imperative APIs. This aligns well with Kubernetes APIs that are declarative in nature. With declarative APIs, users only need to express their desired cluster state, while letting the operator perform all necessary steps to achieve it. With imperative APIs, in contrast, users must specify clearly and in order what steps to perform to achieve the desired state.
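As a rough illustration of the declarative style, the sketch below lets the user state only a desired replica count, and the operator works out the steps. The names `DesiredState` and `plan_actions`, and the `pod-N` naming scheme, are hypothetical and not part of any Kubernetes API; the sketch also assumes running pods are named with contiguous ordinals starting at 0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesiredState:
    """What the user declares: only the end result, not the steps."""
    replicas: int


def plan_actions(desired: DesiredState, running: list[str]) -> list[str]:
    """Return the steps that move the observed state to the desired state.

    `running` is assumed to hold pod names in ordinal order,
    e.g. ["pod-0", "pod-1"].
    """
    actions = []
    # Scale up: create pods for the ordinals that are missing.
    for ordinal in range(len(running), desired.replicas):
        actions.append(f"create pod-{ordinal}")
    # Scale down: remove surplus pods, highest ordinal first.
    for name in reversed(running[desired.replicas:]):
        actions.append(f"delete {name}")
    return actions


# The user only says "I want 3 replicas"; the plan is derived.
print(plan_actions(DesiredState(replicas=3), ["pod-0"]))
```

With an imperative API, the caller would have to issue each `create` or `delete` step in the right order; here the same declaration produces a different plan depending on what is already running.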

4. Compartmentalize features via multiple controllers

An application may have different features such as scaling, backup, restore, and monitoring. An operator should be made up of multiple controllers, each of which handles one of those features. For example, the operator can have a main controller to spawn and manage application instances, a backup controller to handle backup operations, and a restore controller to handle restore operations. This simplifies the development process via better abstraction and simpler sync loops. Note that each controller should correspond to a specific CRD so that the domain of each controller’s responsibility is clear.
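One way to picture this layout is a small registry that maps each resource kind to exactly one controller. This is a simplified sketch, not how Kubebuilder wires controllers; the class names and the kinds `App`, `AppBackup`, and `AppRestore` are made up for illustration.

```python
class Controller:
    """Base class: each controller reconciles exactly one resource kind."""
    kind = ""

    def reconcile(self, name: str) -> str:
        raise NotImplementedError


class AppController(Controller):
    kind = "App"  # hypothetical CRD kind

    def reconcile(self, name: str) -> str:
        return f"ensure instances for {name}"


class BackupController(Controller):
    kind = "AppBackup"  # hypothetical CRD kind

    def reconcile(self, name: str) -> str:
        return f"run backup {name}"


class RestoreController(Controller):
    kind = "AppRestore"  # hypothetical CRD kind

    def reconcile(self, name: str) -> str:
        return f"run restore {name}"


class Operator:
    """Routes each resource to the single controller that owns its kind."""

    def __init__(self, controllers):
        self._by_kind = {}
        for controller in controllers:
            if controller.kind in self._by_kind:
                # Two controllers for one kind would blur responsibility.
                raise ValueError(f"duplicate controller for {controller.kind}")
            self._by_kind[controller.kind] = controller

    def handle(self, kind: str, name: str) -> str:
        return self._by_kind[kind].reconcile(name)
```

Keeping one kind per controller means each sync loop only has to reason about a single resource type.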

5. Use asynchronous sync loops

If an operator detects an error (e.g., failed pod creation) when reconciling the current cluster state to the desired state, it should immediately terminate the current sync call and return the error. The work queue should then schedule a resync at a later time; the sync call should not block the application by continuing to poll the cluster state until the error is resolved. Similarly, controllers that initiate and monitor long-running operations should not synchronously wait for the operations. Instead, the controllers should go back to sleep and check again later.
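The pattern can be sketched with a plain queue: when a sync call fails, the key goes back on the queue and the worker moves on to other keys instead of waiting. This is a simplification under stated assumptions: real work queues typically add a delay or rate limit before the retry, and `TransientError`, `run_worker`, and `max_attempts` are hypothetical names.

```python
from collections import deque


class TransientError(Exception):
    """Raised by a sync call that should be retried later."""


def run_worker(queue: deque, sync, max_attempts: int = 5) -> dict:
    """Process keys until the queue is empty; return attempts per key."""
    attempts = {}
    while queue:
        key = queue.popleft()
        attempts[key] = attempts.get(key, 0) + 1
        try:
            sync(key)  # returns quickly, or raises on error
        except TransientError:
            # Do not poll until the error clears. Requeue the key so
            # other keys get processed first, and give up after
            # max_attempts to avoid retrying forever.
            if attempts[key] < max_attempts:
                queue.append(key)
    return attempts
```

Because a failing key is appended to the back of the queue, one stuck resource does not hold up reconciliation of the others.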

Monitoring and logging your applications  

Once you have written your own operator, you will need to enable logging and monitoring for your applications. This can be complicated for newcomers. Below are some best practices you can follow.

1. Perform application-, node- and cluster-level log aggregation

Kubernetes clusters can get big, especially ones with stateful applications. If you keep a log for every container, you will likely end up with an unmanageable amount of logs. To remedy this, you can aggregate your logs. You can perform application-level logging by aggregating container logs and filtering log messages by severity and verbosity level. Application-level aggregation requires the ability to tell which application a log belongs to. For this, you may need to add application-specific details to the log messages, such as a prefix with the application name.

Similarly, for node-level and cluster-level logging, you can aggregate all application-level logs within a node or a cluster. Kubernetes doesn’t support this natively, so you may have to use external logging tools such as Google Stackdriver, Elasticsearch, Fluentd, or Kibana to perform the aggregations.
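A minimal sketch of application-level aggregation follows. It assumes logs are already collected per container as `(severity, message)` pairs, and that filtering means keeping messages at or above a threshold; the severity table, the function name `aggregate`, and the line format are illustrative choices, not a standard.

```python
# Hypothetical severity ranking used only for this example.
SEVERITY = {"DEBUG": 10, "INFO": 20, "WARNING": 30, "ERROR": 40}


def aggregate(container_logs: dict, min_severity: str = "WARNING") -> list:
    """Merge container logs into one application-level stream.

    container_logs maps (app, container) -> list of (severity, message).
    Each kept line is prefixed with the application name so it can be
    traced back to its application after aggregation.
    """
    threshold = SEVERITY[min_severity]
    lines = []
    for (app, container), entries in sorted(container_logs.items()):
        for severity, message in entries:
            if SEVERITY[severity] >= threshold:
                lines.append(f"[{app}] {severity} {container}: {message}")
    return lines
```

Node-level or cluster-level aggregation would apply the same idea to the application-level streams, usually through an external logging tool rather than custom code.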

2. Properly label your metrics for easier view, aggregation, and analysis

We recommend adding labels to metrics to facilitate aggregation and analysis by monitoring systems. For example, if you use Prometheus to analyze Prometheus-style metrics, labels let the system select, group, and aggregate the metrics in queries.
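To see why labels help, consider grouping samples by one label, which is roughly what a monitoring system does when it aggregates. The helper `sum_by` and the sample data are hypothetical; this is not Prometheus code.

```python
from collections import defaultdict


def sum_by(samples, label: str) -> dict:
    """Sum sample values grouped by the value of one label.

    samples is a list of (labels, value) pairs, where labels is a dict.
    Samples that lack the label are grouped under the empty string.
    """
    totals = defaultdict(float)
    for labels, value in samples:
        totals[labels.get(label, "")] += value
    return dict(totals)


samples = [
    ({"app": "spark", "pod": "a"}, 3),
    ({"app": "spark", "pod": "b"}, 4),
    ({"app": "airflow", "pod": "c"}, 5),
]
print(sum_by(samples, "app"))
```

Without the `app` label, the three samples could only be viewed individually or summed as a whole.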

3. Expose application metrics via pod endpoints for scraping purposes

Instead of writing application metrics to logs, files, or other storage mediums, a more viable option is for application pods to expose a metrics HTTP endpoint for monitoring tools to scrape. This provides better discoverability, uniformity and integration with metric analysis tools such as Google Stackdriver. A good way to achieve this is to use open-source application-specific exporters for exposing Prometheus-style metrics.
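As a sketch of such an endpoint, the code below serves metrics at `/metrics` in the Prometheus text format using only the Python standard library. In practice you would more likely use an existing exporter or client library; `render_metrics`, `make_handler`, and the metric name in the usage note are assumptions made for this example.

```python
from http.server import BaseHTTPRequestHandler


def render_metrics(samples) -> str:
    """Render (name, labels, value) tuples as Prometheus-style text lines."""
    lines = []
    for name, labels, value in samples:
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        if label_str:
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def make_handler(get_samples):
    """Build a request handler that serves current samples at /metrics."""

    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            body = render_metrics(get_samples()).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep scrape requests out of the application log

    return MetricsHandler
```

To serve it, pass the handler to `http.server.HTTPServer`, for example `HTTPServer(("", 8000), make_handler(lambda: [("app_requests_total", {"app": "airflow"}, 12)])).serve_forever()`; the port and sample are placeholders.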

There’s more work to be done in making running stateful applications on Kubernetes as easy as it is in a virtual machine, but with the ability to write custom controllers with Kubernetes Operators, we’ve come a long way.

For more insights into the developer experience on Kubernetes and Google Kubernetes Engine (GKE), check out these recent posts. For developers with small environments, see how we made it easier and more affordable to get started. For those looking to learn from developers directly, see our curated list of must-watch talks covering a variety of important topics. Over the next couple of weeks we’ll publish more about the Kubernetes developer experience, so watch for our series and follow us at @GCPcloud for the latest.

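Unless otherwise stated, the content of this article is licensed under the Creative Commons Attribution 3.0 License, and code samples are licensed under the Apache 2.0 License. For details, see our Terms of Service.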

Tags: Cloud
