How the Kubernetes Scheduler Works (Part 4)

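  1. Post-bind is an informational extension point:
  • Post-bind plugins are called after a Pod has been successfully bound to a node.
  • Post-bind is the last step of the binding cycle and can be used to clean up associated resources.
  1. Unreserve is an informational extension point. If resources were reserved for a Pod and the Pod is then rejected during binding, the unreserve extension is called. It should release the compute resources on the node that were reserved for the Pod. Within a plugin, the reserve and unreserve extensions should appear as a pair.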
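To implement a plugin of our own, we must register it with the scheduling framework and configure it, and we must also implement the interfaces of the extension points it hooks into. These interfaces are defined in pkg/scheduler/framework/interface.go in the source tree, as shown below: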
// Plugin is the parent type for all the scheduling framework plugins.
type Plugin interface {
	Name() string
}

// PreEnqueuePlugin is an interface that must be implemented by "PreEnqueue" plugins.
// These plugins are called prior to adding Pods to activeQ.
// Note: an preEnqueue plugin is expected to be lightweight and efficient, so it's not expected to
// involve expensive calls like accessing external endpoints; otherwise it'd block other
// Pods' enqueuing in event handlers.
type PreEnqueuePlugin interface {
	Plugin
	// PreEnqueue is called prior to adding Pods to activeQ.
	PreEnqueue(ctx context.Context, p *v1.Pod) *Status
}

// LessFunc is the function to sort pod info
type LessFunc func(podInfo1, podInfo2 *QueuedPodInfo) bool

// QueueSortPlugin is an interface that must be implemented by "QueueSort" plugins.
// These plugins are used to sort pods in the scheduling queue. Only one queue sort
// plugin may be enabled at a time.
type QueueSortPlugin interface {
	Plugin
	// Less are used to sort pods in the scheduling queue.
	Less(*QueuedPodInfo, *QueuedPodInfo) bool
}

// EnqueueExtensions is an optional interface that plugins can implement to efficiently
// move unschedulable Pods in internal scheduling queues. Plugins
// that fail pod scheduling (e.g., Filter plugins) are expected to implement this interface.
type EnqueueExtensions interface {
	// EventsToRegister returns a series of possible events that may cause a Pod
	// failed by this plugin schedulable.
	// The events will be registered when instantiating the internal scheduling queue,
	// and leveraged to build event handlers dynamically.
	// Note: the returned list needs to be static (not depend on configuration parameters);
	// otherwise it would lead to undefined behavior.
	EventsToRegister() []ClusterEvent
}

// PreFilterExtensions is an interface that is included in plugins that allow specifying
// callbacks to make incremental updates to its supposedly pre-calculated
// state.
type PreFilterExtensions interface {
	// AddPod is called by the framework while trying to evaluate the impact
	// of adding podToAdd to the node while scheduling podToSchedule.
	AddPod(ctx context.Context, state *CycleState, podToSchedule *v1.Pod, podInfoToAdd *PodInfo, nodeInfo *NodeInfo) *Status
	// RemovePod is called by the framework while trying to evaluate the impact
	// of removing podToRemove from the node while scheduling podToSchedule.
	RemovePod(ctx context.Context, state *CycleState, podToSchedule *v1.Pod, podInfoToRemove *PodInfo, nodeInfo *NodeInfo) *Status
}

// PreFilterPlugin is an interface that must be implemented by "PreFilter" plugins.
// These plugins are called at the beginning of the scheduling cycle.
type PreFilterPlugin interface {
	Plugin
	// PreFilter is called at the beginning of the scheduling cycle. All PreFilter
	// plugins must return success or the pod will be rejected. PreFilter could optionally
	// return a PreFilterResult to influence which nodes to evaluate downstream. This is useful
	// for cases where it is possible to determine the subset of nodes to process in O(1) time.
	PreFilter(ctx context.Context, state *CycleState, p *v1.Pod) (*PreFilterResult, *Status)
	// PreFilterExtensions returns a PreFilterExtensions interface if the plugin implements one,
	// or nil if it does not. A Pre-filter plugin can provide extensions to incrementally
	// modify its pre-processed info. The framework guarantees that the extensions
	// AddPod/RemovePod will only be called after PreFilter, possibly on a cloned
	// CycleState, and may call those functions more than once before calling
	// Filter again on a specific node.
	PreFilterExtensions() PreFilterExtensions
}

// FilterPlugin is an interface for Filter plugins. These plugins are called at the
// filter extension point for filtering out hosts that cannot run a pod.
// This concept used to be called 'predicate' in the original scheduler.
// These plugins should return "Success", "Unschedulable" or "Error" in Status.code.
// However, the scheduler accepts other valid codes as well.
// Anything other than "Success" will lead to exclusion of the given host from
// running the pod.
type FilterPlugin interface {
	Plugin
	// Filter is called by the scheduling framework.
	// All FilterPlugins should return "Success" to declare that
	// the given node fits the pod. If Filter doesn't return "Success",
	// it will return "Unschedulable", "UnschedulableAndUnresolvable" or "Error".
	// For the node being evaluated, Filter plugins should look at the passed
	// nodeInfo reference for this particular node's information (e.g., pods
	// considered to be running on the node) instead of looking it up in the
	// NodeInfoSnapshot because we don't guarantee that they will be the same.
	// For example, during preemption, we may pass a copy of the original
	// nodeInfo object that has some pods removed from it to evaluate the
	// possibility of preempting them to schedule the target pod.
	Filter(ctx context.Context, state *CycleState, pod *v1.Pod, nodeInfo *NodeInfo) *Status
}

// PostFilterPlugin is an interface for "PostFilter" plugins. These plugins are called
// after a pod cannot be scheduled.
type PostFilterPlugin interface {
	Plugin
	// PostFilter is called by the scheduling framework.
	// A PostFilter plugin should return one of the following statuses:
	// - Unschedulable: the plugin gets executed successfully but the pod cannot be made schedulable.
	// - Success: the plugin gets executed successfully and the pod can be made schedulable.
	// - Error: the plugin aborts due to some internal error.
	//
	// Informational plugins should be configured ahead of other ones, and always return Unschedulable status.
	// Optionally, a non-nil PostFilterResult may be returned along with a Success status. For example,
	// a preemption plugin may choose to return nominatedNodeName, so that framework can reuse that to update the
	// preemptor pod's .spec.status.nominatedNodeName field.
	PostFilter(ctx context.Context, state *CycleState, pod *v1.Pod, filteredNodeStatusMap NodeToStatusMap) (*PostFilterResult, *Status)
}

// PreScorePlugin is an interface for "PreScore" plugin. PreScore is an
// informational extension point. Plugins will be called with a list of nodes
// that passed the filtering phase. A plugin may use this data to update internal
// state or to generate logs/metrics.
type PreScorePlugin interface {
	Plugin
	// PreScore is called by the scheduling framework after a list of nodes
	// passed the filtering phase. All prescore plugins must return success or
	// the pod will be rejected
	PreScore(ctx context.Context, state *CycleState, pod *v1.Pod, nodes []*v1.Node) *Status
}

// ScoreExtensions is an interface for Score extended functionality.
type ScoreExtensions interface {
	// NormalizeScore is called for all node scores produced by the same plugin's "Score"
	// method. A successful run of NormalizeScore will update the scores list and return
	// a success status.
	NormalizeScore(ctx context.Context, state *CycleState, p *v1.Pod, scores NodeScoreList) *Status
}

// ScorePlugin is an interface that must be implemented by "Score" plugins to rank
// nodes that passed the filtering phase.
type ScorePlugin interface {
	Plugin
	// Score is called on each filtered node. It must return success and an integer
	// indicating the rank of the node. All scoring plugins must return success or
	// the pod will be rejected.
	Score(ctx context.Context, state *CycleState, p *v1.Pod, nodeName string) (int64, *Status)
	// ScoreExtensions returns a ScoreExtensions interface if it implements one, or nil if does not.
	ScoreExtensions() ScoreExtensions
}

// ReservePlugin is an interface for plugins with Reserve and Unreserve
// methods. These are meant to update the state of the plugin. This concept
// used to be called 'assume' in the original scheduler. These plugins should
// return only Success or Error in Status.code. However, the scheduler accepts
// other valid codes as well. Anything other than Success will lead to
// rejection of the pod.
type ReservePlugin interface {
	Plugin
	// Reserve is called by the scheduling framework when the scheduler cache is
	// updated. If this method returns a failed Status, the scheduler will call
	// the Unreserve method for all enabled ReservePlugins.
	Reserve(ctx context.Context, state *CycleState, p *v1.Pod, nodeName string) *Status
	// Unreserve is called by the scheduling framework when a reserved pod was
	// rejected, an error occurred during reservation of subsequent plugins, or
	// in a later phase. The Unreserve method implementation must be idempotent
	// and may be called by the scheduler even if the corresponding Reserve
	// method for the same plugin was not called.
	Unreserve(ctx context.Context, state *CycleState, p *v1.Pod, nodeName string)
}

// PreBindPlugin is an interface that must be implemented by "PreBind" plugins.
// These plugins are called before a pod being scheduled.
type PreBindPlugin interface {
	Plugin
	// PreBind is called before binding a pod. All prebind plugins must return
	// success or the pod will be rejected and won't be sent for binding.
	PreBind(ctx context.Context, state *CycleState, p *v1.Pod, nodeName string) *Status
}

// PostBindPlugin is an interface that must be implemented by "PostBind" plugins.
// These plugins are called after a pod is successfully bound to a node.
type PostBindPlugin interface {
	Plugin
	// PostBind is called after a pod is successfully bound. These plugins are
	// informational. A common application of this extension point is for cleaning
	// up. If a plugin needs to clean-up its state after a pod is scheduled and
	// bound, PostBind is the extension point that it should register.
	PostBind(ctx context.Context, state *CycleState, p *v1.Pod, nodeName string)
}

// PermitPlugin is an interface that must be implemented by "Permit" plugins.
// These plugins are called before a pod is bound to a node.
type PermitPlugin interface {
	Plugin
	// Permit is called before binding a pod (and before prebind plugins). Permit
	// plugins are used to prevent or delay the binding of a Pod. A permit plugin
	// must return success or wait with timeout duration, or the pod will be rejected.
	// The pod will also be rejected if the wait timeout or the pod is rejected while
	// waiting. Note that if the plugin returns "wait", the framework will wait only
	// after running the remaining plugins given that no other plugin rejects the pod.
	Permit(ctx context.Context, state *CycleState, p *v1.Pod, nodeName string) (*Status, time.Duration)
}

// BindPlugin is an interface that must be implemented by "Bind" plugins. Bind
// plugins are used to bind a pod to a Node.
type BindPlugin interface {
	Plugin
	// Bind plugins will not be called until all pre-bind plugins have completed. Each
	// bind plugin is called in the configured order. A bind plugin may choose whether
	// or not to handle the given Pod. If a bind plugin chooses to handle a Pod, the
	// remaining bind plugins are skipped. When a bind plugin does not handle a pod,
	// it must return Skip in its Status code. If a bind plugin returns an Error, the
	// pod is rejected and will not be bound.
	Bind(ctx context.Context, state *CycleState, p *v1.Pod, nodeName string) *Status
}
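
To make the "register the plugin and implement the extension point interface" step more concrete, here is a minimal, self-contained sketch. It does not use the real framework packages: `plugin`, `filterPlugin`, `factory`, `registry`, `zoneFilter`, and `feasibleNodes` are hypothetical stand-ins, and `Filter` takes plain label maps and returns an `error` instead of the `*CycleState`, `*v1.Pod`, `*NodeInfo`, and `*Status` types in the listing above. It only illustrates the general shape, assuming plugins are looked up by name and then checked for which extension point interfaces they implement.

```go
package main

import (
	"fmt"
	"sort"
)

// plugin is a hypothetical stand-in for the framework's Plugin interface.
type plugin interface{ Name() string }

// filterPlugin is a simplified stand-in for FilterPlugin: a nil error means
// the node fits the pod, a non-nil error excludes the node.
type filterPlugin interface {
	plugin
	Filter(podLabels, nodeLabels map[string]string) error
}

// factory builds a plugin instance; registry maps plugin names to factories.
type factory func() (plugin, error)
type registry map[string]factory

// register adds a factory and rejects duplicate names.
func (r registry) register(name string, f factory) error {
	if _, dup := r[name]; dup {
		return fmt.Errorf("plugin %q already registered", name)
	}
	r[name] = f
	return nil
}

// zoneFilter is a hypothetical plugin: it excludes nodes whose "zone" label
// differs from the zone the pod asks for.
type zoneFilter struct{}

func (zoneFilter) Name() string { return "ZoneFilter" }

func (zoneFilter) Filter(pod, node map[string]string) error {
	if want, ok := pod["zone"]; ok && node["zone"] != want {
		return fmt.Errorf("node zone %q does not match requested zone %q", node["zone"], want)
	}
	return nil
}

// Compile-time check that zoneFilter implements the extension point interface.
var _ filterPlugin = zoneFilter{}

// feasibleNodes instantiates the enabled plugins, keeps those that implement
// filterPlugin, and returns the sorted names of nodes accepted by all of them.
func feasibleNodes(r registry, enabled []string, pod map[string]string,
	nodes map[string]map[string]string) ([]string, error) {
	var filters []filterPlugin
	for _, name := range enabled {
		f, ok := r[name]
		if !ok {
			return nil, fmt.Errorf("plugin %q is not registered", name)
		}
		p, err := f()
		if err != nil {
			return nil, err
		}
		if fp, ok := p.(filterPlugin); ok {
			filters = append(filters, fp)
		}
	}
	var fit []string
	for nodeName, labels := range nodes {
		accepted := true
		for _, fp := range filters {
			if fp.Filter(pod, labels) != nil {
				accepted = false
				break
			}
		}
		if accepted {
			fit = append(fit, nodeName)
		}
	}
	sort.Strings(fit)
	return fit, nil
}

func main() {
	r := registry{}
	if err := r.register("ZoneFilter", func() (plugin, error) { return zoneFilter{}, nil }); err != nil {
		panic(err)
	}
	nodes := map[string]map[string]string{
		"node-1": {"zone": "a"},
		"node-2": {"zone": "b"},
	}
	fit, err := feasibleNodes(r, []string{"ZoneFilter"}, map[string]string{"zone": "a"}, nodes)
	fmt.Println(fit, err)
}
```

In a real out-of-tree plugin the types would come from the scheduler framework packages, and the plugin would additionally have to be enabled in the scheduler configuration; the names and signatures there should be checked against the Kubernetes version in use.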

