HPA Is Clearly Configured in K8s, So Why Didn't It Scale Out? - 运维开发故事

Hi everyone, I'm 乔克, an ops engineer who loves tinkering and a cloud-native enthusiast so ugly that I wake myself up in my sleep.

Author: 乔克
WeChat Official Account: 运维开发故事
Blog: https://jokerbai.com

✍ There are a thousand roads, but safety comes first. Sloppy operations end in ops tears.

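The Kubernetes Horizontal Pod Autoscaler (HPA) is a controller that automatically adjusts the number of Pod replicas based on observed CPU utilization or other custom metrics. When the business is busy, it scales Pods out horizontally. Recently, however, I ran into a case where utilization had clearly exceeded the defined target, yet no scale-out happened. Why?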


To find out why, let's dig into the source code.

I. Overall HPA Architecture and Core Components

The HPA implementation lives in the k8s.io/kubernetes/pkg/controller/podautoscaler directory and consists mainly of the following components:

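HorizontalController: the main controller; it watches HPA and Pod resources and coordinates scaling.
ReplicaCalculator: the core logic that computes the target replica count.
MetricsClient: fetches metric data (CPU, memory, custom metrics, etc.).
ScaleClient: changes the replica count of workloads (Deployment, ReplicaSet, etc.).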

II. Source Entry Point: Starting the HPA Controller

The HPA controller is initialized when cmd/kube-controller-manager starts.

In cmd/kube-controller-manager/controllermanager.go, Run() calls NewControllerDescriptors(), which registers the controllers:

func NewControllerDescriptors() map[string]*ControllerDescriptor {
 ...
 register(newHorizontalPodAutoscalerControllerDescriptor())
 ...
}

Then, in cmd/kube-controller-manager/autoscaling.go, the controller is ultimately started through startHPAControllerWithMetricsClient():

func newHorizontalPodAutoscalerControllerDescriptor() *ControllerDescriptor {
 return &ControllerDescriptor{
  name:     names.HorizontalPodAutoscalerController,
  aliases:  []string{"horizontalpodautoscaling"},
  initFunc: startHorizontalPodAutoscalerControllerWithRESTClient,
 }
}

func startHorizontalPodAutoscalerControllerWithRESTClient(ctx context.Context, controllerContext ControllerContext, controllerName string) (controller.Interface, bool, error) {

 ...
 return startHPAControllerWithMetricsClient(ctx, controllerContext, metricsClient)
}

func startHPAControllerWithMetricsClient(ctx context.Context, controllerContext ControllerContext, metricsClient metrics.MetricsClient) (controller.Interface, bool, error) {

 ...

 go podautoscaler.NewHorizontalController(
  ctx,
  hpaClient.CoreV1(),
  scaleClient,
  hpaClient.AutoscalingV2(),
  controllerContext.RESTMapper,
  metricsClient,
  controllerContext.InformerFactory.Autoscaling().V2().HorizontalPodAutoscalers(),
  controllerContext.InformerFactory.Core().V1().Pods(),
  controllerContext.ComponentConfig.HPAController.HorizontalPodAutoscalerSyncPeriod.Duration,
  controllerContext.ComponentConfig.HPAController.HorizontalPodAutoscalerDownscaleStabilizationWindow.Duration,
  controllerContext.ComponentConfig.HPAController.HorizontalPodAutoscalerTolerance,
  controllerContext.ComponentConfig.HPAController.HorizontalPodAutoscalerCPUInitializationPeriod.Duration,
  controllerContext.ComponentConfig.HPAController.HorizontalPodAutoscalerInitialReadinessDelay.Duration,
 ).Run(ctx, int(controllerContext.ComponentConfig.HPAController.ConcurrentHorizontalPodAutoscalerSyncs))
 return nil, true, nil
}

III. Core Controller Logic

The core controller logic lives in k8s.io/kubernetes/pkg/controller/podautoscaler. The call chain is:

Run() -> worker() -> processNextWorkItem() -> reconcileKey() -> reconcileAutoscaler()
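This call chain is the usual controller worker-loop shape. The sketch below is hypothetical and simplified: a plain buffered channel stands in for client-go's rate-limited work queue (which, unlike a channel, also deduplicates pending keys and requeues them), and it leaves out the re-enqueueing the real controller does every HorizontalPodAutoscalerSyncPeriod (15s by default). The worker count plays the role of ConcurrentHorizontalPodAutoscalerSyncs.

```go
package main

import (
	"fmt"
	"sync"
)

// controller is a hypothetical miniature of the HPA controller loop.
type controller struct {
	queue      chan string
	mu         sync.Mutex
	reconciled map[string]int
}

// reconcileKey would split "namespace/name", fetch the HPA from the lister
// and call reconcileAutoscaler; here it only counts invocations.
func (c *controller) reconcileKey(key string) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.reconciled[key]++
	return nil
}

// processNextWorkItem handles one key; it returns false once the queue is
// closed and drained, which tells the worker to exit.
func (c *controller) processNextWorkItem() bool {
	key, ok := <-c.queue
	if !ok {
		return false
	}
	if err := c.reconcileKey(key); err != nil {
		fmt.Println("reconcile failed:", key, err)
	}
	return true
}

// worker loops until there is no more work.
func (c *controller) worker(wg *sync.WaitGroup) {
	defer wg.Done()
	for c.processNextWorkItem() {
	}
}

// Run starts n workers and blocks until all of them have finished.
func (c *controller) Run(workers int) {
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go c.worker(&wg)
	}
	wg.Wait()
}

// newController pre-fills and closes the queue so Run terminates.
func newController(keys []string) *controller {
	c := &controller{queue: make(chan string, len(keys)), reconciled: map[string]int{}}
	for _, k := range keys {
		c.queue <- k
	}
	close(c.queue)
	return c
}

func main() {
	c := newController([]string{"default/web", "default/api", "prod/web"})
	c.Run(2)
	fmt.Println(len(c.reconciled)) // 3
}
```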

Most of the logic is implemented in reconcileAutoscaler.

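(1) a.monitor.ObserveReconciliationResult(actionLabel, errorLabel, time.Since(start)) records monitoring metrics for the reconciliation.

(2) hpaShared.DeepCopy() and hpa.Status.DeepCopy() deep-copy the hpa object and its status (hpaStatus), so the shared cache is never modified.

(3) The target resource is then resolved and mapped: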

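// Parse the API version
targetGV, err := schema.ParseGroupVersion(hpa.Spec.ScaleTargetRef.APIVersion)

// Build the GroupKind and look up its REST mappings
targetGK := schema.GroupKind{
    Group: targetGV.Group,
    Kind:  hpa.Spec.ScaleTargetRef.Kind,
}
mappings, err := a.mapper.RESTMappings(targetGK)

// Fetch the scale subresource
scale, targetGR, err := a.scaleForResourceMappings(ctx, hpa.Namespace, hpa.Spec.ScaleTargetRef.Name, mappings)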

Where:

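schema.ParseGroupVersion: parses the API version of the target resource
a.mapper.RESTMappings: gets the REST mapping information for the resource
a.scaleForResourceMappings: gets the scale subresource of the target resource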
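To make the first step concrete, here is a stdlib-only sketch of what schema.ParseGroupVersion does with an apiVersion string like the one in ScaleTargetRef. `groupVersion` and `parseGroupVersion` are hypothetical stand-ins written for this example, not the apimachinery types.

```go
package main

import (
	"fmt"
	"strings"
)

// groupVersion is a stand-in for schema.GroupVersion.
type groupVersion struct {
	Group   string
	Version string
}

// parseGroupVersion: "v1" belongs to the core ("") group, "apps/v1" splits
// into group "apps" and version "v1", and more than one slash is an error.
func parseGroupVersion(gv string) (groupVersion, error) {
	if gv == "" || gv == "/" {
		return groupVersion{}, nil
	}
	switch parts := strings.Split(gv, "/"); len(parts) {
	case 1:
		return groupVersion{Version: parts[0]}, nil
	case 2:
		return groupVersion{Group: parts[0], Version: parts[1]}, nil
	default:
		return groupVersion{}, fmt.Errorf("unexpected GroupVersion string: %v", gv)
	}
}

func main() {
	gv, _ := parseGroupVersion("apps/v1")
	// The controller then pairs the group with ScaleTargetRef.Kind to form
	// the GroupKind it looks up in the RESTMapper.
	fmt.Printf("group=%q version=%q\n", gv.Group, gv.Version)
}
```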

(4) Compute the desired replica count from the metrics (the core calculation)

// Compute the desired replica count based on the metrics
metricDesiredReplicas, metricName, metricStatuses, metricTimestamp, err = a.computeReplicasForMetrics(ctx, hpa, scale, hpa.Spec.Metrics)

(5) Choose the normalization strategy depending on whether Behavior is configured

// Pick the normalization strategy depending on whether Behavior is set
if hpa.Spec.Behavior == nil {
    desiredReplicas = a.normalizeDesiredReplicas(hpa, key, currentReplicas, desiredReplicas, minReplicas)
} else {
    desiredReplicas = a.normalizeDesiredReplicasWithBehaviors(hpa, key, currentReplicas, desiredReplicas, minReplicas)
}

(6) Perform the scaling when scaling is required

// Update the scale subresource, retrying on conflict
err := retry.RetryOnConflict(retry.DefaultRetry, func() error {
    scale.Spec.Replicas = desiredReplicas
    _, updateErr := a.scaleNamespacer.Scales(hpa.Namespace).Update(ctx, targetGR, scale, metav1.UpdateOptions{})
    // ... conflict handling logic
    return updateErr
})

retry.RetryOnConflict retries the update when it hits a concurrent-modification conflict. The target resource itself is updated by calling a.scaleNamespacer.Scales().Update.

(7) Finally, update the status and record events

// Set the HPA status condition
setCondition(hpa, autoscalingv2.AbleToScale, v1.ConditionTrue, "SucceededRescale", "...")

// Record an event
a.eventRecorder.Eventf(hpa, v1.EventTypeNormal, "SuccessfulRescale", "New size: %d; reason: %s", desiredReplicas, rescaleReason)

// Store the scale event (used by the Behavior calculations)
a.storeScaleEvent(hpa.Spec.Behavior, key, currentReplicas, desiredReplicas)

// Update the HPA status
a.setStatus(hpa, currentReplicas, desiredReplicas, metricStatuses, rescale)
err = a.updateStatusIfNeeded(ctx, hpaStatusOriginal, hpa)

That is the main flow of reconcileAutoscaler. Its most important part is the replica count calculation, which is implemented in computeReplicasForMetrics.

IV. Core Algorithm

Now let's dissect computeReplicasForMetrics to see how it actually works.

(1) Pre-validation and initialization

// Parse the HPA selector so the target Pods can be identified correctly
selector, err := a.validateAndParseSelector(hpa, scale.Status.Selector)
if err != nil {
    return -1, "", nil, time.Time{}, err
}

// Get the replica counts of the target resource
specReplicas := scale.Spec.Replicas      // desired replica count
statusReplicas := scale.Status.Replicas  // current replica count

// Initialize the list of metric statuses
statuses = make([]autoscalingv2.MetricStatus, len(metricSpecs))

// Bookkeeping for invalid metrics
invalidMetricsCount := 0
var invalidMetricError error
var invalidMetricCondition autoscalingv2.HorizontalPodAutoscalerCondition

(2) Loop over the metrics and evaluate each one

for i, metricSpec := range metricSpecs {
    // Compute a proposed replica count for each metric
    replicaCountProposal, metricNameProposal, timestampProposal, condition, err := a.computeReplicasForMetric(ctx, hpa, metricSpec, specReplicas, statusReplicas, selector, &statuses[i])

    if err != nil {
        // Remember the error of the first invalid metric
        if invalidMetricsCount <= 0 {
            invalidMetricCondition = condition
            invalidMetricError = err
        }
        invalidMetricsCount++
        continue
    }

    // Keep the largest proposal
    if replicas == 0 || replicaCountProposal > replicas {
        timestamp = timestampProposal
        replicas = replicaCountProposal
        metric = metricNameProposal
    }
}

Each metric is evaluated here by calling replicaCountProposal, metricNameProposal, timestampProposal, condition, err := a.computeReplicasForMetric(ctx, hpa, metricSpec, specReplicas, statusReplicas, selector, &statuses[i]).

computeReplicasForMetric computes the proposal differently for each metric type:

switch spec.Type {
case autoscalingv2.ObjectMetricSourceType:
    // Object metrics
case autoscalingv2.PodsMetricSourceType:
    // Pods metrics
case autoscalingv2.ResourceMetricSourceType:
    // Resource metrics
case autoscalingv2.ContainerResourceMetricSourceType:
    // Container resource metrics
case autoscalingv2.ExternalMetricSourceType:
    // External metrics
default:
    // Unknown metric type: report an error
}

Here we only walk through object metrics (autoscalingv2.ObjectMetricSourceType). For an object metric, the calculation is done by a.computeStatusForObjectMetric.

computeStatusForObjectMetric first initializes the metric status, which records the current state of the metric:

// Initialize the metric status to record the metric's current state
metricStatus := autoscalingv2.MetricStatus{
    Type: autoscalingv2.ObjectMetricSourceType,
    Object: &autoscalingv2.ObjectMetricStatus{
        DescribedObject: metricSpec.Object.DescribedObject,
        Metric: autoscalingv2.MetricIdentifier{
            Name:     metricSpec.Object.Metric.Name,
            Selector: metricSpec.Object.Metric.Selector,
        },
    },
}

It then calls a.tolerancesForHpa(hpa) to get the tolerances for this HPA. tolerancesForHpa is implemented as follows:

func (a *HorizontalController) tolerancesForHpa(hpa *autoscalingv2.HorizontalPodAutoscaler) Tolerances {
 // Start from the default tolerance in both directions
 t := Tolerances{a.tolerance, a.tolerance}
 // Check whether the feature gate is enabled
 behavior := hpa.Spec.Behavior
 allowConfigurableTolerances := utilfeature.DefaultFeatureGate.Enabled(features.HPAConfigurableTolerance)
 // If no Behavior is set or the feature gate is disabled, return the defaults
 if behavior == nil || !allowConfigurableTolerances {
  return t
 }
 // If custom tolerances are set, use them instead
 if behavior.ScaleDown != nil && behavior.ScaleDown.Tolerance != nil {
  t.scaleDown = behavior.ScaleDown.Tolerance.AsApproximateFloat64()
 }
 if behavior.ScaleUp != nil && behavior.ScaleUp.Tolerance != nil {
  t.scaleUp = behavior.ScaleUp.Tolerance.AsApproximateFloat64()
 }
 return t
}

The default tolerance is defined in pkg/controller/podautoscaler/config/v1alpha1/defaults.go as 0.1, i.e. a 10% tolerance:

if obj.HorizontalPodAutoscalerTolerance == 0 {
    obj.HorizontalPodAutoscalerTolerance = 0.1
}
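Here is a minimal sketch of the two-sided tolerance check and the override order used by tolerancesForHpa. The struct and function names are hypothetical; nil pointers model "no per-direction tolerance set", and `gateEnabled` stands in for the HPAConfigurableTolerance feature gate. The global default is the 0.1 above, which cluster operators can change with the kube-controller-manager flag --horizontal-pod-autoscaler-tolerance.

```go
package main

import "fmt"

// tolerances is two-sided: scaleDown bounds the ratio from below,
// scaleUp from above.
type tolerances struct {
	scaleDown float64
	scaleUp   float64
}

// isWithin reports whether usageRatio is close enough to 1.0 that the HPA
// keeps the current replica count.
func (t tolerances) isWithin(usageRatio float64) bool {
	return (1.0-t.scaleDown) <= usageRatio && usageRatio <= (1.0+t.scaleUp)
}

// tolerancesFor starts from the global default and lets per-direction
// overrides replace it, but only when the feature gate is on.
func tolerancesFor(global float64, gateEnabled bool, scaleDown, scaleUp *float64) tolerances {
	t := tolerances{global, global}
	if !gateEnabled {
		return t
	}
	if scaleDown != nil {
		t.scaleDown = *scaleDown
	}
	if scaleUp != nil {
		t.scaleUp = *scaleUp
	}
	return t
}

func main() {
	up := 0.05
	fmt.Println(tolerancesFor(0.1, true, nil, &up).isWithin(1.0875))  // false: 1.0875 > 1.05
	fmt.Println(tolerancesFor(0.1, false, nil, &up).isWithin(1.0875)) // true: gate off, default 0.1
}
```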

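With the tolerances in hand, the replica count is computed separately for an absolute value target (Value) and an average value target (AverageValue):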

if metricSpec.Object.Target.Type == autoscalingv2.ValueMetricType && metricSpec.Object.Target.Value != nil {
    // Replica count for an absolute value target
    replicaCountProposal, usageProposal, timestampProposal, err := a.replicaCalc.GetObjectMetricReplicas(...)
    ...
} else if metricSpec.Object.Target.Type == autoscalingv2.AverageValueMetricType && metricSpec.Object.Target.AverageValue != nil {
    // Replica count for an average value target
    replicaCountProposal, usageProposal, timestampProposal, err := a.replicaCalc.GetObjectPerPodMetricReplicas(...)
    ...
}

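For an absolute value target, the usage ratio is computed as usageRatio := float64(usage) / float64(targetUsage). If the ratio falls outside the tolerance, the desired replica count is replicaCountFloat := usageRatio * float64(readyPodCount), rounded up when it is not an integer.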

// GetObjectMetricReplicas

func (c *ReplicaCalculator) GetObjectMetricReplicas(currentReplicas int32, targetUsage int64, metricName string, tolerances Tolerances, namespace string, objectRef *autoscaling.CrossVersionObjectReference, selector labels.Selector, metricSelector labels.Selector) (replicaCount int32, usage int64, timestamp time.Time, err error) {
 // Fetch the current metric value
 usage, _, err = c.metricsClient.GetObjectMetric(metricName, namespace, objectRef, metricSelector)
 if err != nil {
  return 0, 0, time.Time{}, fmt.Errorf("unable to get metric %s: %v on %s %s/%s", metricName, objectRef.Kind, namespace, objectRef.Name, err)
 }
 // Compute the usage ratio
 usageRatio := float64(usage) / float64(targetUsage)
 // Compute the desired replica count
 replicaCount, timestamp, err = c.getUsageRatioReplicaCount(currentReplicas, usageRatio, tolerances, namespace, selector)
 return replicaCount, usage, timestamp, err
}

func (c *ReplicaCalculator) getUsageRatioReplicaCount(currentReplicas int32, usageRatio float64, tolerances Tolerances, namespace string, selector labels.Selector) (replicaCount int32, timestamp time.Time, err error) {
 // When the current replica count is non-zero
 if currentReplicas != 0 {
  // If the usage ratio is within tolerance, keep the current replica count
  if tolerances.isWithin(usageRatio) {
   return currentReplicas, timestamp, nil
  }
  // Count the ready Pods
  readyPodCount := int64(0)
  readyPodCount, err = c.getReadyPodsCount(namespace, selector)
  if err != nil {
   return 0, time.Time{}, fmt.Errorf("unable to calculate ready pods: %s", err)
  }
  // Compute the desired replica count
  replicaCountFloat := usageRatio * float64(readyPodCount)
  // Cap the result at the maximum int32 value
  if replicaCountFloat > math.MaxInt32 {
   replicaCount = math.MaxInt32
  } else {
   // Round up
   replicaCount = int32(math.Ceil(replicaCountFloat))
  }
 } else {
  // When the current replica count is zero, use the usage ratio directly, rounded up
  replicaCount = int32(math.Ceil(usageRatio))
 }

 return replicaCount, timestamp, err
}
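The same computation, stripped down to plain arithmetic so the effect of the tolerance and the rounding is easy to try out. This is a hypothetical helper: it uses one symmetric tolerance and takes the ready-Pod count as a parameter instead of listing Pods through the selector.

```go
package main

import (
	"fmt"
	"math"
)

// usageRatioReplicaCount mirrors getUsageRatioReplicaCount with a symmetric
// tolerance and an explicit ready-Pod count.
func usageRatioReplicaCount(currentReplicas, readyPods int32, usageRatio, tolerance float64) int32 {
	if currentReplicas == 0 {
		// Scaling from zero: the ratio itself, rounded up, is the answer.
		return int32(math.Ceil(usageRatio))
	}
	if (1.0-tolerance) <= usageRatio && usageRatio <= (1.0+tolerance) {
		return currentReplicas // inside the dead band: no change
	}
	desired := usageRatio * float64(readyPods)
	if desired > math.MaxInt32 {
		return math.MaxInt32
	}
	return int32(math.Ceil(desired))
}

func main() {
	// Object metric: 450 observed vs. a Value target of 100, 3 ready Pods.
	ratio := 450.0 / 100.0
	fmt.Println(usageRatioReplicaCount(3, 3, ratio, 0.1)) // 14 (4.5 * 3 = 13.5, rounded up)
}
```

Note that the multiplier is the number of ready Pods rather than currentReplicas, so when some Pods are not ready the proposal can be lower than currentReplicas × usageRatio.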

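For an average value target, the usage ratio is computed as usageRatio := float64(usage) / (float64(targetAverageUsage) * float64(replicaCount)), i.e. usage ratio = actual metric value / (target average × current replica count). When the ratio is outside the tolerance, the replica count is recomputed as math.Ceil(actual metric value / target average); otherwise it stays unchanged.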

func (c *ReplicaCalculator) GetObjectPerPodMetricReplicas(statusReplicas int32, targetAverageUsage int64, metricName string, tolerances Tolerances, namespace string, objectRef *autoscaling.CrossVersionObjectReference, metricSelector labels.Selector) (replicaCount int32, usage int64, timestamp time.Time, err error) {
 // Fetch the current metric value
 usage, timestamp, err = c.metricsClient.GetObjectMetric(metricName, namespace, objectRef, metricSelector)
 if err != nil {
  return 0, 0, time.Time{}, fmt.Errorf("unable to get metric %s: %v on %s %s/%s", metricName, objectRef.Kind, namespace, objectRef.Name, err)
 }

 // Start from the current replica count
 replicaCount = statusReplicas
 // Compute the usage ratio
 usageRatio := float64(usage) / (float64(targetAverageUsage) * float64(replicaCount))
 if !tolerances.isWithin(usageRatio) {
  // Recompute the replica count
  replicaCount = int32(math.Ceil(float64(usage) / float64(targetAverageUsage)))
 }
 // Compute the average usage per Pod
 usage = int64(math.Ceil(float64(usage) / float64(statusReplicas)))
 return replicaCount, usage, timestamp, nil
}
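A stdlib-only sketch of the average-value path, using a hypothetical `perPodReplicas` helper with a single symmetric tolerance. It assumes statusReplicas > 0. A typical case is an object metric such as a queue length that should be spread evenly across the Pods.

```go
package main

import (
	"fmt"
	"math"
)

// perPodReplicas treats usage as a total that should be spread so each Pod
// carries targetAverage of it. It returns the proposed replica count and the
// per-Pod average that would be reported in the HPA status.
// Assumes statusReplicas > 0.
func perPodReplicas(statusReplicas int32, usage, targetAverage int64, tolerance float64) (int32, int64) {
	replicas := statusReplicas
	ratio := float64(usage) / (float64(targetAverage) * float64(replicas))
	if ratio < 1.0-tolerance || ratio > 1.0+tolerance {
		replicas = int32(math.Ceil(float64(usage) / float64(targetAverage)))
	}
	avg := int64(math.Ceil(float64(usage) / float64(statusReplicas)))
	return replicas, avg
}

func main() {
	// 1000 queued messages, a target of 100 per Pod, 5 Pods today.
	r, avg := perPodReplicas(5, 1000, 100, 0.1)
	fmt.Println(r, avg) // 10 200
}
```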

(3) Return an error if the metrics are invalid; otherwise return the desired replica count

// Return an error if all metrics are invalid, or if some are invalid and the result would scale down
if invalidMetricsCount >= len(metricSpecs) || (invalidMetricsCount > 0 && replicas < specReplicas) {
    setCondition(hpa, invalidMetricCondition.Type, invalidMetricCondition.Status,
        invalidMetricCondition.Reason, "%s", invalidMetricCondition.Message)
    return -1, "", statuses, time.Time{}, invalidMetricError
}

// Mark scaling as active
setCondition(hpa, autoscalingv2.ScalingActive, v1.ConditionTrue, "ValidMetricFound",
    "the HPA was able to successfully calculate a replica count from %s", metric)

// Return the desired replica count
return replicas, metric, statuses, timestamp, invalidMetricError
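The error rule above reduces to one boolean expression. `shouldAbort` is a hypothetical name for it; the point is the asymmetry: partial metric data may still drive a scale-out, but never a scale-in.

```go
package main

import "fmt"

// shouldAbort reports whether computeReplicasForMetrics would return an
// error instead of a replica count: every metric is invalid, or some are
// invalid and the valid ones alone would scale down.
func shouldAbort(invalidCount, metricCount int, replicas, specReplicas int32) bool {
	return invalidCount >= metricCount || (invalidCount > 0 && replicas < specReplicas)
}

func main() {
	fmt.Println(shouldAbort(1, 2, 8, 5)) // false: scale-out on partial data is allowed
	fmt.Println(shouldAbort(1, 2, 3, 5)) // true: scale-in on partial data is refused
	fmt.Println(shouldAbort(2, 2, 0, 5)) // true: nothing valid to act on
}
```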

The tolerance explains why no scale-out was triggered even though the metric reached 87% (against an 80% target).

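As introduced above, the default tolerance is 0.1, i.e. 10%: as long as the current usage stays within ±10% of the target, no scaling is triggered. We can check this with the tolerance comparison (1.0-t.scaleDown) <= usageRatio && usageRatio <= (1.0+t.scaleUp):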

// Usage ratio
usageRatio = actual / target = 87% / 80% = 1.0875
// With the default tolerance of 0.1, the tolerated range is [0.9, 1.1]
// 0.9 ≤ 1.0875 ≤ 1.1
// The ratio is within the tolerance, so no scale-out is triggered
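Turning the tolerance check around gives the usage u at which an HPA with target t and the default tolerance actually starts to act (this treats the metric as a single ratio and ignores the extra handling of unready or missing Pods):

```latex
\begin{aligned}
\text{scale out} &\iff \frac{u}{t} > 1 + \mathrm{tol} \iff u > t\,(1 + \mathrm{tol}) = 80\% \times 1.1 = 88\% \\
\text{scale in}  &\iff \frac{u}{t} < 1 - \mathrm{tol} \iff u < t\,(1 - \mathrm{tol}) = 80\% \times 0.9 = 72\%
\end{aligned}
```

So with an 80% target and the default tolerance, any usage between 72% and 88% leaves the replica count unchanged, and 87% falls inside that band.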

V. Constraints

HPA scaling is not unlimited either. To avoid scaling too frequently, HPA applies a number of constraints in addition to the tolerance.

They are implemented mainly in a.normalizeDesiredReplicas or a.normalizeDesiredReplicasWithBehaviors. The difference between the two is:

normalizeDesiredReplicas performs basic normalization, while normalizeDesiredReplicasWithBehaviors applies the more advanced behavior policies.
To use normalizeDesiredReplicasWithBehaviors, hpa.Spec.Behavior must be configured, for example:

behavior:
  scaleUp:
    policies:
    - type: Percent
      value: 100
      periodSeconds: 15
  scaleDown:
    policies:
    - type: Percent
      value: 100
      periodSeconds: 300
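How such a policy caps a single scale-up step can be sketched as follows. This is a simplified model under stated assumptions: `policy` and `scaleUpLimit` are hypothetical names, periodStartReplicas is taken as given (the real controller derives it from the scale events recorded within periodSeconds), and multiple policies are combined the way the default selectPolicy: Max does, by taking the most permissive one.

```go
package main

import (
	"fmt"
	"math"
)

// policy is a simplified HPAScalingPolicy: Type is "Pods" or "Percent".
type policy struct {
	Type  string
	Value int32
}

// scaleUpLimit returns the highest replica count the policies allow, starting
// from the replica count at the beginning of the policy period.
func scaleUpLimit(periodStartReplicas int32, policies []policy) int32 {
	// Scale-up policies only ever allow growth, so start from the period's
	// initial count and raise it.
	limit := periodStartReplicas
	for _, p := range policies {
		var proposed int32
		switch p.Type {
		case "Pods":
			proposed = periodStartReplicas + p.Value
		case "Percent":
			proposed = int32(math.Ceil(float64(periodStartReplicas) * (1 + float64(p.Value)/100)))
		default:
			continue // unknown policy types are ignored in this sketch
		}
		if proposed > limit {
			limit = proposed
		}
	}
	return limit
}

func main() {
	// The example above: +100% per 15-second period, starting from 3 replicas.
	fmt.Println(scaleUpLimit(3, []policy{{Type: "Percent", Value: 100}})) // 6
}
```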

Below we use normalizeDesiredReplicas for illustration. Its source code is:

func (a *HorizontalController) normalizeDesiredReplicas(hpa *autoscalingv2.HorizontalPodAutoscaler, key string, currentReplicas int32, prenormalizedDesiredReplicas int32, minReplicas int32) int32 {
 // Stabilize the recommended replica count
 stabilizedRecommendation := a.stabilizeRecommendation(key, prenormalizedDesiredReplicas)
 // If the stabilized recommendation differs from the raw one, stabilization kicked in;
 // set the condition to reflect the current scaling ability
 if stabilizedRecommendation != prenormalizedDesiredReplicas {
  setCondition(hpa, autoscalingv2.AbleToScale, v1.ConditionTrue, "ScaleDownStabilized", "recent recommendations were higher than current one, applying the highest recent recommendation")
 } else {
  setCondition(hpa, autoscalingv2.AbleToScale, v1.ConditionTrue, "ReadyForNewScale", "recommended size matches current size")
 }

 // Apply the rules so the final replica count stays within [minReplicas, maxReplicas]
 desiredReplicas, reason, message := convertDesiredReplicasWithRules(currentReplicas, stabilizedRecommendation, minReplicas, hpa.Spec.MaxReplicas)

 // If the final count equals the stabilized recommendation, no limit applied;
 // otherwise some limit applied (min/max replicas, scale-up rate limit, etc.)
 if desiredReplicas == stabilizedRecommendation {
  setCondition(hpa, autoscalingv2.ScalingLimited, v1.ConditionFalse, reason, "%s", message)
 } else {
  setCondition(hpa, autoscalingv2.ScalingLimited, v1.ConditionTrue, reason, "%s", message)
 }

 return desiredReplicas
}

convertDesiredReplicasWithRules uses calculateScaleUpLimit to compute the scale-up limit:

func calculateScaleUpLimit(currentReplicas int32) int32 {
    return int32(math.Max(scaleUpLimitFactor*float64(currentReplicas), scaleUpLimitMinimum))
}

Where: