Cloud Native in Practice: Kubernetes Deployment and Monitoring

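Cloud-native technology has become the standard approach for deploying modern applications. This article shares practical experience running Kubernetes and a monitoring stack in production.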

Kubernetes Deployment Practices

Basic Deployment Configuration

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-gateway
  labels:
    app: ai-gateway
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ai-gateway
  template:
    metadata:
      labels:
        app: ai-gateway
    spec:
      containers:
      - name: ai-gateway
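        # In production, pin a specific version tag; ":latest" makes rollouts and rollbacks hard to reproduce.
        image: ai-gateway:latest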
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
          limits:
            cpu: "2000m"
            memory: "2Gi"
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10

HPA Autoscaling

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ai-gateway-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ai-gateway
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
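
To make the 70% target concrete, the sketch below reproduces the core HPA scaling rule, desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), clamped to minReplicas and maxReplicas. It is a simplification: the function name and parameters are hypothetical, and the real controller also accounts for pod readiness, missing metrics, and stabilization windows. The 0.1 tolerance is the documented default and may be configured differently in a given cluster.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas applies ceil(current * currentUtil / targetUtil) and clamps
// the result to [minR, maxR]. Utilization values are percentages of the CPU request.
func desiredReplicas(current int, currentUtil, targetUtil float64, minR, maxR int) int {
	ratio := currentUtil / targetUtil
	n := current
	// Skip scaling when the ratio is within the tolerance around 1.0
	// (0.1 is the default; assumed here).
	if math.Abs(ratio-1.0) > 0.1 {
		n = int(math.Ceil(float64(current) * ratio))
	}
	if n < minR {
		return minR
	}
	if n > maxR {
		return maxR
	}
	return n
}

func main() {
	// Values mirror the manifest: target 70%, minReplicas 2, maxReplicas 10.
	fmt.Println(desiredReplicas(3, 90, 70, 2, 10))  // scale up: 4
	fmt.Println(desiredReplicas(3, 75, 70, 2, 10))  // within tolerance: 3
	fmt.Println(desiredReplicas(3, 20, 70, 2, 10))  // clamped to minReplicas: 2
	fmt.Println(desiredReplicas(8, 140, 70, 2, 10)) // clamped to maxReplicas: 10
}
```

Note that utilization is measured against the container's CPU request (500m in the Deployment above), not its limit, so 70% corresponds to an average of about 350m per pod.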

Monitoring Stack

Prometheus + Grafana

Use kube-prometheus-stack to deploy monitoring quickly:

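# Add the chart repository first if it is not yet configured locally
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace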

Core Monitoring Metrics

Layer	Metrics	Alert threshold
Node	CPU, memory, disk	CPU > 80%
Pod	Restart count, OOM kills	> 3 restarts/hour
Application	QPS, latency, error rate	P99 latency > 1s

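Custom Metrics

// A Go application exposing a custom Prometheus metric
import "github.com/prometheus/client_golang/prometheus"

var (
    // requestDuration records request latency, labeled by method, path, and status.
    requestDuration = prometheus.NewHistogramVec(
        prometheus.HistogramOpts{
            Name:    "http_request_duration_seconds",
            Help:    "HTTP request duration in seconds",
            Buckets: []float64{0.1, 0.25, 0.5, 1, 2.5},
        },
        []string{"method", "path", "status"},
    )
)

func init() {
    // Register the histogram with the default registry so it is exported on /metrics.
    prometheus.MustRegister(requestDuration)
}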

Log Management

We collect and analyze logs with the EFK stack (Elasticsearch + Fluentd + Kibana). Structured log output is essential for troubleshooting.

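Summary

Cloud native is not only a technology choice but also an operations philosophy. Declarative deployment with Kubernetes, combined with observability built on Prometheus, can substantially improve system reliability and operational efficiency.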