Implementing an ML CI/CD Pipeline

Tanaka (VPoE): "Now that you understand deployment strategies, let's build an actual CI/CD pipeline with GitHub Actions. CI/CD for ML has to account for three triggers."

You: "Code changes, data changes, and model changes, right? And each one runs a different set of tests."

Tanaka (VPoE): "Exactly. We will also set up model quality gates, so that any model that fails the criteria is automatically blocked from deployment."

The Three Triggers of ML CI/CD

Traditional CI/CD is triggered only by code changes, but an ML system has to handle three kinds of change.

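Trigger	How it is detected	What runs
Code change	Git push / PR	Unit tests, linting, model tests
Data change	Schedule / data-arrival event	Data validation, retraining, accuracy tests
Model change	Model Registry update	Integration tests, staging deployment, approval flow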
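The table above can be read as a lookup from trigger to pipeline steps. The following is a minimal sketch of that idea; the `Trigger` enum, the step identifiers, and `steps_for` are illustrative names that do not come from any library or from the workflows in this lesson.

```python
from enum import Enum


class Trigger(Enum):
    """The three kinds of change an ML pipeline reacts to."""
    CODE = "code"
    DATA = "data"
    MODEL = "model"


# Steps per trigger, taken from the table above (identifiers are hypothetical).
PIPELINE_STEPS = {
    Trigger.CODE: ["unit_tests", "lint", "model_tests"],
    Trigger.DATA: ["data_validation", "retraining", "accuracy_tests"],
    Trigger.MODEL: ["integration_tests", "staging_deploy", "approval_flow"],
}


def steps_for(trigger: Trigger) -> list[str]:
    """Return a copy of the ordered steps to run for the given trigger."""
    return list(PIPELINE_STEPS[trigger])


if __name__ == "__main__":
    for trigger in Trigger:
        print(trigger.value, "->", ", ".join(steps_for(trigger)))
```

Keeping the mapping in one place makes it easy to see that each trigger runs a different pipeline, which is the point of the table.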

ML CI/CD with GitHub Actions

Workflow for the code-change trigger

# .github/workflows/ml-ci.yml
name: ML CI Pipeline

on:
  pull_request:
    paths:
      - "src/**"
      - "tests/**"
      - "requirements.txt"

jobs:
  code-quality:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - run: pip install -r requirements.txt
      - run: ruff check src/
      - run: mypy src/

  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - run: pip install -r requirements.txt
      - run: pytest tests/unit/ -v --cov=src --cov-report=xml

  model-tests:
    runs-on: ubuntu-latest
    needs: unit-tests
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - run: pip install -r requirements.txt
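      - name: Run model validation tests
        # --timeout requires the pytest-timeout plugin.
        # The model tests load the model from the MLflow registry, so they need the tracking URI.
        run: pytest tests/model/ -v --timeout=600
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_URI }}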

Model quality gates

# tests/model/test_model_quality.py
import time

import mlflow
import pandas as pd
import pytest
from sklearn.metrics import f1_score, roc_auc_score

# Thresholds a candidate model must meet before it may be deployed.
# Note: model_size_mb is declared here but no test in this file checks it yet.
QUALITY_GATES = {
    "auc_roc": 0.80,
    "f1_score": 0.70,
    "inference_time_ms": 100,
    "model_size_mb": 500,
}

class TestModelQuality:
    @pytest.fixture(autouse=True)
    def setup(self):
        # Requires MLFLOW_TRACKING_URI to point at the tracking server.
        self.model = mlflow.pyfunc.load_model("models:/churn_model/Staging")
        self.test_data = pd.read_parquet("data/test_dataset.parquet")
        self.y_true = self.test_data["label"]
        self.X_test = self.test_data.drop(columns=["label"])

    def test_auc_threshold(self):
        """AUC-ROC must be at or above the threshold."""
        # Assumes predict() returns a positive-class score rather than a hard label.
        predictions = self.model.predict(self.X_test)
        auc = roc_auc_score(self.y_true, predictions)
        assert auc >= QUALITY_GATES["auc_roc"], \
            f"AUC {auc:.4f} < threshold {QUALITY_GATES['auc_roc']}"

    def test_f1_threshold(self):
        """F1 score must be at or above the threshold."""
        predictions = self.model.predict(self.X_test)
        # Binarize the scores at 0.5 before computing F1.
        f1 = f1_score(self.y_true, (predictions > 0.5).astype(int))
        assert f1 >= QUALITY_GATES["f1_score"], \
            f"F1 {f1:.4f} < threshold {QUALITY_GATES['f1_score']}"

    def test_inference_latency(self):
        """Average single-record inference latency must stay within the limit."""
        single_record = self.X_test.iloc[:1]
        self.model.predict(single_record)  # warm-up call, excluded from the timing
        start = time.perf_counter()  # monotonic clock, better suited to timing than time.time()
        for _ in range(100):
            self.model.predict(single_record)
        avg_ms = (time.perf_counter() - start) / 100 * 1000
        assert avg_ms <= QUALITY_GATES["inference_time_ms"], \
            f"Latency {avg_ms:.1f}ms > threshold {QUALITY_GATES['inference_time_ms']}ms"
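
QUALITY_GATES declares a model_size_mb limit, but the test file shown above has no check for it. One possible way to close that gap is sketched below. It assumes the model artifacts are already available in a local directory; `directory_size_mb` and `check_model_size` are hypothetical helper names, and a megabyte is taken as 10^6 bytes.

```python
from pathlib import Path

MAX_MODEL_SIZE_MB = 500  # same value as QUALITY_GATES["model_size_mb"]


def directory_size_mb(path: str) -> float:
    """Sum the sizes of all regular files under `path`, in MB (10**6 bytes)."""
    root = Path(path)
    total_bytes = sum(f.stat().st_size for f in root.rglob("*") if f.is_file())
    return total_bytes / 1_000_000


def check_model_size(path: str, limit_mb: float = MAX_MODEL_SIZE_MB) -> None:
    """Raise AssertionError if the artifacts under `path` exceed `limit_mb`."""
    size_mb = directory_size_mb(path)
    assert size_mb <= limit_mb, \
        f"Model size {size_mb:.1f}MB > threshold {limit_mb}MB"
```

Inside the test class this could become a `test_model_size` method that calls `check_model_size` on the directory holding the downloaded artifacts.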

Data validation pipeline

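# src/validation/data_validator.py
# Written against the Great Expectations 0.17/0.18 fluent API (context.sources);
# other versions name these calls differently.
import sys

import great_expectations as gx

def validate_training_data(data_path: str) -> bool:
    """Validate the training data and return True only if every rule passes."""
    context = gx.get_context()

    # Configure the data source
    datasource = context.sources.add_pandas(name="training_data")
    asset = datasource.add_parquet_asset(
        name="training_parquet",
        path=data_path,
    )

    # Define the expectation suite and bind it to a batch of the data
    context.add_or_update_expectation_suite("training_suite")
    validator = context.get_validator(
        batch_request=asset.build_batch_request(),
        expectation_suite_name="training_suite",
    )

    # Validation rules: each call registers the expectation on the suite
    validator.expect_table_row_count_to_be_between(min_value=1000, max_value=10_000_000)
    validator.expect_column_values_to_not_be_null(column="customer_id")
    validator.expect_column_values_to_be_between(column="age", min_value=0, max_value=120)

    # Run
    results = validator.validate()
    return results.success

if __name__ == "__main__":
    # The CI workflow runs this file as a script, so fail the step when validation fails.
    # The default path is a placeholder assumption; pass the real location as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "data/training_dataset.parquet"
    sys.exit(0 if validate_training_data(path) else 1)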
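
To make the three rules above concrete, here is a dependency-free sketch of the same checks in plain Python. It is not how Great Expectations implements them, and `validate_rows` is a hypothetical helper operating on a list of dicts rather than a Parquet file. Null ages are skipped in the range check, on the assumption that missing values are handled by a separate not-null rule.

```python
from typing import Any


def validate_rows(
    rows: list[dict[str, Any]],
    min_rows: int = 1000,
    max_rows: int = 10_000_000,
) -> list[str]:
    """Return human-readable rule violations; an empty list means the data passed."""
    errors: list[str] = []

    # Rule 1: the row count must fall within [min_rows, max_rows].
    if not (min_rows <= len(rows) <= max_rows):
        errors.append(f"row count {len(rows)} outside [{min_rows}, {max_rows}]")

    # Rule 2: customer_id must never be null (a missing key counts as null).
    null_ids = sum(1 for r in rows if r.get("customer_id") is None)
    if null_ids:
        errors.append(f"{null_ids} rows with null customer_id")

    # Rule 3: age, when present, must be between 0 and 120 inclusive.
    bad_ages = sum(
        1 for r in rows
        if r.get("age") is not None and not (0 <= r["age"] <= 120)
    )
    if bad_ages:
        errors.append(f"{bad_ages} rows with age outside [0, 120]")

    return errors
```

Returning the list of violations rather than a bare boolean makes a failed pipeline step easier to diagnose from its log.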

Automated retraining pipeline

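# .github/workflows/ml-retrain.yml
name: ML Retrain Pipeline

on:
  schedule:
    - cron: "0 2 * * 1"  # Every Monday at 02:00 (cron schedules run in UTC)
  workflow_dispatch:
    inputs:
      reason:
        description: "Reason for retraining"
        required: true

# Training, the quality gates, and promotion all talk to MLflow,
# so the tracking URI is set once for the whole workflow.
env:
  MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_URI }}

jobs:
  validate-data:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - run: pip install -r requirements.txt
      - name: Validate new training data
        run: python src/validation/data_validator.py

  retrain:
    needs: validate-data
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - run: pip install -r requirements.txt
      - name: Train model
        run: python src/training/train.py
      - name: Run quality gates
        # The gate tests load models:/churn_model/Staging; make sure that URI
        # resolves to the newly trained candidate before relying on this step.
        run: pytest tests/model/ -v

  deploy-staging:
    needs: retrain
    runs-on: ubuntu-latest
    steps:
      # Each job starts on a fresh runner, so the code and dependencies must be set up again.
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - run: pip install -r requirements.txt
      - name: Promote to Staging
        run: python src/deploy/promote.py --stage staging
      - name: Run integration tests
        run: pytest tests/integration/ -v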
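
The workflow above fires on a schedule or on a manual dispatch. If retraining should also react to data drift, some check has to decide when to send that dispatch. The sketch below uses the Population Stability Index (PSI) as one possible drift measure. The choice of PSI, the 0.2 threshold (a common rule of thumb, not a value from this lesson), and the function names are all assumptions.

```python
import math


def population_stability_index(
    expected: list[float],
    actual: list[float],
    eps: float = 1e-6,
) -> float:
    """PSI between two binned distributions given as per-bin proportions."""
    if len(expected) != len(actual):
        raise ValueError("expected and actual must have the same number of bins")
    psi = 0.0
    for e, a in zip(expected, actual):
        # Clamp empty bins so that log() and the division stay defined.
        e = max(e, eps)
        a = max(a, eps)
        psi += (a - e) * math.log(a / e)
    return psi


def should_retrain(
    expected: list[float],
    actual: list[float],
    threshold: float = 0.2,  # assumed rule-of-thumb cutoff
) -> bool:
    """Return True when the drift between the two distributions exceeds the threshold."""
    return population_stability_index(expected, actual) > threshold


if __name__ == "__main__":
    training_bins = [0.25, 0.25, 0.25, 0.25]  # feature distribution at training time
    recent_bins = [0.10, 0.20, 0.30, 0.40]    # same feature in recent traffic
    print(should_retrain(training_bins, recent_bins))
```

A scheduled job could run a check like this per feature and start the retraining workflow only when it returns True.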

Pipeline visualization and monitoring

Pipeline run dashboard

# Example of recording pipeline metrics
import os

# GITHUB_RUN_ID is set by GitHub Actions; "local" is a fallback for runs outside CI.
run_id = os.environ.get("GITHUB_RUN_ID", "local")

pipeline_metrics = {
    "pipeline_run_id": run_id,
    "trigger": "scheduled",
    "data_validation": {"status": "passed", "duration_s": 45},
    "training": {"status": "passed", "duration_s": 1200, "auc": 0.85},
    "quality_gate": {"status": "passed", "checks_passed": 4, "checks_total": 4},
    "staging_deploy": {"status": "passed", "duration_s": 120},
    "total_duration_s": 1365,  # 45 + 1200 + 120
}
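
A record like the one above does not have to be written by hand. As a rough sketch, it can be assembled from per-stage results so that the total duration always matches the stages. `summarize_run` and the extra top-level `status` field are illustrative additions, not part of the original record.

```python
import json


def summarize_run(run_id: str, trigger: str, stages: dict[str, dict]) -> dict:
    """Build a pipeline_metrics-style record from per-stage results."""
    record = {"pipeline_run_id": run_id, "trigger": trigger, **stages}
    # Stages without a duration (such as the quality gate) count as zero seconds.
    record["total_duration_s"] = sum(s.get("duration_s", 0) for s in stages.values())
    # Hypothetical extra field: the run passes only if every stage passed.
    all_passed = all(s["status"] == "passed" for s in stages.values())
    record["status"] = "passed" if all_passed else "failed"
    return record


if __name__ == "__main__":
    stages = {
        "data_validation": {"status": "passed", "duration_s": 45},
        "training": {"status": "passed", "duration_s": 1200, "auc": 0.85},
        "quality_gate": {"status": "passed", "checks_passed": 4, "checks_total": 4},
        "staging_deploy": {"status": "passed", "duration_s": 120},
    }
    # The record is plain JSON, so it can be shipped to any dashboard backend.
    print(json.dumps(summarize_run("local", "scheduled", stages), indent=2))
```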

Metric	Target	Alert condition
Pipeline success rate	> 95%	2 consecutive failures
Training time	< 30 min	> 60 min
Data validation pass rate	100%	< 100%
Quality gate pass rate	> 90%	2 consecutive rejections
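
Two of the alert conditions in the table, consecutive failures and training time, can be evaluated with a few lines of code. The following sketch assumes the run history is kept as a list of booleans with the newest run last; the function names and message strings are hypothetical.

```python
def consecutive_failures(history: list[bool]) -> int:
    """Count failed runs at the end of `history` (True = passed, newest last)."""
    count = 0
    for passed in reversed(history):
        if passed:
            break
        count += 1
    return count


def pipeline_alerts(history: list[bool], training_minutes: float) -> list[str]:
    """Return the alert messages raised by the conditions in the table above."""
    alerts: list[str] = []
    if consecutive_failures(history) >= 2:
        alerts.append("pipeline failed 2 or more times in a row")
    if training_minutes > 60:
        alerts.append(f"training took {training_minutes:.0f} min (> 60 min)")
    return alerts


if __name__ == "__main__":
    # Two failures in a row plus a slow training run raise both alerts.
    print(pipeline_alerts([True, True, False, False], training_minutes=75))
```

Counting only the trailing failures means a single success resets the alert, which matches the "2 consecutive failures" wording.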

Summary

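Item	Key point
Three triggers	Design a separate pipeline for each of code, data, and model changes
Quality gates	Block deployment automatically using thresholds such as AUC, F1, and latency
Data validation	Validate training data automatically with Great Expectations
Automated retraining	Run automatically, triggered by a schedule or by drift detection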

Checklist

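I can explain the three triggers of ML CI/CD and what each one runs
I can write a GitHub Actions workflow that runs model tests automatically
I understand how to design and implement quality gates
I can explain the components of an automated retraining pipeline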

Next steps

You have learned how to implement an ML CI/CD pipeline. Next, put it into practice by building a pipeline yourself in the exercise.

Estimated reading time: 30 minutes