Data Mesh and Domain-Driven Data

Story

Sato (CTO)

Isn't our centralized data team becoming a bottleneck?

Sato (CTO)

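Data mesh is an approach that distributes data ownership to domain teams. It applies the same idea as microservices to data. To keep it from turning into chaos, though, you need mechanisms for standardization and governance.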

Centralized vs. Data Mesh

Traditional approach (centralized data platform)

graph TD
    T1["Order Team"] --> Central
    T2["Customer Team"] --> Central
    T3["Inventory Team"] --> Central
    Central["Central Data Team (bottleneck)<br/>ETL → DWH → BI"]

    classDef team fill:#dbeafe,stroke:#3b82f6
    classDef central fill:#fee2e2,stroke:#ef4444,font-weight:bold
    class T1,T2,T3 team
    class Central central

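Challenges:

Waiting on requests to the central data team (backlogs measured in weeks)
Incorrect transformation logic caused by a lack of domain knowledge
Limited scalability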

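The Four Principles of Data Mesh

Principle	Description
Domain ownership	Domain teams own their data and are responsible for its quality
Data as a product	Data is offered as a product that other teams can consume
Self-serve platform	A platform that abstracts away infrastructure complexity
Federated governance	A balance between global standards and local autonomy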

Domain Ownership

graph LR
    subgraph Domain["Order Domain Team"]
        OS["Order Service<br/>(Operational)<br/>PostgreSQL"] --> DP["Order Data Product<br/>(Analytical)<br/>• Order summary API<br/>• Monthly order dataset<br/>• Order event stream"]
    end
    Note["The team is responsible for both<br/>operational and analytical data"]
    Domain --- Note

    classDef domain fill:#f0fdf4,stroke:#22c55e
    classDef service fill:#dbeafe,stroke:#3b82f6
    classDef product fill:#fef3c7,stroke:#f59e0b
    classDef note fill:#f5f5f5,stroke:#999,font-style:italic
    class Domain domain
    class OS service
    class DP product
    class Note note
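Because one team owns both sides, the transformation from operational rows to the analytical shape lives with the people who understand the domain rules. A minimal sketch, assuming a hypothetical `OperationalOrder` row shape (field names follow the schema defined later in this section) and a per-day summary as the analytical output; the "cancelled orders are excluded" rule is an illustrative assumption, not something the source specifies.

```typescript
// Hypothetical row shape stored by the Order Service (operational side)
interface OperationalOrder {
  order_id: string;
  customer_id: string;
  order_date: string; // assumed UTC ISO 8601, e.g. "2024-05-01T10:15:00Z"
  total_amount: number; // number for brevity; the schema declares decimal(10,2)
  status: "created" | "confirmed" | "shipped" | "delivered" | "cancelled";
}

// Hypothetical analytical record published by the data product (snake_case fields)
interface DailyOrderSummary {
  order_day: string; // "YYYY-MM-DD"
  order_count: number;
  total_amount: number;
}

// Domain rule owned by the order team (assumption): cancelled orders do not count as sales
function summarizeByDay(orders: OperationalOrder[]): DailyOrderSummary[] {
  const byDay = new Map<string, DailyOrderSummary>();
  for (const o of orders) {
    if (o.status === "cancelled") continue;
    // With UTC ISO 8601 timestamps, the first 10 characters are the date
    const day = o.order_date.slice(0, 10);
    const s: DailyOrderSummary = byDay.get(day) ?? { order_day: day, order_count: 0, total_amount: 0 };
    s.order_count += 1;
    s.total_amount += o.total_amount;
    byDay.set(day, s);
  }
  // Sort by day so consumers get a stable ordering
  return Array.from(byDay.values()).sort((a, b) => a.order_day.localeCompare(b.order_day));
}
```

Keeping this function in the order team's codebase means that a change to what counts as a sale is made by the team that defines it, rather than by a central team translating a ticket.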

Designing Data Products

Components of a Data Product

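// Data product definition (declarative configuration)
interface DataProduct {
  name: string;
  domain: string;
  owner: string;
  description: string;

  // Output ports (how consumers access the data)
  outputPorts: OutputPort[];

  // Data quality SLOs
  slo: {
    freshness: string;      // Freshness, e.g. "< 1 hour"
    completeness: number;   // Completeness in percent, e.g. 99.9
    accuracy: number;       // Accuracy in percent, e.g. 99.99
    availability: string;   // Availability, e.g. "99.9%"
  };

  // Schema (the contract with consumers)
  schema: SchemaDefinition;

  // Semantics and classification
  tags: string[];
  classification: Classification;
}

type Classification = 'public' | 'internal' | 'confidential' | 'restricted';

// Schema: one entry per published field
interface SchemaDefinition {
  fields: FieldDefinition[];
}

interface FieldDefinition {
  name: string;
  type: string;                     // e.g. 'string', 'timestamp', 'decimal', 'enum'
  description?: string;
  pii?: boolean;                    // true if the field contains personal data
  classification?: Classification; // required for PII fields by the global policy
}

// Output port kinds
type OutputPort =
  | { type: 'rest-api'; endpoint: string; version: string }
  | { type: 'event-stream'; topic: string; format: 'avro' | 'protobuf' }
  | { type: 'dataset'; location: string; format: 'parquet' | 'delta' }
  | { type: 'sql-view'; database: string; schema: string; table: string };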
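Because `OutputPort` is a discriminated union on `type`, consumer-side tooling can switch on it exhaustively, and the compiler flags any port kind it forgets to handle. A minimal sketch; `describePort` and the connection hints it returns are hypothetical, not a real client API.

```typescript
type OutputPort =
  | { type: "rest-api"; endpoint: string; version: string }
  | { type: "event-stream"; topic: string; format: "avro" | "protobuf" }
  | { type: "dataset"; location: string; format: "parquet" | "delta" }
  | { type: "sql-view"; database: string; schema: string; table: string };

// Hypothetical helper: returns a human-readable hint on how to consume a port
function describePort(port: OutputPort): string {
  switch (port.type) {
    case "rest-api":
      return `GET ${port.endpoint} (API ${port.version})`;
    case "event-stream":
      return `subscribe to topic ${port.topic} (${port.format})`;
    case "dataset":
      return `read ${port.format} files under ${port.location}`;
    case "sql-view":
      return `SELECT ... FROM ${port.database}.${port.schema}.${port.table}`;
    default: {
      // Exhaustiveness check: adding a port type without a case is a compile error here
      const unreachable: never = port;
      return unreachable;
    }
  }
}
```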

Example Data Product Definition

# data-product.yaml - definition of the order data product
apiVersion: datamesh/v1
kind: DataProduct
metadata:
  name: order-analytics
  domain: order
  owner: order-team
  description: "Analytical data product for orders"

spec:
  outputPorts:
    - name: order-events
      type: event-stream
      topic: order.events.v2
      format: avro
      description: "Real-time order event stream"

    - name: order-summary-api
      type: rest-api
      endpoint: /api/v2/data-products/orders/summary
      version: v2
      description: "Order summary REST API"

    - name: order-dataset
      type: dataset
      location: s3://data-products/order/daily/
      format: parquet
      partitionBy: [order_date]
      refreshSchedule: "0 3 * * *"  # daily at 03:00
      description: "Daily order dataset (Parquet)"

  schema:
    fields:
      - name: order_id
        type: string
        description: "Order ID"
        pii: false
      - name: customer_id
        type: string
        description: "Customer ID"
        pii: true
        classification: confidential
      - name: order_date
        type: timestamp
        description: "Order date and time"
      - name: total_amount
        type: decimal
        precision: 10
        scale: 2
        description: "Order total amount"
      - name: status
        type: enum
        values: [created, confirmed, shipped, delivered, cancelled]

  slo:
    freshness: "< 1 hour"
    completeness: 99.9
    accuracy: 99.99
    availability: "99.9%"

  governance:
    dataClassification: internal
    retentionPolicy: "7 years"
    accessControl:
      - role: data-analyst
        permissions: [read]
      - role: order-team
        permissions: [read, write, admin]

Self-Serve Data Platform

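# Example abstraction for a self-serve platform
# (list[str] annotations require Python 3.9+)

from dataclasses import dataclass


@dataclass
class DataProductConfig:
    """Data product settings defined by a domain team"""
    name: str
    domain: str
    source_table: str
    output_format: str  # "parquet", "delta", "iceberg"
    partition_by: list[str]
    schedule: str  # cron expression
    quality_checks: list[dict]


class DataProductPlatform:
    """
    Self-serve platform.
    Domain teams only define a config;
    infrastructure complexity (Spark, Airflow, S3, catalog, etc.) stays hidden.
    """

    def provision(self, config: DataProductConfig) -> None:
        """Provision a data product"""
        # 1. Allocate storage
        self._create_storage(config)

        # 2. Generate the pipeline (Airflow DAG) and deploy it
        dag_code = self._generate_pipeline(config)
        self._deploy_pipeline(config, dag_code)

        # 3. Register in the data catalog
        self._register_catalog(config)

        # 4. Set up quality checks
        self._setup_quality_checks(config)

        # 5. Configure access control
        self._configure_access_control(config)

        # 6. Generate a monitoring dashboard
        self._create_monitoring(config)

    def _generate_pipeline(self, config: DataProductConfig) -> str:
        """Generate Airflow DAG source code from the config"""
        dag_id = f"{config.domain}_{config.name}_pipeline"
        # !r renders values as Python literals so the generated code stays valid
        dag_code = f"""
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Task functions provided by the platform (hypothetical module name)
from data_platform.tasks import extract_from_source, run_quality_checks, publish_data_product

with DAG(
    dag_id={dag_id!r},
    schedule={config.schedule!r},  # the `schedule` argument requires Airflow 2.4+
    start_date=datetime(2024, 1, 1),
    catchup=False,
    tags=[{config.domain!r}, 'data-product'],
) as dag:

    extract = PythonOperator(
        task_id='extract',
        python_callable=extract_from_source,
        op_kwargs={{'source': {config.source_table!r}}},
    )

    quality_check = PythonOperator(
        task_id='quality_check',
        python_callable=run_quality_checks,
        op_kwargs={{'checks': {config.quality_checks!r}}},
    )

    publish = PythonOperator(
        task_id='publish',
        python_callable=publish_data_product,
        op_kwargs={{
            'format': {config.output_format!r},
            'partition_by': {config.partition_by!r},
        }},
    )

    extract >> quality_check >> publish
"""
        return dag_code

    # Environment-specific steps, left as stubs in this sketch
    def _create_storage(self, config: DataProductConfig) -> None:
        raise NotImplementedError  # e.g. create the S3 prefix / table location

    def _deploy_pipeline(self, config: DataProductConfig, dag_code: str) -> None:
        raise NotImplementedError  # e.g. write dag_code into the Airflow DAGs folder

    def _register_catalog(self, config: DataProductConfig) -> None:
        raise NotImplementedError

    def _setup_quality_checks(self, config: DataProductConfig) -> None:
        raise NotImplementedError

    def _configure_access_control(self, config: DataProductConfig) -> None:
        raise NotImplementedError

    def _create_monitoring(self, config: DataProductConfig) -> None:
        raise NotImplementedError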
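The point of `DataProductConfig` is that the domain team states *what* it wants and the platform derives *where* it lives by convention. A minimal TypeScript sketch of that derivation step; the bucket, the catalog ID format, and the name restriction are hypothetical platform conventions (the DAG ID pattern mirrors the f-string above), not defaults of Airflow or any catalog product.

```typescript
// TypeScript counterpart of the config above (camelCase by TS convention)
interface DataProductConfig {
  name: string;
  domain: string;
  sourceTable: string;
  outputFormat: "parquet" | "delta" | "iceberg";
  partitionBy: string[];
  schedule: string; // cron expression
}

// Everything the platform derives; domain teams never write these by hand
interface ProvisioningPlan {
  storageLocation: string;
  dagId: string;
  catalogId: string;
}

const BUCKET = "s3://data-products"; // hypothetical bucket
const SAFE_NAME = /^[a-z0-9_-]+$/;   // hypothetical rule: lowercase letters, digits, '-' and '_'

function planProvisioning(config: DataProductConfig): ProvisioningPlan {
  // Reject names that would produce awkward paths or identifiers
  if (!SAFE_NAME.test(config.domain) || !SAFE_NAME.test(config.name)) {
    throw new Error(`invalid domain/name: ${config.domain}/${config.name}`);
  }
  return {
    storageLocation: `${BUCKET}/${config.domain}/${config.name}/`,
    dagId: `${config.domain}_${config.name}_pipeline`,
    catalogId: `${config.domain}.${config.name}`,
  };
}
```

Deriving every location from one convention keeps all data products discoverable in the same places; the trade-off is that renaming a product becomes a migration rather than a config edit.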

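Federated Governance

Layer	Global (standardized)	Local (autonomous)
Schema	Common field-naming conventions	Domain-specific fields
Quality	Minimum SLO baseline	Domain-specific quality rules
Security	PII classification criteria	Fine-grained access permissions
Format	Standard output format (Parquet)	Internal data formats are up to each team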
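One way to read this table is that the global layer sets floors and the local layer may only tighten them. A minimal sketch of that merge rule for the quality row; `QualityRules`, `effectiveRules`, and the check names are hypothetical.

```typescript
interface QualityRules {
  minCompletenessPct: number; // floor for the completeness SLO
  checks: string[];           // named quality checks to run
}

// Global rules are a floor: a domain may add checks or raise thresholds, never remove or lower them
function effectiveRules(global: QualityRules, local: Partial<QualityRules>): QualityRules {
  return {
    minCompletenessPct: Math.max(global.minCompletenessPct, local.minCompletenessPct ?? 0),
    // Union of global and local checks; Set removes duplicates and keeps insertion order
    checks: Array.from(new Set([...global.checks, ...(local.checks ?? [])])),
  };
}
```

Using `Math.max` means a domain can raise completeness from a global 99 to 99.9, while an attempt to set 95 silently resolves to 99; a stricter platform might reject such a config instead of correcting it.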

Implementing Federated Governance

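// Global policy definition, applied automatically to every data product
// (DataProduct is the interface defined earlier in this section)
interface GlobalPolicy {
  // Minimum requirements for every data product
  minimumSLO: {
    freshness: string;
    completeness: number;
    availability: string;
  };

  // PII fields must carry a classification
  piiClassification: {
    required: true;
    allowedClassifications: ['public', 'internal', 'confidential', 'restricted'];
  };

  // Naming conventions
  naming: {
    fieldCase: 'snake_case';
    dateFormat: 'ISO8601';
    idSuffix: '_id';
  };

  // Retention policy
  retention: {
    default: '3 years';
    piiData: '1 year after deletion request';
    financialData: '7 years';
  };
}

// Result of a policy check
interface ValidationResult {
  valid: boolean;
  errors: string[];
}

// snake_case: lowercase words of letters/digits joined by single underscores
function isSnakeCase(name: string): boolean {
  return /^[a-z][a-z0-9]*(_[a-z0-9]+)*$/.test(name);
}

// Policy compliance check (run in the CI/CD pipeline)
class PolicyValidator {
  validate(product: DataProduct, policy: GlobalPolicy): ValidationResult {
    const errors: string[] = [];

    // SLO check (only completeness is numeric; freshness and availability are strings)
    if (product.slo.completeness < policy.minimumSLO.completeness) {
      errors.push(
        `Completeness SLO below minimum: ${product.slo.completeness}% < ${policy.minimumSLO.completeness}%`
      );
    }

    // PII classification check
    for (const field of product.schema.fields) {
      if (field.pii && !field.classification) {
        errors.push(`PII field ${field.name} has no classification`);
      }
    }

    // Naming convention check
    for (const field of product.schema.fields) {
      if (!isSnakeCase(field.name)) {
        errors.push(`Field name ${field.name} is not snake_case`);
      }
    }

    return { valid: errors.length === 0, errors };
  }
}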
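`PolicyValidator` compares only `completeness`, the one numeric SLO; `freshness` and `availability` are strings and must be parsed before they can be compared. A minimal sketch for `availability`, assuming it is always written as a percentage string such as `"99.9%"` as in the examples above; `parsePercent` and `checkAvailability` are hypothetical helpers.

```typescript
// Parses "99.9%" into 99.9; throws on anything else (assumed format)
function parsePercent(value: string): number {
  const m = /^(\d+(?:\.\d+)?)\s*%$/.exec(value.trim());
  if (!m) throw new Error(`not a percentage: ${value}`);
  return Number(m[1]);
}

// Returns an error message if the product's availability SLO is weaker than the global minimum, else null
function checkAvailability(productAvailability: string, minimumAvailability: string): string | null {
  const actual = parsePercent(productAvailability);
  const minimum = parsePercent(minimumAvailability);
  return actual < minimum
    ? `Availability SLO below minimum: ${productAvailability} < ${minimumAvailability}`
    : null;
}
```

Comparing parsed numbers rather than the raw strings matters: as a string comparison, `"100%" < "99.9%"` is `true`, because `'1'` sorts before `'9'`.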

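Data Mesh vs. the Traditional Approach

Aspect	Centralized	Data mesh
Data ownership	Central data team	Domain teams
Scalability	Delays grow with the number of teams	Scales in parallel across domains
Domain knowledge	Concentrated in the central team (shallow)	Each team has deep knowledge
Adoption cost	Low (established pattern)	High (the platform must be built)
Suitable scale	Small to medium	Large (10+ teams)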

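Summary

Point	Description
Data mesh	Distributes data ownership to domain teams
Data products	Provide data in a form other teams can consume
Self-serve	The platform absorbs infrastructure complexity
Federated governance	Balances global standards with local autonomy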

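Checklist

I can explain the four principles of data mesh
I understand the components of a data product (output ports, SLOs, schema)
I understand the role of a self-serve platform
I understand the idea behind federated governance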

Next Steps

Next, in the exercise, you will design the data model for a multi-domain system.

Estimated reading time: 30 minutes