インデックス戦略の設計 - L0 カリキュラム

「インデックスは多ければいいというものじゃない」佐藤CTOは警告した。「インデックスが多いと書き込みが遅くなり、ストレージも消費する。クエリパターンを分析して、必要最小限のインデックスを設計するのがプロの仕事だ。」

1. インデックスの種類（PostgreSQL）

インデックス型	構造	適用場面	例
B-tree	平衡木	等値・範囲検索（デフォルト）	`=`, `<`, `>`, `BETWEEN`
Hash	ハッシュテーブル	等値検索のみ	`=`
GIN	転置インデックス	配列、全文検索、JSONB	`@>`, `@@`, `?`
GiST	汎用検索木	空間データ、範囲型	`&&`, `@>`, `<->`
BRIN	ブロックレンジ	自然順序のある大テーブル	時系列データ
SP-GiST	空間分割	非平衡データ構造	IP アドレス、電話番号

2. 複合インデックスの設計原則

カラム順序の重要性

-- 複合インデックスのカラム順序
-- 等値条件 → 範囲条件 → ソート条件 の順に配置

-- クエリパターン
SELECT * FROM orders
WHERE status = 'completed'      -- 等値条件
  AND created_at >= '2025-01-01' -- 範囲条件
ORDER BY total_amount DESC      -- ソート条件
LIMIT 20;

-- Good: 等値 → 範囲 → ソート の順
CREATE INDEX idx_orders_status_created_amount
ON orders (status, created_at, total_amount);

-- Bad: 範囲条件を先にするとソートが使えない
CREATE INDEX idx_orders_bad
ON orders (created_at, status, total_amount);

ESR ルール（Equality, Sort, Range）

-- ESR: Equality → Sort → Range の順でカラムを配置

-- クエリ: WHERE category = 'electronics' AND price BETWEEN 100 AND 500 ORDER BY rating DESC
-- E: category（等値）
-- S: rating（ソート）
-- R: price（範囲）

CREATE INDEX idx_products_esr
ON products (category, rating DESC, price);

-- このインデックスにより:
-- 1. category = 'electronics' で絞り込み
-- 2. rating DESC でソート済みのデータにアクセス
-- 3. price の範囲フィルタリング
-- → ソートが不要になる（Index Scan で順序保証）

3. 特殊なインデックス

部分インデックス（Partial Index）

-- アクティブなレコードのみにインデックス
CREATE INDEX idx_orders_active
ON orders (user_id, created_at)
WHERE status != 'cancelled' AND status != 'archived';
-- → テーブルの20%しかカバーしないのでサイズが小さい

-- 未処理のジョブのみ
CREATE INDEX idx_jobs_pending
ON jobs (priority, created_at)
WHERE processed_at IS NULL;
-- → 処理済みのジョブ（大多数）はインデックスに含まれない

カバリングインデックス（Covering Index / INCLUDE）

-- Index Only Scan を実現するカバリングインデックス
CREATE INDEX idx_orders_covering
ON orders (user_id, status)
INCLUDE (total_amount, created_at);

-- このクエリは Index Only Scan で完結（テーブルアクセス不要）
SELECT user_id, status, total_amount, created_at
FROM orders
WHERE user_id = 123 AND status = 'completed';

-- INCLUDE カラムはインデックスの検索には使われないが、
-- テーブルへのランダムI/Oを回避できる

式インデックス（Expression Index）

-- 関数や式の結果にインデックス
CREATE INDEX idx_users_lower_email
ON users (lower(email));

-- このクエリでインデックスが使われる
SELECT * FROM users WHERE lower(email) = 'user@example.com';

-- JSONB フィールドへのインデックス
CREATE INDEX idx_products_category
ON products ((metadata->>'category'));

-- 日付の一部にインデックス
CREATE INDEX idx_orders_month
ON orders (date_trunc('month', created_at));

4. JSONB インデックス

-- GIN インデックス: JSONB の包含検索
CREATE INDEX idx_products_metadata_gin
ON products USING gin (metadata);

-- 使用例
SELECT * FROM products
WHERE metadata @> '{"brand": "Apple", "color": "black"}';

-- jsonb_path_ops: 包含検索に特化（サイズが小さい）
CREATE INDEX idx_products_metadata_path
ON products USING gin (metadata jsonb_path_ops);

-- 特定キーへの B-tree インデックス
CREATE INDEX idx_products_brand
ON products ((metadata->>'brand'));

5. インデックスの分析と管理

-- インデックスの使用状況を確認
SELECT
  schemaname,
  tablename,
  indexname,
  idx_scan AS times_used,
  idx_tup_read AS tuples_read,
  idx_tup_fetch AS tuples_fetched,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;
-- idx_scan = 0 のインデックスは不要な可能性が高い

-- 重複インデックスの検出
SELECT
  a.indexrelid::regclass AS index_a,
  b.indexrelid::regclass AS index_b,
  a.indkey AS columns_a,
  b.indkey AS columns_b
FROM pg_index a
JOIN pg_index b ON a.indrelid = b.indrelid
  AND a.indexrelid != b.indexrelid
  AND a.indkey <@ b.indkey  -- a のカラムが b に含まれる
WHERE a.indrelid::regclass::text NOT LIKE 'pg_%';

-- インデックスの肥大化チェック
SELECT
  schemaname || '.' || relname AS table_name,
  indexrelname AS index_name,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
  idx_scan AS number_of_scans,
  idx_tup_read AS tuples_read
FROM pg_stat_user_indexes
WHERE pg_relation_size(indexrelid) > 100 * 1024 * 1024  -- 100MB以上
ORDER BY pg_relation_size(indexrelid) DESC;

// インデックス推奨エンジンの骨格
interface IndexRecommendation {
  table: string;
  columns: string[];
  type: 'btree' | 'gin' | 'gist' | 'brin';
  partial?: string;
  include?: string[];
  rationale: string;
  estimatedImpact: 'high' | 'medium' | 'low';
}

class IndexAdvisor {
  async analyzeSlowQueries(): Promise<IndexRecommendation[]> {
    const recommendations: IndexRecommendation[] = [];

    // pg_stat_statements から遅いクエリを取得
    const slowQueries = await this.getSlowQueries();

    for (const query of slowQueries) {
      // EXPLAIN で実行計画を取得
      const plan = await this.getQueryPlan(query.queryText);

      // Seq Scan で大テーブルをスキャンしているノードを検出
      const seqScans = this.findSeqScans(plan);

      for (const scan of seqScans) {
        if (scan.rowsRemoved / scan.actualRows > 0.9) {
          // フィルタで90%以上除去 → インデックスが有効
          recommendations.push({
            table: scan.tableName,
            columns: scan.filterColumns,
            type: 'btree',
            rationale: `Query "${query.queryText.substring(0, 60)}..." scans ${scan.actualRows} rows, removes ${scan.rowsRemoved} by filter`,
            estimatedImpact: 'high',
          });
        }
      }
    }

    return this.deduplicateRecommendations(recommendations);
  }

  private async getSlowQueries(): Promise<any[]> { return []; }
  private async getQueryPlan(query: string): Promise<any> { return {}; }
  private findSeqScans(plan: any): any[] { return []; }
  private deduplicateRecommendations(recs: IndexRecommendation[]): IndexRecommendation[] { return recs; }
}

コラム: BRIN インデックスの活用

BRIN（Block Range Index）は、テーブルの物理的な順序と値の順序が相関しているデータに非常に効率的。

-- 時系列データに最適
CREATE INDEX idx_logs_created_brin
ON logs USING brin (created_at)
WITH (pages_per_range = 128);

-- サイズ比較（1億行のテーブル）:
-- B-tree: ~2GB
-- BRIN:   ~500KB (4000分の1！)

-- 制限:
-- 等値検索には不向き（範囲検索向き）
-- INSERT が時系列順でないと効果が薄い

BRIN は「各ブロック範囲の最小値・最大値」だけを記録するため、サイズが極めて小さい。

まとめ

トピック	要点
インデックス種類	B-tree（デフォルト）、GIN（配列/JSONB）、BRIN（時系列）
複合インデックス	ESR ルール: Equality → Sort → Range
部分インデックス	WHERE 句で対象を限定、サイズ削減
カバリングインデックス	INCLUDE で Index Only Scan を実現
インデックス管理	使用頻度・サイズ・重複を定期確認

チェックリスト

6種類のインデックスの特性と適用場面を説明できる
ESR ルールに従った複合インデックスを設計できる
部分インデックスとカバリングインデックスを使い分けられる
不要なインデックスの検出方法を知っている
JSONB に対する適切なインデックスを設計できる

次のステップへ

インデックス戦略を学んだ。次は コネクションプーリングとリードレプリカ を学び、DB接続レベルの最適化に進もう。

推定読了時間: 40分