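Balancing Exploration and Exploitation

"If we keep recommending the same genres, users will get bored."

Tanaka, the VPoE, points out.

"Exploitation means continuing to recommend products that reliably get a response. Exploration means trying out new genres. The balance between the two determines long-term customer satisfaction."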

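The Exploration-Exploitation Trade-off

Strategy	Advantages	Disadvantages
Exploitation only	Maximizes short-term CTR	Filter bubble, user fatigue
Exploration only	Diverse discoveries	Lower CTR, less relevant recommendations
Balanced	Serves both short-term and long-term goals	More complex to design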

Epsilon-Greedy

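import numpy as np

class EpsilonGreedyRecommender:
    """Exploration and exploitation with Epsilon-Greedy"""

    def __init__(self, model, epsilon=0.1):
        self.model = model      # must provide recommend(user_id, n_items)
        self.epsilon = epsilon  # probability of serving an exploratory list

    def recommend(self, user_id, n_items=10):
        """Explore with probability epsilon, exploit with probability 1 - epsilon"""
        if np.random.random() < self.epsilon:
            # Explore: a shuffled, category-diverse list
            return self._explore(user_id, n_items)
        else:
            # Exploit: the highest-scoring items
            return self._exploit(user_id, n_items)

    def _exploit(self, user_id, n_items):
        """Standard recommendation (in score order)"""
        return self.model.recommend(user_id, n_items)

    def _explore(self, user_id, n_items):
        """Exploratory recommendation (diversity first)"""
        # Draw from a wider pool than the final list needs
        candidates = list(self.model.recommend(user_id, n_items * 5))
        # Shuffle so the pick is random, then enforce category diversity
        np.random.shuffle(candidates)
        return self._diversify(candidates, n_items)

    def _diversify(self, candidates, n_items):
        """Select items with category diversity in mind"""
        selected, skipped = [], []
        seen_categories = set()
        for item in candidates:
            cat = item.get('category', '')
            # Until half the list is filled, accept only unseen categories
            if cat not in seen_categories or len(selected) >= n_items // 2:
                selected.append(item)
                seen_categories.add(cat)
            else:
                skipped.append(item)
            if len(selected) >= n_items:
                break
        # Top up with skipped items if too few categories were available
        return (selected + skipped)[:n_items]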
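
To see why a little exploration matters, here is a minimal simulation sketch. It is not the recommender class above: it applies epsilon-greedy per item rather than per list, and the click rates in TRUE_CTR, the function name run_epsilon_greedy, and the round count are all made up for illustration.

```python
import numpy as np

def run_epsilon_greedy(true_ctr, epsilon, n_rounds=5000, seed=0):
    """Simulate a per-item epsilon-greedy bandit on hypothetical click rates."""
    rng = np.random.default_rng(seed)
    n_items = len(true_ctr)
    shows = np.zeros(n_items)   # times each item was shown
    clicks = np.zeros(n_items)  # clicks each item received
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            # Explore: pick an item uniformly at random
            item = int(rng.integers(n_items))
        else:
            # Exploit: pick the highest observed CTR (unseen items count as 0)
            ctr = np.divide(clicks, shows, out=np.zeros(n_items), where=shows > 0)
            item = int(np.argmax(ctr))
        shows[item] += 1
        clicks[item] += rng.random() < true_ctr[item]  # simulated click
    return shows, clicks

TRUE_CTR = [0.05, 0.04, 0.10]  # hypothetical; item 2 has the best true rate
greedy_shows, greedy_clicks = run_epsilon_greedy(TRUE_CTR, epsilon=0.0)
mixed_shows, mixed_clicks = run_epsilon_greedy(TRUE_CTR, epsilon=0.1)
print("epsilon=0.0 shows per item:", greedy_shows)
print("epsilon=0.1 shows per item:", mixed_shows)
```

With epsilon=0.0 the loop never leaves item 0, because items that were never shown keep an observed CTR of zero; this is the filter bubble in miniature. With epsilon=0.1 every item collects impressions, so the estimate for each one has a chance to improve.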

Thompson Sampling

import numpy as np

class ThompsonSamplingRecommender:
    """Exploration and exploitation with Thompson Sampling"""

    def __init__(self):
        # Beta distribution parameters per item
        self.alpha = {}  # number of clicks + 1
        self.beta = {}   # number of non-clicks + 1

    def update(self, item_id, clicked):
        """Update the parameters with a click result"""
        if item_id not in self.alpha:
            # Start from Beta(1, 1), the uniform prior
            self.alpha[item_id] = 1
            self.beta[item_id] = 1

        if clicked:
            self.alpha[item_id] += 1
        else:
            self.beta[item_id] += 1

    def recommend(self, candidate_items, n_items=10):
        """Recommend with Thompson Sampling"""
        scores = {}
        for item_id in candidate_items:
            # Items with no feedback yet fall back to Beta(1, 1)
            a = self.alpha.get(item_id, 1)
            b = self.beta.get(item_id, 1)
            # Sample a plausible click rate from the Beta distribution
            scores[item_id] = np.random.beta(a, b)

        # Rank by sampled value, highest first
        ranked = sorted(scores.items(), key=lambda x: -x[1])
        return [item_id for item_id, _ in ranked[:n_items]]
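
The update-and-sample loop can be tried end to end with simulated clicks. The sketch below is a compact, single-slot version of the same idea, written under assumptions: the item names and the click rates in TRUE_CTR are invented, and only one item is shown per round.

```python
import numpy as np

rng = np.random.default_rng(42)
TRUE_CTR = {"A": 0.02, "B": 0.05, "C": 0.12}  # hypothetical click rates

alpha = {item: 1 for item in TRUE_CTR}  # clicks + 1
beta = {item: 1 for item in TRUE_CTR}   # non-clicks + 1

for _ in range(3000):
    # Sample a plausible CTR per item from its Beta posterior, show the best
    samples = {item: rng.beta(alpha[item], beta[item]) for item in TRUE_CTR}
    shown = max(samples, key=samples.get)
    # Simulate the user's response and update that item's posterior
    if rng.random() < TRUE_CTR[shown]:
        alpha[shown] += 1
    else:
        beta[shown] += 1

for item in TRUE_CTR:
    impressions = alpha[item] + beta[item] - 2  # subtract the Beta(1, 1) prior
    posterior_mean = alpha[item] / (alpha[item] + beta[item])
    print(item, impressions, round(posterior_mean, 3))
```

Items with few impressions have wide posteriors, so they occasionally produce a high sample and get shown; as evidence accumulates the posterior narrows and impressions tend to concentrate on the items with higher click rates. No epsilon needs to be tuned, because the amount of exploration follows from the remaining uncertainty.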

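Summary

Topic	Key point
Epsilon-Greedy	Simple but effective; epsilon=0.1 is a common starting value
Thompson Sampling	Probabilistic exploration that accounts for uncertainty
Filter bubble	Exploitation alone lowers customer satisfaction over the long term
Diversity metric	Measure diversity with category coverage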
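
Category coverage can be computed in a few lines. The sketch below assumes one possible definition (the share of catalog categories that appear in a recommendation list) and the same dict-shaped items with a 'category' key used in the Epsilon-Greedy code; the function name category_coverage and the sample data are hypothetical.

```python
def category_coverage(recommended_items, all_categories):
    """Share of catalog categories that appear in a recommendation list."""
    catalog = set(all_categories)
    if not catalog:
        return 0.0
    shown = {item.get("category", "") for item in recommended_items}
    return len(shown & catalog) / len(catalog)

catalog_categories = ["books", "music", "games", "movies"]  # hypothetical
recs = [
    {"item_id": 1, "category": "books"},
    {"item_id": 2, "category": "books"},
    {"item_id": 3, "category": "music"},
]
print(category_coverage(recs, catalog_categories))  # 2 of 4 categories -> 0.5
```

Tracking this value over time for exploit-only and exploratory lists gives a simple signal of whether recommendations are narrowing into a filter bubble.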

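Checklist

I can explain the exploration-exploitation trade-off
I understand how Epsilon-Greedy works
I can explain the advantages of Thompson Sampling
I understand the risks of filter bubbles and how to counter them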

Next Step

You have learned how to balance exploration and exploitation. Next, draw up an evaluation improvement plan in the exercise.

Estimated reading time: 15 minutes