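Capstone Exercise: Customer Analysis Report

Tanaka (VPoE): "This is the final assignment. Use every skill you have learned so far to complete a report for NetShop's executive meeting."

You: "So it runs end to end: data preprocessing, statistical analysis, visualization, and report writing."

Tanaka (VPoE): "Exactly. Follow the CRISP-DM process and take it from business understanding to the final report on your own. This is precisely the skill that real work demands."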

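Mission Overview

The head of marketing at NetShop has sent the following request:

"I want to present the results of a customer analysis at next month's executive meeting. In particular, I want answers to three questions:

What is causing sales growth to slow?

Which customer segments should we focus on?

What specific actions should we take?

Please prepare both an analysis report and presentation slides."

Create the analysis report in Jupyter Notebook, organized into seven chapters.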

Preparing the Data

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import warnings
warnings.filterwarnings('ignore')

np.random.seed(42)

# === Purchase data (2 years) ===
n = 20000
customer_ids = np.random.choice(range(1, 2001), n)

# Order dates, evenly spaced across the two-year period
dates = pd.date_range('2024-01-01', '2025-12-31', periods=n)

orders = pd.DataFrame({
    'order_id': range(1, n + 1),
    'customer_id': customer_ids,
    'order_date': dates,
    'product_id': np.random.choice(range(1, 201), n),
    'category': np.random.choice(
        ['Electronics', 'Books', 'Food', 'Fashion', 'Sports'],
        n, p=[0.25, 0.15, 0.30, 0.20, 0.10]
    ),
    'amount': np.random.lognormal(8.0, 0.7, n).round(0),
    'quantity': np.random.randint(1, 5, n),
    'channel': np.random.choice(['organic', 'paid', 'email', 'social'], n, p=[0.35, 0.30, 0.20, 0.15]),
    'device': np.random.choice(['desktop', 'mobile', 'tablet'], n, p=[0.35, 0.55, 0.10]),
    'status': np.random.choice(['completed', 'completed', 'completed', 'completed', 'cancelled', 'returned'], n)
})

# Deliberately cancel a quarter of the orders from July 2025 onward (reproduces a downward trend in completed orders)
late_orders = orders[orders['order_date'] >= '2025-07-01'].index
orders.loc[np.random.choice(late_orders, size=len(late_orders)//4, replace=False), 'status'] = 'cancelled'

# Inject missing values
orders.loc[np.random.choice(n, 500, replace=False), 'amount'] = np.nan
orders.loc[np.random.choice(n, 100, replace=False), 'channel'] = np.nan

# === Customer master ===
customers = pd.DataFrame({
    'customer_id': range(1, 2001),
    # Truncate to whole years, then store as float so NaN can be assigned below
    'age': np.random.normal(38, 12, 2000).clip(18, 75).astype(int).astype(float),
    'gender': np.random.choice(['M', 'F', 'Other'], 2000, p=[0.48, 0.48, 0.04]),
    'region': np.random.choice(['Kanto', 'Kansai', 'Chubu', 'Hokkaido', 'Kyushu'], 2000, p=[0.35, 0.25, 0.20, 0.10, 0.10]),
    'registration_date': pd.date_range('2020-01-01', periods=2000, freq='10h'),
    'membership': np.random.choice(['Free', 'Silver', 'Gold'], 2000, p=[0.60, 0.30, 0.10])
})
customers.loc[np.random.choice(2000, 200, replace=False), 'age'] = np.nan

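Chapter 1: Executive Summary

Summarize the main findings and recommendations of the analysis in one page or less.

Hint: It is more efficient to finish Chapters 2 through 6 first and write the summary last.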

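Chapter 2: Background and Objectives

Background of the analysis (the state of the sales slowdown)
The three analysis objectives, stated explicitly
Target period and scope of the analysis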

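Chapter 3: Data Preparation

Carry out the following:

Data quality checks (missing values, outliers, duplicates)
Cleansing and preprocessing
Table joins
Feature engineering (year-month, day of week, customer age group, and so on)

Solution example (data preparation)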

# === Quality checks ===
print("=== Purchase data quality ===")
print(f"Records: {len(orders)}")
print(f"Missing values:\n{orders.isnull().sum()}")
print(f"Status distribution:\n{orders['status'].value_counts()}")

# === Cleansing ===
# Fill missing order values: median for amount, an explicit 'unknown' label for channel
orders['amount'] = orders['amount'].fillna(orders['amount'].median())
orders['channel'] = orders['channel'].fillna('unknown')

# Fill missing ages in the customer master with the median age of the same region
customers['age'] = customers.groupby('region')['age'].transform(
    lambda x: x.fillna(x.median())
)

# === Join ===
df = pd.merge(orders, customers, on='customer_id', how='left')

# === Feature engineering ===
df['order_date'] = pd.to_datetime(df['order_date'])
df['year_month'] = df['order_date'].dt.to_period('M')
df['year'] = df['order_date'].dt.year
df['month'] = df['order_date'].dt.month
df['day_of_week'] = df['order_date'].dt.day_name()
df['is_weekend'] = df['order_date'].dt.dayofweek >= 5
# right=False gives bins [0, 25), [25, 35), ... so that age 25 falls in '25-34' as the labels state
df['age_group'] = pd.cut(df['age'], bins=[0, 25, 35, 45, 55, 100], right=False,
                          labels=['18-24', '25-34', '35-44', '45-54', '55+'])

# Completed orders only
df_completed = df[df['status'] == 'completed'].copy()
print(f"\nAnalysis target: {len(df_completed)} records (completed orders)")
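The solution above checks missing values, while the task list also asks for duplicates and outliers. The following is a minimal sketch of such a check, assuming the common 1.5 x IQR rule for outliers. The helper name `quality_report` and the toy data are hypothetical and not part of the exercise data.

```python
import numpy as np
import pandas as pd

def quality_report(frame: pd.DataFrame, key: str, value: str) -> dict:
    """Count duplicate keys, missing values, and IQR-rule outliers in one numeric column."""
    q1, q3 = frame[value].quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Comparisons with NaN are False, so missing values are never flagged as outliers
    is_outlier = (frame[value] < lower) | (frame[value] > upper)
    return {
        'rows': len(frame),
        'duplicate_keys': int(frame[key].duplicated().sum()),
        'missing_values': int(frame[value].isna().sum()),
        'outliers': int(is_outlier.sum()),
        'upper_fence': float(upper),
    }

# Toy data (hypothetical): order 3 is duplicated, one amount is extreme, one is missing
toy = pd.DataFrame({
    'order_id': [1, 2, 3, 3, 4, 5, 6],
    'amount': [1000.0, 1200.0, 900.0, 900.0, 1100.0, 50000.0, np.nan],
})
report = quality_report(toy, key='order_id', value='amount')
print(report)
```

In the notebook this could be called as `quality_report(orders, 'order_id', 'amount')` before cleansing. Because order amounts here are right-skewed, the IQR rule on the raw scale may flag many legitimate large orders, so applying it to the log of the amount is worth considering.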

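Chapter 4: Analysis Results

Carry out the following three analyses and write up each one in the structure "finding, chart, interpretation".

4.1 Revenue Decomposition

Decompose monthly revenue into number of customers, purchase frequency, and average order value, and identify the main driver of the slowdown.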
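Since revenue = customers x purchase frequency x average order value, the log change in revenue equals the sum of the log changes of the three factors, which gives each factor a share of the total change. The sketch below illustrates this with placeholder numbers for two periods. The figures are hypothetical and are not results from the exercise data.

```python
import numpy as np
import pandas as pd

# Hypothetical KPIs for two periods (placeholder numbers, not exercise results)
kpi = pd.DataFrame(
    {'customers': [800, 780], 'orders': [1200, 1014], 'revenue': [4_800_000, 4_056_000]},
    index=['before', 'after'],
)
kpi['frequency'] = kpi['orders'] / kpi['customers']   # orders per customer
kpi['avg_amount'] = kpi['revenue'] / kpi['orders']    # revenue per order

# revenue = customers * frequency * avg_amount, so the log changes add up exactly
factors = ['customers', 'frequency', 'avg_amount']
log_kpi = np.log(kpi[factors + ['revenue']].astype(float))
log_change = log_kpi.loc['after'] - log_kpi.loc['before']

# Share of the total revenue change attributable to each factor, in percent
contribution = (log_change[factors] / log_change['revenue'] * 100).round(1)
print(contribution)
```

In the notebook, the same calculation could be applied to the `monthly` table, for example by comparing the average of the first six months with the average of the last six.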

Solution example (revenue decomposition)

monthly = df_completed.groupby('year_month').agg(
    revenue=('amount', 'sum'),
    customers=('customer_id', 'nunique'),
    orders=('order_id', 'count'),
    avg_amount=('amount', 'mean')
).reset_index()
monthly['frequency'] = monthly['orders'] / monthly['customers']

fig, axes = plt.subplots(2, 2, figsize=(14, 8))
fig.suptitle('Revenue decomposition: declining purchase frequency is the main driver of slowing growth', fontsize=14, fontweight='bold')

for ax, (col, label, color) in zip(axes.flatten(), [
    ('revenue', 'Revenue (JPY)', '#1976D2'),
    ('customers', 'Unique customers', '#388E3C'),
    ('frequency', 'Purchase frequency (orders per customer)', '#F57C00'),
    ('avg_amount', 'Average order value (JPY)', '#7B1FA2')
]):
    ax.plot(range(len(monthly)), monthly[col], marker='o', color=color, linewidth=2)
    ax.set_title(label, fontsize=11)
    ax.grid(axis='y', alpha=0.3)
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)

plt.tight_layout()
plt.show()

4.2 Customer Segmentation (RFM Analysis)

Classify customers into four segments with RFM analysis and characterize each segment.

Solution example (RFM segmentation)

ref_date = df_completed['order_date'].max() + pd.Timedelta(days=1)
rfm = df_completed.groupby('customer_id').agg(
    recency=('order_date', lambda x: (ref_date - x.max()).days),
    frequency=('order_id', 'nunique'),
    monetary=('amount', 'sum')
).reset_index()

rfm['r_score'] = pd.qcut(rfm['recency'], 4, labels=[4, 3, 2, 1]).astype(int)
rfm['f_score'] = pd.qcut(rfm['frequency'].rank(method='first'), 4, labels=[1, 2, 3, 4]).astype(int)

def assign_segment(row):
    if row['r_score'] >= 3 and row['f_score'] >= 3:
        return 'Champion'
    elif row['r_score'] >= 3 and row['f_score'] <= 2:
        return 'New/Promising'
    elif row['r_score'] <= 2 and row['f_score'] >= 3:
        return 'At Risk'
    else:
        return 'Lost'

rfm['segment'] = rfm.apply(assign_segment, axis=1)

# Summary by segment
seg_summary = rfm.groupby('segment').agg(
    count=('customer_id', 'count'),
    avg_recency=('recency', 'mean'),
    avg_frequency=('frequency', 'mean'),
    avg_monetary=('monetary', 'mean'),
    total_monetary=('monetary', 'sum')
).round(0)
seg_summary['revenue_share'] = (seg_summary['total_monetary'] / seg_summary['total_monetary'].sum() * 100).round(1)

print(seg_summary)

fig, axes = plt.subplots(1, 2, figsize=(14, 5))
colors = {'Champion': '#4CAF50', 'New/Promising': '#2196F3', 'At Risk': '#FF9800', 'Lost': '#F44336'}

# Number of customers
seg_counts = rfm['segment'].value_counts()
axes[0].bar(seg_counts.index, seg_counts.values, color=[colors[s] for s in seg_counts.index])
axes[0].set_title('Customers by segment', fontsize=12, fontweight='bold')
axes[0].set_ylabel('Customers')

# Revenue share
seg_revenue = rfm.groupby('segment')['monetary'].sum()
axes[1].pie(seg_revenue, labels=seg_revenue.index, autopct='%1.1f%%',
            colors=[colors[s] for s in seg_revenue.index], startangle=90)
axes[1].set_title('Revenue share by segment', fontsize=12, fontweight='bold')

plt.tight_layout()
plt.show()
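The solution above computes `monetary` but segments on the recency and frequency scores only. If the monetary dimension should also be scored, one possible extension is sketched below. The `quartile_score` helper, the `rfm_demo` table, and the unweighted sum `rfm_total` are assumptions made for illustration, not part of the original solution.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical per-customer table standing in for `rfm` from the solution
rfm_demo = pd.DataFrame({
    'customer_id': range(1, 41),
    'recency': rng.integers(1, 365, 40),
    'frequency': rng.integers(1, 20, 40),
    'monetary': rng.lognormal(10, 0.5, 40).round(0),
})

def quartile_score(values: pd.Series, higher_is_better: bool = True) -> pd.Series:
    """Score 1-4 by quartile; ranking first avoids duplicate bin edges when values tie."""
    labels = [1, 2, 3, 4] if higher_is_better else [4, 3, 2, 1]
    return pd.qcut(values.rank(method='first'), 4, labels=labels).astype(int)

# A small recency (a recent purchase) is good, so its scale is reversed
rfm_demo['r_score'] = quartile_score(rfm_demo['recency'], higher_is_better=False)
rfm_demo['f_score'] = quartile_score(rfm_demo['frequency'])
rfm_demo['m_score'] = quartile_score(rfm_demo['monetary'])

# Unweighted total from 3 (worst) to 12 (best); the weighting is a design choice
rfm_demo['rfm_total'] = rfm_demo[['r_score', 'f_score', 'm_score']].sum(axis=1)
print(rfm_demo.sort_values('rfm_total', ascending=False).head())
```

Ranking with `method='first'` splits ties arbitrarily by row order, which keeps the four groups equal in size at the cost of placing some identical values in different quartiles.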

4.3 Channel and Device Analysis

Analyze how purchasing behavior differs by marketing channel and by device.

Solution example (channel and device analysis)

# KPIs by channel
channel_kpi = df_completed.groupby('channel').agg(
    orders=('order_id', 'count'),
    revenue=('amount', 'sum'),
    avg_amount=('amount', 'mean'),
    customers=('customer_id', 'nunique')
).round(0)
channel_kpi['revenue_per_customer'] = (channel_kpi['revenue'] / channel_kpi['customers']).round(0)

# KPIs by device
device_kpi = df_completed.groupby('device').agg(
    orders=('order_id', 'count'),
    revenue=('amount', 'sum'),
    avg_amount=('amount', 'mean')
).round(0)

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# By channel
channel_kpi.sort_values('revenue_per_customer', ascending=True).plot(
    kind='barh', y='revenue_per_customer', ax=axes[0], color='#1976D2', legend=False
)
axes[0].set_title('Revenue per customer by channel', fontsize=12, fontweight='bold')
axes[0].set_xlabel('Revenue (JPY per customer)')

# By device
device_kpi.plot(kind='bar', y='avg_amount', ax=axes[1], color='#388E3C', legend=False)
axes[1].set_title('Average order amount by device', fontsize=12, fontweight='bold')
axes[1].set_ylabel('Average amount (JPY)')
axes[1].tick_params(axis='x', rotation=0)

plt.tight_layout()
plt.show()

# Statistical test: order amount, desktop vs mobile (Welch's t-test, unequal variances)
desktop = df_completed[df_completed['device'] == 'desktop']['amount']
mobile = df_completed[df_completed['device'] == 'mobile']['amount']
t, p = stats.ttest_ind(desktop, mobile, equal_var=False)
print(f"Desktop vs mobile: t={t:.3f}, p={p:.4f}")
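With thousands of orders per group, a p-value alone says little about whether a difference matters in practice, and order amounts are right-skewed rather than normal. The sketch below shows one way to complement the t-test with an effect size (Cohen's d), a rank-based test, and a test on the log scale. The sample arrays are synthetic stand-ins, not results from the exercise data, and `cohens_d` is a hypothetical helper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical right-skewed order amounts for two devices (not exercise results)
desktop_demo = rng.lognormal(8.0, 0.7, 3000)
mobile_demo = rng.lognormal(8.0, 0.7, 5000)

def cohens_d(a: np.ndarray, b: np.ndarray) -> float:
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return float((a.mean() - b.mean()) / np.sqrt(pooled_var))

d = cohens_d(desktop_demo, mobile_demo)

# Rank-based test that does not assume normality
u_stat, p_mw = stats.mannwhitneyu(desktop_demo, mobile_demo, alternative='two-sided')

# Welch's t-test on log amounts, which are closer to normal for lognormal-like data
t_log, p_log = stats.ttest_ind(np.log(desktop_demo), np.log(mobile_demo), equal_var=False)

print(f"Cohen's d={d:.3f}, Mann-Whitney p={p_mw:.4f}, log-scale Welch p={p_log:.4f}")
```

As a rough convention, an absolute d below about 0.2 is usually read as a small effect, so a statistically significant result with a tiny d would not by itself justify a device-specific initiative.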

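Chapter 5: Discussion and Insights

Dig into what the analysis results mean for the business, and state the limitations of the analysis explicitly.

Chapter 6: Recommendations and Action Plan

Propose at least three initiatives with priorities, and estimate the expected impact and cost of each.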
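One way to make the prioritization explicit is a small table that turns assumptions into expected revenue and return on investment. The sketch below uses made-up initiatives and figures purely as placeholders; every number, and the simple ROI ranking itself, is an assumption to be replaced with values grounded in your own analysis.

```python
import pandas as pd

# All initiatives and figures are hypothetical placeholders, not analysis results
actions = pd.DataFrame({
    'action': [
        'Win-back email for At Risk customers',
        'Loyalty points for Champion customers',
        'Mobile checkout redesign',
    ],
    'target_customers': [400, 500, 1100],
    'extra_orders_per_customer': [0.5, 0.3, 0.1],   # assumed uplift over the period
    'avg_order_amount': [3800, 3800, 3800],         # assumed average order value (JPY)
    'cost': [300_000, 400_000, 1_500_000],          # assumed cost (JPY)
})

# Expected incremental revenue = customers reached * extra orders * order value
actions['expected_revenue'] = (
    actions['target_customers']
    * actions['extra_orders_per_customer']
    * actions['avg_order_amount']
)
actions['roi'] = (actions['expected_revenue'] - actions['cost']) / actions['cost']

# Priority 1 is the highest ROI
actions['priority'] = actions['roi'].rank(ascending=False).astype(int)
print(actions[['action', 'expected_revenue', 'cost', 'roi', 'priority']])
```

Stating the uplift assumptions in the report alongside the table lets the executive audience challenge the inputs rather than the conclusion.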

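Chapter 7: Appendix

Include additional statistical test results, detailed data tables, environment information, and similar supporting material.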
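For the environment information, a cell along the following lines could record the versions in use. This is a minimal sketch using only the standard library; the package list mirrors the imports at the top of the notebook, and `package_version` is a hypothetical helper.

```python
import platform
import sys
from importlib import metadata

def package_version(name: str) -> str:
    """Return the installed version of a package, or a marker if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return 'not installed'

packages = ['pandas', 'numpy', 'scipy', 'matplotlib', 'seaborn']
env_info = {'python': sys.version.split()[0], 'platform': platform.platform()}
env_info.update({name: package_version(name) for name in packages})

for name, version in env_info.items():
    print(f"{name:<12}{version}")
```

Recording the versions alongside the fixed random seed makes it easier for a reviewer to reproduce the figures in the report.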

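Final Checklist

The notebook is complete with all seven chapters
The executive summary fits within one page
Data quality checks and preprocessing are carried out appropriately
All three analyses are written in the structure "finding, chart, interpretation"
At least one statistical test is performed
Visualizations are styled appropriately for a business report
Recommendations are specific and actionable
Restart & Run All executes every cell without errors
The report tells one consistent story from start to finish

Estimated time: 90 minutes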