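Comparing and Selecting Among the Three Approaches

"Classical ML, BERT, and LLMs. Let's compare the three approaches on a level playing field."

Tanaka, the VPoE, prepares a comparison table.

"We need a holistic judgment that covers not just accuracy but also cost, speed, maintainability, and ease of operation. Choosing what fits the business best is where an engineer shows their skill."

Comparison on the Same Dataset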

from sklearn.metrics import classification_report, f1_score
import time

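def benchmark_models(X_train, y_train, X_test, y_test):
    """Benchmark the three approaches on the same train/test split.

    Assumes svm_pipeline, zero_shot_classify, and the BERT results
    (bert_f1_score, bert_train_time, bert_infer_time) are already
    defined in the enclosing scope.
    """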
    results = []

    # 1. SVM (TF-IDF)
    start = time.time()
    svm_pipeline.fit(X_train, y_train)
    train_time = time.time() - start

    start = time.time()
    y_pred_svm = svm_pipeline.predict(X_test)
    infer_time = (time.time() - start) / len(X_test)

    results.append({
        'model': 'SVM (TF-IDF)',
        'f1': f1_score(y_test, y_pred_svm, average='weighted'),
        'train_time_sec': round(train_time, 1),
        'infer_time_ms': round(infer_time * 1000, 2),
        'cost_per_1000': 0,
    })

    # 2. BERT Fine-tuning
    # (results from a model already trained with Trainer)
    results.append({
        'model': 'BERT Fine-tuning',
        'f1': bert_f1_score,
        'train_time_sec': bert_train_time,
        'infer_time_ms': bert_infer_time,
        'cost_per_1000': 0,  # GPU cost is accounted for separately
    })

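    # 3. LLM Zero-shot (evaluated on at most 100 examples to limit API calls)
    sample_texts = X_test[:100]
    n_samples = len(sample_texts)
    start = time.time()
    y_pred_llm = [zero_shot_classify(text)['category'] for text in sample_texts]
    infer_time = (time.time() - start) / n_samples

    # Assumption: about 2 yen per API call, as implied by the
    # comparison table (2,000 yen per 1,000 requests).
    llm_cost_per_request_yen = 2

    results.append({
        'model': 'LLM Zero-shot',
        'f1': f1_score(y_test[:n_samples], y_pred_llm, average='weighted'),
        'train_time_sec': 0,
        'infer_time_ms': round(infer_time * 1000, 1),
        # API cost scales with the number of requests, not with latency
        'cost_per_1000': llm_cost_per_request_yen * 1000,
    })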

    return results
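
The benchmark reports `f1_score(..., average='weighted')`. As a rough illustration of what that number means, the following pure-Python sketch computes the per-class F1 and averages it weighted by each class's share of the true labels. The function name `weighted_f1` and the category names are made up for this example; in practice you would keep using scikit-learn.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (illustrative sketch)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += f1 * n_true / total  # weight by class frequency
    return score

# Hypothetical labels, for illustration only
y_true = ["billing", "billing", "bug", "bug"]
y_pred = ["billing", "bug", "bug", "bug"]
print(round(weighted_f1(y_true, y_pred), 3))  # 0.733
```

Because frequent classes carry more weight, a model can score well here while still doing poorly on rare categories, so it is worth checking per-class scores as well.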

Example Comparison Results

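Metric	SVM	BERT	LLM Zero-shot
F1 score	0.85	0.92	0.88
Training time	5 s	30 min	0 s
Inference time per example	0.1 ms	10 ms	500 ms
Cost per 1,000 examples	¥0	¥0 (GPU cost separate)	¥2,000
Labeled training data required	3,000 examples	500 examples	0 examples
Adding a new category	Retraining required	Retraining required	Prompt change only
Interpretability	Feature importance	Attention visualization	Can generate a written rationale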
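
To see what these per-example figures mean at scale, the sketch below projects monthly API cost and total sequential inference time from the table values. The traffic volume of 10,000 requests per day is a hypothetical assumption, and GPU or server costs are not included.

```python
# Per-example figures copied from the example comparison table above.
TABLE = {
    "SVM": {"infer_ms": 0.1, "yen_per_1000": 0},
    "BERT": {"infer_ms": 10, "yen_per_1000": 0},  # GPU cost not included
    "LLM Zero-shot": {"infer_ms": 500, "yen_per_1000": 2000},
}

def project_monthly(requests_per_day, days=30):
    """Return projected monthly API cost (yen) and sequential compute hours."""
    total = requests_per_day * days
    projection = {}
    for model, row in TABLE.items():
        projection[model] = {
            "cost_yen": total / 1000 * row["yen_per_1000"],
            "compute_hours": total * row["infer_ms"] / 1000 / 3600,
        }
    return projection

monthly = project_monthly(10_000)  # hypothetical volume: 10,000 requests/day
for model, row in monthly.items():
    print(f"{model}: {row['cost_yen']:,.0f} yen, {row['compute_hours']:.2f} h")
```

At this assumed volume the LLM column works out to 600,000 yen per month, which is the kind of figure that makes the cost axis concrete when discussing the trade-off with stakeholders.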

Selection Framework

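def select_best_model(requirements):
    """Select a model based on the requirements.

    Expects a dict with the keys 'accuracy_priority', 'budget',
    'latency_ms', and 'labeled_data'.
    """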
    scores = {'SVM': 0, 'BERT': 0, 'LLM': 0}

    # Accuracy requirement
    if requirements['accuracy_priority'] == 'high':
        scores['BERT'] += 3
        scores['LLM'] += 2
        scores['SVM'] += 1
    else:
        scores['SVM'] += 2
        scores['BERT'] += 2
        scores['LLM'] += 1

    # Cost requirement
    if requirements['budget'] == 'limited':
        scores['SVM'] += 3
        scores['BERT'] += 2
        scores['LLM'] += 0
    else:
        scores['SVM'] += 1
        scores['BERT'] += 2
        scores['LLM'] += 2

    # Latency requirement
    if requirements['latency_ms'] < 10:
        scores['SVM'] += 3
        scores['BERT'] += 1
        scores['LLM'] += 0
    elif requirements['latency_ms'] < 100:
        scores['SVM'] += 2
        scores['BERT'] += 3
        scores['LLM'] += 1
    else:
        scores['SVM'] += 1
        scores['BERT'] += 2
        scores['LLM'] += 3

    # Amount of labeled training data
    if requirements['labeled_data'] < 100:
        scores['LLM'] += 3
        scores['BERT'] += 1
        scores['SVM'] += 0
    elif requirements['labeled_data'] < 1000:
        scores['LLM'] += 2
        scores['BERT'] += 3
        scores['SVM'] += 1
    else:
        scores['SVM'] += 3
        scores['BERT'] += 3
        scores['LLM'] += 1

    # On a tie, max() returns the first key in dict order (SVM, BERT, LLM)
    best = max(scores, key=scores.get)
    return {
        'recommendation': best,
        'scores': scores,
        'reasoning': f"{best} best fits the requirements (score: {scores[best]})",
    }
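
`select_best_model` indexes the `requirements` dict directly, so a missing or misspelled key raises `KeyError` at selection time. One possible way to catch that earlier is to build the dict from a small dataclass. The class name `Requirements` and its validation rules are assumptions made for this sketch, not part of the original framework.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class Requirements:
    accuracy_priority: str  # 'high' or any other value, as checked by select_best_model
    budget: str             # 'limited' or any other value
    latency_ms: float       # acceptable latency per request, in milliseconds
    labeled_data: int       # number of labeled examples available

    def __post_init__(self):
        # Basic sanity checks (assumed rules, adjust to your project)
        if self.latency_ms <= 0:
            raise ValueError("latency_ms must be positive")
        if self.labeled_data < 0:
            raise ValueError("labeled_data must be non-negative")

    def to_dict(self):
        """Return the plain dict that select_best_model indexes into."""
        return asdict(self)

requirements = Requirements(
    accuracy_priority="high", budget="limited", latency_ms=50, labeled_data=800
).to_dict()
print(requirements)
```

Passing this dict to `select_best_model` as defined above gives scores of SVM 7, BERT 11, and LLM 5, so the recommendation is BERT.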

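Hybrid Strategy

Recommended architecture:

1. LLM Zero-shot: validation at the prototype stage
2. LLM Few-shot: start production operation in parallel with collecting training data
3. BERT Fine-tuning: migrate once enough data has accumulated
4. SVM fallback: alternative when BERT inference is too slow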
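
Item 4 above can be sketched as a wrapper that gives the primary model a latency budget and switches to the fallback when the budget is exceeded or the call fails. The function names, the 0.2-second budget, and the stand-in predictors are all hypothetical; a real deployment would plug in the actual BERT and SVM predictors.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# One shared pool. A timed-out call keeps running in the background,
# because a Python thread cannot be cancelled once it has started.
_executor = ThreadPoolExecutor(max_workers=4)

def classify_with_fallback(text, primary, fallback, timeout_sec=0.2):
    """Use `primary`, but switch to `fallback` if it is too slow or fails."""
    future = _executor.submit(primary, text)
    try:
        category = future.result(timeout=timeout_sec)
        return {"category": category, "served_by": "primary"}
    except FutureTimeout:
        reason = "timeout"
    except Exception:
        reason = "error"
    return {"category": fallback(text), "served_by": "fallback", "reason": reason}

# Hypothetical stand-ins for the BERT and SVM predictors
def slow_bert(text):
    time.sleep(0.5)  # simulate a delayed BERT inference
    return "billing"

def fast_svm(text):
    return "bug"

result = classify_with_fallback("The app crashes on login", slow_bert, fast_svm)
print(result)
```

Recording `served_by` and `reason` makes it possible to monitor how often the fallback is used, which is a useful signal for when BERT serving capacity needs attention.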

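Staged migration:
  Week 1-2: LLM Zero-shot (start collecting data)
  Week 3-4: LLM Few-shot (improve the examples using the collected data)
  Month 2: BERT Fine-tuning (train on 500 or more labeled examples)
  Month 3: BERT in production, with the LLM used only for complex cases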
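
The Month 3 setup can be sketched as a simple router. This sketch assumes that a "complex case" is one where BERT's confidence falls below a threshold; that definition, the 0.8 threshold, and the stand-in predictors are assumptions for illustration, not something the plan above specifies.

```python
def route(text, bert_predict, llm_classify, min_confidence=0.8):
    """Serve confident BERT predictions directly; send the rest to the LLM."""
    label, confidence = bert_predict(text)
    if confidence >= min_confidence:
        return {"category": label, "served_by": "bert", "confidence": confidence}
    return {"category": llm_classify(text), "served_by": "llm", "confidence": confidence}

# Hypothetical stand-ins for the real predictors
def fake_bert(text):
    # Pretend that longer messages are harder to classify
    return ("bug", 0.95) if len(text) < 40 else ("bug", 0.55)

def fake_llm(text):
    return "billing"

easy = route("App crashes on login", fake_bert, fake_llm)
hard = route("I was charged twice after the app crashed during checkout", fake_bert, fake_llm)
print(easy["served_by"], hard["served_by"])  # bert llm
```

Raising `min_confidence` sends more traffic to the LLM, which tends to raise cost and latency, so the threshold is best tuned on held-out data.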

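Summary

Item	Key point
Overall comparison	Evaluate on four axes: accuracy, cost, speed, and maintainability
Staged approach	A staged migration from LLM (zero-shot, then few-shot) to BERT, with SVM as a fallback, is realistic
Hybrid	Use different approaches depending on case complexity
Selection criteria	Decide according to business requirements (cost, speed, accuracy)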

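Checklist

I can run a quantitative comparison of the three approaches
I can select a model based on business requirements
I can design a staged migration strategy
I can explain the benefits of a hybrid architecture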

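Next Step

You now understand how to compare and select among the three approaches. Next, build a text classification model in the exercise.

Estimated reading time: 30 minutes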