Exercise: Build a Text Classification Model

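"You now understand the three approaches. Let's run a comparison experiment on real Kaggle data."

VPoE Tanaka opens the development environment.

"Using the Disaster Tweets dataset, implement all three approaches (traditional ML, BERT, and an LLM) and compare them quantitatively."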

Mission Overview

Using the Kaggle "Natural Language Processing with Disaster Tweets" dataset, build text classification models with three approaches and compare them.

Prerequisites

Python 3.10+
scikit-learn, transformers, datasets, langchain
Dataset already downloaded via the Kaggle API
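
The sample answers below use X_train, X_test, y_train, and y_test without defining them. A minimal sketch of one way to produce them, assuming the competition's train.csv sits in the working directory and exposes `text` and `target` columns; the helper name `split_tweets`, the 80/20 split, and the seed are illustrative choices, not part of the original exercise.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_tweets(df: pd.DataFrame, test_size: float = 0.2, seed: int = 42):
    """Split tweets into train/test sets, keeping the class ratio in both."""
    X_train, X_test, y_train, y_test = train_test_split(
        df["text"],
        df["target"],
        test_size=test_size,
        random_state=seed,
        stratify=df["target"],  # preserve the disaster / non-disaster balance
    )
    return X_train, X_test, y_train, y_test


# Usage (the file path is an assumption):
# df = pd.read_csv("train.csv")
# X_train, X_test, y_train, y_test = split_tweets(df)
```

The split returns pandas Series, which matches the `.tolist()` calls used in Missions 2 and 3.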

Mission 1: Traditional ML (Naive Bayes + SVM) (30 min)

Implement the following.

TF-IDF pipeline + Naive Bayes
TF-IDF pipeline + SVM
Hyperparameter tuning with GridSearchCV
Evaluation with classification_report

Sample Answer

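from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report, f1_score

# X_train, X_test, y_train, y_test are assumed to be pandas Series or NumPy
# arrays (tweet text and 0/1 labels) from an earlier train/test split.

# Naive Bayes baseline on unigram + bigram TF-IDF features
nb_pipe = Pipeline([
    ('tfidf', TfidfVectorizer(max_features=10000, ngram_range=(1, 2))),
    ('clf', MultinomialNB(alpha=0.1)),
])
nb_pipe.fit(X_train, y_train)
nb_pred = nb_pipe.predict(X_test)
print("=== Naive Bayes ===")
print(classification_report(y_test, nb_pred))
nb_f1 = f1_score(y_test, nb_pred)  # kept for the comparison table in Mission 3

# SVM with hyperparameter tuning
svm_pipe = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('clf', LinearSVC()),
])
# Keys use the "<step name>__<parameter>" convention to reach into the pipeline
param_grid = {
    'tfidf__max_features': [5000, 10000],
    'tfidf__ngram_range': [(1, 1), (1, 2)],
    'clf__C': [0.1, 1.0],
}
grid = GridSearchCV(svm_pipe, param_grid, cv=5, scoring='f1', n_jobs=-1)
grid.fit(X_train, y_train)
svm_pred = grid.predict(X_test)  # uses the best estimator, refit on all of X_train
print("=== SVM (Best) ===")
print(f"Best params: {grid.best_params_}")
print(classification_report(y_test, svm_pred))
svm_f1 = f1_score(y_test, svm_pred)  # kept for the comparison table in Mission 3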
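
Mission 3 asks for a speed comparison, so it can be convenient to measure per-sample inference latency while the fitted models are still in memory. A minimal sketch, assuming a hypothetical helper named `ms_per_sample` (not part of the original exercise) that times any batch prediction function and reports the fastest of a few runs.

```python
import time


def ms_per_sample(predict_fn, texts, repeats=3):
    """Return the best-of-`repeats` average latency in milliseconds per text."""
    if not texts:
        raise ValueError("texts must not be empty")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        predict_fn(texts)  # one batch prediction over all texts
        best = min(best, time.perf_counter() - start)
    return best * 1000 / len(texts)


# Usage with the pipelines above (assumes they are already fitted):
# nb_ms = ms_per_sample(nb_pipe.predict, X_test.tolist())
# svm_ms = ms_per_sample(grid.predict, X_test.tolist())
```

Taking the minimum over several runs reduces noise from other processes; the numbers still depend on hardware and batch size, so they are only comparable when measured on the same machine.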

Mission 2: BERT Fine-tuning (30 min)

Implement the following.

Prepare the tokenizer and model
Build the datasets (tokenize)
Configure TrainingArguments and Trainer
Train and evaluate

Sample Answer

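import numpy as np
from sklearn.metrics import f1_score
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers import TrainingArguments, Trainer
from datasets import Dataset

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

def tokenize(examples):
    # Pad or truncate every tweet to 128 tokens so batches have a fixed shape
    return tokenizer(examples['text'], padding='max_length', truncation=True, max_length=128)

def compute_metrics(eval_pred):
    # Trainer passes (logits, labels); take the argmax class and report F1
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {'f1': f1_score(labels, preds)}

train_ds = Dataset.from_dict({'text': X_train.tolist(), 'label': y_train.tolist()})
eval_ds = Dataset.from_dict({'text': X_test.tolist(), 'label': y_test.tolist()})
train_ds = train_ds.map(tokenize, batched=True)
eval_ds = eval_ds.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir='./bert_results', num_train_epochs=3,
    per_device_train_batch_size=16, learning_rate=2e-5,
    # Named evaluation_strategy in older transformers releases
    eval_strategy='epoch',
    # load_best_model_at_end requires the save and eval strategies to match
    save_strategy='epoch', load_best_model_at_end=True,
    metric_for_best_model='f1',
)

trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=eval_ds, compute_metrics=compute_metrics)
trainer.train()
bert_metrics = trainer.evaluate()
print("=== BERT ===")
print(bert_metrics)
bert_f1 = bert_metrics['eval_f1']  # kept for the comparison table in Mission 3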

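Mission 3: LLM Zero-shot + Comparison Report (30 min)

Implement the following.

LLM zero-shot classification (on a 100-sample subset of the test data)
A comparison table of the three approaches (F1, speed, cost)
A recommended approach for NetShop, with supporting rationale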
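
The cost column and the recommendation both depend on request volume. As a rough worked example, taking the illustrative figures that appear in the sample answer (2,000 JPY per 1,000 LLM calls and 5,000 inquiries per month) and assuming cost scales linearly with the number of calls, the monthly LLM bill can be estimated as follows. The function name `monthly_llm_cost` is hypothetical.

```python
def monthly_llm_cost(requests_per_month: int, cost_per_1k: float) -> float:
    """Estimate the monthly API cost, assuming linear per-request pricing."""
    return requests_per_month / 1000 * cost_per_1k


# 5,000 inquiries per month at 2,000 JPY per 1,000 calls
print(monthly_llm_cost(5000, 2000))  # 10000.0
```

Actual pricing depends on the provider, the model, and token counts per request, so the per-1K figure should be replaced with a measured value before it is used in a recommendation.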

Sample Answer

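# LLM zero-shot (100-sample subset)
import time
from sklearn.metrics import f1_score

# zero_shot_classify is assumed to be defined elsewhere; it should return a
# dict whose 'category' value is 'disaster' for disaster tweets.
sample_texts = X_test[:100].tolist()
sample_labels = y_test[:100].tolist()

start = time.time()
llm_preds = [zero_shot_classify(text) for text in sample_texts]
llm_time = (time.time() - start) / len(sample_texts)  # seconds per tweet

llm_labels = [1 if p['category'] == 'disaster' else 0 for p in llm_preds]
llm_f1 = f1_score(sample_labels, llm_labels)

# Comparison report
# nb_f1, svm_f1, and bert_f1 are assumed to hold the test-set F1 scores from
# Missions 1 and 2. Latency and cost in the first three rows are hardcoded
# rough figures, not measurements.
print("=== Comparison of the 3 approaches ===")
print(f"{'Method':<20} {'F1':>8} {'Infer (ms)':>10} {'Cost/1K':>10}")
print(f"{'Naive Bayes':<20} {nb_f1:>8.4f} {'0.1':>10} {'0 JPY':>10}")
print(f"{'SVM':<20} {svm_f1:>8.4f} {'0.1':>10} {'0 JPY':>10}")
print(f"{'BERT':<20} {bert_f1:>8.4f} {'10':>10} {'0 JPY':>10}")
print(f"{'LLM Zero-shot':<20} {llm_f1:>8.4f} {llm_time*1000:>10.0f} {'2000 JPY':>10}")

# Recommendation
print("\n=== Recommendation ===")
print("For NetShop's 5,000 customer inquiries per month:")
print("Phase 1: Validate with an LLM zero-shot PoC (2 weeks)")
print("Phase 2: Run BERT fine-tuning in production (best balance of accuracy and cost)")
print("Phase 3: Keep SVM alongside as a fallback")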
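
The sample answer calls zero_shot_classify without defining it in this section. A minimal sketch of what such a function might look like, assuming the LLM is reachable through some callable that maps a prompt string to a reply string. The factory `make_zero_shot_classifier`, the prompt wording, and the `not_disaster` label are illustrative assumptions; the real LLM client (for example one built with langchain) would be wrapped and passed in as `llm_call`.

```python
from typing import Callable

# Prompt wording is an assumption; tune it for the model actually used.
PROMPT_TEMPLATE = (
    "Classify the following tweet. "
    "Answer with exactly one word: 'disaster' or 'not_disaster'.\n\n"
    "Tweet: {text}\nAnswer:"
)


def make_zero_shot_classifier(llm_call: Callable[[str], str]):
    """Build zero_shot_classify around any prompt -> reply function."""

    def zero_shot_classify(text: str) -> dict:
        reply = llm_call(PROMPT_TEMPLATE.format(text=text)).strip().lower()
        # Any reply that does not start with 'disaster' falls back to the
        # negative class, including malformed or verbose answers.
        category = "disaster" if reply.startswith("disaster") else "not_disaster"
        return {"category": category, "raw": reply}

    return zero_shot_classify


# Usage with a stand-in LLM (replace the lambda with a real client call):
# zero_shot_classify = make_zero_shot_classifier(lambda prompt: "disaster")
```

Injecting the LLM as a plain callable keeps the parsing logic testable without network access and makes it easy to swap providers.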

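Completion Checklist

Built a baseline with traditional ML (NB + SVM)
Ran training and evaluation with BERT fine-tuning
Implemented LLM zero-shot classification
Created a quantitative comparison table of the three approaches
Selected a recommended approach based on business requirements

Estimated time: 90 minutes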