Image Data Augmentation

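Tanaka VPoE: "Even with transfer learning, the risk of overfitting remains when data is scarce. That's where data augmentation comes in: you apply transformations to existing images to artificially increase the amount of training data."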

You: "So we rotate or flip the images. For NetShop's product images, that lets us add variation in shooting angle and lighting."

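Tanaka VPoE: "Exactly. And data augmentation can substantially improve generalization without collecting or labeling any new data. There's hardly a reason not to use it."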

Data Augmentation Basics

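Data augmentation applies random transformations to the training data, which increases the apparent amount of data and improves the model's generalization performance.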

Original image → [random transform] → transformed image (a different transform is applied every epoch)

Example transforms:
- Horizontal flip   - Rotation   - Color adjustment
- Crop              - Scaling    - Blur
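
To make "a different transform is applied every epoch" concrete, here is a minimal sketch in plain Python. The helper `random_hflip` and the tiny nested-list "image" are hypothetical stand-ins, not part of torchvision; they only illustrate that the same source image can yield a different result each time it is loaded.

```python
import random

def random_hflip(image, p=0.5, rng=random):
    """Flip a 2-D image (a list of rows) left-right with probability p."""
    if rng.random() < p:
        return [row[::-1] for row in image]
    return image

# A toy 2x3 "image"; real pipelines operate on PIL images or tensors
image = [[1, 2, 3],
         [4, 5, 6]]

rng = random.Random(0)  # fixed seed only so the demo is repeatable
for epoch in range(4):
    # The same source image is re-augmented every time it is read
    augmented = random_hflip(image, p=0.5, rng=rng)
    print(f"epoch {epoch}: {augmented}")
```

Because the randomness is drawn at load time, the dataset on disk never changes; only what the model sees does.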

Implementation with torchvision.transforms

Basic Transforms

from torchvision import transforms

# Transform pipeline for training
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=15),
    transforms.ColorJitter(
        brightness=0.2,
        contrast=0.2,
        saturation=0.2,
        hue=0.1
    ),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    ),
])

# Transform pipeline for validation/testing (no augmentation)
val_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    ),
])

Effect of Each Transform

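Transform	Example parameters	Effect
RandomHorizontalFlip	p=0.5	Learning that does not depend on left-right orientation
RandomRotation	degrees=15	Robustness to rotation
RandomResizedCrop	scale=(0.8,1.0)	Handles changes in scale
ColorJitter	brightness=0.2	Handles changes in lighting conditions
RandomAffine	translate=(0.1,0.1)	Robustness to positional shifts
RandomErasing	p=0.5	Handles occlusion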

Advanced Data Augmentation

RandAugment

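RandAugment randomly selects transforms from a predefined set and applies them. num_ops sets how many transforms are applied to each image, and magnitude sets their strength.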

from torchvision import transforms
from torchvision.transforms import RandAugment

train_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomCrop(224),
    RandAugment(num_ops=2, magnitude=9),  # apply 2 randomly chosen transforms
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

Mixup and CutMix

These techniques blend pairs of images, and weight their labels accordingly, to smooth the decision boundary.

import torch

def mixup(images, labels, alpha=0.2):
    """Mixup: blend two images by linear interpolation"""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    batch_size = images.size(0)
    index = torch.randperm(batch_size)

    mixed_images = lam * images + (1 - lam) * images[index]
    return mixed_images, labels, labels[index], lam

def cutmix(images, labels, alpha=1.0):
    """CutMix: replace a region of one image with the same region from another"""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    batch_size, _, H, W = images.size()
    index = torch.randperm(batch_size)

    # Compute the cut region
    cut_ratio = (1 - lam).sqrt()
    cut_h = int(H * cut_ratio)
    cut_w = int(W * cut_ratio)
    cy = torch.randint(H, (1,)).item()
    cx = torch.randint(W, (1,)).item()

    y1 = max(0, cy - cut_h // 2)
    y2 = min(H, cy + cut_h // 2)
    x1 = max(0, cx - cut_w // 2)
    x2 = min(W, cx + cut_w // 2)

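    # Paste the patch into a copy so the caller's batch is not modified in place
    mixed_images = images.clone()
    mixed_images[:, :, y1:y2, x1:x2] = images[index, :, y1:y2, x1:x2]
    # Recompute lam from the actual (clipped) patch area
    lam = 1 - (y2 - y1) * (x2 - x1) / (H * W)
    return mixed_images, labels, labels[index], lam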
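
Both functions return two label tensors and a mixing coefficient, which only become useful once the loss is weighted the same way. A minimal sketch, assuming a classification task with cross-entropy loss: the helper name `mixed_cross_entropy` is hypothetical, and random logits stand in for real model output.

```python
import torch
import torch.nn.functional as F

def mixed_cross_entropy(logits, labels_a, labels_b, lam):
    """Weight the loss for each label set by its share of the mixed image."""
    loss_a = F.cross_entropy(logits, labels_a)
    loss_b = F.cross_entropy(logits, labels_b)
    return lam * loss_a + (1 - lam) * loss_b

# Toy batch: 4 samples, 3 classes (random logits instead of a real model)
torch.manual_seed(0)
logits = torch.randn(4, 3)
labels_a = torch.tensor([0, 1, 2, 0])       # original labels
labels_b = labels_a[torch.randperm(4)]      # labels of the shuffled partners

loss = mixed_cross_entropy(logits, labels_a, labels_b, lam=0.7)
print(loss.item())
```

In a training loop, the four values returned by `mixup` or `cutmix` would feed straight into this helper, for example `mixed, y_a, y_b, lam = mixup(images, labels)` followed by `mixed_cross_entropy(model(mixed), y_a, y_b, lam)`, where `model` is whatever network is being trained.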

Designing for NetShop Product Images

Design the transform pipeline around the characteristics of product images.

# Transforms for NetShop product images
netshop_train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),
    transforms.RandomHorizontalFlip(p=0.5),
    # No vertical flip: upside-down product images look unnatural
    transforms.RandomRotation(degrees=10),
    transforms.ColorJitter(
        brightness=0.3,   # variation in lighting conditions
        contrast=0.2,
        saturation=0.2,
        hue=0.05          # keep hue shifts small (product color matters)
    ),
    transforms.RandomAffine(
        degrees=0,
        translate=(0.1, 0.1),  # shifts in product position
    ),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
    transforms.RandomErasing(p=0.3),  # partial occlusion; operates on tensors, so it follows ToTensor
])

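Design points:

- Use domain knowledge (for example, product images are never flipped vertically)
- Keep hue changes small (a product's color is important for classification)
- Excessive augmentation turns into noise, so tune it incrementally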

Measuring the Effect of Data Augmentation

# Comparison experiment with and without augmentation
results = {
    "No augmentation":     {"train_acc": 0.99, "val_acc": 0.82},
    "Basic augmentation":  {"train_acc": 0.95, "val_acc": 0.88},
    "Basic + Mixup":       {"train_acc": 0.92, "val_acc": 0.90},
    "RandAugment":         {"train_acc": 0.91, "val_acc": 0.91},
}

Setting	Training accuracy	Validation accuracy	Degree of overfitting
No augmentation	99%	82%	Large
Basic augmentation	95%	88%	Moderate
Basic + Mixup	92%	90%	Small
RandAugment	91%	91%	Almost none
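
The "degree of overfitting" column can be put on a numeric footing as the gap between training and validation accuracy. A minimal sketch, with the numbers copied from the comparison above; the helper name `generalization_gap` is hypothetical.

```python
# Accuracies copied from the comparison above
results = {
    "No augmentation":    {"train_acc": 0.99, "val_acc": 0.82},
    "Basic augmentation": {"train_acc": 0.95, "val_acc": 0.88},
    "Basic + Mixup":      {"train_acc": 0.92, "val_acc": 0.90},
    "RandAugment":        {"train_acc": 0.91, "val_acc": 0.91},
}

def generalization_gap(metrics):
    """Training accuracy minus validation accuracy; larger means more overfitting."""
    return metrics["train_acc"] - metrics["val_acc"]

# Round to two decimals to hide floating-point noise
gaps = {name: round(generalization_gap(m), 2) for name, m in results.items()}

for name, gap in gaps.items():
    print(f"{name:<20} gap = {gap:.2f}")
```

Tracking this gap alongside validation accuracy helps separate two cases: augmentation that reduces overfitting, and augmentation so strong that it drags both accuracies down.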

Summary

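- Data augmentation is a powerful way to improve generalization without collecting any additional data
- Apply it only during training, never during validation or testing
- Choose transforms based on domain knowledge
- Advanced techniques such as Mixup and CutMix can smooth the decision boundary
- Excessive augmentation is counterproductive, so confirm the effect experimentally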

Checklist

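- I can explain the purpose and effect of data augmentation
- I can build a transform pipeline with torchvision.transforms
- I can use different transforms for training and for evaluation
- I understand how Mixup / CutMix work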

Estimated reading time: 30 minutes