PyTorch Basics

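Tanaka VPoE: "In Step 1 you implemented a neural network from scratch in NumPy. In practice, we use a deep learning framework that provides automatic differentiation and GPU support. This time, let's learn PyTorch."

You: "There is also TensorFlow. Why choose PyTorch?"

Tanaka VPoE: "PyTorch code reads like ordinary Python, so it is intuitive, and it has a very large share in the research community. Most Hugging Face models are PyTorch-based as well. You can use it for both production work and research."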

Basic Tensor Operations

The central data structure in PyTorch is the Tensor. It resembles NumPy's ndarray, but it also supports computation on the GPU and automatic differentiation.

Creating Tensors

import torch
import numpy as np

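# Basic ways to create a Tensor
t1 = torch.tensor([1, 2, 3])                    # from a list (dtype is inferred: int64)
t2 = torch.zeros(3, 4)                           # 3x4 matrix of zeros
t3 = torch.ones(2, 3)                            # 2x3 matrix of ones
t4 = torch.randn(3, 3)                           # samples from the standard normal distribution
t5 = torch.arange(0, 10, 2)                      # arithmetic sequence: 0, 2, 4, 6, 8

# Converting from and to NumPy
np_array = np.array([1.0, 2.0, 3.0])
t6 = torch.from_numpy(np_array)                  # NumPy → Tensor (shares memory, keeps float64)
np_back = t6.numpy()                             # Tensor → NumPy (CPU tensors only)

# Specifying the data type
t7 = torch.tensor([1, 2, 3], dtype=torch.float32)

print(f"Shape: {t4.shape}")
print(f"Dtype: {t4.dtype}")
print(f"Device: {t4.device}")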
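The conversion above has two details that are easy to miss: torch.from_numpy shares memory with the source array, and it keeps NumPy's default float64 dtype. The following is a minimal sketch of both points; the variable names shared, copied, and as_float32 are illustrative and do not appear in the lesson code.

```python
import numpy as np
import torch

np_array = np.array([1.0, 2.0, 3.0])   # NumPy defaults to float64
shared = torch.from_numpy(np_array)    # shares memory with np_array
copied = torch.tensor(np_array)        # makes an independent copy

np_array[0] = 100.0                    # modify the NumPy array in place

print(shared.dtype)        # torch.float64
print(shared[0].item())    # 100.0 (the change is visible through the shared Tensor)
print(copied[0].item())    # 1.0   (the copy is unaffected)

# Most PyTorch layers expect float32, so an explicit cast is often needed
as_float32 = shared.float()
print(as_float32.dtype)    # torch.float32
```

If an independent Tensor is what you want, torch.tensor(np_array) is the safer choice; from_numpy is useful when avoiding a copy matters.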

Tensor Operations

a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
b = torch.tensor([[5.0, 6.0], [7.0, 8.0]])

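# Element-wise operations
print(a + b)         # addition
print(a * b)         # element-wise product
print(a ** 2)        # power

# Matrix operations
print(a @ b)         # matrix product
print(torch.matmul(a, b))  # same as above

# Reductions
print(a.sum())       # sum of all elements
print(a.mean(dim=0)) # mean of each column (reduces over rows)
print(a.max(dim=1))  # max of each row; returns (values, indices)

# Reshaping
c = torch.randn(2, 3, 4)
print(c.reshape(6, 4).shape)   # torch.Size([6, 4])
print(c.view(2, 12).shape)     # torch.Size([2, 12])
print(c.permute(2, 0, 1).shape) # torch.Size([4, 2, 3])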
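Two behaviors in the block above differ from what NumPy users might expect: max with a dim argument returns both values and indices, and view only works when the memory layout allows it. The sketch below illustrates both under simple assumptions; the names row_max, row_argmax, view_failed, and flat are illustrative.

```python
import torch

a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

# max along a dimension returns a named tuple with values and indices
result = a.max(dim=1)
row_max, row_argmax = result.values, result.indices
print(row_max)      # tensor([2., 4.])
print(row_argmax)   # tensor([1, 1])

# A transposed Tensor is a non-contiguous view of the same memory
t = torch.arange(6).reshape(2, 3)
tt = t.t()                      # shape (3, 2), non-contiguous
print(tt.is_contiguous())       # False

# view() cannot flatten this layout without copying, so it raises
try:
    tt.view(6)
    view_failed = False
except RuntimeError:
    view_failed = True

# reshape() falls back to copying when a view is not possible
flat = tt.reshape(6)
print(flat)                     # tensor([0, 3, 1, 4, 2, 5])
```

As a rule of thumb, reshape is the more forgiving choice; view is useful when you want a guarantee that no copy is made.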

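Using the GPU

In PyTorch, .to(device) transfers data and models to the GPU. For a Tensor it returns a Tensor on the target device rather than moving the original in place, so keep the returned value.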

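# Check whether a GPU is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# Move a Tensor to the device (falls back to the CPU when no GPU is present)
x = torch.randn(1000, 1000)
x_gpu = x.to(device)

# Compute on the device
y_gpu = x_gpu @ x_gpu.T  # matrix product runs on the GPU when available

# Move the result back to the CPU
y_cpu = y_gpu.cpu()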

Operation	CPU	GPU
Small matrix multiplication	Fast	Overhead dominates
Large matrix multiplication	Slow	Very fast
Data transfer	-	CPU-GPU transfers incur a cost
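Because CPU-GPU transfers have a cost, one common way to reduce them is to allocate Tensors directly on the target device and bring back only the small final result. The following is a minimal sketch of that idea, not a benchmark; the names w and total are illustrative, and the code runs on the CPU when no GPU is available.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Allocate directly on the target device instead of creating on the CPU and copying
x = torch.randn(1000, 1000, device=device)
w = torch.randn(1000, 1000, device=device)

# Both operands live on the same device, so the result stays there too
y = x @ w
print(y.device)

# Bring back only a scalar summary rather than the full 1000x1000 matrix
total = y.sum().item()   # .item() returns a Python float on the host
print(type(total))
```

Note that the operands of an operation must be on the same device; mixing a CPU Tensor with a GPU Tensor raises an error.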

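Automatic Differentiation (autograd)

One of PyTorch's defining features is automatic differentiation. Operations on a Tensor created with requires_grad=True are automatically recorded in a computation graph, and calling .backward() computes the gradients.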

# Basics of automatic differentiation
x = torch.tensor(3.0, requires_grad=True)
y = x ** 2 + 2 * x + 1  # y = x^2 + 2x + 1

y.backward()  # computes dy/dx and stores it in x.grad
print(f"x = {x.item()}")
print(f"y = {y.item()}")
print(f"dy/dx = {x.grad.item()}")  # dy/dx = 2x + 2 = 8
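One way to convince yourself that autograd returns the right derivative is to compare it with a numerical estimate. The sketch below does this for the same function using a central difference; the helper f, the step size h, and the use of float64 are assumptions made for this illustration.

```python
import torch

def f(x):
    # Same function as above: y = x^2 + 2x + 1
    return x ** 2 + 2 * x + 1

# float64 keeps the finite-difference estimate accurate
x = torch.tensor(3.0, dtype=torch.float64, requires_grad=True)
f(x).backward()
autograd_grad = x.grad.item()

# Central difference: (f(x + h) - f(x - h)) / (2h)
h = 1e-6
with torch.no_grad():
    numeric_grad = ((f(x + h) - f(x - h)) / (2 * h)).item()

print(f"autograd: {autograd_grad}")
print(f"numeric:  {numeric_grad:.6f}")
```

Both values should agree closely with the analytic result 2x + 2 = 8. This kind of check is mainly useful when debugging custom operations.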

Automatic Differentiation in a Neural Network

# Gradient computation for a two-layer network
torch.manual_seed(42)

# Inputs and labels
X = torch.randn(4, 2)
y = torch.tensor([0, 1, 1, 0], dtype=torch.float32).unsqueeze(1)

# Parameters (gradient tracking enabled)
W1 = torch.randn(2, 4, requires_grad=True)
b1 = torch.zeros(1, 4, requires_grad=True)
W2 = torch.randn(4, 1, requires_grad=True)
b2 = torch.zeros(1, 1, requires_grad=True)

# Forward pass
z1 = X @ W1 + b1
a1 = torch.relu(z1)
z2 = a1 @ W2 + b2
a2 = torch.sigmoid(z2)

# Loss: binary cross-entropy (1e-7 guards against log(0))
loss = -torch.mean(y * torch.log(a2 + 1e-7) + (1 - y) * torch.log(1 - a2 + 1e-7))

# Backward pass (gradients for all parameters are computed automatically)
loss.backward()

print(f"Loss: {loss.item():.4f}")
print(f"Gradient of W1: {W1.grad.shape}")  # corresponds to the dW1 computed by hand in NumPy
print(f"Gradient of W2: {W2.grad.shape}")
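To connect this with the hand-written backpropagation from the NumPy implementation, the sketch below computes the gradients manually for the same architecture and compares them with autograd. It is an illustration under stated assumptions: it uses float64, a different seed, weights scaled by 0.5, and omits the 1e-7 term in the loss so that the textbook formulas match exactly. The names dz2, dW2, db2, da1, dz1, dW1, and db1 are illustrative.

```python
import torch

torch.manual_seed(0)
dtype = torch.float64

X = torch.randn(4, 2, dtype=dtype)
y = torch.tensor([0, 1, 1, 0], dtype=dtype).unsqueeze(1)

# Small weights keep the sigmoid away from saturation (assumption for this sketch)
W1 = (0.5 * torch.randn(2, 4, dtype=dtype)).requires_grad_()
b1 = torch.zeros(1, 4, dtype=dtype, requires_grad=True)
W2 = (0.5 * torch.randn(4, 1, dtype=dtype)).requires_grad_()
b2 = torch.zeros(1, 1, dtype=dtype, requires_grad=True)

# Forward pass
z1 = X @ W1 + b1
a1 = torch.relu(z1)
z2 = a1 @ W2 + b2
a2 = torch.sigmoid(z2)

# Binary cross-entropy without the epsilon term
loss = -torch.mean(y * torch.log(a2) + (1 - y) * torch.log(1 - a2))
loss.backward()

# Manual backpropagation using the chain rule
with torch.no_grad():
    n = X.shape[0]
    dz2 = (a2 - y) / n                     # sigmoid + cross-entropy gradient
    dW2 = a1.T @ dz2
    db2 = dz2.sum(dim=0, keepdim=True)
    da1 = dz2 @ W2.T
    dz1 = da1 * (z1 > 0)                   # ReLU passes gradient only where z1 > 0
    dW1 = X.T @ dz1
    db1 = dz1.sum(dim=0, keepdim=True)

print(torch.allclose(W1.grad, dW1))
print(torch.allclose(b1.grad, db1))
print(torch.allclose(W2.grad, dW2))
print(torch.allclose(b2.grad, db2))
```

If the manual formulas are correct, all four comparisons should print True, which shows that loss.backward() replaces the entire hand-written backward pass.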

Managing Gradients

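# Reset gradients (needed on every training iteration, because .backward() accumulates into .grad)
W1.grad.zero_()

# Disable gradient tracking (for inference)
with torch.no_grad():
    predictions = torch.sigmoid(torch.relu(X @ W1 + b1) @ W2 + b2)

# Detach from gradient tracking
detached = a2.detach()  # shares data with a2 but is cut off from the computation graph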
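The three tools above come together in a training loop. The following is a minimal sketch, assuming a toy linear regression problem (fitting y = 2x + 1) with a hand-written gradient descent update; the data, the learning rate lr, and the names p, w, and b are all illustrative choices, not part of the lesson code.

```python
import torch

# Gradients accumulate across backward() calls unless they are reset
p = torch.tensor(3.0, requires_grad=True)
(p ** 2).backward()
(p ** 2).backward()
accumulated = p.grad.item()
print(accumulated)   # 12.0 (6 + 6, not 6)

# Toy data: y = 2x + 1
x = torch.linspace(-1, 1, 20).unsqueeze(1)
y = 2 * x + 1

w = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
lr = 0.1

for step in range(200):
    pred = x * w + b
    loss = torch.mean((pred - y) ** 2)   # mean squared error
    loss.backward()                      # fills w.grad and b.grad

    # Parameter updates must not be recorded in the computation graph
    with torch.no_grad():
        w -= lr * w.grad
        b -= lr * b.grad

    # Reset so the next iteration starts from zero gradients
    w.grad.zero_()
    b.grad.zero_()

print(f"w = {w.item():.3f}, b = {b.item():.3f}")
```

With these settings the parameters should approach w = 2 and b = 1. In practice an optimizer from torch.optim usually handles the update and reset steps, but writing them out once makes the role of no_grad and zero_ clear.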

NumPy vs PyTorch Comparison Table

Operation	NumPy	PyTorch
Array creation	`np.array()`	`torch.tensor()`
Zero matrix	`np.zeros()`	`torch.zeros()`
Random numbers	`np.random.randn()`	`torch.randn()`
Reshaping	`.reshape()`	`.reshape()` / `.view()`
Matrix product	`a @ b`	`a @ b`
Maximum	`np.max()`	`torch.max()`
GPU support	Not available	`.to('cuda')`
Automatic differentiation	Not available	`.backward()`

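Summary

A PyTorch Tensor resembles a NumPy ndarray and adds GPU computation and automatic differentiation
.to(device) transfers data between the CPU and the GPU
requires_grad=True together with .backward() computes gradients automatically
At inference time, use torch.no_grad() to skip unnecessary gradient tracking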

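Checklist

I can create Tensors and perform basic operations on them
I can convert between NumPy arrays and Tensors
I understand how to transfer data to the GPU
I understand how automatic differentiation with autograd works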

Estimated reading time: 30 minutes