RAG Architecture Fundamentals - L0 Curriculum

Story

Sato (CTO)

You now understand the criteria for choosing an LLM. On to the next problem.

Sato (CTO)

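An LLM has general-purpose knowledge, but it does not know what is in our internal documents. Fine-tuning is one option, but it is expensive, and the model has to be retrained every time the data is updated.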

You

So that's where RAG comes in?

Sato (CTO)

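Exactly. Retrieval-Augmented Generation: retrieve the relevant information through search, then pass it to the LLM to generate an answer. The idea is simple, but reaching production quality depends on the design. Today we will nail down the fundamentals.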

What Is RAG?

Why Do We Need RAG?

LLMs have the following fundamental limitations.

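Challenge	Description	How RAG Addresses It
Knowledge cutoff	Knows only what was in its training data up to the cutoff date	References up-to-date data at query time
Hallucination	States plausible-sounding falsehoods	Grounds answers in retrieved sources that can be cited
Lack of domain knowledge	Has no company-specific knowledge	Searches internal documents to answer
Cost	Fine-tuning is expensive	No additional training required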

RAG vs. Fine-Tuning vs. Prompt Engineering

graph TD
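    Title["Ways to Inject Knowledge"]
    Title --- PE & RAG & FT

    subgraph PE["Prompt Engineering"]
        PE1["Embed information<br/>directly in the context"]
        PE2["Small-scale knowledge<br/>Applies immediately<br/>Limited by context length"]
    end

    subgraph RAG["RAG"]
        RAG1["Search for relevant information<br/>and inject it dynamically"]
        RAG2["Large-scale or dynamic knowledge<br/>Real-time updates<br/>Scalable"]
    end

    subgraph FT["Fine-Tuning"]
        FT1["Train the knowledge<br/>into the model itself"]
        FT2["Changes behavior patterns<br/>Expensive and time-consuming<br/>Risk of stale data"]
    end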
    end

    classDef title fill:#1e40af,stroke:#1e40af,color:#fff,font-weight:bold
    classDef pe fill:#dbeafe,stroke:#3b82f6
    classDef rag fill:#f0fdf4,stroke:#22c55e
    classDef ft fill:#fef3c7,stroke:#f59e0b
    class Title title
    class PE1,PE2 pe
    class RAG1,RAG2 rag
    class FT1,FT2 ft

The RAG Pipeline at a Glance

Three Phases

RAG consists of three main phases: Index, Retrieve, and Generate.

graph TD
    subgraph Phase1["Phase 1: Index (offline preprocessing)"]
        D1[Documents] --> C1[Chunking] --> E1[Embedding] --> V1[Vector DB]
    end

    subgraph Phase2["Phase 2: Retrieve (search, at runtime)"]
        Q1[User query] --> E2[Embedding] --> S1[Similarity search] --> R1[Relevant chunks]
    end

    subgraph Phase3["Phase 3: Generate (generation, at runtime)"]
        P1[Prompt + relevant chunks] --> L1[LLM] --> A1[Answer generation]
    end
    end

    Phase1 --> Phase2 --> Phase3

    classDef phaseStyle fill:#e8f4fd,stroke:#1a73e8
    class Phase1,Phase2,Phase3 phaseStyle

TypeScript Implementation Overview

import { OpenAI } from 'openai';

interface Document {
  id: string;
  content: string;
  metadata: Record<string, unknown>;
}

interface Chunk {
  id: string;
  documentId: string;
  content: string;
  embedding?: number[];
  metadata: Record<string, unknown>;
}

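interface RetrievedContext {
  chunk: Chunk;
  score: number;
}

// Minimal service interfaces assumed by the pipeline below.
// They are illustrative, not the API of any specific library.
interface EmbeddingService {
  embed(text: string): Promise<number[]>;
  embedBatch(texts: string[]): Promise<number[][]>;
}

interface SearchOptions {
  topK: number;
  filter?: Record<string, unknown>;
}

interface VectorStore {
  upsert(chunks: Chunk[]): Promise<void>;
  // Accepts either a bare topK or an options object (topK plus metadata filter)
  search(
    queryEmbedding: number[],
    options: number | SearchOptions,
  ): Promise<RetrievedContext[]>;
}

interface LLMService {
  complete(prompt: string): Promise<string>;
}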

class RAGPipeline {
  constructor(
    private readonly embedder: EmbeddingService,
    private readonly vectorStore: VectorStore,
    private readonly llm: LLMService,
  ) {}

  // Phase 1: Index
  async indexDocuments(documents: Document[]): Promise<void> {
    for (const doc of documents) {
      const chunks = this.chunkDocument(doc);
      const embeddings = await this.embedder.embedBatch(
        chunks.map(c => c.content)
      );

      for (let i = 0; i < chunks.length; i++) {
        chunks[i].embedding = embeddings[i];
      }

      await this.vectorStore.upsert(chunks);
    }
  }

  // Phase 2: Retrieve
  async retrieve(query: string, topK: number = 5): Promise<RetrievedContext[]> {
    const queryEmbedding = await this.embedder.embed(query);
    return this.vectorStore.search(queryEmbedding, topK);
  }

  // Phase 3: Generate
  async generate(query: string): Promise<string> {
    const contexts = await this.retrieve(query);
    const contextText = contexts
      .map(c => c.chunk.content)
      .join('\n\n---\n\n');

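    const prompt = `Answer the question using the information below.
If the answer is not contained in that information, reply "That information could not be found."

## Reference Information
${contextText}

## Question
${query}

## Answer`;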

    return this.llm.complete(prompt);
  }

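  private chunkDocument(doc: Document): Chunk[] {
    // Chunking is covered in detail in the next section; this uses the
    // recursiveChunk function defined there and wraps each piece as a Chunk.
    // The id scheme (documentId-index) is only an illustrative choice.
    return recursiveChunk(doc.content, { chunkSize: 512, chunkOverlap: 50 })
      .map((content, index) => ({
        id: `${doc.id}-${index}`,
        documentId: doc.id,
        content,
        metadata: { ...doc.metadata, chunkIndex: index },
      }));
  }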
}
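
The pipeline above delegates the similarity search in Phase 2 to a vector store. To make that step concrete, here is a minimal sketch of a brute-force search using cosine similarity over vectors held in memory. The names `StoredVector`, `ScoredVector`, `cosineSimilarity`, and `searchTopK` are hypothetical and chosen for illustration; a production vector database would normally use an approximate nearest-neighbor index rather than scoring every vector.

```typescript
// Hypothetical in-memory types for illustration; not a real library's API.
interface StoredVector {
  id: string;
  embedding: number[];
}

interface ScoredVector {
  id: string;
  score: number;
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force top-K search: score every stored vector, sort descending, keep K.
function searchTopK(
  store: StoredVector[],
  queryEmbedding: number[],
  topK: number,
): ScoredVector[] {
  return store
    .map(v => ({ id: v.id, score: cosineSimilarity(queryEmbedding, v.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

Scoring every vector costs time proportional to the number of stored chunks, which is acceptable for small collections and for tests but is the main reason dedicated vector databases exist.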

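Chunking Strategies

Why Chunking Matters

Chunking is the most important preprocessing step for RAG quality. If chunks are too large, retrieval precision drops because each chunk mixes in unrelated content; if they are too small, the context needed to interpret them is lost.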

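Main Chunking Methods

Method	Description	Best Suited For
Fixed-size splitting	Splits evenly by token count	Simple text
Semantic splitting	Splits at shifts in meaning	Structured documents
Recursive splitting	Splits using a hierarchy of separators	General purpose
Document-structure-based	Splits by heading or paragraph	Markdown, HTML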
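
As a baseline for comparison, the simplest method in the table, fixed-size splitting, can be sketched in a few lines. This is a minimal sketch under one simplifying assumption: sizes are counted in characters rather than tokens, so a tokenizer would be needed to match token-based limits. The function name `fixedSizeChunk` is hypothetical.

```typescript
// Fixed-size splitting with overlap. Sizes are in characters for simplicity
// (an assumption of this sketch; count tokens with a tokenizer in practice).
function fixedSizeChunk(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0) throw new Error('chunkSize must be positive');
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error('overlap must satisfy 0 <= overlap < chunkSize');
  }
  const step = chunkSize - overlap; // how far the window advances each time
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this window reached the end
  }
  return chunks;
}

// fixedSizeChunk('abcdefghij', 4, 1) returns ['abcd', 'defg', 'ghij']
```

Because the window ignores sentence and paragraph boundaries, chunks can start or end mid-sentence, which is the weakness the other methods in the table try to address.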

Implementation Example

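interface ChunkingConfig {
  // Maximum chunk size. This simplified example measures it in characters
  // (text.length); to size chunks in tokens, count with a tokenizer instead.
  chunkSize: number;
  chunkOverlap: number;    // Overlap between adjacent chunks (same unit as chunkSize)
  separators?: string[];   // Separators to split on, coarsest first
}

// Recursive chunking
function recursiveChunk(
  text: string,
  config: ChunkingConfig
): string[] {
  const { chunkSize, chunkOverlap } = config;
  const separators = config.separators ?? [
    '\n\n',    // Paragraph
    '\n',      // Line break
    '。',      // Sentence end (Japanese)
    '. ',      // Sentence end (English)
    ' ',       // Word
    '',        // Character
  ];

  function split(text: string, sepIndex: number): string[] {
    if (text.length <= chunkSize) return [text];
    if (sepIndex >= separators.length) {
      // No separator left to try: force a split into fixed-size slices.
      // Overlap is added once, by addOverlap, after all splitting is done.
      const pieces: string[] = [];
      for (let i = 0; i < text.length; i += chunkSize) {
        pieces.push(text.slice(i, i + chunkSize));
      }
      return pieces;
    }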

    const separator = separators[sepIndex];
    const parts = text.split(separator);

    const chunks: string[] = [];
    let currentChunk = '';

    for (const part of parts) {
      const candidate = currentChunk
        ? currentChunk + separator + part
        : part;

      if (candidate.length <= chunkSize) {
        currentChunk = candidate;
      } else {
        if (currentChunk) chunks.push(currentChunk);
        // If this part alone is too large, split it recursively with the next separator
        if (part.length > chunkSize) {
          chunks.push(...split(part, sepIndex + 1));
          currentChunk = '';
        } else {
          currentChunk = part;
        }
      }
    }

    if (currentChunk) chunks.push(currentChunk);
    return chunks;
  }

  const rawChunks = split(text, 0);
  return addOverlap(rawChunks, chunkOverlap);
}

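// Add overlap: prefix each chunk with the tail of the previous chunk.
// Note that the result can exceed chunkSize by up to overlapSize.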
function addOverlap(chunks: string[], overlapSize: number): string[] {
  if (overlapSize === 0 || chunks.length <= 1) return chunks;

  return chunks.map((chunk, i) => {
    if (i === 0) return chunk;
    const prevChunk = chunks[i - 1];
    const overlapText = prevChunk.slice(-overlapSize);
    return overlapText + chunk;
  });
}

Chunk Size Guidelines

graph TD
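    Title["Choosing a Chunk Size"]
    Title --> Small & Medium & Large

    Small["Small (128-256 tokens)<br/>Retrieval precision: high (pinpoint)<br/>Context completeness: low (fragmentary)<br/>Use for: Q&A, definition lookup"]
    Medium["Medium (256-512 tokens) ← recommended<br/>Retrieval precision: balanced<br/>Context completeness: balanced<br/>Use for: general knowledge bases"]
    Large["Large (512-1024 tokens)<br/>Retrieval precision: low (noise mixed in)<br/>Context completeness: high (context preserved)<br/>Use for: long technical documents, legal documents"]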

    classDef title fill:#1e40af,stroke:#1e40af,color:#fff,font-weight:bold
    classDef small fill:#dbeafe,stroke:#3b82f6
    classDef medium fill:#f0fdf4,stroke:#22c55e,font-weight:bold
    classDef large fill:#fef3c7,stroke:#f59e0b
    class Title title
    class Small small
    class Medium medium
    class Large large

Structure-Based Chunking for Markdown Documents

interface MarkdownChunk {
  content: string;
  heading: string;
  level: number;
  metadata: {
    headingPath: string[];  // e.g., ["API Design", "Authentication", "JWT"] (heading text without the leading # marks)
  };
}

function chunkMarkdown(markdown: string): MarkdownChunk[] {
  const lines = markdown.split('\n');
  const chunks: MarkdownChunk[] = [];
  let currentChunk: string[] = [];
  let headingStack: { text: string; level: number }[] = [];

  for (const line of lines) {
    const headingMatch = line.match(/^(#{1,6})\s+(.+)/);

    if (headingMatch) {
      // Save the previous chunk
      if (currentChunk.length > 0) {
        chunks.push({
          content: currentChunk.join('\n'),
          heading: headingStack[headingStack.length - 1]?.text ?? '',
          level: headingStack[headingStack.length - 1]?.level ?? 0,
          metadata: {
            headingPath: headingStack.map(h => h.text),
          },
        });
        currentChunk = [];
      }

      const level = headingMatch[1].length;
      const text = headingMatch[2];

      // Update the heading stack: pop headings at the same or a deeper level
      while (
        headingStack.length > 0 &&
        headingStack[headingStack.length - 1].level >= level
      ) {
        headingStack.pop();
      }
      headingStack.push({ text, level });
    }

    currentChunk.push(line);
  }

  // Save the final chunk
  if (currentChunk.length > 0) {
    chunks.push({
      content: currentChunk.join('\n'),
      heading: headingStack[headingStack.length - 1]?.text ?? '',
      level: headingStack[headingStack.length - 1]?.level ?? 0,
      metadata: {
        headingPath: headingStack.map(h => h.text),
      },
    });
  }

  return chunks;
}
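
One possible use of `headingPath` is to prefix the section breadcrumb to a chunk's text before embedding, so that a short chunk still carries the context of its parent sections. The sketch below is an assumption about how the metadata might be used, not something the implementation above does; `HeadedChunk` and `toEmbeddingText` are hypothetical names.

```typescript
// Hypothetical minimal shape: only the fields this sketch needs.
interface HeadedChunk {
  content: string;
  metadata: { headingPath: string[] };
}

// Builds the text to embed: breadcrumb line, blank line, then the chunk content.
// Chunks with an empty heading path are returned unchanged.
function toEmbeddingText(chunk: HeadedChunk, separator: string = ' > '): string {
  const breadcrumb = chunk.metadata.headingPath.join(separator);
  return breadcrumb ? `${breadcrumb}\n\n${chunk.content}` : chunk.content;
}
```

The chunk content already contains its own heading line, so the breadcrumb mainly adds the ancestor headings, for example "API Design > Authentication" above a chunk that starts with "### JWT".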

Metadata Design

Why Metadata Matters

Attaching metadata to chunks makes it possible to filter at search time and to show the sources behind an answer.

interface ChunkMetadata {
  // Document information
  documentId: string;
  documentTitle: string;
  source: string;           // File path or URL

  // Structural information
  sectionTitle: string;
  headingPath: string[];
  pageNumber?: number;

  // Classification
  category: string;         // "engineering", "hr", "legal"
  department: string;
  accessLevel: 'public' | 'internal' | 'confidential';

  // Timestamps
  createdAt: Date;
  updatedAt: Date;

  // Chunk information
  chunkIndex: number;
  totalChunks: number;
}

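Using Filters

// Search with a metadata filter (MongoDB-style operators shown; the exact filter syntax depends on the vector store)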
const results = await vectorStore.search(queryEmbedding, {
  topK: 10,
  filter: {
    category: 'engineering',
    accessLevel: { $in: ['public', 'internal'] },
    updatedAt: { $gte: new Date('2024-01-01') },
  },
});
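
To show what such a filter means, here is a minimal sketch of how it could be evaluated against one chunk's metadata. It supports only plain equality, `$in`, and `$gte`, which are the operators used in the example above; the types and the `matchesFilter` function are hypothetical, and real vector stores evaluate filters inside their own index.

```typescript
// Hypothetical filter types covering only equality, $in, and $gte.
type FilterCondition =
  | string
  | number
  | boolean
  | { $in: unknown[] }
  | { $gte: number | Date };

type MetadataFilter = Record<string, FilterCondition>;

// Converts numbers and Dates to a comparable number; anything else becomes NaN,
// which makes every >= comparison false (so missing values never match).
function toComparable(value: unknown): number {
  if (value instanceof Date) return value.getTime();
  if (typeof value === 'number') return value;
  return NaN;
}

// Returns true only if every condition in the filter holds (logical AND).
function matchesFilter(
  metadata: Record<string, unknown>,
  filter: MetadataFilter,
): boolean {
  return Object.keys(filter).every(key => {
    const cond = filter[key];
    const value = metadata[key];
    if (typeof cond !== 'object') return value === cond; // plain equality
    if ('$in' in cond) return cond.$in.some(item => item === value);
    return toComparable(value) >= toComparable(cond.$gte);
  });
}
```

Applying the access-level condition at search time, rather than after generation, keeps confidential chunks out of the prompt entirely.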

Designing the RAG Prompt

Basic Template

function buildRAGPrompt(
  query: string,
  contexts: RetrievedContext[],
  systemInstruction?: string
): string {
  const contextSection = contexts
    .map((ctx, i) => {
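      const source = ctx.chunk.metadata.source ?? 'unknown';
      return `[Source ${i + 1}: ${source}]\n${ctx.chunk.content}`;
    })
    .join('\n\n---\n\n');

  return `${systemInstruction ?? 'You are an accurate and reliable AI assistant.'}

## Rules
1. Base your answer only on the "Reference Information" below
2. If the reference information does not cover the question, reply "That information could not be found."
3. Cite the sources that support your answer using the format [Source N]
4. Do not fill gaps with guesses or general knowledge

## Reference Information
${contextSection}

## User Question
${query}

## Answer`;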
}

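Key Points for Mitigating Hallucination

Measure	Description
Explicit citations	Attach source numbers to the answer to show its basis
Allowing "I don't know"	Instruct the model to say so honestly when the information is missing
Temperature parameter	Set temperature = 0 to make the output as deterministic as possible
Post-processing verification	Check that the answer does not contradict the retrieved results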
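
A lightweight first step toward post-processing verification is to check the bracketed source-number markers that the prompt template asks the model to emit. The sketch below extracts the cited numbers and flags any that do not correspond to a retrieved chunk. It does not detect contradictions between the answer and the sources, which requires a stronger check. The `CitationCheck` type and `checkCitations` function are hypothetical names.

```typescript
interface CitationCheck {
  cited: number[];      // distinct source numbers found in the answer, ascending
  invalid: number[];    // cited numbers outside the range 1..contextCount
  hasCitation: boolean; // false means the answer cites nothing at all
}

// Scans the answer for markers such as [Source 2] (or the Japanese form [出典2]).
function checkCitations(answer: string, contextCount: number): CitationCheck {
  const found = new Set<number>();
  const pattern = /\[(?:Source|出典)\s*(\d+)\]/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(answer)) !== null) {
    found.add(Number(match[1]));
  }
  const cited = Array.from(found).sort((a, b) => a - b);
  const invalid = cited.filter(n => n < 1 || n > contextCount);
  return { cited, invalid, hasCitation: cited.length > 0 };
}
```

An answer with no citation, or with a citation number that was never supplied, is a reasonable candidate for a retry or for falling back to the "not found" response.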

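Summary

Point	Details
Three phases of RAG	Index (preprocessing) → Retrieve (search) → Generate (generation)
Chunking	The most important preprocessing step for RAG quality. Start with 256-512 tokens
Metadata	Essential for filtering and source attribution
Prompt design	Mitigate hallucination by requiring citations and allowing "I don't know"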

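Checklist

I understand the three phases of RAG (Index → Retrieve → Generate)
I understand the chunking strategies available and when each applies
I understand why metadata design matters
I understand how to design prompts that mitigate hallucination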

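Next Steps

You now have the fundamentals of RAG. The next section covers advanced RAG patterns for reaching production quality, including techniques that substantially improve retrieval accuracy.

With solid fundamentals in place, the advanced techniques will come naturally.

Estimated reading time: 40 minutes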