Human-in-the-Loop Design

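"A world where AI makes diagnoses fully automatically has not arrived yet."

Tanaka, the VPoE, states this flatly.

"AI is a support tool. Human experts review the AI's output and make the final decision. How you design this Human-in-the-Loop (HITL) process determines how trustworthy the system is."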

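HITL Patterns

Pattern	Description	When to use
Full review	A human checks every AI output	Early rollout, high-risk domains
Selective review	Only low-confidence cases are reviewed	Stable operation
Sample review	Random samples are reviewed periodically	Quality monitoring
Exception review	Cases the AI cannot decide are reviewed	Handling rare cases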
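
The sample review pattern can be sketched in a few lines. This is a minimal illustration, not part of the original lesson: the function name `select_sample_for_review`, the 5% default `sample_rate`, and the rule of always reviewing at least one case are all assumptions made for the example.

```python
import random


def select_sample_for_review(case_ids, sample_rate=0.05, seed=None):
    """Pick a random subset of cases for periodic quality review.

    sample_rate and the "at least one case" rule are illustrative assumptions.
    """
    if not 0 <= sample_rate <= 1:
        raise ValueError("sample_rate must be between 0 and 1")
    case_ids = list(case_ids)
    if not case_ids or sample_rate == 0:
        return []
    # Review at least one case whenever there is anything to review.
    k = max(1, round(len(case_ids) * sample_rate))
    # A seeded generator makes the selection reproducible for audits.
    rng = random.Random(seed)
    return rng.sample(case_ids, k)
```

Sampling from auto-approved cases in particular is one way to catch errors that the confidence score alone would never surface.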

Implementing Selective Review

class HITLRouter:
    """Routing for Human-in-the-Loop review"""

    def __init__(self, auto_threshold=0.9, review_threshold=0.7):
        self.auto_threshold = auto_threshold
        self.review_threshold = review_threshold

    def route(self, classification, risk_level):
        """Decide whether a case needs human review"""
        confidence = classification['confidence']

        # High-risk cases are always reviewed, regardless of confidence
        if risk_level in ('critical', 'high'):
            return {
                'action': 'mandatory_review',
                'reason': f'risk level: {risk_level}',
                'priority': 'high',
            }

        if confidence >= self.auto_threshold:
            return {
                'action': 'auto_approve',
                'reason': f'high confidence: {confidence}',
                'priority': 'none',
            }

        if confidence >= self.review_threshold:
            return {
                'action': 'optional_review',
                'reason': f'medium confidence: {confidence}',
                'priority': 'normal',
            }

        # Anything below review_threshold must be reviewed
        return {
            'action': 'mandatory_review',
            'reason': f'low confidence: {confidence}',
            'priority': 'high',
        }
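
The thresholds directly control how much work lands on reviewers, so it can help to check the split on a batch before changing them. The sketch below restates the routing rules in condensed form (`route_action`) so that it runs on its own. Both helper names and the sample batch are hypothetical and invented for illustration.

```python
from collections import Counter


def route_action(confidence, risk_level, auto_threshold=0.9, review_threshold=0.7):
    """Condensed restatement of the routing rules; returns only the action name."""
    if risk_level in ("critical", "high"):
        return "mandatory_review"
    if confidence >= auto_threshold:
        return "auto_approve"
    if confidence >= review_threshold:
        return "optional_review"
    return "mandatory_review"


def summarize_workload(cases, **thresholds):
    """Count how many cases fall into each routing action."""
    return Counter(
        route_action(c["confidence"], c["risk_level"], **thresholds) for c in cases
    )


# Hypothetical batch, invented for illustration only.
batch = [
    {"confidence": 0.97, "risk_level": "low"},
    {"confidence": 0.97, "risk_level": "critical"},
    {"confidence": 0.82, "risk_level": "low"},
    {"confidence": 0.55, "risk_level": "medium"},
    {"confidence": 0.90, "risk_level": "medium"},
]

default_load = summarize_workload(batch)
strict_load = summarize_workload(batch, auto_threshold=0.95)
```

With the default thresholds, two of the five cases are auto-approved. Raising `auto_threshold` to 0.95 moves the 0.90 case into optional review, which is the kind of trade-off between automation and reviewer load that the thresholds encode.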

Designing the Review Interface

from datetime import datetime


class ReviewInterface:
    """Expert review interface"""

    def present_for_review(self, case):
        """Assemble the information shown to the reviewer"""
        return {
            'case_id': case['id'],
            'image_path': case['image_path'],
            'grad_cam_path': case['grad_cam_path'],
            'ai_classification': case['classification'],
            'ai_findings': case['findings'],
            'ai_risk': case['risk_level'],
            'similar_cases': case.get('similar_cases', []),
            'review_options': [
                {'action': 'agree', 'label': 'Agree with the AI decision'},
                {'action': 'modify', 'label': 'Approve with partial corrections'},
                {'action': 'reject', 'label': 'Reject the AI decision'},
                {'action': 'escalate', 'label': 'Escalate to a senior expert'},
            ],
        }

    def process_review(self, case_id, reviewer_id, action, corrections=None):
        """Process the review result"""
        result = {
            'case_id': case_id,
            'reviewer_id': reviewer_id,
            'action': action,
            'corrections': corrections,
            'timestamp': datetime.now().isoformat(),
        }

        # Record as feedback (for retraining)
        if action in ('reject', 'modify'):
            self._record_feedback(result)

        return result

    def _record_feedback(self, result):
        """Record feedback for model improvement"""
        # corrections can be None (e.g. a reject without details), so fall back to {}
        corrections = result.get('corrections') or {}
        feedback = {
            'case_id': result['case_id'],
            'ai_was_wrong': result['action'] == 'reject',
            'corrections': result.get('corrections'),
            'label_for_training': corrections.get('correct_class'),
        }
        # Save to the feedback store
        return feedback
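
The code above leaves the feedback store itself unspecified. One possible shape, shown here purely as an assumption, is an append-only JSON Lines file. The class `JsonlFeedbackStore` and its methods are hypothetical names introduced for this sketch. A production system would more likely use a database with access control and audit logging.

```python
import json
from pathlib import Path


class JsonlFeedbackStore:
    """Hypothetical append-only feedback store backed by a JSON Lines file."""

    def __init__(self, path):
        self.path = Path(path)

    def save(self, feedback):
        """Append one feedback record as a single JSON line."""
        # ensure_ascii=False keeps non-ASCII labels readable in the file.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(feedback, ensure_ascii=False) + "\n")

    def load_training_labels(self):
        """Return (case_id, label) pairs for records that carry a corrected label."""
        if not self.path.exists():
            return []
        labels = []
        with self.path.open(encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue  # tolerate blank lines
                record = json.loads(line)
                if record.get("label_for_training") is not None:
                    labels.append((record["case_id"], record["label_for_training"]))
        return labels
```

Records without a corrected label (for example, a bare rejection) stay in the file for auditing but are skipped when collecting training labels.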

HITL Quality Metrics

import numpy as np


def compute_hitl_metrics(review_logs):
    """Compute HITL quality metrics"""
    total = len(review_logs)
    agreed = sum(1 for r in review_logs if r['action'] == 'agree')
    modified = sum(1 for r in review_logs if r['action'] == 'modify')
    rejected = sum(1 for r in review_logs if r['action'] == 'reject')
    # Average only over logs that recorded a time; a missing value is not 0 minutes
    times = [r['review_time_min'] for r in review_logs if 'review_time_min' in r]

    return {
        'total_reviews': total,
        'agreement_rate': round(agreed / total * 100, 1) if total else 0,
        'modification_rate': round(modified / total * 100, 1) if total else 0,
        'rejection_rate': round(rejected / total * 100, 1) if total else 0,
        'avg_review_time_min': round(float(np.mean(times)), 1) if times else 0,
    }
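
Metrics are only useful for monitoring if something checks them against limits. The following is a minimal sketch of such a check, assuming a metrics dictionary shaped like the one returned above. The function name `check_hitl_alerts` and all three limits (80% agreement, 10% rejection, at least 30 reviews) are hypothetical values chosen for illustration, not recommendations from the lesson.

```python
def check_hitl_alerts(metrics, min_agreement_rate=80.0,
                      max_rejection_rate=10.0, min_reviews=30):
    """Return warning messages when review metrics cross the given limits.

    All default limits are illustrative assumptions.
    """
    alerts = []
    # With very few reviews the rates are too noisy to act on.
    if metrics["total_reviews"] < min_reviews:
        return alerts
    if metrics["agreement_rate"] < min_agreement_rate:
        alerts.append(
            f"agreement rate {metrics['agreement_rate']}% is below {min_agreement_rate}%"
        )
    if metrics["rejection_rate"] > max_rejection_rate:
        alerts.append(
            f"rejection rate {metrics['rejection_rate']}% is above {max_rejection_rate}%"
        )
    return alerts
```

A falling agreement rate can point to model drift or to a shift in the incoming cases, so an alert here is a prompt to investigate rather than a verdict on the model.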

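Summary

Topic	Key point
Routing	Decide whether review is needed based on confidence and risk
Review UI	Present the AI output, Grad-CAM, and similar cases
Feedback	Use review results to improve the model
Metrics	Monitor quality through agreement, modification, and rejection rates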

Checklist

I can explain the four HITL patterns
I can design the routing for selective review
I can define the elements of a review interface
I understand how the feedback loop works

Next Step

You have now learned Human-in-the-Loop design. Next, let's look at continual learning.

Estimated reading time: 30 minutes