Day 11: Quantum Feature Maps for Credit Data

🎯 Learning Objectives

  • Develop a deep understanding of quantum feature maps and classical feature engineering
  • Master how quantum feature maps encode credit data
  • Implement quantum feature maps for credit scoring
  • Compare the performance of quantum and classical feature engineering

📚 Theory

Feature Engineering Fundamentals

1. Classical Feature Engineering

Traditional Methods:

  • Manual Feature Creation: Domain expertise-based features
  • Statistical Features: Mean, variance, percentiles
  • Interaction Features: Cross-products, ratios
  • Polynomial Features: Higher-order terms
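
As a concrete baseline, the polynomial-expansion idea can be sketched with scikit-learn (which the practice section below also uses); the feature values here are arbitrary toy numbers:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two toy credit features per applicant: income and debt (arbitrary values)
X = np.array([[50_000.0, 20_000.0],
              [30_000.0, 15_000.0]])

# Degree-2 expansion: [1, x1, x2, x1^2, x1*x2, x2^2]
poly = PolynomialFeatures(degree=2, include_bias=True)
X_poly = poly.fit_transform(X)
print(X_poly.shape)  # (2, 6): dimension grows polynomially with degree
```

Note how even degree-2 expansion adds cross terms like income × debt by hand; quantum feature maps aim to generate such interactions implicitly.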

Limitations:

  • Manual process requiring domain expertise
  • Limited to linear and simple non-linear transformations
  • Curse of dimensionality
  • Feature selection challenges

2. Quantum Feature Maps

Quantum Advantage:

  • High-dimensional Encoding: Exponential feature space
  • Non-linear Transformations: Quantum kernel methods
  • Entanglement: Complex feature interactions
  • Quantum Parallelism: Parallel feature processing

Mathematical Foundation:

|φ(x)⟩ = U(x)|0⟩^⊗ⁿ

Where:

  • |φ(x)⟩: Quantum feature state (the encoded data point)
  • U(x): Data-dependent parameterized quantum circuit
  • |0⟩^⊗ⁿ: Initial n-qubit state
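
To make the formula concrete, here is a minimal single-qubit sketch in plain NumPy: U(x) is a single RY rotation whose angle is the (normalized) feature value, applied to |0⟩:

```python
import numpy as np

def ry(theta):
    """Single-qubit RY rotation matrix."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

x = 0.7                           # a feature value normalized to [0, 1]
ket0 = np.array([1.0, 0.0])       # |0⟩
phi = ry(np.pi * x) @ ket0        # |φ(x)⟩ = U(x)|0⟩
print(np.abs(phi) ** 2)           # measurement probabilities, summing to 1
```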

Quantum Feature Map Types

1. ZZFeatureMap (Qiskit):

U_ZZ(x) = exp(iπ Σ_{i<j} xᵢxⱼ ZᵢZⱼ)

Properties:

  • Entanglement between features
  • Non-linear transformations
  • Hardware-efficient
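
The core ZZ interaction is a diagonal two-qubit unitary, which can be checked directly in NumPy (a sketch of the exponent term only, not the full Qiskit circuit, which also adds Hadamard and single-qubit phase layers):

```python
import numpy as np

def zz_interaction(x1, x2):
    """Diagonal unitary exp(iπ·x1·x2·Z⊗Z) acting on two qubits."""
    theta = np.pi * x1 * x2
    zz_eigs = np.kron([1.0, -1.0], [1.0, -1.0])  # eigenvalues of Z⊗Z: +1, -1, -1, +1
    return np.diag(np.exp(1j * theta * zz_eigs))

U = zz_interaction(0.3, 0.6)
print(np.allclose(U.conj().T @ U, np.eye(4)))  # True: the map is unitary
```

The phase each basis state picks up depends on the product x1·x2, which is how the map couples pairs of features.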

2. PauliFeatureMap:

U_Pauli(x) = ∏ᵢ exp(iπxᵢPᵢ)

Properties:

  • Single-qubit rotations
  • Feature encoding
  • Basis for complex maps

3. Custom Feature Maps:

U_custom(x) = ∏ᵢ Rᵢ(θᵢ(x))

Properties:

  • Domain-specific encoding
  • Optimized for credit data
  • Adaptive parameters

Credit Data Encoding

1. Credit Features:

  • Demographic: Age, income, employment
  • Credit History: Payment history, utilization
  • Financial: Debt ratios, savings
  • Behavioral: Transaction patterns

2. Quantum Encoding Strategies:

Direct Encoding:

|ψ⟩ = ∏ᵢ Rᵢ(xᵢ)|0⟩

Normalized Encoding:

|ψ⟩ = ∏ᵢ Rᵢ(xᵢ/σᵢ)|0⟩

Interaction Encoding:

|ψ⟩ = ∏_{i<j} exp(iπxᵢxⱼZᵢZⱼ) H^⊗ⁿ|0⟩

(The Hadamard layer creates the superposition on which the ZZ phases act; applied directly to |0⟩, the ZZ terms would contribute only a global phase.)
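
The direct (angle) encoding above can be sketched in NumPy as a product state, one qubit per feature, taking Rᵢ = RY for concreteness:

```python
import numpy as np

def ry(theta):
    """Single-qubit RY rotation matrix."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def direct_encode(features):
    """|ψ⟩ = ∏ᵢ Rᵢ(xᵢ)|0⟩ — a product state, one qubit per normalized feature."""
    state = np.array([1.0])
    for x in features:
        state = np.kron(state, ry(np.pi * x) @ np.array([1.0, 0.0]))
    return state

x = np.array([0.2, 0.8, 0.5, 0.1])  # four normalized credit features
psi = direct_encode(x)
print(psi.shape)                     # (16,): 2^4 amplitudes for 4 qubits
```

Because this is a pure product state, it captures no feature interactions; the interaction encoding adds entangling ZZ terms for exactly that reason.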

💻 Practice

Project 11: Quantum Feature Maps for Credit Scoring

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, roc_auc_score
# Note: this project targets the legacy Qiskit 0.x and qiskit-machine-learning 0.x APIs
# (Aer, execute, and QuantumKernel were removed or deprecated in later releases)
from qiskit import QuantumCircuit, Aer, execute
from qiskit.circuit.library import ZZFeatureMap, PauliFeatureMap
from qiskit_machine_learning.kernels import QuantumKernel

class ClassicalFeatureEngineering:
    """Classical feature engineering methods"""
    
    def __init__(self):
        self.scaler = StandardScaler()
        
    def create_basic_features(self, data):
        """
        Create basic credit features
        """
        features = data.copy()
        
        # Basic ratios
        features['debt_income_ratio'] = features['debt'] / (features['income'] + 1)
        features['credit_utilization'] = features['credit_used'] / (features['credit_limit'] + 1)
        features['payment_ratio'] = features['payments_made'] / (features['payments_due'] + 1)
        
        # Interaction features
        features['income_credit_ratio'] = features['income'] / (features['credit_limit'] + 1)
        features['debt_payment_ratio'] = features['debt'] / (features['payments_made'] + 1)
        
        # Polynomial features
        features['income_squared'] = features['income'] ** 2
        features['debt_squared'] = features['debt'] ** 2
        
        return features
    
    def create_advanced_features(self, data):
        """
        Create advanced credit features
        """
        features = self.create_basic_features(data)
        
        # Statistical features
        features['income_percentile'] = features['income'].rank(pct=True)
        features['debt_percentile'] = features['debt'].rank(pct=True)
        
        # Binning features
        features['income_bin'] = pd.cut(features['income'], bins=5, labels=False)
        features['debt_bin'] = pd.cut(features['debt'], bins=5, labels=False)
        
        # Cross features
        features['income_debt_cross'] = features['income_bin'] * features['debt_bin']
        
        return features
    
    def normalize_features(self, features):
        """
        Normalize features
        """
        # Remove non-numeric columns
        numeric_features = features.select_dtypes(include=[np.number])
        
        # Normalize
        normalized_features = self.scaler.fit_transform(numeric_features)
        
        return pd.DataFrame(normalized_features, columns=numeric_features.columns)

class QuantumFeatureMaps:
    """Quantum feature maps implementation"""
    
    def __init__(self, num_qubits=4):
        self.num_qubits = num_qubits
        self.backend = Aer.get_backend('statevector_simulator')
        
    def create_zz_feature_map(self, data, reps=2):
        """
        Create ZZFeatureMap for credit data
        """
        # Normalize data to [0, 1]
        normalized_data = self._normalize_data(data)
        
        # Create ZZFeatureMap
        feature_map = ZZFeatureMap(
            feature_dimension=len(normalized_data.columns),
            reps=reps
        )
        
        return feature_map, normalized_data
    
    def create_pauli_feature_map(self, data, paulis=['Z', 'X']):
        """
        Create PauliFeatureMap for credit data
        """
        # Normalize data
        normalized_data = self._normalize_data(data)
        
        # Create PauliFeatureMap
        feature_map = PauliFeatureMap(
            feature_dimension=len(normalized_data.columns),
            paulis=paulis
        )
        
        return feature_map, normalized_data
    
    def create_custom_credit_feature_map(self, data):
        """
        Create custom feature map optimized for credit data
        """
        # Normalize data
        normalized_data = self._normalize_data(data)
        
        # Create custom circuit
        circuit = QuantumCircuit(self.num_qubits)
        
        # Encode each feature column on its own qubit
        # (note: this encodes per-column aggregate statistics, not a single sample)
        for i, (col, values) in enumerate(normalized_data.items()):
            if i < self.num_qubits:
                # Apply rotation based on the column's mean value
                angle = values.mean() * np.pi
                circuit.rx(angle, i)
                
                # Add phase rotation based on the column's spread
                phase = values.std() * np.pi
                circuit.rz(phase, i)
        
        # Add entanglement between related features
        # Income and debt
        if 'income' in normalized_data.columns and 'debt' in normalized_data.columns:
            circuit.cx(0, 1)
        
        # Credit utilization and payment history
        if 'credit_utilization' in normalized_data.columns and 'payment_ratio' in normalized_data.columns:
            circuit.cx(2, 3)
        
        return circuit, normalized_data
    
    def _normalize_data(self, data):
        """
        Normalize data to [0, 1] range
        """
        normalized = data.copy()
        
        for col in normalized.columns:
            if col != 'default':
                min_val = normalized[col].min()
                max_val = normalized[col].max()
                if max_val > min_val:
                    normalized[col] = (normalized[col] - min_val) / (max_val - min_val)
        
        return normalized
    
    def encode_data_quantum(self, feature_map, data):
        """
        Evaluate the quantum kernel (Gram) matrix for the data
        """
        # Create quantum kernel
        quantum_kernel = QuantumKernel(
            feature_map=feature_map,
            quantum_instance=self.backend
        )
        
        # Pairwise state overlaps |⟨φ(xᵢ)|φ(xⱼ)⟩|², not per-sample feature vectors
        kernel_matrix = quantum_kernel.evaluate(x_vec=data.values)
        
        return kernel_matrix
    
    def extract_quantum_features(self, feature_map, data, n_samples=100):
        """
        Extract quantum features from feature map
        """
        quantum_features = []
        
        for i in range(min(n_samples, len(data))):
            # Bind this sample's values if the circuit is parameterized;
            # the custom map has its data baked in, so use it as-is
            if feature_map.num_parameters > 0:
                circuit = feature_map.bind_parameters(data.iloc[i].values)
            else:
                circuit = feature_map
            
            # Get statevector
            job = execute(circuit, self.backend)
            result = job.result()
            statevector = np.asarray(result.get_statevector())
            
            # Extract features from statevector
            features = np.abs(statevector) ** 2  # Measurement probabilities
            quantum_features.append(features)
        
        return np.array(quantum_features)

def generate_credit_data(n_samples=1000):
    """
    Generate synthetic credit data
    """
    np.random.seed(42)
    
    # Generate features
    income = np.random.normal(50000, 20000, n_samples)
    debt = np.random.uniform(10000, 100000, n_samples)
    credit_used = np.random.uniform(1000, 50000, n_samples)
    credit_limit = np.random.uniform(5000, 100000, n_samples)
    payments_made = np.random.uniform(0, 12, n_samples)
    payments_due = np.random.uniform(1, 12, n_samples)
    age = np.random.uniform(25, 65, n_samples)
    employment_years = np.random.uniform(0, 30, n_samples)
    
    # Create DataFrame
    data = pd.DataFrame({
        'income': income,
        'debt': debt,
        'credit_used': credit_used,
        'credit_limit': credit_limit,
        'payments_made': payments_made,
        'payments_due': payments_due,
        'age': age,
        'employment_years': employment_years
    })
    
    # Create target variable
    debt_income_ratio = data['debt'] / (data['income'] + 1)
    credit_utilization = data['credit_used'] / (data['credit_limit'] + 1)
    payment_ratio = data['payments_made'] / (data['payments_due'] + 1)
    
    default_prob = (0.3 * debt_income_ratio + 
                   0.4 * credit_utilization + 
                   0.3 * (1 - payment_ratio))
    
    default_prob += np.random.normal(0, 0.1, n_samples)
    default_prob = np.clip(default_prob, 0, 1)
    
    data['default'] = (default_prob > 0.5).astype(int)
    
    return data

def compare_feature_engineering():
    """
    Compare classical and quantum feature engineering
    """
    print("=== Classical vs Quantum Feature Engineering ===\n")
    
    # Generate data
    data = generate_credit_data(500)
    
    # Classical feature engineering
    print("1. Classical Feature Engineering:")
    cfe = ClassicalFeatureEngineering()
    
    # Basic features
    basic_features = cfe.create_basic_features(data)
    print(f"   Basic Features Shape: {basic_features.shape}")
    
    # Advanced features
    advanced_features = cfe.create_advanced_features(data)
    print(f"   Advanced Features Shape: {advanced_features.shape}")
    
    # Normalize features
    normalized_features = cfe.normalize_features(advanced_features)
    print(f"   Normalized Features Shape: {normalized_features.shape}")
    
    # Quantum feature maps
    print("\n2. Quantum Feature Maps:")
    qfm = QuantumFeatureMaps(num_qubits=4)
    
    # Select subset of features for quantum encoding
    quantum_data = data[['income', 'debt', 'credit_used', 'credit_limit']].copy()
    
    # ZZFeatureMap
    zz_map, zz_data = qfm.create_zz_feature_map(quantum_data)
    print(f"   ZZFeatureMap Circuit Depth: {zz_map.depth()}")
    
    # PauliFeatureMap
    pauli_map, pauli_data = qfm.create_pauli_feature_map(quantum_data)
    print(f"   PauliFeatureMap Circuit Depth: {pauli_map.depth()}")
    
    # Custom feature map
    custom_map, custom_data = qfm.create_custom_credit_feature_map(quantum_data)
    print(f"   Custom Feature Map Circuit Depth: {custom_map.depth()}")
    
    # Extract quantum features
    quantum_features_zz = qfm.extract_quantum_features(zz_map, zz_data, n_samples=100)
    quantum_features_pauli = qfm.extract_quantum_features(pauli_map, pauli_data, n_samples=100)
    quantum_features_custom = qfm.extract_quantum_features(custom_map, custom_data, n_samples=100)
    
    print(f"   ZZFeatureMap Features Shape: {quantum_features_zz.shape}")
    print(f"   PauliFeatureMap Features Shape: {quantum_features_pauli.shape}")
    print(f"   Custom Feature Map Features Shape: {quantum_features_custom.shape}")
    
    # Compare feature spaces
    print(f"\n3. Feature Space Comparison:")
    print(f"   Classical Features: {normalized_features.shape[1]} dimensions")
    print(f"   Quantum ZZ Features: {quantum_features_zz.shape[1]} dimensions")
    print(f"   Quantum Pauli Features: {quantum_features_pauli.shape[1]} dimensions")
    print(f"   Quantum Custom Features: {quantum_features_custom.shape[1]} dimensions")
    
    return (normalized_features, quantum_features_zz, 
            quantum_features_pauli, quantum_features_custom)

def quantum_feature_analysis():
    """
    Analyze quantum feature properties
    """
    print("=== Quantum Feature Analysis ===\n")
    
    # Generate data
    data = generate_credit_data(200)
    quantum_data = data[['income', 'debt', 'credit_used', 'credit_limit']].copy()
    
    # Create quantum feature maps
    qfm = QuantumFeatureMaps(num_qubits=4)
    
    # Test different feature maps
    feature_maps = {
        'ZZFeatureMap': qfm.create_zz_feature_map(quantum_data),
        'PauliFeatureMap': qfm.create_pauli_feature_map(quantum_data),
        'CustomFeatureMap': qfm.create_custom_credit_feature_map(quantum_data)
    }
    
    # Analyze each feature map
    for name, (feature_map, normalized_data) in feature_maps.items():
        print(f"{name} Analysis:")
        
        # Extract features
        quantum_features = qfm.extract_quantum_features(feature_map, normalized_data, n_samples=50)
        
        # Calculate feature statistics
        feature_mean = np.mean(quantum_features, axis=0)
        feature_std = np.std(quantum_features, axis=0)
        feature_corr = np.corrcoef(quantum_features.T)
        
        print(f"   Feature Mean Range: [{feature_mean.min():.4f}, {feature_mean.max():.4f}]")
        print(f"   Feature Std Range: [{feature_std.min():.4f}, {feature_std.max():.4f}]")
        print(f"   Average Correlation: {np.mean(np.abs(feature_corr - np.eye(feature_corr.shape[0]))):.4f}")
        
        # Analyze entanglement
        entanglement_score = calculate_entanglement_score(quantum_features)
        print(f"   Entanglement Score: {entanglement_score:.4f}")
        
        print()
    
    return feature_maps

def calculate_entanglement_score(quantum_features):
    """
    Calculate entanglement score for quantum features
    """
    # Simplified entanglement measure based on feature correlations
    corr_matrix = np.corrcoef(quantum_features.T)
    
    # Remove diagonal elements
    corr_off_diag = corr_matrix - np.eye(corr_matrix.shape[0])
    
    # Calculate entanglement as average absolute correlation
    entanglement = np.mean(np.abs(corr_off_diag))
    
    return entanglement

def quantum_feature_selection():
    """
    Implement quantum feature selection
    """
    print("=== Quantum Feature Selection ===\n")
    
    # Generate data
    data = generate_credit_data(300)
    quantum_data = data[['income', 'debt', 'credit_used', 'credit_limit']].copy()
    
    # Create quantum feature map
    qfm = QuantumFeatureMaps(num_qubits=4)
    feature_map, normalized_data = qfm.create_zz_feature_map(quantum_data)
    
    # Extract quantum features
    quantum_features = qfm.extract_quantum_features(feature_map, normalized_data, n_samples=100)
    
    # Feature importance based on variance
    feature_variance = np.var(quantum_features, axis=0)
    feature_importance = feature_variance / np.sum(feature_variance)
    
    print("Feature Importance (based on variance):")
    for i, importance in enumerate(feature_importance):
        print(f"   Feature {i}: {importance:.4f}")
    
    # Select top features
    top_features_idx = np.argsort(feature_importance)[-2:]  # Top 2 features
    selected_features = quantum_features[:, top_features_idx]
    
    print(f"\nSelected Features Shape: {selected_features.shape}")
    
    # Compare with classical feature selection
    from sklearn.feature_selection import SelectKBest, f_classif
    
    X = normalized_data.drop('default', axis=1, errors='ignore')
    y = data['default']
    
    # Classical feature selection
    selector = SelectKBest(score_func=f_classif, k=2)
    X_selected = selector.fit_transform(X, y)
    
    print(f"Classical Selected Features Shape: {X_selected.shape}")
    
    return selected_features, X_selected

# Exercise 1: Quantum Feature Map Optimization
def quantum_feature_map_optimization():
    """
    Exercise: Optimize quantum feature map parameters
    """
    from scipy.optimize import minimize
    
    def objective_function(params):
        """
        Objective function for feature map optimization
        """
        reps, entanglement_factor = params
        
        # Generate data
        data = generate_credit_data(100)
        quantum_data = data[['income', 'debt', 'credit_used', 'credit_limit']].copy()
        
        # Create feature map with parameters
        qfm = QuantumFeatureMaps(num_qubits=4)
        
        # Create custom feature map with parameters
        circuit = QuantumCircuit(4)
        
        # Apply rotations with optimized parameters (iterate over feature columns)
        for i, (col, values) in enumerate(quantum_data.items()):
            if i < 4:
                angle = values.mean() * reps * np.pi
                circuit.rx(angle, i)
        
        # Add entanglement based on parameter
        for i in range(3):
            circuit.cx(i, i + 1)
            circuit.rz(entanglement_factor * np.pi, i)
        
        # Calculate feature quality (simplified)
        # In practice, this would be based on downstream task performance
        feature_quality = 1 / (1 + abs(reps - 2) + abs(entanglement_factor - 0.5))
        
        return -feature_quality  # Minimize negative quality
    
    # Optimize parameters
    initial_params = [2, 0.5]  # Initial reps, entanglement_factor
    bounds = [(1, 5), (0, 1)]  # Parameter bounds
    
    result = minimize(objective_function, initial_params, bounds=bounds)
    
    print("=== Quantum Feature Map Optimization ===")
    print(f"Optimal Reps: {result.x[0]:.2f}")
    print(f"Optimal Entanglement Factor: {result.x[1]:.2f}")
    print(f"Optimization Success: {result.success}")
    
    return result

# Run demos
if __name__ == "__main__":
    print("Running Feature Engineering Comparisons...")
    classical_features, zz_features, pauli_features, custom_features = compare_feature_engineering()
    
    print("\nRunning Quantum Feature Analysis...")
    feature_maps = quantum_feature_analysis()
    
    print("\nRunning Quantum Feature Selection...")
    selected_features, classical_selected = quantum_feature_selection()
    
    print("\nRunning Feature Map Optimization...")
    opt_result = quantum_feature_map_optimization()

Exercise 2: Quantum Feature Map Visualization

def visualize_quantum_features():
    """
    Visualize quantum feature maps
    """
    # Generate data
    data = generate_credit_data(100)
    quantum_data = data[['income', 'debt', 'credit_used', 'credit_limit']].copy()
    
    # Create quantum feature maps
    qfm = QuantumFeatureMaps(num_qubits=4)
    
    # Create different feature maps
    zz_map, zz_data = qfm.create_zz_feature_map(quantum_data)
    pauli_map, pauli_data = qfm.create_pauli_feature_map(quantum_data)
    custom_map, custom_data = qfm.create_custom_credit_feature_map(quantum_data)
    
    # Extract features
    zz_features = qfm.extract_quantum_features(zz_map, zz_data, n_samples=50)
    pauli_features = qfm.extract_quantum_features(pauli_map, pauli_data, n_samples=50)
    custom_features = qfm.extract_quantum_features(custom_map, custom_data, n_samples=50)
    
    # Visualize feature distributions
    plt.figure(figsize=(15, 10))
    
    # ZZFeatureMap features
    plt.subplot(3, 3, 1)
    plt.hist(zz_features[:, 0], bins=20, alpha=0.7, label='Feature 0')
    plt.hist(zz_features[:, 1], bins=20, alpha=0.7, label='Feature 1')
    plt.title('ZZFeatureMap - Features 0,1')
    plt.legend()
    
    plt.subplot(3, 3, 2)
    plt.hist(zz_features[:, 2], bins=20, alpha=0.7, label='Feature 2')
    plt.hist(zz_features[:, 3], bins=20, alpha=0.7, label='Feature 3')
    plt.title('ZZFeatureMap - Features 2,3')
    plt.legend()
    
    plt.subplot(3, 3, 3)
    plt.scatter(zz_features[:, 0], zz_features[:, 1], alpha=0.6)
    plt.xlabel('Feature 0')
    plt.ylabel('Feature 1')
    plt.title('ZZFeatureMap - Feature Correlation')
    
    # PauliFeatureMap features
    plt.subplot(3, 3, 4)
    plt.hist(pauli_features[:, 0], bins=20, alpha=0.7, label='Feature 0')
    plt.hist(pauli_features[:, 1], bins=20, alpha=0.7, label='Feature 1')
    plt.title('PauliFeatureMap - Features 0,1')
    plt.legend()
    
    plt.subplot(3, 3, 5)
    plt.hist(pauli_features[:, 2], bins=20, alpha=0.7, label='Feature 2')
    plt.hist(pauli_features[:, 3], bins=20, alpha=0.7, label='Feature 3')
    plt.title('PauliFeatureMap - Features 2,3')
    plt.legend()
    
    plt.subplot(3, 3, 6)
    plt.scatter(pauli_features[:, 0], pauli_features[:, 1], alpha=0.6)
    plt.xlabel('Feature 0')
    plt.ylabel('Feature 1')
    plt.title('PauliFeatureMap - Feature Correlation')
    
    # Custom feature map
    plt.subplot(3, 3, 7)
    plt.hist(custom_features[:, 0], bins=20, alpha=0.7, label='Feature 0')
    plt.hist(custom_features[:, 1], bins=20, alpha=0.7, label='Feature 1')
    plt.title('Custom Feature Map - Features 0,1')
    plt.legend()
    
    plt.subplot(3, 3, 8)
    plt.hist(custom_features[:, 2], bins=20, alpha=0.7, label='Feature 2')
    plt.hist(custom_features[:, 3], bins=20, alpha=0.7, label='Feature 3')
    plt.title('Custom Feature Map - Features 2,3')
    plt.legend()
    
    plt.subplot(3, 3, 9)
    plt.scatter(custom_features[:, 0], custom_features[:, 1], alpha=0.6)
    plt.xlabel('Feature 0')
    plt.ylabel('Feature 1')
    plt.title('Custom Feature Map - Feature Correlation')
    
    plt.tight_layout()
    plt.show()
    
    return zz_features, pauli_features, custom_features

# Run visualization
if __name__ == "__main__":
    zz_features, pauli_features, custom_features = visualize_quantum_features()

📊 Results and Analysis

Quantum Feature Maps Advantages:

1. High-dimensional Encoding:

  • Exponential Feature Space: 2^n dimensions for n qubits
  • Non-linear Transformations: Quantum kernel methods
  • Feature Interactions: Entanglement captures complex relationships
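
A quick back-of-the-envelope comparison of dimension growth (degree-2 polynomial expansion versus statevector size, assuming one qubit per feature):

```python
from math import comb

n_features = 8
# bias + linear + pairwise cross terms + squared terms
poly_dim = 1 + n_features + comb(n_features, 2) + n_features
quantum_dim = 2 ** n_features   # statevector amplitudes for n qubits
print(poly_dim, quantum_dim)    # 45 vs 256
```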

2. Credit-specific Benefits:

  • Risk Factor Encoding: Quantum encoding of risk factors
  • Correlation Modeling: Entanglement models credit correlations
  • Non-linear Patterns: Captures complex credit relationships

3. Potential Performance Improvements:

  • Better Separability: quantum features can improve class separation on some datasets
  • Kernel-based Regularization: quantum kernels pair with margin-based classifiers (e.g., SVMs), which helps control overfitting despite the large feature space
  • Feature Selection: variance-based quantum feature importance

Comparison with Classical Feature Engineering:

Classical Limitations:

  • Manual feature creation
  • Limited non-linear transformations
  • Curse of dimensionality
  • Feature selection challenges

Quantum Advantages:

  • Automatic feature generation
  • Rich non-linear transformations
  • High-dimensional feature space
  • Quantum feature selection

🎯 Homework

Exercise 1: Implement quantum feature map calibration for credit data

Exercise 2: Build quantum feature maps for network-based credit models

Exercise 3: Develop optimization algorithms for quantum feature map parameters

Exercise 4: Create a validation framework for quantum feature maps


“Quantum feature maps provide exponential feature spaces that can capture complex, non-linear relationships in credit data.” - Quantum Finance Research

Next day: Quantum Support Vector Machines