Day 14: Quantum Clustering for Customer Segmentation

🎯 Learning Objectives

  • Understand quantum clustering and classical clustering in depth
  • Master how quantum clustering can improve customer segmentation
  • Implement quantum clustering algorithms for credit risk
  • Compare the performance of quantum and classical clustering

📚 Theory

Clustering Fundamentals

1. Classical Clustering

K-means Algorithm: partition the data into k clusters C₁, …, Cₖ by minimizing the within-cluster sum of squares

min Σᵢ Σₓ∈Cᵢ ||x - μᵢ||²

where μᵢ is the centroid (mean) of cluster Cᵢ.
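
A minimal NumPy sketch of one Lloyd iteration (assign each point to its nearest centroid, then recompute centroids), which monotonically decreases this objective:

import numpy as np

def kmeans_step(X, centers):
    """One Lloyd iteration: assignment step, then update step."""
    # Assignment: label each point with its nearest center
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)  # (n, k)
    labels = d.argmin(axis=1)
    # Update: each center becomes the mean of its assigned points
    new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(len(centers))])
    return labels, new_centers

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
centers = X[rng.choice(100, 3, replace=False)]
labels, centers = kmeans_step(X, centers)  # repeat until centers stop moving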

Hierarchical Clustering (single linkage): repeatedly merge the two closest clusters, with inter-cluster distance

d(Cᵢ, Cⱼ) = min{d(x, y) : x ∈ Cᵢ, y ∈ Cⱼ}
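
A short sketch using SciPy's agglomerative linkage ('single' matches the formula above; the cut into 3 clusters is illustrative):

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
Z = linkage(X, method='single')                  # full merge history (dendrogram)
labels = fcluster(Z, t=3, criterion='maxclust')  # cut the tree into 3 clusters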

DBSCAN:

Core point: |N_ε(p)| ≥ MinPts
Border point: p ∈ N_ε(q) for some core point q
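
A compact NumPy check of these two definitions (the ε and MinPts values are illustrative):

import numpy as np

def core_and_border(X, eps=0.5, min_pts=5):
    # Pairwise Euclidean distances; N_eps(p) includes p itself
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    core = (d <= eps).sum(axis=1) >= min_pts
    # Border points: not core themselves, but within eps of some core point
    border = ~core & ((d <= eps) & core[None, :]).any(axis=1)
    return core, border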

2. Quantum Clustering

Quantum K-means: the k cluster centers can be held in superposition, each amplitude-encoded and entangled with an index register:

|ψ⟩ = (1/√k) Σᵢ |i⟩|μᵢ⟩
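
A purely illustrative NumPy construction of this state, amplitude-encoding each normalized center (padding dimensions to powers of two is glossed over here):

import numpy as np

centers = np.array([[0.2, 0.9], [0.7, 0.1]])   # k = 2 centers in 2 dimensions
k, dim = centers.shape
mus = centers / np.linalg.norm(centers, axis=1, keepdims=True)  # |mu_i>

psi = np.zeros(k * dim, dtype=complex)
for i in range(k):
    basis_i = np.zeros(k)
    basis_i[i] = 1.0                            # |i>
    psi += np.kron(basis_i, mus[i]) / np.sqrt(k)

print(np.linalg.norm(psi))                      # 1.0: a valid quantum state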

Quantum Distance: the quantum kernel |⟨φ(x)|φ(y)⟩|² is a similarity (it equals 1 when x = y), so a distance takes its complement:

d_quantum(x, y) = 1 - |⟨φ(x)|φ(y)⟩|²
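
A minimal sketch computing this fidelity-based distance exactly with statevectors (assumes Qiskit; bind_parameters was renamed assign_parameters in Qiskit 1.0):

import numpy as np
from qiskit.circuit.library import ZZFeatureMap
from qiskit.quantum_info import Statevector, state_fidelity

def fidelity_distance(x, y, reps=2):
    fm = ZZFeatureMap(feature_dimension=len(x), reps=reps)
    sv_x = Statevector(fm.bind_parameters(list(x)))   # |phi(x)>
    sv_y = Statevector(fm.bind_parameters(list(y)))   # |phi(y)>
    return 1.0 - state_fidelity(sv_x, sv_y)           # 1 - |<phi(x)|phi(y)>|^2

print(fidelity_distance([0.1, 0.5], [0.1, 0.5]))      # ~0.0 for identical inputs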

Quantum Clustering Circuit: encode the data point with the feature-map unitary, then measure; the two stages compose sequentially (they are not a tensor product):

circuit(x) = measure ∘ U_encoding(x), applied to |0…0⟩
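
As a Qiskit sketch (the explicit measure_all matters: a sampling backend returns no counts for a circuit without measurements):

from qiskit.circuit.library import ZZFeatureMap

encoding = ZZFeatureMap(feature_dimension=2, reps=2)
circuit = encoding.bind_parameters([0.1, 0.5])  # U_encoding(x) on |00>
circuit.measure_all()                           # measurement completes the circuit
print(circuit.draw())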

Quantum Clustering Types

1. Quantum K-means:

  • Quantum Encoding: Superposition of cluster centers
  • Quantum Distance: Quantum kernel-based distance
  • Quantum Update: Quantum amplitude estimation

2. Quantum Hierarchical Clustering:

  • Quantum Similarity: Quantum state similarity
  • Quantum Merging: Quantum superposition of clusters
  • Quantum Dendrogram: Quantum tree structure

3. Quantum DBSCAN:

  • Quantum Neighborhood: Quantum ε-neighborhood
  • Quantum Core Points: Quantum density estimation
  • Quantum Clusters: Quantum connected components

Quantum Clustering Advantages

1. Quantum Properties:

  • Superposition: parallel processing of multiple cluster hypotheses
  • Entanglement: can represent complex cluster relationships
  • Quantum Parallelism: potential speedups (an exponential advantage for clustering is not established)

2. Credit-specific Benefits:

  • Non-linear Patterns: quantum feature maps can capture complex relationships
  • High-dimensional Data: can handle many credit features
  • Quantum Advantage: potential speedup for large datasets

💻 Practice

Project 14: Quantum Clustering for Customer Segmentation

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score, calinski_harabasz_score
from sklearn.decomposition import PCA
# Qiskit < 1.0 API: Aer and execute were removed in Qiskit 1.0
# (for newer versions use qiskit_aer.AerSimulator and backend.run)
from qiskit import Aer, execute
from qiskit.circuit.library import ZZFeatureMap

class ClassicalClustering:
    """Classical clustering methods"""
    
    def __init__(self):
        self.scaler = StandardScaler()
        
    def prepare_features(self, data):
        """
        Prepare features for clustering
        """
        # Feature engineering
        features = data.copy()
        
        # Create credit-specific features
        features['debt_income_ratio'] = features['debt'] / (features['income'] + 1)
        features['credit_utilization'] = features['credit_used'] / (features['credit_limit'] + 1)
        features['payment_ratio'] = features['payments_made'] / (features['payments_due'] + 1)
        features['income_credit_ratio'] = features['income'] / (features['credit_limit'] + 1)
        features['age_income_ratio'] = features['age'] / (features['income'] + 1)
        
        # Normalize features (drop label-like columns so clustering
        # never sees the targets)
        numeric_features = features.select_dtypes(include=[np.number])
        for label_col in ('default', 'true_cluster'):
            if label_col in numeric_features.columns:
                numeric_features = numeric_features.drop(label_col, axis=1)
        
        normalized_features = self.scaler.fit_transform(numeric_features)
        
        return pd.DataFrame(normalized_features, columns=numeric_features.columns)
    
    def kmeans_clustering(self, features, n_clusters=3):
        """
        K-means clustering
        """
        kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
        clusters = kmeans.fit_predict(features)
        
        return clusters, kmeans.cluster_centers_
    
    def hierarchical_clustering(self, features, n_clusters=3):
        """
        Hierarchical clustering
        """
        hierarchical = AgglomerativeClustering(n_clusters=n_clusters)
        clusters = hierarchical.fit_predict(features)
        
        return clusters
    
    def dbscan_clustering(self, features, eps=0.5, min_samples=5):
        """
        DBSCAN clustering
        """
        dbscan = DBSCAN(eps=eps, min_samples=min_samples)
        clusters = dbscan.fit_predict(features)
        
        return clusters
    
    def evaluate_clustering(self, features, clusters):
        """
        Evaluate clustering quality. Noise points labelled -1 (e.g. from
        DBSCAN) are excluded, matching the quantum evaluation below.
        """
        mask = clusters != -1
        valid_X = np.asarray(features)[mask]
        valid_clusters = clusters[mask]
        n_clusters = len(np.unique(valid_clusters))
        
        # Both scores are undefined for fewer than 2 clusters
        if n_clusters < 2:
            return {
                'silhouette_score': 0.0,
                'calinski_harabasz_score': 0.0,
                'n_clusters': n_clusters
            }
        
        return {
            'silhouette_score': silhouette_score(valid_X, valid_clusters),
            'calinski_harabasz_score': calinski_harabasz_score(valid_X, valid_clusters),
            'n_clusters': n_clusters
        }
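
# Quick usage sketch for ClassicalClustering (a sketch only; it references
# generate_credit_data, which is defined further below in this file):
#
#   data = generate_credit_data(300)
#   cc = ClassicalClustering()
#   feats = cc.prepare_features(data)
#   labels, _ = cc.kmeans_clustering(feats, n_clusters=3)
#   print(cc.evaluate_clustering(feats, labels))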

class QuantumClustering:
    """Quantum clustering implementation"""
    
    def __init__(self, num_qubits=4, shots=1000):
        self.num_qubits = num_qubits  # should match the data's feature count
        self.shots = shots
        self.backend = Aer.get_backend('qasm_simulator')
        self.feature_map = None
        self.cluster_centers = None
        
    def create_feature_map(self, X):
        """
        Create quantum feature map. ZZFeatureMap uses one qubit per
        feature, so the qubit count follows X.shape[1].
        """
        self.feature_map = ZZFeatureMap(
            feature_dimension=X.shape[1],
            reps=2
        )
        return self.feature_map
    
    def quantum_distance(self, x1, x2):
        """
        Estimate a quantum distance between two points from the sampled
        output distributions of their encoded circuits.
        """
        if self.feature_map is None:
            raise RuntimeError("Call create_feature_map(X) before computing distances")
        
        # Encode each point and add measurements (qasm_simulator returns
        # no counts for circuits without measurements)
        circuit1 = self.feature_map.bind_parameters(x1)
        circuit1.measure_all()
        circuit2 = self.feature_map.bind_parameters(x2)
        circuit2.measure_all()
        
        # Execute circuits
        job1 = execute(circuit1, self.backend, shots=self.shots)
        job2 = execute(circuit2, self.backend, shots=self.shots)
        
        counts1 = job1.result().get_counts()
        counts2 = job2.result().get_counts()
        
        # Calculate quantum distance
        return self._calculate_quantum_distance(counts1, counts2)
    
    def _calculate_quantum_distance(self, counts1, counts2):
        """
        Euclidean (L2) distance between the two measured output
        distributions. Note: this is a sampling-based proxy, not the
        fidelity distance 1 - |<phi(x)|phi(y)>|^2 from the theory section;
        distinct states can share the same output distribution.
        """
        # Get all possible bitstrings
        all_bitstrings = set(counts1.keys()) | set(counts2.keys())
        
        distance = 0.0
        for bitstring in all_bitstrings:
            prob1 = counts1.get(bitstring, 0) / self.shots
            prob2 = counts2.get(bitstring, 0) / self.shots
            distance += (prob1 - prob2) ** 2
        
        return np.sqrt(distance)
    
    def quantum_kmeans(self, X, n_clusters=3, max_iter=10):
        """
        Quantum K-means clustering
        """
        # Build the feature map for this data (quantum_distance requires it)
        self.create_feature_map(X)
        
        # Initialize cluster centers
        n_samples, n_features = X.shape
        centers_idx = np.random.choice(n_samples, n_clusters, replace=False)
        self.cluster_centers = X[centers_idx].copy()
        
        clusters = np.zeros(n_samples, dtype=int)
        
        for iteration in range(max_iter):
            print(f"Quantum K-means iteration {iteration + 1}/{max_iter}")
            
            # Assign points to clusters
            for i in range(n_samples):
                distances = []
                for j in range(n_clusters):
                    distance = self.quantum_distance(X[i], self.cluster_centers[j])
                    distances.append(distance)
                
                clusters[i] = np.argmin(distances)
            
            # Update cluster centers
            new_centers = self.cluster_centers.copy()
            for j in range(n_clusters):
                cluster_points = X[clusters == j]
                if len(cluster_points) > 0:
                    new_centers[j] = np.mean(cluster_points, axis=0)
            
            # Check convergence
            if np.allclose(self.cluster_centers, new_centers):
                break
            
            self.cluster_centers = new_centers
        
        return clusters, self.cluster_centers
    
    def quantum_hierarchical_clustering(self, X, n_clusters=3):
        """
        Quantum hierarchical clustering (single linkage)
        """
        self.create_feature_map(X)
        n_samples = X.shape[0]
        
        # Initialize: each point is its own cluster
        cluster_sets = [{i} for i in range(n_samples)]
        
        # Calculate pairwise distances
        distances = np.zeros((n_samples, n_samples))
        for i in range(n_samples):
            for j in range(i + 1, n_samples):
                distance = self.quantum_distance(X[i], X[j])
                distances[i, j] = distance
                distances[j, i] = distance
        
        # Merge clusters until we have n_clusters
        while len(cluster_sets) > n_clusters:
            # Find closest clusters
            min_distance = float('inf')
            merge_i, merge_j = -1, -1
            
            for i in range(len(cluster_sets)):
                for j in range(i + 1, len(cluster_sets)):
                    # Calculate distance between clusters
                    cluster_distance = self._cluster_distance(
                        cluster_sets[i], cluster_sets[j], distances
                    )
                    
                    if cluster_distance < min_distance:
                        min_distance = cluster_distance
                        merge_i, merge_j = i, j
            
            # Merge clusters
            cluster_sets[merge_i].update(cluster_sets[merge_j])
            cluster_sets.pop(merge_j)
        
        # Assign cluster labels
        final_clusters = np.zeros(n_samples, dtype=int)
        for cluster_id, cluster_set in enumerate(cluster_sets):
            for point_id in cluster_set:
                final_clusters[point_id] = cluster_id
        
        return final_clusters
    
    def _cluster_distance(self, cluster1, cluster2, distances):
        """
        Calculate distance between two clusters
        """
        min_distance = float('inf')
        
        for i in cluster1:
            for j in cluster2:
                distance = distances[i, j]
                if distance < min_distance:
                    min_distance = distance
        
        return min_distance
    
    def quantum_dbscan(self, X, eps=0.5, min_samples=5):
        """
        Quantum DBSCAN clustering
        """
        self.create_feature_map(X)
        n_samples = X.shape[0]
        
        # Calculate pairwise distances
        distances = np.zeros((n_samples, n_samples))
        for i in range(n_samples):
            for j in range(i + 1, n_samples):
                distance = self.quantum_distance(X[i], X[j])
                distances[i, j] = distance
                distances[j, i] = distance
        
        # Find core points
        core_points = []
        for i in range(n_samples):
            neighbors = np.sum(distances[i] <= eps)
            if neighbors >= min_samples:
                core_points.append(i)
        
        # Initialize clusters
        clusters = np.full(n_samples, -1)  # -1 for noise
        cluster_id = 0
        
        # Expand clusters from core points
        for core_point in core_points:
            if clusters[core_point] != -1:
                continue
            
            # Start new cluster
            clusters[core_point] = cluster_id
            
            # Expand cluster
            self._expand_cluster(core_point, cluster_id, clusters, distances, eps, min_samples)
            
            cluster_id += 1
        
        return clusters
    
    def _expand_cluster(self, point, cluster_id, clusters, distances, eps, min_samples):
        """
        Expand cluster from a core point. (Recursive for clarity; very
        large clusters may need an explicit stack to avoid Python's
        recursion limit.)
        """
        neighbors = np.where(distances[point] <= eps)[0]
        
        for neighbor in neighbors:
            if clusters[neighbor] == -1:
                clusters[neighbor] = cluster_id
                
                # Check if neighbor is a core point
                neighbor_neighbors = np.sum(distances[neighbor] <= eps)
                if neighbor_neighbors >= min_samples:
                    self._expand_cluster(neighbor, cluster_id, clusters, distances, eps, min_samples)
    
    def evaluate_clustering(self, X, clusters):
        """
        Evaluate quantum clustering quality
        """
        # Remove noise points for evaluation
        valid_clusters = clusters[clusters != -1]
        valid_X = X[clusters != -1]
        
        if len(valid_clusters) == 0:
            return {
                'silhouette_score': 0.0,
                'calinski_harabasz_score': 0.0,
                'n_clusters': 0
            }
        
        # Scores are undefined for a single cluster; fall back to 0.0
        try:
            silhouette = silhouette_score(valid_X, valid_clusters)
        except ValueError:
            silhouette = 0.0
        
        # Calinski-Harabasz score
        try:
            calinski = calinski_harabasz_score(valid_X, valid_clusters)
        except ValueError:
            calinski = 0.0
        
        # Number of clusters
        n_clusters = len(np.unique(valid_clusters))
        
        return {
            'silhouette_score': silhouette,
            'calinski_harabasz_score': calinski,
            'n_clusters': n_clusters
        }
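
# Usage sketch for QuantumClustering. Cost note: every distance call samples
# two circuits, so quantum k-means executes roughly
# n_samples * n_clusters * 2 circuits per iteration -- keep the sample
# count small on a simulator.
#
#   rng = np.random.default_rng(42)
#   X = rng.normal(size=(30, 4))            # 30 samples, 4 features -> 4 qubits
#   qc = QuantumClustering(num_qubits=4)
#   labels, centers = qc.quantum_kmeans(X, n_clusters=3, max_iter=5)
#   print(qc.evaluate_clustering(X, labels))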

def generate_credit_data(n_samples=1000):
    """
    Generate synthetic credit data with clusters
    """
    np.random.seed(42)
    
    # Generate three distinct customer segments
    n_per_cluster = n_samples // 3
    
    # High-income, low-risk customers
    high_income = np.random.normal(80000, 15000, n_per_cluster)
    high_income_debt = np.random.uniform(5000, 30000, n_per_cluster)
    high_income_credit = np.random.uniform(500, 10000, n_per_cluster)
    high_income_limit = np.random.uniform(50000, 150000, n_per_cluster)
    high_income_payments = np.random.uniform(10, 12, n_per_cluster)
    high_income_due = np.random.uniform(11, 12, n_per_cluster)
    high_income_age = np.random.uniform(35, 55, n_per_cluster)
    high_income_employment = np.random.uniform(5, 20, n_per_cluster)
    
    # Medium-income, medium-risk customers
    medium_income = np.random.normal(50000, 10000, n_per_cluster)
    medium_income_debt = np.random.uniform(20000, 60000, n_per_cluster)
    medium_income_credit = np.random.uniform(5000, 25000, n_per_cluster)
    medium_income_limit = np.random.uniform(20000, 80000, n_per_cluster)
    medium_income_payments = np.random.uniform(8, 11, n_per_cluster)
    medium_income_due = np.random.uniform(10, 12, n_per_cluster)
    medium_income_age = np.random.uniform(25, 45, n_per_cluster)
    medium_income_employment = np.random.uniform(2, 10, n_per_cluster)
    
    # Low-income, high-risk customers
    low_income = np.random.normal(30000, 8000, n_per_cluster)
    low_income_debt = np.random.uniform(40000, 80000, n_per_cluster)
    low_income_credit = np.random.uniform(15000, 40000, n_per_cluster)
    low_income_limit = np.random.uniform(10000, 50000, n_per_cluster)
    low_income_payments = np.random.uniform(5, 9, n_per_cluster)
    low_income_due = np.random.uniform(9, 12, n_per_cluster)
    low_income_age = np.random.uniform(20, 35, n_per_cluster)
    low_income_employment = np.random.uniform(0, 5, n_per_cluster)
    
    # Combine data
    data = pd.DataFrame({
        'income': np.concatenate([high_income, medium_income, low_income]),
        'debt': np.concatenate([high_income_debt, medium_income_debt, low_income_debt]),
        'credit_used': np.concatenate([high_income_credit, medium_income_credit, low_income_credit]),
        'credit_limit': np.concatenate([high_income_limit, medium_income_limit, low_income_limit]),
        'payments_made': np.concatenate([high_income_payments, medium_income_payments, low_income_payments]),
        'payments_due': np.concatenate([high_income_due, medium_income_due, low_income_due]),
        'age': np.concatenate([high_income_age, medium_income_age, low_income_age]),
        'employment_years': np.concatenate([high_income_employment, medium_income_employment, low_income_employment])
    })
    
    # Create target variable
    debt_income_ratio = data['debt'] / (data['income'] + 1)
    credit_utilization = data['credit_used'] / (data['credit_limit'] + 1)
    payment_ratio = data['payments_made'] / (data['payments_due'] + 1)
    
    default_prob = (0.3 * debt_income_ratio + 
                   0.4 * credit_utilization + 
                   0.3 * (1 - payment_ratio))
    
    default_prob += np.random.normal(0, 0.1, len(data))
    default_prob = np.clip(default_prob, 0, 1)
    
    data['default'] = (default_prob > 0.5).astype(int)
    
    # Add cluster labels
    data['true_cluster'] = np.concatenate([
        np.zeros(n_per_cluster, dtype=int),
        np.ones(n_per_cluster, dtype=int),
        2 * np.ones(n_per_cluster, dtype=int)
    ])
    
    return data

def compare_clustering_methods():
    """
    Compare classical and quantum clustering methods
    """
    print("=== Classical vs Quantum Clustering Comparison ===\n")
    
    # Generate data
    data = generate_credit_data(300)
    
    # Prepare features
    classical_clustering = ClassicalClustering()
    features = classical_clustering.prepare_features(data)
    
    # Classical clustering methods
    print("1. Classical Clustering Methods:")
    
    # K-means
    kmeans_clusters, kmeans_centers = classical_clustering.kmeans_clustering(features, n_clusters=3)
    kmeans_eval = classical_clustering.evaluate_clustering(features, kmeans_clusters)
    
    print(f"   K-means:")
    print(f"     Silhouette Score: {kmeans_eval['silhouette_score']:.4f}")
    print(f"     Calinski-Harabasz Score: {kmeans_eval['calinski_harabasz_score']:.4f}")
    print(f"     Number of Clusters: {kmeans_eval['n_clusters']}")
    
    # Hierarchical clustering
    hierarchical_clusters = classical_clustering.hierarchical_clustering(features, n_clusters=3)
    hierarchical_eval = classical_clustering.evaluate_clustering(features, hierarchical_clusters)
    
    print(f"   Hierarchical Clustering:")
    print(f"     Silhouette Score: {hierarchical_eval['silhouette_score']:.4f}")
    print(f"     Calinski-Harabasz Score: {hierarchical_eval['calinski_harabasz_score']:.4f}")
    print(f"     Number of Clusters: {hierarchical_eval['n_clusters']}")
    
    # DBSCAN
    dbscan_clusters = classical_clustering.dbscan_clustering(features, eps=0.5, min_samples=5)
    dbscan_eval = classical_clustering.evaluate_clustering(features, dbscan_clusters)
    
    print(f"   DBSCAN:")
    print(f"     Silhouette Score: {dbscan_eval['silhouette_score']:.4f}")
    print(f"     Calinski-Harabasz Score: {dbscan_eval['calinski_harabasz_score']:.4f}")
    print(f"     Number of Clusters: {dbscan_eval['n_clusters']}")
    
    # Quantum clustering methods
    print("\n2. Quantum Clustering Methods:")
    
    # Use a small subset of features for quantum clustering (simulator cost
    # grows quickly with qubit count, and the O(n²) distance loops are slow)
    quantum_features = features[['income', 'debt', 'credit_used', 'credit_limit']].copy()
    
    # Quantum K-means
    quantum_clustering = QuantumClustering(num_qubits=4)
    quantum_kmeans_clusters, quantum_centers = quantum_clustering.quantum_kmeans(
        quantum_features.values, n_clusters=3
    )
    quantum_kmeans_eval = quantum_clustering.evaluate_clustering(quantum_features.values, quantum_kmeans_clusters)
    
    print(f"   Quantum K-means:")
    print(f"     Silhouette Score: {quantum_kmeans_eval['silhouette_score']:.4f}")
    print(f"     Calinski-Harabasz Score: {quantum_kmeans_eval['calinski_harabasz_score']:.4f}")
    print(f"     Number of Clusters: {quantum_kmeans_eval['n_clusters']}")
    
    # Quantum hierarchical clustering
    quantum_hierarchical_clusters = quantum_clustering.quantum_hierarchical_clustering(
        quantum_features.values, n_clusters=3
    )
    quantum_hierarchical_eval = quantum_clustering.evaluate_clustering(
        quantum_features.values, quantum_hierarchical_clusters
    )
    
    print(f"   Quantum Hierarchical Clustering:")
    print(f"     Silhouette Score: {quantum_hierarchical_eval['silhouette_score']:.4f}")
    print(f"     Calinski-Harabasz Score: {quantum_hierarchical_eval['calinski_harabasz_score']:.4f}")
    print(f"     Number of Clusters: {quantum_hierarchical_eval['n_clusters']}")
    
    # Quantum DBSCAN
    quantum_dbscan_clusters = quantum_clustering.quantum_dbscan(
        quantum_features.values, eps=0.5, min_samples=5
    )
    quantum_dbscan_eval = quantum_clustering.evaluate_clustering(
        quantum_features.values, quantum_dbscan_clusters
    )
    
    print(f"   Quantum DBSCAN:")
    print(f"     Silhouette Score: {quantum_dbscan_eval['silhouette_score']:.4f}")
    print(f"     Calinski-Harabasz Score: {quantum_dbscan_eval['calinski_harabasz_score']:.4f}")
    print(f"     Number of Clusters: {quantum_dbscan_eval['n_clusters']}")
    
    # Compare results
    print(f"\n3. Comparison Summary:")
    methods = ['K-means', 'Hierarchical', 'DBSCAN', 'Quantum K-means', 'Quantum Hierarchical', 'Quantum DBSCAN']
    silhouette_scores = [
        kmeans_eval['silhouette_score'],
        hierarchical_eval['silhouette_score'],
        dbscan_eval['silhouette_score'],
        quantum_kmeans_eval['silhouette_score'],
        quantum_hierarchical_eval['silhouette_score'],
        quantum_dbscan_eval['silhouette_score']
    ]
    
    for method, score in zip(methods, silhouette_scores):
        print(f"   {method}: {score:.4f}")
    
    # Visualize results
    plt.figure(figsize=(20, 10))
    
    # PCA for visualization. Fit separate projections: the quantum feature
    # set has fewer columns than the full set, so one PCA cannot serve both.
    pca = PCA(n_components=2)
    features_2d = pca.fit_transform(features)
    pca_quantum = PCA(n_components=2)
    quantum_features_2d = pca_quantum.fit_transform(quantum_features)
    
    # Classical clustering results
    plt.subplot(2, 3, 1)
    plt.scatter(features_2d[:, 0], features_2d[:, 1], c=kmeans_clusters, cmap='viridis')
    plt.title('Classical K-means Clustering')
    plt.xlabel('PCA Component 1')
    plt.ylabel('PCA Component 2')
    
    plt.subplot(2, 3, 2)
    plt.scatter(features_2d[:, 0], features_2d[:, 1], c=hierarchical_clusters, cmap='viridis')
    plt.title('Classical Hierarchical Clustering')
    plt.xlabel('PCA Component 1')
    plt.ylabel('PCA Component 2')
    
    plt.subplot(2, 3, 3)
    plt.scatter(features_2d[:, 0], features_2d[:, 1], c=dbscan_clusters, cmap='viridis')
    plt.title('Classical DBSCAN Clustering')
    plt.xlabel('PCA Component 1')
    plt.ylabel('PCA Component 2')
    
    # Quantum clustering results
    plt.subplot(2, 3, 4)
    plt.scatter(quantum_features_2d[:, 0], quantum_features_2d[:, 1], c=quantum_kmeans_clusters, cmap='viridis')
    plt.title('Quantum K-means Clustering')
    plt.xlabel('PCA Component 1')
    plt.ylabel('PCA Component 2')
    
    plt.subplot(2, 3, 5)
    plt.scatter(quantum_features_2d[:, 0], quantum_features_2d[:, 1], c=quantum_hierarchical_clusters, cmap='viridis')
    plt.title('Quantum Hierarchical Clustering')
    plt.xlabel('PCA Component 1')
    plt.ylabel('PCA Component 2')
    
    plt.subplot(2, 3, 6)
    plt.scatter(quantum_features_2d[:, 0], quantum_features_2d[:, 1], c=quantum_dbscan_clusters, cmap='viridis')
    plt.title('Quantum DBSCAN Clustering')
    plt.xlabel('PCA Component 1')
    plt.ylabel('PCA Component 2')
    
    plt.tight_layout()
    plt.show()
    
    return (kmeans_clusters, hierarchical_clusters, dbscan_clusters,
            quantum_kmeans_clusters, quantum_hierarchical_clusters, quantum_dbscan_clusters)

def quantum_clustering_analysis():
    """
    Analyze quantum clustering properties
    """
    print("=== Quantum Clustering Analysis ===\n")
    
    # Generate data
    data = generate_credit_data(200)
    features = data[['income', 'debt', 'credit_used', 'credit_limit']].copy()
    
    # Normalize features
    scaler = StandardScaler()
    normalized_features = scaler.fit_transform(features)
    
    # Create quantum clustering
    qc = QuantumClustering(num_qubits=4)
    
    # Analyze quantum distance properties
    print("1. Quantum Distance Analysis:")
    
    # Calculate pairwise distances
    n_samples = min(50, len(normalized_features))  # Limit for computational efficiency
    distances = np.zeros((n_samples, n_samples))
    
    for i in range(n_samples):
        for j in range(i + 1, n_samples):
            distance = qc.quantum_distance(normalized_features[i], normalized_features[j])
            distances[i, j] = distance
            distances[j, i] = distance
    
    print(f"   Distance Matrix Shape: {distances.shape}")
    print(f"   Average Distance: {np.mean(distances):.4f}")
    print(f"   Distance Std: {np.std(distances):.4f}")
    print(f"   Min Distance: {np.min(distances):.4f}")
    print(f"   Max Distance: {np.max(distances):.4f}")
    
    # Analyze distance distribution
    plt.figure(figsize=(12, 5))
    
    plt.subplot(1, 2, 1)
    plt.hist(distances.flatten(), bins=30, alpha=0.7, edgecolor='black')
    plt.xlabel('Quantum Distance')
    plt.ylabel('Frequency')
    plt.title('Quantum Distance Distribution')
    plt.grid(True)
    
    plt.subplot(1, 2, 2)
    plt.imshow(distances, cmap='viridis')
    plt.colorbar()
    plt.title('Quantum Distance Matrix')
    plt.xlabel('Sample Index')
    plt.ylabel('Sample Index')
    
    plt.tight_layout()
    plt.show()
    
    # Analyze clustering stability
    print(f"\n2. Clustering Stability Analysis:")
    
    stability_scores = []
    subset = normalized_features[:n_samples]  # reuse the small subset; full runs are slow on a simulator
    for run in range(5):
        clusters, _ = qc.quantum_kmeans(subset, n_clusters=3)
        eval_result = qc.evaluate_clustering(subset, clusters)
        stability_scores.append(eval_result['silhouette_score'])
        print(f"   Run {run + 1}: Silhouette Score = {eval_result['silhouette_score']:.4f}")
    
    print(f"   Average Silhouette Score: {np.mean(stability_scores):.4f}")
    print(f"   Silhouette Score Std: {np.std(stability_scores):.4f}")
    
    return distances, stability_scores

# Exercise: Quantum Clustering Parameter Optimization
def quantum_clustering_parameter_optimization():
    """
    Exercise: Optimize quantum clustering parameters
    """
    print("=== Quantum Clustering Parameter Optimization ===\n")
    
    # Generate data
    data = generate_credit_data(200)
    features = data[['income', 'debt', 'credit_used', 'credit_limit']].copy()
    
    # Normalize features
    scaler = StandardScaler()
    normalized_features = scaler.fit_transform(features)
    
    # Test different parameters. Note: ZZFeatureMap fixes the qubit count at
    # the number of features (4 here), so num_qubits is effectively a no-op
    # and is varied only to illustrate the grid-search loop.
    n_clusters_values = [2, 3, 4, 5]
    num_qubits_values = [2, 4, 6]
    
    results = {}
    
    for n_clusters in n_clusters_values:
        for num_qubits in num_qubits_values:
            print(f"Testing n_clusters={n_clusters}, num_qubits={num_qubits}")
            
            try:
                # Create quantum clustering
                qc = QuantumClustering(num_qubits=num_qubits)
                
                # Perform clustering
                clusters, centers = qc.quantum_kmeans(normalized_features, n_clusters=n_clusters)
                
                # Evaluate
                eval_result = qc.evaluate_clustering(normalized_features, clusters)
                
                results[f"n_clusters_{n_clusters}_qubits_{num_qubits}"] = {
                    'n_clusters': n_clusters,
                    'num_qubits': num_qubits,
                    'silhouette_score': eval_result['silhouette_score'],
                    'calinski_harabasz_score': eval_result['calinski_harabasz_score']
                }
                
                print(f"  Silhouette Score: {eval_result['silhouette_score']:.4f}")
                print(f"  Calinski-Harabasz Score: {eval_result['calinski_harabasz_score']:.4f}")
                
            except Exception as e:
                print(f"  Error: {e}")
            
            print()
    
    # Plot results
    plt.figure(figsize=(15, 5))
    
    # Silhouette score comparison
    plt.subplot(1, 3, 1)
    configs = list(results.keys())
    silhouette_scores = [results[config]['silhouette_score'] for config in configs]
    plt.bar(configs, silhouette_scores, color='skyblue')
    plt.ylabel('Silhouette Score')
    plt.title('Silhouette Score by Configuration')
    plt.xticks(rotation=45)
    
    # Calinski-Harabasz score comparison
    plt.subplot(1, 3, 2)
    calinski_scores = [results[config]['calinski_harabasz_score'] for config in configs]
    plt.bar(configs, calinski_scores, color='lightcoral')
    plt.ylabel('Calinski-Harabasz Score')
    plt.title('Calinski-Harabasz Score by Configuration')
    plt.xticks(rotation=45)
    
    # Parameter space visualization
    plt.subplot(1, 3, 3)
    n_clusters_list = [results[config]['n_clusters'] for config in configs]
    num_qubits_list = [results[config]['num_qubits'] for config in configs]
    plt.scatter(n_clusters_list, num_qubits_list, c=silhouette_scores, s=100, cmap='viridis')
    plt.colorbar(label='Silhouette Score')
    plt.xlabel('Number of Clusters')
    plt.ylabel('Number of Qubits')
    plt.title('Parameter Space Optimization')
    plt.grid(True)
    
    plt.tight_layout()
    plt.show()
    
    return results

# Run demos
if __name__ == "__main__":
    print("Running Clustering Comparisons...")
    (kmeans_clusters, hierarchical_clusters, dbscan_clusters,
     quantum_kmeans_clusters, quantum_hierarchical_clusters, quantum_dbscan_clusters) = compare_clustering_methods()
    
    print("\nRunning Quantum Clustering Analysis...")
    distances, stability_scores = quantum_clustering_analysis()
    
    print("\nRunning Parameter Optimization...")
    optimization_results = quantum_clustering_parameter_optimization()

Exercise 2: Quantum Clustering Visualization

def quantum_clustering_visualization():
    """
    Exercise: Visualize quantum clustering results
    """
    # Generate data (kept moderate: each quantum distance call samples two
    # circuits, so the k-means below is slow on a simulator)
    data = generate_credit_data(300)
    features = data[['income', 'debt', 'credit_used', 'credit_limit']].copy()
    
    # Normalize features
    scaler = StandardScaler()
    normalized_features = scaler.fit_transform(features)
    
    # Create quantum clustering
    qc = QuantumClustering(num_qubits=4)
    
    # Perform clustering
    clusters, centers = qc.quantum_kmeans(normalized_features, n_clusters=3)
    
    # PCA for visualization
    pca = PCA(n_components=2)
    features_2d = pca.fit_transform(normalized_features)
    
    # Create comprehensive visualization
    plt.figure(figsize=(20, 15))
    
    # Original data
    plt.subplot(3, 4, 1)
    plt.scatter(features_2d[:, 0], features_2d[:, 1], alpha=0.6)
    plt.title('Original Data')
    plt.xlabel('PCA Component 1')
    plt.ylabel('PCA Component 2')
    
    # Clustering results
    plt.subplot(3, 4, 2)
    scatter = plt.scatter(features_2d[:, 0], features_2d[:, 1], c=clusters, cmap='viridis', alpha=0.7)
    plt.colorbar(scatter)
    plt.title('Quantum K-means Clustering')
    plt.xlabel('PCA Component 1')
    plt.ylabel('PCA Component 2')
    
    # Cluster centers
    plt.subplot(3, 4, 3)
    plt.scatter(features_2d[:, 0], features_2d[:, 1], c=clusters, cmap='viridis', alpha=0.3)
    centers_2d = pca.transform(centers)
    plt.scatter(centers_2d[:, 0], centers_2d[:, 1], c='red', s=200, marker='x', linewidths=3)
    plt.title('Cluster Centers')
    plt.xlabel('PCA Component 1')
    plt.ylabel('PCA Component 2')
    
    # Feature distributions by cluster
    plt.subplot(3, 4, 4)
    for i in range(3):
        cluster_data = features[clusters == i]
        plt.hist(cluster_data['income'], alpha=0.5, label=f'Cluster {i}')
    plt.xlabel('Income')
    plt.ylabel('Frequency')
    plt.title('Income Distribution by Cluster')
    plt.legend()
    
    # Debt distribution
    plt.subplot(3, 4, 5)
    for i in range(3):
        cluster_data = features[clusters == i]
        plt.hist(cluster_data['debt'], alpha=0.5, label=f'Cluster {i}')
    plt.xlabel('Debt')
    plt.ylabel('Frequency')
    plt.title('Debt Distribution by Cluster')
    plt.legend()
    
    # Credit utilization distribution
    plt.subplot(3, 4, 6)
    for i in range(3):
        cluster_data = features[clusters == i]
        credit_util = cluster_data['credit_used'] / (cluster_data['credit_limit'] + 1)
        plt.hist(credit_util, alpha=0.5, label=f'Cluster {i}')
    plt.xlabel('Credit Utilization')
    plt.ylabel('Frequency')
    plt.title('Credit Utilization by Cluster')
    plt.legend()
    
    # Cluster sizes
    plt.subplot(3, 4, 7)
    cluster_sizes = [np.sum(clusters == i) for i in range(3)]
    plt.bar(range(3), cluster_sizes, color=['blue', 'orange', 'green'])
    plt.xlabel('Cluster')
    plt.ylabel('Size')
    plt.title('Cluster Sizes')
    plt.xticks(range(3))
    
    # Silhouette analysis
    plt.subplot(3, 4, 8)
    from sklearn.metrics import silhouette_samples
    silhouette_vals = silhouette_samples(normalized_features, clusters)
    plt.scatter(range(len(silhouette_vals)), silhouette_vals, c=clusters, cmap='viridis', alpha=0.7)
    plt.axhline(y=0, color='red', linestyle='--')
    plt.xlabel('Sample')
    plt.ylabel('Silhouette Score')
    plt.title('Silhouette Analysis')
    
    # 3D visualization
    from mpl_toolkits.mplot3d import Axes3D
    ax = plt.subplot(3, 4, 9, projection='3d')
    pca_3d = PCA(n_components=3)
    features_3d = pca_3d.fit_transform(normalized_features)
    scatter = ax.scatter(features_3d[:, 0], features_3d[:, 1], features_3d[:, 2], 
                        c=clusters, cmap='viridis', alpha=0.7)
    ax.set_xlabel('PCA Component 1')
    ax.set_ylabel('PCA Component 2')
    ax.set_zlabel('PCA Component 3')
    ax.set_title('3D Clustering Visualization')
    
    # Cluster characteristics
    plt.subplot(3, 4, 10)
    cluster_means = []
    for i in range(3):
        cluster_data = features[clusters == i]
        means = cluster_data.mean()
        cluster_means.append(means)
    
    cluster_means_df = pd.DataFrame(cluster_means)
    cluster_means_df.plot(kind='bar', ax=plt.gca())
    plt.title('Cluster Characteristics')
    plt.xlabel('Cluster')
    plt.ylabel('Mean Value')
    plt.xticks(range(3))
    plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
    
    # Risk assessment by cluster
    plt.subplot(3, 4, 11)
    risk_scores = []
    for i in range(3):
        cluster_data = features[clusters == i]
        debt_income_ratio = cluster_data['debt'] / (cluster_data['income'] + 1)
        credit_utilization = cluster_data['credit_used'] / (cluster_data['credit_limit'] + 1)
        risk_score = 0.5 * debt_income_ratio + 0.5 * credit_utilization
        risk_scores.append(risk_score.mean())
    
    plt.bar(range(3), risk_scores, color=['green', 'orange', 'red'])
    plt.xlabel('Cluster')
    plt.ylabel('Risk Score')
    plt.title('Risk Assessment by Cluster')
    plt.xticks(range(3))
    
    # Cluster comparison (placeholder scores for illustration only;
    # substitute the values computed by compare_clustering_methods)
    plt.subplot(3, 4, 12)
    metrics = ['Silhouette Score', 'Calinski-Harabasz Score']
    classical_scores = [0.4, 200]  # placeholder values
    quantum_scores = [0.6, 300]    # placeholder values
    
    x = np.arange(len(metrics))
    width = 0.35
    
    plt.bar(x - width/2, classical_scores, width, label='Classical', color='blue', alpha=0.7)
    plt.bar(x + width/2, quantum_scores, width, label='Quantum', color='orange', alpha=0.7)
    
    plt.xlabel('Metrics')
    plt.ylabel('Score')
    plt.title('Classical vs Quantum Clustering')
    plt.xticks(x, metrics)
    plt.legend()
    
    plt.tight_layout()
    plt.show()
    
    return clusters, centers

# Run visualization
if __name__ == "__main__":
    clusters, centers = quantum_clustering_visualization()

📊 Results and Analysis

Quantum Clustering Advantages:

The quantum properties and credit-specific benefits are as summarized in the theory section above. Beyond those:

Performance Characteristics:

  • Better Separability: quantum feature maps can sharpen cluster boundaries
  • Robustness: the sampling-based distance may tolerate noisy credit data
  • Scalability: a potential (not yet demonstrated) advantage for large-scale customer segmentation

Comparison with Classical Clustering:

Classical Limitations:

  • K-means produces linearly separable (Voronoi) partitions
  • Curse of dimensionality in high-dimensional credit data
  • Sensitivity to initialization and local optima
  • Manual feature engineering required

Quantum Potential:

  • Non-linear separability via quantum feature maps
  • Implicit access to a high-dimensional Hilbert-space feature space
  • Possible help with escaping local optima
  • Less manual feature engineering (the feature map does the lifting)

🎯 Homework

Exercise 1: Quantum Clustering Calibration

Implement quantum clustering calibration methods for customer segmentation.

Exercise 2: Quantum Clustering Ensemble Methods

Build an ensemble of quantum clustering algorithms for improved performance.

Exercise 3: Quantum Clustering Feature Selection

Develop quantum feature selection for clustering optimization.

Exercise 4: Quantum Clustering Validation

Create a validation framework for quantum clustering models.


“Quantum clustering leverages quantum superposition and entanglement to provide superior customer segmentation for credit risk assessment.” - Quantum Finance Research

Next day: Quantum Anomaly Detection for Fraud