DDCM Algorithm Summary: GPU/Tensor Implementation

This document provides a comprehensive summary of the Dynamic Discrete Choice Model (DDCM) simulation, specifically focusing on the optimized Tensor-Based (GPU) implementation. It covers the high-level workflow, mathematical formulation, and calibration results.

1. Algorithm Workflow

The simulation proceeds in three distinct phases, orchestrated by main.py. The tensor-based implementation replaces Python objects with PyTorch tensors to achieve massive parallelism.

Phase 1: Forward Pass (Reachability Analysis)

graph TD
    subgraph "Time Loop (t = 0 to T)"
        A[Scheduled States at t] -->|Gather| B(Current Batch)
        B -->|Merge| C{Deduplicate}
        C -->|Unique States| D[GPU Expansion Kernel]
        D -->|Expand| E[Next States & Actions]
        E -->|Scatter| F[Schedule into Future Buckets]
    end
    
    Init[Initial State] --> A
    F -->|t + duration| A
    
    style A fill:#555,stroke:#333,stroke-width:2px
    style D fill:#555,stroke:#333,stroke-width:2px

Phase 2: Backward Induction (Value Iteration)

graph TD
    subgraph "Step A: Graph Building"
        Raw[Raw Forward Pass Data] -->|Unique| Nodes[Unique Global States]
        Raw -->|Map| Edges[Global Indexed Edges]
        Nodes -->|Assign IDs| V[Value Tensor V initialized]
    end

    subgraph "Step B: Backward Pass (t = T down to 0)"
        V -->|Read V_next| CalcQ[Compute Q = U + V_next]
        Edges -->|Slice at t| CalcQ
        CalcQ -->|Scatter Reduce| Agg[LogSumExp]
        Agg -->|Update| V
    end

    style Nodes fill:#555,stroke:#333,stroke-width:2px
    style CalcQ fill:#555,stroke:#333,stroke-width:2px

Phase 3: Simulation (Choice)

graph TD
    subgraph "Simulation Loop"
        Agents[Current Agent States] -->|Lookup| ValidEdges[Valid Moves]
        ValidEdges -->|Compute| Probs[Probabilities P = Softmax]
        Probs -->|Sample| Choice[Select Next Action]
        Choice -->|Update| Agents
    end

    V[Value Tensor] --> Probs
    Init[Start: Home] --> Agents
    Agents -->|End of Day| Result[Trajectories]

    style Choice fill:#555,stroke:#333,stroke-width:2px

2. Implementation Details

Phase 1: Forward Pass (Reachability Analysis)

Goal: Discover all reachable states in the state space $S$ and build the transition graph.

Class: planning.forward_pass_tensor.TensorForwardPass

The algorithm uses a Level-Synchronous BFS with Time Buckets to ensure correct ordering and efficient deduplication.

Initialization: Start with initial_state (Home, t=0) as a tensor $(1, 8)$.
Time Loop: Iterate $t$ from $0$ to $T_{end}$ (e.g., 1440 min).
- GATHER: Collect all state tensors scheduled for time $t$.
- MERGE: Concatenate and deduplicate using torch.unique. This merges paths arriving at the same state from different origins.
- EXPAND: Use GPUStateExpansionKernel to generate all valid actions and next states in parallel.
- SCATTER: Schedule resulting next_states into future time buckets based on their arrival time ($t + \text{duration} + \text{travel time}$).

Key Components:

State Tensor: States are packed into (N, 8) integer tensors: [Time, Zone, Activity, Duration, Mode, CarInUse, MotoInUse, History]
GPU Kernel: Generates candidate actions (sparse lookup), filters invalid actions (vectorized constraints), and computes transitions (vectorized physics).

Phase 2: Backward Induction (Value Iteration)

Goal: Compute the optimal Value Function $V(s, t)$ for every state.

Classes: planning.graph_builder_tensor.GraphBuilder, planning.backward_induction_tensor.BackwardInduction

Graph Construction: Convert raw forward pass results into a structured Sparse Graph.
- Assign unique global indices to states.
- Map “raw” next states to global indices.
- Sort edges by source index.
Vectorized Backward Pass: Iterate backwards from $T_{end}$ to $0$.
- Slice: Identify all edges originating from states at time $t$.
- Compute Q: Calculate $Q(s, a) = u(s, a) + V(s_{next})$ for all edges simultaneously.
- Aggregate: Compute $V(s) = \text{LogSumExp}(Q(s, \cdot))$ using a scatter-reduce operation.

Phase 3: Simulation (Forward Simulation)

Goal: Generate daily activity-travel schedules for a population of agents.

Class: planning.simulate_batch_tensor.simulate_batch_tensor

Simulates $N$ agents (e.g., 1,000) in parallel on the GPU.

Initialization: All agents start at initial_state.
Step Loop: Until all agents reach $T_{end}$:
- Lookup: Find valid edges for current states in the global graph.
- Probabilities: Compute $P(a s) = \text{softmax}(u(s, a) + V(s_{next}))$.
- Sample: Sample next actions for all agents using torch.multinomial.
- Update: Move agents to next states.
Decoding: Convert the resulting tensor trajectories into human-readable DataFrames (CSV format).

3. Mathematical Formulation

Tensor State Representation

States are represented as rows in a compact integer tensor $\mathbf{S}$ of shape $(N, 8)$.

\[s = [t, z, a, d, m, x_{car}, x_{moto}, h]\]

Where:

$t$: Time step (minutes from midnight)
$z$: Zone ID (integer)
$a$: Activity Type ID (integer)
$d$: Duration (time steps spent in current activity)
$m$: Transport Mode ID (integer, NONE if stationary)
$x_{car}, x_{moto}$: Vehicle availability (boolean)
$h$: Activity History (counter for mandatory sequence)

Vectorized Utility Computation

The utility $u(s, a)$ is computed in parallel for a batch of $M$ edges.

\[u_{total} = u_{travel} + u_{change} + u_{tail} + u_{partial} + u_{act} + u_{start}\]

Broken down into component tensors:

Travel Utility: Lookup from OD Tensor $\mathbf{T}{OD}$ and $\mathbf{C}{OD}$.
Activity Utility: Marginal utility of performing activity $a$ for duration $\Delta t$.
Penalties: Constant penalties based on action type masks.

Vectorized Backward Induction (Bellman Equation)

The Bellman equation is solved backwards:

\[V(s, t) = \ln \sum_{a \in A(s)} \exp \left( u(s, a) + V(s', t+\Delta t) \right)\]

Vectorized Implementation (Scatter-Reduce): Given a batch of edges at time $t$ with source indices $\mathbf{I}{src}$, target indices $\mathbf{I}{tgt}$, and utilities $\mathbf{U}$:

Compute Q-Values: $\mathbf{Q} = \mathbf{U} + \mathbf{V}[\mathbf{I}_{tgt}]$
Scatter LogSumExp: Aggregate $\mathbf{Q}$ values by their source index $\mathbf{I}_{src}$. $\mathbf{V}_{new}[k] = \text{LogSumExp}(\{ \mathbf{Q}[i] \mid \mathbf{I}_{src}[i] = k \})$

Simulation Probabilities

For a given agent at state $s$, the probability of choosing action $a$ is:

\[P(a|s) = \frac{\exp(u(s, a) + V(s'))}{\sum_{a'} \exp(u(s, a') + V(s'_{next}))}\]

Batch Sampling:

Logits: $\mathbf{L} = \mathbf{U}{batch} + \mathbf{V}[\mathbf{I}{tgt_batch}]$
Probabilities: $\mathbf{P} = \text{softmax}(\mathbf{L}, \text{dim}=1)$
Sampling: $a_{selected} \sim \text{Multinomial}(\mathbf{P})$

4. Results and Calibration

Calibration Metrics (1,000 Agents)

Activity Frequency:
- HOME: 31.7%
- SCHOOL: 16.5%
- WORK: 12.5%
- SHOPPING: 5.3%
Mode Shares:
- WALK: 81.2%
- BUS: 18.8%
Start Times:
- WORK: Mean ~8:22 AM.
- SCHOOL: Mean ~7:56 AM.

Computational Performance (GPU vs CPU)

The Tensor-based GPU implementation provides significant speedups:

Phase	Original (CPU)	Tensor (GPU)	Speedup
Forward Pass	~78s	~10-15s	~5-8x
Backward Induction	~45s	~2-5s	~10-20x
Simulation (1k)	~120s	~5s	~24x

Document Status: Updated 2025-11-26 Implementation: Tensor-based GPU optimization complete Next Steps: Large-scale validation and parameter tuning