Problem Solving Note

 

Speed-up from Base code

1. Read data and generate N-asset portfolios

1) Load daily return data

  GPU 1 (Base code) GPU 1 (Naive) GPU 2 (Naive) Acceleration ratio (GPU 2)
Current 0.81s 0.04s 0.04s 1.00x
Accumulated 0.81s 0.04s 0.04s 1.00x
  GPU 1 (Base code) GPU 1 (Optimized) GPU 2 (Optimized) Acceleration ratio (GPU 2)
Optimized 0.82s 0.04s 0.05s 0.08x
Accumulated 0.82s 0.04s 0.05s 0.08x
  • Optimization details

2) Select N-asset

  GPU 1 (Base code) GPU 1 (Naive) GPU 2 (Naive) Acceleration ratio (GPU 2)
Current 0.00s 0.01s 0.01s 1.00x
Accumulated 0.81s 0.05s 0.05s 1.00x
  GPU 1 (Base code) GPU 1 (Optimized) GPU 2 (Optimized) Acceleration ratio (GPU 2)
Optimized 0.00s 0.01s 0.01s 1.00x
Accumulated 0.82s 0.05s 0.06s 0.83x
  • Optimization details

3) Generate N-asset portfolio

  GPU 1 (Base code) GPU 1 (Naive) GPU 2 (Naive) Acceleration ratio (GPU 2)
Current 0.06s 0.00s 0.00s 1.00x
Accumulated 0.87s 0.05s 0.05s 1.00x
  GPU 1 (Base code) GPU 1 (Optimized) GPU 2 (Optimized) Acceleration ratio (GPU 2)
Optimized 0.06s 0.01s 0.00s 1.00x
Accumulated 0.88s 0.06s 0.06s 1.00x
  • Optimization details

2. Monte-Carlo Simulation

1) Generate random number generator states(rng_states)

  GPU 1 (Base code) GPU 1 (Naive) GPU 2 (Naive) Acceleration ratio (GPU 2)
Current 1.94s 2.02s 2.09s 0.96x
Accumulated 2.81s 2.07s 2.14s 0.97x
  GPU 1 (Base code) GPU 1 (Optimized) GPU 2 (Optimized) Acceleration ratio (GPU 2)
Optimized 2.11s 2.24s 2.31s 0.96x
Accumulated 2.99s 2.30s 2.36s 0.97x
  • Optimization details

2) Generate M-asset portfolios from N-asset portfolio with random sampling and compute measure and sort

  GPU 1 (Base code) GPU 1 (Naive) GPU 2 (Naive) Acceleration ratio (GPU 2)
Current 1.60s 2.07s 1.45s 1.43x
Accumulated 4.42s 4.14s 3.59s 1.15x
  GPU 1 (Base code) GPU 1 (Optimized) GPU 2 (Optimized) Acceleration ratio (GPU 2)
Optimized 1.64s 1.44s 1.20s 1.20x
Accumulated 4.63s 3.74s 3.56s 1.05x
  • Optimization details
    1. Generate
      Base: 0.4s
      Naive1: 0.4s
    2. Compute measure
      Base: 1.0s
      Naive1: 1.3s Optimized1: 1.25s (cuda.synchthread()를 가장 내부의 연산 함수로 넣음)
    3. Sort
      Base: 0.16s
      Naive1: 0.16s

3) Select top portfolios by measure

  GPU 1 (Base code) GPU 1 (Naive) GPU 2 (Naive) Acceleration ratio (GPU 2)
Current 0.00s 0.00s 0.00s 1.00x
Accumulated 4.42s 4.14s 3.59s 1.15x
  GPU 1 (Base code) GPU 1 (Optimized) GPU 2 (Optimized) Acceleration ratio (GPU 2)
Optimized 0.00s 0.00s 0.00s 1.00x
Accumulated 4.63s 3.74s 3.56s 1.05x
  • Optimization details

3. Asset selection

1) Generate K-asset portfolios from M-asset portfolios with random sampling and compute measure and sort

  GPU 1 (Base code) GPU 1 (Naive) GPU 2 (Naive) Acceleration ratio (GPU 2)
Current 1.88s 5.10s 4.26s 1.20x
Accumulated 6.30s 9.24s 7.85s 1.18x
  GPU 1 (Base code) GPU 1 (Optimized) GPU 2 (Optimized) Acceleration ratio (GPU 2)
Optimized 2.41s 1.84s 1.79s 1.03x
Accumulated 7.04s 5.58s 5.35s 1.04x
  • Optimization details
    1. Initialize variables
      Base: 0.27s
      Naive1: 1.74s
      Optimized1: 0.31s (np.array() 대신 np.arange(), np.concatenate() 사용)
# self.idx = np.array(list(range(self.num_port // N_GPU)) * N_GPU, dtype=np.int32)
self.idx = np.concatenate([np.arange(self.num_port // N_GPU, dtype=np.int32) for _ in range(N_GPU)])
  1. Reduce cuda.select_device()

2) Execute Parallel Genetic algorithm (10 generations)

  GPU 1 (Base code) GPU 1 (Naive) GPU 2 (Naive) Acceleration ratio (GPU 2)
Current 20.93s 51.2s 26.49s 1.93x
Accumulated 27.27s 60.44s 34.34s 1.76x
  GPU 1 (Base code) GPU 1 (Optimized) GPU 2 (Optimized) Acceleration ratio (GPU 2)
Optimized 22.78s 20.92s 11.04s 1.89x
Accumulated 29.82s 26.50s 16.39s 1.62x
  • Optimization details
    1. Reduce cuda.select_device(idx_gpu) (remove redundant codes) and change to with cuda.gpus[idx_gpu]
      51.2s → 36.49s
    2. In kernel(device) function, use while instead of enumerate, zip