Speed-up from Base code
1. Read data and generate N-asset portfolios
1) Load daily return data
Timing | GPU 1 (Base code) | GPU 1 (Naive) | GPU 2 (Naive) | Acceleration ratio (GPU 2) |
---|---|---|---|---|
Current | 0.81s | 0.04s | 0.04s | 1.00x |
Accumulated | 0.81s | 0.04s | 0.04s | 1.00x |

Timing | GPU 1 (Base code) | GPU 1 (Optimized) | GPU 2 (Optimized) | Acceleration ratio (GPU 2) |
---|---|---|---|---|
Optimized | 0.82s | 0.04s | 0.05s | 0.80x |
Accumulated | 0.82s | 0.04s | 0.05s | 0.80x |
- Optimization details
2) Select N assets
Timing | GPU 1 (Base code) | GPU 1 (Naive) | GPU 2 (Naive) | Acceleration ratio (GPU 2) |
---|---|---|---|---|
Current | 0.00s | 0.01s | 0.01s | 1.00x |
Accumulated | 0.81s | 0.05s | 0.05s | 1.00x |

Timing | GPU 1 (Base code) | GPU 1 (Optimized) | GPU 2 (Optimized) | Acceleration ratio (GPU 2) |
---|---|---|---|---|
Optimized | 0.00s | 0.01s | 0.01s | 1.00x |
Accumulated | 0.82s | 0.05s | 0.06s | 0.83x |
- Optimization details
3) Generate N-asset portfolio
Timing | GPU 1 (Base code) | GPU 1 (Naive) | GPU 2 (Naive) | Acceleration ratio (GPU 2) |
---|---|---|---|---|
Current | 0.06s | 0.00s | 0.00s | 1.00x |
Accumulated | 0.87s | 0.05s | 0.05s | 1.00x |

Timing | GPU 1 (Base code) | GPU 1 (Optimized) | GPU 2 (Optimized) | Acceleration ratio (GPU 2) |
---|---|---|---|---|
Optimized | 0.06s | 0.01s | 0.00s | 1.00x |
Accumulated | 0.88s | 0.06s | 0.06s | 1.00x |
- Optimization details
2. Monte-Carlo Simulation
1) Generate random number generator states (`rng_states`)
Timing | GPU 1 (Base code) | GPU 1 (Naive) | GPU 2 (Naive) | Acceleration ratio (GPU 2) |
---|---|---|---|---|
Current | 1.94s | 2.02s | 2.09s | 0.96x |
Accumulated | 2.81s | 2.07s | 2.14s | 0.97x |

Timing | GPU 1 (Base code) | GPU 1 (Optimized) | GPU 2 (Optimized) | Acceleration ratio (GPU 2) |
---|---|---|---|---|
Optimized | 2.11s | 2.24s | 2.31s | 0.96x |
Accumulated | 2.99s | 2.30s | 2.36s | 0.97x |
- Optimization details
2) Generate M-asset portfolios from the N-asset portfolio by random sampling, compute the measure, and sort
Timing | GPU 1 (Base code) | GPU 1 (Naive) | GPU 2 (Naive) | Acceleration ratio (GPU 2) |
---|---|---|---|---|
Current | 1.60s | 2.07s | 1.45s | 1.43x |
Accumulated | 4.42s | 4.14s | 3.59s | 1.15x |

Timing | GPU 1 (Base code) | GPU 1 (Optimized) | GPU 2 (Optimized) | Acceleration ratio (GPU 2) |
---|---|---|---|---|
Optimized | 1.64s | 1.44s | 1.20s | 1.20x |
Accumulated | 4.63s | 3.74s | 3.56s | 1.05x |
- Optimization details
  - Generate
    - Base: 0.4s
    - Naive1: 0.4s
  - Compute measure
    - Base: 1.0s
    - Naive1: 1.3s
    - Optimized1: 1.25s (moved `cuda.syncthreads()` into the innermost compute function)
  - Sort
    - Base: 0.16s
    - Naive1: 0.16s
3) Select top portfolios by measure
Timing | GPU 1 (Base code) | GPU 1 (Naive) | GPU 2 (Naive) | Acceleration ratio (GPU 2) |
---|---|---|---|---|
Current | 0.00s | 0.00s | 0.00s | 1.00x |
Accumulated | 4.42s | 4.14s | 3.59s | 1.15x |

Timing | GPU 1 (Base code) | GPU 1 (Optimized) | GPU 2 (Optimized) | Acceleration ratio (GPU 2) |
---|---|---|---|---|
Optimized | 0.00s | 0.00s | 0.00s | 1.00x |
Accumulated | 4.63s | 3.74s | 3.56s | 1.05x |
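
The sample, measure, sort, and select-top steps of this section can be sketched on the CPU with NumPy. This is only an illustration under stated assumptions: the sizes, the synthetic return data, the equal weighting, and the measure (mean return divided by standard deviation) are all hypothetical, since the document does not specify them, and the benchmarked code runs these steps on the GPU.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the document does not state N, M, or the portfolio count.
n_days, n_assets, m_assets, n_portfolios, n_top = 250, 20, 5, 1000, 10

# Synthetic daily returns standing in for the loaded return data.
returns = rng.normal(0.0005, 0.01, size=(n_days, n_assets))

# Randomly sample M distinct asset indices per portfolio: argsort of random
# keys gives one random permutation per row; keep the first M columns.
keys = rng.random((n_portfolios, n_assets))
members = np.argsort(keys, axis=1)[:, :m_assets]

# Equal-weight portfolio returns, shape (n_days, n_portfolios).
port_returns = returns[:, members].mean(axis=2)

# Assumed measure: mean daily return divided by its standard deviation.
measure = port_returns.mean(axis=0) / port_returns.std(axis=0)

# Sort descending by measure and keep the top portfolios.
order = np.argsort(measure)[::-1]
top_members = members[order[:n_top]]
top_measure = measure[order[:n_top]]
```

Sorting the full measure array and slicing mirrors the "sort, then select top" split in the step list; the selection itself is just a slice, which is consistent with it registering as 0.00s in the tables.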
- Optimization details
3. Asset selection
1) Generate K-asset portfolios from the M-asset portfolios by random sampling, compute the measure, and sort
Timing | GPU 1 (Base code) | GPU 1 (Naive) | GPU 2 (Naive) | Acceleration ratio (GPU 2) |
---|---|---|---|---|
Current | 1.88s | 5.10s | 4.26s | 1.20x |
Accumulated | 6.30s | 9.24s | 7.85s | 1.18x |

Timing | GPU 1 (Base code) | GPU 1 (Optimized) | GPU 2 (Optimized) | Acceleration ratio (GPU 2) |
---|---|---|---|---|
Optimized | 2.41s | 1.84s | 1.79s | 1.03x |
Accumulated | 7.04s | 5.58s | 5.35s | 1.04x |
- Optimization details
  - Initialize variables
    - Base: 0.27s
    - Naive1: 1.74s
    - Optimized1: 0.31s (use `np.arange()` and `np.concatenate()` instead of `np.array()`)
# self.idx = np.array(list(range(self.num_port // N_GPU)) * N_GPU, dtype=np.int32)
self.idx = np.concatenate([np.arange(self.num_port // N_GPU, dtype=np.int32) for _ in range(N_GPU)])
  - Reduce `cuda.select_device()` calls
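
The index-initialization change above can be checked in isolation. The sketch below assumes `N_GPU = 2` and a hypothetical `num_port` of one million (the document does not state the real value); it confirms that both constructions yield the same array and times them. Absolute timings will differ from the figures reported here.

```python
import timeit

import numpy as np

N_GPU = 2             # assumed device count
num_port = 1_000_000  # hypothetical portfolio count


def idx_from_list():
    # Original approach: build a Python list, repeat it, then convert it.
    return np.array(list(range(num_port // N_GPU)) * N_GPU, dtype=np.int32)


def idx_from_concatenate():
    # Optimized approach from the text: stay inside NumPy throughout.
    return np.concatenate(
        [np.arange(num_port // N_GPU, dtype=np.int32) for _ in range(N_GPU)]
    )


# Both constructions must produce identical index arrays.
assert np.array_equal(idx_from_list(), idx_from_concatenate())

t_list = timeit.timeit(idx_from_list, number=5)
t_concat = timeit.timeit(idx_from_concatenate, number=5)
print(f"list-based: {t_list:.3f}s, concatenate-based: {t_concat:.3f}s")
```

The list-based version creates one Python integer object per element before NumPy copies them, which is the likely source of the overhead. `np.tile(np.arange(num_port // N_GPU, dtype=np.int32), N_GPU)` would be another way to build the same array.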
2) Execute parallel genetic algorithm (10 generations)
Timing | GPU 1 (Base code) | GPU 1 (Naive) | GPU 2 (Naive) | Acceleration ratio (GPU 2) |
---|---|---|---|---|
Current | 20.93s | 51.2s | 26.49s | 1.93x |
Accumulated | 27.27s | 60.44s | 34.34s | 1.76x |

Timing | GPU 1 (Base code) | GPU 1 (Optimized) | GPU 2 (Optimized) | Acceleration ratio (GPU 2) |
---|---|---|---|---|
Optimized | 22.78s | 20.92s | 11.04s | 1.89x |
Accumulated | 29.82s | 26.50s | 16.39s | 1.62x |
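
The Accumulated rows and the acceleration ratio can be reproduced from the per-step times. The sketch below assumes the ratio is the GPU 1 time divided by the GPU 2 time, which matches the tabulated values, and uses made-up short step labels. Because it works from the rounded figures printed in the tables, some results may differ from the tables by 0.01.

```python
from itertools import accumulate

# Per-step "Optimized" times in seconds, copied from the tables in this document.
# The step labels are shorthand introduced for this example only.
steps = ["load", "select", "generate", "rng_states",
         "monte_carlo", "top", "k_asset", "genetic"]
gpu1 = [0.04, 0.01, 0.01, 2.24, 1.44, 0.00, 1.84, 20.92]
gpu2 = [0.05, 0.01, 0.00, 2.31, 1.20, 0.00, 1.79, 11.04]

# Running totals correspond to the "Accumulated" rows.
for name, t1, t2 in zip(steps, accumulate(gpu1), accumulate(gpu2)):
    # Assumed definition: ratio = single-GPU time / two-GPU time.
    print(f"{name:12s} GPU 1: {t1:6.2f}s  GPU 2: {t2:6.2f}s  ratio: {t1 / t2:.2f}x")
```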
- Optimization details
  - Reduce `cuda.select_device(idx_gpu)` calls (remove redundant code) and change to `with cuda.gpus[idx_gpu]`: 51.2s → 36.49s
  - In kernel (device) functions, use `while` instead of `enumerate`, `zip`
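
The loop rewrite in the last item can be illustrated as follows. `weighted_sum_zip` and `weighted_sum_while` are hypothetical functions, not taken from the benchmarked code, and the Numba device-function decorator is deliberately omitted so the sketch runs without a GPU; it shows only the change in loop style, not its effect on kernel time.

```python
def weighted_sum_zip(weights, returns):
    # Iterator-based style that the text moves away from in device functions.
    total = 0.0
    for w, r in zip(weights, returns):
        total += w * r
    return total


def weighted_sum_while(weights, returns, n):
    # Same computation with an explicit index and a while loop.
    total = 0.0
    i = 0
    while i < n:
        total += weights[i] * returns[i]
        i += 1
    return total


weights = [0.5, 0.25, 0.25]
returns = [0.5, 1.0, 2.0]
assert weighted_sum_zip(weights, returns) == weighted_sum_while(weights, returns, 3)
```

Passing the length `n` explicitly keeps the loop bound a plain integer, which is a common pattern in GPU device code where array lengths are often known by the caller.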