Numpy optimization

 

단순 연산을 반복할 때 numpy에서 지원하는 vectorization, broadcasting과 numexpr library의 병렬화를 통해 훨씬 더 빠르게 연산을 수행할 수 있습니다.
다음 실험의 경우, 7.63s 에서 0.0768s로 거의 100배 빠르게 수행한 것을 확인할 수 있습니다.

from math import *
import numpy as np
import numexpr as ne

loops = 25000000
a = range(1, loops)
## 1. Naive for loop
f = lambda x: 3*log(x) + cos(x)**2
%time r = [f(x) for x in a]
CPU times: user 7.43 s, sys: 208 ms, total: 7.63 s
Wall time: 7.63 s
## 2. Numpy vector broadcasting
a = np.arange(1, loops)
%time r = 3*np.log(a) + np.cos(a)**2
CPU times: user 942 ms, sys: 68.1 ms, total: 1.01 s
Wall time: 1.01 s
## 3. 1 thread optimization
ne.set_num_threads(1)
f = '3*log(a) + cos(a)**2'
%time r = ne.evaluate(f)
CPU times: user 272 ms, sys: 24 ms, total: 296 ms
Wall time: 323 ms
## 4. 4 threads optimization
ne.set_num_threads(4)
f = '3*log(a) + cos(a)**2'
%time r = ne.evaluate(f)
CPU times: user 270 ms, sys: 20.1 ms, total: 290 ms
Wall time: 76.8 ms