This post summarizes the Numba 0.48.0 (Jan 27, 2020) documentation.
I. Overview
1. Numba
Numba is a compiler for Python array and numerical functions that are written directly in Python.
Numba generates optimized machine code from pure Python code using the LLVM compiler infrastructure.
With a few simple annotations, array-oriented and math-heavy Python code can be just-in-time optimized to reach performance similar to that of C, C++ and Fortran, without having to switch languages or Python interpreters.
2. Main features
- on-the-fly code generation (JIT), at import time or at runtime
- native code generation for the CPU and GPU
- integration with the Python scientific software stack
II. Installation
$ pip install --ignore-installed numba
Details: http://numba.pydata.org/numba-doc/latest/user/installing.html
III. Compiling Python code with @jit
Numba provides several utilities for code generation, but its central feature is the numba.jit() decorator.
Using this decorator, you can mark a function for optimization by Numba’s JIT compiler.
1. Basic usage
1) Lazy compilation
The recommended way to use the @jit decorator is to let Numba decide when and how to optimize:
from numba import jit

@jit
def f(x, y):
    # A somewhat trivial example
    return x + y
2) Eager compilation
You can also tell Numba the function signature you are expecting. The function f() would now look like:
from numba import jit, int32

@jit(int32(int32, int32))
def f(x, y):
    # A somewhat trivial example
    return x + y
This is useful if you want fine-grained control over types chosen by the compiler (for example, to use single-precision floats).
If you omit the return type, e.g. by writing (int32, int32) instead of int32(int32, int32), Numba will try to infer it for you.
2. Calling and inlining other functions
Numba-compiled functions can call other compiled functions.
The function calls may even be inlined in the native code, depending on optimizer heuristics.
import math
from numba import jit

@jit
def square(x):
    return x ** 2

@jit
def hypot(x, y):
    return math.sqrt(square(x) + square(y))
The @jit decorator must be added to any such library function, otherwise Numba may generate much slower code.
3. Signature specifications
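Explicit @jit signatures can use a number of types. Here are some common ones:
- void: the return type of functions returning nothing
- int8, uint8, int16, uint16, int32, uint32, int64, uint64: fixed-width integers
- float32, float64: single- and double-precision floating-point numbers
- complex64, complex128: single- and double-precision complex numbers
- float32[:]: one-dimensional single-precision array
- int8[:, :]: two-dimensional array of 8-bit integers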
4. Compilation options
A number of keyword-only arguments can be passed to the @jit decorator.
Details: numba.jit()
1) nopython (default: False)
Numba has two compilation modes: nopython mode and object mode. The former produces much faster code, but has limitations that can force Numba to fall back to the latter. To prevent Numba from falling back, and instead raise an error, pass nopython=True.
@jit(nopython=True)
def f(x, y):
    return x + y
2) nogil (default: False)
Whenever Numba optimizes Python code to native code that only works on native types and variables (rather than Python objects), it is no longer necessary to hold Python’s global interpreter lock (GIL). Numba will release the GIL when entering such a compiled function if you pass nogil=True.
@jit(nogil=True)
def f(x, y):
    return x + y
Code running with the GIL released runs concurrently with other threads executing Python or Numba code (either the same compiled function, or another one), allowing you to take advantage of multi-core systems. This will not be possible if the function is compiled in object mode.
When using nogil=True, you’ll have to be wary of the usual pitfalls of multi-threaded programming (consistency, synchronization, race conditions, etc.).
3) cache (default: False)
To avoid paying the compilation cost each time you run a Python program, you can instruct Numba to write the result of function compilation into a file-based cache. This is done by passing cache=True:
@jit(cache=True)
def f(x, y):
    return x + y
4) parallel (default: False)
Enables automatic parallelization (and related optimizations) for those operations in the function known to have parallel semantics. For a list of supported operations, see Automatic parallelization with @jit. This feature is enabled by passing parallel=True and must be used in conjunction with nopython=True:
@jit(nopython=True, parallel=True)
def f(x, y):
    return x + y
IV. Flexible specializations with @generated_jit
Sometimes you want to write a function that has different implementations depending on its input types.
1. Example
Suppose you want to write a function which returns whether a given value is a “missing” value according to certain conventions.
import numpy as np
from numba import generated_jit, types

@generated_jit(nopython=True)
def is_missing(x):
    """
    Return True if the value is missing, False otherwise.
    """
    if isinstance(x, types.Float):
        return lambda x: np.isnan(x)
    elif isinstance(x, (types.NPDatetime, types.NPTimedelta)):
        # The corresponding Not-a-Time value
        missing = x('NaT')
        return lambda x: x == missing
    else:
        return lambda x: False
2. Compilation options
@generated_jit accepts the same keyword-only arguments as the jit() decorator.
V. Creating Numpy universal functions
Details: http://numba.pydata.org/numba-doc/latest/user/vectorize.html
VI. Automatic parallelization with @jit
CPU only
…
VII. Numba for CUDA GPUs
1. Overview
1) Terminology
- host: the CPU
- device: the GPU
- host memory: the system main memory
- device memory: onboard memory on a GPU card
- kernel: a GPU function launched by the host and executed on the device
- device function: a GPU function executed on the device which can only be called from the device (i.e. from a kernel or another device function)