Numba 0.48.0 documentation

 

This article is a summary of the Numba 0.48.0 (Jan 27, 2020) documentation.


I. Overview

1. Numba

Numba
A compiler for Python array and numerical functions written directly in Python

Numba generates optimized machine code from pure Python code using the LLVM compiler infrastructure.
With a few simple annotations, array-oriented and math-heavy Python code can be just-in-time compiled to performance similar to that of C, C++ and Fortran, without having to switch languages or Python interpreters.

2. Main features

  1. on-the-fly code generation (JIT, at import time or runtime)
  2. native code generation for the CPU and GPU
  3. integration with the Python scientific software stack

II. Installation

$ pip install --ignore-installed numba

Details: http://numba.pydata.org/numba-doc/latest/user/installing.html

III. Compiling Python code with @jit

Numba provides several utilities for code generation, but its central feature is the numba.jit() decorator.
Using this decorator, you can mark a function for optimization by Numba’s JIT compiler.

1. Basic usage

1) Lazy compilation

The recommended way to use the @jit decorator is to let Numba decide when and how to optimize:

from numba import jit

@jit
def f(x, y):
    # A somewhat trivial example
    return x + y

2) Eager compilation

You can also tell Numba the function signature you are expecting. The function f() would now look like:

from numba import jit, int32

@jit(int32(int32, int32))
def f(x, y):
    # A somewhat trivial example
    return x + y

This is useful if you want fine-grained control over types chosen by the compiler (for example, to use single-precision floats).
If you omit the return type, e.g. by writing (int32, int32) instead of int32(int32, int32), Numba will try to infer it for you.
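
For example, a sketch of the same function with the return type left to inference:

from numba import jit, int32

@jit((int32, int32))
def f(x, y):
    # Return type is inferred from the body
    return x + y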

2. Calling and inlining other functions

Numba-compiled functions can call other compiled functions.
The function calls may even be inlined in the native code, depending on optimizer heuristics.

import math

from numba import jit

@jit
def square(x):
    return x ** 2

@jit
def hypot(x, y):
    return math.sqrt(square(x) + square(y))

The @jit decorator must be added to any such library function; otherwise, Numba may generate much slower code.

3. Signature specifications

Explicit @jit signatures can use a number of types. Here are some common ones (an example signature follows the list):

  1. void
  2. int8, uint8, int16, uint16, int32, uint32, int64, uint64
  3. float32, float64
  4. complex64, complex128
  5. float32[:]: one-dimensional single-precision array
  6. int8[:, :]: two-dimensional array of 8-bit integers
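
As a minimal sketch combining a scalar and an array type (the function itself is illustrative, not from the docs):

from numba import jit, float64

@jit(float64(float64[:], float64))
def weighted_last(arr, w):
    # arr: one-dimensional double-precision array, w: double-precision scalar
    return arr[-1] * w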

4. Compilation options

A number of keyword-only arguments can be passed to the @jit decorator.
Details: numba.jit()

1) nopython (default: False)

Numba has two compilation modes: nopython mode and object mode. The former produces much faster code, but has limitations that can force Numba to fall back to the latter. To prevent Numba from falling back, and instead raise an error, pass nopython=True.

@jit(nopython=True)
def f(x, y):
    return x + y
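
To illustrate the failure mode, here is a hedged sketch (pandas is just an example of an object Numba cannot type; the function is not from this section of the docs):

import pandas as pd
from numba import jit

@jit(nopython=True)
def use_pandas(d):
    # pandas objects cannot be typed in nopython mode, so calling this
    # function raises a TypingError instead of silently falling back
    # to object mode
    return pd.DataFrame.from_dict(d)

# use_pandas({'a': [1, 2, 3]})  # would raise a TypingError at compile time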

2) nogil (default: False)

Whenever Numba optimizes Python code to native code that only works on native types and variables (rather than Python objects), it is not necessary anymore to hold Python’s global interpreter lock (GIL). Numba will release the GIL when entering such a compiled function if you passed nogil=True.

@jit(nogil=True)
def f(x, y):
    return x + y

Code running with the GIL released runs concurrently with other threads executing Python or Numba code (either the same compiled function, or another one), allowing you to take advantage of multi-core systems. This will not be possible if the function is compiled in object mode.

When using nogil=True, you’ll have to be wary of the usual pitfalls of multi-threaded programming (consistency, synchronization, race conditions, etc.).
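
As an illustrative sketch (the function and data sizes here are assumptions, not from the docs), a nogil-compiled function can be driven from a thread pool so that the compiled loops run concurrently:

import numpy as np
from numba import jit
from concurrent.futures import ThreadPoolExecutor

@jit(nopython=True, nogil=True)
def chunk_sum(arr):
    # Runs as native code with the GIL released, so several calls
    # can execute in parallel on different threads.
    total = 0.0
    for i in range(arr.shape[0]):
        total += arr[i]
    return total

data = np.random.rand(4, 1_000_000)
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_sums = list(pool.map(chunk_sum, data))   # one row per thread
print(sum(partial_sums))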

3) cache (default: False)

To avoid compilation times each time you invoke a Python program, you can instruct Numba to write the result of function compilation into a file-based cache. This is done by passing cache=True:

@jit(cache=True)
def f(x, y):
    return x + y

4) parallel (default: False)

Enables automatic parallelization (and related optimizations) for those operations in the function known to have parallel semantics. For a list of supported operations, see Automatic parallelization with @jit. This feature is enabled by passing parallel=True and must be used in conjunction with nopython=True:

@jit(nopython=True, parallel=True)
def f(x, y):
    return x + y
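
As a minimal sketch of what parallel=True is typically paired with (the function here is illustrative), numba.prange parallelizes an explicit loop:

import numpy as np
from numba import jit, prange

@jit(nopython=True, parallel=True)
def row_sums(a):
    out = np.empty(a.shape[0])
    # Iterations of a prange loop may be distributed across CPU threads
    for i in prange(a.shape[0]):
        out[i] = a[i, :].sum()
    return out

row_sums(np.random.rand(1000, 1000))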

IV. Flexible specializations with @generated_jit

Sometimes you want to write a function that has different implementations depending on its input types.

1. Example

Suppose you want to write a function which returns whether a given value is a “missing” value according to certain conventions.

import numpy as np
from numba import generated_jit, types

@generated_jit(nopython=True)
def is_missing(x):
    """
    Return True if the value is missing, False otherwise.
    """
    if isinstance(x, types.Float):
        return lambda x: np.isnan(x)
    elif isinstance(x, (types.NPDatetime, types.NPTimedelta)):
        # The corresponding Not-a-Time value
        missing = x('NaT')
        return lambda x: x == missing
    else:
        return lambda x: False
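
A brief usage sketch (the sample values are chosen for illustration): the implementation is selected from the argument's Numba type at compile time.

print(is_missing(np.nan))   # True: float branch uses np.isnan
print(is_missing(3.5))      # False
print(is_missing(7))        # False: integers fall through to the default branch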

2. Compilation options

The generated_jit() decorator accepts the same keyword-only arguments as the jit() decorator.

V. Creating Numpy universal functions

Details: http://numba.pydata.org/numba-doc/latest/user/vectorize.html
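
As a brief sketch rather than a summary of that page, the vectorize() decorator compiles a scalar function into a NumPy ufunc (the function name is illustrative):

import numpy as np
from numba import vectorize, float64

@vectorize([float64(float64, float64)])
def add(x, y):
    return x + y

add(np.arange(4.0), 2.0)   # behaves like a ufunc: array([2., 3., 4., 5.])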

VI. Automatic parallelization with @jit

This feature is supported on the CPU only.

VII. Numba for CUDA GPUs

1. Overview

1) Terminology

  • host: the CPU
  • device: the GPU
  • host memory: the system main memory
  • device memory: onboard memory on a GPU card
  • kernel: a GPU function launched by the host and executed on the device
  • device function: a GPU function executed on the device which can only be called from the device (i.e. from a kernel or another device function)
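
To tie these terms together, here is a minimal hedged sketch (the kernel name and launch sizes are illustrative; running it requires a CUDA-capable GPU and the CUDA toolkit):

import numpy as np
from numba import cuda

@cuda.jit(device=True)
def double(x):
    # device function: callable only from the device
    return 2 * x

@cuda.jit
def scale_kernel(arr, out):
    # kernel: launched by the host, executed on the device
    i = cuda.grid(1)
    if i < arr.size:
        out[i] = double(arr[i])

n = 100_000
arr = np.arange(n, dtype=np.float32)   # lives in host memory
out = np.zeros_like(arr)

threads_per_block = 128
blocks_per_grid = (n + threads_per_block - 1) // threads_per_block
# Host arrays are copied to device memory for the launch and copied back afterwards.
scale_kernel[blocks_per_grid, threads_per_block](arr, out)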