numcodecs_random_projection

RPCodec for the numcodecs buffer compression API.

Modules:

  • mt_rng

Classes:

  • RPMethod

    Random projection method.

  • RPCodec

    Random projection codec for lossy compression of numerical data.

RPMethod

Bases: Enum

Random projection method.

dct class-attribute instance-attribute

dct = 'dct'

Discrete Cosine Transform (DCT).

Generate a D x K projection matrix R using the Type II Discrete Cosine Transform (DCT) basis [1].

  1. Amador, J. J. (2006). Random projection and orthonormality for lossy image compression. Image and Vision Computing, 25(5), 754–766. Available from: doi:10.1016/j.imavis.2006.05.018
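As a rough illustration of what a Type II DCT projection matrix looks like, the sketch below builds an orthonormal D x K matrix in plain Python. The function name `dct2_basis` is hypothetical, and keeping the first K (lowest-frequency) basis vectors is an assumption; the codec may select or order the basis vectors differently.

```python
import math

def dct2_basis(d, k):
    """Return a D x K matrix whose columns are orthonormal DCT-II basis vectors.

    Assumption: the first K (lowest-frequency) vectors are kept.
    """
    if not 0 < k <= d:
        raise ValueError("k must satisfy 0 < k <= d")
    matrix = []
    for n in range(d):  # row index: position along the feature axis
        row = []
        for j in range(k):  # column index: DCT frequency
            scale = math.sqrt(1.0 / d) if j == 0 else math.sqrt(2.0 / d)
            row.append(scale * math.cos(math.pi * (n + 0.5) * j / d))
        matrix.append(row)
    return matrix

R = dct2_basis(8, 3)
# Gram matrix R^T R: close to the identity when the columns are orthonormal.
gram = [[sum(R[n][a] * R[n][b] for n in range(8)) for b in range(3)] for a in range(3)]
```

Because the columns are orthonormal, the transpose of R can serve as an approximate inverse when reconstructing the data.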

gaussian class-attribute instance-attribute

gaussian = 'gaussian'

Gaussian random projection.

Generate a random D x K matrix R with entries drawn from an N(0, 1/sqrt(K)) distribution, which preserves distances in expectation according to the Johnson-Lindenstrauss lemma.
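The following sketch shows the idea behind a Gaussian projection using only the standard library. It assumes that 1/sqrt(K) is the standard deviation of the entries (the usual Johnson-Lindenstrauss scaling); the helper names `gaussian_projection` and `project` are hypothetical and are not part of this package.

```python
import math
import random

def gaussian_projection(d, k, seed):
    """Return a D x K matrix with i.i.d. zero-mean Gaussian entries."""
    rng = random.Random(seed)
    sigma = 1.0 / math.sqrt(k)  # assumption: 1/sqrt(K) is the standard deviation
    return [[rng.gauss(0.0, sigma) for _ in range(k)] for _ in range(d)]

def project(x, r):
    """Project one row vector x of length D to length K, i.e. y = x @ R."""
    return [sum(x[i] * r[i][j] for i in range(len(x))) for j in range(len(r[0]))]

R = gaussian_projection(d=256, k=128, seed=0)
x = [1.0] * 256
y = project(x, R)

# With this scaling the squared norm is preserved in expectation,
# so the ratio should be close to (but not exactly) 1.
ratio = sum(v * v for v in y) / sum(v * v for v in x)
```

Using a fixed seed makes the matrix reproducible, which matters because the decoder has to regenerate the same matrix.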

RPCodec

RPCodec(
    mae: None | float = None,
    cr: None | float = None,
    k: None | int = None,
    method: str | RPMethod = dct,
    seed: None | int = None,
    max_block_memory: None | int = None,
    debug: bool = False,
)

Bases: Codec

Random projection codec for lossy compression of numerical data.

Compresses 2D finite floating point data by projecting it onto a lower-dimensional subspace using a specified method. The Discrete Cosine Transform (DCT) is used by default.

A two-dimensional array of shape N x D is encoded as an array of shape N x K, where K is either set explicitly via k or derived from the compression ratio cr. Alternatively, K can be estimated from the data during encoding by giving a loose Mean Absolute Error (MAE) bound via mae.

Arrays that are not two-dimensional are automatically reshaped to be 2D as follows:

  • 0D scalar -> 1x1

  • 1D array of shape D -> 1xD

  • 2D array of shape NxD -> NxD

  • >2D arrays: the dimensions are automatically partitioned into two subsets ...N and ...D that balance the product dimensions NxD; to use a different partitioning, you need to manually transpose and reshape the data into 2D before encoding it with this codec
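The reshaping rules above can be sketched as follows. The function `balanced_2d_shape` is hypothetical, and the way higher-dimensional shapes are split (a contiguous split of the axes that minimizes the difference between N and D) is an assumption; the codec's actual partitioning may differ.

```python
import math

def balanced_2d_shape(shape):
    """Map an n-dimensional shape to a 2D shape (N, D)."""
    if len(shape) == 0:
        return (1, 1)  # 0D scalar
    if len(shape) == 1:
        return (1, shape[0])  # 1D array of shape D
    if len(shape) == 2:
        return (shape[0], shape[1])  # already 2D
    # Assumption: split the axes at one contiguous point so that the
    # products N and D are as close to each other as possible.
    best = None
    for split in range(1, len(shape)):
        n = math.prod(shape[:split])
        d = math.prod(shape[split:])
        if best is None or abs(n - d) < abs(best[0] - best[1]):
            best = (n, d)
    return best

print(balanced_2d_shape((2, 3, 4, 5)))  # (6, 20)
```

If a different grouping of axes is wanted, the data can be transposed and reshaped to 2D by hand before it is passed to the codec.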

Initialize Random Projection codec.

Parameters:
  • mae (None | float, default: None ) –

    Target mean absolute error. If specified, k will be estimated from data during encoding. Note that the bound is not guaranteed to be met.

  • cr (None | float, default: None ) –

    Target compression ratio. If specified, k will be calculated as D/cr where D is the number of features in the input data.

  • k (None | int, default: None ) –

    Number of dimensions in the projected space. Will be used over cr if both are specified. Estimated if mae is specified.

  • method (str | RPMethod, default: dct ) –

    Method for generating the projection matrix. Please refer to the RPMethod enumeration for all supported methods.

  • seed (None | int, default: None ) –

    Random seed for reproducible results. If None, the seed is determined non-deterministically at encoding time.

  • max_block_memory (None | int, default: None ) –

    Maximum amount of memory, in bytes, that a single projection matrix block may occupy. Must be non-negative, or -1. If small or zero, the blocks are made as small as possible. If -1, the projection matrix is produced in one block, no matter how large. If None, the available amount of memory is determined non-deterministically at encoding time.

  • debug (bool, default: False ) –

    Whether debug information should be logged during encoding and decoding.

Raises:
  • ValueError

    If not exactly one of mae, cr, or k is set.
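To make the relationship between cr and k concrete, here is a small sketch of how K could be derived for D input features. The helper `resolve_k` is hypothetical, and both the rounding to the nearest integer and the clamping to the range [1, D] are assumptions; the codec's exact rounding behaviour is not documented here.

```python
def resolve_k(d, cr=None, k=None):
    """Hypothetical helper: pick the projected dimension K for D input features."""
    if k is not None:
        if not 1 <= k <= d:
            raise ValueError("k must satisfy 1 <= k <= D")
        return k  # an explicit k is used as-is
    if cr is None or cr <= 0:
        raise ValueError("a positive cr or an explicit k is required")
    # Assumption: round D / cr to the nearest integer and clamp to [1, D].
    return max(1, min(d, round(d / cr)))

print(resolve_k(100, cr=4.0))  # 25
print(resolve_k(100, k=10))    # 10
```

For example, 100 features with a target compression ratio of 4 give K = 100 / 4 = 25 projected dimensions.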

Methods:

  • encode

    Encode data using random projection.

  • decode

    Decode random projection encoded data.

  • get_config

    Get codec configuration.

codec_id class-attribute instance-attribute

codec_id: str = 'rp'

encode

encode(buf: Buffer) -> Buffer

Encode data using random projection.

During encode, the input data is standardized (mean=0, std=1) before projection.

If mae is specified, the number of projected dimensions k is estimated based on the standardized data.

Parameters:
  • buf (Buffer) –

    Input data buffer. Must be an n-dimensional floating-point array.

Returns:
  • enc( bytes ) –

    Serialized encoded data containing:

    • Standardized data statistics (mean, std)

    • Original data shape and dtype

    • Projected data

    • Compression parameters
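The standardization step can be illustrated with a minimal sketch. It assumes that one global mean and one global (population) standard deviation are computed over all values; whether the codec standardizes globally or per feature is not stated here, and the name `standardize` is hypothetical.

```python
import math

def standardize(values):
    """Shift and scale values to mean 0 and standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0.0:
        std = 1.0  # constant input: avoid division by zero
    return [(v - mean) / std for v in values], mean, std

z, mean, std = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(mean, std)  # 5.0 2.0
print(z[0])       # -1.5
```

The mean and std have to be stored alongside the projected data so that decoding can undo the scaling.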

decode

decode(buf: Buffer, out: None | Buffer = None) -> Buffer

Decode random projection encoded data.

During decode, the standardized data is reconstructed and denormalized.

Parameters:
  • buf (Buffer) –

    Encoded data from RPCodec.

  • out (None | Buffer, default: None ) –

    Writeable buffer to store decoded data.

Returns:
  • dec( Buffer ) –

    Reconstructed data with original shape and dtype.
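The sketch below walks through a full round trip in plain Python to show why decoding is lossy: standardize, project onto K DCT basis vectors, reconstruct, and denormalize. It is not the codec's implementation. It assumes an orthonormal DCT-II basis, global standardization, and reconstruction by multiplying with the transpose of the projection matrix; all function names are hypothetical.

```python
import math

def dct2_basis(d, k):
    """D x K matrix with the first K orthonormal DCT-II basis vectors as columns."""
    return [
        [
            math.sqrt((1.0 if j == 0 else 2.0) / d)
            * math.cos(math.pi * (n + 0.5) * j / d)
            for j in range(k)
        ]
        for n in range(d)
    ]

def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def round_trip(data, k):
    """Standardize, project to K dimensions, reconstruct, and denormalize."""
    flat = [v for row in data for v in row]
    mean = sum(flat) / len(flat)
    std = math.sqrt(sum((v - mean) ** 2 for v in flat) / len(flat)) or 1.0
    z = [[(v - mean) / std for v in row] for row in data]
    r = dct2_basis(len(data[0]), k)
    encoded = matmul(z, r)                # N x K projected data
    r_t = [list(col) for col in zip(*r)]  # K x D transpose of R
    z_hat = matmul(encoded, r_t)          # N x D reconstruction
    return [[v * std + mean for v in row] for row in z_hat]

def mean_abs_error(a, b):
    """Mean absolute error between two equally shaped nested lists."""
    diffs = [abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)

data = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]]
mae_full = mean_abs_error(data, round_trip(data, k=4))   # K = D: near-exact
mae_lossy = mean_abs_error(data, round_trip(data, k=2))  # K < D: lossy
```

With K equal to D the reconstruction is exact up to floating-point error, while a smaller K trades reconstruction accuracy for a smaller encoded array.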

get_config

get_config() -> dict

Get codec configuration.

Returns:
  • dict –

    Codec configuration.