pygra.dataset

dataset.py — data loading and transform operations

class pygra.dataset.DataSet(path, step=1)

Bases: object

Load a whitespace-delimited data file into a NumPy array.

Lines beginning with # and blank lines are silently ignored. Rows that cannot be fully converted to floats are skipped and recorded in skipped_rows.

Parameters:
  • path (str) – Absolute or relative path to the data file.

  • step (int, optional) – Load every step-th row (default 1, no downsampling).
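The parsing rules above can be illustrated with a small standalone sketch. parse_lines is a hypothetical helper, not part of pygra. It assumes that tokens are split on any whitespace, that a line counts as a comment when its first non-blank character is #, and that skipped_rows stores each rejected line without its trailing newline.

```python
from typing import List, Tuple


def parse_lines(lines: List[str]) -> Tuple[List[List[float]], List[Tuple[int, str]]]:
    """Hypothetical re-creation of the documented parsing rules (not pygra's code)."""
    rows: List[List[float]] = []
    skipped: List[Tuple[int, str]] = []
    for lineno, line in enumerate(lines, start=1):  # 1-based line numbers
        stripped = line.strip()
        # Assumption: leading whitespace is ignored when checking for "#".
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are dropped silently
        try:
            rows.append([float(tok) for tok in stripped.split()])
        except ValueError:
            # Any token that float() rejects causes the whole row to be skipped.
            skipped.append((lineno, line.rstrip("\n")))
    return rows, skipped
```

Under these assumptions, a row such as "3 x" is rejected as a whole rather than partially converted, which matches the "fully converted" wording above.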

path

Path passed to the constructor.

Type:

str

name

Filename component of path (basename).

Type:

str

raw

Parsed data rows as a nested list, populated during loading.

Type:

list of list of float

arr

2-D float64 array of shape (nrows, ncols). Shape is (0, 0) when no valid rows were found.

Type:

numpy.ndarray
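The explicit (0, 0) fallback matters because np.asarray([]) yields a 1-D array of shape (0,), which has no shape[1]. A minimal sketch of building arr from raw follows; rows_to_array is a hypothetical helper, and it assumes every parsed row has the same number of columns.

```python
from typing import List

import numpy as np


def rows_to_array(raw: List[List[float]]) -> np.ndarray:
    """Hypothetical helper: build a 2-D float64 array, or shape (0, 0) if raw is empty."""
    if not raw:
        # np.asarray([]) would give shape (0,), so build the empty 2-D array explicitly.
        return np.empty((0, 0), dtype=np.float64)
    # Assumes uniform row lengths; ragged input makes NumPy raise ValueError.
    return np.asarray(raw, dtype=np.float64)
```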

skipped_rows

One (line_number, raw_content) tuple for every row that could not be parsed. Line numbers are 1-based.

Type:

list of tuple[int, str]

__init__(path, step=1)
Parameters:
  • path (str) – Path to the data file to load.

  • step (int, optional) – Load every step-th row (default 1, no downsampling).
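One plausible reading of step is sketched below: keep the parsed data rows at positions 0, step, 2*step, and so on. The documentation does not say whether step counts data rows or physical file lines (including comments), nor how values below 1 are handled; downsample and its validation are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def downsample(rows: Sequence[T], step: int = 1) -> List[T]:
    """Keep every step-th row, starting with the first (assumed semantics)."""
    if step < 1:
        # Assumption: non-positive steps are rejected rather than silently accepted.
        raise ValueError("step must be a positive integer")
    return list(rows[::step])
```

With step=1 the slice returns every row, matching the documented "no downsampling" default.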

property ncols: int

Number of columns in the loaded data.

Returns:

arr.shape[1] when the array is 2-D and non-empty, otherwise 0.

Return type:

int

property nrows: int

Number of data rows.

Returns:

arr.shape[0].

Return type:

int
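The two Returns clauses above translate directly into array checks. nrows and ncols below are hypothetical free functions that take the array itself, written to mirror the documented behaviour rather than copied from pygra.

```python
import numpy as np


def nrows(arr: np.ndarray) -> int:
    # Documented as arr.shape[0]; arr is always 2-D, so this is safe.
    return int(arr.shape[0])


def ncols(arr: np.ndarray) -> int:
    # arr.shape[1] only for a non-empty 2-D array, otherwise 0.
    if arr.ndim == 2 and arr.size > 0:
        return int(arr.shape[1])
    return 0
```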

col(idx)

Return a copy of column idx.

Parameters:

idx (int) – Column index (0-based).

Returns:

Copy of the column as a 1-D float64 array, or None if idx is out of range.

Return type:

numpy.ndarray or None
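The copy matters because arr[:, idx] on its own is a NumPy view, so writing to it would silently change the dataset. The sketch below is a hypothetical stand-alone equivalent of col; it assumes negative indices count as out of range instead of wrapping, which the documentation does not specify.

```python
from typing import Optional

import numpy as np


def col(arr: np.ndarray, idx: int) -> Optional[np.ndarray]:
    """Hypothetical equivalent of DataSet.col: a copy of column idx, or None."""
    ncols = arr.shape[1] if arr.ndim == 2 and arr.size > 0 else 0
    # Assumption: only 0 <= idx < ncols is valid; negative indices return None.
    if not 0 <= idx < ncols:
        return None
    return arr[:, idx].copy()  # .copy() detaches the result from arr
```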

pygra.dataset.apply_transform(dataset, cfg)

Apply a transform operation to one column of a DataSet.

Parameters:
  • dataset (DataSet) – The dataset to modify in place. The result is appended as a new column when cfg["new_col"] is True, or overwrites column cfg["col"] when it is False.

  • cfg (dict) –

    Transform configuration with the following keys:

    "col" (int) – Target column index.

    "op" (str) – Operation name; must be one of the strings in constants.TRANSFORM_OPS.

    "val" (float) – Scalar operand for arithmetic and normalisation operations, or window size for the moving average.

    "xcol" (int, optional) – x-column index; required when op is "numerical derivative (dy/dx)".

    "new_col" (bool, optional) – If True (default) the result is appended as a new column; if False it overwrites column "col".
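The "xcol" and "val" keys are easiest to understand through the two operations that use them. The sketch below is an assumption about how those operations could be computed, not pygra's implementation: np.gradient is one common way to take dy/dx, and np.convolve with mode="same" is one possible moving-average edge policy (the ends are averaged against implicit zeros). Only the op string "numerical derivative (dy/dx)" comes from this documentation; the function names are hypothetical.

```python
import numpy as np


def derivative(y: np.ndarray, x: np.ndarray) -> np.ndarray:
    # Central differences in the interior, one-sided differences at the ends.
    return np.gradient(y, x)


def moving_average(y: np.ndarray, window: float) -> np.ndarray:
    n = int(window)  # cfg["val"] is documented as a float, so truncate to a point count
    if n < 1 or n > len(y):
        # With mode="same", a kernel longer than y would change the output length.
        raise ValueError("window must be between 1 and len(y)")
    kernel = np.ones(n) / n
    # Centered mean; the first and last values include implicit zero padding.
    return np.convolve(y, kernel, mode="same")
```

Under these assumptions, a config such as {"col": 1, "op": "numerical derivative (dy/dx)", "xcol": 0} would correspond to derivative(arr[:, 1], arr[:, 0]).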

Returns:

1-D array containing the transform result.

Return type:

numpy.ndarray

Raises:

ValueError – If the target column does not exist, if division or normalisation by zero is attempted, or if op is not a recognised operation.
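The new_col switch and two of the documented ValueError cases can be sketched as follows. Table is a hypothetical stand-in that models only the arr attribute of DataSet, and store_result and safe_divide are illustrative helpers; pygra's exact checks and error messages are not documented here. Because NumPy arrays cannot grow in place, "extended in place" is modelled by rebinding the attribute to a wider array.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Table:
    """Hypothetical stand-in for DataSet; only arr is modelled."""
    arr: np.ndarray


def store_result(table: Table, result: np.ndarray, col: int, new_col: bool = True) -> None:
    if not 0 <= col < table.arr.shape[1]:
        raise ValueError(f"column {col} does not exist")
    if new_col:
        # Rebind arr to a copy with one extra column: shape (nrows, ncols + 1).
        table.arr = np.column_stack([table.arr, result])
    else:
        table.arr[:, col] = result  # overwrite the existing column in place


def safe_divide(y: np.ndarray, val: float) -> np.ndarray:
    # Division by zero is rejected up front instead of producing inf/nan.
    if val == 0:
        raise ValueError("division by zero")
    return y / val
```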