pygra.dataset

dataset.py — data loading and transform operations

class pygra.dataset.DataSet(path, step=1)

Bases: object

Load a whitespace-delimited data file into a NumPy array.

Lines beginning with # and blank lines are silently ignored. Rows that cannot be fully converted to floats are skipped and recorded in skipped_rows.

Parameters:

path (str) – Absolute or relative path to the data file.
step (int, optional) – Load every step-th row (default 1, no downsampling).

path

Path passed to the constructor.

Type: str

name

Filename component of path (basename).

Type: str

raw

Parsed data rows as a nested list, populated during loading.

Type: list of list of float

arr

2-D float64 array of shape (nrows, ncols). Shape is (0, 0) when no valid rows were found.

Type: numpy.ndarray

skipped_rows

One (line_number, raw_content) tuple for every row that could not be parsed. Line numbers are 1-based.

Type: list of tuple[int, str]

__init__(path, step=1)

Parameters:

path (str) – Path to the data file to load.
step (int, optional) – Load every step-th row (default 1, no downsampling).
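The loading rules described for DataSet can be sketched with a small standalone function. This is an illustration only, not pygra's implementation: parse_rows is a hypothetical helper, and it assumes that step is applied to the valid rows after comments, blank lines, and unparseable rows have been dropped, and that every valid row has the same number of columns.

```python
import numpy as np

def parse_rows(lines, step=1):
    """Hypothetical helper (not part of pygra): parse whitespace-delimited text."""
    raw, skipped = [], []
    for lineno, line in enumerate(lines, start=1):  # 1-based line numbers
        text = line.strip()
        if not text or text.startswith("#"):
            continue  # blank lines and comments are ignored silently
        try:
            raw.append([float(tok) for tok in text.split()])
        except ValueError:
            # the row could not be fully converted to floats: record it
            skipped.append((lineno, line.rstrip("\n")))
    raw = raw[::step]  # assumption: downsampling counts valid rows only
    arr = np.array(raw, dtype=np.float64) if raw else np.empty((0, 0))
    return arr, skipped

lines = ["# x y\n", "0 1.0\n", "\n", "1 oops\n", "2 3.0\n", "3 4.5\n"]
arr, skipped = parse_rows(lines)
print(arr.shape)  # (3, 2)
print(skipped)    # [(4, '1 oops')]
```

With step=2 the same input keeps the first and third valid rows, and an input without any valid rows yields an array of shape (0, 0).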

property ncols: int

Number of columns in the loaded data.

Returns: arr.shape[1] when the array is 2-D and non-empty, otherwise 0.
Return type: int

property nrows: int

Number of data rows.

Returns: arr.shape[0].
Return type: int

col(idx)

Return a copy of column idx.

Parameters: idx (int) – Column index (0-based).
Returns: Copy of the column as a 1-D float64 array, or None if idx is out of range.
Return type: numpy.ndarray or None
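The copy-or-None contract of col can be illustrated against a plain NumPy array. The function below is a hypothetical stand-in, not the library's code, and it assumes that negative indices count as out of range, which the documentation does not state.

```python
import numpy as np

def column_copy(arr, idx):
    """Hypothetical stand-in for DataSet.col, written against a plain array."""
    # same rule as the ncols property: 0 unless the array is 2-D and non-empty
    ncols = arr.shape[1] if arr.ndim == 2 and arr.size else 0
    if not 0 <= idx < ncols:  # assumption: negative indices are rejected
        return None
    return arr[:, idx].copy()

arr = np.array([[0.0, 1.0], [2.0, 3.0]])
y = column_copy(arr, 1)
y[0] = 99.0  # editing the copy leaves the source array untouched
print(arr[0, 1])            # 1.0
print(column_copy(arr, 2))  # None
```

Because the result is a copy rather than a view, callers can modify it freely, but they should check for None before using it.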

pygra.dataset.apply_transform(dataset, cfg)

Apply a transform operation to one column of a DataSet.

Parameters:

dataset (DataSet) – The dataset to modify in place. The result is appended as a new column when cfg["new_col"] is True and overwrites column cfg["col"] when it is False.
cfg (dict) – Transform configuration with the following keys:

"col" (int) – Target column index.

"op" (str) – Operation name; must be one of the strings in constants.TRANSFORM_OPS.

"val" (float) – Scalar operand for arithmetic and normalisation operations, or window size for the moving average.

"xcol" (int, optional) – x-column index; required when op is "numerical derivative (dy/dx)".

"new_col" (bool, optional) – If True (default), the result is appended as a new column; if False, it overwrites column "col".

Returns:

1-D array containing the transform result.

Return type:

numpy.ndarray

Raises:

ValueError – If the target column does not exist, if division or normalisation by zero is attempted, or if op is not a recognised operation.
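As an illustration of the cfg layout and of the append-versus-overwrite behaviour, the sketch below builds a configuration for the derivative operation and reproduces its effect on a plain NumPy array. It does not call pygra: np.gradient is a swapped-in stand-in for whatever difference scheme the library uses, and treating "val" as unused for this operation is an assumption.

```python
import numpy as np

# Configuration in the documented shape
cfg = {
    "col": 1,                                # y column
    "op": "numerical derivative (dy/dx)",
    "val": 0.0,                              # assumption: unused for this op
    "xcol": 0,                               # x column, required for dy/dx
    "new_col": True,
}

# Columns: x, y = x**2
arr = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 4.0], [3.0, 9.0]])

# Stand-in for the transform: central differences inside, one-sided at the ends
result = np.gradient(arr[:, cfg["col"]], arr[:, cfg["xcol"]])

if cfg.get("new_col", True):
    out = np.column_stack([arr, result])     # append as a new column
else:
    out = arr.copy()
    out[:, cfg["col"]] = result              # overwrite the target column

print(result)     # [1. 2. 4. 5.]
print(out.shape)  # (4, 3)
```

Setting "new_col" to False in this sketch keeps the shape at (4, 2) and replaces the y column with the derivative values.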