pygra.dataset
dataset.py — data loading and transform operations
- class pygra.dataset.DataSet(path, step=1)[source]
Bases: object
Load a whitespace-delimited data file into a NumPy array.
Lines beginning with # and blank lines are silently ignored. Rows that cannot be fully converted to floats are skipped and recorded in skipped_rows.
- Parameters:
path (str) – Absolute or relative path to the data file.
step (int) – Load every step-th row (default 1, no downsampling).
- path
Path passed to the constructor.
- Type:
str
- name
Filename component of path (basename).
- Type:
str
- raw
Parsed data rows as a nested list, populated during loading.
- Type:
list of list of float
- arr
2-D float64 array of shape (nrows, ncols). Shape is (0, 0) when no valid rows were found.
- Type:
numpy.ndarray
- skipped_rows
(line_number, raw_content) for every row that could not be parsed. Line numbers are 1-based.
- Type:
list of tuple[int, str]
- __init__(path, step=1)[source]
- Parameters:
path (str) – Path to the data file to load.
step (int, optional) – Load every step-th row (default 1, no downsampling).
- property ncols: int
Number of columns in the loaded data.
- Returns:
arr.shape[1] when the array is 2-D and non-empty, otherwise 0.
- Return type:
int
- property nrows: int
Number of data rows.
- Returns:
arr.shape[0].
- Return type:
int
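The loading rules described for DataSet can be illustrated with a minimal, standalone sketch. This is not the pygra implementation: load_rows is a hypothetical helper, and it assumes that step is applied to the successfully parsed rows (the documentation does not say whether skipped or comment lines count towards the stride) and that all valid rows have the same number of columns.

```python
import io
import numpy as np


def load_rows(lines, step=1):
    """Hypothetical stand-in that mirrors the documented parsing rules."""
    raw, skipped = [], []
    for lineno, line in enumerate(lines, start=1):  # 1-based line numbers
        text = line.strip()
        if not text or text.startswith("#"):
            continue  # blank lines and comments are silently ignored
        try:
            raw.append([float(tok) for tok in text.split()])
        except ValueError:
            # Row could not be fully converted to floats: record and skip it.
            skipped.append((lineno, line.rstrip("\n")))
    raw = raw[::step]  # assumption: downsampling acts on parsed rows only
    arr = np.array(raw, dtype=np.float64) if raw else np.empty((0, 0))
    return arr, skipped


sample = io.StringIO("# x y\n0 1.0\n\n1 oops\n2 4.0\n")
arr, skipped = load_rows(sample)
print(arr.shape)
print(skipped)
```

In this sketch the malformed fourth line ends up in the skipped list together with its line number, while the comment and the blank line leave no trace, which matches the behaviour described for skipped_rows.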
- pygra.dataset.apply_transform(dataset, cfg)[source]
Apply a transform operation to one column of a DataSet.
- Parameters:
dataset (DataSet) – The dataset to modify. Extended in-place when cfg["new_col"] is True; column overwritten when False.
cfg (dict) – Transform configuration with the following keys:
  "col" (int) – Target column index.
  "op" (str) – Operation name; must be one of the strings in constants.TRANSFORM_OPS.
  "val" (float) – Scalar operand for arithmetic and normalisation operations, or window size for the moving average.
  "xcol" (int, optional) – x-column index; required when op is "numerical derivative (dy/dx)".
  "new_col" (bool, optional) – If True (default) the result is appended as a new column; if False it overwrites column "col".
- Returns:
1-D array containing the transform result.
- Return type:
numpy.ndarray
- Raises:
ValueError – If the target column does not exist, if division or normalisation by zero is attempted, or if op is not a recognised operation.
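The cfg layout and the append-versus-overwrite behaviour can be sketched on a plain NumPy array. The function sketch_transform below is hypothetical and handles only the one operation name that appears in this page; using np.gradient for the derivative is an assumption, since the numerical scheme pygra applies is not stated here. The "val" key is included because it is listed as part of cfg, although whether it is read for this operation is not documented.

```python
import numpy as np

# Two columns: x and y = x**2.
arr = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 4.0], [3.0, 9.0]])

cfg = {
    "col": 1,                                # column to transform
    "op": "numerical derivative (dy/dx)",    # only op name given in the docs
    "val": 0.0,                              # placeholder; use for this op is unknown
    "xcol": 0,                               # x column, required for the derivative
    "new_col": True,                         # append the result as a new column
}


def sketch_transform(arr, cfg):
    """Hypothetical illustration of the documented cfg semantics."""
    if not 0 <= cfg["col"] < arr.shape[1]:
        raise ValueError("target column does not exist")
    if cfg["op"] != "numerical derivative (dy/dx)":
        raise ValueError("operation not covered by this sketch")
    # Assumption: central differences via np.gradient.
    result = np.gradient(arr[:, cfg["col"]], arr[:, cfg["xcol"]])
    if cfg.get("new_col", True):
        out = np.column_stack([arr, result])   # extend with a new column
    else:
        out = arr.copy()
        out[:, cfg["col"]] = result            # overwrite the target column
    return out, result


out, result = sketch_transform(arr, cfg)
print(out.shape)
print(result)
```

Setting "new_col" to False in the same cfg keeps the column count unchanged and replaces column 1 with the derivative instead.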