Reference: Datasets

Note

Since you’re here, you probably deserve to know how the source code is organised. Dataset classes are implemented using mixins. It’s not pretty, but I think it’s the best way, and I spent a long time thinking about this. I’ve tried to follow the principles outlined in Chapter 12 of Luciano Ramalho’s Fluent Python, and I think I’ve succeeded.

Anyway, what happens is that we have a _Dataset superclass which contains all the top-level behaviour that is shared between all datasets. As of the time of writing, this behaviour is only the dictionary-style parameter lookup. After that, we have a series of mixins for 1D raw data, 2D raw data, 1D processed data, 2D processed data, 1D plotting, and 2D plotting. (All of these mixins have very recognisable names in the source code.) The Dataset1D class simply inherits the methods that it needs to. In this case it would inherit from _Dataset, _1D_RawDataMixin, _1D_ProcDataMixin, and _1D_PlotMixin. Likewise, the Dataset2D class inherits all the 2D methods.

This seems to be an utterly stupid use of mixins until you realise that Dataset1DProj inherits from _2D_RawDataMixin, _1D_ProcDataMixin, and _1D_PlotMixin. In other words, it has behaviour that sometimes mimics the 2D data, and sometimes mimics 1D data. So I chose to use multiple inheritance as a way of not repeating code. I’ve provided “aggregate classes” which have useful groups of methods, and the user should never actually have to deal with any of the issues associated with multiple inheritance.

class penguins.dataset._Dataset(path, **kwargs)

Defines behaviour that is common to all datasets. Specifically, this class defines the dictionary-style lookup of NMR parameters.

This should never be instantiated directly! Use read() for that instead.

Attributes:
pars : _parDict

Case-insensitive dictionary in which the parameters are stored. See the _parDict documentation for more details. This should never be accessed directly as the _Dataset special methods (__getitem__, etc.) are routed to the underlying _parDict.


class penguins.dataset._parDict(path)

Modified dictionary for storing acquisition & processing parameters.

Parameter names are stored as lower-case strings. When looking up a parameter, if its value is not already stored, the dictionary will look it up in the associated TopSpin parameter files. Subsequently, the value is cached.

Therefore, the dictionary can be treated as if it were already fully populated when initialised. However, because values are only read and stored on demand, we also avoid cluttering it with a whole range of useless parameters upon initialisation.
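The on-demand lookup and caching described above might look something like the sketch below. `LazyParDict` and `fake_loader` are hypothetical names used only for illustration; the real `_parDict` reads from TopSpin parameter files, which is replaced here by a plain function.

```python
class LazyParDict(dict):
    """Hypothetical stand-in for _parDict: case-insensitive, filled on demand."""

    def __init__(self, loader):
        super().__init__()
        self._loader = loader  # callable taking a lower-case name, returning a value

    def __getitem__(self, key):
        key = key.lower()  # parameter names are stored in lower case
        if key not in self:
            # First access: fetch the value once and cache it.
            super().__setitem__(key, self._loader(key))
        return super().__getitem__(key)


calls = []  # records every time the (fake) parameter file is consulted

def fake_loader(name):
    calls.append(name)
    return {"td": 65536, "sw": 12.0}[name]

pars = LazyParDict(fake_loader)
first = pars["TD"]   # triggers a lookup
second = pars["td"]  # served from the cache, no second lookup
```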

The lookup function attempts to be clever and convert the parameters to floats if possible; otherwise, the parameters are stored as strings. There are a number of exceptions to this rule, such as TD and SI, which should be ints and not floats. The list of such parameters is extracted from the TopSpin documentation.
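As a rough sketch of that conversion rule: `convert_parameter` and `INT_PARAMETERS` are hypothetical names, and the set below contains only the two examples mentioned above, not the full list.

```python
# Only the two examples named in the text; the real list is longer.
INT_PARAMETERS = {"td", "si"}

def convert_parameter(name, raw):
    """Convert a raw string value to int, float, or leave it as a string."""
    if name.lower() in INT_PARAMETERS:
        return int(raw)
    try:
        return float(raw)
    except ValueError:
        # Not numeric (e.g. a pulse program name), so keep the string.
        return raw
```

For example, `convert_parameter("TD", "65536")` gives an int, `convert_parameter("SW", "12.5")` gives a float, and `convert_parameter("PULPROG", "zg30")` is returned unchanged.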

For 2D spectra, string parameters are stored as a tuple of (f1_value, f2_value). Float and int parameters are stored as an ndarray to facilitate elementwise manipulations (e.g. calculating O1P in both dimensions in one go).


1D mixins

class penguins.dataset._1D_RawDataMixin

Defines behaviour that is applicable for 1D raw data, i.e. fid files. Also contains a few private methods which initialise (for example) the paths to the parameter files.

Attributes:
fid : ndarray

Complex-valued FID.

raw_data(shift_grpdly=False)

Returns the FID as a complex ndarray.

Parameters:
shift_grpdly : bool, default False

Whether to circularly shift the group delay to the end of the FID, i.e. take the first N points (where N is given by the TopSpin GRPDLY parameter) and move them to the end of the FID.
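The circular shift itself is simple to picture. The sketch below uses a plain list and a hypothetical helper name, and it assumes that n has already been obtained as a whole number of points; how GRPDLY is turned into that number is not covered here.

```python
def shift_group_delay(fid, n):
    """Move the first n points of the FID to the end (circular shift)."""
    return fid[n:] + fid[:n]

shifted = shift_group_delay([1, 2, 3, 4, 5], 2)
```

For an ndarray, `numpy.roll(fid, -n)` performs the same rotation.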


class penguins.dataset._1D_ProcDataMixin

Defines behaviour that is applicable for 1D processed data.

Attributes:
real : ndarray

Real-valued ndarray containing the real spectrum. real[0] contains the left-most point of the spectrum, i.e. the greatest chemical shift.

imag : ndarray

Real-valued ndarray containing the imaginary spectrum. imag[0] contains the left-most point of the spectrum, i.e. the greatest chemical shift.

This attribute only applies to Dataset1D instances. The projection classes do not have this attribute.

proc_data(bounds='', component='real')

Returns the processed spectrum as a real-valued ndarray. By default this returns the real part of the spectrum, but this can be changed using the component argument.

Note that if (for example) a magnitude mode calculation has been performed, then the “real” part is actually the magnitude mode spectrum. In short, the “real” part is whatever is stored in the 1r file.

Parameters:
bounds : str or (float, float), optional

Bounds can be specified as a string lower..upper or a tuple of floats (lower, upper), upon which the appropriate slice of the spectrum will be taken.

component : str from {“real”, “r”, “imag”, “i”} (default “real”)

The component of the processed data to return. “real” or “r” return the real part of the spectrum, the others return the imaginary part.

Returns:
ndarray

The spectrum or the slice of interest.

integrate(peak=None, margin=None, bounds=None, mode='sum')

Integrates a region of a spectrum.

Regions can either be defined via peak and margin, which leads to the region of (peak - margin) to (peak + margin), or manually via the bounds parameter. Note that specifying (peak, margin) overrules the bounds parameter if both are passed.

Parameters:
peak : float, optional

The chemical shift of the peak of interest.

margin : float, optional

The integration margin which extends on either side of the peak.

bounds : str or (float, float), optional

Integration bounds which can be directly specified in the usual format. Note that passing (peak, margin) will overrule this parameter.

mode : {“sum”, “max”, “min”}, optional

Mode of integration. sum (the default) directly adds up all points in the region, max finds the greatest intensity, and min finds the lowest intensity.

Returns:
float

The value of the integral.
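To make the three modes and the precedence of (peak, margin) over bounds concrete, here is a small sketch on plain lists. `integrate_region` is a hypothetical function, not the library's implementation; it takes bounds only as a tuple, and falling back to the whole spectrum when nothing is specified is an assumption of this sketch.

```python
def integrate_region(shifts, intensities, peak=None, margin=None,
                     bounds=None, mode="sum"):
    """Reduce the points whose chemical shift lies inside the region."""
    if peak is not None and margin is not None:
        # (peak, margin) overrules bounds if both are given.
        lower, upper = peak - margin, peak + margin
    elif bounds is not None:
        lower, upper = bounds
    else:
        # Assumption: no region given means the whole spectrum.
        lower, upper = min(shifts), max(shifts)
    region = [y for x, y in zip(shifts, intensities) if lower <= x <= upper]
    reducers = {"sum": sum, "max": max, "min": min}
    return reducers[mode](region)


shifts = [5.0, 4.0, 3.0, 2.0, 1.0]        # descending, as in a spectrum
intensities = [1.0, 2.0, 6.0, 3.0, 1.0]
total = integrate_region(shifts, intensities, peak=3.0, margin=1.0)
highest = integrate_region(shifts, intensities, peak=3.0, margin=1.0, mode="max")
```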

bounds_to_slice(bounds='')

Converts a string lower..upper or a tuple of chemical shifts (lower, upper) to a slice object, which can be used to slice a spectrum ndarray. Note that upper must be greater than lower.
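One way to picture the conversion from bounds to a slice is sketched below. Both functions are hypothetical stand-ins written for this page: they assume that the chemical shifts are available as a list in descending order, that tuples are given as (lower, upper), and that an empty string selects the whole spectrum.

```python
def parse_bounds(bounds=""):
    """Return (lower, upper) as floats, or (None, None) for no bounds."""
    if isinstance(bounds, str):
        if bounds == "":
            return None, None
        lower, upper = bounds.split("..")
    else:
        lower, upper = bounds
    return float(lower), float(upper)


def bounds_to_slice(shifts, bounds=""):
    """Build a slice selecting the points with lower <= shift <= upper."""
    lower, upper = parse_bounds(bounds)
    if lower is None:
        return slice(None)
    if lower > upper:
        raise ValueError("upper must be greater than lower")
    inside = [i for i, x in enumerate(shifts) if lower <= x <= upper]
    if not inside:
        return slice(0, 0)  # empty region
    return slice(inside[0], inside[-1] + 1)


shifts = [5.0, 4.0, 3.0, 2.0, 1.0]
region = shifts[bounds_to_slice(shifts, "2..4")]
```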

to_magnitude()

Calculates the magnitude mode spectrum and returns it as a new Dataset1D object.

mc()

Alias for to_magnitude().
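Assuming the conventional definition of a magnitude mode spectrum, i.e. the square root of the sum of the squared real and imaginary parts at each point, the calculation amounts to the following. The function name is hypothetical and plain lists are used in place of ndarrays.

```python
import math

def magnitude(real, imag):
    """Pointwise sqrt(real**2 + imag**2)."""
    return [math.hypot(r, i) for r, i in zip(real, imag)]

mag = magnitude([3.0, 0.0, -5.0], [4.0, 2.0, 12.0])
```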


class penguins.dataset._1D_PlotMixin

Defines 1D plotting methods.

stage(*args, **kwargs)

Calls penguins.pgplot._stage1d() on the dataset.


2D mixins

class penguins.dataset._2D_RawDataMixin

Defines behaviour that is applicable for 2D raw data, i.e. ser files.

There are no functions which actually read the ser file (I haven’t implemented those yet, as I’ve never needed it), but this mixin defines a few private methods which initialise (for example) the paths to the parameter files, so it’s not useless at all.


class penguins.dataset._2D_ProcDataMixin

proc_data(f1_bounds='', f2_bounds='', component='rr')

Returns the processed 2D data as a two-dimensional, real-valued ndarray. By default this returns the real part of the spectrum (the ‘RR quadrant’), but this can be changed using the component argument.

Note that if a magnitude mode calculation has been performed, this will return the magnitude mode spectrum (i.e. it returns whatever is in TopSpin’s 2rr file).

Parameters:
f1_bounds : str or (float, float), optional

Bounds for the indirect dimension.

f2_bounds : str or (float, float), optional

Bounds for the direct dimension.

component : str from {“rr”, “ri”, “ir”, “ii”} (default “rr”)

The quadrant of the processed data to return.

Returns:
ndarray

The processed 2D data, or the section of interest.

integrate(peak=None, margin=None, f1_bounds=None, f2_bounds=None, mode='sum')

Integrates a region of a spectrum.

The interface is exactly analogous to the 1D version (integrate()), except that peak and margin now need to be specified as tuples of (f1_shift, f2_shift), or bounds must be specified as f1_bounds and f2_bounds separately.

Parameters:
peak : (float, float), optional

The chemical shifts of the peak of interest.

margin : (float, float), optional

The integration margins, which extend on all sides of the peak. The first number refers to the margin in the indirect dimension, the second to the margin in the direct dimension.

f1_bounds : str or (float, float), optional

Integration bounds for the indirect dimension which can be directly specified in the usual format. Note that passing (peak, margin) will overrule the f1_bounds and f2_bounds parameters.

f2_bounds : str or (float, float), optional

Integration bounds for the direct dimension which can be directly specified in the usual format.

mode : {“sum”, “max”, “min”}, optional

Mode of integration. sum (the default) directly adds up all points in the region, max finds the greatest intensity, and min finds the lowest intensity.

Returns:
float

The value of the integral.

bounds_to_slice(axis, bounds='')

Converts a string lower..upper or a tuple of chemical shifts (lower, upper) to a slice object, which can be used to slice a spectrum ndarray.

Parameters:
axis : int from {0, 1}

0 for indirect dimension, 1 for direct dimension.

bounds : str or (float, float), optional

Bounds given in the usual format.

Returns:
slice

Slice object for the requested axis.

to_magnitude(axis)

Calculates the magnitude mode spectrum along the specified axis and returns it as a new Dataset2D object.

Parameters:
axis : int from {0, 1}

The axis along which to perform the magnitude calculation. 0 for f1, or 1 for f2.

xf1m()

Alias for to_magnitude(axis=0), i.e. magnitude mode calculation along f1.

xf2m()

Alias for to_magnitude(axis=1), i.e. magnitude mode calculation along f2.

xfbm()

Performs magnitude mode calculation along both axes. ds.xfbm() is equivalent to ds.xf1m().xf2m(). It is manually implemented here for efficiency reasons.


class penguins.dataset._2D_PlotMixin

Defines 2D plotting methods.

stage(*args, **kwargs)

Calls penguins.pgplot._stage2d() on the dataset.

find_baselev(*args, **kwargs)

Calls penguins.pgplot._find_baselev() on the dataset.


Actual Dataset classes

These are the classes that the user will see. Even then, much of the interface is abstracted away: for example, the staging and plotting functions have a unified interface that delegates to different methods behind the scenes depending on the object that is being staged / plotted.

class penguins.dataset.Dataset1D(path, **kwargs)

Dataset object representing 1D spectra.

Inherits from: _1D_RawDataMixin, _1D_ProcDataMixin, _1D_PlotMixin, and _Dataset.

ppm_to_index(ppm)

Converts a chemical shift into the index which is closest to the chemical shift.

Parameters:
ppm : float (optional)

The chemical shift of interest.

Returns:
index : int

The index, or None if ppm is None.

ppm_scale(bounds='')

Constructs an ndarray of the chemical shifts at each point of the spectrum, in descending order of chemical shift.

This is used in generating the x-values for plotting.

Parameters:
bounds : str or (float, float), optional

Bounds specified in the usual manner.

Returns:
scale : ndarray

The appropriate slice of chemical shifts.
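The relationship between the chemical shift scale and ppm_to_index() can be illustrated with a short sketch. It rests on an assumption that is not stated on this page: that the spectrum spans O1P ± SW/2 in SI evenly spaced points. The functions below are hypothetical, take the parameters explicitly, and use plain lists.

```python
def ppm_scale(o1p, sw, si):
    """Chemical shifts of all si points, in descending order."""
    max_ppm = o1p + sw / 2
    step = sw / (si - 1)
    return [max_ppm - i * step for i in range(si)]


def ppm_to_index(ppm, o1p, sw, si):
    """Index of the point closest to ppm, or None if ppm is None."""
    if ppm is None:
        return None
    max_ppm = o1p + sw / 2
    step = sw / (si - 1)
    return round((max_ppm - ppm) / step)


# Toy parameters: an 11-point spectrum running from 10 ppm down to 0 ppm.
scale = ppm_scale(o1p=5.0, sw=10.0, si=11)
idx = ppm_to_index(7.2, o1p=5.0, sw=10.0, si=11)
```

Here `scale[idx]` is 7.0, the grid point nearest to 7.2 ppm.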

hz_scale(bounds='')

Constructs an ndarray of the frequencies (in units of Hz) at each point of the spectrum, in descending order of frequency.

Parameters:
bounds : str or (float, float), optional

Bounds specified in the usual manner.

Returns:
scale : ndarray

The appropriate slice of frequencies.

nuclei_to_str()

Returns a string with the nucleus nicely formatted in LaTeX syntax. Can be directly used with e.g. matplotlib.


class penguins.dataset.Dataset1DProj(path, **kwargs)

Dataset object representing 1D projections or slices of 2D spectra, which have been generated inside TopSpin.

Inherits from: _2D_RawDataMixin, _1D_ProcDataMixin, _1D_PlotMixin, and _Dataset.

Notes

The implementation of these methods has to be different from the equivalent methods on Dataset1D, because the parameters (e.g. O1, SW) are read as 2-element arrays (for both dimensions) but the returned value must select the correct projection axis.

ppm_to_index(ppm)

Converts a chemical shift into the index which is closest to the chemical shift.

Parameters:
ppm : float (optional)

The chemical shift of interest.

Returns:
index : int

The index, or None if ppm is None.

ppm_scale(bounds='')

Constructs an ndarray of the chemical shifts at each point of the spectrum, in descending order of chemical shift.

This is used in generating the x-values for plotting.

Parameters:
bounds : str or (float, float), optional

Bounds specified in the usual manner.

Returns:
scale : ndarray

The appropriate slice of chemical shifts.

hz_scale(bounds='')

Constructs an ndarray of the frequencies (in units of Hz) at each point of the spectrum, in descending order of frequency.

Parameters:
bounds : str or (float, float), optional

Bounds specified in the usual manner.

Returns:
scale : ndarray

The appropriate slice of frequencies.

nuclei_to_str()

Returns a string with the nucleus nicely formatted in LaTeX syntax. Can be directly used with e.g. matplotlib.


class penguins.dataset.Dataset2D(path, **kwargs)

Dataset object representing 2D spectra.

Inherits from: _2D_RawDataMixin, _2D_ProcDataMixin, _2D_PlotMixin, and _Dataset.

ppm_to_index(axis, ppm)

Converts a chemical shift into the index which is closest to the chemical shift.

Parameters:
axis : int

0 for f1 (indirect dimension), 1 for f2 (direct dimension).

ppm : float (optional)

The chemical shift of interest.

Returns:
index : int

The index, or None if ppm is None.

ppm_scale(axis, bounds='')

Constructs an ndarray of the chemical shifts at each point of the spectrum, in descending order of chemical shift.

This is used in generating the x- and y-values for plotting.

Parameters:
axis : int

0 for f1 (indirect dimension), 1 for f2 (direct dimension).

bounds : str or (float, float), optional

Bounds specified in the usual manner.

Returns:
scale : ndarray

The appropriate slice of chemical shifts.

hz_scale(axis, bounds='')

Constructs an ndarray of the frequencies (in units of Hz) at each point of the spectrum, in descending order of frequency.

Parameters:
axis : int

0 for f1 (indirect dimension), 1 for f2 (direct dimension).

bounds : str or (float, float), optional

Bounds specified in the usual manner.

Returns:
scale : ndarray

The appropriate slice of frequencies.

project(axis, sign, bounds='')

Make a 1D projection from a 2D spectrum.

Parameters:
axis : int or str from {0, “column”, 1, “row”}

The axis to project onto, 0 / “column” being f1 and 1 / “row” being f2. This can be very confusing, so an example will help.

Projections onto f1 will collapse multiple columns into one column. This should be done by passing 0 or column as the axis argument. For example, if you used this on a C–H HSQC, you would get a projection with ¹³C chemical shifts.

sign : str from {“positive”, “pos”, “negative”, “neg”}

The sign desired. Using positive (or the short form pos) means that the greatest point along the collapsed axis will be taken, and vice versa for negative/neg.

bounds : str or (float, float), optional

Bounds specified in the usual manner, representing the segment of chemical shifts that should be collapsed. That is to say, if you are projecting onto f2, then bounds would represent the section of f1 chemical shifts to collapse. If not provided, then defaults to the entire range of chemical shifts along the collapsed axis.

Returns:
proj : Dataset1DProjVirtual

A Dataset1DProjVirtual object that is similar in every way to a typical Dataset1DProj and can be plotted, integrated, etc. in the same manner. The actual projection can be accessed using _1D_ProcDataMixin.proc_data, which Dataset1DProj inherits.
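Since the axis convention is easy to get wrong, here is a toy illustration on a nested list. It is a hypothetical sketch and not the library's code: it assumes that the rows of the 2D data are indexed by f1 and the columns by f2, it ignores bounds, and it returns a plain list instead of a dataset object.

```python
def project(data, axis, sign):
    """data[i][j]: i runs along f1 (rows), j runs along f2 (columns)."""
    pick = max if sign in ("positive", "pos") else min
    if axis in (0, "column"):
        # Onto f1: collapse the columns, leaving one value per row.
        return [pick(row) for row in data]
    if axis in (1, "row"):
        # Onto f2: collapse the rows, leaving one value per column.
        return [pick(col) for col in zip(*data)]
    raise ValueError(f"invalid axis: {axis!r}")


data = [[1, 5, 2],
        [4, 0, -3]]   # 2 points in f1, 3 points in f2
f1_pos = project(data, "column", "pos")
f2_neg = project(data, "row", "neg")
```

The f1 projection has as many points as there are rows (two here), and the f2 projection has as many points as there are columns (three here).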

f1projp(bounds='')

Alias for project(axis="column", sign="pos"). See project.

f1projn(bounds='')

Alias for project(axis="column", sign="neg"). See project.

f2projp(bounds='')

Alias for project(axis="row", sign="pos"). See project.

f2projn(bounds='')

Alias for project(axis="row", sign="neg"). See project.

sum(axis, bounds='')

Make a 1D sum from a 2D spectrum.

Parameters:
axis : int or str from {0, “column”, 1, “row”}

The axis to sum onto. 0 / column is f1 (i.e. adding up multiple columns) and 1 / row is f2 (i.e. adding up multiple rows).

bounds : str or (float, float), optional

Bounds specified in the usual manner, representing the segment of chemical shifts that should be collapsed. That is to say, if you are summing onto f2, then bounds would represent the section of f1 chemical shifts to collapse. If not provided, then defaults to the entire range of chemical shifts along the collapsed axis.

Returns:
proj : Dataset1DProjVirtual

A Dataset1DProjVirtual object that is similar in every way to a typical Dataset1DProj and can be plotted, integrated, etc. in the same manner. The actual sum can be accessed using _1D_ProcDataMixin.proc_data, which Dataset1DProj inherits.

f1sum(bounds='')

Alias for sum(axis="column"). See sum.

f2sum(bounds='')

Alias for sum(axis="row"). See sum.

slice(axis=None, ppm=None, f1=None, f2=None)

Extract a 1D slice from a 2D spectrum. You must either specify both axis and ppm arguments, or f1 only, or f2 only.

Parameters:
axis : str from {“column”, “row”}, optional

Axis to slice along. To extract a column (i.e. at one particular value of f2), use column, and vice versa.

ppm : float, optional

The chemical shift to slice at. For example, if you are extracting a column, then this would be the f2 chemical shift of interest.

f1 : float, optional

slice(f1=y) is an alias for slice(axis="row", ppm=y). If specified, this overrules the axis and ppm keyword arguments. Cannot be used together with f2.

f2 : float, optional

slice(f2=x) is an alias for slice(axis="column", ppm=x). If specified, this overrules the axis and ppm keyword arguments. Cannot be used together with f1.

Returns:
proj : Dataset1DProjVirtual

A Dataset1DProjVirtual object that is similar in every way to a typical Dataset1DProj and can be plotted, integrated, etc. in the same manner. The actual slice can be accessed using _1D_ProcDataMixin.proc_data, which Dataset1DProj inherits.
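The f1 and f2 shortcuts can be pictured with the following toy sketch. `take_slice` is a hypothetical function that works on a nested list and explicit lists of chemical shifts; as in the projection case, it assumes that rows are indexed by f1 and columns by f2, and it picks the grid point nearest to the requested shift.

```python
def take_slice(data, f1_shifts, f2_shifts, f1=None, f2=None):
    """Extract a row (fixed f1) or a column (fixed f2) from 2D data."""
    if (f1 is None) == (f2 is None):
        raise ValueError("specify exactly one of f1 and f2")

    def nearest(shifts, ppm):
        # Index of the shift closest to the requested value.
        return min(range(len(shifts)), key=lambda i: abs(shifts[i] - ppm))

    if f1 is not None:
        # Fixed f1 shift: the result is a row, running along f2.
        return list(data[nearest(f1_shifts, f1)])
    # Fixed f2 shift: the result is a column, running along f1.
    j = nearest(f2_shifts, f2)
    return [row[j] for row in data]


data = [[1, 2, 3],
        [4, 5, 6]]
f1_shifts = [100.0, 50.0]       # one shift per row
f2_shifts = [8.0, 4.0, 0.0]     # one shift per column
row = take_slice(data, f1_shifts, f2_shifts, f1=55.0)
column = take_slice(data, f1_shifts, f2_shifts, f2=4.2)
```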


class penguins.dataset.Dataset1DProjVirtual(path, rr, sign=None, index_bounds=None, index=None, **kwargs)

Dataset representing 1D projections which have been constructed by calling the project, slice, or sum methods (or their short forms) on Dataset2D objects.

This is a subclass of Dataset1DProj, so the available methods are exactly the same.

See the __init__() docstring for implementation details.