Reference: Datasets
Note
Since you’re here, you probably deserve to know how the source code is organised. Dataset classes are implemented using mixins. It’s not pretty, but I think it’s the best way, and I spent a long time thinking about this. I’ve tried to follow the principles outlined in Chapter 12 of Luciano Ramalho’s Fluent Python, and I think I’ve succeeded.
Anyway, what happens is that we have a _Dataset superclass which contains all the top-level behaviour that is shared between all datasets.
As of the time of writing, this behaviour is only the dictionary-style parameter lookup.
After that, we have a series of mixins for 1D raw data, 2D raw data, 1D processed data, 2D processed data, 1D plotting, and 2D plotting.
(All of these mixins have very recognisable names in the source code.)
The Dataset1D class simply inherits the methods that it needs to.
In this case it would inherit from _Dataset, _1D_RawDataMixin, _1D_ProcDataMixin, and _1D_PlotMixin.
Likewise, the Dataset2D class inherits all the 2D methods.
This seems to be an utterly stupid use of mixins until you realise that Dataset1DProj inherits from _2D_RawDataMixin, _1D_ProcDataMixin, and _1D_PlotMixin.
In other words, it has behaviour that sometimes mimics the 2D data, and sometimes mimics 1D data.
So I chose to use multiple inheritance as a way of not repeating code.
I’ve provided “aggregate classes” which have useful groups of methods, and the user should never actually have to deal with any of the issues associated with multiple inheritance.
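The composition described above can be pictured with a stripped-down sketch. The class names are taken from this page, but the method names (raw_file, proc_ndim) are hypothetical stand-ins, the bodies are placeholders, and the plotting mixins are left out for brevity; this is not the actual source.

```python
class _Dataset:
    """Top-level behaviour shared by every dataset (reduced here to storing the path)."""
    def __init__(self, path):
        self.path = path


class _1D_RawDataMixin:
    def raw_file(self):      # hypothetical method, for illustration only
        return "fid"


class _2D_RawDataMixin:
    def raw_file(self):      # hypothetical method
        return "ser"


class _1D_ProcDataMixin:
    def proc_ndim(self):     # hypothetical method
        return 1


class _2D_ProcDataMixin:
    def proc_ndim(self):     # hypothetical method
        return 2


# The "aggregate" classes only pick the mixins they need.
class Dataset1D(_1D_RawDataMixin, _1D_ProcDataMixin, _Dataset):
    pass


class Dataset2D(_2D_RawDataMixin, _2D_ProcDataMixin, _Dataset):
    pass


class Dataset1DProj(_2D_RawDataMixin, _1D_ProcDataMixin, _Dataset):
    pass


proj = Dataset1DProj("/hypothetical/path")
# Raw-data behaviour comes from the 2D mixin, processed-data behaviour from the 1D mixin.
print(proj.raw_file(), proj.proc_ndim())   # ser 1
```

The projection class reuses both halves without copying any code, which is the point of the design.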
- class penguins.dataset._Dataset(path, **kwargs)
Defines behaviour that is common to all datasets. Specifically, this class defines the dictionary-style lookup of NMR parameters.
This should never be instantiated directly! Use read() for that instead.
- class penguins.dataset._parDict(path)
Modified dictionary for storing acquisition & processing parameters.
Parameter names are stored as lower-case strings. When looking up a parameter, if its value is not already stored, the dictionary will look it up in the associated TopSpin parameter files. Subsequently, the value is cached.
Therefore, the dictionary can be treated as if it were already fully populated when initialised. However, because values are only read and stored on demand, we also avoid cluttering it with a whole range of useless parameters upon initialisation.
The lookup function attempts to be clever and convert the parameters to floats if possible; otherwise, the parameters are stored as strings. There are a number of exceptions to this rule, such as TD and SI, which should be ints and not floats. The list of such parameters is extracted from the TopSpin documentation.
For 2D spectra, string parameters are stored as a tuple of (f1_value, f2_value). Float and int parameters are stored as an ndarray to facilitate elementwise manipulations (e.g. calculating O1P in both dimensions at one go).
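The read-on-demand and caching behaviour can be sketched as below. This is only an illustration of the idea: LazyParDict is a hypothetical class, the TopSpin parameter files are replaced by a plain in-memory mapping, and the list of integer parameters is cut down to the two named above.

```python
class LazyParDict(dict):
    """Dictionary that fetches a parameter on first access and caches it afterwards."""

    INT_PARS = {"td", "si"}   # illustrative subset of the integer-valued parameters

    def __init__(self, raw_params):
        super().__init__()
        # Stand-in for the parameter files: upper-case names mapped to strings.
        self._raw = raw_params
        self.lookups = 0      # counts how often the "files" were consulted

    def __getitem__(self, key):
        # Names are stored in lower case, so lookups are case-insensitive.
        return super().__getitem__(key.lower())

    def __missing__(self, key):
        # Called by dict.__getitem__ only when the key has not been cached yet.
        self.lookups += 1
        text = self._raw[key.upper()]
        if key in self.INT_PARS:
            value = int(text)
        else:
            try:
                value = float(text)
            except ValueError:
                value = text  # not numeric, keep as a string
        self[key] = value     # cache for subsequent lookups
        return value


pars = LazyParDict({"TD": "65536", "O1": "2800.5", "PULPROG": "zg30"})
empty_at_start = len(pars) == 0   # nothing is read until requested
td = pars["TD"]
td_again = pars["td"]             # served from the cache, no second lookup
```

After the two TD lookups, pars.lookups is still 1, which is what makes the dictionary look fully populated while only holding the values that were actually asked for.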
1D mixins
- class penguins.dataset._1D_RawDataMixin
Defines behaviour that is applicable for 1D raw data, i.e. fid files. Also contains a few private methods which initialise (for example) the paths to the parameter files.
- Attributes:
- fid : ndarray
Complex-valued FID.
- class penguins.dataset._1D_ProcDataMixin
Defines behaviour that is applicable for 1D processed data.
- Attributes:
- real : ndarray
Real-valued ndarray containing the real spectrum. real[0] contains the left-most point of the spectrum, i.e. the greatest chemical shift.
- imag : ndarray
Real-valued ndarray containing the imaginary spectrum. imag[0] contains the left-most point of the spectrum, i.e. the greatest chemical shift.
This attribute only applies to Dataset1D instances. The projection classes do not have this attribute.
- proc_data(bounds='', component='real')
Returns the processed spectrum as a real-valued ndarray. By default this returns the real part of the spectrum, but this can be changed using the component argument.
Note that if (for example) a magnitude mode calculation has been performed, then the “real” part is actually the magnitude mode spectrum. In short, the “real” part is whatever is stored in the 1r file.
- Parameters:
- bounds : str or (float, float), optional
Bounds can be specified as a string lower..upper or a tuple of floats (lower, upper), upon which the appropriate slice of the spectrum will be taken.
- component : str from {“real”, “r”, “imag”, “i”} (default “real”)
The component of the processed data to return. “real” or “r” return the real part of the spectrum, the others return the imaginary part.
- Returns:
- ndarray
The spectrum or the slice of interest.
- integrate(peak=None, margin=None, bounds=None, mode='sum')
Integrates a region of a spectrum.
Regions can either be defined via peak and margin, which leads to the region of (peak - margin) to (peak + margin), or manually via the bounds parameter. Note that specifying (peak, margin) overrules the bounds parameter if both are passed.
- Parameters:
- peak : float, optional
The chemical shift of the peak of interest.
- margin : float, optional
The integration margin which extends on either side of the peak.
- bounds : str or (float, float), optional
Integration bounds which can be directly specified in the usual format. Note that passing (peak, margin) will overrule this parameter.
- mode : {“sum”, “max”, “min”}, optional
Mode of integration. sum (the default) directly adds up all points in the region, max finds the greatest intensity, and min finds the lowest intensity.
- Returns:
- float
The value of the integral.
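A minimal sketch of the precedence rule and the three modes, written as a free function over plain NumPy arrays. The function name integrate_1d and its spectrum and ppm arguments are assumptions made for the example, bounds are accepted only as a (lower, upper) tuple here, and both end points are treated as inclusive.

```python
import numpy as np


def integrate_1d(spectrum, ppm, peak=None, margin=None, bounds=None, mode="sum"):
    """Reduce the region of `spectrum` selected by (peak, margin) or by bounds."""
    if peak is not None and margin is not None:
        # (peak, margin) overrules bounds if both are passed.
        lower, upper = peak - margin, peak + margin
    elif bounds is not None:
        lower, upper = bounds
    else:
        lower, upper = ppm.min(), ppm.max()   # no region given: use everything
    region = spectrum[(ppm >= lower) & (ppm <= upper)]
    reducers = {"sum": np.sum, "max": np.amax, "min": np.amin}
    return float(reducers[mode](region))


ppm = np.linspace(10.0, 0.0, 11)   # 10, 9, ..., 0 ppm, in descending order
spectrum = np.array([0, 0, 1, 5, 2, 0, 0, -3, 0, 0, 0], dtype=float)

area = integrate_1d(spectrum, ppm, peak=7.0, margin=1.0)                 # 1 + 5 + 2 = 8.0
tallest = integrate_1d(spectrum, ppm, peak=7.0, margin=1.0, mode="max")  # 5.0
lowest = integrate_1d(spectrum, ppm, bounds=(2.0, 4.0), mode="min")      # -3.0
# peak and margin win over bounds when both are supplied:
both = integrate_1d(spectrum, ppm, peak=7.0, margin=1.0, bounds=(2.0, 4.0))
```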
- bounds_to_slice(bounds='')
Converts a string lower..upper or a tuple of chemical shifts (lower, upper) to a slice object, which can be used to slice a spectrum ndarray. Note that upper must be greater than lower.
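One way to picture the conversion is sketched below. It assumes a chemical shift scale in descending order (as described for real[0] above), picks the nearest point to each bound, and includes both end points; those details, and the helper names parse_bounds and bounds_to_slice as free functions, are assumptions for illustration rather than the library's implementation.

```python
import numpy as np


def parse_bounds(bounds=""):
    """Return (lower, upper) as floats, or (None, None) for an empty string."""
    if isinstance(bounds, str):
        if bounds == "":
            return None, None
        lower_text, upper_text = bounds.split("..")
        lower, upper = float(lower_text), float(upper_text)
    else:
        lower, upper = bounds
    if lower >= upper:
        raise ValueError("upper must be greater than lower")
    return lower, upper


def bounds_to_slice(ppm_scale, bounds=""):
    """Convert chemical shift bounds into a slice over a descending ppm scale."""
    lower, upper = parse_bounds(bounds)
    if lower is None:
        return slice(None, None)              # no bounds: the whole spectrum
    # The scale runs from high to low ppm, so the upper bound comes first.
    start = int(np.argmin(np.abs(ppm_scale - upper)))
    stop = int(np.argmin(np.abs(ppm_scale - lower))) + 1
    return slice(start, stop)


ppm = np.linspace(10.0, 0.0, 11)              # 10, 9, ..., 0 ppm
region = bounds_to_slice(ppm, "2..4")         # slice(6, 9)
same_region = bounds_to_slice(ppm, (2.0, 4.0))
selected = ppm[region]                        # array([4., 3., 2.])
```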
- to_magnitude()
Calculates the magnitude mode spectrum and returns it as a new Dataset1D object.
- mc()
Alias for to_magnitude().
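For reference, a magnitude mode spectrum is the pointwise modulus of the complex spectrum, sqrt(real² + imag²). The sketch below shows only that arithmetic on two small arrays; wrapping the result in a new dataset object is not shown, and the function name magnitude is hypothetical.

```python
import numpy as np


def magnitude(real, imag):
    """Pointwise magnitude of a spectrum given its real and imaginary parts."""
    return np.hypot(real, imag)   # equivalent to np.abs(real + 1j * imag)


real = np.array([3.0, 0.0, -5.0])
imag = np.array([4.0, -2.0, 12.0])
mag = magnitude(real, imag)       # 5, 2 and 13: always non-negative
```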
- class penguins.dataset._1D_PlotMixin
Defines 1D plotting methods.
- stage(*args, **kwargs)
Calls penguins.pgplot._stage1d() on the dataset.
2D mixins
- class penguins.dataset._2D_RawDataMixin
Defines behaviour that is applicable for 2D raw data, i.e. ser files.
There are no functions which actually read the ser file (I haven’t implemented those yet, as I’ve never needed them), but this mixin defines a few private methods which initialise (for example) the paths to the parameter files, so it’s not useless at all.
- class penguins.dataset._2D_ProcDataMixin
- proc_data(f1_bounds='', f2_bounds='', component='rr')
Returns the processed 2D data as a two-dimensional, real-valued ndarray. By default this returns the real part of the spectrum (the ‘RR quadrant’), but this can be changed using the component argument.
Note that if a magnitude mode calculation has been performed, this will return the magnitude mode spectrum (i.e. it returns whatever is in TopSpin’s 2rr file).
- Parameters:
- f1_bounds : str or (float, float), optional
Bounds for the indirect dimension.
- f2_bounds : str or (float, float), optional
Bounds for the direct dimension.
- component : str from {“rr”, “ri”, “ir”, “ii”} (default “rr”)
The quadrant of the processed data to return.
- Returns:
- ndarray
The processed 2D data, or the section of interest.
- integrate(peak=None, margin=None, f1_bounds=None, f2_bounds=None, mode='sum')
Integrates a region of a spectrum.
The interface is exactly analogous to the 1D version (integrate()), except that peak and margin now need to be specified as tuples of (f1_shift, f2_shift), or bounds must be specified as f1_bounds and f2_bounds separately.
- Parameters:
- peak : (float, float), optional
The chemical shifts of the peak of interest.
- margin : (float, float), optional
The integration margins which extend on all sides of the peak. The first number refers to the margin in the indirect dimension, the second to the margin in the direct dimension.
- f1_bounds : str or (float, float), optional
Integration bounds for the indirect dimension which can be directly specified in the usual format. Note that passing (peak, margin) will overrule the f1_bounds and f2_bounds parameters.
- f2_bounds : str or (float, float), optional
Integration bounds for the direct dimension which can be directly specified in the usual format.
- mode : {“sum”, “max”, “min”}, optional
Mode of integration. sum (the default) directly adds up all points in the region, max finds the greatest intensity, and min finds the lowest intensity.
- Returns:
- float
The value of the integral.
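The two-dimensional case can be sketched in the same spirit: the (f1, f2) tuples for peak and margin are turned into one pair of bounds per dimension, and the rectangular region is then reduced. The function name integrate_2d, the helper _mask, and the explicit rr, f1_ppm and f2_ppm arguments are assumptions for this example; bounds are accepted as tuples only.

```python
import numpy as np


def _mask(scale, bounds):
    """Boolean mask of the points of `scale` lying within (lower, upper)."""
    if bounds is None:
        return np.ones(scale.shape, dtype=bool)
    lower, upper = bounds
    return (scale >= lower) & (scale <= upper)


def integrate_2d(rr, f1_ppm, f2_ppm, peak=None, margin=None,
                 f1_bounds=None, f2_bounds=None, mode="sum"):
    """Reduce a rectangular region of `rr`, whose shape is (f1 points, f2 points)."""
    if peak is not None and margin is not None:
        # (peak, margin) overrules f1_bounds and f2_bounds.
        f1_bounds = (peak[0] - margin[0], peak[0] + margin[0])
        f2_bounds = (peak[1] - margin[1], peak[1] + margin[1])
    region = rr[np.ix_(_mask(f1_ppm, f1_bounds), _mask(f2_ppm, f2_bounds))]
    reducers = {"sum": np.sum, "max": np.amax, "min": np.amin}
    return float(reducers[mode](region))


f1_ppm = np.linspace(100.0, 0.0, 5)       # 100, 75, 50, 25, 0 (indirect dimension)
f2_ppm = np.linspace(8.0, 0.0, 5)         # 8, 6, 4, 2, 0 (direct dimension)
rr = np.arange(25, dtype=float).reshape(5, 5)

# Rows 1-3 and columns 1-3 are selected: 6+7+8 + 11+12+13 + 16+17+18 = 108
total = integrate_2d(rr, f1_ppm, f2_ppm, peak=(50.0, 4.0), margin=(25.0, 2.0))
highest = integrate_2d(rr, f1_ppm, f2_ppm, f1_bounds=(25.0, 75.0),
                       f2_bounds=(2.0, 6.0), mode="max")   # 18.0
```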
- bounds_to_slice(axis, bounds='')
Converts a string lower..upper or a tuple of chemical shifts (lower, upper) to a slice object, which can be used to slice a spectrum ndarray.
- Parameters:
- axis : int from {0, 1}
0 for indirect dimension, 1 for direct dimension.
- bounds : str or (float, float), optional
Bounds given in the usual format.
- Returns:
- slice
Slice object for the requested axis.
- to_magnitude(axis)
Calculates the magnitude mode spectrum along the specified axis and returns it as a new Dataset2D object.
- Parameters:
- axis : int from {0, 1}
The axis along which to perform the magnitude calculation. 0 for f1, or 1 for f2.
- xf1m()
Alias for to_magnitude(axis=0), i.e. magnitude mode calculation along f1.
- xf2m()
Alias for to_magnitude(axis=1), i.e. magnitude mode calculation along f2.
- xfbm()
Performs magnitude mode calculation along both axes. ds.xfbm() is equivalent to ds.xf1m().xf2m(). It is manually implemented here for efficiency reasons.
- class penguins.dataset._2D_PlotMixin
Defines 2D plotting methods.
- stage(*args, **kwargs)
Calls penguins.pgplot._stage2d() on the dataset.
- find_baselev(*args, **kwargs)
Calls penguins.pgplot._find_baselev() on the dataset.
Actual Dataset classes
These are the classes that the user will see. Even then, much of the interface is abstracted away: for example, the staging and plotting functions have a unified interface that delegates to different methods behind the scenes depending on the object that is being staged / plotted.
- class penguins.dataset.Dataset1D(path, **kwargs)
Dataset object representing 1D spectra.
Inherits from: _1D_RawDataMixin, _1D_ProcDataMixin, _1D_PlotMixin, and _Dataset.
- ppm_to_index(ppm)
Converts a chemical shift into the index which is closest to the chemical shift.
- Parameters:
- ppm : float (optional)
The chemical shift of interest.
- Returns:
- index : int
The index, or None if ppm is None.
- ppm_scale(bounds='')
Constructs an ndarray of the chemical shifts at each point of the spectrum, in descending order of chemical shift.
This is used in generating the x-values for plotting.
- Parameters:
- bounds : str or (float, float), optional
Bounds specified in the usual manner.
- Returns:
- scale : ndarray
The appropriate slice of chemical shifts.
- hz_scale(bounds='')
Constructs an ndarray of the frequencies (in units of Hz) at each point of the spectrum, in descending order of frequency.
- Parameters:
- bounds : str or (float, float), optional
Bounds specified in the usual manner.
- Returns:
- scale : ndarray
The appropriate slice of frequencies.
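The index and scale methods above can be pictured with the simplified sketch below. It assumes the spectrum is centred on O1P and spans SW ppm over SI points, and that the Hz axis is the ppm axis multiplied by the spectrometer frequency in MHz. Real TopSpin data may place the end points slightly differently depending on referencing, so treat the formulas and the free-function signatures as assumptions, not as the library's code.

```python
import numpy as np


def ppm_scale(o1p, sw, si):
    """Chemical shifts of each point, from highest to lowest (simplified model)."""
    return np.linspace(o1p + sw / 2, o1p - sw / 2, si)


def hz_scale(o1p, sw, si, sfo1_mhz):
    """Frequencies in Hz: ppm times MHz gives Hz (simplified model)."""
    return ppm_scale(o1p, sw, si) * sfo1_mhz


def ppm_to_index(scale, ppm):
    """Index of the point closest to `ppm`, or None if ppm is None."""
    if ppm is None:
        return None
    return int(np.argmin(np.abs(scale - ppm)))


scale = ppm_scale(o1p=5.0, sw=10.0, si=11)             # 10, 9, ..., 0 ppm
index = ppm_to_index(scale, 7.2)                       # 3, i.e. the point at 7 ppm
freqs = hz_scale(o1p=5.0, sw=10.0, si=11, sfo1_mhz=500.0)
```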
- nuclei_to_str()
Returns a string with the nucleus nicely formatted in LaTeX syntax. Can be directly used with e.g. matplotlib.
- class penguins.dataset.Dataset1DProj(path, **kwargs)
Dataset object representing 1D projections or slices of 2D spectra, which have been generated inside TopSpin.
Inherits from: _2D_RawDataMixin, _1D_ProcDataMixin, _1D_PlotMixin, and _Dataset.
Notes
The implementation of the methods below has to be different from the equivalent methods on Dataset1D, because the parameters (e.g. O1, SW) are read as 2-element arrays (for both dimensions) but the returned value must select the correct projection axis.
- ppm_to_index(ppm)
Converts a chemical shift into the index which is closest to the chemical shift.
- Parameters:
- ppm : float (optional)
The chemical shift of interest.
- Returns:
- index : int
The index, or None if ppm is None.
- ppm_scale(bounds='')
Constructs an ndarray of the chemical shifts at each point of the spectrum, in descending order of chemical shift.
This is used in generating the x-values for plotting.
- Parameters:
- bounds : str or (float, float), optional
Bounds specified in the usual manner.
- Returns:
- scale : ndarray
The appropriate slice of chemical shifts.
- hz_scale(bounds='')
Constructs an ndarray of the frequencies (in units of Hz) at each point of the spectrum, in descending order of frequency.
- Parameters:
- bounds : str or (float, float), optional
Bounds specified in the usual manner.
- Returns:
- scale : ndarray
The appropriate slice of frequencies.
- nuclei_to_str()
Returns a string with the nucleus nicely formatted in LaTeX syntax. Can be directly used with e.g. matplotlib.
- class penguins.dataset.Dataset2D(path, **kwargs)
Dataset object representing 2D spectra.
Inherits from: _2D_RawDataMixin, _2D_ProcDataMixin, _2D_PlotMixin, and _Dataset.
- ppm_to_index(axis, ppm)
Converts a chemical shift into the index which is closest to the chemical shift.
- Parameters:
- axis : int
0 for f1 (indirect dimension), 1 for f2 (direct dimension).
- ppm : float (optional)
The chemical shift of interest.
- Returns:
- index : int
The index, or None if ppm is None.
- ppm_scale(axis, bounds='')
Constructs an ndarray of the chemical shifts at each point of the spectrum, in descending order of chemical shift.
This is used in generating the x- and y-values for plotting.
- Parameters:
- axis : int
0 for f1 (indirect dimension), 1 for f2 (direct dimension).
- bounds : str or (float, float), optional
Bounds specified in the usual manner.
- Returns:
- scale : ndarray
The appropriate slice of chemical shifts.
- hz_scale(axis, bounds='')
Constructs an ndarray of the frequencies (in units of Hz) at each point of the spectrum, in descending order of frequency.
- Parameters:
- axis : int
0 for f1 (indirect dimension), 1 for f2 (direct dimension).
- bounds : str or (float, float), optional
Bounds specified in the usual manner.
- Returns:
- scale : ndarray
The appropriate slice of frequencies.
- project(axis, sign, bounds='')
Make a 1D projection from a 2D spectrum.
- Parameters:
- axis : int or str from {0, “column”, 1, “row”}
The axis to project onto, 0 / “column” being f1 and 1 / “row” being f2. This can be very confusing, so an example will help.
Projections onto f1 will collapse multiple columns into one column. This should be done by passing 0 or column as the axis argument. For example, if you used this on a C–H HSQC, you would get a projection with ¹³C chemical shifts.
- sign : str from {“positive”, “pos”, “negative”, “neg”}
The sign desired. Using positive (or the short form pos) means that the greatest point along the collapsed axis will be taken, and vice versa for negative / neg.
- bounds : str or (float, float), optional
Bounds specified in the usual manner, representing the segment of chemical shifts that should be collapsed. That is to say, if you are projecting onto f2, then bounds would represent the section of f1 chemical shifts to collapse. If not provided, then defaults to the entire range of chemical shifts along the collapsed axis.
- Returns:
- proj : Dataset1DProjVirtual
A Dataset1DProjVirtual object that is similar in every way to a typical Dataset1DProj and can be plotted, integrated, etc. in the same manner. The actual projection can be accessed using _1D_ProcDataMixin.proc_data, which Dataset1DProj inherits.
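Because the axis convention is easy to get backwards, here is a small NumPy sketch of what a projection does to the underlying array. It assumes the 2D data is an array of shape (f1 points, f2 points), in line with axis 0 being the indirect dimension; the free function project and its arguments are illustrative only, and bounds handling and the wrapping into a dataset object are left out.

```python
import numpy as np


def project(rr, axis, sign):
    """Collapse a 2D array onto f1 (0 / "column") or f2 (1 / "row")."""
    # Projecting onto f1 collapses the columns, i.e. reduces along array axis 1,
    # and projecting onto f2 collapses the rows, i.e. reduces along array axis 0.
    collapse_axis = {0: 1, "column": 1, 1: 0, "row": 0}[axis]
    reducer = {"positive": np.amax, "pos": np.amax,
               "negative": np.amin, "neg": np.amin}[sign]
    return reducer(rr, axis=collapse_axis)


rr = np.array([[1, -4, 2],
               [0, 5, -6]])                  # 2 points in f1, 3 points in f2

onto_f1 = project(rr, "column", "pos")       # one value per f1 point: [2, 5]
onto_f2 = project(rr, 1, "neg")              # one value per f2 point: [0, -4, -6]
```

For the C–H HSQC example above, onto_f1 would correspond to the projection carrying ¹³C chemical shifts.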
- sum(axis, bounds='')
Make a 1D sum from a 2D spectrum.
- Parameters:
- axis : int or str from {0, “column”, 1, “row”}
The axis to sum onto. 0 / column is f1 (i.e. adding up multiple columns) and 1 / row is f2 (i.e. adding up multiple rows).
- bounds : str or (float, float), optional
Bounds specified in the usual manner, representing the segment of chemical shifts that should be collapsed. That is to say, if you are summing onto f2, then bounds would represent the section of f1 chemical shifts to collapse. If not provided, then defaults to the entire range of chemical shifts along the collapsed axis.
- Returns:
- proj : Dataset1DProjVirtual
A Dataset1DProjVirtual object that is similar in every way to a typical Dataset1DProj and can be plotted, integrated, etc. in the same manner. The actual sum can be accessed using _1D_ProcDataMixin.proc_data, which Dataset1DProj inherits.
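The role of bounds is perhaps clearest on a bare array: it restricts the collapsed axis, not the axis that survives. In the sketch below the bounds are assumed to have been converted to an index slice already (collapsed_slice), and the free function sum_onto is hypothetical.

```python
import numpy as np


def sum_onto(rr, axis, collapsed_slice=slice(None)):
    """Sum a 2D array of shape (f1 points, f2 points) onto one axis."""
    if axis in (1, "row"):
        # Onto f2: add up rows, restricted to a range of f1 indices.
        return rr[collapsed_slice, :].sum(axis=0)
    if axis in (0, "column"):
        # Onto f1: add up columns, restricted to a range of f2 indices.
        return rr[:, collapsed_slice].sum(axis=1)
    raise ValueError(f"unrecognised axis: {axis!r}")


rr = np.arange(12).reshape(3, 4)                   # 3 points in f1, 4 points in f2

all_rows = sum_onto(rr, "row")                     # [12, 15, 18, 21]
first_two_rows = sum_onto(rr, "row", slice(0, 2))  # [4, 6, 8, 10]
all_columns = sum_onto(rr, 0)                      # [6, 22, 38]
```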
- slice(axis=None, ppm=None, f1=None, f2=None)
Extract a 1D slice from a 2D spectrum. You must either specify both axis and ppm arguments, or f1 only, or f2 only.
- Parameters:
- axis : str from {“column”, “row”}, optional
Axis to slice along. To extract a column (i.e. at one particular value of f2), use column, and vice versa.
- ppm : float, optional
The chemical shift to slice at. For example, if you are extracting a column, then this would be the f2 chemical shift of interest.
- f1 : float, optional
slice(f1=y) is an alias for slice(axis="row", ppm=y). If specified, this overrules the axis and ppm keyword arguments. Cannot be used together with f2.
- f2 : float, optional
slice(f2=x) is an alias for slice(axis="column", ppm=x). If specified, this overrules the axis and ppm keyword arguments. Cannot be used together with f1.
- Returns:
- proj : Dataset1DProjVirtual
A Dataset1DProjVirtual object that is similar in every way to a typical Dataset1DProj and can be plotted, integrated, etc. in the same manner. The actual slice can be accessed using _1D_ProcDataMixin.proc_data, which Dataset1DProj inherits.
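The argument rules above can be summarised in a short sketch. The helper names (resolve_slice_args, take_slice) and the explicit ppm scales are assumptions for illustration; the real method operates on the dataset itself and returns a dataset object rather than a bare array.

```python
import numpy as np


def resolve_slice_args(axis=None, ppm=None, f1=None, f2=None):
    """Reduce the four keyword arguments to a single (axis, ppm) pair."""
    if f1 is not None and f2 is not None:
        raise ValueError("f1 and f2 cannot be used together")
    if f1 is not None:
        return "row", f1          # a row sits at one particular f1 shift
    if f2 is not None:
        return "column", f2       # a column sits at one particular f2 shift
    if axis is None or ppm is None:
        raise ValueError("specify both axis and ppm, or f1 only, or f2 only")
    return axis, ppm


def take_slice(rr, f1_ppm, f2_ppm, **kwargs):
    """Extract the row or column of `rr` closest to the requested shift."""
    axis, ppm = resolve_slice_args(**kwargs)
    if axis == "row":
        return rr[int(np.argmin(np.abs(f1_ppm - ppm))), :]
    if axis == "column":
        return rr[:, int(np.argmin(np.abs(f2_ppm - ppm)))]
    raise ValueError(f"unrecognised axis: {axis!r}")


rr = np.arange(12).reshape(3, 4)
f1_ppm = np.array([100.0, 50.0, 0.0])      # indirect dimension, 3 points
f2_ppm = np.array([9.0, 6.0, 3.0, 0.0])    # direct dimension, 4 points

row = take_slice(rr, f1_ppm, f2_ppm, f1=50.0)                    # [4, 5, 6, 7]
column = take_slice(rr, f1_ppm, f2_ppm, axis="column", ppm=3.0)  # [2, 6, 10]
```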
- class penguins.dataset.Dataset1DProjVirtual(path, rr, sign=None, index_bounds=None, index=None, **kwargs)
Dataset representing 1D projections which have been constructed by calling the project, slice, or sum methods (or their short forms) on Dataset2D objects.
This is a subclass of Dataset1DProj, so the available methods are exactly the same.
See the __init__() docstring for implementation details.