A stochastic analysis tool for studying the structure of lipid membrane systems using small-angle scattering (SAS).
The currently supported models of Large Unilamellar Vesicles (LUVs) and protein-containing LUVs (pLUVs, or proteoliposomes) are based on a combination of the Scattering Density Profile (SDP)
- Large unilamellar vesicles (LUVs):
  - LUV_DMPC -> LUVs: DMPC/DMPG 95:5 mol/mol mixture; suspension in pure water.
  - LUV_POPC -> LUVs: POPC/POPG 95:5 mol/mol mixture; suspension in pure water.
  - LUV_POPE -> LUVs: POPE/POPG 90:10 mol/mol mixture; suspension in pure water.
- Proteoliposomes (pLUVs):
  - pLUV_DLPC_OmpLA_RecBuf -> Hosting LUVs: DLPC/DLPG 95:5 mol/mol mixture; protein: Outer membrane phospholipase A (OmpLA) monomer/dimer; suspension in 20 mM TRIS, 2 mM EDTA.
  - pLUV_POPC_OmpLA_RecBuf -> Hosting LUVs: POPC/POPG 95:5 mol/mol mixture; protein: Outer membrane phospholipase A (OmpLA) monomer/dimer; suspension in 20 mM TRIS, 2 mM EDTA.
Abbreviations:
- DLPC -> 1,2-dilauroyl-sn-glycero-3-phosphocholine
- DMPC -> 1,2-dimyristoyl-sn-glycero-3-phosphocholine
- POPC -> 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine
- POPE -> 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphoethanolamine
- DLPG -> 1,2-dilauroyl-sn-glycero-3-phosphoglycerol
- DMPG -> 1,2-dimyristoyl-sn-glycero-3-phosphoglycerol
- POPG -> 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphoglycerol
- Adaptive Thermodynamic Simulated Annealing (TSA)$^{4,5}$
The package works without installation. We recommend running SAS_MoCa in a dedicated conda environment; instructions for setting it up are in the ./conda-env folder.
The input file contains the information needed to configure and initialize data analysis, including the data file path, minimization options, scattering model, and initial parameters.
As this is a YAML file, any changes must comply with YAML syntax (e.g. indentation). You can add or remove comments and descriptions (text preceded by #).
This is what it looks like:

    config:
      datafile: /filepath/filename.dat
      save-folder: MySample
      qrange: [0.005,0.6]
      model: LUV_POPC
      temperature-init: 400
      target-X2: 12.0
      data-binning: 10
      error-scale: 1
      iterations: 42
      processes: 24
      plot-only: off
    parameters:
      parameter_1: [4.2, off, Null, 1.0, 10.]
      parameter_3: [2.4, on, Null, 0.8, 4.8]
      parameter_2: [420, on, 0.042, Null, Null]
      # ...
      parameter_N: [420, on, 0.042, 0, 10000]

The config block contains all the configuration options:
- datafile: path of the scattering data file. The file must contain only numbers (no text in header or footer sections) arranged in 3 columns: q-values, intensities, and intensity uncertainties.
- save-folder: name used to create the output folder, RES_save-folder, in the current working directory (do not use spaces).
- qrange: minimum and maximum q-values defining the range of data to fit.
- model: scattering model to use, chosen among those available in Model_list.py.
- temperature-init: initial temperature of the simulated annealing algorithm; ideally it should be about 100 times larger than the expected best $\chi^2$ value.
- data-binning: reduces the dataset size by a factor n by keeping every n-th data point and discarding the others.
- error-scale (0..1): scale factor for
- iterations: number of iterations used to build up statistics; each iteration is one simulated annealing run, and a single run alone does not provide error estimates.
- processes: number of processes that run different iterations in parallel; if set to 1, the iterations run serially.
- plot-only: on/off entry; on for plotting only (model preview), off for data fitting.
- state: assumed state of OmpLA, either 'monomer' or 'dimer' (required for pLUV models only).
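The effect of qrange, data-binning and save-folder on the input data can be sketched in plain Python. This is a minimal illustration, not SAS_MoCa's actual code: the helper names `prepare_data` and `output_folder` are hypothetical, and the inclusive q limits and the choice to apply binning after the q-range cut are assumptions.

```python
import os


def prepare_data(rows, qrange, binning=1):
    """Hypothetical helper: keep (q, I, sigma) rows inside qrange, then every n-th point.

    Assumes inclusive q limits and that binning is applied after the q-range cut.
    """
    if binning < 1:
        raise ValueError("data-binning must be a positive integer")
    qmin, qmax = qrange
    selected = [row for row in rows if qmin <= row[0] <= qmax]
    # Keep indices 0, n, 2n, ... and discard the other points.
    return selected[::binning]


def output_folder(save_folder):
    """Hypothetical helper: build RES_<save-folder> in the current working directory."""
    if " " in save_folder:
        raise ValueError("save-folder must not contain spaces")
    return os.path.join(os.getcwd(), "RES_" + save_folder)


if __name__ == "__main__":
    rows = [(0.1 * k, 100.0 / (k + 1), 1.0) for k in range(10)]
    print(prepare_data(rows, (0.15, 0.65), binning=2))
    print(output_folder("MySample"))
```

Decimating rather than averaging mirrors the description above ("keeping every n-th data point"); whether the q-range cut comes first is the part most worth checking against the source.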
The second block, parameters, lists the parameters required by the chosen scattering model, one per line, in the form label: [list of values].
- label: name of the parameter. It is only a label for visualization, so it should be short and without spaces; use the description after # for a longer description if needed. The order of parameters matters: check the model to verify the expected order.
- List of values: contains, in order, (1) the initialization value, (2) the fix/free parameter option, (3) the prior information, (4) the low hard boundary, and (5) the high hard boundary.
  - initialization: starting value used to initialize the fitting routine.
  - fix/free parameter option: set off to keep the value fixed, or on to adjust the parameter during the minimization run.
  - prior information: relative $\sigma_{prior}$ of a Gaussian prior pdf centered at the initialization value. If the value is Null, no informative prior is set for the given parameter.
  - low and high hard boundaries: lower and upper limits accessible to the adjustable parameter. Null is only valid if the prior information is not Null, because the boundaries are then automatically set to $\pm5\times\sigma_{prior}$.
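The rules for a parameter's list of values can be illustrated with the sketch below. It assumes that the relative $\sigma_{prior}$ is a fraction of the initialization value (so $\sigma_{prior} = 0.042 \times 420$ for parameter_2 in the example input file) and that the automatic $\pm5\times\sigma_{prior}$ boundaries are centered on the initialization value; both are interpretations, and `resolve_parameter` and `log_prior` are hypothetical names, not SAS_MoCa functions. A YAML 1.1 loader such as PyYAML reads `on`/`off` as booleans and `Null` as `None`, which is what the rows below mimic.

```python
import math


def resolve_parameter(row):
    """Hypothetical: turn [init, free, rel_sigma, low, high] into explicit settings.

    Assumes sigma_prior = rel_sigma * |init| and default bounds init -/+ 5 * sigma_prior.
    """
    init, free, rel_sigma, low, high = row
    sigma = rel_sigma * abs(init) if rel_sigma is not None else None
    if sigma == 0:
        raise ValueError("a relative prior needs a non-zero initialization value")
    if low is None or high is None:
        if sigma is None:
            raise ValueError("Null boundaries require a non-Null prior")
        if low is None:
            low = init - 5 * sigma
        if high is None:
            high = init + 5 * sigma
    return {"init": init, "free": bool(free), "sigma": sigma, "low": low, "high": high}


def log_prior(value, p):
    """Log of an unnormalized Gaussian prior, truncated at the hard boundaries."""
    if not p["low"] <= value <= p["high"]:
        return -math.inf  # outside the hard boundaries: never accepted
    if p["sigma"] is None:
        return 0.0  # Null prior: flat (non-informative) within the boundaries
    return -0.5 * ((value - p["init"]) / p["sigma"]) ** 2


if __name__ == "__main__":
    p2 = resolve_parameter([420, True, 0.042, None, None])
    print(p2)  # sigma about 17.64, bounds about 331.8 and 508.2
    print(log_prior(440.0, p2))
```

How SAS_MoCa actually combines such a prior with the $\chi^2$ during annealing is not described here; the sketch only shows the bookkeeping implied by the five list entries.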
Note: to get started with SAS_MoCa, see the examples folder for working examples and templates.
To fit data or preview the outcome of the chosen scattering model, open a command-line terminal (ideally on Linux Ubuntu), create and/or move to the folder where you want to save the results, and type:

    python <path-to-SAS_MoCa>/sasmoca/sasmoca.py ./input_parameter-file.yml

The fitting results (or preview) are saved in the configured folder. The saved files are listed below.
Each of the following files includes a header containing a timestamp, the version used, and a summary of the configuration used to fit the data (metadata).
- correlations.dat: symmetric matrix of Pearson's correlation coefficients for the adjustable parameters. This file is only created if the number of iterations is greater than 10.
- iterations_collection.dat: collection of the resulting parameters (means only) for each iteration. Fixed parameters are also included.
- plot_intensity.dat: data points used for fitting and the resulting scattering intensity; column 1: $q_i$, column 2: $I_i^{exp}$, column 3: $\sigma_i$, column 4: $I^{model}(q_i)$.
- plot_SDP-SLD.dat: calculated probability density for each quasi-molecular group of the (p)LUV model (SDP profile), as well as the corresponding SLD contrast profile. The $z$-axis units are $\AA$, and the SLD units are $\AA^{-2}$.
- results_recap.dat: summary of the input parameter table (from name to boundaries) with two additional result columns: the median and the MAD (median absolute deviation from the median) of each adjustable parameter. Fixed parameters are also included. The file also lists calculated values (marked with *), such as the area per lipid $A_L$, the Luzzati thickness $D_B$, the phosphate-to-phosphate distance $D_{pp}$ (for SAXS data), and the estimated number of water molecules per headgroup $n_W$.
- plot.png: plot of the data and the fitted scattering curves, including (i) all the scattering curves from the individual iterations; (ii) the data alongside the best-fit curve, calculated from the median of each adjustable parameter; and (iii) a comparison of the experimental error with the data-to-model discrepancy.
- plot_histograms.png: histograms of the distributions obtained for each adjustable parameter. These are only saved if the number of iterations is equal to or greater than hist_mincount (default 10), which can be modified in the settings.yml file. The plots include kernel density estimates of the distributions, marks indicating the median and MAD values, and Gaussian prior profiles where applicable.
- plot_histogram_X2.png: histogram of the distribution obtained for the best $\chi^2$ values, including a kernel density estimate of the distribution.
- plot_SDP-SLD.png: calculated SDP and SLD contrast profiles, including the acyl-chain thickness ($D_C$) and the Luzzati thickness ($D_B$).
A few general settings can be tuned in the settings.yml file. The default settings are:

    convergence:
      thermo: 1
      maxcount: 100000
      conv_threshold: 4
      neg_water_scale: 5
    statistics:
      scale_MAD_to_std: False
      hist_mincount: 10

- thermo: gain factor used in the temperature control of the TSA engine;
- maxcount: maximum number of steps within a single TSA run;
- conv_threshold: threshold factor used in the adaptive stopping criterion of a single TSA run;
- neg_water_scale: penalty factor used in the negative water check [5];
- scale_MAD_to_std: if True, converts MAD values to standard-deviation estimates (factor $\sim$1.48, appropriate for normally distributed data) [5];
- hist_mincount: minimum number of iterations needed to plot histograms of the adjustable parameters.
We recommend that beginners keep the default settings. Note that changes to settings.yml are not saved in the output metadata. As this is a YAML file, any changes must comply with YAML syntax (e.g. indentation).
- Semeraro E. F. & Pabst G. SAS_MoCa (Version 1.5.0) [Computer software]. https://github.com/PabstLab/SAS_MoCa
- Semeraro E. F. & Pabst G., in submission
SAS_MoCa is free and open source software, distributed under the BSD 3‑Clause “New” or “Revised” License.
For the full legal text, see the LICENSE file in this repository.
- [1] Kučerka, N., Nagle, J. F., Sachs, J. N., Feller, S. E., Pencer, J., Jackson, A., & Katsaras, J. (2008). Lipid Bilayer Structure Determined by the Simultaneous Analysis of Neutron and X-Ray Scattering Data. Biophysical Journal, 95(5), 2356–2367. https://doi.org/10.1529/biophysj.108.132662
- [2] Pencer, J., Krueger, S., Adams, C. P., & Katsaras, J. (2006). Method of separated form factors for polydisperse vesicles. Journal of Applied Crystallography, 39(3), 293–303. https://doi.org/10.1107/S0021889806005255
- [3] Frewein, M. P. K., Doktorova, M., Heberle, F. A., Scott, H. L., Semeraro, E. F., Porcar, L., & Pabst, G. (2021). Structure and Interdigitation of Chain-Asymmetric Phosphatidylcholines and Milk Sphingomyelin in the Fluid Phase. Symmetry, 13(8), 1441. https://doi.org/10.3390/sym13081441
- [4] de Vicente, J., Lanchares, J., & Hermida, R. (2003). Placement by thermodynamic simulated annealing. Physics Letters A, 317(5–6), 415–423. https://doi.org/10.1016/j.physleta.2003.08.070
- [5] Semeraro, E. F., & Pabst, G., in submission.
- [6] Semeraro, E. F., et al., in submission.