
PabstLab/SAS_MoCa


SAS_MoCa

A stochastic analysis tool for studying the structure of lipid membrane systems using small-angle scattering (SAS).

Currently supported models

The currently supported models of Large Unilamellar Vesicles (LUVs) and protein-containing LUVs (pLUVs, or proteoliposomes) are based on a combination of the Scattering Density Profile (SDP) $^1$ and Separated Form Factor (SFF) $^2$ scattering models. Recent updates accounting for bound water molecules and thickness fluctuations are included. $^3$

  • Large unilamellar vesicles (LUVs):
    • LUV_DMPC -> LUVs: DMPC/DMPG 95:5 mol/mol mixture; suspension in pure water.
    • LUV_POPC -> LUVs: POPC/POPG 95:5 mol/mol mixture; suspension in pure water.
    • LUV_POPE -> LUVs: POPE/POPG 90:10 mol/mol mixture; suspension in pure water.
  • Proteoliposomes (pLUVs):
    • pLUV_DLPC_OmpLA_RecBuf -> Hosting LUVs: DLPC/DLPG 95:5 mol/mol mixture; protein: Outer membrane phospholipase A (OmpLA) monomer/dimer; suspension in 20 mM TRIS, 2 mM EDTA.
    • pLUV_POPC_OmpLA_RecBuf -> Hosting LUVs: POPC/POPG 95:5 mol/mol mixture; protein: Outer membrane phospholipase A (OmpLA) monomer/dimer; suspension in 20 mM TRIS, 2 mM EDTA.

Abbreviations:

  • DLPC -> 1,2-dilauroyl-sn-glycero-3-phosphocholine
  • DMPC -> 1,2-dimyristoyl-sn-glycero-3-phosphocholine
  • POPC -> 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine
  • POPE -> 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphoethanolamine
  • DLPG -> 1,2-dilauroyl-sn-glycero-3-phosphoglycerol
  • DMPG -> 1,2-dimyristoyl-sn-glycero-3-phosphoglycerol
  • POPG -> 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphoglycerol

Minimization algorithm:

  • Adaptive Thermodynamic Simulated Annealing (TSA) $^{4,5}$

Installation

The package works without installation. We recommend running sasmoca in a dedicated conda environment; setup instructions are in the ./conda-env folder.

Documentation

Setting up the analysis: Input file (*.yml file)

The input file contains the information needed to configure and initialize the data analysis, including the data file path, minimization options, scattering model, and initial parameters. As this is a YAML file, any changes must comply with YAML syntax (e.g. indentation). You can add or remove comments and descriptions (text preceded by #).
A typical input file looks like this:

```yaml
config:
    datafile: /filepath/filename.dat
    save-folder: MySample
    qrange: [0.005, 0.6]
    model: LUV_POPC
    temperature-init: 400
    target-X2: 12.0
    data-binning: 10
    error-scale: 1
    iterations: 42
    processes: 24
    plot-only: off

parameters:
    parameter_1: [4.2, off, Null, 1.0, 10.]
    parameter_3: [2.4, on, Null, 0.8, 4.8]
    parameter_2: [420, on, 0.042, Null, Null]
    # ...
    parameter_N: [420, on, 0.042, 0, 10000]
```

Configuration

The config block contains all configuration options.

datafile
Path to the scattering data file. The file must contain no text in header or footer sections, only numbers in three columns: $q_i$, the scattering intensity $I_i$, and the associated error $\sigma_i$.
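As an illustration of the expected format, the snippet below reads a three-column file with NumPy and checks its shape before fitting. `load_sas_data` is a hypothetical helper written for this sketch; it is not part of SAS_MoCa, and how sasmoca itself parses the file may differ.

```python
import io

import numpy as np


def load_sas_data(source):
    """Read a plain three-column SAS file (q, I, sigma) and check its shape.

    `source` can be a path or a file-like object. Non-numeric text lines
    make np.loadtxt raise a ValueError.
    """
    data = np.loadtxt(source, ndmin=2)
    if data.shape[1] != 3:
        raise ValueError(f"expected 3 columns (q, I, sigma), got {data.shape[1]}")
    q, intensity, sigma = data.T
    return q, intensity, sigma


# Usage with an in-memory example file (values are illustrative only)
example = io.StringIO("0.01 120.0 2.0\n0.02 80.0 1.5\n0.03 45.0 1.0\n")
q, intensity, sigma = load_sas_data(example)
print(q.size)  # 3
```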

save-folder
String used to name the output folder, RES_save-folder, which is created in the current working directory (do not use spaces).

qrange
Minimum and maximum q values defining the range of data to fit.
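A minimal sketch of what selecting the fitting range amounts to, assuming both limits are inclusive (the README does not say whether sasmoca treats the edges as inclusive or exclusive). `apply_qrange` is a hypothetical name.

```python
import numpy as np


def apply_qrange(q, intensity, sigma, qrange):
    """Keep only points with qmin <= q <= qmax (inclusive limits assumed)."""
    qmin, qmax = qrange
    mask = (q >= qmin) & (q <= qmax)
    return q[mask], intensity[mask], sigma[mask]


q = np.array([0.001, 0.005, 0.1, 0.6, 0.8])
intensity = np.array([500.0, 300.0, 20.0, 0.5, 0.2])
sigma = 0.05 * intensity
# Same limits as the example input file: qrange: [0.005, 0.6]
q_fit, i_fit, s_fit = apply_qrange(q, intensity, sigma, [0.005, 0.6])
print(q_fit)
```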

model
Scattering model to use (one of those available in Model_list.py).

temperature-init
Initial temperature of the simulated annealing algorithm; ideally it should be about 100 times larger than the expected best $\chi^2$.
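The rule of thumb above can be made concrete with a short sketch. It assumes the common reduced chi-square definition, $\chi^2 = \frac{1}{N - p}\sum_i \left(\frac{I_i - I^{model}(q_i)}{\sigma_i}\right)^2$; the exact normalization used by sasmoca is not stated in this README. Both function names are hypothetical.

```python
import numpy as np


def chi_square(i_exp, i_model, sigma, n_free=0):
    """Reduced chi-square: sum of squared normalized residuals / (N - n_free).

    Textbook definition used for illustration; sasmoca's own normalization
    may differ.
    """
    residuals = (i_exp - i_model) / sigma
    return np.sum(residuals**2) / (residuals.size - n_free)


def suggest_initial_temperature(expected_chi2, factor=100.0):
    """Rule of thumb from the text: T0 of about 100 x the expected best chi^2."""
    return factor * expected_chi2


i_exp = np.array([10.0, 8.0, 6.0, 4.0])
i_model = np.array([10.5, 7.5, 6.5, 3.5])
sigma = np.array([0.5, 0.5, 0.5, 0.5])
chi2 = chi_square(i_exp, i_model, sigma)  # every residual is +/-1, so chi2 = 1.0
print(suggest_initial_temperature(chi2))  # 100.0
```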

data-binning
Reduces the dataset size by a factor n, keeping every n-th data point and discarding the others.
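In array terms, keeping every n-th point is a strided slice. This sketch assumes the selection starts at the first point and ignores whether sasmoca bins before or after applying qrange; neither detail is stated here. `bin_data` is a hypothetical name.

```python
import numpy as np


def bin_data(q, intensity, sigma, n):
    """Keep every n-th point (indices 0, n, 2n, ...); n = 1 keeps all data."""
    if n < 1:
        raise ValueError("data-binning factor must be >= 1")
    return q[::n], intensity[::n], sigma[::n]


q = np.linspace(0.005, 0.6, 100)
intensity = np.exp(-q)
sigma = 0.02 * intensity
# data-binning: 10 reduces 100 points to 10
qb, ib, sb = bin_data(q, intensity, sigma, 10)
print(qb.size)  # 10
```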

error-scale (0..1)
Scale factor for $\sigma_i$; for scaling values in [0, 1), the scaled $\sigma_i$ values are not allowed to become smaller than 2% of $I_i$.
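One reading of this rule, sketched below, is $\sigma_i' = \max(s\,\sigma_i,\ 0.02\,|I_i|)$ for a scale factor $s < 1$ and $\sigma_i' = s\,\sigma_i$ otherwise. Taking the absolute value of $I_i$ is an assumption added here for background-subtracted data with negative intensities; `scale_errors` is a hypothetical name.

```python
import numpy as np


def scale_errors(intensity, sigma, scale, min_rel_error=0.02):
    """Scale sigma_i by `scale`; for scale < 1, floor the result at 2% of |I_i|."""
    scaled = scale * sigma
    if scale < 1:
        scaled = np.maximum(scaled, min_rel_error * np.abs(intensity))
    return scaled


intensity = np.array([100.0, 10.0, 1.0])
sigma = np.array([5.0, 1.0, 0.5])
# 0.2 * sigma = [1.0, 0.2, 0.1]; the 2% floor is [2.0, 0.2, 0.02]
print(scale_errors(intensity, sigma, 0.2))
```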

iterations
Number of iterations used to build up statistics. Each iteration consists of one simulated annealing run; a single iteration does not provide error estimates, which are obtained from the spread of results across iterations.

processes
Number of processes used to run different iterations in parallel; if set to 1, the iterations are computed serially.
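The interplay of iterations and processes can be sketched with Python's standard multiprocessing module: independent runs, each with its own random seed, are distributed over a pool, or run in a plain loop when processes is 1. This is not sasmoca's code; `one_iteration` is a hypothetical stand-in that returns a noisy number instead of performing a real annealing run.

```python
import multiprocessing
import random


def one_iteration(seed):
    """Hypothetical stand-in for a single simulated annealing run.

    A real run would minimize chi^2 starting from the initial parameters;
    here it only returns a noisy value so the example is self-contained.
    """
    rng = random.Random(seed)  # independent random stream per iteration
    return 4.2 + rng.gauss(0.0, 0.1)


def run_iterations(iterations, processes, run_one=one_iteration):
    """Run `iterations` independent runs, serially or in a process pool."""
    seeds = list(range(iterations))
    if processes == 1:
        return [run_one(s) for s in seeds]  # serial computation
    with multiprocessing.Pool(processes) as pool:
        return pool.map(run_one, seeds)  # results keep the order of `seeds`


if __name__ == "__main__":
    results = run_iterations(iterations=42, processes=2)
    print(len(results))  # 42
```

Because every run is seeded independently, the parallel and serial paths return the same results in the same order.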

plot-only
on/off switch: only plot a preview of the model (on) or fit the data (off).

state
Oligomeric state of OmpLA, either 'monomer' or 'dimer' (required for pLUV models only).

Initialize model parameters

The second block lists the parameters required by the chosen scattering model:

```yaml
    parameter_1: [4.2, off, Null, 1.0, 10.]
    parameter_3: [2.4, on, Null, 0.8, 4.8]
    parameter_2: [420, on, 0.042, Null, Null]
    # ...
    parameter_N: [420, on, 0.042, 0, 10000]
```
  • label:
    Name of the parameter. It is only a label for visualization and should be short and contain no spaces; add a longer description as a comment after #, if needed. The order of the parameters matters: check the model to verify the expected order.

  • List of values
    Each list contains, in order: 1) the initialization value, 2) the fix/free parameter option, 3) the prior information, 4) the lower hard boundary, and 5) the upper hard boundary.

    • initialization
      Starting values to initialize the fitting routine.

    • fix/free parameter option
      Set to off to keep the parameter fixed at its initialization value; set to on to adjust it during the minimization run.

    • prior information
      Relative width $\sigma_{prior}$ of a Gaussian prior pdf centered at the initialization value. If the value is Null, no informative prior is set for that parameter.

    • low and high hard-boundaries
      Lower and upper boundaries accessible to the adjustable parameter. Null is only valid if the prior information is not Null; in that case the boundaries are set automatically to $\pm5\times\sigma_{prior}$ around the initialization value.
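The five-element parameter lists can be interpreted as in the sketch below. Two points are assumptions not confirmed by this README: that the relative $\sigma_{prior}$ is a fraction of the initialization value, and that the prior enters as the usual Gaussian penalty $\frac{1}{2}\left((x - x_0)/\sigma_{prior}\right)^2$. The input is assumed to be already loaded from YAML, where PyYAML turns on/off into True/False and Null into None. `parse_parameter` and `gaussian_prior_penalty` are hypothetical names.

```python
def parse_parameter(entry):
    """Interpret one parameter list [init, free, prior, low, high]."""
    init, free, prior_rel, low, high = entry
    sigma_prior = None
    if prior_rel is not None:
        # assumed: relative width -> absolute width around the initialization value
        sigma_prior = abs(prior_rel * init)
    if low is None or high is None:
        if sigma_prior is None:
            raise ValueError("Null boundaries require a non-Null prior")
        low = init - 5 * sigma_prior if low is None else low
        high = init + 5 * sigma_prior if high is None else high
    return {"init": init, "free": free, "sigma_prior": sigma_prior,
            "bounds": (low, high)}


def gaussian_prior_penalty(value, init, sigma_prior):
    """Negative log of an unnormalized Gaussian prior centred at `init`."""
    if sigma_prior is None:
        return 0.0  # no informative prior: no penalty
    return 0.5 * ((value - init) / sigma_prior) ** 2


# parameter_2 from the example: [420, on, 0.042, Null, Null]
p = parse_parameter([420, True, 0.042, None, None])
print(p["bounds"])  # approximately (331.8, 508.2)
```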

Note

To get started with SAS_MoCa, see the examples folder for working examples and templates.


Run the fitting routine

To fit data or preview the outcome of the chosen scattering model, open a command-line terminal (ideally on Linux Ubuntu), create and/or move to the folder where you want to save the results, and type:

```shell
python <path-to-SAS_MoCa>/sasmoca/sasmoca.py ./input_parameter-file.yml
```

Output

The fitting results (or the preview) are saved in the folder set by save-folder. The saved files are listed below.

Text files

Each of the following files includes a header containing a timestamp, the version used, and a summary of the configuration used to fit the data (metadata).

  • correlations.dat
    It contains a symmetric matrix of Pearson correlation coefficients for the adjustable parameters. This file is only created if the number of iterations is greater than 10.
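The correlation matrix can be reproduced from per-iteration results with NumPy, treating each iteration as one sample and each adjustable parameter as one variable. This is a sketch under that assumption, not the code that writes correlations.dat; `parameter_correlations` is a hypothetical name and the data are synthetic.

```python
import numpy as np


def parameter_correlations(samples, min_iterations=10):
    """Pearson correlation matrix of adjustable parameters across iterations.

    `samples` has shape (n_iterations, n_adjustable): one row per iteration.
    Returns None when there are not more than `min_iterations` rows.
    """
    samples = np.asarray(samples, dtype=float)
    if samples.shape[0] <= min_iterations:
        return None  # too few iterations for meaningful statistics
    return np.corrcoef(samples, rowvar=False)


rng = np.random.default_rng(0)
a = rng.normal(size=42)
b = 2.0 * a + rng.normal(scale=0.1, size=42)  # strongly correlated with a
c = rng.normal(size=42)  # independent of a and b
corr = parameter_correlations(np.column_stack([a, b, c]))
print(np.round(corr, 2))
```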

  • iterations_collection.dat
    Collection of the result parameters (means only) for each iteration. Fixed parameters are also included.

  • plot_intensity.dat
    It contains the data points used for fitting and the resulting scattering intensity; column 1: $q_i$, column 2: $I_i^{exp}$, column 3: $\sigma_i$, column 4: $I^{model}(q_i)$.

  • plot_SDP-SLD.dat
    It contains the calculated probability density for each quasi-molecular group of the (p)LUV model (SDP profile), as well as the corresponding SLD contrast profile. The $z$-axis units are $\AA$, and the SLD units are $\AA^{-2}$.

  • results_recap.dat
    Summary of the input parameter table (from name to boundaries) that includes two new columns for the results: median and MAD (median absolute deviation from the median) for each adjustable parameter. Fixed parameters are also included. Additionally, there is the list of calculated values (marked with *) such as area per lipid $A_L$, Luzzati thickness $D_B$, phosphate-to-phosphate distance $D_{pp}$ (in the case of SAXS), and estimate of number of water molecules per headgroup $n_W$.
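For reference, the median and MAD of one parameter across iterations can be computed as below. The factor 1.4826 ($\approx 1/\Phi^{-1}(3/4)$) rescales the MAD to a standard deviation for normally distributed data, which corresponds to the factor of about 1.48 mentioned for the scale_MAD_to_std setting. `median_and_mad` is a hypothetical name, and the numbers are made up.

```python
import numpy as np

MAD_TO_STD = 1.4826  # consistency factor for normally distributed data


def median_and_mad(values, scale_to_std=False):
    """Median and median absolute deviation (MAD) of one parameter's results."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    if scale_to_std:
        mad *= MAD_TO_STD  # analogous to scale_MAD_to_std: True
    return median, mad


results = [4.0, 4.1, 4.2, 4.3, 9.0]  # one outlying iteration barely affects the result
print(median_and_mad(results))
```

Median and MAD are robust to the occasional run that ends in a poor local minimum, which a mean and standard deviation would not be.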

Plots

  • plot.png
    The plot of the data and the fitted scattering curves includes: (i) a collection of all the scattering curves relative to each iteration; (ii) a plot of the data alongside the best-fit curve, calculated from the median of each adjustable parameter; and (iii) a comparison of experimental error with data-to-model discrepancy.

  • plot_histograms.png
    Histograms showing the distribution obtained for each adjustable parameter. These are only saved if the number of iterations is equal to or greater than hist_mincount (default 10), which can be modified in the settings.yml file. The plots include kernel density estimates of the distributions, marks indicating the median and median absolute deviation (MAD) values, and Gaussian prior profiles where applicable.

  • plot_histogram_X2.png
    Histogram of the best $\chi^2$ values obtained across iterations. The plot includes a kernel density estimate of the distribution.

  • plot_SDP-SLD.png
    Calculated SDP and SLD contrast profiles. The plots include visualizations of the acyl-chain thickness ($D_C$) and the Luzzati thickness ($D_B$).

General settings

A few general settings can be tuned in the settings.yml file. The default settings are:

```yaml
convergence:
    thermo: 1
    maxcount: 100000
    conv_threshold: 4
    neg_water_scale: 5
statistics:
    scale_MAD_to_std: False
    hist_mincount: 10
```

  • thermo: gain factor used in the temperature control of the TSA engine;
  • maxcount: maximum number of iterations within one TSA run;
  • conv_threshold: threshold factor used for the adaptive ending criterion of a single TSA run;
  • neg_water_scale: penalty factor used in the negative water check [5];
  • scale_MAD_to_std: converts MAD to standard deviation values (factor $\sim$1.48) [5];
  • hist_mincount: minimum number of iterations needed to plot histograms of adjustable parameters.

We recommend that beginner users do not change the default settings. Note that such modifications are not saved as metadata. As this is a YAML file, any changes must comply with YAML syntax (e.g. indentation).

If you use the SAS_MoCa repository, please cite:

License

SAS_MoCa is free and open source software, distributed under the BSD 3‑Clause “New” or “Revised” License.
For the full legal text, see the LICENSE file in this repository.

References

  1. Kučerka, N., Nagle, J. F., Sachs, J. N., Feller, S. E., Pencer, J., Jackson, A., & Katsaras, J. (2008). Lipid Bilayer Structure Determined by the Simultaneous Analysis of Neutron and X-Ray Scattering Data. Biophysical Journal, 95(5), 2356–2367. https://doi.org/10.1529/biophysj.108.132662
  2. Pencer, J., Krueger, S., Adams, C. P., & Katsaras, J. (2006). Method of separated form factors for polydisperse vesicles. Journal of Applied Crystallography, 39(3), 293–303. https://doi.org/10.1107/S0021889806005255
  3. Frewein, M. P. K., Doktorova, M., Heberle, F. A., Scott, H. L., Semeraro, E. F., Porcar, L., & Pabst, G. (2021). Structure and Interdigitation of Chain-Asymmetric Phosphatidylcholines and Milk Sphingomyelin in the Fluid Phase. Symmetry, 13(8), 1441. https://doi.org/10.3390/sym13081441
  4. de Vicente, J., Lanchares, J., & Hermida, R. (2003). Placement by thermodynamic simulated annealing. Physics Letters A, 317(5–6), 415–423. https://doi.org/10.1016/j.physleta.2003.08.070
  5. Semeraro, E. F., & Pabst, G. (in submission).
  6. Semeraro, E. F., et al. (in submission).
