update documentation

FelixMoelder · FelixMoelder · commit c2f510285c82 · 2025-05-28T09:07:29.000+02:00
diff --git a/config/README.md b/config/README.md
@@ -27,7 +27,8 @@ For each sample, add one or more sequencing units (runs, lanes or replicates) to
   * `fq1` and `fq2` for paired end reads. These can point to any FASTQ files on your system
   * `sra` only: specify an SRA (sequence read archive) accession (starting with e.g. ERR or SRR). The pipeline will automatically download the corresponding paired end reads from SRA.
   * If both local files (`fq1`, `fq2`) and SRA accession (`sra`) are available, the local files will be used.
-* Define adapters in the `adapters` column, by putting [cutadapt arguments](https://cutadapt.readthedocs.org) in quotation marks (e.g. `"-a ACGCGATCG -A GCTAGCGTACT"`).
+* Define adapters in the `adapters` column, by putting [fastp arguments](https://github.com/OpenGene/fastp?tab=readme-ov-file#adapters) in quotation marks (e.g. `"--adapter_sequence ACGCGATCG --adapter_sequence_r2 GCTAGCGTACT"`).
+Automatic adapter trimming can be enabled by setting the keyword `auto_trim` (please consult the fastp documentation on paired-end auto trimming). If the column is empty, no trimming will be performed.
 
 Missing values can be specified by empty columns or by writing `NA`. Lines can be commented out with `#`.
 
@@ -57,7 +58,5 @@ For annotating UMIs two additional columns in `sample.tsv` must be set:
   * `fq1` if the UMIs are part of read 1
   * `fq2` if the UMIs are part of read 2
   * `both` if there are UMIs in both paired end reads
-  * the path to an additional fastq file containing just the UMI of each fragment in fq1 and fq2 (with the same read names)
-* `umi_read_structure`: A read structure defining the UMI position in each UMI record (see https://github.com/fulcrumgenomics/fgbio/wiki/Read-Structures). If `both` reads contain a UMI, specify a read structure for both with whitespace in between (for example, `8M142T 8M142T`). In case a separate fastq file only containg UMI sequences is set the read structure needs to be `+M`.
-Read names of UMI records must match the corresponding records of the sample fastqs.
+* `umi_len`: Number of bases (UMI length) to be annotated as UMI.
 
diff --git a/config/samples.tsv b/config/samples.tsv
@@ -1,2 +1,2 @@
-sample_name	alias	group	platform	purity	panel	umi_read	umi_read_structure	datatype	calling
+sample_name	alias	group	platform	purity	panel	umi_read	umi_len	datatype	calling
 SRR702070	tumor	SRR702070_group	ILLUMINA	1.0				dna	variants
diff --git a/workflow/rules/common.smk b/workflow/rules/common.smk
@@ -371,7 +371,12 @@ def get_fastp_adapters(wildcards):
     try:
         adapters = unit["adapters"]
         if isinstance(adapters, str):
-            return adapters
+            # Autotrimming is enabled by default.
+            # Therefore no adapter parameter needs to be passed.
+            if adapters == "auto_trim":
+                return ""
+            else:
+                return adapters
         return ""
     except KeyError:
         return ""
diff --git a/workflow/scripts/split-call-tables.py b/workflow/scripts/split-call-tables.py
@@ -173,6 +173,7 @@ def variants(
             self.pos = pos
             self._variants = self._load_variants()
         for variant in self._variants:
+            print(variant.pos)
             if variant.pos == pos and variant.alts[0] == alt:
                 yield variant
             if variant.pos > pos:
@@ -195,6 +196,7 @@ def annotate_row(self, row):
         )
 
     def _load_variants(self):
+        print(self.pos)
         return self.bcf.fetch(str(self.contig), self.pos - 1, self.end)
 
     @property
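
The `get_fastp_adapters` hunk above maps the `adapters` column to extra fastp arguments. A minimal standalone sketch of that mapping follows, assuming a plain dictionary stands in for one row of the units sheet; the name `resolve_fastp_adapters` and the dictionary input are hypothetical and not part of the workflow.

```python
def resolve_fastp_adapters(unit):
    """Return the extra fastp arguments for one sequencing unit.

    `unit` is a hypothetical mapping that stands in for one row of the
    units sheet; the workflow itself looks the row up via wildcards.
    """
    # A missing column is treated like an empty cell (the diff catches KeyError).
    adapters = unit.get("adapters")
    # Assumption: missing values (empty cell or NA) are not loaded as strings,
    # which is why the diff checks isinstance(adapters, str).
    if not isinstance(adapters, str):
        return ""
    # `auto_trim` passes no adapter argument; per the comment in the diff,
    # fastp's automatic trimming is enabled by default.
    if adapters == "auto_trim":
        return ""
    # Anything else is forwarded verbatim as fastp arguments.
    return adapters


if __name__ == "__main__":
    print(repr(resolve_fastp_adapters({"adapters": "auto_trim"})))
    print(repr(resolve_fastp_adapters({"adapters": "--adapter_sequence ACGCGATCG"})))
    print(repr(resolve_fastp_adapters({})))
```

In this function, both `auto_trim` and an empty column produce an empty string. How the empty-column case ends up disabling trimming, as the README text states, is not visible in this diff.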
