Why use CulebrONT?

Assembly, circularisation, polishing and correction steps are included in CulebrONT and can be activated (or not) according to the user's needs. The tools most commonly used in the community for each step are integrated, as well as various quality control tools. CulebrONT also generates a report compiling the information obtained at each step.

From assembly to correction

CulebrONT is flexible: it can assemble and (optionally) circularise the assembled molecules, and then polish and correct the assemblies. Directives given in the config.yaml file tell CulebrONT which steps to chain, producing a modular pipeline.

Warning

You must activate at least one of the assemblers included in CulebrONT. The assembly is then piped into whichever circularisation, polishing and correction steps are activated, as well as into the quality control pipeline.
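The rule above can be sketched as a small planning function. This is a minimal illustration, not CulebrONT's code: the function name `plan_steps`, its boolean arguments and the returned step names are hypothetical stand-ins for what is actually set in config.yaml. Only the six assembler names come from this documentation.

```python
# Hypothetical sketch: CulebrONT reads these choices from config.yaml,
# whose exact keys are not shown here.
ASSEMBLERS = {"flye", "canu", "smartdenovo", "miniasm", "raven", "shasta"}


def plan_steps(assemblers, circular, polishing, correction, quality=True):
    """Return the ordered steps of a modular run (illustrative step names)."""
    unknown = set(assemblers) - ASSEMBLERS
    if unknown:
        raise ValueError(f"unknown assembler(s): {sorted(unknown)}")
    if not assemblers:
        raise ValueError("at least one assembler must be activated")
    # Assembly is mandatory; every later step is optional but keeps this order.
    steps = ["assembly"]
    if circular:
        steps.append("circularisation")
    if polishing:
        steps.append("polishing")
    if correction:
        steps.append("correction")
    if quality:
        steps.append("quality_control")
    return steps


print(plan_steps(["flye", "raven"], circular=True, polishing=True, correction=False))
# ['assembly', 'circularisation', 'polishing', 'quality_control']
```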

Assembly

CulebrONT currently includes six recent, community-validated assemblers.

Included tools:

Optional circularisation

CulebrONT lets you activate or deactivate the circularisation steps. If you are working on eukaryotic organisms, circularisation is typically not necessary: set CIRCULAR=False in the config.yaml file. The CIRCULAR option in the configuration file is central to how the workflow is built.

The directed acyclic graphs (DAGs) below show the differences between a deactivated CIRCULAR step (CIRCULAR=False):

CIRCULAR_FALSE

and an activated CIRCULAR step in the configuration file (CIRCULAR=True):

CIRCULAR_TRUE

If an assembled molecule is circular (e.g. in bacteria), it is tagged and handled specially throughout the pipeline. Circular molecules are tagged and rotated before each polishing step, and their start position is fixed on the circular genome. This is useful when multiple genome alignments are envisaged.
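Rotating a circular molecule amounts to moving its linear start coordinate. For illustration only, the sketch below assumes a rotation by half the molecule length. This moves the arbitrary start/end junction into the middle of the sequence, where reads can span it during polishing. This section does not say which offset CulebrONT actually uses.

```python
def rotate(seq: str, offset: int) -> str:
    """Return a circular sequence rotated so that `offset` becomes position 0."""
    if not seq:
        return seq
    offset %= len(seq)  # wrap offsets larger than the molecule
    return seq[offset:] + seq[:offset]


def rotate_half(seq: str) -> str:
    """Move the former start/end junction to the middle of the molecule."""
    return rotate(seq, len(seq) // 2)


# The junction between "H" (old end) and "A" (old start) is now in the middle.
print(rotate_half("ABCDEFGH"))  # EFGHABCD
```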

Note

  • If the circular step is activated, Flye is run with its --plasmids option.
  • The Circlator tool is used to circularise assemblies from Canu and Smartdenovo. Circlator attempts to identify each circular sequence and outputs a linearised version of each of them.
  • For the circularisation step performed by Circlator, the trimmed and corrected FASTQ files produced by Canu are used; raw FASTQ files are used directly for the other assemblers.
  • Circularisation for Miniasm is performed by Minipolish.
  • For Miniasm, Raven and Shasta, circularity is checked from the generated GFA files in a dedicated tag_circular step, which tags the circular FASTA sequences.
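The following minimal sketch shows how circularity can be read from a GFA file. It assumes that a contig counts as circular when a GFA 1 link (L line) connects a segment to itself in the same orientation. The rule actually applied by CulebrONT's tag_circular step is not detailed here, and the `circular=true` header suffix is a hypothetical tag format.

```python
def circular_segments(gfa_text):
    """Return the names of GFA 1 segments that carry a link to themselves."""
    circular = set()
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        # Link line: L <from> <from_orient> <to> <to_orient> <overlap>
        if fields[0] == "L" and len(fields) >= 5:
            same_segment = fields[1] == fields[3]
            same_orient = fields[2] == fields[4]
            if same_segment and same_orient:
                circular.add(fields[1])
    return circular


def tag_fasta(fasta_text, circular):
    """Append a hypothetical circular=true tag to headers of circular contigs."""
    out = []
    for line in fasta_text.splitlines():
        parts = line[1:].split() if line.startswith(">") else []
        if parts and parts[0] in circular:
            line += " circular=true"
        out.append(line)
    return "\n".join(out)


gfa = "S\tctg1\tACGT\nS\tctg2\tGGCC\nL\tctg1\t+\tctg1\t+\t0M\nL\tctg1\t+\tctg2\t+\t0M\n"
print(circular_segments(gfa))  # {'ctg1'}
```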

Optional start fixing

A fixstart step is performed before quality control only when the circular step is activated. This step uses the fixstart option of the Circlator tool to rotate circular sequences so that they start at the dnaA gene (if one is found). This is important when multiple alignments are envisaged.
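The rotation performed by fixstart can be illustrated as follows. This sketch assumes the dnaA start coordinate and strand are already known; finding the gene is Circlator's job and is not reproduced. It also assumes that a gene on the reverse strand is handled by reverse-complementing the molecule, so that the gene reads forward from position 0. `fix_start` and its arguments are hypothetical names.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def fix_start(seq: str, gene_start: int, strand: str = "+") -> str:
    """Rotate a circular sequence so that a gene begins at position 0.

    gene_start is the 0-based coordinate of the gene's first base; for a
    reverse-strand gene this is its highest coordinate on `seq`.
    """
    if strand == "-":
        # Work on the other strand so the gene reads left to right.
        seq = reverse_complement(seq)
        gene_start = len(seq) - 1 - gene_start
    elif strand != "+":
        raise ValueError("strand must be '+' or '-'")
    return seq[gene_start:] + seq[:gene_start]


print(fix_start("TTTATGCCC", 3))        # ATGCCCTTT
print(fix_start("GGGCATAAA", 5, "-"))  # ATGCCCTTT
```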

Fixstart is always performed on the last draft sequence obtained in the pipeline. In other words:

Note

  • Fixstart is launched after the assembly step if only assembly is activated.
  • Fixstart is launched after the polishing step if only assembly+polishing are activated.
  • Fixstart is launched after correction if assembly+polishing+correction or assembly+correction are activated.

Warning

In any case, fixstart is deactivated if CIRCULAR is False.
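The placement rules above, together with the CIRCULAR switch, can be summarised as a small decision function. This is a hypothetical sketch: the function and argument names do not come from CulebrONT.

```python
def fixstart_after(assembly=True, polishing=False, correction=False, circular=True):
    """Return the step whose draft fixstart rotates, or None when it is skipped."""
    if not circular:
        return None  # fixstart is always deactivated when CIRCULAR is False
    if not assembly:
        raise ValueError("at least one assembly step is required")
    # Fixstart always works on the last draft produced by the pipeline.
    if correction:
        return "correction"
    if polishing:
        return "polishing"
    return "assembly"


print(fixstart_after(polishing=True))  # polishing
```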

Included tools:

Polishing

The polishing step is performed by Racon, which corrects raw contigs generated by rapid assembly methods using the original ONT reads. Choose how many rounds of Racon to perform (from 1 to 9), and CulebrONT will run them recursively for you. Generally, 2 to 4 rounds are the best choice.
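The recursive rounds can be pictured as a chain. In each round, the reads are mapped onto the previous draft, and that draft is then polished. The sketch only builds the command lines. The minimap2 map-ont preset and the file names are assumptions. The rotation of circular molecules that CulebrONT performs before each Racon step is left out.

```python
def racon_commands(draft, reads, rounds):
    """Build shell commands for `rounds` recursive Racon polishing rounds."""
    if not 1 <= rounds <= 9:
        raise ValueError("CulebrONT accepts between 1 and 9 Racon rounds")
    commands = []
    current = draft
    for i in range(1, rounds + 1):
        overlaps = f"racon_round{i}.paf"  # hypothetical file names
        polished = f"racon_round{i}.fasta"
        # Map the ONT reads onto the current draft...
        commands.append(f"minimap2 -x map-ont {current} {reads} > {overlaps}")
        # ...then let Racon build a polished consensus of that draft.
        commands.append(f"racon {reads} {overlaps} {current} > {polished}")
        current = polished  # the next round polishes this round's output
    return commands, current


cmds, final = racon_commands("assembly.fasta", "reads.fastq", 2)
for cmd in cmds:
    print(cmd)
print(final)  # racon_round2.fasta
```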

Note

  • Minipolish includes Racon polishing, so even if polishing is deactivated for the other assemblers (Flye, Shasta, Raven, Smartdenovo and Canu), Miniasm assemblies are still polished. Please take this into account when comparing assemblies.
  • In CulebrONT, Raven's -p parameter (for polishing) is set to 0 by default: Racon is run by CulebrONT itself so that the rotation of circular molecules can be controlled before every Racon step.

Included tools:

Correction

Correction can improve the consensus sequence of a draft genome assembly. Nanopolish and Medaka are included in the correction steps. With CulebrONT, you can train a Medaka model and use it directly to obtain a consensus for your favourite organism.

Note

  • Before Nanopolish and Medaka, the assembled molecules are split into segments. Each segment is polished in parallel to save time, and the polished segments are merged afterwards. CulebrONT implements this parallelism following the Medaka documentation and Nanopolish practices.
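The split-and-merge idea can be sketched by cutting each contig into overlapping regions that are processed independently. The 50 kb segment length and 200 bp overlap are illustrative values, not necessarily those used by CulebrONT. The `contig:start-end` region format (0-based, end-exclusive) is also an assumption. The final merge of the polished segments is not shown.

```python
def make_regions(contig_lengths, segment_length=50_000, overlap=200):
    """Cut each contig into overlapping 0-based, end-exclusive regions."""
    if segment_length <= 0 or overlap < 0:
        raise ValueError("segment_length must be > 0 and overlap >= 0")
    regions = []
    for name, length in contig_lengths.items():
        start = 0
        while start < length:
            # Extend each segment by `overlap` bases so neighbours can be stitched.
            end = min(start + segment_length + overlap, length)
            regions.append(f"{name}:{start}-{end}")
            if end == length:
                break
            start += segment_length
    return regions


print(make_regions({"ctg1": 120_000}))
# ['ctg1:0-50200', 'ctg1:50000-100200', 'ctg1:100000-120000']
```

Each region could then be submitted as an independent job, which is where the speed-up comes from.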

If you have short reads, you can also use Pilon to correct assemblies. As with Racon, CulebrONT can run several recursive rounds of Pilon.

Included tools:

Quality on assemblies

A variety of useful tools exist to check the accuracy of assemblies.

QUALITY

CulebrONT checks the quality of the assemblies using these optional tools:

Note

  • BUSCO: helps check whether you have a good assembly by searching a newly sequenced genome for the expected single-copy orthologs conserved in an appropriate phylogenetic lineage.
  • QUAST: a good starting point for evaluating the quality of assemblies, providing many helpful contiguity statistics.
  • Blobtools: detects contamination in assembled contigs.
  • Assemblytics: compares the contiguity of the assemblies against a reference genome.
  • KAT: explores k-mer frequencies and checks for possible contamination.
  • Samtools flagstat: calculates remapping statistics of Illumina reads aligned to the obtained assemblies.
  • Weesam: checks the read coverage of each assembled contig (small genomes only).
  • Mauve: performs a multiple alignment of several assemblies (small genomes only).

Danger

Please activate these last two tools for small genomes only.
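Among the contiguity statistics reported by QUAST, N50 is the most widely quoted. It is the length L such that contigs of length L or longer contain at least half of the assembled bases. The following is a minimal sketch of that definition, not QUAST's implementation:

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths."""
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:  # at least half of all bases reached
            return length


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```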

Included tools:

  • BUSCO version >= 4.0.5
  • QUAST version >= 5.0.2
  • Blobtools version >= 1.1.1
  • Assemblytics version >= 1.2
  • KAT version >= 2.4.2
  • Samtools version >= 1.10
  • Weesam version > 1.6
  • Mauve > 2.4.0.snapshot_2015_02_13