Pan-genome Explorer
How to import new genomes?
(1) Give a name to your dataset
For example, you may use the name of the bacterial species to be analyzed. The project name must be alphanumeric and must not contain spaces. Once the analysis is done, the project appears in the drop-down list of the main menu.
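The naming rule can be checked before submitting. The following is a minimal sketch, assuming the strictest reading of the rule (letters and digits only); the function name `is_valid_project_name` is hypothetical and the form's own validation may differ, for instance on whether underscores are accepted.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits only.
# This is an illustrative pattern, not the server's actual rule.
PROJECT_NAME_PATTERN = re.compile(r"[A-Za-z0-9]+")


def is_valid_project_name(name: str) -> bool:
    """Return True if the name is non-empty and contains only letters and digits."""
    return PROJECT_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("Anaplasmataceae", "Xanthomonas oryzae"):
        print(candidate, is_valid_project_name(candidate))
```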
(2) Enter a list of GenBank identifiers
It must be a comma-separated list of identifiers. These can be either GenBank genome accession numbers (e.g. CP001079, CP000235) or GenBank assembly identifiers (e.g. GCA_000011945, GCA_000024505). Refer to the section below to retrieve the GenBank assembly identifiers of your organism of interest.
Note that genomes must be completely annotated.
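A list can be tidied and sorted into the two identifier types before it is pasted into the form. This is a rough sketch only: the two regular expressions are simplified assumptions based on the examples above, not the rules applied by the workflow, and `classify_identifiers` is a hypothetical helper.

```python
import re

# Assumed shapes, derived from the examples in the text:
#   assembly identifier: "GCA_" + 9 digits, optional ".version"
#   genome accession:    1-2 capital letters + 5-8 digits, optional ".version"
ASSEMBLY_RE = re.compile(r"GCA_\d{9}(\.\d+)?")
ACCESSION_RE = re.compile(r"[A-Z]{1,2}\d{5,8}(\.\d+)?")


def classify_identifiers(raw: str) -> dict:
    """Split a comma-separated string and group identifiers by assumed type."""
    groups = {"assembly": [], "accession": [], "unrecognized": []}
    for token in raw.split(","):
        identifier = token.strip()
        if not identifier:
            continue  # ignore empty entries such as a trailing comma
        if ASSEMBLY_RE.fullmatch(identifier):
            groups["assembly"].append(identifier)
        elif ACCESSION_RE.fullmatch(identifier):
            groups["accession"].append(identifier)
        else:
            groups["unrecognized"].append(identifier)
    return groups


if __name__ == "__main__":
    print(classify_identifiers("CP001079, CP000235, GCA_000011945,GCA_000024505"))
```

Anything that ends up under `unrecognized` is worth checking by hand. The check performed in step (5) remains the reference.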
(3) Enter your email
You will be notified by email when the analysis is done and will receive the URL to access your results and data.
(4) Choose the software for pan-genome analysis
Three software tools have been implemented in the workflow:
PGAP (Zhao et al., 2012)
Roary (Page et al., 2015)
PanACoTA (Perrin et al., 2021)
(5) Click the button for checking ids
Each GenBank identifier is checked for compatibility (assembly and annotation status).
(6) Click the Submit button
If everything is OK, a Submit button appears. Click it to send your list to the workflow.
How to get the list of available genomes from GenBank?
To build the list of GenBank identifiers, you can use the following page, which lists the genomes available at NCBI:
https://www.ncbi.nlm.nih.gov/datasets/genomes
(1) Enter the name of the organism
(2) Ideally, restrict the assembly level to "chromosome" and "complete". However, the workflow also accepts draft genomes.
(3) Restrict to genomes that are annotated
(4) Check all genomes and export as a table
(5) Open the Excel file and sort by "Assembly Accession", then collect all GCA accessions
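Step (5) can also be done without a spreadsheet. The sketch below assumes the table was exported as tab-separated text with a column named "Assembly Accession"; the second column name, the placeholder organism labels and the `GCF_` row are illustrative only, and `gca_accessions` is a hypothetical helper.

```python
import csv
import io


def gca_accessions(table_text: str, delimiter: str = "\t") -> list:
    """Return the sorted, de-duplicated GCA accessions found in an exported table."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter=delimiter)
    found = set()
    for row in reader:
        # A short row yields None for the missing column, hence the "or".
        accession = (row.get("Assembly Accession") or "").strip()
        if accession.startswith("GCA_"):
            found.add(accession)
    return sorted(found)


if __name__ == "__main__":
    # Illustrative table content; organism labels are placeholders.
    example = (
        "Assembly Accession\tOrganism Name\n"
        "GCA_000024505\torganism B\n"
        "GCF_000011945\torganism A\n"
        "GCA_000011945\torganism A\n"
    )
    # Join with commas to obtain a list ready to paste into the import form.
    print(",".join(gca_accessions(example)))
```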
How to browse pangenome results?
(1) Distribution of core-genome and accessory genome
A pie chart shows the respective percentages of genes that make up the core genome, the dispensable genome and the strain-specific genes.
Clicking on Display "Distribution of strain-specific genes" shows a bar chart of the number of strain-specific genes for each genome.
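To make the three categories concrete, here is a small sketch of how gene clusters could be sorted into them from a presence/absence table. It assumes the usual definitions (core: present in every strain; strain-specific: present in exactly one strain; dispensable: everything in between). The function names and the toy data are hypothetical, and this is not necessarily how the site computes its charts.

```python
from collections import Counter


def classify_clusters(presence: dict, strains: list):
    """Count clusters per category and strain-specific clusters per strain.

    presence maps a cluster id to the set of strains that carry it.
    """
    n_strains = len(strains)
    counts = Counter()
    per_strain = {strain: 0 for strain in strains}
    for present_in in presence.values():
        n_present = len(present_in)
        if n_present == 0:
            continue  # empty cluster, nothing to count
        if n_present == n_strains:
            counts["core"] += 1
        elif n_present == 1:
            counts["strain-specific"] += 1
            per_strain[next(iter(present_in))] += 1
        else:
            counts["dispensable"] += 1
    return counts, per_strain


def percentages(counts: Counter) -> dict:
    """Convert category counts to percentages (the pie chart values)."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: 100.0 * n / total for category, n in counts.items()}


if __name__ == "__main__":
    strains = ["A", "B", "C"]  # toy strain names
    presence = {
        "c1": {"A", "B", "C"},
        "c2": {"A", "B"},
        "c3": {"A"},
        "c4": {"C"},
    }
    counts, per_strain = classify_clusters(presence, strains)
    print(percentages(counts))
    print(per_strain)  # values for the bar chart
```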
(2) Interactive presence/absence matrix
a) Zoom panel for zooming horizontally on gene clusters or vertically on strains
b) Metadata about strains. By default, it includes the genus, the organism name, the country and the continent (extracted from the GenBank file).
c) Metadata about strains and gene clusters, shown when the mouse hovers over cells of the matrix.
d) Clicking on a specific cell of the matrix leads to the Cluster search, which gives more information about this cluster.
e) Cluster details: list of genes included in the cluster, their sequences and alignment.
A distance-based phylogeny is computed if the number of genes is below 160.
(3) Dispensable genome as an Upset diagram
This section represents the size of the different intersections within the accessory genome, from most abundant to least abundant.
Each column corresponds to an intersection between strains, and the bar chart on top shows the size of this intersection. Each row corresponds to a strain: the filled-in cells show which strains are part of an intersection.
Only the 20 most abundant intersections are shown.
The SVG is zoomable by double-click or mouse scroll.
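The intersection sizes shown in the diagram can be thought of as follows. This sketch groups non-core clusters by the exact set of strains that carry them and keeps the largest groups. Whether strain-specific clusters are counted as part of the accessory genome here is an assumption, and `top_intersections` is a hypothetical name rather than the site's implementation.

```python
from collections import Counter


def top_intersections(presence: dict, n_strains: int, limit: int = 20) -> list:
    """Return (strain set, cluster count) pairs, largest first.

    presence maps a cluster id to the set of strains that carry it.
    Core clusters (present in all strains) are skipped.
    """
    sizes = Counter()
    for present_in in presence.values():
        if 0 < len(present_in) < n_strains:
            # frozenset makes the strain combination usable as a key
            sizes[frozenset(present_in)] += 1
    return sizes.most_common(limit)


if __name__ == "__main__":
    presence = {  # toy data
        "c1": {"A", "B", "C"},
        "c2": {"A", "B"},
        "c3": {"A"},
        "c4": {"C"},
        "c5": {"A", "B"},
    }
    for strain_set, size in top_intersections(presence, n_strains=3):
        print(sorted(strain_set), size)
```

Each key corresponds to one column of the diagram and each count to the height of the bar above it.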
How to evaluate synteny between genomes?
(1) Choose three genomes to compare from the list of genomes available in the project
Each genome has a specific color (blue, red, green) that is used in the Hive plot.
(2) Click on Submit
(3) A Hive plot shows the links between core-genes projected on each genome
The Hive plot visualizes the conservation of gene order between three pre-selected genomes of a dataset. Each axis corresponds to a genome and is colored with the color shown for that genome in the drop-down list. Links are drawn only for core genes, because only genes present in every genome can be connected between genomes; each link joins the physical positions of a core gene in two genomes.
This section makes it possible to visually identify translocation (right blue arrow) or inversion (left blue arrow) events.
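The idea behind the links can be illustrated for one pair of genomes. In this sketch each genome is a list of gene cluster identifiers in physical order; a link pairs the position of a core cluster in one genome with its position in the other, and a run of links whose order is reversed hints at an inversion. The function names and toy data are hypothetical, and the plot itself may be built differently.

```python
def core_gene_links(order_a: list, order_b: list, core: set) -> list:
    """Return (position in A, position in B) for every core cluster, in A's order."""
    position_in_b = {cluster: index for index, cluster in enumerate(order_b)}
    return [
        (index, position_in_b[cluster])
        for index, cluster in enumerate(order_a)
        if cluster in core and cluster in position_in_b
    ]


def order_changes(links: list) -> list:
    """Label each pair of consecutive links as keeping or reversing the order."""
    labels = []
    for (_, b_first), (_, b_second) in zip(links, links[1:]):
        labels.append("same" if b_second > b_first else "reversed")
    return labels


if __name__ == "__main__":
    # Toy example: clusters c2..c4 are inverted in genome B.
    genome_a = ["c1", "c2", "c3", "c4", "c5"]
    genome_b = ["c1", "c4", "c3", "c2", "c5"]
    links = core_gene_links(genome_a, genome_b, core=set(genome_a))
    print(links)
    print(order_changes(links))
```

For three genomes the same pairing would be repeated for each pair of axes.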
(4) Mauve viewer
In the same way, the Mauve viewer displays gene order conservation horizontally between the three selected genomes.
(5) Focus on a specific region by zooming
The Mauve viewer allows zooming in on a specific region of interest. It gives access to the annotation of the strain (gene names, etc.).