OrthoSelect
Orthologs in Phylogenomics!


OrthoSelect Documentation




Input


This section explains how to enter parameters and sequences and which additional options are available.


Parameter settings & User details



An overview of the web page for entering user details and selecting parameters.

Project Details




The form requires you to provide a name and an email address.
Make sure the email address is correct, because you will receive an email once the analysis has finished.

Default Options




By default, OrthoSelect runs with the provided parameters. It uses an E-value cut-off of 1e-10 for the initial BLAST searches. If a translated sequence is shorter than 20 amino acids, it is rejected.
Based on a custom distance matrix, the sequence most likely to be orthologous is selected for each species in each resulting orthologous group. The selected sequences are then aligned using MUSCLE. In the final step, Gblocks removes poorly aligned regions from the alignments, and sequences with a character content of less than 50% are rejected. To change these parameters, click on "Expert mode".
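The default thresholds can be illustrated with a small Python sketch. It is not OrthoSelect's own code: the function and constant names are hypothetical, "character content" is assumed to mean the fraction of non-gap positions in an aligned sequence, and treating an E-value exactly equal to the cut-off as a pass is also an assumption.

```python
# Illustrative sketch only; names are hypothetical, values are the documented defaults.
EVALUE_CUTOFF = 1e-10         # cut-off for the initial BLAST searches
MIN_PROTEIN_LENGTH = 20       # translated sequences shorter than this are rejected
MIN_CHARACTER_CONTENT = 0.5   # aligned sequences below this fraction are rejected


def passes_evalue(evalue, cutoff=EVALUE_CUTOFF):
    """Keep a BLAST hit whose E-value does not exceed the cut-off (assumed inclusive)."""
    return evalue <= cutoff


def passes_length(protein, min_length=MIN_PROTEIN_LENGTH):
    """Keep a translated sequence that has at least min_length amino acids."""
    return len(protein) >= min_length


def character_content(aligned, gap="-"):
    """Fraction of non-gap positions in an aligned sequence (assumed definition)."""
    if not aligned:
        return 0.0
    return sum(1 for c in aligned if c != gap) / len(aligned)


def passes_character_content(aligned, minimum=MIN_CHARACTER_CONTENT):
    """Keep an aligned sequence whose character content is at least the minimum."""
    return character_content(aligned) >= minimum
```

For example, an aligned sequence such as `AC--` has a character content of exactly 50% and would be kept under this reading, while `A---` would be rejected.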

Expert Options




The "Expert mode" allows you to change the default parameters of OrthoSelect.
Note that the option "Gene Selection" can be selected after you have uploaded sequences in the next step.

Sequence Upload




To upload sequences, click the "browse" button, select the file on your computer, and upload it. After a successful upload you will see a list of the currently uploaded sequences.
Make sure that the sequences are in FASTA format and that the FASTA header is in a format readable by OrthoSelect.
A correct format is ">accession_number|anything", e.g. ">NC98472|predicted catalase | Homo sapiens". The important part is the one before the first "|"; the remaining part will be ignored. Make sure to use only accession numbers consisting of letters and digits.
The files containing the sequences should be named after the species they belong to. For example, sequences from "Drosophila melanogaster" should be saved in a file named "Drosophila_melanogaster.fa".
The following picture displays the general concept of sequence uploads.

The file names should contain the taxon name, and the first part of each FASTA header should be an accession number.
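Before uploading, the naming rules can be checked locally. The following Python sketch is only an illustration under stated assumptions: the function names are hypothetical, the accession is taken to be everything between ">" and the first "|", and a header without any "|" is accepted as long as it is purely alphanumeric. OrthoSelect's own parser may behave differently.

```python
import re
from pathlib import Path

# Assumption: a valid accession consists only of ASCII letters and digits.
_ACCESSION = re.compile(r"[A-Za-z0-9]+")


def parse_accession(header):
    """Return the accession number from a FASTA header such as '>NC98472|anything'."""
    if not header.startswith(">"):
        raise ValueError(f"not a FASTA header: {header!r}")
    # Only the part before the first '|' matters; the rest is ignored.
    accession = header[1:].split("|", 1)[0].strip()
    if not _ACCESSION.fullmatch(accession):
        raise ValueError(f"accession must contain only letters and digits: {accession!r}")
    return accession


def species_from_filename(filename):
    """Derive the species name from a file name such as 'Drosophila_melanogaster.fa'."""
    return Path(filename).stem.replace("_", " ")
```

With the examples from this section, `parse_accession(">NC98472|predicted catalase | Homo sapiens")` yields `"NC98472"` and `species_from_filename("Drosophila_melanogaster.fa")` yields `"Drosophila melanogaster"`.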


Name association




Provided that the file name equals the species name (see Section "Sequence Upload"), OrthoSelect will suggest a shortcut for the species name. The shortcut follows the typical PHYLIP format of at most 10 characters; for "Drosophila melanogaster" it is "Drosoph_me". You can change the suggested shortcut to any name you prefer.
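How such a shortcut could be derived is sketched below in Python. The exact rule OrthoSelect applies is not documented here; the rule in this sketch (first seven characters of the genus, an underscore, the first two characters of the species epithet) is an assumption inferred solely from the "Drosoph_me" example, and the function name is hypothetical.

```python
def suggest_shortcut(species_name):
    """Suggest a PHYLIP-style shortcut of at most 10 characters.

    Assumed rule, inferred from the single documented example:
    genus[:7] + "_" + epithet[:2].
    """
    parts = species_name.replace("_", " ").split()
    if not parts:
        raise ValueError("empty species name")
    genus = parts[0]
    if len(parts) == 1:
        # No species epithet available: fall back to the truncated genus.
        return genus[:10]
    return f"{genus[:7]}_{parts[1][:2]}"
```

Under this assumed rule, `suggest_shortcut("Drosophila melanogaster")` returns `"Drosoph_me"`, matching the example above.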

Gene Selection




Optionally, you can restrict the results to those orthologous groups that contain at least one member of a pre-defined set of species or of pre-defined monophyla. To pre-define a set of species, simply enter a "1" in the corresponding fields under "Present".
To pre-define a monophylum, assign the same number to all species that form it. By using different numbers, you can pre-define multiple monophyla. As a result, you will get the subset of orthologous groups that contain at least one sequence from at least one member of every pre-defined monophylum.
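The selection logic described above can be sketched in Python as follows. This is a minimal illustration rather than OrthoSelect's implementation; the function name and the data layout (a mapping from orthologous group to the taxa present, and a mapping from taxon to the number entered under "Present") are assumptions.

```python
from collections import defaultdict


def select_groups(og_taxa, labels):
    """Return the IDs of orthologous groups that cover every pre-defined monophylum.

    og_taxa: dict mapping an orthologous group ID to the set of taxa present in it.
    labels:  dict mapping a taxon to the number entered under "Present"
             (None for taxa left blank).
    """
    # Taxa sharing the same number form one monophylum.
    monophyla = defaultdict(set)
    for taxon, label in labels.items():
        if label is not None:
            monophyla[label].add(taxon)

    # A group is kept if each monophylum has at least one member in it.
    return [
        og_id
        for og_id, taxa in og_taxa.items()
        if all(members & set(taxa) for members in monophyla.values())
    ]


# Hypothetical example: two monophyla, numbered 1 and 2.
example_labels = {"Homo sapiens": 1, "Mus musculus": 1,
                  "Drosophila melanogaster": 2, "Hydra vulgaris": None}
example_groups = {
    "KOG0001": {"Homo sapiens", "Drosophila melanogaster"},
    "KOG0002": {"Homo sapiens", "Mus musculus"},
    "KOG0003": {"Mus musculus", "Drosophila melanogaster", "Hydra vulgaris"},
}
selected = select_groups(example_groups, example_labels)
```

In this example, `KOG0002` is not selected because it contains no member of the monophylum numbered 2. If no numbers are entered at all, every group passes, which matches the selection being optional.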

Results


You will receive an email with a link after the analysis has finished.

Results Overview




This is the overview page of the results section. The link in the notification email leads to this page, which serves as the starting point for exploring the results.

Results-Navigation




The user can use the navigation panel to easily access the most important results subsections.
These subsections include an overview of all annotations per species, an overview of those orthologous groups that contain sequences from at least three different species or have been selected during the gene selection step ("Best orthologous groups"), and an overview of all orthologous groups to which sequences have been assigned ("All orthologous groups").

Statistics




The statistics page gives an overview of the functional classification of all sequences under study. The pie chart shows the functional classes to which sequences have been assigned. The single-letter codes correspond to those used in NCBI's KOG database.

Taxa present/absent table




The gene/taxa table gives information about the presence and absence of sequences for each species and gene. The first table lists those OGs that have at least one sequence for each of at least two species and/or were selected according to the species defined by the user for gene selection. OGs that do not match these criteria are shown in grey.

The second table shows all OGs, including those with hits for only one species.

Annotations




This page gives an overview of all sequences for each species in the analysis. Clicking a link leads to the overview page for that species.


Annotations Overview




This overview summarizes the annotations for one species under study. It shows the proportion of sequences that could be assigned to an OG, together with their functional classifications.

Annotations List




This list contains the sequences from a species that could be assigned to an OG, along with their annotation, the e-value, the method used for translating the sequence, and the e-value of the translation aligned against the closest hit from the orthologous database.

Orthologous Groups




This page lists all orthologous groups that contain at least one sequence from each of at least two different species and/or that have been selected according to the presence of species defined by the user.

Group Summary




This is the overview page for a single OG. It describes the taxon composition of the orthologous group: the number of sequences, the number of different taxa, the annotation of the OG along with its functional class, and an overview of the taxa in the group.

Difference between all and best orthologous groups


OrthoSelect outputs two sets of orthologous groups. The first ("All orthologous groups") contains every orthologous group (as defined by the KOG database) to which sequences have been assigned. The second ("Best orthologous groups") is a subset: it contains only those orthologous groups with at least three taxa present and/or those that were selected during the gene selection step. Note that the "Best orthologous groups" contain only the one sequence from each taxon that is most likely to be orthologous, together with computed sequence alignments. In contrast, the "All orthologous groups" can contain more than one sequence per taxon, and no sequence alignments have been computed for them.
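The relationship between the two sets can be illustrated with a short Python sketch. It covers only the membership criterion (at least three taxa present, or selected during gene selection); choosing the most likely ortholog per taxon and computing the alignments are not modelled. The function name and data layout are hypothetical.

```python
def best_group_ids(all_groups, gene_selected=frozenset(), min_taxa=3):
    """Return the IDs of groups that qualify as "Best orthologous groups".

    all_groups:    dict mapping a group ID to a list of (taxon, accession) pairs;
                   a taxon may occur several times in the "All" set.
    gene_selected: IDs of groups picked during the gene selection step.
    min_taxa:      minimum number of distinct taxa (three, as stated in this section).
    """
    best = []
    for group_id, members in all_groups.items():
        distinct_taxa = {taxon for taxon, _accession in members}
        if len(distinct_taxa) >= min_taxa or group_id in gene_selected:
            best.append(group_id)
    return best


# Hypothetical example data.
example_all = {
    "KOG0001": [("Homo sapiens", "A1"), ("Mus musculus", "B1"),
                ("Hydra vulgaris", "C1"), ("Homo sapiens", "A2")],
    "KOG0002": [("Homo sapiens", "A3"), ("Homo sapiens", "A4")],
    "KOG0003": [("Mus musculus", "B2"), ("Hydra vulgaris", "C2")],
}
example_best = best_group_ids(example_all, gene_selected={"KOG0003"})
```

Here `KOG0001` qualifies through its three distinct taxa, `KOG0003` qualifies only because it was chosen during gene selection, and `KOG0002` remains in the "All" set only.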

View Sequences




For each OG, the following files are available:
(A) All sequences assigned to this group (nucleotides, unaligned)
(B) All sequences assigned to this group (proteins, unaligned)
(C) The alignment in which the most probable ortholog per species has been selected
(D) Same as (C), but with poorly aligned regions removed using Gblocks
(E) Same as (D), but without sequences that are too short (according to the given threshold)


List Sequences




This page gives an overview of all sequences that have been assigned to this OG. For each sequence, the table lists the expectation value from the initial BLAST search, the accession number of the best hit, and the method used for translating the sequence, along with the expectation value from a bl2seq alignment between the best hit and the input sequence.

Single Sequence





The single sequence view shows the orthologous group to which the sequence has been assigned, the lengths of the input sequence and of the translated protein sequence, and the method used for translation, along with the expectation value of the bl2seq alignment between the input sequence and the best hit from the initial BLAST search.

Download




Here you can download your results. The folder structure of the results is described in the manual.