OrthoSelect
Orthologs in Phylogenomics!
OrthoSelect Documentation
Input
Project Details
Expert Options
Sequence Upload
Name association
Gene selection
Results
Results Overview
Statistics
Annotations
Annotations Overview
Annotations List
Orthologous Groups
Difference between all and best orthologous groups
List of sequences in orthologous group
Single Sequences
Download
Input
The following will explain how parameters and sequences can be entered and what additional options are available.
Parameter settings & User details
An overview of the web page for entering user details and selecting parameters.
Project Details

The form requires you to provide a name and an email address.
Make sure the email address is correct, because you will receive an email once the analysis has finished.
Default Options

By default, you can use OrthoSelect with the provided parameters. OrthoSelect uses an E-value cut-off of 1e-10 for the initial BLAST searches. If a translated sequence is shorter than 20 amino acids, it is rejected.
For each resulting orthologous group, one sequence per species is then selected as the one most likely to be orthologous, based on a custom distance matrix. The selected sequences are aligned using MUSCLE. In the final step, Gblocks is used to remove poorly aligned regions from the alignments, and sequences with a character content of less than 50% are rejected. To change these parameters, click on "Expert mode".
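The default thresholds described above can be summarised in a short Python sketch. All names here (DefaultParameters, passes_length_filter, character_content, passes_content_filter) are hypothetical and not part of OrthoSelect. Treating "character content" as the fraction of non-gap characters in an aligned sequence is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DefaultParameters:
    """Default thresholds as stated in the documentation (field names are illustrative)."""
    blast_cutoff: float = 1e-10          # cut-off for the initial BLAST searches
    min_protein_length: int = 20         # shorter translated sequences are rejected
    min_character_content: float = 0.5   # aligned sequences below this are rejected


def passes_length_filter(protein: str, params: DefaultParameters) -> bool:
    """Keep a translated sequence only if it reaches the minimum length."""
    return len(protein) >= params.min_protein_length


def character_content(aligned: str, gap_chars: str = "-") -> float:
    """Fraction of non-gap characters in an aligned sequence (assumed definition)."""
    if not aligned:
        return 0.0
    non_gap = sum(1 for c in aligned if c not in gap_chars)
    return non_gap / len(aligned)


def passes_content_filter(aligned: str, params: DefaultParameters) -> bool:
    """Keep an aligned sequence only if its character content reaches the threshold."""
    return character_content(aligned) >= params.min_character_content
```

With these defaults, a 19-residue translation would be rejected and a 20-residue one kept. An aligned sequence with exactly half of its positions filled would pass the content filter.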
Expert Options

The "Expert mode" allows you to change the default parameters of OrthoSelect.
Note that the option "Gene Selection" can be selected after you have uploaded sequences in the next step.
Sequence Upload

To upload sequences, click the "browse" button, select the file on your computer, and upload it. After a successful upload, you will see a list of the currently uploaded sequences.
Make sure that the sequences are in FASTA format and that the FASTA header is in a format readable by OrthoSelect.
A correct format is ">accession_number|anything", e.g. ">NC98472|predicted catalase | Homo sapiens". The important part is the text before the first "|"; the remainder is ignored. Use only accession numbers that consist of letters and digits.
The files containing the sequences should be named after the species they belong to. Sequences from "Drosophila melanogaster" should be saved in a file "Drosophila_melanogaster.fa".
The following picture displays the general concept of sequence uploads.

The file names should contain the taxon name, and the first part of the FASTA header should be an accession number.
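The naming conventions above can be checked before uploading with a small Python sketch. The function names parse_accession and species_from_filename are hypothetical helpers, not part of OrthoSelect. The sketch assumes that "digits or characters" means ASCII letters and digits only and that the accession ends at the first "|".

```python
import re
from pathlib import Path

# Assumption: an accession number consists of ASCII letters and digits only.
ACCESSION_PATTERN = re.compile(r"[A-Za-z0-9]+")


def parse_accession(header: str) -> str:
    """Return the accession number from a header such as '>ACC|anything'."""
    if not header.startswith(">"):
        raise ValueError("a FASTA header must start with '>'")
    # Everything after the first '|' is ignored.
    accession = header[1:].split("|", 1)[0]
    if not ACCESSION_PATTERN.fullmatch(accession):
        raise ValueError(f"unexpected characters in accession: {accession!r}")
    return accession


def species_from_filename(filename: str) -> str:
    """Turn 'Drosophila_melanogaster.fa' into 'Drosophila melanogaster'."""
    return Path(filename).stem.replace("_", " ")
```

For the example header from this section, parse_accession returns "NC98472". A header whose accession contains an underscore or a space would raise a ValueError under the assumptions stated above.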
Name association

Given that the file name matches the species name (see Section "Sequence Upload"), OrthoSelect will suggest a shortcut for the species name. The shortcut follows the typical 10-character PHYLIP name format; for example, the shortcut for "Drosophila melanogaster" is "Drosoph_me". You can change the shortcut to anything you like.
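One rule that reproduces the example above is sketched below in Python. It is inferred from the single "Drosoph_me" example and may differ from what OrthoSelect actually does. The function name suggest_shortcut is hypothetical, as is the rule of taking the first seven letters of the genus, an underscore, and the first two letters of the species epithet.

```python
def suggest_shortcut(species_name: str) -> str:
    """Build a name of at most 10 characters from a binomial species name.

    Assumed rule: genus[:7] + '_' + epithet[:2], which yields 'Drosoph_me'
    for 'Drosophila melanogaster'.
    """
    parts = species_name.split()
    if not parts:
        raise ValueError("species name must not be empty")
    genus = parts[0]
    if len(parts) == 1:
        # No epithet available: fall back to truncating the genus.
        return genus[:10]
    return f"{genus[:7]}_{parts[1][:2]}"
```

Names with a short genus come out shorter than 10 characters under this rule. Whether OrthoSelect pads such names is not stated in this documentation.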
Gene Selection

Optionally, you can restrict the results to those orthologous groups that contain at least one member of a pre-defined set of species or monophyla. To pre-define a set of species, enter a "1" in the corresponding fields under "Present".
To pre-define monophyla, assign the same number to all species that form a monophylum. By using different numbers, you can pre-define multiple monophyla. As a result, you will get the subset of orthologous groups that contain at least one sequence from at least one member of every monophylum.
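The selection logic described above can be sketched in Python as follows. This is an illustration under assumptions, not OrthoSelect's implementation: the function name select_groups and the two dictionaries are hypothetical, and species that share a number under "Present" are treated as one monophylum.

```python
from collections import defaultdict


def select_groups(og_species, present):
    """Return the OG ids that contain a member of every pre-defined monophylum.

    og_species: dict mapping an OG id to the set of species with a sequence in it
    present:    dict mapping a species to the number entered under "Present"
    """
    # Species that share a number form one monophylum.
    monophyla = defaultdict(set)
    for species, number in present.items():
        monophyla[number].add(species)

    # An OG qualifies if it overlaps with every monophylum.
    return [
        og for og, species in og_species.items()
        if all(species & members for members in monophyla.values())
    ]
```

For example, if "Homo sapiens" and "Mus musculus" both receive the number 1 and "Drosophila melanogaster" receives the number 2, an OG is kept only if it contains a sequence from a human or a mouse and also a sequence from the fly. If no numbers are entered, every OG passes this filter.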
Results
You will receive an email with a link after the analysis has finished.
Results Overview

This is the overview page of the results section. The link provided in the email leads to this page, which is the starting point for exploring the results.
You can use the navigation panel to access the most important result subsections.
These include an overview of all annotations per species, an overview of those orthologous groups that have sequences from at least three different species or were selected during the "gene-selection step" (Best orthologous groups), and an overview of all orthologous groups to which sequences have been assigned (All orthologous groups).
Statistics

The statistics page gives an overview of the functional classification of all sequences under study. The pie chart shows the functional classes to which sequences have been assigned. The single-letter codes correspond to those used in NCBI's KOG database.
Taxa present/absent table

The gene/taxa table shows the presence and absence of sequences for each species and gene. The table above lists those OGs that have at least one sequence for each of at least two species and/or were selected according to the species defined by the user for "gene-selection". OGs that do not match these criteria are shown in grey.

This table shows all OGs, even those for which there were hits for only one species.
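A presence/absence table of this kind can be built from the OG assignments in a few lines of Python. The function name presence_table, the "+" and "-" symbols, and the input layout are assumptions made for illustration; the actual rendering on the results page may differ.

```python
def presence_table(og_species, taxa):
    """Return one row per OG: the OG id followed by '+' or '-' for each taxon.

    og_species: dict mapping an OG id to the set of species with a sequence in it
    taxa:       list of species names, in the column order of the table
    """
    rows = []
    for og in sorted(og_species):
        marks = ["+" if taxon in og_species[og] else "-" for taxon in taxa]
        rows.append([og] + marks)
    return rows
```

Each row can then be printed or written to a file, for example with "\t".join(row).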
Annotations

An overview of all sequences for each species in the analysis. Clicking the link leads to an overview page for that species.
Annotations Overview

This overview shows a summary of the annotations for a species under study, including the proportion of sequences that could be assigned to an OG and their functional classifications.
Annotations List

The list of sequences from a species that could be assigned to an OG, along with information about the annotation, the e-value, the method used for translating the sequence, and the e-value of that translation against the closest hit in the orthologous database.
Orthologous Groups

List of all orthologous groups with at least one sequence from each of at least two different species and/or those orthologous groups that have been selected according to the presence of species defined by the user.
Group Summary

This is the overview page for an OG. Here you can find information about the taxonomic composition of this orthologous group. The information includes the number of sequences, the number of different taxa, the annotation for this OG along with its functional class, and an overview of the taxa in this group.
Difference between all and best orthologous groups
OrthoSelect outputs two sets of orthologous groups. The first ("All orthologous groups") contains all orthologous groups (as defined by the KOG database) to which sequences have been assigned. The second ("Best orthologous groups") is a subset that contains only those orthologous groups with at least three taxa present and/or those that were selected during the gene selection step. Note that the "Best orthologous groups" contain only the one sequence from each taxon that is most likely to be orthologous, together with computed sequence alignments. In contrast, the "All orthologous groups" can contain more than one sequence per taxon, and no sequence alignments have been computed for them.
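The relationship between the two sets can be illustrated with a short Python sketch. The function name best_groups and the input layout are hypothetical. The sketch covers only the choice of which groups belong to the subset, not the selection of the most likely ortholog per taxon or the alignment step. The minimum number of taxa is a parameter, set here to the value of three stated in this section.

```python
def best_groups(og_members, selected=frozenset(), min_taxa=3):
    """Return the ids of the OGs that qualify as "Best orthologous groups".

    og_members: dict mapping an OG id to a list of (taxon, sequence_id) pairs
    selected:   OG ids chosen during the gene selection step
    min_taxa:   minimum number of different taxa required
    """
    best = []
    for og, members in og_members.items():
        # Several sequences from one taxon count as a single taxon.
        taxa = {taxon for taxon, _ in members}
        if len(taxa) >= min_taxa or og in selected:
            best.append(og)
    return best
```

An OG with several sequences from only two taxa is therefore excluded from the subset unless it was picked during gene selection.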
View Sequences

For each OG, the following files are available:
(A) All sequences assigned to this group (nucleotides, unaligned)
(B) All sequences assigned to this group (proteins, unaligned)
(C) The alignment in which the most probable ortholog per species has been selected
(D) Same as (C), but with poorly aligned regions removed using Gblocks
(E) Same as (D), but without sequences that are too short (according to the given threshold)
List Sequences

Overview of all sequences that have been assigned to this OG. For each sequence, the table includes the expectation value from the initial BLAST search, the accession number of the best hit, the translation method used for translating the sequence, and the expectation values from a bl2seq alignment between the best hit and the input sequence.
Single Sequence

The single sequence view shows the orthologous group to which the sequence has been assigned, the lengths of the input sequence and the translated protein sequence, and the method used for translation along with the expectation values of the bl2seq alignment between the input sequence and the best hit from the initial BLAST search.
Download

Here you can download your results. The folder structure of the results is described in the manual.