Tools for systematics: 
Introductory glossary of cladistic terms

Michael D. Crisp
Botany and Zoology, School of Biology
Australian National University, Canberra ACT 2601


Additive
    For application to characters, see ordered. As applied to trees, it refers to whether distances measured along the branches of the tree add up to the observed distances (from a matrix of pair-wise distance comparisons among terminal taxa).
Alignment
    The determination of positional homology for molecular sequences, involving the juxtaposition of amino acids or nucleotides in homologous molecules.
Anagenesis
    See phylesis.
Analogy
    See homoplasy.
Apomorphy (adj. apomorphic or apomorphous)
    A relatively derived or advanced or unique character state (cf. autapomorphy, synapomorphy, plesiomorphy, symplesiomorphy).
Area cladogram
    A tree that displays historical relationships among geographic areas, rather than phylogenetic relationships among taxa.
Attribute
    The possession by an organism of a particular feature, e.g. this tree is rough-barked, that tree is half-barked (cf. character and character states).
Autapomorphy
    An apomorphy in a terminal taxon; it diagnoses the terminal but is uninformative about relationships to other terminals, and is therefore of no use for cladistic tree-building.
Binary
    A character type with only two states (usually given as 0, 1), in which a change in either direction between the states is one step (cf. ordered, unordered, Dollo, irreversible).
Character
    Any heritable attribute of organisms that varies among terminal taxa, and so is useful for phylogenetic reconstruction.
Character states
    Subdivisions of the variation among terminal taxa.
Clade
    A monophyletic group (= a branch on a cladogram, diagnosed by at least one synapomorphy).
Cladogenesis
    The evolutionary splitting of lineages, i.e. speciation (cf. phylesis).
Cladogram
    A branching diagram (tree) assumed to be an estimate of a phylogeny (cf. phylogram, dendrogram, phenogram).
Classification
    Arranging organisms into named groups (taxa), whether natural or artificial (see systematisation).
Congruence (adj. congruent)
    Agreement, as between characters and a tree, or between the topologies (shapes) of two trees, e.g. derived from different data sets, such as molecular and morphological. Some authors like to make separate phylogeny estimates from different data sets, and then test their congruence (cf. total evidence).
Consensus tree
    A class of methods used to estimate the amount of agreement among incongruent or partially congruent trees. Usually represented as a tree that is less resolved than any of the input trees. (There are also consensus statistics.) A consensus tree is not a hypothesis of evolutionary history, and thus must not be confused with a phylogenetic tree. Therefore, it should not be used to trace the evolution of characters, areas (biogeography), and so on. Most commonly used is the strict consensus tree, which shows only those clades that are common to all the input trees; a majority-rule consensus tree shows all clades that are found in > 50% of the input trees.
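    The two consensus rules can be sketched as set operations, assuming (for illustration only) that each input tree is reduced to the set of clades it contains and each clade is written as a frozenset of terminal names. The three input trees are invented.

```python
from collections import Counter


def strict_consensus(trees):
    """Clades present in every input tree."""
    return set.intersection(*(set(t) for t in trees))


def majority_rule_consensus(trees):
    """Clades present in more than 50% of the input trees."""
    counts = Counter(clade for t in trees for clade in set(t))
    return {clade for clade, n in counts.items() if n > len(trees) / 2}


# Three hypothetical input trees over terminals A-E, each given as its clades.
trees = [
    {frozenset("AB"), frozenset("ABC"), frozenset("DE")},
    {frozenset("AB"), frozenset("ABC"), frozenset("DE")},
    {frozenset("AC"), frozenset("ABC"), frozenset("DE")},
]
strict = strict_consensus(trees)
majority = majority_rule_consensus(trees)
```

    The clade A+B occurs in two of the three trees, so it survives the majority rule but is collapsed in the strict consensus.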
Consistency index (CI)
    A measure of the parsimony fit of a character to a tree, or of the average fit of all characters to a tree. Varies from 1.0 (perfect fit) to a value asymptotically approaching zero (poorest fit). It is inflated by autapomorphies which can only take the value 1.0; thus a totally uninformative data set (consisting only of autapomorphies) could return a CI equal to 1.0 (cf. retention index).
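    The inflation by autapomorphies can be shown with a small calculation. The glossary does not give a formula; the sketch assumes the usual definition, CI = (minimum possible steps) / (steps observed on the tree), summed over characters for the ensemble value. The step counts are invented.

```python
def consistency_index(characters):
    """Ensemble CI from (minimum_steps, observed_steps) pairs, one per character."""
    total_min = sum(m for m, _ in characters)
    total_obs = sum(s for _, s in characters)
    return total_min / total_obs


# Two hypothetical informative binary characters showing homoplasy.
informative = [(1, 2), (1, 3)]
# Five hypothetical autapomorphies: each needs exactly one step on any tree.
autapomorphies = [(1, 1)] * 5

ci_informative = consistency_index(informative)                  # 2 / 5
ci_inflated = consistency_index(informative + autapomorphies)    # 7 / 10
ci_autapomorphies_only = consistency_index(autapomorphies)       # 5 / 5
```

    Adding the autapomorphies raises the CI from 0.4 to 0.7 although the fit of the informative characters has not changed.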
Convergence
    See homoplasy.
Daughter taxa
    See sister groups.
Dendrogram
    Any branching diagram (or tree) (cf. cladogram, phylogram, phenogram).
Distance
    Usually treated as a measure of evolutionary divergence, i.e. phylogenetic distance increases with increasing evolutionary divergence. Distances are usually expressed pair-wise among the terminal taxa, and can be calculated based on a specified evolutionary model; the model specifies the probabilities of character-state changes through evolutionary time. Distances are popular for building phylogenetic trees from molecular sequence data (cf. maximum likelihood, parsimony).
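    As an illustration of a model-based pair-wise distance, the sketch below compares the raw proportion of differing sites with the Jukes-Cantor correction. The choice of the Jukes-Cantor model is an assumption made here for simplicity (the glossary names no particular model), and the two aligned sequences are invented.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq1, seq2) if a != b)
    return differences / len(seq1)


def jukes_cantor_distance(seq1, seq2):
    """Expected substitutions per site under the Jukes-Cantor model."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


seq_a = "ACGTACGT"
seq_b = "ACGTACGA"
raw = p_distance(seq_a, seq_b)
corrected = jukes_cantor_distance(seq_a, seq_b)
```

    The corrected distance is slightly larger than the raw proportion because the model allows for multiple changes at the same site.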
Dollo
    A character type in which numerically increasing changes are allowed but each such change can only happen once on a tree; thus, multiple reverse changes (= losses) are allowed. This character type is favoured by those who feel that a complex structure (e.g. the insect wing) can only originate once, although it may be lost many times. This character type has been suggested for DNA restriction site data, because gain of a new site is much more improbable than loss of an existing one. By definition, a Dollo character is polarised in advance, making the use of an outgroup redundant (cf. ordered, unordered, irreversible).
Exact method
    Any analysis method that guarantees to find the optimal solution. For tree-building, the branch-and-bound strategy is a computationally-efficient exact method for finding the optimal tree that does not involve examining every possible tree (cf. heuristic method).
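    To see why examining every possible tree quickly becomes impractical, the sketch below counts the distinct unrooted, fully dichotomous trees for a given number of terminals, using the standard result (2n - 5)!!. The function name is hypothetical.

```python
def unrooted_binary_trees(n_terminals):
    """Number of distinct unrooted, fully dichotomous trees: (2n - 5)!!."""
    if n_terminals < 3:
        raise ValueError("need at least three terminals")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n_terminals - 4, 2):
        count *= k
    return count


trees_for_4 = unrooted_binary_trees(4)
trees_for_10 = unrooted_binary_trees(10)
trees_for_20 = unrooted_binary_trees(20)
```

    Four terminals give 3 trees and ten give 2,027,025; with twenty terminals the count exceeds 10^20, which is why exact searches rely on strategies such as branch-and-bound and larger data sets on heuristic methods.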
Gene tree
    A phylogeny of a gene, which may or may not accurately reflect the phylogeny of the organisms possessing that gene (see orthology).
Heuristic method
    Any analysis method involving computationally-efficient strategies that should produce a solution at least close to the optimal one even if it doesn't find the optimum (cf. exact method).
Homology (adj. homologous)
    Similarity due to common evolutionary origin, i.e. derived from the same ancestral character; thus, equivalent to synapomorphy. Morphologists also define homology by common developmental origin, which is quite a different concept, being based on a different process, although empirically the two homologies may be congruent. Non-cladists like to include symplesiomorphy in their concept of homology.
Homoplasy (adj. homoplastic or homoplasious)
    Similarity due to independent evolutionary change. Thus, homoplasy is a mistaken hypothesis of homology, which will confound cladistic analyses. Homoplasy is either parallelism (= independent gain) or reversal (= loss). Convergence (= analogy) is sometimes distinguished from parallelism, although the distinction may be arbitrary (and in practice the difference may be irrelevant). Convergent features are derived from distantly-related ancestors, e.g. the wings of bats and birds, or succulence in Cactaceae and Euphorbiaceae (i.e. independent evolution derived by a different mechanism, thus leading to superficial similarity). Parallelisms derive from closely-related ancestors, e.g. the nucleotide A derived independently in two descendant lineages from the same C in the same position in a DNA sequence in a common ancestor (i.e. independent evolution using the same mechanism). Convergent features can usually be distinguished by detailed examination (e.g. differences in internal anatomy), whereas in the nucleotide example this would be impossible.
Informative
    Refers to the part of the data that is actually used by a particular method for building trees (cf. uninformative).
Ingroup
    The study group whose phylogeny is being reconstructed (cf. outgroup).
Irreversible (Camin-Sokal)
    A character type in which numerically increasing changes are allowed and counted as for ordered characters, while decreasing changes are not allowed (i.e. counted as an infinite number of steps); thus, multiple reverse changes (= losses) are not allowed. By definition, an irreversible character is polarised in advance, making the use of an outgroup redundant. This character-type is very rarely used, as the assumption of irreversibility is very difficult to justify for any type of data, morphological or molecular. It was proposed by E.O. Wilson (1965, Systematic Zoology 14:214-220), with examples of its application (cf. ordered, unordered, Dollo).
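    The step counting described above can be written as a small cost function. This is a sketch only; the function name and the three-state character are hypothetical.

```python
import math


def irreversible_steps(sequence, start, end):
    """Steps from `start` to `end`: forward changes counted as for ordered
    characters, reverse changes disallowed (infinite cost)."""
    i, j = sequence.index(start), sequence.index(end)
    return j - i if j >= i else math.inf


states = [0, 1, 2]  # hypothetical states, listed from ancestral to derived
forward = irreversible_steps(states, 0, 2)
backward = irreversible_steps(states, 2, 0)
```

    A change from state 0 to state 2 costs two steps, while the reverse change has infinite cost and so can never appear on an optimal tree.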
Lineage
    A historical sequence of ancestors and descendants.
Maximum likelihood
    One of several criteria that may be optimised in building phylogenetic trees from molecular sequence data. The optimal tree is the one that maximises the statistical likelihood that the specified evolutionary model produced the observed character-state data; the models specify the probabilities of character-state changes through evolutionary time (cf. distance, parsimony).
Monophyly (holophyly) (adj. monophyletic, holophyletic)
    On a phylogeny, a monophyletic group has a unique origin in a single ancestral species, and includes the ancestor and all of its descendants. It is recognised by a homologous character state (synapomorphy) in all of its members (cf. paraphyly, polyphyly).
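    On a given rooted tree, monophyly of a set of terminals can be tested mechanically: the set must be exactly the terminals descended from one node. The sketch below assumes a tree written as nested tuples with terminal names as strings; the tree and the function names are hypothetical.

```python
def leaves(tree):
    """All terminals below a node of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Every set of terminals that forms a complete branch of the tree."""
    result = {leaves(tree)}
    if not isinstance(tree, str):
        for child in tree:
            result |= clades(child)
    return result


def is_monophyletic(tree, group):
    """True if `group` is exactly the set of terminals below some node."""
    return frozenset(group) in clades(tree)


tree = (("A", "B"), ("C", ("D", "E")))  # hypothetical rooted tree
```

    On this tree `{"D", "E"}` and `{"C", "D", "E"}` are monophyletic, whereas `{"C", "D"}` is not, because it omits E, a descendant of the same ancestor.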
Network
    See unrooted tree.
Node
    A branch-point on a tree / cladogram.
Non-additive
    See unordered.
Ordered (additive)
    A character type with > 2 states that follow an evolutionarily plausible sequence, e.g. petals many -> 5 -> 3 -> 0. Changes between adjacent states are counted as one step and changes between non-adjacent states are counted as (1 + no. of skipped states), e.g. from 5 petals to 0 (or vice versa) would be 2 steps (cf. unordered, Dollo, irreversible).
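    The step counts in the petal example can be reproduced with a short sketch that contrasts ordered with unordered counting. The function names are hypothetical.

```python
def ordered_steps(sequence, a, b):
    """Steps between two states of an ordered character:
    1 + number of skipped states, i.e. the difference in position."""
    return abs(sequence.index(a) - sequence.index(b))


def unordered_steps(a, b):
    """Steps between two states of an unordered character: any change is 1."""
    return 0 if a == b else 1


petals = ["many", 5, 3, 0]  # the sequence from the example above

five_to_zero = ordered_steps(petals, 5, 0)        # skips state 3
many_to_zero = ordered_steps(petals, "many", 0)   # skips states 5 and 3
```

    From 5 petals to 0 is two steps and from many to 0 is three, whereas treating the same character as unordered would count each of those changes as one step.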
Orthology
    True homology of molecular sequences, i.e. descended in toto from the same ancestral sequence. Orthologous sequences exist in only one copy per organism, and can accurately reflect the phylogenetic relationships of species (cf. paralogy, plerology, xenology).
Outgroup
    A terminal taxon (or group of taxa), preferably the sister-group of the ingroup, that is used to root a cladogram (cf. ingroup). The root is placed between the outgroup(s) and the ingroup. Multiple outgroups may be used.
Parallelism
    See homoplasy.
Paralogy
    Paralogous molecular sequences result from gene duplication (independent of organism speciation), exist in multiple copies per organism, and will reconstruct gene phylogeny rather than species phylogeny (which may not be congruent) (cf. orthology).
Paraphyly (adj. paraphyletic)
    A paraphyletic group originates from a single common ancestor, which is included in the group, but does not include all of the descendants of that ancestor (cf. monophyly, polyphyly). Its members share only ancestral character states (symplesiomorphies); they do not uniquely share any synapomorphies.
Parsimony
    One of several criteria that may be optimised in building phylogenetic trees, but a philosophically important one due to its simplicity, and the basis of the most commonly used method of cladistic analysis, at least for morphological data. The central idea of cladistic parsimony analysis is that some trees will fit the character-state data better than other trees. Fit is measured by the number of evolutionary character-state changes implied by the tree. The fewer changes the better, e.g. there is no sense in choosing a phylogeny that has roots, flowers and xylem each evolving twice, if another tree exists on which one evolutionary origin for each of the apomorphic states would explain the observed distribution of states across taxa (cf. distance, maximum likelihood).
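    Counting the changes implied by a tree can be sketched with the Fitch algorithm, one standard way of doing this for unordered characters on a rooted dichotomous tree (the glossary does not prescribe an algorithm). The trees, written as nested tuples, and the character states are invented.

```python
def fitch_steps(tree, states):
    """Return (possible ancestral states, minimum steps) for one unordered
    character on a rooted dichotomous tree written as nested tuples."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch_steps(left, states)
    right_set, right_steps = fitch_steps(right, states)
    shared = left_set & right_set
    if shared:
        # The two branches can agree on a state: no extra change needed.
        return shared, left_steps + right_steps
    # No state in common: one change is required at this node.
    return left_set | right_set, left_steps + right_steps + 1


character = {"A": 1, "B": 1, "C": 0, "D": 0}  # hypothetical binary character
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

steps_1 = fitch_steps(tree_1, character)[1]
steps_2 = fitch_steps(tree_2, character)[1]
```

    The character needs one change on the first tree and two on the second, so for this character the first tree is the more parsimonious.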
Phenetics
    Similarity of characters without regard to the distinction between synapomorphy, homoplasy and symplesiomorphy. Phenetic methods are poor at reconstructing phylogeny.
Phenogram
    A branching diagram (tree) showing the phenetic similarity among the terminal taxa (cf. cladogram, phylogram, dendrogram).
Phylesis (anagenesis)
    Evolutionary events that modify a taxon without causing speciation (cf. cladogenesis).
Phylogeny
    The unique historical relationship (resulting from evolution) among terminal taxa, represented as a tree (cf. cladogram).
Phylogram
    A branching diagram (tree) assumed to be an estimate of a phylogeny; usually distinguished from a cladogram in that the branch lengths are proportional to the amount of inferred evolutionary change (cf. cladogram, phenogram, dendrogram).
Plerology
    Partial homology of molecular sequences resulting from an inter-mixture of exons and introns; will only reconstruct a composite gene history (cf. orthology).
Plesiomorphy
    A relatively primitive or ancestral character state (cf. apomorphy).
Polarity
    Evolutionary ordering of character states, determined either independently of tree construction (direct method) or more usually from a rooted phylogenetic tree (indirect method).
Polyphyly (adj. polyphyletic)
    A polyphyletic group does not include a unique common ancestor, i.e. it has multiple evolutionary origins. This concept is best restricted to groups of hybrid origin, e.g. eukaryotes, allopolyploids; otherwise, the distinction from paraphyly is somewhat arbitrary, since inclusion / exclusion of the ancestor would be the only difference (cf. monophyly, paraphyly).
Polytomy (polychotomy)
    A branch-point in a tree with more than two descendant branches. A polytomy referred to as "hard" results from absence of data to resolve branching dichotomously, and may be interpreted as multiple speciation. A polytomy referred to as "soft" reflects uncertainty resulting from conflict (incongruence) among two or more fully-resolved cladograms.
Retention index (RI)
    Similar to the consistency index, but defined so that the highest possible value for any character is 1.0 and the lowest is 0.0; removes bias due to autapomorphies (cf. consistency index).
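    The removal of autapomorphy bias can be illustrated numerically. The glossary does not give a formula; the sketch assumes the usual definition, RI = (g - s) / (g - m), where m is the minimum possible steps, s the steps observed on the tree and g the maximum steps on any tree, summed over characters for the ensemble value. The step counts are invented.

```python
def retention_index(characters):
    """Ensemble RI from (min_steps, observed_steps, max_steps) triples.

    Undefined (division by zero) if every character has max == min,
    e.g. a data set consisting only of autapomorphies.
    """
    total_min = sum(m for m, _, _ in characters)
    total_obs = sum(s for _, s, _ in characters)
    total_max = sum(g for _, _, g in characters)
    return (total_max - total_obs) / (total_max - total_min)


informative = [(1, 2, 3), (1, 3, 4)]   # hypothetical homoplastic characters
autapomorphies = [(1, 1, 1)] * 5       # min = observed = max = 1 step

ri_informative = retention_index(informative)
ri_with_autapomorphies = retention_index(informative + autapomorphies)
```

    Both values are 0.4: an autapomorphy adds the same amount to each of the three totals, so it cancels out of the numerator and the denominator.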
Reversal (= loss)
    Evolutionary reversion from an apomorphic to a plesiomorphic character state (cf. homoplasy).
Rooted tree
    A cladogram with a hypothetical ancestor, which equates to the root, which is the node at the base of the tree. When outgroups are used, this is the node that connects the outgroups to the ingroup, and which thus specifies the direction of evolutionary change among the character-states (cf. unrooted tree).
Sister groups (or taxa)
    The descendant branches from a node on a cladogram. In a phylogeny, the descendants of an ancestor are called daughters, while the siblings after a speciation event are called sisters (so a descendant is a daughter relative to its ancestor and is a sister relative to its other sibling). Note that if either of the daughters undergoes further speciation then the sister to a particular terminal taxon may actually be a group of terminal taxa.
Symplesiomorphy
    A plesiomorphy shared by two or more terminal taxa, only diagnostic of a paraphyletic group (cf. synapomorphy).
Synapomorphy
    An apomorphy shared by two or more terminal taxa; thus diagnoses a clade or monophyletic group (see also homology).
Speciation
    The evolutionary splitting of lineages.
Species
    Difficult to define rigorously in two or three lines. Defined very simply in a phylogenetic context, species are the smallest lineages that are mutually exclusive of other lineages. The internal branches of a phylogeny may be viewed as ancestral species. Note, however, that the unit lineages of a gene phylogeny are not species (see also terminal).
Step
    A single character-state change.
Systematisation
    Reconstructing natural (i.e. phylogenetic) relationships among organisms (cf. classification).
Taxon (pl. taxa)
    A named group of organisms, not necessarily a natural (monophyletic) unit (cf. terminal).
Terminal (terminal taxon)
    One of the units whose collective phylogeny is reconstructed; in other words, the undivided tips of a tree (usually contemporary taxa). Terminals may be higher taxa, species, populations, individuals, fossils or even genes. There should be some rational basis for accepting the integrity of each terminal (for the purpose of the analysis), e.g. a monophyletic or diagnosable unit. Despite the claims by some authors, terminals do not need to be monophyletic; in fact, many species-level terminals are unavoidably paraphyletic. However, higher taxa used as terminals should be monophyletic.
Topology
    The branching sequence of a tree.
Total evidence
    Reconstructing phylogeny by analysing combined data of different kinds, e.g. morphology and gene sequences. A controversial issue, because gene phylogenies may be incongruent with organismal phylogenies (cf. congruence).
Tree
    Mathematically, a connected acyclic (cycle-free) graph. Used to represent the evolutionary history of a set of taxa, with the leaves (or terminal branches) representing contemporary taxa and the internal branches representing hypothesised ancestors (see also rooted tree, unrooted tree).
Uninformative
    All tree-building methods discard some data, and therefore such data are "uninformative" for building trees using that method. For instance, in parsimony methods only characters whose number of steps can vary on trees are informative; autapomorphic and invariant characters are uninformative (these can be determined by inspection of the data). However, in UPGMA autapomorphic characters are informative (cf. informative).
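    The "inspection of the data" for parsimony can be sketched as a simple test on one column of the data matrix. It assumes unordered characters, for which the usual rule is that at least two states must each occur in at least two terminals; the function name and the example columns are hypothetical.

```python
from collections import Counter


def is_parsimony_informative(column):
    """True if at least two states each occur in two or more terminals
    (the usual rule for unordered characters)."""
    counts = Counter(column)
    return sum(1 for n in counts.values() if n >= 2) >= 2


invariant = "0000"      # the same state in every terminal
autapomorphy = "0001"   # the derived state occurs in a single terminal
shared = "0011"         # two states, each in two terminals
```

    Only the last column can take different numbers of steps on different trees, so it is the only one of the three that is informative under parsimony.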
Unordered (non-additive)
    A character type with > 2 states that have no plausible evolutionary sequence, e.g. the nucleotides A, C, G and T. A change between any pair of states is counted as 1 step. This is by far the most common character type used in cladistic analyses (cf. ordered, Dollo, irreversible).
Unrooted tree (network)
    A cladogram for which the ancestor (= root) has not been hypothesized, and which thus does not specify the direction of evolutionary change among the character-states. An unrooted tree can be rooted on any of its branches, and so there are many rooted trees that can be derived from a single unrooted tree (cf. rooted tree).
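    The derivation of several rooted trees from one unrooted tree can be sketched by placing a root on each branch in turn. The edge-list representation, the four-taxon tree and the function names are assumptions made for illustration.

```python
def subtree(adjacency, node, parent):
    """Nested-tuple form of the part of the tree on the far side of `parent`."""
    children = [n for n in adjacency[node] if n != parent]
    if not children:
        return node
    return tuple(subtree(adjacency, child, node) for child in children)


def rootings(edges):
    """One rooted tree per branch of an unrooted tree given as an edge list."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    return [(subtree(adjacency, a, b), subtree(adjacency, b, a))
            for a, b in edges]


# Hypothetical unrooted tree: terminals A-D, internal nodes x and y.
edges = [("A", "x"), ("B", "x"), ("x", "y"), ("C", "y"), ("D", "y")]
rooted = rootings(edges)
```

    This unrooted tree has five branches and therefore five possible rooted trees; rooting on the internal branch gives ((A, B), (C, D)), while rooting on the branch leading to A gives (A, (B, (C, D))).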
Xenology
    A polyphyletic relationship among molecular sequences resulting from horizontal gene transfer (cf. orthology).