Proc. Natl. Acad. Sci. USA
Vol. 94, pp. 13738–13742, December 1997
Evolution
Evolutionary specialization of the nuclear targeting apparatus
(nuclear transport/α and β karyopherins/Armadillo and HEAT motifs)
HARMIT S. MALIK, THOMAS H. EICKBUSH, AND DAVID S. GOLDFARB*
Department of Biology, University of Rochester, Rochester, NY 14627
Communicated by Masayasu Nomura, University of California, Irvine, CA, October 2, 1997 (received for review August 1, 1997)
ABSTRACT

The α- and β-karyopherins (Kaps), also called importins, mediate the nuclear transport of proteins. All α-Kaps contain a central domain composed of eight tandemly arranged armadillo-like (Arm) repeats of approximately 40 amino acids each. The number and order of these repeats have not changed since the common origin of fungi, plants, and mammals. Phylogenetic analysis suggests that the various α-Kaps fall into two groups, α1 and α2. Whereas animals encode both types, the yeast genome encodes only an α1-Kap. The β-Kaps are characterized by 14–15 tandemly arranged HEAT motifs. We show that the Arm repeats of α-Kaps and the HEAT motifs of β-Kaps are similar, suggesting that the α-Kaps and β-Kaps (and, for that matter, all Arm and HEAT repeat-containing proteins) are members of the same protein superfamily. Phylogenetic analysis indicates that there are at least three major groups of β-Kaps, consistent with their proposed cargo specificities. We present a model in which an α-independent β-Kap progenitor gave rise to the α-dependent β-Kaps and the α-Kaps.

The nuclear pore complex is remarkable for its capacity to catalyze the bidirectional translocation of diverse macromolecules (1–3). In contrast, the α and β karyopherin families of nuclear targeting factors (here designated α/β Kaps) [also called α/β importin, pore-targeting apparatus, nuclear localization signal (NLS) receptor-P97 complex, or Kap60/95] appear to have become specialized to mediate the transport of selected kinds of NLS cargo (4). The prototypic β-Kap95 factors of yeast and humans function as heterodimers with α-Kap60, an NLS receptor, to mediate the docking of NLS cargo at the cytoplasmic face of the nuclear pore complex. α-Kap60 mediates the import of a wide range of karyophilic proteins that contain simian virus 40 T-antigen-like or nucleoplasmin-like NLS sequences (1–3).

In contrast to the α-Kap-dependent β-Kap95, yeast β-Kap104 (5) and its human homologue transportin (here called human Kap104) (6–8) are α-Kap-independent transport factors that exhibit both the NLS-binding and the nucleoporin-binding activities needed to import an abundant group of M9 signal-containing heterogeneous nuclear ribonucleoproteins. Recent results suggest that yeast β-Kap121 and β-Kap123 (9) and the human β-Kap121 homologue (10) are also α-Kap-independent transport factors that mediate the import of ribosomal proteins. β-Kap95, -Kap104, -Kap121, and -Kap123 also have been termed β1–4, respectively (8, 10). The in vivo substrate ranges of the different NLS receptors, and their binding properties for their cognate NLSs, remain inadequately defined. For example, it is still unclear whether the different α-Kap and β-Kap NLS receptors have discrete or overlapping NLS-cargo specificities.

Although little is known about the secondary structure of the β-Kaps, an earlier report identified these proteins as containing HEAT motifs (11) (the HEAT acronym is based on the proteins in which these repeats were originally identified: huntingtin, elongation factor 3, the A subunit of protein phosphatase 2A, and TOR1; ref. 11). These motifs are 38–45 amino acids in length and have been proposed to characterize proteins involved in transport processes. HEAT motifs appear in both eukaryotic and prokaryotic proteins (11).

The complexity of the extant α/β Kap nuclear targeting apparatus raises interesting questions about the evolutionary origin and function of these genetically diverse gene families. This apparatus must have evolved in the earliest nucleated eukaryotes, given their need to compartmentalize proteins to either the nucleus or the cytoplasm. Here, we have investigated the evolutionary relationships among these factors by using molecular phylogenetic techniques. This analysis suggests that the various α- and β-Kaps have evolved from a common ancestor. Three major specialization events have occurred during the evolution of these eukaryotic gene families. First, the α-Kaps arose together with the α-dependent β-Kap from an α-independent β-Kap progenitor. Second, the α-Kaps divided into two groups, the α1-Kaps and the α2-Kaps. Finally, gene duplication and functional specialization have occurred within the α2-Kap family.
RESULTS AND DISCUSSION
Phylogeny of the α-Kaps. Phylogenetic analysis was first used to establish the relationships among the different α-Kaps. α-Kaps are ≈60-kDa polypeptides previously characterized as being composed of eight tandem 38- to 45-residue Armadillo (Arm) repeats (first identified in the Drosophila armadillo protein) flanked by ≈100-aa N- and C-terminal domains (12, 13). Alignment of all available α-Kap-like sequences revealed extensive similarity within the Arm repeats and the N- and C-terminal domains (Fig. 1).

Because the α-Kaps contain internal tandem repeats, the alignment in Fig. 1 cannot be used to construct a phylogeny unless it can be unambiguously determined that the order of the individual repeats within the domain has been conserved. Otherwise, nonhomologous repeats would be aligned, leading to incorrectly inferred phylogenies. A phylogenetic analysis of each of the eight separate Arm repeats from three highly divergent α-Kaps is shown in Fig. 2 and indicates that the order of the eight Arm repeats has been conserved from yeast to humans. For example, Arm repeat no. 1 in the yeast α-Kap is more similar to repeat no. 1 in the two human α-Kaps than to any other Arm repeat in the yeast and human proteins. As shown in Fig. 2, the same conclusion can be drawn for each of the eight Arm repeats. We conclude that the order of the Arm repeats has been conserved in all known α-Kaps. This result predicts that the number and order of the Arm repeats of the extant α-Kaps are similar to those of the progenitor α-Kap. The precise conservation of Arm repeat order further suggests that the eight repeats are not functionally interchangeable. This hypothesis may be tested by constructing recombinant α-Kaps with shuffled repeat units.

FIG. 1. α-Kap alignment. Complete ORFs from the various α-Kaps were aligned by using the CLUSTALV package of programs (27) with high gap penalties. * and • below the alignment indicate identities and similarities, respectively, among all α-Kaps. The boxes indicate the most recent definition of the Arm repeats (12, 13), and the horizontal arrows indicate the HEAT motifs. The PID (protein identification) number of each sequence is indicated at the beginning of the alignment. Note that the Caenorhabditis elegans α-Kap, a composite of several expressed sequence tags, has a considerably longer N terminus, which may be an artifact.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.
Abbreviations: Kap, karyopherin; Arm, armadillo; NLS, nuclear localization signal.
*To whom reprint requests should be addressed. e-mail: Goldfarb@Nucleus.Biology.Rochester.edu.
© 1997 by The National Academy of Sciences 0027-8424/97/9413738-5$2.00/0
PNAS is available online at http://www.pnas.org.
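The logic of the repeat-order test can be illustrated with a deliberately simplified Python sketch. Instead of building Neighbor-Joining trees as in Fig. 2, it assumes that each repeat has already been aligned to a common length and scores similarity as plain percent identity; the order counts as conserved when every repeat in one protein is most similar to the repeat with the same index in the other. The function names and the three-repeat toy sequences are hypothetical, not real Arm repeats.

```python
from typing import List


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences; columns with a gap ('-') are skipped."""
    if len(a) != len(b):
        raise ValueError("repeats must be pre-aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def best_match_indices(query: List[str], reference: List[str]) -> List[int]:
    """For each repeat in `query`, the index of the most similar repeat in `reference` (first wins on ties)."""
    return [
        max(range(len(reference)), key=lambda j: percent_identity(q, reference[j]))
        for q in query
    ]


def order_conserved(query: List[str], reference: List[str]) -> bool:
    """True if repeat i of `query` is most similar to repeat i of `reference` for every i."""
    return best_match_indices(query, reference) == list(range(len(query)))


# Hypothetical toy repeats (not real Arm sequences).
yeast = ["ACDEFGHIKL", "MNPQRSTVWY", "LLKKEEDDAA"]
human = ["ACDEFGHIKV", "MNPQRSTVWF", "LLKKEEDDAG"]
print(order_conserved(yeast, human))                           # True
print(order_conserved(yeast, [human[1], human[0], human[2]]))  # False: repeats 1 and 2 swapped
```

Swapping two reference repeats breaks the one-to-one correspondence, which is the situation in which aligning whole tandem-repeat domains would pair nonhomologous repeats.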
The Arm repeat domains of all available α-Kap sequences were used to construct a phylogeny by the Neighbor-Joining method (Fig. 3). Based on this phylogeny, we propose that the members of the group containing the yeast SRP1-encoded α-Kap be referred to as "α1-Kaps" (or α1-importins), and that the members of the other groups be called "α2-Kaps" (or α2-importins). These two groups are likely to reflect a functional divergence of the α-Kaps, because each group is broadly distributed in different phyla. Note, however, that within the α2 clade shown in Fig. 3 there are three distinct subclades, each about as divergent from the others as it is from the α1 group. We have designated these three clades as subdivisions of a single α2 group in recognition of the fact that yeast contains only an α1-Kap (see below). Thus, the α2A, α2B, and α2C Kaps are all likely to have arisen from an archetypal α1-Kap. Alternatively, we could just as easily have designated the three α2 clades as α2, α3, and α4. Previous number designations of α-Kaps have been made somewhat arbitrarily, for example, according to the order of their submission to the database (see Fig. 1). The previous nomenclature incorrectly implies that the human and mouse α2-Kaps are more closely related to the yeast α1-Kap than are the human and mouse α1-Kaps. The approach we have taken to assigning α-Kap nomenclature is more consistent with the derived α-Kap phylogeny shown in Fig. 3.

FIG. 2. Conservation of the order of Arm repeats within the α-Kaps. The eight individual Arm repeats from yeast α1-Kap, human α1-Kap, and human α2B-Kap were aligned, and a phylogenetic relationship was derived by using the Neighbor-Joining method (28). The tree shown here is a 50% consensus tree, with bootstrap values expressed as a percentage of 1,000 replications.

FIG. 3. Phylogeny of the α-Kaps. The alignment of the Arm repeat-containing domains of the various α-Kaps was used to construct a phylogeny by the Neighbor-Joining method (28). Shown here is the consensus tree of 1,000 bootstrap replications. The tree was midpoint-rooted between the α1- and α2-Kaps. The phylogenetic relationship of the proteins was also determined with the Maximum Parsimony algorithm by using the PAUP package of programs (30), which gave the identical topology. Nomenclatures based both on the submitted names and on the proposed scheme are indicated.
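Figs. 2, 3, and 6 rely on the Neighbor-Joining method (28). As a rough illustration of how that method turns a matrix of pairwise distances into a tree, here is a minimal Python sketch of the textbook Saitou-Nei algorithm. It is not the program used to build these figures, and the five-taxon distance matrix in the example is a standard toy case, not data from this study.

```python
import itertools
from typing import Dict, List, Tuple


def neighbor_joining(names: List[str], dist: List[List[float]]) -> str:
    """Textbook Neighbor-Joining (Saitou & Nei); returns a Newick string with branch lengths.

    `dist` is a symmetric matrix of pairwise distances listed in the same order as `names`.
    Each new internal node is labelled by its own Newick substring.
    """
    nodes = list(names)
    d: Dict[Tuple[str, str], float] = {
        (a, b): float(dist[i][j]) for i, a in enumerate(names) for j, b in enumerate(names)
    }
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence r(i): sum of distances from node i to all current nodes.
        r = {a: sum(d[(a, k)] for k in nodes) for a in nodes}
        # Join the pair that minimises Q(i, j) = (n - 2) * d(i, j) - r(i) - r(j).
        a, b = min(
            itertools.combinations(nodes, 2),
            key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]],
        )
        # Branch lengths from a and b to their new parent node u.
        la = 0.5 * d[(a, b)] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[(a, b)] - la
        u = f"({a}:{la:g},{b}:{lb:g})"
        # Distance from u to each remaining node k.
        for k in nodes:
            if k not in (a, b):
                d[(u, k)] = d[(k, u)] = 0.5 * (d[(a, k)] + d[(b, k)] - d[(a, b)])
        d[(u, u)] = 0.0
        nodes = [k for k in nodes if k not in (a, b)] + [u]
    x, y = nodes
    # Connect the last two nodes; where the Newick "root" falls here is arbitrary.
    return f"({x},{y}:{d[(x, y)]:g});"


# Toy additive distance matrix for five hypothetical taxa (not data from this study).
names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(names, dist))  # ((c:4,(a:2,b:3):3),(d:2,e:1):2);
```

The output is unrooted in substance; a rooting such as the midpoint rooting of Fig. 3 would be a separate step. Bootstrap support would come from re-running a routine like this on distance matrices computed from many column-resampled replicates of the alignment, which is not shown here.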
The α-Kap phylogeny is notable for the existence of only a single α1-Kap and no α2-Kap gene in Saccharomyces cerevisiae (SRP1), and of multiple α1- and α2-Kaps in humans. Because the entire yeast genome sequence is available, it is unlikely that another α-Kap exists in yeast. This fact indicates that the yeast α1-Kap must be capable of performing all of the transport duties required of an α-Kap in this single-celled organism. The α2-Kap family has undergone more extensive specialization than the α1-Kaps. Three highly divergent clades of the α2-Kaps, designated α2A, α2B, and α2C, may serve the distinct developmental, tissue-type-specific, and NLS cargo-specific functions of more complex organisms. This hypothesis is supported by the observations that the expression of Drosophila α2A-Kap (Pendulin; refs. 14 and 15) and of mouse α2B-Kap (16) is developmentally regulated, and that the human α1- and α2B-Kaps differ in their in vitro NLS binding properties (17). Even within the three clades of the α2-Kaps there has been significant subspecialization. Two similar α2B-Kaps identified in Xenopus (18) and the human alpha3/Qip1 (α2C-Kap clade) are examples of this specialization. Future sequencing of α-Kap genes from divergent species should help further resolve the various α-Kap groups and provide insight into when these groups appeared in the eukaryotic lineage. For example, are α2-Kaps restricted to metazoans? Do they occur in vascular plants? [To date, only α1-Kap expressed sequence tags have been reported in plants (not shown).] Do mammalian equivalents of the Drosophila α2A-Kap (Pendulin) exist?
Repeat Structure and Phylogeny of the β-Kaps. β-Kaps in yeast fall into four major classes: the α-dependent β-Kap95p; the heterogeneous nuclear ribonucleoprotein-transporting β-Kap104p; and the ribosomal protein-transporting factors β-Kap121p and β-Kap123p. As mentioned earlier, β-Kaps are characterized by internal HEAT motifs. At the amino acid sequence level, HEAT motifs are significantly more degenerate than Arm repeats, but each tandem repeat appears to represent a conserved secondary structure motif consisting of two antiparallel α-helices with short connecting loops (11, 19). In the β-Kaps, specific HEAT repeats have been proposed to bear Ran-binding determinants (see ref. 20). We have attempted to construct a comprehensive map of the HEAT- and non-HEAT-containing domains of these β-Kaps. As shown in Fig. 4, the β-Kaps are composed predominantly of 14–15 tandem HEAT repeats flanked by arms of 100–150 amino acids (the Kap95s lack the C-terminal arm). The HEAT repeats are tandemly arranged except where they are interrupted by one to three regions containing non-HEAT or "partial" HEAT sequence. Most of the length variation among the different β-Kaps derives from non-HEAT expansion domains in the larger β-Kaps.

FIG. 4. Distribution of HEAT motifs in the β-Kaps. HEAT motifs are indicated as boxes, and non-HEAT-containing sequences are represented by thick lines. The entire human and yeast ORFs were aligned in pairs by CLUSTALV (27). A window of about 40 amino acids matching the HEAT consensus (11) was then used, with a high gap penalty, to score individual HEAT motifs in each ORF pair. The alignment windows were evaluated individually on the basis of matches to the residues previously identified as important for the secondary structure. One mismatch at an important residue per α-helix was allowed as long as the secondary structure prediction was unaffected. Secondary structure predictions made with the SSPRED method (29) were used to confirm the primary sequence alignments. The darkly shaded boxes represent HEAT motifs that are unambiguously homologous in the various β-Kaps (data not shown). The dashed boxes represent HEAT motifs that are not close matches to the universal HEAT consensus (Fig. 7) but are predicted to assume a similar secondary structure (11, 19, 20). The numbers above and below each box indicate the first and last amino acids of that HEAT repeat. Accession numbers/protein identification numbers are indicated below the protein names.
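The window-scoring step described in the Fig. 4 legend can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the residue classes (hydrophobic "h", polar "p", any ".") and the two short helix patterns are hypothetical placeholders, not the published HEAT consensus (11), and the pairwise ORF alignment and SSPRED confirmation steps are not modelled. As in the legend, a window is accepted when each helix has at most one mismatch at an important position.

```python
from typing import List

# Hypothetical residue classes; the real HEAT consensus (ref. 11) is more detailed.
HYDROPHOBIC = set("AVLIMFWYC")
POLAR = set("STNQDEKRH")


def helix_mismatches(segment: str, pattern: str) -> int:
    """Count important positions ('h' or 'p') in `pattern` that `segment` fails to satisfy; '.' matches anything."""
    misses = 0
    for residue, cls in zip(segment, pattern):
        if cls == "h" and residue not in HYDROPHOBIC:
            misses += 1
        elif cls == "p" and residue not in POLAR:
            misses += 1
    return misses


def scan_for_repeats(seq: str, helices: List[str], max_mismatch: int = 1) -> List[int]:
    """Return 0-based start positions of windows in which every helix has <= max_mismatch mismatches.

    `helices` are consecutive pattern strings; the window length is their combined length.
    """
    width = sum(len(p) for p in helices)
    hits = []
    for start in range(len(seq) - width + 1):
        offset = start
        for pattern in helices:
            if helix_mismatches(seq[offset:offset + len(pattern)], pattern) > max_mismatch:
                break
            offset += len(pattern)
        else:  # no helix exceeded the mismatch allowance
            hits.append(start)
    return hits


# Toy example: a hypothetical two-helix pattern and a sequence with one embedded match.
helices = ["hpph", "phhp"]
seq = "GGGG" + "LKEL" + "ELLK" + "GGGG"
print(scan_for_repeats(seq, helices))  # [4]
```

Scoring each helix separately, rather than the whole window at once, mirrors the per-helix mismatch allowance in the legend; a real scan would use full-length (≈40-residue) patterns.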
Because of the high sequence degeneracy among HEAT motifs, it is difficult to match homologous HEAT repeats among the β-Kaps. However, we were able to unambiguously identify three sets of homologous HEAT repeats among the four β-Kap groups: the first, sixth, and seventh (Fig. 4, dark boxes). An approximately 79-residue sequence (Fig. 5) containing the sixth and seventh HEAT repeats, which may contain part of a Ran-binding determinant (20), was used to construct a phylogeny of the four β-Kap groups (Fig. 6). An alignment of the entire protein sequences based on their similar domain structures (Fig. 4) yielded a similar tree (not shown). Not included in this analysis (because their sequences do not include the sixth and seventh HEAT repeats) are a number of expressed sequence tags from vertebrates and invertebrates that fall in the Kap95, Kap104, or Kap121 lineages. No expressed sequence tags homologous to yeast Kap123p were identified. Thus, KAP123, which is nonessential for growth, may be unique to yeast or may be expressed only at low levels in metazoans. The tree in Fig. 6 suggests that the Kap95p, Kap104p, and Kap121/123p groups are very old and about equally divergent, consistent with their proposed functions in defining eukaryotic nuclear transport pathways (1–10). Although it is not possible to root this tree, the most parsimonious explanation for two of the three major branches being α-Kap-independent is that the Kap95p ancestor arose from an α-Kap-independent progenitor.
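The parsimony argument can be made concrete with Fitch's small-parsimony algorithm, used here purely as a standard illustration; the text does not say that the authors ran it. The sketch assumes that each of the three lineages in Fig. 6 is collapsed to a single tip and that the character is coded as "dependent" or "independent" of α-Kap, as stated in the text. Because the tree cannot be rooted, the example tries all three rootings of the three-lineage tree.

```python
from typing import Dict, FrozenSet, Tuple, Union

# A rooted binary tree: a tip name, or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[FrozenSet[str], int]:
    """Fitch small-parsimony pass: (candidate ancestral states at the root, minimum number of changes)."""
    if isinstance(tree, str):
        return frozenset({states[tree]}), 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No shared state: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


# Character states as described in the text.
states = {"Kap95": "dependent", "Kap104": "independent", "Kap121/123": "independent"}

# The three possible rootings of the unrooted three-lineage tree.
rootings = {
    "root on Kap95 branch": ("Kap95", ("Kap104", "Kap121/123")),
    "root on Kap104 branch": ("Kap104", ("Kap95", "Kap121/123")),
    "root on Kap121/123 branch": ("Kap121/123", ("Kap95", "Kap104")),
}
for label, tree in rootings.items():
    root_states, changes = fitch(tree, states)
    print(label, sorted(root_states), changes)
```

Every rooting requires only one change of state. When the root lies on the Kap104 or Kap121/123 branch, the inferred ancestor is α-Kap-independent; when it lies on the Kap95 branch, the root state is ambiguous. Under this simple coding, a single gain of α-dependence in the Kap95 lineage is therefore sufficient to explain the data.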
Alignment of Arm and HEAT Repeats. If, as we postulate, α-dependence is a derived character, then the α-Kaps are likely to have appeared in tandem with the α-dependent β-Kaps. This simultaneous appearance could have occurred through a genetic event that split the NLS-binding domain and the nuclear pore complex docking domain of an α-independent β-Kap-like ancestor into separate genes. Evidence that the ancestral α-KAP gene arose from a β-KAP progenitor is that the α-Kap Arm repeats and the β-Kap HEAT repeats can be aligned by shifting the currently accepted tandem Arm repeat junctions by 10 residues (refs. 12 and 13; Fig. 7). Two observations indicate that the central domain of the α-Kaps is better described as containing HEAT motifs. First, the relatively poor first Arm repeat becomes an excellent match to a HEAT repeat (Figs. 1 and 7). Second, the proposed secondary structure of HEAT repeat-containing proteins suggests that they adopt a repeating helical packing structure (19). If, as we suggest, this structure is present in α-Kaps and other Arm repeat-containing proteins (19, 21), then the currently defined Arm repeat junction (12, 13) interrupts one of the two helices found in every repeat (Figs. 1 and 7). The original definition (22) of the Arm repeat junction agrees with the HEAT junction and correctly places the repeat border outside the two helical regions. It should be noted that a phylogenetic analysis of the α-Kaps that used HEAT junctions yields the same tree topologies as those shown in Figs. 2 and 3, which are based on the Arm junctions (data not shown).

FIG. 6. Phylogeny of the β-Kaps. Neighbor-Joining trees (28) based on the alignment in Fig. 5, with bootstrap values from 1,000 replications. The phylogenetic relationship of the proteins based on Maximum Parsimony (30) differs in that human Kap121 falls outside a branch containing both yeast Kap121 and yeast Kap123.
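The bootstrap values in Figs. 2, 3, and 6 come from 1,000 replications. The usual nonparametric bootstrap resamples alignment columns with replacement and re-infers the tree for each replicate. The Python sketch below shows the resampling step. To stay short, it substitutes a simplified support measure (how often two sequences are each other's nearest neighbor by p-distance) for rebuilding a full Neighbor-Joining tree per replicate, so it is a stand-in for, not a reproduction of, the published analysis. The function names and the toy alignment are hypothetical.

```python
import random
from typing import List


def bootstrap_alignment(seqs: List[str], rng: random.Random) -> List[str]:
    """One bootstrap replicate: sample alignment columns with replacement, same columns for every sequence."""
    ncols = len(seqs[0])
    if any(len(s) != ncols for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    cols = [rng.randrange(ncols) for _ in range(ncols)]
    return ["".join(s[c] for c in cols) for s in seqs]


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nearest(rep: List[str], k: int) -> int:
    """Index of the sequence closest to sequence k (first one wins on ties)."""
    return min((m for m in range(len(rep)) if m != k), key=lambda m: p_distance(rep[k], rep[m]))


def pair_support(seqs: List[str], i: int, j: int, replicates: int = 1000, seed: int = 0) -> float:
    """Fraction of replicates in which sequences i and j are mutual nearest neighbours."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        rep = bootstrap_alignment(seqs, rng)
        if nearest(rep, i) == j and nearest(rep, j) == i:
            hits += 1
    return hits / replicates


# Hypothetical toy alignment: sequences 0 and 1 differ at one column; 2 and 3 differ from both everywhere.
aln = ["ACDEFGHIKL", "ACDEFGHIKV", "MNPQRSTVWY", "MNPQRSTVWF"]
print(pair_support(aln, 0, 1))  # 1.0
print(pair_support(aln, 0, 2))  # 0.0
```

In this toy case the grouping is unambiguous, so support is 100%. With real alignments, support falls below 100% when resampled columns favor competing groupings.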
The finding that Arm and HEAT repeats define the same two-helical secondary structure suggests that Arm and HEAT repeat-containing proteins belong to the same protein superfamily and may share analogous functions. Both domains appear to function as scaffolds that bind various protein ligands. For example, the Arm repeats of β-catenin are required for binding to α-catenin, cadherin, and the adenomatous polyposis coli (APC) protein (23). The HEAT motifs of the A subunit of PP2A associate with tumor antigens and with the B and C subunits of PP2A (19). The same is likely to be true for the Arm and HEAT motifs of α-Kaps and β-Kaps, respectively, which form a heterodimer in the targeting of NLS-bearing substrates to the nuclear pore complex.
Model for the Evolution of the Kaps. Based on these results, we propose the following model for the evolution of β and α/β Kap-mediated nuclear transport. Both the β and the α/β systems likely evolved from an ancient α-Kap-independent β-Kap progenitor. The β-KAP95 gene family arose at about the time of the separation between the β-KAP104 and β-KAP121/123 families. Recently, additional genes with partial homology to the β-Kaps have been reported (24, 25), suggesting further specialization within this family. We propose that the ancestral α-KAP gene evolved in tandem with the β-KAP95 progenitor. Because yeast has a single α1-Kap and the α2-Kaps appear to be more adapted to specialized functions, we propose that the first α-Kap was probably of the α1 type. A second major phase of evolution within the Kaps, the appearance and subspecialization of the α2-Kaps, occurred after the separation of the yeast and human lineages, to serve the specialized targeting and nuclear functions of early metazoan cell types associated with development (26). For example, loss of the Drosophila α2-Kap Pendulin is an embryonic lethal mutation that causes defects in the control of cell proliferation and malignancy of hematopoietic organs (14, 15).

FIG. 5. Alignment of the conserved sixth and seventh HEAT motifs of β-Kap sequences. * and • at the bottom of the alignment represent identities and similarities, respectively, among the β-Kap sequences. Putative Ran-binding determinants (20) are highlighted in bold.

FIG. 7. Alignment of Arm and HEAT motifs. HEAT motifs from the various β-Kaps and Arm repeat-containing segments from yeast α1-Kap and Drosophila Armadillo (β-catenin) were aligned to the universal HEAT consensus (11). p and h indicate polar and hydrophobic residues, and HHH indicates an α-helical secondary structure prediction. The arrow indicates the Arm repeat junction identified in Fig. 1 (12). Numbers at the beginning and end of each amino acid stretch represent the number of amino acids to the end of the ORF. Numbers in parentheses represent the number of amino acids between consecutive HEAT repeats.

Note Added in Proof. Recently, the structure of the Arm-repeat domain of murine β-catenin was solved (31). The junctions between the 12 secondary structure motifs were found to be inconsistent with the Arm repeat definition. These authors proposed an alternative repeat junction that agrees with the HEAT motif definition that we suggest applies to the α-Kaps and β-catenins (Fig. 7).

This work was supported by National Institutes of Health Public Health Service Grants GM40362 (D.S.G.) and GM42790 (T.H.E.).

1. Gorlich, D. S. & Mattaj, I. W. (1996) Science 271, 1513–1518.
2. Nigg, E. A. (1997) Nature (London) 386, 779–787.
3. Corbett, A. H. & Silver, P. A. (1997) Microbiol. Mol. Biol. Rev. 61, 193–211.
4. Goldfarb, D. S. (1997) Curr. Biol. 7, 14–17.
5. Aitchison, J. D., Blobel, G. & Rout, M. P. (1996) Science 274, 624–627.
6. Pollard, V. W., Michael, W. M., Nakielny, S., Siomi, M. C., Wang, F. & Dreyfuss, G. (1996) Cell 86, 985–994.
7. Nakielny, S., Siomi, M. C., Siomi, H., Michael, W. M., Pollard, V. & Dreyfuss, G. (1996) Exp. Cell Res. 229, 261–266.
8. Bonifaci, N., Moroianu, J., Radu, A. & Blobel, G. (1997) Proc. Natl. Acad. Sci. USA 94, 5055–5060.
9. Rout, M. P., Blobel, G. & Aitchison, J. D. (1997) Cell 89, 715–725.
10. Yaseen, N. R. & Blobel, G. (1997) Proc. Natl. Acad. Sci. USA 94, 4451–4456.
11. Andrade, M. A. & Bork, P. (1995) Nat. Genet. 11, 115–116.
12. Peifer, M., Berg, S. & Reynolds, A. B. (1994) Cell 76, 789–791.
13. Kussel, P. & Frasch, M. (1995) J. Cell Biol. 129, 1491–1507.
14. Kussel, P. & Frasch, M. (1995) Mol. Gen. Genet. 248, 351–363.
15. Torok, I., Strand, D., Schmitt, R., Tick, G., Torok, T., Kiss, I. & Mechler, B. M. (1995) J. Cell Biol. 129, 1473–1489.
16. Prieve, M. G., Guttridge, K. L., Munguia, J. E. & Waterman, M. L. (1996) J. Biol. Chem. 271, 7654–7658.
17. Nadler, S. G., Tritschler, D., Haffar, O. K., Blake, J., Bruce, A. G. & Cleaveland, J. S. (1997) J. Biol. Chem. 272, 4310–4315.
18. Gorlich, D., Prehn, S., Laskey, R. A. & Hartmann, E. (1994) Cell 79, 767–778.
19. Ruediger, R., Hentz, M., Fait, J., Mumby, M. & Walter, G. (1994) J. Virol. 68, 123–129.
20. Moroianu, J., Blobel, G. & Radu, A. (1996) Proc. Natl. Acad. Sci. USA 93, 6572–6576.
21. Hirschl, D., Bayer, P. & Muller, O. (1996) FEBS Lett. 383, 31–36.
22. Yano, R., Oakes, M., Yamaghishi, M., Dodd, J. A. & Nomura, M. (1992) Mol. Cell. Biol. 12, 5640–5651.
23. Peifer, M. (1995) Trends Cell Biol. 5, 224–229.
24. Fornerod, M., van Deursen, J., van Baal, S., Reynolds, A., Davis, D., Murti, K. G., Fransen, J. & Grosveld, G. (1997) EMBO J. 16, 807–816.
25. Gorlich, D., Dabrowski, M., Bischoff, F. R., Kutay, U., Bork, P., Hartmann, E., Prehn, S. & Izaurralde, E. (1997) J. Cell Biol. 138, 65–80.
26. Gorlich, D., Kraft, R., Kostka, S., Vogel, F., Hartmann, E., Laskey, R. A., Mattaj, I. W. & Izaurralde, E. (1996) Cell 87, 21–32.
27. Higgins, D. G., Bleasby, A. J. & Fuchs, R. (1992) Comput. Appl. Biosci. 8, 189–191.
28. Saitou, N. & Nei, M. (1987) Mol. Biol. Evol. 4, 406–425.
29. Mehta, P. K., Heringa, J. & Argos, P. (1995) Protein Sci. 4, 2517–2525.
30. Swofford, D. L. (1991) Illinois Natural History Survey, Champaign, IL.
31. Huber, A. H., Nelson, W. J. & Weis, W. I. (1997) Cell 90, 871–882.