Commit fa0fae2

Change protein references to structure
Since the structural alignment algorithms work for all biological structures, not only proteins, their description should be generalized.
1 parent ac15b5d commit fa0fae2

1 file changed (+36 -21 lines changed)

structure/alignment.md

Lines changed: 36 additions & 21 deletions
@@ -1,31 +1,42 @@
-Protein Structure Alignment
+Structure Alignment
 ===========================

-## What is a structure alignment?
+## What is a Structure Alignment?

-A **Structural alignment** attempts to establish equivalences between two or more polymer structures based on their shape and three-dimensional conformation. In contrast to simple structural superposition (see below), where at least some equivalent residues of the two structures are known, structural alignment requires no a priori knowledge of equivalent positions.
+A **structural alignment** attempts to establish equivalences between two or more polymer structures based on their shape and three-dimensional conformation. In contrast to simple structural superposition (see below), where at least some equivalent residues of the two structures are known, structural alignment requires no a priori knowledge of equivalent positions.

-Structural alignment is a valuable tool for the comparison of proteins with low sequence similarity, where evolutionary relationships between proteins cannot be easily detected by standard sequence alignment techniques. Structural alignment can therefore be used to imply evolutionary relationships between proteins that share very little common sequence. However, caution should be exercised when using the results as evidence for shared evolutionary ancestry, because of the possible confounding effects of convergent evolution by which multiple unrelated amino acid sequences converge on a common tertiary structure.
+**Structural alignment** is a valuable tool for the comparison of proteins with low sequence similarity, where evolutionary relationships between proteins cannot be easily detected by standard sequence alignment techniques. **Structural alignment** can therefore be used to imply evolutionary relationships between proteins that share very little common sequence. However, caution should be exercised when using the results as evidence for shared evolutionary ancestry, because of the possible confounding effects of convergent evolution by which multiple unrelated amino acid sequences converge on a common tertiary structure.

-For more info see the Wikipedia article on [protein structure alignment](http://en.wikipedia.org/wiki/Structural_alignment).
+**Structural alignment** of other biological structures can also be performed in BioJava. For example, nucleic acids can
+be structurally aligned to find common structural motifs, independent of sequence similarity. This is especially
+important for RNAs, because their 3D structure is closely tied to their function.
+
+For more info see the Wikipedia article on [structure alignment](http://en.wikipedia.org/wiki/Structural_alignment).

 ## Alignment Algorithms supported by BioJava

 BioJava comes with a number of algorithms for aligning structures. The following
 five options are displayed by default in the graphical user interface (GUI),
 although others can be accessed programmatically using the methods in
-[StructureAlignmentFactory](http://www.biojava.org/docs/api/org/biojava/nbio/structure/align/StructureAlignmentFactory.html).
+[StructureAlignmentFactory]
+(http://www.biojava.org/docs/api/org/biojava/nbio/structure/align/StructureAlignmentFactory.html).

 1. Combinatorial Extension (CE)
 2. Combinatorial Extension with Circular Permutation (CE-CP)
 3. FATCAT - rigid
 4. FATCAT - flexible.
 5. Smith-Waterman superposition

-CE and FATCAT both use structural similarity to align the proteins, while
+CE and FATCAT both use structural similarity to align the structures, while
 Smith-Waterman performs a local sequence alignment and then displays the result
 in 3D. See below for descriptions of the algorithms.

+Since BioJava version 4.1.0, multiple structure alignments can be generated and visualized.
+The algorithm is described in detail below. As an overview, it uses any pairwise alignment
+algorithm and a reference structure to align all of the structures. Then, it runs a Monte
+Carlo optimization method to determine the residue equivalences between all the structures,
+identifying conserved structural motifs.
+
 ## Alignment User Interface

 Before going the details how to use the algorithms programmatically, let's take
@@ -39,7 +50,7 @@ This code shows the following user interface:

 ![Alignment GUI](img/alignment_gui.png)

-You can manually select protein chains, domains, or custom files to be aligned.
+You can manually select structure chains, domains, or custom files to be aligned.
 Try to align 2hyn vs. 1zll. This will show the results in a graphical way, in
 3D:

@@ -60,7 +71,7 @@ algorithms.
 The Combinatorial Extension (CE) algorithm was originally developed by
 [Shindyalov and Bourne in
 1998](http://peds.oxfordjournals.org/content/11/9/739.short) [![pubmed](http://img.shields.io/badge/in-pubmed-blue.svg?style=flat)](http://www.ncbi.nlm.nih.gov/pubmed/9796821).
-It works by identifying segments of the two proteins with similar local
+It works by identifying segments of the two structures with similar local
 structure, and then combining those to try to align the most residues possible
 while keeping the overall RMSD of the superposition low.

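The CE paragraph in this hunk describes aligning as many residues as possible while keeping the RMSD of the superposition low. Below is a minimal sketch of that metric. It assumes the two coordinate arrays are already superimposed and paired residue by residue. `RmsdSketch` and `rmsd` are hypothetical names used only for illustration; they are not BioJava API.

```java
/**
 * Hypothetical illustration (not BioJava code): root-mean-square deviation
 * between two lists of equivalent atoms that are ASSUMED to be already
 * superimposed and paired index by index, each point given as {x, y, z}.
 */
final class RmsdSketch {
    private RmsdSketch() {}

    static double rmsd(double[][] a, double[][] b) {
        if (a.length == 0 || a.length != b.length) {
            throw new IllegalArgumentException("need two non-empty coordinate lists of equal length");
        }
        double sumSq = 0.0;
        for (int i = 0; i < a.length; i++) {
            // Squared Euclidean distance between the i-th pair of equivalent atoms
            double dx = a[i][0] - b[i][0];
            double dy = a[i][1] - b[i][1];
            double dz = a[i][2] - b[i][2];
            sumSq += dx * dx + dy * dy + dz * dz;
        }
        // Square root of the mean squared distance over all aligned pairs
        return Math.sqrt(sumSq / a.length);
    }
}
```

This sketch leaves out finding the optimal rotation and translation beforehand, for example with the Kabsch algorithm. RMSD is only meaningful after that step.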
@@ -77,15 +88,16 @@ BioJava class: [org.biojava.bio.structure.align.ce.CeMain](http://www.biojava.or
 ### Combinatorial Extension with Circular Permutation (CE-CP)

 CE and FATCAT both assume that aligned residues occur in the same order in both
-proteins (e.g. they are both *sequence-order dependent* algorithms). In proteins
+structures (i.e. they are both *sequence-order dependent* algorithms). In proteins
 related by a circular permutation, the N-terminal part of one protein is related
 to the C-terminal part of the other, and vice versa. CE-CP allows circularly
 permuted proteins to be compared. For more information on circular
 permutations, see the
 [Wikipedia](http://en.wikipedia.org/wiki/Circular_permutation_in_proteins) or
-[Molecule of the
-Month](http://www.pdb.org/pdb/101/motm.do?momID=124&evtc=Suggest&evta=Moleculeof%20the%20Month&evtl=TopBar)
-articles [![pubmed](http://img.shields.io/badge/in-pubmed-blue.svg?style=flat)](http://www.ncbi.nlm.nih.gov/pubmed/22496628).
+[Molecule of the Month]
+(http://www.pdb.org/pdb/101/motm.do?momID=124&evtc=Suggest&evta=Moleculeof%20the%20Month&evtl=TopBar)
+articles [![pubmed]
+(http://img.shields.io/badge/in-pubmed-blue.svg?style=flat)](http://www.ncbi.nlm.nih.gov/pubmed/22496628).


 For proteins without a circular permutation, CE-CP results look very similar to
@@ -97,23 +109,24 @@ proteins will be shown in different colors:

 CE-CP was developed by Spencer E. Bliven, Philip E. Bourne, and Andreas Prlić.

-BioJava class: [org.biojava.bio.structure.align.ce.CeCPMain](http://www.biojava.org/docs/api/org/biojava/nbio/structure/align/ce/CeCPMain.html)
+BioJava class: [org.biojava.nbio.structure.align.ce.CeCPMain](http://www.biojava.org/docs/api/org/biojava/nbio/structure/align/ce/CeCPMain.html)

 ### FATCAT - rigid

 This is a Java implementation of the original FATCAT algorithm by [Yuzhen Ye
 & Adam Godzik in
 2003](http://bioinformatics.oxfordjournals.org/content/19/suppl_2/ii246.abstract)
 [![pubmed](http://img.shields.io/badge/in-pubmed-blue.svg?style=flat)](http://www.ncbi.nlm.nih.gov/pubmed/14534198).
-It performs similarly to CE for most proteins. The 'rigid' flavor uses a
+It performs similarly to CE for most structures. The 'rigid' flavor uses a
 rigid-body superposition and only considers alignments with matching sequence
 order.

-BioJava class: [org.biojava.bio.structure.align.fatcat.FatCatRigid](www.biojava.org/docs/api/org/biojava/nbio/structure/align/fatcat/FatCatRigid.html)
+BioJava class: [org.biojava.nbio.structure.align.fatcat.FatCatRigid]
+(http://www.biojava.org/docs/api/org/biojava/nbio/structure/align/fatcat/FatCatRigid.html)

 ### FATCAT - flexible

-FATCAT-flexible introduces 'twists' between different parts of the proteins
+FATCAT-flexible introduces 'twists' between different parts of the structures
 which are superimposed independently. This is ideal for proteins which undergo
 large conformational shifts, where a global superposition cannot capture the
 underlying similarity between domains. For instance, the structures of
@@ -124,21 +137,23 @@ this is that it can lead to additional false positives in unrelated structures.
 ![(Left) Rigid and (Right) flexible alignments of
 calmodulin](img/1cfd_1cll_fatcat.png)

-BioJava class: [org.biojava.bio.structure.align.fatcat.FatCatFlexible](www.biojava.org/docs/api/org/biojava/nbio/structure/align/fatcat/FatCatFlexible.html)
+BioJava class: [org.biojava.nbio.structure.align.fatcat.FatCatFlexible]
+(http://www.biojava.org/docs/api/org/biojava/nbio/structure/align/fatcat/FatCatFlexible.html)

 ### Smith-Waterman

 This aligns residues based on Smith and Waterman's 1981 algorithm for local
 *sequence* alignment [![pubmed](http://img.shields.io/badge/in-pubmed-blue.svg?style=flat)](http://www.ncbi.nlm.nih.gov/pubmed/7265238). No structural information is included in the alignment, so
-this only works for proteins with significant sequence similarity. It uses the
+this only works for structures with significant sequence similarity. It uses the
 Blosum65 scoring matrix.

 The two structures are superimposed based on this alignment. Be aware that errors
 locating gaps can lead to high RMSD in the resulting superposition due to a
 small number of badly aligned residues. However, this method is faster than
 the structure-based methods.

-BioJava Class: [org.biojava.bio.structure.align.ce.CeCPMain](http://www.biojava.org/docs/api/org/biojava/nbio/structure/align/ce/CeCPMain.html)
+BioJava Class: [org.biojava.nbio.structure.align.seq.SmithWaterman3Dalign]
+(http://www.biojava.org/docs/api/org/biojava/nbio/structure/align/seq/SmithWaterman3Dalign.html)

 ### Other methods

@@ -253,7 +268,7 @@ file in various formats.

 ## See Also

-For details about the structure alignment data models in biojava, see [Structure Alignment Data Models](alignment-data-model.md)
+For details about the structure alignment data models in BioJava, see [Structure Alignment Data Model](alignment-data-model.md)

 ## Acknowledgements

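The Smith-Waterman section of this diff superimposes structures from a local *sequence* alignment. The sketch below shows only the core Smith-Waterman scoring recurrence. It assumes a flat match/mismatch score in place of the Blosum65 matrix the text mentions, plus a linear gap penalty. `SmithWatermanSketch` and `localScore` are hypothetical names, and this is not BioJava's implementation.

```java
/**
 * Hypothetical illustration (not BioJava code) of the Smith-Waterman local
 * alignment score. Simplifications, ASSUMED for brevity: a flat match/mismatch
 * score stands in for a substitution matrix such as BLOSUM, gaps use a linear
 * penalty, and only the best score is returned (no traceback).
 */
final class SmithWatermanSketch {
    private SmithWatermanSketch() {}

    /** mismatch and gap are expected to be negative, e.g. -1 and -2. */
    static int localScore(String s, String t, int match, int mismatch, int gap) {
        // h[i][j] = best score of a local alignment ending at s[i-1], t[j-1];
        // row 0 and column 0 stay 0 (Java zero-initialises int arrays).
        int[][] h = new int[s.length() + 1][t.length() + 1];
        int best = 0;
        for (int i = 1; i <= s.length(); i++) {
            for (int j = 1; j <= t.length(); j++) {
                int diag = h[i - 1][j - 1] + (s.charAt(i - 1) == t.charAt(j - 1) ? match : mismatch);
                int up = h[i - 1][j] + gap;    // gap in t
                int left = h[i][j - 1] + gap;  // gap in s
                // Clamping at zero lets an alignment restart anywhere: this is what makes it local
                h[i][j] = Math.max(0, Math.max(diag, Math.max(up, left)));
                best = Math.max(best, h[i][j]);
            }
        }
        return best;
    }
}
```

For example, `localScore("XXACGTYY", "ACGT", 2, -1, -2)` scores the embedded `ACGT` match and returns 8. A full aligner would also keep traceback pointers to recover which residues are paired, because the superposition step needs those residue pairs, not just the score.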