<p>This page is a work in progress. It describes the key areas in which
you might want to work with the new BioJava3 code, is structured as a set
of use cases, and is not a comprehensive resource. Sections will be added
and updated as new modules are added and existing ones are developed in
more detail.</p>
<h1 id="symbols-and-alphabets">Symbols and Alphabets</h1>
<h2 id="a-dna-sequence">A DNA sequence</h2>
<p>All the examples in this section require the biojava-dna module.</p>
<h3 id="construction-and-basic-manipulation">Construction and basic manipulation</h3>
<pre><code class="highlighter-rouge">import java.util.Iterator;
import java.util.List;
// Plus the BioJava3 classes from the biojava-dna module.

String mySeqString = "ATCGatcgATCG"; // Note that you can use mixed-case strings.
List&lt;Symbol&gt; mySeq = SymbolListFormatter.parseSymbolList(mySeqString);

// Is it a big list? Don't want to hold it all in memory? Use an iterator instead.
for (Iterator&lt;Symbol&gt; myIterator = SymbolListFormatter.parseSymbols(mySeqString);
     myIterator.hasNext(); ) {
    Symbol sym = myIterator.next();
}

// You can now use any List method from Java Collections to manipulate the list of bases.

// The List returned is actually a SymbolList. Cast it to get some bio-specific
// methods that work with 1-indexed positions instead of Java's default 0-indexed positions.
SymbolList symList = (SymbolList) mySeq;
Symbol symA = symList.get(0);     // The first symbol, List-style.
Symbol symB = symList.get_bio(1); // The first symbol, bio-style.
if (symA == symB) { // Symbols are singletons, so == works if they are identical, including case.
    System.out.println("Identical!");
}

// Instead of using equals() or == to compare symbols, use the alphabet of your choice to
// compare them in several ways. The result differs depending on whether one is a gap and
// the other isn't, whether they match exactly, whether they are the same symbol in a
// different case, and so on.
Alphabet dna = DNATools.DNA_ALPHABET;
SymbolMatchType matchType = dna.getSymbolMatchType(Symbol.get("A"), Symbol.get("a"));
</code></pre>
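<p>The difference between 0-indexed List positions and 1-indexed bio positions, and the idea of comparing two bases in more than one way, can be illustrated without BioJava at all. The following is a minimal stand-alone sketch using only <code class="highlighter-rouge">java.util</code>, with <code class="highlighter-rouge">Character</code> standing in for <code class="highlighter-rouge">Symbol</code>. The class <code class="highlighter-rouge">BioIndexDemo</code>, its methods and its <code class="highlighter-rouge">Match</code> enum are hypothetical and are not part of the BioJava3 API.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; not the BioJava3 API.
class BioIndexDemo {
    // Parse a mixed-case DNA string into a list of single-character bases.
    static List<Character> parse(String seq) {
        List<Character> bases = new ArrayList<>();
        for (char c : seq.toCharArray()) {
            bases.add(c);
        }
        return bases;
    }

    // 1-indexed ("bio-style") access: position 1 is the first element.
    static <T> T getBio(List<T> list, int position) {
        return list.get(position - 1);
    }

    // A loose analogue of a symbol match type. Gaps are not modelled here.
    enum Match { EXACT, CASE_INSENSITIVE, MISMATCH }

    static Match compare(char a, char b) {
        if (a == b) {
            return Match.EXACT;
        }
        if (Character.toUpperCase(a) == Character.toUpperCase(b)) {
            return Match.CASE_INSENSITIVE;
        }
        return Match.MISMATCH;
    }

    public static void main(String[] args) {
        List<Character> seq = parse("ATCGatcgATCG");
        // List-style index 0 and bio-style position 1 refer to the same base.
        System.out.println(seq.get(0) + " " + getBio(seq, 1)); // A A
        System.out.println(compare('A', 'a'));                // CASE_INSENSITIVE
    }
}
```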
<h3 id="reversing-and-complementing-dna">Reversing and Complementing DNA</h3>
<pre><code class="highlighter-rouge">import java.util.Collections;
import java.util.List;
// Plus the BioJava3 classes from the biojava-dna module.

// All methods in this section modify the list in place.
List&lt;Symbol&gt; mySeq = SymbolListFormatter.parseSymbolList("ATCG");

// Reverse, using either of these two equivalent methods.
// (Calling both, as written here, reverses the list twice and restores the original order.)
// Method A: Java Collections.
Collections.reverse(mySeq);
// Method B: DNATools.
DNATools.reverse(mySeq);

// Complement.
DNATools.complement(mySeq);

// Reverse-complement.
DNATools.reverseComplement(mySeq);

// Reverse only the third and fourth bases, 0-indexed List style.
Collections.reverse(mySeq.subList(2, 4)); // Java Collections API.

// Do the same, 1-indexed bio style.
Collections.reverse(((SymbolList) mySeq).subList_bio(3, 5));
</code></pre>
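<p>For readers who want to see what an in-place reverse-complement involves, here is a minimal stand-alone sketch that works on a plain <code class="highlighter-rouge">List&lt;Character&gt;</code> instead of BioJava3 Symbols. It assumes only the four unambiguous bases need complementing (anything else, such as N, is left unchanged); the class <code class="highlighter-rouge">ReverseComplementDemo</code> and its methods are hypothetical and do not reflect how <code class="highlighter-rouge">DNATools</code> is implemented.</p>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-alone sketch; not the BioJava3 DNATools implementation.
class ReverseComplementDemo {
    // Complement one base, preserving case; other characters are returned unchanged.
    static char complement(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            case 'a': return 't';
            case 't': return 'a';
            case 'c': return 'g';
            case 'g': return 'c';
            default:  return base;
        }
    }

    // Complement every base in place.
    static void complementInPlace(List<Character> seq) {
        for (int i = 0; i < seq.size(); i++) {
            seq.set(i, complement(seq.get(i)));
        }
    }

    // Reverse-complement in place: reverse the order, then complement each base.
    static void reverseComplementInPlace(List<Character> seq) {
        Collections.reverse(seq);
        complementInPlace(seq);
    }

    static List<Character> parse(String s) {
        List<Character> seq = new ArrayList<>();
        for (char c : s.toCharArray()) {
            seq.add(c);
        }
        return seq;
    }

    static String format(List<Character> seq) {
        StringBuilder sb = new StringBuilder();
        for (char c : seq) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Character> seq = parse("AACG");
        reverseComplementInPlace(seq);
        System.out.println(format(seq)); // CGTT

        // Reverse only the third and fourth bases: 0-indexed half-open range [2, 4).
        List<Character> other = parse("ATCG");
        Collections.reverse(other.subList(2, 4));
        System.out.println(format(other)); // ATGC
    }
}
```

<p>Because <code class="highlighter-rouge">subList</code> returns a view, reversing the sublist modifies the original list, which is why the same idiom works for partial reversals.</p>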
<h3 id="editing-the-sequence">Editing the sequence</h3>
<pre><code class="highlighter-rouge">// Delete the second and third bases.
List&lt;Symbol&gt; mySeq = SymbolListFormatter.parseSymbolList("ATCG");
mySeq.subList(1, 3).clear();

// Remove only the second base, bio-style.
((SymbolList) mySeq).remove_bio(2);

// Parse another sequence and insert it after the first base.
List&lt;Symbol&gt; otherSeq = SymbolListFormatter.parseSymbolList("GGGG");
mySeq.addAll(1, otherSeq);
</code></pre>
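<p>The same sequence of edits can be traced on a plain <code class="highlighter-rouge">List&lt;Character&gt;</code> to check what each step leaves behind. This is a minimal stand-alone sketch; <code class="highlighter-rouge">EditDemo</code> and its <code class="highlighter-rouge">removeBio</code> helper are hypothetical stand-ins for the BioJava3 calls shown above.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; not the BioJava3 API.
class EditDemo {
    static List<Character> parse(String s) {
        List<Character> seq = new ArrayList<>();
        for (char c : s.toCharArray()) {
            seq.add(c);
        }
        return seq;
    }

    static String format(List<Character> seq) {
        StringBuilder sb = new StringBuilder();
        for (char c : seq) {
            sb.append(c);
        }
        return sb.toString();
    }

    // Remove the base at a 1-indexed position (a stand-in for remove_bio).
    static Character removeBio(List<Character> seq, int position) {
        return seq.remove(position - 1); // int argument, so remove(int index) is called
    }

    static String editExample() {
        List<Character> seq = parse("ATCG");
        seq.subList(1, 3).clear();     // Delete the 2nd and 3rd bases: "AG".
        removeBio(seq, 2);             // Remove the 2nd base, 1-indexed: "A".
        seq.addAll(1, parse("GGGG"));  // Insert after the 1st base: "AGGGG".
        return format(seq);
    }

    public static void main(String[] args) {
        System.out.println(editExample()); // AGGGG
    }
}
```

<p>One pitfall to be aware of with lists of characters: <code class="highlighter-rouge">seq.remove('A')</code> does not remove the base A. The <code class="highlighter-rouge">char</code> is widened to <code class="highlighter-rouge">int</code>, so <code class="highlighter-rouge">remove(int index)</code> is chosen and the call tries to remove index 65. Use <code class="highlighter-rouge">seq.remove(Character.valueOf('A'))</code> to remove by value.</p>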
<h2 id="a-quality-scored-dna-sequence">A quality-scored DNA sequence</h2>
<h3 id="constructing-a-quality-scored-dna-sequence">Constructing a quality-scored DNA sequence</h3>
<pre><code class="highlighter-rouge">// Construct a default unscored DNA sequence with capacity for integer scores.
List&lt;Symbol&gt; mySeq = SymbolListFormatter.parseSymbolList("ATCG");
TaggedSymbolList&lt;Integer&gt; scoredSeq = new TaggedSymbolList&lt;Integer&gt;(mySeq);

// Tag all the bases with the same score of 5.
scoredSeq.setTagRange(0, scoredSeq.length(), 5);

// Tag just the third base (index 2, 0-indexed) with a score of 3.
scoredSeq.setTag(2, 3);

// Do the same, 1-indexed.
scoredSeq.setTag_bio(3, 3);

// Get the score of the fourth base, 1-indexed.
Integer tag = scoredSeq.getTag_bio(4);
</code></pre>
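<p>One simple way to picture a quality-scored sequence is as two parallel lists: one of bases and one of scores, kept the same length. The sketch below assumes that design purely for illustration; the class <code class="highlighter-rouge">ScoredSequence</code> and its <code class="highlighter-rouge">setTagBio</code>/<code class="highlighter-rouge">getTagBio</code> methods are hypothetical, and <code class="highlighter-rouge">setTagRange</code> here is assumed to use a 0-indexed, end-exclusive range, which may differ from the real <code class="highlighter-rouge">TaggedSymbolList</code>.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one Integer score per base, stored in two parallel lists.
// Method names echo the example above, but this is not the BioJava3 TaggedSymbolList.
class ScoredSequence {
    private final List<Character> bases = new ArrayList<>();
    private final List<Integer> tags = new ArrayList<>();

    ScoredSequence(String seq) {
        for (char c : seq.toCharArray()) {
            bases.add(c);
            tags.add(null); // Start unscored: one null tag per base.
        }
    }

    int length() {
        return bases.size();
    }

    // Set the same tag on positions [from, to): 0-indexed, end exclusive (an assumption).
    void setTagRange(int from, int to, Integer tag) {
        for (int i = from; i < to; i++) {
            tags.set(i, tag);
        }
    }

    void setTag(int index, Integer tag) { tags.set(index, tag); }         // 0-indexed
    void setTagBio(int position, Integer tag) { tags.set(position - 1, tag); } // 1-indexed
    Integer getTag(int index) { return tags.get(index); }
    Integer getTagBio(int position) { return tags.get(position - 1); }

    public static void main(String[] args) {
        ScoredSequence scored = new ScoredSequence("ATCG");
        scored.setTagRange(0, scored.length(), 5); // Every base scored 5.
        scored.setTag(2, 3);                       // Third base (index 2) scored 3.
        System.out.println(scored.getTagBio(3) + " " + scored.getTagBio(4)); // 3 5
    }
}
```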
<h3 id="iterating-over-the-basescore-pairs">Iterating over the base/score pairs</h3>
<pre><code class="highlighter-rouge">// A 1-indexed iterator and ListIterators are also available.
for (Iterator&lt;TaggedSymbol&lt;Integer&gt;&gt; iter = scoredSeq.taggedSymbolIterator();
     iter.hasNext(); ) {
    TaggedSymbol&lt;Integer&gt; taggedSym = iter.next();
    Symbol sym = taggedSym.getSymbol();
    Integer score = taggedSym.getTag();
    // Change the score while we're at it.
    taggedSym.setTag(6); // Updates the score to 6 in the original list of tags.
}
</code></pre>
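<p>The comment above says that calling <code class="highlighter-rouge">setTag</code> on an iterated pair updates the original scores. One way such write-through behaviour can work is for each pair to be a lightweight view holding only its index into the backing lists. The following stand-alone sketch assumes that design; <code class="highlighter-rouge">TaggedPairsDemo</code>, <code class="highlighter-rouge">Pair</code> and <code class="highlighter-rouge">pairIterator</code> are hypothetical names, and the real <code class="highlighter-rouge">TaggedSymbol</code> may be implemented differently.</p>

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch of a write-through (base, score) view; not the BioJava3 API.
class TaggedPairsDemo {
    final List<Character> bases = new ArrayList<>();
    final List<Integer> tags = new ArrayList<>();

    TaggedPairsDemo(String seq, int initialScore) {
        for (char c : seq.toCharArray()) {
            bases.add(c);
            tags.add(initialScore);
        }
    }

    // A view of one position: it stores only an index, so setTag() writes to the backing list.
    class Pair {
        private final int index;
        Pair(int index) { this.index = index; }
        char getSymbol() { return bases.get(index); }
        Integer getTag() { return tags.get(index); }
        void setTag(Integer tag) { tags.set(index, tag); }
    }

    Iterator<Pair> pairIterator() {
        return new Iterator<Pair>() {
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < bases.size();
            }

            @Override
            public Pair next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return new Pair(next++);
            }
        };
    }

    public static void main(String[] args) {
        TaggedPairsDemo seq = new TaggedPairsDemo("ATCG", 5);
        for (Iterator<Pair> iter = seq.pairIterator(); iter.hasNext(); ) {
            Pair pair = iter.next();
            pair.setTag(6); // Writes through to seq.tags.
        }
        System.out.println(seq.tags); // [6, 6, 6, 6]
    }
}
```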
<h3 id="iterating-over-the-bases-only">Iterating over the bases only</h3>
<pre><code class="highlighter-rouge">// Use the default iterator.
// A ListIterator is also available, as are 1-indexed iterators.
Iterator&lt;Symbol&gt; iter = scoredSeq.iterator();
</code></pre>
<h3 id="iterating-over-the-scores-only">Iterating over the scores only</h3>
<pre><code class="highlighter-rouge">// A ListIterator is also available, as are 1-indexed iterators.
for (Iterator&lt;Integer&gt; iter = scoredSeq.tagIterator(); iter.hasNext(); ) {
    Integer score = iter.next();
}
</code></pre>
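<p>The ListIterators mentioned above add <code class="highlighter-rouge">nextIndex()</code> and backward traversal, which is enough to report 1-indexed positions while iterating. This is a stand-alone sketch over a plain <code class="highlighter-rouge">List&lt;Integer&gt;</code> of scores using only standard <code class="highlighter-rouge">java.util.ListIterator</code> methods; the class <code class="highlighter-rouge">ScoreIterationDemo</code> and its methods are hypothetical.</p>

```java
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

// Hypothetical stand-alone sketch; uses only java.util.
class ScoreIterationDemo {
    // Returns the 1-indexed position of the lowest score, or -1 for an empty list.
    static int lowestScorePosition(List<Integer> scores) {
        int bestPosition = -1;
        int bestScore = Integer.MAX_VALUE;
        for (ListIterator<Integer> iter = scores.listIterator(); iter.hasNext(); ) {
            int position = iter.nextIndex() + 1; // 1-indexed position of the element next() returns
            int score = iter.next();
            if (score < bestScore) {
                bestScore = score;
                bestPosition = position;
            }
        }
        return bestPosition;
    }

    // Lists the scores from last to first using hasPrevious()/previous().
    static String reversed(List<Integer> scores) {
        StringBuilder sb = new StringBuilder();
        for (ListIterator<Integer> iter = scores.listIterator(scores.size()); iter.hasPrevious(); ) {
            sb.append(iter.previous());
            if (iter.hasPrevious()) {
                sb.append(',');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Integer> scores = Arrays.asList(5, 5, 3, 6);
        System.out.println(lowestScorePosition(scores)); // 3
        System.out.println(reversed(scores));            // 6,3,5,5
    }
}
```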
<h1 id="file-parsing-and-converting">File parsing and converting</h1>
<h2 id="fasta">FASTA</h2>
<p>The examples in this section require the biojava-fasta module. The
examples that deal with converting to/from DNA sequences also require
the biojava-dna module.</p>
<p>Convenience wrapper classes make parsing simpler for the most common
use cases.</p>
<h3 id="parsing-a-fasta-file-the-easy-way">Parsing a FASTA file (the easy way)</h3>
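<pre><code class="highlighter-rouge">import java.io.File;
// Plus the BioJava3 classes from the biojava-fasta module.

// Declare the parser before the loop so that it is still in scope when close() is called.
ThingParser&lt;FASTA&gt; parser = ThingParserFactory.
    getReadParser(FASTA.format, new File("/path/to/my/fasta.fa"));
while (parser.hasNext()) {
    FASTA fasta = parser.next();
    // fasta contains a complete FASTA record.
}
parser.close();
</code></pre>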
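<p>To make clear what a FASTA read parser has to do, here is a minimal stand-alone sketch using only <code class="highlighter-rouge">java.io</code>: a line starting with <code class="highlighter-rouge">&gt;</code> begins a new record, and the following lines are concatenated into its sequence. The classes <code class="highlighter-rouge">SimpleFastaReader</code> and <code class="highlighter-rouge">FastaRecord</code> are hypothetical and are not the BioJava3 <code class="highlighter-rouge">FASTA</code> or <code class="highlighter-rouge">ThingParser</code> classes; blank lines and any lines before the first header are assumed safe to ignore.</p>

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; not the BioJava3 FASTA parser.
class SimpleFastaReader {
    static final class FastaRecord {
        final String description;
        final String sequence;

        FastaRecord(String description, String sequence) {
            this.description = description;
            this.sequence = sequence;
        }
    }

    static List<FastaRecord> readAll(Reader in) {
        List<FastaRecord> records = new ArrayList<>();
        // try-with-resources closes the reader even if parsing fails.
        try (BufferedReader reader = new BufferedReader(in)) {
            String description = null;
            StringBuilder sequence = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue; // Skip blank lines.
                }
                if (line.startsWith(">")) {
                    if (description != null) {
                        records.add(new FastaRecord(description, sequence.toString()));
                    }
                    description = line.substring(1).trim();
                    sequence.setLength(0); // Also drops any lines seen before the first header.
                } else {
                    sequence.append(line);
                }
            }
            if (description != null) {
                records.add(new FastaRecord(description, sequence.toString()));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        String text = ">seq1 first\nACGT\nTT\n>seq2\nGG\n";
        for (FastaRecord r : readAll(new StringReader(text))) {
            System.out.println(r.description + ": " + r.sequence);
        }
    }
}
```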
<h3 id="parsing-a-fasta-file-the-hard-way">Parsing a FASTA file (the hard way)</h3>
<pre><code class="highlighter-rouge">import java.io.File;
// Plus the BioJava3 classes from the biojava-fasta module.

FASTAReader reader = new FASTAFileReader(new File("/path/to/my/fasta.fa"));
FASTABuilder builder = new FASTABuilder();
// Declare the parser before the loop so that it is still in scope when close() is called.
ThingParser&lt;FASTA&gt; parser = new ThingParser&lt;FASTA&gt;(reader, builder);
while (parser.hasNext()) {
    FASTA fasta = parser.next();
    // fasta contains a complete FASTA record.
}
parser.close();
</code></pre>
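<p>The hard way separates a reader, which understands FASTA syntax, from a builder, which decides what objects to create. The stand-alone sketch below illustrates that general split with an event interface; it is an assumption about the design pattern, not a description of how <code class="highlighter-rouge">FASTAReader</code> and <code class="highlighter-rouge">FASTABuilder</code> actually communicate. <code class="highlighter-rouge">ReaderBuilderDemo</code>, <code class="highlighter-rouge">FastaEvents</code> and <code class="highlighter-rouge">RecordBuilder</code> are hypothetical names.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a reader/builder split; not BioJava3 types.
class ReaderBuilderDemo {
    // Events the reader reports; a builder implements this interface.
    interface FastaEvents {
        void description(String text);
        void sequenceLine(String text);
        void end();
    }

    // Reader side: recognises FASTA syntax and reports events, building nothing itself.
    static void read(List<String> lines, FastaEvents events) {
        for (String line : lines) {
            if (line.startsWith(">")) {
                events.description(line.substring(1));
            } else if (!line.isEmpty()) {
                events.sequenceLine(line);
            }
        }
        events.end();
    }

    // Builder side: assembles "description|sequence" strings from the events.
    static final class RecordBuilder implements FastaEvents {
        final List<String> records = new ArrayList<>();
        private String description;
        private final StringBuilder seq = new StringBuilder();

        public void description(String text) { flush(); description = text; }
        public void sequenceLine(String text) { seq.append(text); }
        public void end() { flush(); }

        // Emits the record in progress, if any, and resets the buffer.
        private void flush() {
            if (description != null) {
                records.add(description + "|" + seq);
            }
            description = null;
            seq.setLength(0);
        }
    }

    public static void main(String[] args) {
        RecordBuilder builder = new RecordBuilder();
        read(Arrays.asList(">a", "AC", "GT", ">b", "TT"), builder);
        System.out.println(builder.records); // [a|ACGT, b|TT]
    }
}
```

<p>With this split, a different builder (for example one that counts bases without storing them) can reuse the same reader unchanged.</p>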
<h3 id="converting-the-fasta-sequence-into-dna-sequence">Converting the FASTA sequence into DNA sequence</h3>
<pre><code class="highlighter-rouge">List&lt;Symbol&gt; mySeq = SymbolListFormatter.parseSymbolList(fasta.getSequence());
</code></pre>
<h3 id="converting-a-dna-sequence-back-into-fasta">Converting a DNA sequence back into FASTA</h3>
<pre><code class="highlighter-rouge">FASTA fasta = new FASTA();
fasta.setDescription("My Description Line");
fasta.setSequence(SymbolListFormatter.formatSymbols(mySeq));
</code></pre>
<h3 id="writing-a-fasta-file-the-easy-way">Writing a FASTA file (the easy way)</h3>
<pre><code class="highlighter-rouge">import java.io.File;
// Plus the BioJava3 classes from the biojava-fasta module.

ThingParser&lt;FASTA&gt; parser = ThingParserFactory.
    getWriteParser(FASTA.format, new File("/path/to/my/fasta.fa"), fasta);
parser.parseAll();
parser.close();
</code></pre>
<h3 id="writing-a-fasta-file-the-hard-way">Writing a FASTA file (the hard way)</h3>
<pre><code class="highlighter-rouge">import java.io.File;
// Plus the BioJava3 classes from the biojava-fasta module.

FASTAEmitter emitter = new FASTAEmitter(fasta);
FASTAWriter writer = new FASTAFileWriter(new File("/path/to/new/fasta.fa"));
ThingParser&lt;FASTA&gt; parser = new ThingParser&lt;FASTA&gt;(emitter, writer);
parser.parseAll();
parser.close();
</code></pre>
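<p>Both writing examples end with an explicit <code class="highlighter-rouge">close()</code>. With plain <code class="highlighter-rouge">java.io</code> writers, try-with-resources guarantees the close even when writing fails. The stand-alone sketch below writes several records to any <code class="highlighter-rouge">Writer</code>; <code class="highlighter-rouge">FastaWriteDemo</code> is a hypothetical name, sequences are written unwrapped, and keying records by description is a simplification that assumes descriptions are unique.</p>

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-alone sketch; not the BioJava3 FASTA writer.
class FastaWriteDemo {
    // Writes each (description -> sequence) entry as one unwrapped FASTA record.
    static void writeAll(Map<String, String> records, Writer out) {
        try {
            for (Map.Entry<String, String> e : records.entrySet()) {
                out.write(">" + e.getKey() + "\n");
                out.write(e.getValue() + "\n");
            }
            out.flush();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        Map<String, String> records = new LinkedHashMap<>(); // Keeps insertion order.
        records.put("seq1", "ATCG");
        records.put("seq2", "GGGG");
        // try-with-resources closes the writer even if writing fails;
        // a FileWriter could be used the same way for a real file.
        try (StringWriter out = new StringWriter()) {
            writeAll(records, out);
            System.out.print(out);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```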