Commit 3a70845

dicknetherlands authored and andreasprlic committed
New page: ==Executive Summary== It is suggested that development stop on the existing BioJava/BioJavaX/BioJava2 aggregation and start afresh as BioJava3. ==Reasoning== * The existing code is di...
1 parent 2579784 commit 3a70845

2 files changed: 196 additions & 0 deletions

_wikis/BioJava3_Proposal.md

Lines changed: 141 additions & 0 deletions
@@ -0,0 +1,141 @@
---
title: BioJava3 Proposal
---

Executive Summary
-----------------

It is suggested that development stop on the existing
BioJava/BioJavaX/BioJava2 aggregation and start afresh as BioJava3.

Reasoning
---------

- The existing code is disorganised, poorly commented, and hard to
  maintain because it mixes numerous different coding styles.
- Existing documentation is poor, and writing more would be difficult
  given the lack of code comments.
- Unit testing is limited and hard to retrofit onto the existing code.
- The build scripts are out of date and the release process is
  cumbersome.
- There is demand for a number of smaller jars rather than one
  monolithic one.
- We do not use any Java language features introduced after Java 1.4;
  generics are the most obvious omission.
- There is no support for file formats that change over time: a given
  format is supported in one version or another, but not both.
- The only database support is for BioSQL, which uses Hibernate but
  not in a fully flexible manner (i.e. it cannot connect to more than
  one database at a time).
- It is sequence-focused, whereas users' needs now extend beyond
  sequences.

Proposal
--------

- Start from scratch, creating a number of smaller jars as
  sub-projects within an umbrella BioJava3 project. Each jar would
  provide tools for a specific purpose. Additional jars would provide
  cross-purpose tools such as format converters or text-to-object
  interfaces.

<!-- -->

- Although we would start from scratch, much existing code could be
  reused or refactored to suit the new design.

<!-- -->

- We would take full advantage of Java 6, including generics,
  annotations, and the built-in property change support. Everything
  would be a bean: absolutely everything.

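
A minimal sketch of what "everything a bean" could mean in practice: a no-argument constructor, getter/setter pairs, generic collections instead of raw types, and bound properties that notify listeners through the JDK's `java.beans.PropertyChangeSupport`. The class `SequenceRecord` and its `accession` and `keywords` properties are hypothetical names chosen for illustration; this is not an existing BioJava API.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;
import java.util.ArrayList;
import java.util.List;

// Hypothetical bean illustrating the proposed style; not an existing BioJava class.
class SequenceRecord {
    // JDK helper that manages listeners and fires PropertyChangeEvents.
    private final PropertyChangeSupport changes = new PropertyChangeSupport(this);

    private String accession;
    // Generic collection instead of a raw List.
    private List<String> keywords = new ArrayList<String>();

    public SequenceRecord() { }

    public String getAccession() { return accession; }

    public void setAccession(String accession) {
        String old = this.accession;
        this.accession = accession;
        // No event is fired when old and new values are equal and non-null.
        changes.firePropertyChange("accession", old, accession);
    }

    public List<String> getKeywords() { return keywords; }

    public void setKeywords(List<String> keywords) {
        List<String> old = this.keywords;
        this.keywords = keywords;
        changes.firePropertyChange("keywords", old, keywords);
    }

    public void addPropertyChangeListener(PropertyChangeListener listener) {
        changes.addPropertyChangeListener(listener);
    }

    public void removePropertyChangeListener(PropertyChangeListener listener) {
        changes.removePropertyChangeListener(listener);
    }
}
```

Because the class follows the JavaBeans conventions, bean-aware frameworks such as a Spring container can create it and set its properties by name, and any number of listeners can observe changes without BioJava-specific glue code.
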
<!-- -->

- We would aim to be fully J2EE compliant, with most components
  reusable as beans in any other application, as Spring's components
  are.

<!-- -->

- We would write a JUnit test for every single class, writing the test
  first and the class afterwards (test-driven development). We would
  also write documentation for every single class, with additional
  full documentation for each separate jar.

<!-- -->

- We would adhere rigidly to a common coding style and comment the
  code heavily.

<!-- -->

- BioJava3 should be able to focus on whatever aspect the user
  requires while remaining efficient, removing the assumption that
  everything is sequence-related.

<!-- -->

- SymbolLists and Alphabets should be rethought, as they are the most
  common stumbling block.

Data structure
--------------

- RecordSource is an object which provides data. It can represent a
  file, a directory of files, a database, a web search engine, etc.
  It has a RecordFormat which reads/writes Records to/from the
  RecordSource. It provides an iterator over Records which match a
  given RecordSearch.

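
To make the relationships concrete, here is a self-contained sketch of how `RecordSource`, `RecordFormat` and `RecordSearch` might fit together as Java interfaces. The method names (`read`, `write`, `matches`, `search`), the `InMemoryRecordSource` class and the tab-separated toy format are assumptions made for illustration; the proposal does not define any signatures.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical interfaces sketching the proposed data structure.
interface Record {
    String getKey();
}

// Reads/writes one kind of Record from/to its raw text form.
interface RecordFormat<R extends Record> {
    R read(String raw);
    String write(R rec);
}

// Search criteria, reduced here to a simple predicate.
interface RecordSearch {
    boolean matches(Record rec);
}

// Provides Records from some backing store, via its RecordFormat.
interface RecordSource<R extends Record> {
    RecordFormat<R> getFormat();
    Iterator<R> search(RecordSearch criteria);
}

// In-memory stand-in for a file, directory, database or web service.
class InMemoryRecordSource<R extends Record> implements RecordSource<R> {
    private final List<String> rawEntries;
    private final RecordFormat<R> format;

    InMemoryRecordSource(List<String> rawEntries, RecordFormat<R> format) {
        this.rawEntries = rawEntries;
        this.format = format;
    }

    public RecordFormat<R> getFormat() { return format; }

    public Iterator<R> search(RecordSearch criteria) {
        // Parses and filters eagerly for simplicity; a real source would stream.
        List<R> hits = new ArrayList<R>();
        for (String raw : rawEntries) {
            R rec = format.read(raw);
            if (criteria.matches(rec)) hits.add(rec);
        }
        return hits.iterator();
    }
}

// Toy record and format: one "key<TAB>description" line per entry.
class SimpleRecord implements Record {
    final String key;
    final String description;

    SimpleRecord(String key, String description) {
        this.key = key;
        this.description = description;
    }

    public String getKey() { return key; }
}

class TabSeparatedFormat implements RecordFormat<SimpleRecord> {
    public SimpleRecord read(String raw) {
        String[] parts = raw.split("\t", 2);
        return new SimpleRecord(parts[0], parts.length > 1 ? parts[1] : "");
    }

    public String write(SimpleRecord rec) {
        return rec.key + "\t" + rec.description;
    }
}
```

In this arrangement a new backing store means a new `RecordSource` implementation and a new file format means a new `RecordFormat`; neither change touches the other.
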
<!-- -->

- A RecordFormat is specific to one version of a format, as are the
  Record objects it produces.

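
Because each RecordFormat (and the Record class it produces) would be tied to one version of a file format, several versions could be registered side by side, which would address the "one version or another, but not both" problem listed under Reasoning. The self-contained sketch below assumes a lookup keyed on format name plus version; `FormatRegistry`, `ToyFormatV1`, `ToyFormatV2` and the pipe-delimited "toy" format itself are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical: each Record class belongs to one version of one format.
interface Record {
    String getId();
}

// Version 1 of an invented "toy" format: "id|description".
final class ToyRecordV1 implements Record {
    final String id, description;
    ToyRecordV1(String id, String description) { this.id = id; this.description = description; }
    public String getId() { return id; }
}

// Version 2 adds an organism field: "id|description|organism".
final class ToyRecordV2 implements Record {
    final String id, description, organism;
    ToyRecordV2(String id, String description, String organism) {
        this.id = id; this.description = description; this.organism = organism;
    }
    public String getId() { return id; }
}

// A RecordFormat declares exactly which format name and version it handles.
interface RecordFormat {
    String getName();
    String getVersion();
    Record read(String raw);
}

// Note: the toy parsers do no validation of malformed input.
class ToyFormatV1 implements RecordFormat {
    public String getName() { return "toy"; }
    public String getVersion() { return "1"; }
    public Record read(String raw) {
        String[] f = raw.split("\\|", -1);
        return new ToyRecordV1(f[0], f[1]);
    }
}

class ToyFormatV2 implements RecordFormat {
    public String getName() { return "toy"; }
    public String getVersion() { return "2"; }
    public Record read(String raw) {
        String[] f = raw.split("\\|", -1);
        return new ToyRecordV2(f[0], f[1], f[2]);
    }
}

// Looks up the implementation for an exact (name, version) pair, so old
// and new versions of a format can be read by the same application.
class FormatRegistry {
    private final Map<String, RecordFormat> formats = new HashMap<String, RecordFormat>();

    void register(RecordFormat format) {
        formats.put(format.getName() + "/" + format.getVersion(), format);
    }

    RecordFormat lookup(String name, String version) {
        RecordFormat format = formats.get(name + "/" + version);
        if (format == null) {
            throw new IllegalArgumentException("No format registered for " + name + " version " + version);
        }
        return format;
    }
}
```

Reading an old file and a new one in the same program then amounts to two lookups, each returning a Record class that mirrors its own version of the format.
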
<!-- -->

- RecordSearch defines search criteria to be applied to a RecordSource
  (or group thereof). It provides an iterator which returns all the
  combined Records from all RecordSources the RecordSearch was applied
  to. It uses RDF or something similar to map fields between different
  kinds of Records and the search parameters.

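
A self-contained sketch of how one RecordSearch might be applied to several RecordSources at once and return their combined matches. The proposal suggests RDF or similar for mapping fields between kinds of Record; here a plain `Map` from record kind to that kind's local field name stands in for such a mapping, and every class, method and field name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical record: a kind tag (e.g. "genbank") plus named string fields.
class Record {
    final String kind;
    final Map<String, String> fields;

    Record(String kind, Map<String, String> fields) {
        this.kind = kind;
        this.fields = fields;
    }
}

// Hypothetical source: anything that can enumerate its records.
interface RecordSource {
    Iterable<Record> records();
}

// Matches records whose value for a generic field equals the query value.
// fieldMap maps each record kind to its own name for that generic field;
// it is a simple stand-in for the RDF-style mapping in the proposal.
class RecordSearch {
    private final String genericField;
    private final String value;
    private final Map<String, String> fieldMap;

    RecordSearch(String genericField, String value, Map<String, String> fieldMap) {
        this.genericField = genericField;
        this.value = value;
        this.fieldMap = fieldMap;
    }

    boolean matches(Record rec) {
        // Fall back to the generic name when a kind has no mapping.
        String localName = fieldMap.containsKey(rec.kind) ? fieldMap.get(rec.kind) : genericField;
        return value.equals(rec.fields.get(localName));
    }

    // Applies the search to every source and returns the combined matches,
    // in source order. Results are collected eagerly for simplicity.
    Iterator<Record> apply(List<RecordSource> sources) {
        List<Record> hits = new ArrayList<Record>();
        for (RecordSource source : sources) {
            for (Record rec : source.records()) {
                if (matches(rec)) hits.add(rec);
            }
        }
        return hits.iterator();
    }
}
```

With a mapping such as `genbank -> ORGANISM` and `uniprot -> OS`, a single search on the generic field `organism` finds matching entries in both kinds of source without the caller knowing either format's field names.
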
<!-- -->

- Record is a piece of data in any format, as a bean. It should be as
  lightweight as possible; lazy loading of all non-key data would be
  ideal. Each kind of Record has an object structure matched to the
  RecordFormat that produced it; e.g. GenBank Record objects should be
  structured internally in almost exactly the same way as the GenBank
  file. This minimises loss of information and maximises flexibility.

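
Lazy loading of non-key data could look like the following sketch: the key is held eagerly, and the expensive field is fetched from the backing source only on first access, then cached. `LazyRecord` is a hypothetical name, and `java.util.function.Supplier` (Java 8, so newer than the Java 6 baseline above) stands in for whatever loader a real RecordSource would provide.

```java
import java.util.function.Supplier;

// Hypothetical lightweight record: the key (accession) is stored eagerly,
// while bulky non-key data is loaded on first access and then cached.
class LazyRecord {
    private final String accession;
    private final Supplier<String> sequenceLoader; // e.g. a file seek or database query
    private String sequence;
    private boolean loaded;

    LazyRecord(String accession, Supplier<String> sequenceLoader) {
        this.accession = accession;
        this.sequenceLoader = sequenceLoader;
    }

    public String getAccession() {
        return accession; // never triggers a load
    }

    // synchronized so that concurrent callers trigger at most one load.
    public synchronized String getSequence() {
        if (!loaded) {
            sequence = sequenceLoader.get();
            loaded = true;
        }
        return sequence;
    }
}
```

Filtering many such records by accession then touches only the keys; the loader runs only for records whose sequence is actually requested.
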
<!-- -->

- RecordConverters convert Record objects between different formats,
  e.g. a GenBank Record to a FASTA Record. They allow sensible defaults
  to be provided where one format does not supply enough information to
  satisfy the minimum requirements of another. Some kind of bean
  conversion system based on RDF would be suitable for this.

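
A sketch of a RecordConverter that fills in a sensible default where the source format lacks information. FASTA has only a free-text header line with no dedicated organism field, so converting to a GenBank-like record needs an organism supplied from elsewhere. `RecordConverter`, `FastaRecord`, `GenbankLikeRecord`, `FastaToGenbankLike` and the rule of taking the header's first word as the accession are illustrative assumptions, and a plain Java class is used here in place of the RDF-based bean conversion the proposal suggests.

```java
// Hypothetical generic converter between two Record types.
interface RecordConverter<S, T> {
    T convert(S source);
}

// Minimal FASTA-style record: a free-text header and the sequence.
final class FastaRecord {
    final String header;
    final String sequence;

    FastaRecord(String header, String sequence) {
        this.header = header;
        this.sequence = sequence;
    }
}

// Minimal GenBank-like record with a field FASTA does not provide.
final class GenbankLikeRecord {
    final String accession;
    final String organism;
    final String sequence;

    GenbankLikeRecord(String accession, String organism, String sequence) {
        this.accession = accession;
        this.organism = organism;
        this.sequence = sequence;
    }
}

// FASTA has no organism field, so the converter fills it from a configurable default.
final class FastaToGenbankLike implements RecordConverter<FastaRecord, GenbankLikeRecord> {
    private final String defaultOrganism;

    FastaToGenbankLike(String defaultOrganism) {
        this.defaultOrganism = defaultOrganism;
    }

    public GenbankLikeRecord convert(FastaRecord source) {
        String header = source.header.trim();
        // Assumption: the first whitespace-delimited token of the header is the accession.
        String accession = header.isEmpty() ? "UNKNOWN" : header.split("\\s+")[0];
        return new GenbankLikeRecord(accession, defaultOrganism, source.sequence);
    }
}
```

Making the default a constructor argument lets each application decide what "sensible" means, rather than hard-coding a value into the converter.
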
<!-- -->

- A set of tools for converting flat data (e.g. sequence strings,
  taxonomy strings) into BioJava-like objects (e.g. SymbolLists,
  NCBITaxon). These BioJava-like objects could then be used in more
  advanced applications.

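
As an illustration of the flat-data tools, the sketch below turns a DNA string into a type-checked, immutable list of symbols (a much-simplified stand-in for a SymbolList) and splits a semicolon-separated lineage string, in the style of GenBank ORGANISM lineages, into taxon names. The `Nucleotide` enum and `FlatDataParsers` class are hypothetical and do not reproduce BioJava's Alphabet or NCBITaxon APIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical DNA alphabet as an enum: invalid symbols are rejected at parse time.
enum Nucleotide {
    A, C, G, T;

    static Nucleotide fromChar(char c) {
        switch (Character.toUpperCase(c)) {
            case 'A': return A;
            case 'C': return C;
            case 'G': return G;
            case 'T': return T;
            default: throw new IllegalArgumentException("Not a DNA symbol: " + c);
        }
    }
}

final class FlatDataParsers {
    private FlatDataParsers() { }

    // "acgT" -> [A, C, G, T]; whitespace is skipped and case is ignored.
    static List<Nucleotide> parseDna(String seq) {
        List<Nucleotide> symbols = new ArrayList<Nucleotide>(seq.length());
        for (char c : seq.toCharArray()) {
            if (!Character.isWhitespace(c)) {
                symbols.add(Nucleotide.fromChar(c));
            }
        }
        return Collections.unmodifiableList(symbols);
    }

    // "Eukaryota; Metazoa; Chordata." -> [Eukaryota, Metazoa, Chordata]
    static List<String> parseLineage(String lineage) {
        String text = lineage.trim();
        if (text.endsWith(".")) {
            text = text.substring(0, text.length() - 1);
        }
        List<String> taxa = new ArrayList<String>();
        for (String part : text.split(";")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                taxa.add(name);
            }
        }
        return taxa;
    }
}
```

Using an enum for the alphabet means an invalid character is caught once, when the flat string is parsed, rather than wherever the symbols are later used.
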
<!-- -->

- A set of tools for manipulating the BioJava-like objects.

Action plan
-----------

1. Please modify this page as you see fit to flesh out details and/or
   make new points.
2. A tentative meeting in Singapore to get the ball rolling on the
   final design and the initial coding.

_wikis/BioJava3_Proposal.mediawiki

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
==Executive Summary==

It is suggested that development stop on the existing BioJava/BioJavaX/BioJava2 aggregation and start afresh as BioJava3.

==Reasoning==

* The existing code is disorganised, poorly commented, and hard to maintain because it mixes numerous different coding styles.
* Existing documentation is poor, and writing more would be difficult given the lack of code comments.
* Unit testing is limited and hard to retrofit onto the existing code.
* The build scripts are out of date and the release process is cumbersome.
* There is demand for a number of smaller jars rather than one monolithic one.
* We do not use any Java language features introduced after Java 1.4; generics are the most obvious omission.
* There is no support for file formats that change over time: a given format is supported in one version or another, but not both.
* The only database support is for BioSQL, which uses Hibernate but not in a fully flexible manner (i.e. it cannot connect to more than one database at a time).
* It is sequence-focused, whereas users' needs now extend beyond sequences.

==Proposal==

* Start from scratch, creating a number of smaller jars as sub-projects within an umbrella BioJava3 project. Each jar would provide tools for a specific purpose. Additional jars would provide cross-purpose tools such as format converters or text-to-object interfaces.

* Although we would start from scratch, much existing code could be reused or refactored to suit the new design.

* We would take full advantage of Java 6, including generics, annotations, and the built-in property change support. Everything would be a bean: absolutely everything.

* We would aim to be fully J2EE compliant, with most components reusable as beans in any other application, as Spring's components are.

* We would write a JUnit test for every single class, writing the test first and the class afterwards (test-driven development). We would also write documentation for every single class, with additional full documentation for each separate jar.

* We would adhere rigidly to a common coding style and comment the code heavily.

* BioJava3 should be able to focus on whatever aspect the user requires while remaining efficient, removing the assumption that everything is sequence-related.

* SymbolLists and Alphabets should be rethought, as they are the most common stumbling block.

==Data structure==

* RecordSource is an object which provides data. It can represent a file, a directory of files, a database, a web search engine, etc. It has a RecordFormat which reads/writes Records to/from the RecordSource. It provides an iterator over Records which match a given RecordSearch.

* A RecordFormat is specific to one version of a format, as are the Record objects it produces.

* RecordSearch defines search criteria to be applied to a RecordSource (or group thereof). It provides an iterator which returns all the combined Records from all RecordSources the RecordSearch was applied to. It uses RDF or something similar to map fields between different kinds of Records and the search parameters.

* Record is a piece of data in any format, as a bean. It should be as lightweight as possible; lazy loading of all non-key data would be ideal. Each kind of Record has an object structure matched to the RecordFormat that produced it; e.g. GenBank Record objects should be structured internally in almost exactly the same way as the GenBank file. This minimises loss of information and maximises flexibility.

* RecordConverters convert Record objects between different formats, e.g. a GenBank Record to a FASTA Record. They allow sensible defaults to be provided where one format does not supply enough information to satisfy the minimum requirements of another. Some kind of bean conversion system based on RDF would be suitable for this.

* A set of tools for converting flat data (e.g. sequence strings, taxonomy strings) into BioJava-like objects (e.g. SymbolLists, NCBITaxon). These BioJava-like objects could then be used in more advanced applications.

* A set of tools for manipulating the BioJava-like objects.

==Action plan==

# Please modify this page as you see fit to flesh out details and/or make new points.
# A tentative meeting in Singapore to get the ball rolling on the final design and the initial coding.
