| 1 | +--- |
| 2 | +title: BioJava3 Proposal |
| 3 | +--- |
| 4 | + |
| 5 | +Executive Summary |
| 6 | +----------------- |
| 7 | + |
| 8 | +It is suggested that development stop on the existing |
| 9 | +BioJava/BioJavaX/BioJava2 aggregation and start afresh as BioJava3. |
| 10 | + |
| 11 | +Reasoning |
| 12 | +--------- |
| 13 | + |
| 14 | +- The existing code is disorganised, poorly commented, and hard to |
| 15 | + maintain due to the use of numerous different coding styles. |
| 16 | +- Existing documentation is poor, and writing more would be hard |
| 17 | + given the lack of code comments. |
| 18 | +- Unit testing is limited and hard to tack on to existing code. |
| 19 | +- The build scripts are out of date and the release process is hard. |
| 20 | +- There is demand for a number of smaller jars as opposed to one |
| 21 | + monolithic one. |
| 22 | +- We do not use any language features introduced after Java 1.4. |
| 23 | + Generics are the obvious omission. |
| 24 | +- There is no support for file formats that change between versions. |
| 25 | + BioJava supports one version or another, but cannot handle both. |
| 26 | +- The only database support is for BioSQL, which uses Hibernate but |
| 27 | + not in a fully flexible manner (e.g. it cannot connect to more than |
| 28 | + one database at a time). |
| 29 | +- It is sequence-focused. Users have moved on. |
| 30 | + |
| 31 | +Proposal |
| 32 | +-------- |
| 33 | + |
| 34 | +- To start from scratch, creating a number of smaller jars as |
| 35 | + sub-projects within an umbrella BioJava3 project. Each jar would |
| 36 | + provide tools for a specific purpose. Additional jars would provide |
| 37 | + cross-purpose tools such as format converters or text-to-object |
| 38 | + interfaces. |
| 39 | + |
| 40 | +<!-- --> |
| 41 | + |
| 42 | +- Although starting from scratch, much existing code could be reused |
| 43 | + or refactored to suit the new design. |
| 44 | + |
| 45 | +<!-- --> |
| 46 | + |
| 47 | +- We would take full advantage of Java 6, including generics, |
| 48 | + annotations, and the built-in property change support. Everything |
| 49 | + would be a bean - absolutely everything. |
| 50 | + |
| 51 | +<!-- --> |
| 52 | + |
| 53 | +- We would aim to be fully J2EE compliant, with the majority of |
| 54 | + components fully reusable as beans in any other application, just |
| 55 | + as Spring's components are. |
| 56 | + |
| 57 | +<!-- --> |
| 58 | + |
| 59 | +- We would write a JUnit test for every single class, writing the test |
| 60 | + first and the class afterwards. We would also document every single |
| 61 | + class, with additional full documentation for each |
| 62 | + separate jar. |
| 63 | + |
| 64 | +<!-- --> |
| 65 | + |
| 66 | +- We would adhere rigidly to a common coding style and heavily comment |
| 67 | + the code. |
| 68 | + |
| 69 | +<!-- --> |
| 70 | + |
| 71 | +- BioJava3 should be able to focus on whichever aspect the user |
| 72 | + requires while remaining efficient, removing the existing assumption |
| 73 | + that everything is sequence-related. |
| 74 | + |
| 75 | +<!-- --> |
| 76 | + |
| 77 | +- SymbolLists and Alphabets would be rethought, as these are the most |
| 78 | + common stumbling blocks. |
| 79 | + |
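To illustrate the "everything would be a bean" point together with the built-in property change support, here is a minimal sketch of what one such bean might look like. The class name `SequenceRecordBean` and its `accession` property are hypothetical and not part of any agreed design; only `java.beans.PropertyChangeSupport` and `PropertyChangeListener` are standard Java classes.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;

/** Hypothetical bean; the class and property names are illustrative only. */
class SequenceRecordBean {
    // Standard JDK helper that manages listeners and fires change events.
    private final PropertyChangeSupport changes = new PropertyChangeSupport(this);
    private String accession;

    /** Beans need a public no-argument constructor. */
    public SequenceRecordBean() {
    }

    public String getAccession() {
        return accession;
    }

    public void setAccession(String accession) {
        String old = this.accession;
        this.accession = accession;
        // No event is fired when old and new values are equal and non-null.
        changes.firePropertyChange("accession", old, accession);
    }

    public void addPropertyChangeListener(PropertyChangeListener listener) {
        changes.addPropertyChangeListener(listener);
    }

    public void removePropertyChangeListener(PropertyChangeListener listener) {
        changes.removePropertyChangeListener(listener);
    }
}
```

A container or GUI could then register a listener on any such bean and react to changes without knowing the concrete class, which is the kind of reuse the J2EE point above is aiming for.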
| 80 | +Data structure |
| 81 | +-------------- |
| 82 | + |
| 83 | +- RecordSource is an object which provides data. It can represent a |
| 84 | + file, a directory of files, a database, a web search engine, etc. |
| 85 | + It has a RecordFormat which reads Records from and writes Records to |
| 86 | + the RecordSource. It provides an iterator over the Records which |
| 87 | + match a given RecordSearch. |
| 88 | + |
| 89 | +<!-- --> |
| 90 | + |
| 91 | +- A RecordFormat is specific to one version of a format, as are the |
| 92 | + Record objects it produces. |
| 93 | + |
| 94 | +<!-- --> |
| 95 | + |
| 96 | +- RecordSearch defines search criteria to be applied to a RecordSource |
| 97 | + (or group thereof). It provides an iterator which returns all the |
| 98 | + combined Records from all RecordSources the RecordSearch was applied |
| 99 | + to. It uses RDF or something similar to map fields between different |
| 100 | + kinds of Records and the search parameters. |
| 101 | + |
| 102 | +<!-- --> |
| 103 | + |
| 104 | +- Record is a piece of data in any format, as a bean. It should be as |
| 105 | + lightweight as possible - lazy loading of all non-key data would be |
| 106 | + ideal. Each different kind of Record has an object structure |
| 107 | + suitably matched to the RecordFormat that produced it - e.g. Genbank |
| 108 | + Record objects should be structured internally in almost exactly the |
| 109 | + same way as the Genbank file. This allows minimal loss of |
| 110 | + information and maximum flexibility. |
| 111 | + |
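The relationship between RecordSource, RecordFormat, RecordSearch and Record described above could be sketched roughly as follows. This is only one possible reading of the proposal: every method signature here is an assumption, and `FastaRecord`, `FastaFormat` and `TextSource` are toy, hypothetical implementations that hold text in memory instead of reading real files.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** A piece of data in some format, exposed as a bean (hypothetical interface). */
interface Record {
    String getId();
}

/** Reads and writes one version of one format (assumed to be text-based here). */
interface RecordFormat<R extends Record> {
    R read(String text);
    String write(R record);
}

/** Search criteria that can be applied to a RecordSource. */
interface RecordSearch<R extends Record> {
    boolean matches(R record);
}

/** Something that provides Records: a file, a directory, a database, and so on. */
interface RecordSource<R extends Record> {
    Iterator<R> search(RecordSearch<R> criteria);
}

/** Toy FASTA record bean: an identifier plus a sequence string. */
class FastaRecord implements Record {
    private String id;
    private String sequence;
    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public String getSequence() { return sequence; }
    public void setSequence(String sequence) { this.sequence = sequence; }
}

/** Toy FASTA format: one header line starting with '>', then sequence lines. */
class FastaFormat implements RecordFormat<FastaRecord> {
    public FastaRecord read(String text) {
        String[] lines = text.trim().split("\\r?\\n");
        FastaRecord record = new FastaRecord();
        record.setId(lines[0].substring(1).trim());
        StringBuilder sequence = new StringBuilder();
        for (int i = 1; i < lines.length; i++) {
            sequence.append(lines[i].trim());
        }
        record.setSequence(sequence.toString());
        return record;
    }
    public String write(FastaRecord record) {
        return ">" + record.getId() + "\n" + record.getSequence() + "\n";
    }
}

/** In-memory source standing in for a file or database; it owns a RecordFormat. */
class TextSource<R extends Record> implements RecordSource<R> {
    private final RecordFormat<R> format;
    private final List<String> entries;
    TextSource(RecordFormat<R> format, List<String> entries) {
        this.format = format;
        this.entries = entries;
    }
    public Iterator<R> search(RecordSearch<R> criteria) {
        List<R> hits = new ArrayList<R>();
        for (String entry : entries) {
            R record = format.read(entry);   // the format does the parsing
            if (criteria.matches(record)) {  // the search does the filtering
                hits.add(record);
            }
        }
        return hits.iterator();
    }
}
```

In this sketch the source parses every entry eagerly before filtering. The lazy loading mentioned above would instead parse only key fields up front and defer the rest until a getter is called.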
| 112 | +<!-- --> |
| 113 | + |
| 114 | +- RecordConverters convert Record objects between different formats, |
| 115 | + e.g. Genbank Record to FASTA Record. They allow sensible defaults to |
| 116 | + be provided where one format does not supply enough info to satisfy |
| 117 | + the minimum requirements of another. Some kind of bean conversion |
| 118 | + system based on RDF would be suitable for this. |
| 119 | + |
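As a rough illustration of a RecordConverter that supplies a sensible default, the sketch below converts a FASTA-like bean into a Genbank-like bean. The names `MiniFasta`, `MiniGenbank` and `FastaToGenbank`, and the choice of molecule type as the missing field, are assumptions made for the example. A plain Java class stands in for the RDF-based mapping suggested above.

```java
/** Converts an input bean of type A into an output bean of type B (hypothetical). */
interface RecordConverter<A, B> {
    B convert(A input);
}

/** Toy FASTA-like bean: only an identifier and a sequence. */
class MiniFasta {
    private String id;
    private String sequence;
    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public String getSequence() { return sequence; }
    public void setSequence(String sequence) { this.sequence = sequence; }
}

/** Toy Genbank-like bean with one extra field that FASTA cannot supply. */
class MiniGenbank {
    private String locus;
    private String moleculeType;
    private String sequence;
    public String getLocus() { return locus; }
    public void setLocus(String locus) { this.locus = locus; }
    public String getMoleculeType() { return moleculeType; }
    public void setMoleculeType(String moleculeType) { this.moleculeType = moleculeType; }
    public String getSequence() { return sequence; }
    public void setSequence(String sequence) { this.sequence = sequence; }
}

/** Fills the field missing from the input with a caller-supplied default. */
class FastaToGenbank implements RecordConverter<MiniFasta, MiniGenbank> {
    private final String defaultMoleculeType;

    FastaToGenbank(String defaultMoleculeType) {
        this.defaultMoleculeType = defaultMoleculeType;
    }

    public MiniGenbank convert(MiniFasta input) {
        MiniGenbank output = new MiniGenbank();
        output.setLocus(input.getId());
        output.setSequence(input.getSequence());
        // The input carries no molecule type, so the default is used.
        output.setMoleculeType(defaultMoleculeType);
        return output;
    }
}
```

Passing the default in through the constructor keeps the policy with the caller, so the same converter class can be configured differently for, say, nucleotide and protein inputs.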
| 120 | +<!-- --> |
| 121 | + |
| 122 | +- A set of tools for converting flat data (e.g. sequence strings, |
| 123 | + taxonomy strings) into BioJava-like objects (e.g. SymbolLists, |
| 124 | + NCBITaxon). These BioJava-like objects could then be used for more |
| 125 | + advanced applications. |
| 126 | + |
| 127 | +<!-- --> |
| 128 | + |
| 129 | +- A set of tools for manipulating the BioJava-like objects. |
| 130 | + |
| 131 | +Action plan |
| 132 | +----------- |
| 133 | + |
| 134 | +1. Please modify this page as you see fit in order to flesh out details |
| 135 | + and/or make new points. |
| 136 | + |
| 137 | +<!-- --> |
| 138 | + |
| 139 | +2. A tentative meeting in Singapore to get the ball rolling on the |
| 140 | + final design and initial coding. |
| 141 | + |