Improve BSD-Licensed Text Processing Tools
Student
Mentor
Project description
My proposal for the FreeBSD project is to optimize and complete the BSD-licensed text processing tools diff, diff3, sdiff, and mdocml. Much of this proposal continues the work done by Ben Fiedler in the 2010 GSoC.
Approach to solving the problem
When completing the diff family, I will consider (1) compatibility with POSIX, (2) compatibility with the GNU variant currently used in FreeBSD, and (3) performance.
Deliverables
- Feature-complete diff/diff3/sdiff.
- Add various features to mdocml so that it can replace groff in the FreeBSD tree.
Milestones
May 21 - June 17
- Implement missing features of mdocml, including legacy features.
- Testing of mdocml.
June 18 - July 1
- Complete diff
- Debugging and testing of diff
July 2 - July 18
- Mid-term evaluations.
- Complete sdiff
- Debugging and testing of sdiff
July 19 - August 5
- Complete diff3
- Debugging and testing of diff3
August 6 - August 12
- Thoroughly test and benchmark all utilities.
August 13 - August 20
- "Pencils down" period.
- Finish cleaning up code and do any testing that might be left.
- Write documentation.
Test Plan
To test functionality, I will use automated test scripts that exercise every function I implement. I will also test functions that are already implemented, to be sure my changes have not affected the other functions of each utility. For every reported problem, I will add a new test case for later regression testing.
For testing compatibility with POSIX standards and the GNU utilities currently in FreeBSD, I will compare the test script outputs of both the BSD-licensed tools and the GNU tools to make sure they produce identically formatted outputs.
I will also benchmark the BSD-licensed tools against the GNU tools for testing performance.
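The compatibility check described above could be automated along the following lines. This is only a minimal sketch, not the actual test scripts in the repository: the helper names `run_tool` and `first_mismatch` are hypothetical, and it assumes that two tools are compatible on a test case when they return the same exit status and byte-identical standard output.

```python
import subprocess

def run_tool(argv):
    """Run one command and return (exit status, stdout bytes)."""
    proc = subprocess.run(argv, stdout=subprocess.PIPE,
                          stderr=subprocess.DEVNULL)
    return proc.returncode, proc.stdout

def first_mismatch(tool_a, tool_b, cases):
    """Return the first argument list on which the two tools disagree.

    tool_a and tool_b are command prefixes (lists of strings); each entry
    of cases is a list of extra arguments appended to both prefixes.
    Returns None when every case produces the same status and output.
    """
    for args in cases:
        if run_tool(tool_a + args) != run_tool(tool_b + args):
            return args
    return None
```

A call such as `first_mismatch(["/usr/bin/diff"], ["./diff"], [["-u", "old.txt", "new.txt"]])` would compare the two binaries on one case; the paths and file names here are placeholders. Every reported problem can then be added to the list of cases, so that it is re-checked on each run.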
The Code
https://socsvn.freebsd.org/socsvn/soc2012/jhagewood/
Status Report (continued from SoC 2010)
diff
Item | Status | Notes |
Missing --speed-large-files | INCOMPLETE | Argument is accepted, but makes no functional change |
Missing --ignore-file-name-case | COMPLETED | |
Missing --no-ignore-file-name-case | COMPLETED | |
Missing --strip-trailing-cr | COMPLETED | |
Missing --normal | COMPLETED | |
Missing --tabsize | COMPLETE | |
Missing --unidirectional-new-file | COMPLETED | |
Missing --from-file | COMPLETED | |
Missing --to-file | COMPLETED | |
Missing --help | COMPLETED | |
Missing --ignore-blank-lines | INCOMPLETE | |
Missing --ignore-tab-expansion | IN PROGRESS | |
Missing -v / --version | COMPLETED | |
Eliminate warnings | COMPLETED | |
Comment the code | INCOMPLETE | |
Check GNU compatibility | COMPLETED | All implemented features GNU compatible as of 6/17/2012 |
Check POSIX conformance | COMPLETED | |
Missing --line-format options | IN PROGRESS | regex support is available |
Missing --group-format options | COMPLETE | |
Missing --group-format | INCOMPLETE | |
Adapt source to FreeBSD style guidelines | COMPLETE | |
Tighter integration between diff utilities | IN PROGRESS | zdiff integration in diff. |
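To illustrate what a few of the options in this table are meant to do, here is a rough sketch of per-line normalization before comparison. It is not how BSD or GNU diff is implemented: the function names are hypothetical, lines are assumed to have their newline already removed, and treating --ignore-tab-expansion as "compare the tab-expanded lines" is a simplification of the documented GNU behavior.

```python
def normalize(line, strip_trailing_cr=False, ignore_tab_expansion=False,
              tabsize=8):
    """Return the form of a line that is used for comparison."""
    if strip_trailing_cr and line.endswith("\r"):
        # --strip-trailing-cr: drop a carriage return at the end of the line
        line = line[:-1]
    if ignore_tab_expansion:
        # --ignore-tab-expansion with --tabsize: expand tabs to tab stops
        line = line.expandtabs(tabsize)
    return line

def lines_equal(a, b, **options):
    """Compare two lines after applying the same normalization to both."""
    return normalize(a, **options) == normalize(b, **options)
```

For example, `lines_equal("x\r", "x", strip_trailing_cr=True)` is true, while the same call without the option is false.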
sdiff
Item | Status | Notes |
Fix -c99 build warnings/errors | COMPLETE | |
Combine diff-specific args and pipe to diff process | COMPLETE | |
Fix output indentation | COMPLETE | |
Binary file support | COMPLETE | |
Adapt source to FreeBSD style guidelines | COMPLETE | |
.gz file support | COMPLETE | zsdiff |
diff3
Item | Status | Notes |
Replaced ksh script with sh | COMPLETED | |
-i flag | COMPLETED | |
-T flag | COMPLETED | |
-a flag | COMPLETED | |
--show-all option | INCOMPLETE | |
--easy-only | INCOMPLETE | |
--merge | INCOMPLETE | |
--label | INCOMPLETE | |
--strip-trailing-cr | COMPLETED | |
--diff-program | INCOMPLETE | |
--version | COMPLETED | |
--help | COMPLETED | |
Adapt source to FreeBSD style guidelines | COMPLETED | |
mdocml
June 15 UPDATE - For my first two weeks working on mdocml, I tried implementing these features as roff requests, but that proved more difficult than I anticipated. I spent the next two weeks studying man/mdoc and began implementing .ns/.rs (no-space mode) and .ti (temporary indent) as man/mdoc macros. As of June 15, these macros are not fully implemented; mdocml will remain a secondary focus during this project, and I will keep working to finish them. I was also able to compile a list of all man pages that will not compile under mandoc. The list is in my SVN repository.
Item | Status | Notes |
.ns (no-space mode) | IN PROGRESS | |
.rs (no-space mode off) | IN PROGRESS | |
.ti (temporary indent) | IN PROGRESS | |
.ta (tab settings) | INCOMPLETE | |
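For reference, the behavior of .ns and .rs can be modeled as a small state machine. The sketch below follows the classic roff description (no-space mode suppresses .sp requests and ends at the next output line or at .rs). It is not the mdocml implementation: the function name is hypothetical, and every .sp is simplified to a single blank line regardless of its argument.

```python
def apply_no_space(lines):
    """Return the output lines for a list of roff-like input lines."""
    out = []
    no_space = False
    for line in lines:
        if line == ".ns":
            no_space = True            # turn no-space mode on
        elif line == ".rs":
            no_space = False           # turn no-space mode off
        elif line.startswith(".sp"):
            if not no_space:
                out.append("")         # vertical space (simplified to one line)
        else:
            out.append(line)           # a line of output...
            no_space = False           # ...also ends no-space mode
    return out
```

With the input `["a", ".ns", ".sp", "b", ".sp", "c"]`, the first .sp is suppressed and the second is honored, so the result is `["a", "b", "", "c"]`.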
Implementing mdocml macros
man source files
man.h
Defines man's structs and enums. The most relevant is enum mant, the index for macros.
man_macro.c
Begins parsing man macros, separately for implicit-block and explicit-block macros, and creates a man node for each macro.
man_validate.c
Post-processing functions for several man macros.
man_html.c
Pre-processing functions for many man macros.
man_term.c
Pre-processing functions for some macros, and post-processing functions for macros that had pre-processing in man_html.
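The division of labor described above (an enum that indexes the macros, plus pre-processing and post-processing functions per macro) can be pictured as a table-driven tree walk. The following is only a toy model under that assumption: the class, macro, and function names are all hypothetical and do not correspond to the real mdocml sources.

```python
from enum import IntEnum

class Macro(IntEnum):
    """Hypothetical stand-in for a macro index such as enum mant."""
    TEXT = 0
    SH = 1
    B = 2

class Node:
    """One parsed node: a macro id, optional text, and child nodes."""
    def __init__(self, macro, text="", children=()):
        self.macro = macro
        self.text = text
        self.children = list(children)

def pre_text(node, out):
    out.append(node.text)

def pre_sh(node, out):
    out.append("== ")

def post_sh(node, out):
    out.append(" ==\n")

def pre_b(node, out):
    out.append("*")

def post_b(node, out):
    out.append("*")

# Table of (pre, post) handlers, indexed by the macro enum.
ACTIONS = [None] * len(Macro)
ACTIONS[Macro.TEXT] = (pre_text, None)
ACTIONS[Macro.SH] = (pre_sh, post_sh)
ACTIONS[Macro.B] = (pre_b, post_b)

def render(node, out):
    """Run the pre handler, then the children, then the post handler."""
    pre, post = ACTIONS[node.macro]
    if pre is not None:
        pre(node, out)
    for child in node.children:
        render(child, out)
    if post is not None:
        post(node, out)

def render_to_string(root):
    out = []
    render(root, out)
    return "".join(out)
```

In this model, adding a new macro means adding one enum member, its handlers, and one table entry, which is roughly the set of places a new macro has to touch across the files listed above.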
mdoc source files
mdoc.h
mdoc_macro.c
mdoc_validate.c
mdoc_html.c
mdoc_term.c
The structure of mdoc is the same as man.
Other files
term.c
html.c
Functions for formatted output, such as inserting new lines, horizontal/vertical spacing, functions for fonts, etc.
Benchmarks
diff
GNU diff
Minor page faults - 101
BSD diff
Minor page faults - 93
sdiff
GNU sdiff
Minor page faults - 181
BSD sdiff
Minor page faults - 232
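A figure like the ones above can be collected with getrusage. The sketch below is one possible way to do it on a Unix-like system and is not necessarily how these numbers were obtained: the function name is hypothetical, and it assumes that the change in the children's ru_minflt counter around a single run is attributable to that run.

```python
import resource
import subprocess

def minor_faults(argv):
    """Run a command and return the minor page faults it incurred."""
    # RUSAGE_CHILDREN accumulates over all waited-for children,
    # so measure the difference around this one run.
    before = resource.getrusage(resource.RUSAGE_CHILDREN).ru_minflt
    subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    after = resource.getrusage(resource.RUSAGE_CHILDREN).ru_minflt
    return after - before
```

Calling `minor_faults` once with the GNU binary and once with the BSD binary on the same input files gives two comparable counts. Repeating each measurement several times helps smooth out run-to-run variation.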