Skip to main content
Log in

AnO(ND) difference algorithm and its variations

  • Published:
Algorithmica Aims and scope Submit manuscript

Abstract

The problems of finding a longest common subsequence of two sequencesA andB and a shortest edit script for transformingA intoB have long been known to be dual problems. In this paper, they are shown to be equivalent to finding a shortest/longest path in an edit graph. Using this perspective, a simpleO(ND) time and space algorithm is developed whereN is the sum of the lengths ofA andB andD is the size of the minimum edit script forA andB. The algorithm performs well when differences are small (sequences are similar) and is consequently fast in typical applications. The algorithm is shown to haveO(N+D 2) expected-time performance under a basic stochastic model. A refinement of the algorithm requires onlyO(N) space, and the use of suffix trees leads to anO(N logN+D 2) time variation.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. A. V. Aho, D. S. Hirschberg, and J. D. Ullman. Bounds on the complexity of the longest common subsequence problem.J. ACM,23, 1 (1976), 1–12.

    Article  MATH  MathSciNet  Google Scholar 

  2. A. V. Aho, J. E. Hopcroft, and J. D. Ullman.Data Structures and Algorithms. Addison-Wesley: Reading, MA, 1983, pp. 203–208.

    MATH  Google Scholar 

  3. E. W. Dijkstra. A note on two problems in connexion with graphs.Numer. Math. 1 (1959), 269–271.

    Article  MATH  MathSciNet  Google Scholar 

  4. J. Gosling. A redisplay algorithm.Proceedings ACM SIGPLAN/SIGOA Symposium on Text Manipulation, 1981, pp.

  5. P. A. V. Hall and G. R. Dowling. Approximate string matching.Comput. Surv. 12, 4 (1980), 381–402.

    Article  MathSciNet  Google Scholar 

  6. D. Harel and R. E. Tarjan. Fast algorithms for finding nearest common ancestors.SIAM J. Comput.,13, 2 (1984), 338–355.

    Article  MATH  MathSciNet  Google Scholar 

  7. D. S. Hirschberg. A linear space algorithm for computing maximal common subsequences.Commun. ACM,18, 6 (1975), 341–343.

    Article  MATH  MathSciNet  Google Scholar 

  8. D. S. Hirschberg. Algorithms for the longest common subsequence problem.J. ACM 24, 4 (1977), 664–675.

    Article  MATH  MathSciNet  Google Scholar 

  9. D. S. Hirschberg. An information-theoretic lower bound for the longest common subsequence problem.Inform. Process. Lett. 7, 1 (1978), 40–41.

    Article  MATH  MathSciNet  Google Scholar 

  10. J. W. Hunt and M. D. McIlroy. An algorithm for differential file comparison. Computing Science Technical Report 41, Bell Laboratories (1975).

  11. J. W. Hunt and T. G. Szymanski. A fast algorithm for computing longest common subsequences.Commun. ACM,20, 5 (1977), 350–353.

    Article  MATH  MathSciNet  Google Scholar 

  12. D. E. Knuth.The Art of Computer Programming, Vol. 3: Sorting and Searching. Addison-Wesley: Reading, MA, 1983, pp. 490–493.

    Google Scholar 

  13. W. J. Masek and M. S. Paterson. A faster algorithm for computing string edit distances.J. Comput. System Sci. 20, 1 (1980) 18–31.

    Article  MATH  MathSciNet  Google Scholar 

  14. E. M. McCreight. A space-economical suffix tree construction algorithm.J. ACM 23, 2 (1976), 262–272.

    Article  MATH  MathSciNet  Google Scholar 

  15. W. Miller and E. W. Myers. A file comparison program.Software—Practice and Experience,15, 11 (1985), 1025–1040.

    Article  Google Scholar 

  16. N. Nakatsu, Y. Kambayashi and S. Yajima. A longest common subsequence algorithm suitable for similar text strings.Acta Inform.,18 (1982), 171–179.

    Article  MATH  MathSciNet  Google Scholar 

  17. M. J. Rochkind. The source code control system.IEEE Trans. Software Engrg.,1, 4 (1975), 364–370.

    Google Scholar 

  18. D. Sankoff and J. B. Kruskal.Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison. Addison-Wesley: Reading, MA, 1983.

    Google Scholar 

  19. W. Tichy. The string-to-string correction problem with block moves.ACM Trans. Comput. Systems,2, (1984), 309–321.

    Article  Google Scholar 

  20. R. A. Wagner and M. J. Fischer. The string-to-string correction problem.J. ACM 21, 1 (1974), 168–173.

    Article  MATH  MathSciNet  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Additional information

Communicated by David Dobkin.

This work was supported in part by the National Science Foundation under Grant MCS82-10096.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Myers, E.W. AnO(ND) difference algorithm and its variations. Algorithmica 1, 251–266 (1986). https://doi.org/10.1007/BF01840446

Download citation

  • Received:

  • Revised:

  • Issue Date:

  • DOI: https://doi.org/10.1007/BF01840446

Key words

Navigation