Skip to content

Latest commit

 

History

History
 
 

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
Lua HTML Parser


* What is this

Lua HTML parser is an HTML parser written only in Lua.
It processes HTML input and produces a table which represents an HTML tree.

For example, if the following input is given:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3c.org/TR/1999/REC-html401-19991224/loose.dtd">
<html>
<body>
<!-- CSS Stylesheet -->
<p>
  Click <a href="http://example.com/">here!</a>
<p>
  Hello
</p>
</body></html>

Then, the parser produces the following table:

{
  _tag = "#document",
  _attr = {},
  {
    {
       _st = true,
       _stText = "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" \"http://www.w3c.org/TR/1999/REC-html401-19991224/loose.dtd\">",
       "\n"
    }
    _tag = "html",
    _attr = {},
    _st = true,
    _stText = "",
    {
      _tag = "body",
      _attr = {},
      "\n",
      {
        {
          _st = true,
          _stText = "<!-- CSS Stylesheet -->",
          "\n"
        }
        _tag = "p",
        _attr = {},
        "\n  Click ",
        {
          _tag = "a",
          _attr = {href = "http://example.com/"}
          "here!",
        },
        "\n",
      },
      {
        _tag = "p",
        _attr = {},
        "\n  Hello\n",
      },
      "\n",
    }
  }
}


* Usage

Parsing file:

require "html"
html.parse(io.stdin)

Parsing string:

require "html"
html.parsestr("<html></html>")


* Author

T. Kobayashi
ether @nospam@ users.sourceforge.jp