node-boilerpipe

A node.js wrapper for Boilerpipe, an excellent Java library for boilerplate removal and fulltext extraction from HTML pages.

Installation

node-boilerpipe depends on Boilerpipe v1.2.0 or higher.

WARNING: Don't forget to set the JAVA_HOME environment variable that node-java relies on.

Via npm:

$ npm install boilerpipe

Source code project

$ mvn compile
$ mvn package

Usage

Load in the module

  var Boilerpipe = require('boilerpipe');

Create a new instance

The constructor takes an options object whose extractor property is one of the available Boilerpipe extractor types:

DefaultExtractor
ArticleExtractor
ArticleSentencesExtractor
KeepEverythingExtractor
KeepEverythingWithMinKWordsExtractor
LargestContentExtractor
NumWordsRulesExtractor
CanolaExtractor

If no extractor is passed, DefaultExtractor is used. The options object may also contain either html (an HTML string) or url (a page URL).
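The Boilerpipe.Extractor.Canola, .Article and .ArticleSentences examples suggest that each option value is the listed class name without its Extractor suffix. Assuming that pattern holds for every type above (this README does not confirm it), a caller-side check that applies the documented DefaultExtractor fallback might look like this. EXTRACTOR_NAMES and resolveExtractor are hypothetical names, not part of the module.

```javascript
// Hypothetical list: the extractor types above without the "Extractor"
// suffix, assumed to match the Boilerpipe.Extractor.<Name> pattern.
const EXTRACTOR_NAMES = [
  'Default',
  'Article',
  'ArticleSentences',
  'KeepEverything',
  'KeepEverythingWithMinKWords',
  'LargestContent',
  'NumWordsRules',
  'Canola',
];

// Returns 'Default' when no name is given (mirroring the documented
// fallback), the name itself when it is known, and throws otherwise.
function resolveExtractor(name) {
  if (name === undefined) {
    return 'Default';
  }
  if (!EXTRACTOR_NAMES.includes(name)) {
    throw new Error('Unknown extractor: ' + name);
  }
  return name;
}
```

The returned name could then be looked up as Boilerpipe.Extractor[name], again assuming the module exposes every type under that pattern.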

  var boilerpipe = new Boilerpipe();

  var boilerpipe = new Boilerpipe({
    extractor: Boilerpipe.Extractor.Canola
  });

  var boilerpipe = new Boilerpipe({
    extractor: Boilerpipe.Extractor.Article,
    url: 'http://...'
  });

  var boilerpipe = new Boilerpipe({
    extractor: Boilerpipe.Extractor.ArticleSentences,
    html: '<html>...</html>'
  }, function(err) {
    if (err) return console.error(err);
  });
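The last example passes a Node-style callback to the constructor. If you prefer async/await, a small wrapper can turn that form into a Promise. This is a sketch that assumes the callback is invoked exactly once, with a falsy err on success. createBoilerpipe is a hypothetical helper, and the constructor is passed in as Ctor so the sketch runs without the Java-backed module installed.

```javascript
// Wraps the callback form new Ctor(options, function (err) { ... }) in a
// Promise that resolves with the constructed instance or rejects with err.
function createBoilerpipe(Ctor, options) {
  return new Promise((resolve, reject) => {
    const instance = new Ctor(options, (err) => {
      // Defer settling so `instance` is already assigned even if the
      // callback fires synchronously inside the constructor.
      process.nextTick(() => (err ? reject(err) : resolve(instance)));
    });
  });
}
```

Usage would then be along the lines of const bp = await createBoilerpipe(Boilerpipe, { extractor: Boilerpipe.Extractor.Article, html: '<html>...</html>' }); inside an async function.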

Set URL or HTML

If both a URL and HTML are set, only the URL is used and the HTML is ignored.

  boilerpipe.setUrl('http://...');

  boilerpipe.setHtml('<html>...</html>');
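The precedence rule can be made explicit in calling code, for instance to log a warning when both inputs are supplied. The helper below is hypothetical and only mirrors the documented behavior; it is not how the module decides internally.

```javascript
// Mirrors the documented rule: a non-empty url wins over html.
// Returns null when neither input has been provided yet.
function effectiveSource({ url, html } = {}) {
  if (url) {
    return { type: 'url', value: url };
  }
  if (html) {
    return { type: 'html', value: html };
  }
  return null;
}
```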

Get text, html and images

  boilerpipe.getText(function(err, text) {
    if (err) return console.error(err);
    console.log(text);
  });

  boilerpipe.getHtml(function(err, html) {
    if (err) return console.error(err);
    console.log(html);
  });

  boilerpipe.getImages(function(err, images) {
    if (err) return console.error(err);
    console.log(images);
  });
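Each getter takes an (err, result) callback, so Node's util.promisify can adapt them. The sketch below, built around the hypothetical helper extractAll, requests text, HTML and images concurrently from an already configured instance. It assumes the wrapper tolerates overlapping calls, which this README does not state.

```javascript
const { promisify } = require('util');

// Fetches text, HTML and images from a configured instance in parallel,
// assuming each getter follows the Node (err, result) callback convention.
async function extractAll(boilerpipe) {
  // Bind each method so `this` still refers to the instance once promisified.
  const getText = promisify(boilerpipe.getText.bind(boilerpipe));
  const getHtml = promisify(boilerpipe.getHtml.bind(boilerpipe));
  const getImages = promisify(boilerpipe.getImages.bind(boilerpipe));

  const [text, html, images] = await Promise.all([getText(), getHtml(), getImages()]);
  return { text, html, images };
}
```

If the wrapper turns out not to handle concurrent calls, await the three getters one after another instead.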

carson0321/node-boilerpipe
