DOM Distiller provides a better reading experience for articles and article-like web pages by extracting the core text and stripping non-essential elements from the page.
Projects and features powered by DOM Distiller:
DOM Distiller is loosely based on “Boilerpipe” by Christian Kohlschütter, Peter Fankhauser and Wolfgang Nejdl.
Bugs and feature requests are tracked in Chromium's issue tracker, crbug. DOM Distiller bugs are filed under component:UI>Browser>ReaderMode.
Examples of bugs that should be reported:
Reader Mode has launched on Android and should be available on any up-to-date version of Chrome. Simply visit a non-mobile-friendly article and tap on the “Show simplified view” infobar when it appears at the bottom of the screen. You may need to first enable the feature via accessibility settings.
Reader Mode for Chrome on desktop is still in development. As of M80, an experimental preview of the feature can be activated by following these steps:
You must install the build dependencies before building for the first time. The following are required on all platforms:
Download and install Google Chrome.
Install the git hooks:
./create-hook-symlinks
Change to the directory where you want the code.
Clone this git repo:
git clone https://chromium.googlesource.com/chromium/dom-distiller
The code will be located inside the newly created dom-distiller folder.
Install the dependencies by entering the dom-distiller folder and running:
sudo ./install-build-deps.sh
Install JDK 7 with your organization's software management tool, or download it from Oracle.
Install Homebrew.
Install ant and python using Homebrew:
brew install ant python
Install the protocol buffer compiler with Python bindings:
brew install protobuf --with-python
Create a folder named buildtools inside your DOM Distiller checkout.
Download ChromeDriver.
Unzip chromedriver_mac32.zip and ensure the binary ends up in your buildtools folder.
Install the PyPI package management tool pip:
sudo easy_install pip
Install selenium using pip:
pip install --user selenium
This guide sometimes references a tool called xvfb, specifically when running shell commands with xvfb-run. You can remove that part of the command when developing on Mac OS X. For example, xvfb-run echo becomes echo.
Development is supported only on the above operating systems. We recommend using Vagrant for development on other systems, such as Windows or Red Hat Linux.
Install Vagrant on your system. Version 1.7.2 or higher is recommended.
Launch the Vagrant VM instance:
vagrant up
SSH to the VM:
vagrant ssh
DOM Distiller uses Chromium's collaboration tools. Code reviews are hosted on Chromium Gerrit, and you must install depot_tools by following the depot_tools guide in the Chrome infrastructure documentation.
You can run git cl format to update your code to follow DOM Distiller's code formatting guidelines. For the command to work correctly, you must add the following symbolic links to the buildtools folder in your checkout:
clang_format → /path/to/chromium/src/buildtools/clang_format/
linux64 → /path/to/chromium/src/buildtools/linux64/
mac → /path/to/chromium/src/buildtools/mac/
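One way to create these links is a short shell function. This is a sketch that assumes the CHROME_SRC and DOM_DISTILLER_DIR environment variables described later in this guide point at your Chromium src directory and your DOM Distiller checkout; the function name link_buildtools is hypothetical.

```shell
# Hypothetical helper (not part of DOM Distiller): symlink Chromium's
# clang_format, linux64 and mac buildtools directories into the DOM
# Distiller checkout so that git cl format works there.
link_buildtools() {
  local src dst name
  src="${CHROME_SRC:?CHROME_SRC is not set}/buildtools"
  dst="${DOM_DISTILLER_DIR:?DOM_DISTILLER_DIR is not set}/buildtools"
  mkdir -p "$dst" || return 1
  for name in clang_format linux64 mac; do
    # -s: symbolic link; -f: replace an existing link;
    # -n: treat an existing link to a directory as a file, so re-running
    #     replaces the link instead of creating a new one inside its target.
    ln -sfn "$src/$name" "$dst/$name" || return 1
  done
}
```

Because of -n, running the function again after updating CHROME_SRC simply repoints the links.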
ant is the tool we use to build. All available targets can be listed using ant -p.
Some important targets that you are likely to use while working on the project:
ant test: Run all tests.
ant test -Dtest.filter=$FILTER_PATTERN: Run a subset of tests. For example, *.FilterTest.*:*Foo*-*Bar* would run all tests containing .FilterTest. and Foo, but not those with Bar.
ant gwtc: Compile .class + .java files to JavaScript. Standalone JavaScript is available at war/domdistiller/domdistiller.nocache.js.
ant gwtc.jstests: Create a standalone JavaScript file for the tests.
ant extractjs: Create standalone JavaScript from the output of ant gwtc. The compiled JavaScript file is available at out/domdistiller.js.
ant extractjs.jstests: Create a standalone JavaScript file for the tests.
ant package: Copy the main build artifacts into the out/package folder, typically the extracted JS and protocol buffer files.
You can use most regular git commands during development and git cl for collaboration.
Create a new local branch and commit the changes you want to make. When you are done, please run git cl format to standardize the code format before uploading.
Check out your local branch with the changes you want to have reviewed and run git cl upload to create a change list (CL) at Chromium Gerrit.
The first time you do this, you will have to provide a username and password.
machine code.google.com login line to your ~/.netrc file.
Once your reviewer approves your changes, you can click “Submit to CQ” to land your changes.
Verify that the following environment variables are set:
export CHROME_SRC=/path/to/chromium/src
export DOM_DISTILLER_DIR=/path/to/dom-distiller
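A quick way to catch a typo in either path before rolling is a check like the following. It is a hypothetical helper (check_distiller_env is not part of either project) that only verifies that both variables are set and name existing directories.

```shell
# Hypothetical sanity check: return 0 only if CHROME_SRC and
# DOM_DISTILLER_DIR are both set and point at existing directories;
# otherwise print what is wrong to stderr and return 1.
check_distiller_env() {
  local status=0 var dir
  for var in CHROME_SRC DOM_DISTILLER_DIR; do
    # Indirect lookup of the variable named by $var (POSIX-compatible).
    eval "dir=\${$var:-}"
    if [ -z "$dir" ]; then
      echo "$var is not set" >&2
      status=1
    elif [ ! -d "$dir" ]; then
      echo "$var=$dir is not a directory" >&2
      status=1
    fi
  done
  return "$status"
}
```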
Run ant package and copy the generated files into Chrome. You can use this bash function to automate the process:
roll-distiller () {
  (
    (cd "$DOM_DISTILLER_DIR" && ant package) && \
    rm -rf "$CHROME_SRC"/third_party/dom_distiller_js/dist/* && \
    cp -rf "$DOM_DISTILLER_DIR"/out/package/* "$CHROME_SRC"/third_party/dom_distiller_js/dist/ && \
    touch "$CHROME_SRC"/components/resources/dom_distiller_resources.grdp
  )
}
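After rolling, you may want to confirm that the copy in Chromium matches what ant package produced. The sketch below is a hypothetical helper, check_roll, that relies on diff -r exiting with status 0 when two directory trees are identical; it assumes the same CHROME_SRC and DOM_DISTILLER_DIR variables as roll-distiller.

```shell
# Hypothetical check (not part of DOM Distiller): compare the built package
# with the files copied into Chromium. diff -r -q prints one line per
# differing or missing file and exits non-zero if the trees differ.
check_roll() {
  diff -r -q "${DOM_DISTILLER_DIR:?DOM_DISTILLER_DIR is not set}/out/package" \
    "${CHROME_SRC:?CHROME_SRC is not set}/third_party/dom_distiller_js/dist"
}
```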
From $CHROME_SRC, run GN to set up the Ninja build files:
gn args out/Debug
Build Chrome with the chrome target and run it with DOM Distiller enabled:
autoninja -C out/Debug chrome && out/Debug/chrome --enable-dom-distiller
You can distill web pages in any of the following ways:
Toggle distilled page contents.
To have a unique user profile every time you run Chrome, you can add --user-data-dir=$(mktemp -d) as a command line parameter. On Mac OS X, you can instead write --user-data-dir=$(mktemp -d 2>/dev/null || mktemp -d -t 'chromeprofile').
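The launch command and the fresh profile can be combined into one function. This is a sketch: chrome_fresh and the CHROME_BIN variable are hypothetical names, with CHROME_BIN defaulting to the out/Debug/chrome binary built above. Any extra arguments, such as a URL, are passed through to Chrome.

```shell
# Hypothetical wrapper: start Chrome with DOM Distiller enabled and a
# brand-new, throwaway profile directory. CHROME_BIN is an assumption and
# defaults to the Debug build; run it from $CHROME_SRC.
chrome_fresh() {
  local profile
  # mktemp -d works on Linux; the -t fallback covers older Mac OS X mktemp.
  profile=$(mktemp -d 2>/dev/null || mktemp -d -t 'chromeprofile') || return 1
  "${CHROME_BIN:-out/Debug/chrome}" --enable-dom-distiller \
    --user-data-dir="$profile" "$@"
}
```

Usage: chrome_fresh http://example.com/article.html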
Build the components_browsertests target:
autoninja -C out/Debug components_browsertests
Run the components_browsertests binary to execute the tests:
out/Debug/components_browsertests
Some additional tips for running tests:
Prefix the command with xvfb-run to avoid pop-up windows:
xvfb-run out/Debug/components_browsertests
Select which tests to run using --gtest_filter=<pattern>:
out/Debug/components_browsertests --gtest_filter=\*Distiller\*
Run tests as isolates by building components_browsertests_run and executing them with the swarming tool:
autoninja -C out/Debug components_browsertests_run
python tools/swarming_client/isolate.py run -s out/Debug/components_browsertests.isolated
Additional documentation about testing in Chromium can be found on Google Test's GitHub page.
To extract the content from a web page directly, you can run:
xvfb-run out/Debug/components_browsertests \
  --gtest_filter='*MANUAL_ExtractUrl' \
  --run-manual \
  --test-tiny-timeout=600000 \
  --output-file=./extract.out \
  --url=http://www.example.com \
  > ./extract.log 2>&1
extract.out has the extracted HTML, and extract.log has the console logging.
If you need more logging, you can add the following arguments to the command:
--vmodule=*distiller*=2
--debug-level=99
If this is something you do often, you can put the following function in a bash file you include (for example ~/.bashrc) and use it for iterative development:
distill() {
  (
    roll-distiller && \
    autoninja -C out/Debug components_browsertests && \
    xvfb-run out/Debug/components_browsertests \
      --gtest_filter='*MANUAL_ExtractUrl' \
      --run-manual \
      --test-tiny-timeout=600000 \
      --output-file=./extract.out \
      --url="$1" \
      > ./extract.log 2>&1
  )
}
Usage when running from $CHROME_SRC:
distill http://example.com/article.html
You can use the Chrome Developer Tools to debug DOM Distiller:
Update the test JavaScript by running ant extractjs.jstests or ant test.
Open war/test.html in Chrome desktop.
Open the Console panel in Developer Tools (Ctrl-Shift-J). On Mac OS X you can use ⌥-⌘-J as the shortcut.
Run all tests by calling:
org.chromium.distiller.JsTestEntry.run()
To run only a subset of tests, you can use a regular expression that matches a single test or multiple tests:
org.chromium.distiller.JsTestEntry.runWithFilter('MyTestClass.testSomething')
The Sources panel contains both the extracted JavaScript and the Java source files, as long as you haven't disabled JavaScript source maps in Developer Tools. You can set breakpoints in the Java source files to stop the code execution and examine a variety of useful information, such as variable values.
When a test fails, you will see several stack traces. One of these contains clickable links to the corresponding Java source files for the stack frames.
ant package generates an unpacked Chrome extension under out/extension, which you can add to the browser with the following steps:
Open chrome://extensions.
Enable Developer mode.
Click Load unpacked and select the out/extension folder.
The extension currently supports profiling the extraction code.
It also adds a panel to the Developer Tools which you can use to trigger extraction on the inspected page. This can be used to trigger and profile extraction on a mobile device which you are currently inspecting using chrome://inspect.
Use LogUtil.logToConsole() to log information for debugging. Where the log output is stored varies with how DOM Distiller is run:
ant test: Terminal. To get more verbose output, use ant test -Dtest.debug_level=99.
Chrome: $CHROME_LOG_FILE. A release mode build of Chrome will log all JavaScript INFO messages there if you start Chrome with --enable-logging. You can add --enable-logging=stderr to have the log go to stderr instead of a file.
Content extraction test: extract.log above.
For an example of LogUtil.logToConsole() in use, see $DOM_DISTILLER_DIR/java/org/chromium/distiller/PagingLinksFinder.java.
Use ant package '-Dgwt.custom.args=-style PRETTY' for easier JavaScript debugging.
Device and reload the page. Verify that you get what you expect. For example, a Nexus 4 might get a mobile site, whereas a Nexus 7 might get the desktop site.
UA field to the clipboard. This field does not require a reload after changing device, but it is good practice to verify that you get what you expect.
--user-agent="$USER_AGENT_FROM_CLIPBOARD". Remember to also add --enable-dom-distiller.
Toggle distilled page contents from the menu to display the distilled page.
If you want, you can copy some of these User-Agent aliases into normal bash aliases for easy access later. For example, Nexus 4 would be:
--user-agent="Mozilla/5.0 (Linux; Android 4.2.1; en-us; Nexus 4 Build/JOP40D) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.166 Mobile Safari/535.19"
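Shell functions work as well as aliases here and also accept extra arguments such as a URL. In this sketch the function names, NEXUS4_UA, and CHROME_BIN are hypothetical; CHROME_BIN defaults to out/Debug/chrome, and the User-Agent string is the Nexus 4 example above.

```shell
# Hypothetical helpers: launch a local Chrome build with DOM Distiller
# enabled and a fixed User-Agent string. Run from $CHROME_SRC.
NEXUS4_UA='Mozilla/5.0 (Linux; Android 4.2.1; en-us; Nexus 4 Build/JOP40D) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.166 Mobile Safari/535.19'

# First argument: the User-Agent string; remaining arguments go to Chrome.
chrome_with_ua() {
  local ua=$1
  shift
  "${CHROME_BIN:-out/Debug/chrome}" --enable-dom-distiller --user-agent="$ua" "$@"
}

chrome_nexus4() {
  chrome_with_ua "$NEXUS4_UA" "$@"
}
```

Usage: chrome_nexus4 http://example.com/article.html, then choose Toggle distilled page contents from the menu.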