@@ -18,8 +18,8 @@ lxml and Requests
 -----------------
 
 `lxml <http://lxml.de/>`_ is a pretty extensive library written for parsing
-XML and HTML documents really fast. It even handles messed up tags. We will
-also be using the `Requests <http://docs.python-requests.org/en/latest/>`_
+XML and HTML documents very quickly, even handling messed up tags in the
+process. We will also be using the `Requests <http://docs.python-requests.org/en/latest/>`_
 module instead of the already built-in urllib2 module due to improvements in speed and
 readability. You can easily install both using ``pip install lxml`` and
 ``pip install requests``.
@@ -31,16 +31,16 @@ Let's start with the imports:
     from lxml import html
     import requests
 
-Next we will use ``requests.get`` to retrieve the web page with our data
-and parse it using the ``html`` module and save the results in ``tree``:
+Next we will use ``requests.get`` to retrieve the web page with our data,
+parse it using the ``html`` module and save the results in ``tree``:
 
 .. code-block:: python
 
     page = requests.get('http://econpy.pythonanywhere.com/ex/001.html')
     tree = html.fromstring(page.text)
 
 ``tree`` now contains the whole HTML file in a nice tree structure which
-we can go over two different ways: XPath and CSSSelect. In this example, I
+we can go over two different ways: XPath and CSSSelect. In this example, we
 will focus on the former.
 
 XPath is a way of locating information in structured documents such as
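The hunk above ends at the XPath step. As a minimal sketch of what ``tree.xpath`` extraction looks like, the snippet below parses a small inline fragment instead of the fetched page, and the ``buyer-name`` / ``item-price`` attribute names are assumed for illustration; the real page at the URL in the diff may use different markup.

```python
from lxml import html

# A small inline fragment standing in for the fetched page; the attribute
# names here are illustrative, not taken from the actual dataset.
snippet = """
<div>
  <div title="buyer-name">Carson Busses</div>
  <span class="item-price">$29.95</span>
</div>
"""

tree = html.fromstring(snippet)

# An XPath expression selects nodes by their path through the tree;
# the trailing text() pulls the text content out of the matched elements.
buyers = tree.xpath('//div[@title="buyer-name"]/text()')
prices = tree.xpath('//span[@class="item-price"]/text()')

print(buyers)  # ['Carson Busses']
print(prices)  # ['$29.95']
```

The same two calls against the real ``tree`` built from ``page.text`` would yield the two lists the later hunk refers to.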
@@ -96,6 +96,6 @@ a web page using lxml and Requests. We have it stored in memory as two
 lists. Now we can do all sorts of cool stuff with it: we can analyze it
 using Python or we can save it to a file and share it with the world.
 
-A cool idea to think about is modifying this script to iterate through
-the rest of the pages of this example dataset or rewriting this
+Some more cool ideas to think about are modifying this script to iterate
+through the rest of the pages of this example dataset, or rewriting this
 application to use threads for improved speed.
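The closing suggestion (iterating over the remaining pages, with threads for speed) could be sketched roughly as below. Note the assumptions: only ``001.html`` appears in the diff, so the zero-padded URL pattern for later pages is a guess, ``scrape`` is a hypothetical placeholder for the fetch-and-xpath work shown in the hunks, and the thread-pool call is left in comments since running it needs network access.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed URL scheme: the diff only shows 001.html; later pages are
# presumed to follow the same zero-padded numbering.
BASE = 'http://econpy.pythonanywhere.com/ex/{:03d}.html'

def page_urls(n):
    """Build URLs for the first n pages of the example dataset."""
    return [BASE.format(i) for i in range(1, n + 1)]

def scrape(url):
    """Hypothetical worker: fetch one page with requests.get and pull
    the fields out with tree.xpath, as in the tutorial's own code."""
    ...

urls = page_urls(3)
print(urls[0])  # http://econpy.pythonanywhere.com/ex/001.html

# A thread pool overlaps the network waits of the individual fetches:
# with ThreadPoolExecutor(max_workers=4) as pool:
#     results = list(pool.map(scrape, urls))
```

A thread pool suits this job because the work is I/O-bound: each worker spends most of its time waiting on the network, so the GIL is not a bottleneck.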