Commit c053dc1

Add a recipe for a look-ahead generator to allow modifications during tree iteration.
1 parent b083124 commit c053dc1

1 file changed, +36 -0 lines changed

doc/FAQ.txt

Lines changed: 36 additions & 0 deletions
@@ -63,6 +63,7 @@ ElementTree_.
 7.2 Why doesn't ``findall()`` support full XPath expressions?
 7.3 How can I find out which namespace prefixes are used in a document?
 7.4 How can I specify a default namespace for XPath expressions?
+7.5 How can I modify the tree during iteration?


 The code examples below use the `'lxml.etree`` module:
@@ -1241,3 +1242,38 @@ How can I specify a default namespace for XPath expressions?
 You can't. In XPath, there is no such thing as a default namespace. Just use
 an arbitrary prefix and let the namespace dictionary of the XPath evaluators
 map it to your namespace. See also the question above.
+
+
+How can I modify the tree during iteration?
+-------------------------------------------
+
+lxml's iterators need to hold on to an element in the tree in order to remember
+their current position. Therefore, tree modifications between two calls into the
+iterator can lead to surprising results if such an element is deleted or moved
+around, for example.
+
+If your code risks modifying elements that the iterator might still need, and
+you know that the number of elements returned by the iterator is small, then just
+read them all into a list (or use ``.findall()``), and iterate over that list.
+
+If the number of elements can be larger and you really want to process the tree
+incrementally, you can often use a read-ahead generator to make the iterator
+advance beyond the critical point before touching the tree structure.
+
+For example:
+
+.. sourcecode:: python
+
+  from itertools import islice
+  from collections import deque
+
+  def readahead(iterator, count=1):
+      iterator = iter(iterator)  # allow iterables as well
+      elements = deque(islice(iterator, 0, count))
+      for element in iterator:
+          elements.append(element)
+          yield elements.popleft()
+      yield from elements
+
+  for element in readahead(root.iterfind("path/to/children")):
+      element.getparent().remove(element)
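The effect of the read-ahead recipe can be tried without lxml. The sketch below is a toy model, not lxml's implementation: `Node`, `make_chain`, `iter_chain`, `detach` and `detach_all` are hypothetical helpers that only mimic an iterator which finds its next position through the element it last returned. `readahead` is copied unchanged from the recipe in the diff.

```python
from collections import deque
from itertools import islice


class Node:
    """Hypothetical stand-in for a tree element: one link in a sibling chain."""

    def __init__(self, name):
        self.name = name
        self.next = None


def make_chain(names):
    """Link one node per name into a chain and return the nodes as a list."""
    nodes = [Node(name) for name in names]
    for left, right in zip(nodes, nodes[1:]):
        left.next = right
    return nodes


def iter_chain(first):
    """Toy iterator that locates its next position via the node it last returned."""
    node = first
    while node is not None:
        yield node
        node = node.next  # looked up only when the consumer asks for more


def detach(node):
    """Toy 'removal': the node forgets its following sibling."""
    node.next = None


def readahead(iterator, count=1):
    # Same recipe as in the FAQ entry: stay `count` elements ahead of the consumer.
    iterator = iter(iterator)  # allow iterables as well
    elements = deque(islice(iterator, 0, count))
    for element in iterator:
        elements.append(element)
        yield elements.popleft()
    yield from elements


def detach_all(iterable):
    """Detach every node the iterable yields and report which ones were seen."""
    seen = []
    for node in iterable:
        seen.append(node.name)
        detach(node)
    return seen


plain = detach_all(iter_chain(make_chain("abcd")[0]))
buffered = detach_all(readahead(iter_chain(make_chain("abcd")[0])))
print(plain)     # ['a']
print(buffered)  # ['a', 'b', 'c', 'd']
```

In this model the plain loop stops after the first node, because the node it would continue from has already been detached. With `readahead` the underlying iterator has moved on to the following node before the current one is touched. A `count` of 1 only covers changes to the element currently being processed; code that also disturbs later elements would presumably need a larger `count` or the list-based approach.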
