nltk.collections.LazyZip

class documentation

class LazyZip(LazyMap):

Known subclasses: nltk.collections.LazyEnumerate

Constructor: LazyZip(*lists)

A lazy sequence whose elements are tuples, each containing the i-th element from each of the argument sequences. The returned list is truncated in length to the length of the shortest argument sequence. The tuples are constructed lazily -- i.e., when you read a value from the list, LazyZip will calculate that value by forming a tuple from the i-th element of each of the argument sequences.

LazyZip is essentially a lazy version of the Python primitive function zip. In particular, an evaluated LazyZip is equivalent to a zip:

>>> from nltk.collections import LazyZip
>>> sequence1, sequence2 = [1, 2, 3], ['a', 'b', 'c']
>>> list(zip(sequence1, sequence2))
[(1, 'a'), (2, 'b'), (3, 'c')]
>>> list(LazyZip(sequence1, sequence2))
[(1, 'a'), (2, 'b'), (3, 'c')]
>>> sequences = [sequence1, sequence2, [6,7,8,9]]
>>> list(zip(*sequences)) == list(LazyZip(*sequences))
True
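The lazy construction described above can be illustrated with a small stand-in class. This is only a sketch of the idea, not NLTK's implementation: the name `LazyZipSketch` is hypothetical, and it omits the caching and iteration logic that LazyZip inherits from LazyMap.

```python
from collections.abc import Sequence


class LazyZipSketch(Sequence):
    """Hypothetical stand-in for illustration; not the NLTK class."""

    def __init__(self, *lists):
        # Only references to the argument sequences are stored.
        self._lists = lists

    def __len__(self):
        # Truncate to the shortest argument sequence.
        return min(len(lst) for lst in self._lists)

    def __getitem__(self, i):
        if isinstance(i, slice):
            return [self[j] for j in range(*i.indices(len(self)))]
        if i < 0:
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError("index out of range")
        # The tuple is built only when the element is read.
        return tuple(lst[i] for lst in self._lists)


pairs = LazyZipSketch([1, 2, 3], ['a', 'b', 'c'], [6, 7, 8, 9])
third = pairs[2]       # forms a single tuple on demand
everything = list(pairs)  # evaluating the whole sequence matches zip
```

No tuple exists until an element is requested, so the memory cost is independent of the length of the argument sequences.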

Lazy zips can be useful for conserving memory in cases where the argument sequences are particularly long.

A typical example of a use case for this class is combining long sequences of gold standard and predicted values in a classification or tagging task in order to calculate accuracy. By constructing tuples lazily and avoiding the creation of an additional long sequence, memory usage can be significantly reduced.
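A minimal sketch of this use case follows. The tag lists `gold` and `predicted` are made-up illustrative data, and the accuracy computation is one possible way to consume the pairs, not a prescribed NLTK recipe.

```python
from nltk.collections import LazyZip

# Hypothetical gold-standard and predicted tags (illustrative data only).
gold = ['NN', 'VB', 'DT', 'NN']
predicted = ['NN', 'VB', 'DT', 'JJ']

# Each (gold, predicted) tuple is formed only when it is read,
# so no additional list of pairs is built.
pairs = LazyZip(gold, predicted)
correct = sum(1 for g, p in pairs if g == p)
accuracy = correct / len(pairs)
```

With real data, `gold` and `predicted` would typically be long sequences (possibly lazy themselves), which is where avoiding the intermediate list pays off.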

Method	`__init__`	No summary
Method	`__len__`	Return the number of tuples in this lazy sequence, i.e. the length of the shortest argument sequence.
Method	`iterate_from`	Return an iterator that generates the tuples in this lazy sequence, starting at position `index`. If `index>=len(self)`, then this iterator will generate no tuples.

Inherited from LazyMap:

Method	`__getitem__`	Return the i-th element of this lazy sequence. Negative indices and slices are both supported.
Instance Variable	`_all_lazy`	Undocumented
Instance Variable	`_cache`	Undocumented
Instance Variable	`_cache_size`	Undocumented
Instance Variable	`_func`	Undocumented
Instance Variable	`_lists`	Undocumented

Inherited from AbstractLazySequence (via LazyMap):

Method	`__add__`	Return a list concatenating self with other.
Method	`__contains__`	Return true if this list contains `value`.
Method	`__eq__`	Undocumented
Method	`__hash__`	No summary
Method	`__iter__`	Return an iterator that generates the elements of this lazy sequence.
Method	`__lt__`	Undocumented
Method	`__mul__`	Return a list concatenating self with itself `count` times.
Method	`__ne__`	Undocumented
Method	`__radd__`	Return a list concatenating other with self.
Method	`__repr__`	Return a string representation for this lazy sequence that is similar to a list's representation; but if it would be more than 60 characters long, it is truncated.
Method	`__rmul__`	Return a list concatenating self with itself `count` times.
Method	`count`	Return the number of times this list contains `value`.
Method	`index`	Return the index of the first occurrence of `value` in this list that is greater than or equal to `start` and less than `stop`. Negative start and stop values are treated like negative slice bounds -- i.e., they count from the end of the list.
Constant	`_MAX_REPR_SIZE`	Undocumented

def __init__(self, *lists):

overrides nltk.collections.LazyMap.__init__

overridden in nltk.collections.LazyEnumerate

Parameters
*lists:list(list)	the underlying lists

def __len__(self):

overrides nltk.collections.LazyMap.__len__

Return the number of tuples in this lazy sequence, i.e. the length of the shortest argument sequence.
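As a brief illustration of the truncation behaviour described in the class summary, the following sketch zips two sequences of different lengths. The variable names are illustrative only.

```python
from nltk.collections import LazyZip

shorter = [1, 2]
longer = ['a', 'b', 'c', 'd']

pairs = LazyZip(shorter, longer)
length = len(pairs)      # length of the shortest argument sequence
evaluated = list(pairs)  # elements beyond that length are dropped
```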

def iterate_from(self, index):

overrides nltk.collections.LazyMap.iterate_from

Return an iterator that generates the tuples in this lazy sequence, starting at position index. If index>=len(self), then this iterator will generate no tuples.
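A short usage sketch, assuming the list inputs shown here (the data is illustrative):

```python
from nltk.collections import LazyZip

pairs = LazyZip([1, 2, 3], ['a', 'b', 'c'])

# Start iterating from the second element.
tail = list(pairs.iterate_from(1))

# A start position at or beyond len(pairs) yields nothing.
empty = list(pairs.iterate_from(5))
```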