class documentation

class Paice(object):

Constructor: Paice(lemmas, stems)

Class for storing lemmas, stems and evaluation metrics.

Method __init__ Takes lemmas, a dictionary where keys are lemmas and values are sets or lists of words corresponding to that lemma, and stems, a dictionary where keys are stems and values are sets or lists of words corresponding to that stem.
Method __str__ Undocumented
Method update Update statistics after lemmas and stems have been set.
Instance Variable coords Undocumented
Instance Variable errt Undocumented
Instance Variable gdmt Undocumented
Instance Variable gdnt Undocumented
Instance Variable gumt Undocumented
Instance Variable gwmt Undocumented
Instance Variable lemmas Undocumented
Instance Variable oi Undocumented
Instance Variable stems Undocumented
Instance Variable sw Undocumented
Instance Variable ui Undocumented
Method _errt Count Error-Rate Relative to Truncation (ERRT).
Method _get_truncation_coordinates Count (UI, OI) pairs for truncation points until we find the segment where (ui, oi) crosses the truncation line.
Method _get_truncation_indexes Count (UI, OI) when stemming is done by truncating words at 'cutlength'.
def __init__(self, lemmas, stems):

Parameters
lemmas: dict(str): list(str) - A dictionary where keys are lemmas and values are sets or lists of words corresponding to that lemma.
stems: dict(str): set(str) - A dictionary where keys are stems and values are sets or lists of words corresponding to that stem.
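
To illustrate the shape of the two arguments, the following sketch writes out a small lemmas dictionary and a matching stems dictionary by hand. The words and groupings are hypothetical and chosen only for illustration; in practice the stems dictionary would come from running a stemmer over the same vocabulary.

```python
# Hypothetical gold-standard grouping: lemma -> words that share it.
lemmas = {
    "kneel": ["kneel", "knelt"],
    "range": ["range", "ranged"],
    "ring": ["ring", "rang", "rung"],
}

# Hypothetical stemmer output for the same words: stem -> words reduced to it.
stems = {
    "kneel": {"kneel"},
    "knelt": {"knelt"},
    "rang": {"rang", "range", "ranged"},
    "ring": {"ring"},
    "rung": {"rung"},
}

# Sanity check: both groupings should cover the same vocabulary.
lemma_words = {word for group in lemmas.values() for word in group}
stem_words = {word for group in stems.values() for word in group}
same_vocabulary = lemma_words == stem_words
```

Dictionaries of this shape are what the constructor signature Paice(lemmas, stems) describes.
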
def __str__(self):

Undocumented

def update(self):

Update statistics after lemmas and stems have been set.
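
This page does not spell out which statistics are updated. As a rough sketch, assuming the instance variables gumt, gdmt, gwmt, gdnt, ui, oi and sw follow the definitions in Paice's stemmer evaluation method (global unachieved merge, desired merge, wrongly merged and desired non-merge totals, then the understemming index, overstemming index and stemming weight), the counts could be derived as below. The helper names `paice_counts` and `paice_indexes` are hypothetical and are not part of the class.

```python
def paice_counts(lemmas, stems):
    """Return (GUMT, GDMT, GWMT, GDNT) for the given groupings (sketch)."""
    total = sum(len(set(words)) for words in lemmas.values())
    gumt = gdmt = gwmt = gdnt = 0.0
    for lemma_words in lemmas.values():
        lemma_words = set(lemma_words)
        n = len(lemma_words)
        gdmt += n * (n - 1)        # word pairs that should be merged
        gdnt += n * (total - n)    # word pairs that should stay apart
        for stem_words in stems.values():
            stem_words = set(stem_words)
            shared = len(lemma_words & stem_words)
            # Same lemma, but the stemmer left them under different stems.
            gumt += shared * (n - shared)
            # Same stem, but the words belong to different lemmas.
            gwmt += shared * (len(stem_words) - shared)
    # Each pair was counted once from either side, so halve the totals.
    return gumt / 2, gdmt / 2, gwmt / 2, gdnt / 2


def paice_indexes(lemmas, stems):
    """Return (UI, OI, SW) under the assumptions stated above."""
    gumt, gdmt, gwmt, gdnt = paice_counts(lemmas, stems)
    ui = gumt / gdmt if gdmt else 0.0
    oi = gwmt / gdnt if gdnt else 0.0
    # SW is undefined when UI is zero; returning NaN is a choice of this sketch.
    sw = oi / ui if ui else float("nan")
    return ui, oi, sw


# Hypothetical example data.
lemmas = {
    "kneel": ["kneel", "knelt"],
    "range": ["range", "ranged"],
    "ring": ["ring", "rang", "rung"],
}
stems = {
    "kneel": {"kneel"},
    "knelt": {"knelt"},
    "rang": {"rang", "range", "ranged"},
    "ring": {"ring"},
    "rung": {"rung"},
}
counts = paice_counts(lemmas, stems)
ui, oi, sw = paice_indexes(lemmas, stems)
```

In this toy data four word pairs that share a lemma end up under different stems (understemming) and two pairs from different lemmas share the stem "rang" (overstemming).
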

def _errt(self):

Count Error-Rate Relative to Truncation (ERRT).

Returns
float - ERRT, the length of the line from the origin to (UI, OI) divided by the length of the line from the origin to the point where that same line, when extended, meets the truncation line.
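
The ratio described above can be sketched geometrically. The code below is not the class's implementation; it assumes the truncation line is available as a list of (UI, OI) points joined by straight segments, and the function names `ray_segment_intersection` and `errt` are hypothetical.

```python
import math


def ray_segment_intersection(point, seg_start, seg_end):
    """Intersect the ray from the origin through `point` with one segment.

    Returns the intersection as (x, y), or None if the ray misses the segment.
    """
    px, py = point
    (x1, y1), (x2, y2) = seg_start, seg_end
    dx, dy = x2 - x1, y2 - y1
    # Solve t * (px, py) == (x1, y1) + s * (dx, dy) with t >= 0 and 0 <= s <= 1.
    denom = px * dy - py * dx
    if denom == 0:
        return None  # ray and segment are parallel (or the point is the origin)
    t = (x1 * dy - y1 * dx) / denom
    s = (x1 * py - y1 * px) / denom
    if t < 0 or not 0 <= s <= 1:
        return None
    return t * px, t * py


def errt(ui, oi, truncation_coords):
    """Length of origin -> (ui, oi) divided by length of origin -> truncation line."""
    for start, end in zip(truncation_coords, truncation_coords[1:]):
        hit = ray_segment_intersection((ui, oi), start, end)
        if hit is None:
            continue
        distance_to_line = math.hypot(*hit)
        if distance_to_line == 0:
            # Truncation line passes through the origin; treating the ratio
            # as infinite is a choice made for this sketch.
            return float("inf")
        return math.hypot(ui, oi) / distance_to_line
    return None  # the ray never reaches the given part of the truncation line


# Hypothetical truncation line and stemmer position.
line = [(0.0, 1.0), (0.5, 0.25), (1.0, 0.0)]
value = errt(0.25, 0.25, line)
```

Here the ray through (0.25, 0.25) meets the first segment at roughly (0.4, 0.4), so the stemmer sits a little over halfway to the truncation line.
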
def _get_truncation_coordinates(self, cutlength=0):

Count (UI, OI) pairs for truncation points until we find the segment where (ui, oi) crosses the truncation line.

Parameters
cutlength: int - Optional parameter to start counting from the (ui, oi) coordinates obtained by stemming at this length. Useful for speeding up the calculations when you know the approximate location of the intersection.
Returns
list(tuple(float, float)) - List of coordinate pairs that define the truncation line
def _get_truncation_indexes(self, words, cutlength):

Count (UI, OI) when stemming is done by truncating words at 'cutlength'.

Parameters
words: set(str) or list(str) - Words used for the analysis
cutlength: int - Words are stemmed by cutting them at this length
Returns
tuple(int, int) - Understemming and overstemming indexes
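
As an illustration of stemming by truncation, the sketch below groups words by their first `cutlength` characters. The helper `truncate_words` is hypothetical and only shows the truncation step; the (UI, OI) pair would then come from comparing these groups with the lemma groups, which is not shown here.

```python
def truncate_words(words, cutlength):
    """Group words by their first `cutlength` characters (hypothetical helper)."""
    stems = {}
    for word in words:
        stem = word[:cutlength]
        stems.setdefault(stem, set()).add(word)
    return stems


# Hypothetical vocabulary.
words = ["kneel", "knelt", "range", "ranged", "ring", "rang", "rung"]

# Heavy truncation: many unrelated words collapse into the same stem.
short_cut = truncate_words(words, 1)

# Light truncation: every word in this list keeps its own stem.
long_cut = truncate_words(words, 6)
```

Short cut lengths push a stemmer toward overstemming and long ones toward understemming, which is why sweeping the cut length traces out a line of (UI, OI) points.
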