nltk.tokenize.punkt.PunktToken

class PunktToken(object):

Constructor: PunktToken(tok, **params)

Stores a token of text with annotations produced during sentence boundary detection.

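Method	`__init__`	Stores the token text, derives `type` and `period_final` from it, and applies any annotations passed as keyword arguments.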
Method	`__repr__`	A string representation of the token that can reproduce it with eval(), which lists all the token's non-default annotations.
Method	`__str__`	A string representation akin to that used by Kiss and Strunk.
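Class Variable	`__slots__`	Attribute names stored on each instance: `tok`, `type`, `period_final`, and the annotation names in `_properties`.
Instance Variable	`period_final`	True if the token text ends with a period.
Instance Variable	`tok`	The original token text.
Instance Variable	`type`	The case-normalized form of the token, as computed by `_get_type`.
Property	`first_case`	"lower", "upper", or "none", according to the case of the token's first character.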
Property	`first_lower`	True if the token's first character is lowercase.
Property	`first_upper`	True if the token's first character is uppercase.
Property	`is_alpha`	True if the token text is all alphabetic.
Property	`is_ellipsis`	True if the token text is that of an ellipsis.
Property	`is_initial`	True if the token text is that of an initial.
Property	`is_non_punct`	True if the token is either a number or is alphabetic.
Property	`is_number`	True if the token text is that of a number.
Property	`type_no_period`	The type with its final period removed if it has one.
Property	`type_no_sentperiod`	The type with its final period removed if it is marked as a sentence break.
Method	`_get_type`	Returns a case-normalized representation of the token.
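Constant	`_RE_ALPHA`	Pattern for tokens made up entirely of word characters other than digits (used by `is_alpha`).
Constant	`_RE_ELLIPSIS`	Pattern for two or more consecutive periods (used by `is_ellipsis`).
Constant	`_RE_INITIAL`	Pattern for a single letter followed by a period (used by `is_initial`).
Constant	`_RE_NUMERIC`	Pattern for number-like tokens, which `_get_type` collapses into a single type.
Class Variable	`_properties`	Names of the token's annotations, each of which defaults to None.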

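def __init__(self, tok, **params):

Stores tok, sets type to the case-normalized form returned by _get_type, and sets period_final to whether tok ends with a period. Every annotation listed in _properties starts as None; keyword arguments in params override those defaults.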
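The constructor's behaviour can be sketched in plain Python. MiniToken below is a hypothetical stand-in, not NLTK's class: it copies the _RE_NUMERIC pattern documented on this page, and it assumes (from NLTK's source, not from this page) the annotation names parastart, linestart, sentbreak, abbr and ellipsis, plus the placeholder type "##number##" for numbers.

```python
import re

# Numeric pattern copied from the documented _RE_NUMERIC value.
_RE_NUMERIC = re.compile(r"^-?[\.,]?\d[\d,\.-]*\.?$")


class MiniToken:
    """Hypothetical stand-in for PunktToken's constructor (not NLTK code)."""

    # Assumed annotation names; each defaults to None unless passed as a keyword.
    _properties = ["parastart", "linestart", "sentbreak", "abbr", "ellipsis"]
    __slots__ = ["tok", "type", "period_final"] + _properties

    def __init__(self, tok, **params):
        self.tok = tok
        # Lowercase the text; a fully numeric token becomes one placeholder type
        # (the "##number##" string is an assumption).
        self.type = _RE_NUMERIC.sub("##number##", tok.lower())
        self.period_final = tok.endswith(".")
        for prop in self._properties:
            setattr(self, prop, None)
        # Keyword arguments override the default annotations.
        for key, value in params.items():
            setattr(self, key, value)


token = MiniToken("Dr.", abbr=True)
print(token.tok, token.type, token.period_final, token.abbr, token.sentbreak)
# Dr. dr. True True None
```

Because the sketch declares __slots__, passing a keyword that is not one of its attribute names raises AttributeError.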

def __repr__(self):

A string representation of the token that can reproduce it with eval(), which lists all the token's non-default annotations.
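A minimal sketch of an eval()-friendly repr that lists only non-default annotations. MiniToken is a hypothetical class, and its annotation names (parastart, linestart, sentbreak, abbr, ellipsis) are assumed from NLTK's source; the sketch only illustrates the idea, and the exact string NLTK produces may differ.

```python
class MiniToken:
    """Hypothetical token with PunktToken-style annotations (not NLTK code)."""

    _properties = ["parastart", "linestart", "sentbreak", "abbr", "ellipsis"]
    __slots__ = ["tok"] + _properties

    def __init__(self, tok, **params):
        self.tok = tok
        for prop in self._properties:
            setattr(self, prop, None)
        for key, value in params.items():
            setattr(self, key, value)

    def __repr__(self):
        # Skip annotations still at their default (None or any other falsy value).
        args = [repr(self.tok)]
        args += [f"{p}={getattr(self, p)!r}" for p in self._properties if getattr(self, p)]
        return f"{type(self).__name__}({', '.join(args)})"


token = MiniToken("Dr.", abbr=True)
print(repr(token))  # MiniToken('Dr.', abbr=True)

# The string is valid constructor syntax, so eval() rebuilds an equivalent token.
clone = eval(repr(token))
print(clone.tok, clone.abbr)  # Dr. True
```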

def __str__(self):

A string representation akin to that used by Kiss and Strunk.
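In Kiss and Strunk's notation each annotation is written as a marker after the token text. The sketch below assumes the markers <A> (abbreviation), <E> (ellipsis) and <S> (sentence break), as used in NLTK's source; kiss_strunk_str is a hypothetical helper, not an NLTK function.

```python
def kiss_strunk_str(tok, abbr=False, ellipsis=False, sentbreak=False):
    """Hypothetical helper: append one marker per annotation that is set."""
    res = tok
    if abbr:
        res += "<A>"  # abbreviation
    if ellipsis:
        res += "<E>"  # ellipsis
    if sentbreak:
        res += "<S>"  # sentence break
    return res


print(kiss_strunk_str("Dr.", abbr=True))                      # Dr.<A>
print(kiss_strunk_str("home.", sentbreak=True))               # home.<S>
print(kiss_strunk_str("...", ellipsis=True, sentbreak=True))  # ...<E><S>
```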

__slots__ =

Attribute names stored on each instance: tok, type, period_final, and the annotation names in _properties.

period_final =

True if the token text ends with a period.

tok =

The original token text.

type =

The case-normalized form of the token, as returned by _get_type.

@property
first_case =

"lower" if the token's first character is lowercase, "upper" if it is uppercase, and "none" otherwise.

@property
first_lower =

True if the token's first character is lowercase.

@property
first_upper =

True if the token's first character is uppercase.
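The three case properties look only at the token's first character. The hypothetical first_case function below assumes that the first_case property returns the strings "lower", "upper" or "none"; characters with no case (digits, punctuation, many non-Latin scripts) fall into "none".

```python
def first_case(tok):
    """Hypothetical mirror of first_lower, first_upper and first_case."""
    first = tok[0]
    if first.islower():   # first_lower
        return "lower"
    if first.isupper():   # first_upper
        return "upper"
    return "none"         # digits, punctuation and uncased characters


print([first_case(t) for t in ["the", "The", "1990", "(", "\u6771\u4eac"]])
# ['lower', 'upper', 'none', 'none', 'none']
```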

@property
is_alpha =

True if the token text is all alphabetic.
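The documented _RE_ALPHA pattern is [^\W\d]+$, that is, word characters other than digits. A sketch of the check, assuming the pattern is applied with re.match (which anchors it at the start of the token), shows that non-ASCII letters pass and, because \w includes the underscore, so does snake_case; is_alpha here is a hypothetical helper.

```python
import re

# Pattern copied from the documented _RE_ALPHA value.
_RE_ALPHA = re.compile(r"[^\W\d]+$", re.UNICODE)


def is_alpha(tok):
    # Assumption: re.match anchors the start; "$" anchors the end.
    return bool(_RE_ALPHA.match(tok))


print([is_alpha(t) for t in ["Hello", "caf\u00e9", "snake_case", "abc1", "Mr."]])
# [True, True, True, False, False]
```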

@property
is_ellipsis =

True if the token text is that of an ellipsis.
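With the documented _RE_ELLIPSIS pattern \.\.+$, assumed to be applied with re.match, an ellipsis is two or more ASCII periods and nothing else; the single character U+2026 (HORIZONTAL ELLIPSIS) does not qualify. is_ellipsis below is a hypothetical helper.

```python
import re

# Pattern copied from the documented _RE_ELLIPSIS value.
_RE_ELLIPSIS = re.compile(r"\.\.+$")


def is_ellipsis(tok):
    # Assumption: applied with re.match, so the whole token must be periods.
    return bool(_RE_ELLIPSIS.match(tok))


print([is_ellipsis(t) for t in ["...", "..", ".", "wait...", "\u2026"]])
# [True, True, False, False, False]
```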

@property
is_initial =

True if the token text is that of an initial.
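The documented _RE_INITIAL pattern [^\W\d]\.$ accepts exactly one letter followed by a period. The hypothetical helper below assumes the pattern is applied with re.match, so multi-letter abbreviations and digits are rejected.

```python
import re

# Pattern copied from the documented _RE_INITIAL value.
_RE_INITIAL = re.compile(r"[^\W\d]\.$", re.UNICODE)


def is_initial(tok):
    # Assumption: applied with re.match, so the token is exactly letter + period.
    return bool(_RE_INITIAL.match(tok))


print([is_initial(t) for t in ["J.", "\u00c9.", "Jr.", "5.", "J"]])
# [True, True, False, False, False]
```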

@property
is_non_punct =

True if the token is either a number or is alphabetic.
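A sketch of is_non_punct, under the assumption that it searches the normalized type for any letter; the helper pattern _RE_LETTER and both functions are hypothetical. Numbers are assumed to normalize to the placeholder "##number##", which itself contains letters, so a single search covers both halves of the description. Note that in this sketch a mixed token such as "3rd" also counts as non-punctuation.

```python
import re

# Pattern copied from the documented _RE_NUMERIC value.
_RE_NUMERIC = re.compile(r"^-?[\.,]?\d[\d,\.-]*\.?$")
# Hypothetical helper pattern: any word character that is not a digit.
_RE_LETTER = re.compile(r"[^\W\d]", re.UNICODE)


def token_type(tok):
    # Case-normalize; collapse a fully numeric token into the assumed placeholder.
    return _RE_NUMERIC.sub("##number##", tok.lower())


def is_non_punct(tok):
    # "##number##" contains letters, so numbers and words both match here.
    return bool(_RE_LETTER.search(token_type(tok)))


print([is_non_punct(t) for t in ["word", "42", "3rd", ",", "--", "..."]])
# [True, True, True, False, False, False]
```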

@property
is_number =

True if the token text is that of a number.
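A number is recognized through the _RE_NUMERIC pattern documented on this page. The sketch assumes, following NLTK's source, that is_number tests whether the normalized type starts with the placeholder "##number##"; token_type and is_number are hypothetical helpers. Because the pattern accepts any run of digits, commas, periods and hyphens after an optional sign, dates and version strings also count as numbers.

```python
import re

# Pattern copied from the documented _RE_NUMERIC value.
_RE_NUMERIC = re.compile(r"^-?[\.,]?\d[\d,\.-]*\.?$")


def token_type(tok):
    # Case-normalize, then collapse a fully numeric token into one placeholder type.
    return _RE_NUMERIC.sub("##number##", tok.lower())


def is_number(tok):
    # Check the normalized type rather than re-running the pattern.
    return token_type(tok).startswith("##number##")


samples = ["42", "-3.5", ".5", "1,000", "2003-01-02", "1.2.3", "12th", "$5"]
print([s for s in samples if is_number(s)])
# ['42', '-3.5', '.5', '1,000', '2003-01-02', '1.2.3']
```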

@property
type_no_period =

The type with its final period removed if it has one.
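A sketch of the period-stripping rule as a hypothetical function on a type string. The len(...) > 1 guard, which leaves a bare "." unchanged, is an assumption of this sketch.

```python
def type_no_period(type_):
    # Drop one trailing period; a bare "." is left unchanged (assumed guard).
    if len(type_) > 1 and type_[-1] == ".":
        return type_[:-1]
    return type_


print([type_no_period(t) for t in ["etc.", "dr.", "the", ".", "..."]])
# ['etc', 'dr', 'the', '.', '..']
```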

@property
type_no_sentperiod =

The type with its final period removed if it is marked as a sentence break.
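The period is stripped only when the token carries a sentence-break annotation, so a sentence-final "home." can be compared with the bare type "home", while an abbreviation such as "dr." keeps its period. This sketch passes the annotation in as a plain boolean; both functions are hypothetical helpers, and the guard on a bare "." is assumed.

```python
def type_no_period(type_):
    # Drop one trailing period; a bare "." is left unchanged (assumed guard).
    if len(type_) > 1 and type_[-1] == ".":
        return type_[:-1]
    return type_


def type_no_sentperiod(type_, sentbreak):
    # Strip the period only when the token is marked as a sentence break.
    return type_no_period(type_) if sentbreak else type_


print(type_no_sentperiod("home.", sentbreak=True))  # home
print(type_no_sentperiod("dr.", sentbreak=False))   # dr.
```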

def _get_type(self, tok):

Returns a case-normalized representation of the token.
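Case normalization lowercases the token and collapses anything matching _RE_NUMERIC into a single placeholder type, so that, for example, all years share one type. The placeholder string "##number##" is assumed from NLTK's source, and get_type is a hypothetical stand-in for the private method.

```python
import re

# Pattern copied from the documented _RE_NUMERIC value.
_RE_NUMERIC = re.compile(r"^-?[\.,]?\d[\d,\.-]*\.?$")


def get_type(tok):
    # Lowercase first; the anchored pattern then replaces a fully numeric token
    # as a whole and leaves any other token untouched.
    return _RE_NUMERIC.sub("##number##", tok.lower())


print([get_type(t) for t in ["The", "THE", "Mr.", "1999", "3.14", "3rd"]])
# ['the', 'the', 'mr.', '##number##', '##number##', '3rd']
```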

_RE_ALPHA =

Pattern for tokens made up entirely of word characters other than digits (letters, plus the underscore); used by is_alpha.

Value

re.compile(r'[^\W\d]+$',
           re.UNICODE)

_RE_ELLIPSIS =

Pattern for two or more consecutive periods; used by is_ellipsis.

Value

re.compile(r'\.\.+$')

_RE_INITIAL =

Pattern for a single letter followed by a period; used by is_initial.

Value

re.compile(r'[^\W\d]\.$',
           re.UNICODE)

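_RE_NUMERIC =

Pattern for number-like tokens: an optional minus sign and an optional leading period or comma, then a digit followed by any run of digits, commas, periods, and hyphens, with an optional final period. _get_type replaces tokens that match it with a single placeholder type.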

Value

re.compile(r'^-?[\.,]?\d[\d,\.-]*\.?$')

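_properties: list[str] =

Names of the annotations a token can carry (parastart, linestart, sentbreak, abbr, ellipsis). Each defaults to None and can be set through keyword arguments to the constructor.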