nltk.tokenize.texttiling

module documentation

Undocumented

Class	`TextTilingTokenizer`	Tokenize a document into topical sections using the TextTiling algorithm. This algorithm detects subtopic shifts based on the analysis of lexical co-occurrence patterns.
Class	`TokenSequence`	A token list with its original length and its index
Class	`TokenTableField`	A field in the token table holding parameters for each token, used later in the process
Function	`demo`	Undocumented
Function	`smooth`	Smooth the data using a window of the requested size.
Constant	`DEFAULT_SMOOTHING`	Undocumented
Variable	`BLOCK_COMPARISON`	Undocumented
Variable	`HC`	Undocumented
Variable	`LC`	Undocumented
Variable	`VOCABULARY_INTRODUCTION`	Undocumented
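The sketch below is a rough, plain-Python illustration of the lexical co-occurrence analysis summarized for `TextTilingTokenizer`. It is a simplified block comparison, not NLTK's code. It assumes whitespace tokens and groups them into pseudo-sentences of `w` tokens. Each gap is then scored by the cosine similarity between the word counts of up to `k` pseudo-sentences on either side, and low scores mark likely subtopic shifts. The helper names `cosine` and `gap_scores` and the small default values are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def gap_scores(tokens: list[str], w: int = 4, k: int = 2) -> list[float]:
    """Hypothetical helper: score each gap between pseudo-sentences of w tokens.

    The gap is scored by comparing the (up to) k pseudo-sentences before it
    with the (up to) k pseudo-sentences after it.
    """
    seqs = [tokens[i:i + w] for i in range(0, len(tokens), w)]
    scores = []
    for gap in range(1, len(seqs)):
        left = Counter(t for s in seqs[max(0, gap - k):gap] for t in s)
        right = Counter(t for s in seqs[gap:gap + k] for t in s)
        scores.append(cosine(left, right))
    return scores


text = "cats purr cats nap " * 2 + "stocks fell stocks rose " * 2
print(gap_scores(text.split()))  # roughly [0.71, 0.0, 0.71]; the dip marks the shift
```

The middle gap shares no vocabulary across its two sides, so its score drops to zero. In the full algorithm, that kind of valley is what becomes a section boundary.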

def demo(text=None):

Undocumented

def smooth(x, window_len=11, window='flat'):

Smooth the data using a window of the requested size.

This method is based on the convolution of a scaled window with the signal. Before convolving, the signal is padded at both ends with reflected copies of itself (about one window length each), so that transients at the beginning and end of the output signal are minimized.
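Here is a minimal pure-Python sketch of the padding-and-convolution idea. It assumes a flat (moving-average) window, simple mirror reflection at the ends, and an odd `window_len`. The function name `smooth_flat` is hypothetical, and the exact padding that `smooth` itself uses may differ. The sketch only shows how padding keeps the output the same length as the input while reducing edge transients.

```python
def smooth_flat(x: list[float], window_len: int = 5) -> list[float]:
    """Hypothetical moving-average smoother with mirror padding (sketch only)."""
    if window_len < 3 or window_len % 2 == 0:
        raise ValueError("window_len should be an odd integer >= 3")
    if len(x) < window_len:
        raise ValueError("input must be at least as long as the window")
    half = window_len // 2
    # Mirror `half` samples at each end, excluding the edge sample itself,
    # so that every window position has a full set of neighbours.
    padded = x[half:0:-1] + x + x[-2:-half - 2:-1]
    weights = [1.0 / window_len] * window_len  # flat window scaled to sum to 1
    # "Valid" convolution over the padded signal: one output per input sample.
    return [
        sum(w * padded[i + j] for j, w in enumerate(weights))
        for i in range(len(x))
    ]
```

With mirror padding, a constant input such as `[2.0] * 10` comes back unchanged, including its first and last samples. Zero padding would instead pull the edge values toward 0.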

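Example:

import numpy
from nltk.tokenize.texttiling import smooth
t = numpy.linspace(-2, 2, 41)  # 41 points from -2 to 2, i.e. a step of 0.1
x = numpy.sin(t) + numpy.random.randn(len(t)) * 0.1
y = smooth(x)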

TODO: the window parameter could also accept the window itself, as an array, instead of a string.

Parameters
x	the input signal
window_len	the dimension of the smoothing window; should be an odd integer
window	the type of window: one of 'flat', 'hanning', 'hamming', 'bartlett', or 'blackman'. A 'flat' window produces moving-average smoothing.
Returns
the smoothed signal
See Also
numpy.hanning, numpy.hamming, numpy.bartlett, numpy.blackman, numpy.convolve, scipy.signal.lfilter
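The window names accepted by the `window` parameter correspond to the NumPy window functions listed under See Also. The sketch below illustrates two of them without NumPy. It builds a flat window and a Hann window, using the formula documented for `numpy.hanning`, w(n) = 0.5 - 0.5 cos(2πn/(M - 1)). Each window is then scaled to sum to 1, which is one reading of the "scaled window" in the description. The name `window_weights` is hypothetical, and only two of the five window types are covered.

```python
from math import cos, pi


def window_weights(window_len: int, window: str = "flat") -> list[float]:
    """Hypothetical helper: normalized weights for a 'flat' or 'hanning' window."""
    if window == "flat":
        raw = [1.0] * window_len
    elif window == "hanning":
        # Hann window, following the formula documented for numpy.hanning(M)
        raw = [0.5 - 0.5 * cos(2 * pi * n / (window_len - 1)) for n in range(window_len)]
    else:
        raise ValueError("this sketch only covers 'flat' and 'hanning'")
    total = sum(raw)
    return [w / total for w in raw]  # scale so the weights sum to 1
```

For `window_len=5`, the Hann weights are approximately `[0, 0.25, 0.5, 0.25, 0]`, so each output sample is dominated by its nearest neighbours. The flat window weighs all five samples equally at 0.2.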

DEFAULT_SMOOTHING: list[int]

Undocumented

Value

[0]

BLOCK_COMPARISON

Undocumented

HC

Undocumented

LC

Undocumented
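`HC` and `LC` are undocumented here. One assumption, not confirmed by this page, is that they choose between two boundary cutoffs over the depth scores in the style of TextTiling. Under that reading, the conservative cutoff is mean - sd/2 and the more liberal one is mean - sd. The sketch below shows how such a policy could turn gap scores into boundaries. A gap's depth is how far its score sits below the nearest peak on each side. The functions `depth_scores` and `boundaries` are hypothetical, as are the string policy names standing in for the constants and the `depth > 0` filter.

```python
from statistics import mean, pstdev


def depth_scores(scores: list[float]) -> list[float]:
    """Hypothetical: climb left and right while scores rise; sum both drops."""
    depths = []
    for i, s in enumerate(scores):
        left = i
        while left > 0 and scores[left - 1] >= scores[left]:
            left -= 1
        right = i
        while right < len(scores) - 1 and scores[right + 1] >= scores[right]:
            right += 1
        depths.append((scores[left] - s) + (scores[right] - s))
    return depths


def boundaries(scores: list[float], policy: str = "HC") -> list[int]:
    """Hypothetical: gaps whose depth exceeds the cutoff for the given policy."""
    depths = depth_scores(scores)
    avg, sd = mean(depths), pstdev(depths)
    # Assumed mapping: "LC" -> mean - sd (liberal), "HC" -> mean - sd/2 (conservative)
    cutoff = avg - sd if policy == "LC" else avg - sd / 2
    # Simplification in this sketch: only genuine valleys (depth > 0) qualify
    return [i for i, d in enumerate(depths) if d > 0 and d > cutoff]
```

Because mean - sd/2 is never lower than mean - sd, the conservative policy in this sketch can only return the same boundaries as the liberal one, or a subset of them.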

VOCABULARY_INTRODUCTION

Undocumented