nltk.corpus.reader.TimitCorpusReader

class documentation

class TimitCorpusReader(CorpusReader):

Constructor: TimitCorpusReader(root, encoding)

Reader for the TIMIT corpus (or any other corpus with the same file layout and use of file formats). The corpus root directory should contain the following files:

timitdic.txt: dictionary of standard transcriptions

spkrinfo.txt: table of speaker information

In addition, the root directory should contain one subdirectory for each speaker, containing four files for each utterance:

<utterance-id>.txt: text content of utterances

<utterance-id>.wrd: tokenized text content of utterances

<utterance-id>.phn: phonetic transcription of utterances

<utterance-id>.wav: utterance sound file
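As a rough illustration of the layout described above, the sketch below builds a tiny placeholder corpus root in a temporary directory. The speaker directory name `dr1-fvmh0` and the utterance id `sa1` are illustrative values chosen to fit the naming pattern the reader expects, and the helper `make_placeholder_root` is hypothetical. The files are empty stand-ins, not real TIMIT data.

```python
import tempfile
from pathlib import Path


def make_placeholder_root(base, speaker="dr1-fvmh0", utterance="sa1"):
    """Create an empty skeleton with the file layout described above."""
    root = Path(base)
    # Corpus-level metadata files live directly in the root directory.
    for name in ("timitdic.txt", "spkrinfo.txt"):
        (root / name).touch()
    # One subdirectory per speaker, holding the files for each utterance.
    speaker_dir = root / speaker
    speaker_dir.mkdir()
    for ext in ("txt", "wrd", "phn", "wav"):
        (speaker_dir / f"{utterance}.{ext}").touch()
    return root


with tempfile.TemporaryDirectory() as tmp:
    root = make_placeholder_root(tmp)
    # Collect relative paths so the layout can be inspected.
    layout = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )
    print(layout)
```

A directory of this shape, populated with real data, is what would be passed as `root` to `TimitCorpusReader(root)`.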

Method	`__init__`	Construct a new TIMIT corpus reader in the given directory.
Method	`audiodata`	Undocumented
Method	`fileids`	Return a list of file identifiers for the files that make up this corpus.
Method	`phone_times`	Return phones with their start and end offsets; each offset is a number of samples at 16 kHz.
Method	`phone_trees`	Undocumented
Method	`phones`	Undocumented
Method	`play`	Play the given audio sample.
Method	`sent_times`	Undocumented
Method	`sentid`	Undocumented
Method	`sents`	Undocumented
Method	`spkrid`	Undocumented
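Method	`spkrinfo`	Return the entry from the speaker information table (spkrinfo.txt) for the given speaker.
Method	`spkrutteranceids`	Return a list of all utterances associated with a given speaker.
Method	`transcription_dict`	Return a dictionary giving the 'standard' transcription for each word.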
Method	`utterance`	Undocumented
Method	`utteranceids`	Return a list of the utterance identifiers for all utterances in this corpus, or for the given speaker, dialect region, gender, sentence type, or sentence number, if specified.
Method	`wav`	Undocumented
Method	`word_times`	Undocumented
Method	`words`	Undocumented
Instance Variable	`speakers`	Undocumented
Method	`_utterance_fileids`	Undocumented
Constant	`_FILE_RE`	A regexp matching fileids that are used by this corpus reader.
Constant	`_UTTERANCE_RE`	Undocumented
Instance Variable	`_root`	The root directory for this corpus.
Instance Variable	`_speakerinfo`	Undocumented
Instance Variable	`_utterances`	A list of the utterance identifiers for all utterances in this corpus.

Inherited from CorpusReader:

Method	`__repr__`	Undocumented
Method	`abspath`	Return the absolute path for the given file.
Method	`abspaths`	Return a list of the absolute paths for all fileids in this corpus; or for the given list of fileids, if specified.
Method	`citation`	Return the contents of the corpus citation.bib file, if it exists.
Method	`encoding`	Return the unicode encoding for the given corpus file, if known. If the encoding is unknown, or if the given file should be processed using byte strings (str), then return None.
Method	`ensure_loaded`	Load this corpus (if it has not already been loaded). This is used by LazyCorpusLoader as a simple method that can be used to make sure a corpus is loaded -- e.g., in case a user wants to do help(some_corpus).
Method	`license`	Return the contents of the corpus LICENSE file, if it exists.
Method	`open`	Return an open stream that can be used to read the given file. If the file's encoding is not None, then the stream will automatically decode the file's contents into unicode.
Method	`readme`	Return the contents of the corpus README file, if it exists.
Class Variable	`root`	Undocumented
Method	`_get_root`	Undocumented
Instance Variable	`_encoding`	The default unicode encoding for the fileids that make up this corpus. If `encoding` is None, then the file contents are processed using byte strings.
Instance Variable	`_fileids`	A list of the relative paths for the fileids that make up this corpus.
Instance Variable	`_tagset`	Undocumented

def __init__(self, root, encoding='utf8'):

overrides nltk.corpus.reader.CorpusReader.__init__

Construct a new TIMIT corpus reader in the given directory.

Parameters
root	The root directory for this corpus.

def audiodata(self, utterance, start=0, end=None):

Undocumented

def fileids(self, filetype=None):

overrides nltk.corpus.reader.CorpusReader.fileids

Return a list of file identifiers for the files that make up this corpus.

Parameters
filetype	If specified, only the files of the given type are returned. Accepted values are: `txt`, `wrd`, `phn`, `wav`, or `metadata`.
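The effect of `filetype` can be pictured with the standalone sketch below. It is not the library's code: `select_fileids` is a hypothetical helper, the utterance id is an illustrative value, and both the handling of `None` and the `ValueError` for an unknown type are assumptions about reasonable behaviour.

```python
METADATA_FILES = ["timitdic.txt", "spkrinfo.txt"]
UTTERANCE_TYPES = ("txt", "wrd", "phn", "wav")


def select_fileids(utterance_ids, filetype=None):
    """Standalone sketch of filtering corpus files by type."""
    if filetype is None:
        # No filter: every per-utterance file plus the metadata files.
        per_utterance = [
            f"{uid}.{ext}" for uid in utterance_ids for ext in UTTERANCE_TYPES
        ]
        return per_utterance + METADATA_FILES
    if filetype in UTTERANCE_TYPES:
        # One file of the requested type per utterance.
        return [f"{uid}.{filetype}" for uid in utterance_ids]
    if filetype == "metadata":
        return list(METADATA_FILES)
    # Assumption: anything else is rejected rather than silently ignored.
    raise ValueError(f"Bad value for filetype: {filetype!r}")


ids = ["dr1-fvmh0/sa1"]  # illustrative utterance id
wav_files = select_fileids(ids, "wav")
metadata_files = select_fileids(ids, "metadata")
print(wav_files, metadata_files)
```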

def phone_times(self, utterances=None):

Offsets are represented as a number of samples at 16 kHz.
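Because offsets are sample counts rather than seconds, a conversion step is usually needed before comparing them with wall-clock times. The sketch below assumes each entry is a `(phone, start, end)` tuple; the sample values and the helper `to_seconds` are hypothetical and only illustrate the arithmetic.

```python
SAMPLE_RATE_HZ = 16_000  # offsets are counted in 16 kHz samples


def to_seconds(sample_offset, sample_rate=SAMPLE_RATE_HZ):
    """Convert a sample offset into seconds."""
    return sample_offset / sample_rate


# Hypothetical (phone, start, end) entries, with offsets in samples.
timed_phones = [("h#", 0, 2400), ("sh", 2400, 4000)]

in_seconds = [
    (phone, to_seconds(start), to_seconds(end))
    for phone, start, end in timed_phones
]
print(in_seconds)
```

The same conversion would apply to any other offsets that are expressed in samples at the same rate.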

def phone_trees(self, utterances=None):

Undocumented

def phones(self, utterances=None):

Undocumented

def play(self, utterance, start=0, end=None):

Play the given audio sample.

Parameters
utterance	The utterance id of the sample to play
start	Undocumented
end	Undocumented

def sent_times(self, utterances=None):

Undocumented

def sentid(self, utterance):

Undocumented

def sents(self, utterances=None):

Undocumented

def spkrid(self, utterance):

Undocumented

def spkrinfo(self, speaker):

Returns
The entry from the speaker information table (spkrinfo.txt) for the given speaker.

def spkrutteranceids(self, speaker):

Returns
A list of all utterances associated with a given speaker.

def transcription_dict(self):

Returns
A dictionary giving the 'standard' transcription for each word.

def utterance(self, spkrid, sentid):

Undocumented

def utteranceids(self, dialect=None, sex=None, spkrid=None, sent_type=None, sentid=None):

Returns
A list of the utterance identifiers for all utterances in this corpus, or for the given speaker, dialect region, gender, sentence type, or sentence number, if specified.
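One way to picture these filters is as constraints on the parts of an utterance identifier. The sketch below is not the library's implementation. It assumes identifiers shaped like `dr<region>-<sex><speaker>/s<type><number>` (for example `dr1-fvmh0/sa1`), that each filter is a single string, and that `ID_PATTERN`, `filter_utterances` and the sample ids are hypothetical.

```python
import re

# Assumed id shape: dr<region>-<sex><speaker>/s<type><number>
ID_PATTERN = re.compile(
    r"dr(?P<dialect>\d)-(?P<sex>[fm])\w+/s(?P<sent_type>[aix])\d+"
)


def filter_utterances(ids, dialect=None, sex=None, spkrid=None,
                      sent_type=None, sentid=None):
    """Keep the ids that satisfy every filter that was supplied."""
    selected = []
    for uid in ids:
        match = ID_PATTERN.fullmatch(uid)
        if match is None:
            continue  # skip ids that do not fit the assumed shape
        speaker, sentence = uid.split("/")
        checks = [
            (dialect, match["dialect"]),
            (sex, match["sex"]),
            (spkrid, speaker),
            (sent_type, match["sent_type"]),
            (sentid, sentence),
        ]
        # A filter left as None places no constraint on the id.
        if all(wanted is None or wanted == actual for wanted, actual in checks):
            selected.append(uid)
    return selected


sample_ids = ["dr1-fvmh0/sa1", "dr1-fvmh0/sx206", "dr2-mabc0/si1234"]
female = filter_utterances(sample_ids, sex="f")
region_two_si = filter_utterances(sample_ids, dialect="2", sent_type="i")
print(female, region_two_si)
```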

def wav(self, utterance, start=0, end=None):

Undocumented

def word_times(self, utterances=None):

Undocumented

def words(self, utterances=None):

Undocumented

speakers

Undocumented

def _utterance_fileids(self, utterances, extension):

Undocumented

_FILE_RE

A regexp matching fileids that are used by this corpus reader.

Value

'(\\w+-\\w+/\\w+\\.(phn|txt|wav|wrd))|' + 'timitdic\\.txt|spkrinfo\\.txt'
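To see which file identifiers this expression accepts, it can be tried against a few candidates. The sketch below copies the pattern shown above into a raw string and tests whole-string matches with `re.fullmatch`. How the reader itself applies the pattern is not shown here, so the use of `fullmatch` is an assumption, and the candidate names and `is_corpus_file` helper are illustrative.

```python
import re

# The pattern shown above, written as a single raw string.
FILE_RE = r"(\w+-\w+/\w+\.(phn|txt|wav|wrd))|timitdic\.txt|spkrinfo\.txt"


def is_corpus_file(fileid):
    """Return True if the whole fileid matches the pattern."""
    return re.fullmatch(FILE_RE, fileid) is not None


candidates = [
    "dr1-fvmh0/sa1.wav",  # per-utterance file in a speaker directory
    "timitdic.txt",       # corpus-level metadata file
    "README",             # not covered by the pattern
    "dr1-fvmh0/sa1.mp3",  # extension outside phn|txt|wav|wrd
]
accepted = [name for name in candidates if is_corpus_file(name)]
print(accepted)
```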

_UTTERANCE_RE: str

Undocumented

Value

'\\w+-\\w+/\\w+\\.txt'
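This pattern selects only the per-utterance `.txt` files, so one plausible use is to derive utterance identifiers from file identifiers. The sketch below does that by dropping the `.txt` suffix and then splits an identifier into its speaker and sentence parts. Both steps are assumptions about how the identifiers relate to the files, since the relevant members are undocumented here, and the helper names and sample file ids are hypothetical.

```python
import re

# The pattern shown above, written as a raw string.
UTTERANCE_RE = r"\w+-\w+/\w+\.txt"


def utterance_ids(fileids):
    """Derive utterance ids from the matching .txt file ids."""
    suffix = ".txt"
    return [
        fileid[: -len(suffix)]
        for fileid in fileids
        if re.fullmatch(UTTERANCE_RE, fileid)
    ]


def split_utterance(utterance_id):
    """Split 'speaker/sentence' into its two parts."""
    speaker, sentence = utterance_id.split("/")
    return speaker, sentence


fileids = [
    "dr1-fvmh0/sa1.txt",
    "dr1-fvmh0/sa1.wav",
    "timitdic.txt",
]
ids = utterance_ids(fileids)
print(ids, split_utterance(ids[0]))
```

Metadata files such as `timitdic.txt` do not match, because the pattern requires a speaker directory before the file name.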

_root

overrides nltk.corpus.reader.CorpusReader._root

The root directory for this corpus.

_speakerinfo: dict

Undocumented

_utterances

A list of the utterance identifiers for all utterances in this corpus.