class documentation

Reader for the TIMIT corpus (or any other corpus with the same file layout and use of file formats). The corpus root directory should contain the following files:

  • timitdic.txt: dictionary of standard transcriptions
  • spkrinfo.txt: table of speaker information

In addition, the root directory should contain one subdirectory for each speaker, containing four files for each utterance:

  • <utterance-id>.txt: text content of utterances
  • <utterance-id>.wrd: tokenized text content of utterances
  • <utterance-id>.phn: phonetic transcription of utterances
  • <utterance-id>.wav: utterance sound file
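
As a rough illustration of this layout, the sketch below builds the four per-utterance paths under a corpus root. The helper name `utterance_files`, the root `timit`, and the sample identifier `dr1-fvmh0/sa1` are assumptions made for the example; none of them is part of the reader's API.

```python
from pathlib import PurePosixPath

# The four per-utterance file types listed above.
EXTENSIONS = ("txt", "wrd", "phn", "wav")


def utterance_files(root, utterance_id):
    """Hypothetical helper: map each extension to the expected file path."""
    base = PurePosixPath(root) / utterance_id
    return {ext: str(base) + "." + ext for ext in EXTENSIONS}


# "timit" and "dr1-fvmh0/sa1" are illustrative values only.
files = utterance_files("timit", "dr1-fvmh0/sa1")
for ext, path in files.items():
    print(ext, path)
```
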
Method __init__ Construct a new TIMIT corpus reader in the given directory.
Method audiodata Undocumented
Method fileids Return a list of file identifiers for the files that make up this corpus.
Method phone_times Offsets are expressed as a number of samples at a 16 kHz sampling rate.
Method phone_trees Undocumented
Method phones Undocumented
Method play Play the given audio sample.
Method sent_times Undocumented
Method sentid Undocumented
Method sents Undocumented
Method spkrid Undocumented
Method spkrinfo No summary
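Method spkrutteranceids Return a list of all utterances associated with a given speaker.
Method transcription_dict Return a dictionary giving the 'standard' transcription for each word.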
Method utterance Undocumented
Method utteranceids Return a list of the utterance identifiers for all utterances in this corpus, or for the given speaker, dialect region, gender, sentence type, or sentence number, if specified.
Method wav Undocumented
Method word_times Undocumented
Method words Undocumented
Instance Variable speakers Undocumented
Method _utterance_fileids Undocumented
Constant _FILE_RE A regexp matching fileids that are used by this corpus reader.
Constant _UTTERANCE_RE Undocumented
Instance Variable _root The root directory for this corpus.
Instance Variable _speakerinfo Undocumented
Instance Variable _utterances A list of the utterance identifiers for all utterances in this corpus.

Inherited from CorpusReader:

Method __repr__ Undocumented
Method abspath Return the absolute path for the given file.
Method abspaths Return a list of the absolute paths for all fileids in this corpus; or for the given list of fileids, if specified.
Method citation Return the contents of the corpus citation.bib file, if it exists.
Method encoding Return the unicode encoding for the given corpus file, if known. If the encoding is unknown, or if the given file should be processed using byte strings (str), then return None.
Method ensure_loaded Load this corpus (if it has not already been loaded). This is used by LazyCorpusLoader as a simple method that can be used to make sure a corpus is loaded -- e.g., in case a user wants to do help(some_corpus).
Method license Return the contents of the corpus LICENSE file, if it exists.
Method open Return an open stream that can be used to read the given file. If the file's encoding is not None, then the stream will automatically decode the file's contents into unicode.
Method readme Return the contents of the corpus README file, if it exists.
Class Variable root Undocumented
Method _get_root Undocumented
Instance Variable _encoding The default unicode encoding for the fileids that make up this corpus. If encoding is None, then the file contents are processed using byte strings.
Instance Variable _fileids A list of the relative paths for the fileids that make up this corpus.
Instance Variable _tagset Undocumented
def __init__(self, root, encoding='utf8'): (source)

Construct a new TIMIT corpus reader in the given directory.

Parameters
root: The root directory for this corpus.

def audiodata(self, utterance, start=0, end=None): (source)

Undocumented

def fileids(self, filetype=None): (source)

Return a list of file identifiers for the files that make up this corpus.

Parameters
filetype: If specified, only the files of the given type are returned. Accepted values are: txt, wrd, phn, wav, or metadata.
def phone_times(self, utterances=None): (source)

Offsets are expressed as a number of samples at a 16 kHz sampling rate.
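
Because the offsets count 16 kHz samples, dividing by 16,000 converts them to seconds. The following is a minimal sketch of that arithmetic; the helper `samples_to_seconds` is hypothetical, and the `(phone, start, end)` tuple shape and its values are assumptions used only for illustration.

```python
SAMPLE_RATE_HZ = 16_000  # sampling rate stated in the docstring above


def samples_to_seconds(offset):
    """Hypothetical helper: convert a sample offset to seconds."""
    return offset / SAMPLE_RATE_HZ


# Made-up entry; the (phone, start, end) layout is an assumption.
phone, start, end = ("sh", 3200, 4800)
print(phone, samples_to_seconds(start), samples_to_seconds(end))
```
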

def phone_trees(self, utterances=None): (source)

Undocumented

def phones(self, utterances=None): (source)

Undocumented

def play(self, utterance, start=0, end=None): (source)

Play the given audio sample.

Parameters
utterance: The utterance id of the sample to play.
start: Undocumented
end: Undocumented
def sent_times(self, utterances=None): (source)

Undocumented

def sentid(self, utterance): (source)

Undocumented

def sents(self, utterances=None): (source)

Undocumented

def spkrid(self, utterance): (source)

Undocumented

def spkrinfo(self, speaker): (source)
Returns
A dictionary mapping .. something.
def spkrutteranceids(self, speaker): (source)

Returns
A list of all utterances associated with a given speaker.
def transcription_dict(self): (source)

Returns
A dictionary giving the 'standard' transcription for each word.
def utterance(self, spkrid, sentid): (source)

Undocumented

def utteranceids(self, dialect=None, sex=None, spkrid=None, sent_type=None, sentid=None): (source)

Returns
A list of the utterance identifiers for all utterances in this corpus, or for the given speaker, dialect region, gender, sentence type, or sentence number, if specified.
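
To make the filtering idea concrete, here is a standalone sketch that selects utterance identifiers by dialect region, gender, and sentence type. It is not the reader's implementation. It assumes identifiers shaped like `dr1-fvmh0/sa1` (dialect digit, then `f` or `m`, then a speaker code, then a sentence type and number), and `ID_RE` and `filter_utterances` are hypothetical names.

```python
import re

# Assumed identifier shape: dr<dialect>-<sex><speaker>/<sentence type><number>
ID_RE = re.compile(
    r"dr(?P<dialect>\d)-(?P<sex>[fm])(?P<name>\w+)/"
    r"(?P<sent_type>[a-z]+)(?P<num>\d+)"
)


def filter_utterances(ids, dialect=None, sex=None, sent_type=None):
    """Hypothetical filter: keep ids whose parsed fields match every given criterion."""
    kept = []
    for uid in ids:
        m = ID_RE.fullmatch(uid)
        if m is None:
            continue  # skip ids that do not follow the assumed shape
        if dialect is not None and m["dialect"] != dialect:
            continue
        if sex is not None and m["sex"] != sex:
            continue
        if sent_type is not None and m["sent_type"] != sent_type:
            continue
        kept.append(uid)
    return kept


# Illustrative identifiers only.
ids = ["dr1-fvmh0/sa1", "dr1-mcpm0/sx24", "dr2-faem0/si762"]
print(filter_utterances(ids, sex="f"))
print(filter_utterances(ids, dialect="1", sent_type="sx"))
```
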
def wav(self, utterance, start=0, end=None): (source)

Undocumented

def word_times(self, utterances=None): (source)

Undocumented

def words(self, utterances=None): (source)

Undocumented

speakers = (source)

Undocumented

def _utterance_fileids(self, utterances, extension): (source)

Undocumented

_FILE_RE = (source)

A regexp matching fileids that are used by this corpus reader.

Value
'(\\w+-\\w+/\\w+\\.(phn|txt|wav|wrd))|' + 'timitdic\\.txt|spkrinfo\\.txt'
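
The pattern can be tried out on a few candidate names with the standard `re` module. This sketch copies the value shown above into a raw string; the candidate file names are illustrative, and the use of `re.fullmatch` is an assumption about how one might apply the pattern, not a statement of how the reader applies it.

```python
import re

# Pattern copied from the Value shown above, written as raw strings.
FILE_RE = r"(\w+-\w+/\w+\.(phn|txt|wav|wrd))|" + r"timitdic\.txt|spkrinfo\.txt"

candidates = [
    "dr1-fvmh0/sa1.phn",  # per-utterance file (illustrative name)
    "timitdic.txt",       # corpus-level dictionary
    "spkrinfo.txt",       # corpus-level speaker table
    "README",             # not matched by the pattern
    "dr1-fvmh0/sa1.mp3",  # extension not in the accepted set
]

matched = [name for name in candidates if re.fullmatch(FILE_RE, name)]
print(matched)
```
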
_UTTERANCE_RE: str = (source)

Undocumented

Value
'\\w+-\\w+/\\w+\\.txt'
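
One plausible use of this pattern is to derive utterance identifiers from the `.txt` file ids by dropping the extension. The sketch below shows that idea under stated assumptions: the file ids are illustrative, and the derivation is a guess at the intent, not a copy of the reader's code.

```python
import re

# Pattern copied from the Value shown above, written as a raw string.
UTTERANCE_RE = r"\w+-\w+/\w+\.txt"

# Illustrative file ids only.
fileids = ["dr1-fvmh0/sa1.txt", "dr1-fvmh0/sa1.wav", "timitdic.txt"]

# Keep the matching .txt files and strip the ".txt" suffix.
utterance_ids = [
    name[: -len(".txt")]
    for name in fileids
    if re.fullmatch(UTTERANCE_RE, name)
]
print(utterance_ids)
```
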

_root = (source)

The root directory for this corpus.

_speakerinfo: dict = (source)

Undocumented

_utterances = (source)

A list of the utterance identifiers for all utterances in this corpus.