Package com.sun.speech.freetts.en
Class TokenizerImpl
java.lang.Object
com.sun.speech.freetts.en.TokenizerImpl
- All Implemented Interfaces:
Tokenizer
Implements the Tokenizer interface. Breaks an input sequence of
characters into a sequence of tokens.
-
Field Summary
Fields (modifier and type, field, description):
static final String DEFAULT_POSTPUNCTUATION_SYMBOLS
- A string containing the default post-punctuation characters.
static final String DEFAULT_PREPUNCTUATION_SYMBOLS
- A string containing the default pre-punctuation characters.
static final String DEFAULT_SINGLE_CHAR_SYMBOLS
- A string containing the default single characters.
static final String DEFAULT_WHITESPACE_SYMBOLS
- A string containing the default whitespace characters.
static final int EOF
- A constant indicating that the end of the stream has been read.
-
Constructor Summary
Constructors (constructor, description):
TokenizerImpl()
- Constructs a Tokenizer.
TokenizerImpl(Reader file)
- Creates a tokenizer that will return tokens from the given file.
TokenizerImpl(String string)
- Creates a tokenizer that will return tokens from the given string.
-
Method Summary
Methods (modifier and type, method, description):
String getErrorDescription()
- If hasErrors() returns true, returns a description of the error encountered; otherwise returns null.
Token getNextToken()
- Returns the next token.
boolean hasErrors()
- Returns true if there were errors while reading tokens.
boolean hasMoreTokens()
- Returns true if there are more tokens, false otherwise.
boolean isBreak()
- Determines if the current token should start a new sentence.
void setInputReader(Reader reader)
- Sets the input reader.
void setInputText(String inputString)
- Sets the text to tokenize.
void setPostpunctuationSymbols(String symbols)
- Sets the postpunctuation symbols of this Tokenizer to the given symbols.
void setPrepunctuationSymbols(String symbols)
- Sets the prepunctuation symbols of this Tokenizer to the given symbols.
void setSingleCharSymbols(String symbols)
- Sets the single character symbols of this Tokenizer to the given symbols.
void setWhitespaceSymbols(String symbols)
- Sets the whitespace symbols of this Tokenizer to the given symbols.
-
Field Details
-
EOF
public static final int EOF
A constant indicating that the end of the stream has been read.
-
DEFAULT_WHITESPACE_SYMBOLS
public static final String DEFAULT_WHITESPACE_SYMBOLS
A string containing the default whitespace characters.
-
DEFAULT_SINGLE_CHAR_SYMBOLS
public static final String DEFAULT_SINGLE_CHAR_SYMBOLS
A string containing the default single characters.
-
DEFAULT_PREPUNCTUATION_SYMBOLS
public static final String DEFAULT_PREPUNCTUATION_SYMBOLS
A string containing the default pre-punctuation characters.
-
DEFAULT_POSTPUNCTUATION_SYMBOLS
public static final String DEFAULT_POSTPUNCTUATION_SYMBOLS
A string containing the default post-punctuation characters.
-
-
Constructor Details
-
TokenizerImpl
public TokenizerImpl()
Constructs a Tokenizer.
-
TokenizerImpl
public TokenizerImpl(String string)
Creates a tokenizer that will return tokens from the given string.
- Parameters:
string - the string to tokenize
-
TokenizerImpl
public TokenizerImpl(Reader file)
Creates a tokenizer that will return tokens from the given file.
- Parameters:
file - where to read the input from
-
-
Method Details
-
setWhitespaceSymbols
public void setWhitespaceSymbols(String symbols)
Sets the whitespace symbols of this Tokenizer to the given symbols.
- Specified by:
setWhitespaceSymbols in interface Tokenizer
- Parameters:
symbols - the whitespace symbols
-
setSingleCharSymbols
public void setSingleCharSymbols(String symbols)
Sets the single character symbols of this Tokenizer to the given symbols.
- Specified by:
setSingleCharSymbols in interface Tokenizer
- Parameters:
symbols - the single character symbols
-
setPrepunctuationSymbols
public void setPrepunctuationSymbols(String symbols)
Sets the prepunctuation symbols of this Tokenizer to the given symbols.
- Specified by:
setPrepunctuationSymbols in interface Tokenizer
- Parameters:
symbols - the prepunctuation symbols
-
setPostpunctuationSymbols
public void setPostpunctuationSymbols(String symbols)
Sets the postpunctuation symbols of this Tokenizer to the given symbols.
- Specified by:
setPostpunctuationSymbols in interface Tokenizer
- Parameters:
symbols - the postpunctuation symbols
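The four symbol classes set by the methods above each play a different role when text is split. The following is a minimal, self-contained sketch of one plausible way such classes can interact; it is not the FreeTTS implementation. The class name SymbolClassSketch, the split method, the "pre|word|post" output form, and the four example symbol sets are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of symbol classes; not the FreeTTS implementation. */
public class SymbolClassSketch {
    // Assumed example sets, chosen for this sketch only.
    static final String WHITESPACE = " \t\n\r";
    static final String SINGLE_CHAR = "()";
    static final String PREPUNCTUATION = "\"'";
    static final String POSTPUNCTUATION = "\"'.,;:!?";

    private static boolean in(String symbols, char c) {
        return symbols.indexOf(c) >= 0;
    }

    /** Splits text into entries of the form "prepunctuation|word|postpunctuation". */
    public static List<String> split(String text) {
        List<String> out = new ArrayList<>();
        int i = 0;
        final int n = text.length();
        while (i < n) {
            char c = text.charAt(i);
            if (in(WHITESPACE, c)) {   // whitespace only separates tokens
                i++;
                continue;
            }
            if (in(SINGLE_CHAR, c)) {  // a single-character symbol is a token by itself
                out.add("|" + c + "|");
                i++;
                continue;
            }
            // Read a run up to the next whitespace or single-character symbol.
            int start = i;
            while (i < n && !in(WHITESPACE, text.charAt(i)) && !in(SINGLE_CHAR, text.charAt(i))) {
                i++;
            }
            String run = text.substring(start, i);
            // Peel prepunctuation off the front, then postpunctuation off the back.
            int a = 0;
            while (a < run.length() && in(PREPUNCTUATION, run.charAt(a))) a++;
            int b = run.length();
            while (b > a && in(POSTPUNCTUATION, run.charAt(b - 1))) b--;
            out.add(run.substring(0, a) + "|" + run.substring(a, b) + "|" + run.substring(b));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String token : split("He said, \"hi!\" (twice)")) {
            System.out.println(token);
        }
    }
}
```

With these assumed sets, the sample sentence yields six entries: |He|, |said|, (with the comma as postpunctuation), "|hi|!" (quote before, exclamation mark and quote after), |(|, |twice| and |)|. Changing any one of the four strings changes where the boundaries fall, which is the kind of effect the setter methods control in a real tokenizer.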
-
setInputText
public void setInputText(String inputString)
Sets the text to tokenize.
- Specified by:
setInputText in interface Tokenizer
- Parameters:
inputString - the string to tokenize
-
setInputReader
public void setInputReader(Reader reader)
Sets the input reader.
- Specified by:
setInputReader in interface Tokenizer
- Parameters:
reader - the input source
-
getNextToken
public Token getNextToken()
Returns the next token.
- Specified by:
getNextToken in interface Tokenizer
- Returns:
the next token if it exists, null if there are no more tokens
-
hasMoreTokens
public boolean hasMoreTokens()
Returns true if there are more tokens, false otherwise.
- Specified by:
hasMoreTokens in interface Tokenizer
- Returns:
true if there are more tokens, false otherwise
-
hasErrors
public boolean hasErrors()
Returns true if there were errors while reading tokens.
-
getErrorDescription
public String getErrorDescription()
If hasErrors() returns true, this returns a description of the error encountered; otherwise it returns null.
- Specified by:
getErrorDescription in interface Tokenizer
- Returns:
a description of the last error that occurred.
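The calling pattern implied by hasMoreTokens, getNextToken, hasErrors and getErrorDescription is to drain tokens in a loop and check for errors afterwards, since read failures are reported through the error methods rather than thrown. The sketch below illustrates that contract with a hypothetical stand-in class; ReaderTokenizer, drain and the whitespace-only splitting are assumptions for illustration and do not reproduce FreeTTS behavior or its Token type.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ErrorContractSketch {

    /** Hypothetical whitespace-only tokenizer that records read errors instead of throwing. */
    public static class ReaderTokenizer {
        private final Reader reader;
        private int current;             // current character, or -1 at end of stream
        private String errorDescription; // stays null until a read fails

        public ReaderTokenizer(Reader reader) {
            this.reader = reader;
            advance();
        }

        private void advance() {
            try {
                current = reader.read();
            } catch (IOException e) {
                errorDescription = e.toString(); // never null, unlike getMessage()
                current = -1;                    // treat a failed read as end of stream
            }
        }

        public boolean hasMoreTokens() {
            while (current != -1 && Character.isWhitespace(current)) {
                advance();
            }
            return current != -1;
        }

        /** Returns the next token, or null if there are no more tokens. */
        public String getNextToken() {
            if (!hasMoreTokens()) {
                return null;
            }
            StringBuilder sb = new StringBuilder();
            while (current != -1 && !Character.isWhitespace(current)) {
                sb.append((char) current);
                advance();
            }
            return sb.toString();
        }

        public boolean hasErrors() {
            return errorDescription != null;
        }

        public String getErrorDescription() {
            return errorDescription;
        }
    }

    /** Reads every remaining token; callers check hasErrors() afterwards. */
    public static List<String> drain(ReaderTokenizer tokenizer) {
        List<String> tokens = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.getNextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        ReaderTokenizer tokenizer = new ReaderTokenizer(new StringReader("one  two\nthree "));
        System.out.println(drain(tokenizer));
        if (tokenizer.hasErrors()) {
            System.err.println(tokenizer.getErrorDescription());
        }
    }
}
```

Checking hasErrors() after the loop matters in this design because a failed read looks the same as a normal end of input to hasMoreTokens(); only the error description distinguishes the two cases.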
-
isBreak
public boolean isBreak()
Determines if the current token should start a new sentence.
-