Class TokenizerImpl

java.lang.Object
com.sun.speech.freetts.en.TokenizerImpl
All Implemented Interfaces:
Tokenizer

public class TokenizerImpl extends Object implements Tokenizer
Implements the Tokenizer interface. Breaks an input sequence of characters into a sequence of tokens.
  • Field Details

    • EOF

      public static final int EOF
      A constant indicating that the end of the stream has been read.
    • DEFAULT_WHITESPACE_SYMBOLS

      public static final String DEFAULT_WHITESPACE_SYMBOLS
      A string containing the default whitespace characters.
    • DEFAULT_SINGLE_CHAR_SYMBOLS

      public static final String DEFAULT_SINGLE_CHAR_SYMBOLS
      A string containing the default single-character symbols.
    • DEFAULT_PREPUNCTUATION_SYMBOLS

      public static final String DEFAULT_PREPUNCTUATION_SYMBOLS
      A string containing the default pre-punctuation characters.
    • DEFAULT_POSTPUNCTUATION_SYMBOLS

      public static final String DEFAULT_POSTPUNCTUATION_SYMBOLS
      A string containing the default post-punctuation characters.
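
The four symbol classes above suggest how a raw chunk of text might be divided: whitespace separates chunks, leading pre-punctuation and trailing post-punctuation are peeled off, and what remains is the word. The following is a minimal sketch of that idea only. The class SymbolSplitSketch, its split method, and the PRE and POST values are hypothetical and chosen for illustration; they are not taken from TokenizerImpl and are not claimed to match the library's default constants.

```java
class SymbolSplitSketch {
    // Hypothetical symbol sets for illustration; NOT the library's default values.
    static final String PRE = "\"'`({[";
    static final String POST = "\"'`.,:;!?(){}[]";

    /**
     * Splits one whitespace-delimited chunk into
     * {prepunctuation, word, postpunctuation}.
     */
    static String[] split(String chunk, String pre, String post) {
        // Consume leading characters that belong to the pre-punctuation set.
        int start = 0;
        while (start < chunk.length() && pre.indexOf(chunk.charAt(start)) >= 0) {
            start++;
        }
        // Consume trailing characters that belong to the post-punctuation set.
        int end = chunk.length();
        while (end > start && post.indexOf(chunk.charAt(end - 1)) >= 0) {
            end--;
        }
        return new String[] {
            chunk.substring(0, start),
            chunk.substring(start, end),
            chunk.substring(end)
        };
    }

    public static void main(String[] args) {
        String[] parts = split("(\"Hello!\")", PRE, POST);
        System.out.println(String.join(" | ", parts)); // prints: (" | Hello | !")
    }
}
```

Passing the symbol sets as parameters mirrors the way the setter methods below let a caller replace each set independently.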
  • Constructor Details

    • TokenizerImpl

      public TokenizerImpl()
      Constructs a Tokenizer.
    • TokenizerImpl

      public TokenizerImpl(String string)
      Creates a tokenizer that will return tokens from the given string.
      Parameters:
      string - the string to tokenize
    • TokenizerImpl

      public TokenizerImpl(Reader file)
      Creates a tokenizer that will return tokens from the given Reader, for example one opened on a file.
      Parameters:
      file - where to read the input from
  • Method Details

    • setWhitespaceSymbols

      public void setWhitespaceSymbols(String symbols)
      Sets the whitespace symbols of this Tokenizer to the given symbols.
      Specified by:
      setWhitespaceSymbols in interface Tokenizer
      Parameters:
      symbols - the whitespace symbols
    • setSingleCharSymbols

      public void setSingleCharSymbols(String symbols)
      Sets the single character symbols of this Tokenizer to the given symbols.
      Specified by:
      setSingleCharSymbols in interface Tokenizer
      Parameters:
      symbols - the single character symbols
    • setPrepunctuationSymbols

      public void setPrepunctuationSymbols(String symbols)
      Sets the prepunctuation symbols of this Tokenizer to the given symbols.
      Specified by:
      setPrepunctuationSymbols in interface Tokenizer
      Parameters:
      symbols - the prepunctuation symbols
    • setPostpunctuationSymbols

      public void setPostpunctuationSymbols(String symbols)
      Sets the postpunctuation symbols of this Tokenizer to the given symbols.
      Specified by:
      setPostpunctuationSymbols in interface Tokenizer
      Parameters:
      symbols - the postpunctuation symbols
    • setInputText

      public void setInputText(String inputString)
      Sets the text to tokenize.
      Specified by:
      setInputText in interface Tokenizer
      Parameters:
      inputString - the string to tokenize
    • setInputReader

      public void setInputReader(Reader reader)
      Sets the input reader.
      Specified by:
      setInputReader in interface Tokenizer
      Parameters:
      reader - the input source
    • getNextToken

      public Token getNextToken()
      Returns the next token.
      Specified by:
      getNextToken in interface Tokenizer
      Returns:
      the next token if it exists; null if there are no more tokens
    • hasMoreTokens

      public boolean hasMoreTokens()
      Returns true if there are more tokens, false otherwise.
      Specified by:
      hasMoreTokens in interface Tokenizer
      Returns:
      true if there are more tokens; false otherwise
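
Together, hasMoreTokens and getNextToken describe a conventional drain loop: test first, then fetch, and treat null as the end-of-input signal. The sketch below shows that loop against a stand-in so that it runs without the library. TokenLoopSketch, TokenSource, fromString, and drain are hypothetical names; the stand-in returns String where the documented method returns Token, and it splits on whitespace only.

```java
import java.util.ArrayList;
import java.util.List;

class TokenLoopSketch {
    /** Hypothetical stand-in exposing only the two documented iteration methods. */
    interface TokenSource {
        boolean hasMoreTokens();
        String getNextToken(); // assumption: String here; the documented method returns Token
    }

    /** Whitespace-only splitter, present just to make the sketch runnable. */
    static TokenSource fromString(String text) {
        String trimmed = text.trim();
        final String[] words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        return new TokenSource() {
            private int next = 0;

            public boolean hasMoreTokens() {
                return next < words.length;
            }

            public String getNextToken() {
                // Mirrors the documented contract: null when no tokens remain.
                return hasMoreTokens() ? words[next++] : null;
            }
        };
    }

    /** Collects every remaining token from the source. */
    static List<String> drain(TokenSource source) {
        List<String> out = new ArrayList<>();
        while (source.hasMoreTokens()) {
            String token = source.getNextToken();
            if (token == null) {
                break; // defensive: the contract allows null at end of input
            }
            out.add(token);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(drain(fromString("one two  three")));
    }
}
```

With a real Tokenizer the loop body would have the same shape, using Token in place of String.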
    • hasErrors

      public boolean hasErrors()
      Returns true if there were errors while reading tokens.
      Specified by:
      hasErrors in interface Tokenizer
      Returns:
      true if there were errors; false otherwise
    • getErrorDescription

      public String getErrorDescription()
      If hasErrors returns true, returns a description of the error encountered; otherwise returns null.
      Specified by:
      getErrorDescription in interface Tokenizer
      Returns:
      a description of the last error that occurred.
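
The hasErrors and getErrorDescription pair implies that read failures are recorded and queried afterwards rather than thrown to the caller. One way such a contract could be met is sketched below. ErrorReportingSketch, readChar, and failingReader are hypothetical; this is an assumption about the general pattern, not a description of how TokenizerImpl is written.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ErrorReportingSketch {
    private final Reader reader;
    private String errorDescription; // stays null until a read fails

    ErrorReportingSketch(Reader reader) {
        this.reader = reader;
    }

    /** Reads one character; on failure, records the error and reports end of input (-1). */
    int readChar() {
        try {
            return reader.read();
        } catch (IOException e) {
            errorDescription = e.getMessage() != null ? e.getMessage() : e.toString();
            return -1;
        }
    }

    boolean hasErrors() {
        return errorDescription != null;
    }

    String getErrorDescription() {
        return errorDescription;
    }

    /** A Reader that always fails, used only to exercise the error path. */
    static Reader failingReader() {
        return new Reader() {
            @Override
            public int read(char[] buf, int off, int len) throws IOException {
                throw new IOException("simulated read failure");
            }

            @Override
            public void close() {
                // nothing to release
            }
        };
    }

    public static void main(String[] args) {
        ErrorReportingSketch ok = new ErrorReportingSketch(new StringReader("a"));
        ok.readChar();
        System.out.println(ok.hasErrors() + " " + ok.getErrorDescription());

        ErrorReportingSketch bad = new ErrorReportingSketch(failingReader());
        bad.readChar();
        System.out.println(bad.hasErrors() + " " + bad.getErrorDescription());
    }
}
```

A caller following this pattern would check hasErrors once the token loop finishes, since a failed read looks the same as a normal end of input from inside the loop.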
    • isBreak

      public boolean isBreak()
      Determines if the current token should start a new sentence.
      Specified by:
      isBreak in interface Tokenizer
      Returns:
      true if a new sentence should be started