Class TextTokenizer

  • All Implemented Interfaces:
    java.lang.Iterable<javolution.text.Text>, java.util.Enumeration<javolution.text.Text>, java.util.Iterator<javolution.text.Text>, javolution.lang.Realtime, javolution.lang.Reusable

    public final class TextTokenizer
    extends java.lang.Object
    implements java.util.Enumeration<javolution.text.Text>, java.util.Iterator<javolution.text.Text>, java.lang.Iterable<javolution.text.Text>, javolution.lang.Realtime, javolution.lang.Reusable
    The text tokenizer class allows an application to break a Text object into tokens. The tokenization method is much simpler than the one used by the StreamTokenizer class. The TextTokenizer methods do not distinguish among identifiers, numbers, and quoted strings, nor do they recognize and skip comments.

    The set of delimiters (the characters that separate tokens) may be specified either at creation time or on a per-token basis.

    An instance of TextTokenizer behaves in one of two ways, depending on whether it was created with the returnDelims flag having the value true or false:

    • If the flag is false, delimiter characters serve to separate tokens. A token is a maximal sequence of consecutive characters that are not delimiters.
    • If the flag is true, delimiter characters are themselves considered to be tokens. A token is thus either one delimiter character, or a maximal sequence of consecutive characters that are not delimiters.
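
    The two modes can be illustrated with java.util.StringTokenizer, which this class is described as being based on and which accepts the same returnDelims flag. This is only a sketch under that assumption: ReturnDelimsDemo and its tokens helper are hypothetical names, and TextTokenizer itself returns Text objects rather than String.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class ReturnDelimsDemo {

    // Hypothetical helper: collects every token the JDK tokenizer produces.
    static List<String> tokens(String text, String delims, boolean returnDelims) {
        StringTokenizer st = new StringTokenizer(text, delims, returnDelims);
        List<String> out = new ArrayList<>();
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // Flag false: delimiters only separate tokens.
        System.out.println(tokens("a,b;c", ",;", false)); // [a, b, c]
        // Flag true: each delimiter character is itself a one-character token.
        System.out.println(tokens("a,b;c", ",;", true));  // [a, ,, b, ;, c]
    }
}
```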

    A TextTokenizer object internally maintains a current position within the text to be tokenized. Some operations advance this current position past the characters processed.

    A token is returned by taking a subtext of the text that was used to create the TextTokenizer object.

    The following is one example of the use of the tokenizer. The code:

         TextTokenizer tt = TextTokenizer.valueOf("this is a test");
         while (tt.hasMoreTokens()) {
             System.out.println(tt.nextToken());
         }
     

    prints the following output:

         this
         is
         a
         test
     

    TextTokenizer is heavily based on java.util.StringTokenizer. However, there are some improvements and additional methods and capabilities.

    Modified by: Joseph A. Huwaldt

    Version:
    February 17, 2025
    Author:
    Joseph A. Huwaldt Date: March 12, 2009
    • Method Summary

      Modifier and Type Method Description
      int countTokens()
      Calculates the number of times that this tokenizer's nextToken method can be called before it generates an exception.
      int countTokens​(java.lang.CharSequence delims)
      Calculates the number of times that this tokenizer's nextToken method can be called before it generates an exception using the given set of delimiters.
      boolean getHonorQuotes()
      Returns true if this tokenizer honors quoted text (counts it as a single token).
      boolean hasMoreElements()
      Returns the same value as the hasMoreTokens method.
      boolean hasMoreTokens()
      Tests if there are more tokens available from this tokenizer's text.
      boolean hasNext()
      Returns the same value as the hasMoreTokens() method.
      java.util.Iterator<javolution.text.Text> iterator()
      Returns an iterator over the tokens returned by this tokenizer.
      static void main​(java.lang.String[] args)
      Testing code for this class.
      static TextTokenizer newInstance()
      Return a text tokenizer with an initially empty string of text and with no delimiters.
      javolution.text.Text next()
      Returns the same value as the nextToken() method.
      javolution.text.Text nextElement()
      Returns the same value as the nextToken method.
      javolution.text.Text nextToken()
      Returns the next token from this text tokenizer.
      javolution.text.Text nextToken​(java.lang.CharSequence delim)
      Returns the next token in this text tokenizer's text.
      static void recycle​(TextTokenizer instance)
      Recycles a TextTokenizer instance immediately (on the stack when executing in a StackContext).
      void remove()
      This implementation always throws UnsupportedOperationException.
      void reset()
      Resets the internal state of this object to its default values.
      javolution.text.Text restOfText()
      Retrieves the rest of the text as a single token.
      void setDelimiters​(java.lang.CharSequence delim)
      Set the delimiters for this TextTokenizer.
      void setHonorQuotes​(boolean honorQuotes)
      Sets whether or not this tokenizer recognizes quoted text using the specified quote character.
      void setQuoteChar​(char quote)
      Set the character to use as the "quote" character.
      void setReturnEmptyTokens​(boolean returnEmptyTokens)
      Set whether empty tokens should be returned from this point in the tokenizing process onward.
      void setText​(java.lang.CharSequence text)
      Set the text to be tokenized in this TextTokenizer.
      javolution.text.Text toText()
      Returns the same value as the nextToken() method.
      static TextTokenizer valueOf​(java.lang.CharSequence text)
      Return a text tokenizer for the specified character sequence.
      static TextTokenizer valueOf​(java.lang.CharSequence text, java.lang.CharSequence delim)
      Return a text tokenizer for the specified character sequence.
      static TextTokenizer valueOf​(java.lang.CharSequence text, java.lang.CharSequence delim, boolean returnDelims)
      Return a text tokenizer for the specified character sequence.
      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
      • Methods inherited from interface java.util.Enumeration

        asIterator
      • Methods inherited from interface java.lang.Iterable

        forEach, spliterator
      • Methods inherited from interface java.util.Iterator

        forEachRemaining
    • Method Detail

      • reset

        public void reset()
        Resets the internal state of this object to its default values.
        Specified by:
        reset in interface javolution.lang.Reusable
      • valueOf

        public static TextTokenizer valueOf​(java.lang.CharSequence text,
                                            java.lang.CharSequence delim,
                                            boolean returnDelims)
        Return a text tokenizer for the specified character sequence. All characters in the delim argument are the delimiters for separating tokens.

        If the returnDelims flag is true, then the delimiter characters are also returned as tokens. Each delimiter is returned as a token of length one. If the flag is false, the delimiter characters are skipped and only serve as separators between tokens.

        Note that if delim is null, this method does not throw an exception. However, trying to invoke other methods on the resulting TextTokenizer may result in a NullPointerException.

        Parameters:
        text - the text to be parsed.
        delim - the delimiters.
        returnDelims - flag indicating whether to return the delimiters as tokens.
      • valueOf

        public static TextTokenizer valueOf​(java.lang.CharSequence text,
                                            java.lang.CharSequence delim)
        Return a text tokenizer for the specified character sequence. The characters in the delim argument are the delimiters for separating tokens. Delimiter characters themselves will not be treated as tokens.
        Parameters:
        text - the text to be parsed.
        delim - the delimiters.
      • valueOf

        public static TextTokenizer valueOf​(java.lang.CharSequence text)
        Return a text tokenizer for the specified character sequence. The tokenizer uses the default delimiter set, which is " \t\n\r\f": the space character, the tab character, the newline character, the carriage-return character, and the form-feed character. Delimiter characters themselves will not be treated as tokens.
        Parameters:
        text - the text to be parsed.
      • setText

        public void setText​(java.lang.CharSequence text)
        Set the text to be tokenized in this TextTokenizer.

        This is useful for TextTokenizer re-use, so that a new tokenizer does not have to be created for each text you want to tokenize.

        The text will be tokenized from the beginning of the text.

        Parameters:
        text - the text to be parsed.
      • setDelimiters

        public void setDelimiters​(java.lang.CharSequence delim)
        Set the delimiters for this TextTokenizer. The position must be initialized before this method is used (setText does this and it is called from the constructor).
        Parameters:
        delim - the delimiters
      • setQuoteChar

        public void setQuoteChar​(char quote)
        Set the character to use as the "quote" character. All text between quote characters is considered a single token. The default quote character is '"'.
        See Also:
        setHonorQuotes(boolean)
      • setHonorQuotes

        public void setHonorQuotes​(boolean honorQuotes)
        Sets whether or not this tokenizer recognizes quoted text using the specified quote character. If true is passed, this tokenizer will consider any text between the specified quote characters as a single token. Honoring of quotes defaults to false.
        See Also:
        setQuoteChar(char)
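        The idea of honoring quotes can be sketched independently of this class. The code below is a hypothetical model, not the library's implementation: QuoteAwareSketch and split are invented names, and the choice to keep the quote characters inside the returned token is an assumption, since the description above does not say whether TextTokenizer keeps or strips them.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteAwareSketch {

    // Hypothetical helper, not the library's code: splits on the delimiter
    // characters, except that delimiters between a pair of quote characters
    // do not end the token. The quote characters are kept in the token.
    static List<String> split(String text, String delims, char quote) {
        List<String> out = new ArrayList<>();
        StringBuilder token = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == quote) {
                inQuotes = !inQuotes;      // entering or leaving quoted text
                token.append(c);
            } else if (!inQuotes && delims.indexOf(c) >= 0) {
                if (token.length() > 0) {  // a delimiter ends the current token
                    out.add(token.toString());
                    token.setLength(0);
                }
            } else {
                token.append(c);
            }
        }
        if (token.length() > 0) {
            out.add(token.toString());     // flush the last token
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(split("copy \"my file.txt\" backup", " ", '"'));
        // [copy, "my file.txt", backup]
    }
}
```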
      • getHonorQuotes

        public boolean getHonorQuotes()
        Returns true if this tokenizer honors quoted text (counts it as a single token).
      • setReturnEmptyTokens

        public void setReturnEmptyTokens​(boolean returnEmptyTokens)
        Set whether empty tokens should be returned from this point in the tokenizing process onward.

        Empty tokens occur when two delimiters are next to each other or a delimiter occurs at the beginning or end of a string. If empty tokens are set to be returned, and a comma is the non token delimiter, the following table shows how many tokens are in each string.

        String          Number of tokens
        "one,two"       2 - normal case with no empty tokens.
        "one,,three"    3 including the empty token in the middle.
        "one,"          2 including the empty token at the end.
        ",two"          2 including the empty token at the beginning.
        ","             2 including the empty tokens at the beginning and the end.
        ""              1 - all strings will have at least one token if empty tokens are returned.
        Parameters:
        returnEmptyTokens - true if and only if empty tokens should be returned.
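        For the strings listed in the table, the token count equals the number of delimiter characters plus one. The JDK's String.split with a limit of -1 follows the same rule, so it can serve as a quick cross-check of those counts. This is a sketch only: EmptyTokenCount and count are hypothetical names, and nothing here implies that TextTokenizer is implemented this way.

```java
class EmptyTokenCount {

    // Hypothetical helper: number of tokens when empty tokens are returned
    // and ',' is the only delimiter. A limit of -1 makes String.split keep
    // trailing empty strings instead of discarding them.
    static int count(String text) {
        return text.split(",", -1).length;
    }

    public static void main(String[] args) {
        String[] samples = {"one,two", "one,,three", "one,", ",two", ",", ""};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + count(s));
        }
    }
}
```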
      • hasMoreTokens

        public boolean hasMoreTokens()
        Tests if there are more tokens available from this tokenizer's text. If this method returns true, then a subsequent call to nextToken with no argument will successfully return a token.
        Returns:
        true if and only if there is at least one token in the text after the current position; false otherwise.
      • nextToken

        public javolution.text.Text nextToken()
        Returns the next token from this text tokenizer.
        Returns:
        the next token from this text tokenizer.
        Throws:
        java.util.NoSuchElementException - if there are no more tokens in this tokenizer's text.
      • nextToken

        public javolution.text.Text nextToken​(java.lang.CharSequence delim)
        Returns the next token in this text tokenizer's text. First, the set of characters considered to be delimiters by this TextTokenizer object is changed to be the characters in the string delim. Then the next token in the text after the current position is returned. The current position is advanced beyond the recognized token. The new delimiter set remains the default after this call.
        Parameters:
        delim - the new delimiters.
        Returns:
        the next token, after switching to the new delimiter set.
        Throws:
        java.util.NoSuchElementException - if there are no more tokens in this tokenizer's text.
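        java.util.StringTokenizer documents the same semantics for its nextToken(String) method, so it can be used to see what "remains the default after this call" means in practice. This is a sketch with the JDK class, assuming TextTokenizer behaves alike; DelimiterSwitchDemo and firstThree are hypothetical names. Note that the second token starts with the space at which the first call stopped, because a space is no longer a delimiter once the set is switched to ",".

```java
import java.util.StringTokenizer;

class DelimiterSwitchDemo {

    // Hypothetical helper: reads three tokens, switching the delimiter set
    // to "," on the second call only.
    static String[] firstThree(String text) {
        StringTokenizer st = new StringTokenizer(text); // whitespace delimiters
        String first = st.nextToken();      // split on whitespace
        String second = st.nextToken(",");  // delimiter set is now ","
        String third = st.nextToken();      // "," is still in effect
        return new String[] {first, second, third};
    }

    public static void main(String[] args) {
        for (String t : firstThree("alpha beta,gamma delta")) {
            System.out.println("[" + t + "]");
        }
        // [alpha]
        // [ beta]
        // [gamma delta]
    }
}
```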
      • hasMoreElements

        public boolean hasMoreElements()
        Returns the same value as the hasMoreTokens method. It exists so that this class can implement the Enumeration interface.
        Specified by:
        hasMoreElements in interface java.util.Enumeration<javolution.text.Text>
        Returns:
        true if there are more tokens; false otherwise.
        See Also:
        Enumeration, hasMoreTokens()
      • nextElement

        public javolution.text.Text nextElement()
        Returns the same value as the nextToken method. It exists so that this class can implement the Enumeration interface.
        Specified by:
        nextElement in interface java.util.Enumeration<javolution.text.Text>
        Returns:
        the next token in the text.
        Throws:
        java.util.NoSuchElementException - if there are no more tokens in this tokenizer's text.
        See Also:
        Enumeration, nextToken()
      • iterator

        public java.util.Iterator<javolution.text.Text> iterator()
        Returns an iterator over the tokens returned by this tokenizer.
        Specified by:
        iterator in interface java.lang.Iterable<javolution.text.Text>
      • hasNext

        public boolean hasNext()
        Returns the same value as the hasMoreTokens() method. It exists so that this class can implement the Iterator interface.
        Specified by:
        hasNext in interface java.util.Iterator<javolution.text.Text>
        Returns:
        true if there are more tokens; false otherwise.
        See Also:
        Iterator, hasMoreTokens()
      • next

        public javolution.text.Text next()
        Returns the same value as the nextToken() method. It exists so that this class can implement the Iterator interface.
        Specified by:
        next in interface java.util.Iterator<javolution.text.Text>
        Returns:
        the next token in the text.
        Throws:
        java.util.NoSuchElementException - if there are no more tokens in this tokenizer's text.
        See Also:
        Iterator, nextToken()
      • remove

        public void remove()
        This implementation always throws UnsupportedOperationException. It exists so that this class can implement the Iterator interface.
        Specified by:
        remove in interface java.util.Iterator<javolution.text.Text>
        Throws:
        java.lang.UnsupportedOperationException - always thrown.
        See Also:
        Iterator
      • countTokens

        public int countTokens()
        Calculates the number of times that this tokenizer's nextToken method can be called before it generates an exception. The current position is not advanced.
        Returns:
        the number of tokens remaining in the text using the current delimiter set.
        See Also:
        nextToken()
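        The relationship between countTokens and nextToken can be illustrated with java.util.StringTokenizer, whose countTokens method carries the same description. This is a sketch under the assumption that TextTokenizer matches it; CountTokensDemo and counts are hypothetical names.

```java
import java.util.StringTokenizer;

class CountTokensDemo {

    // Hypothetical helper: token counts before and after consuming one token.
    static int[] counts(String text) {
        StringTokenizer st = new StringTokenizer(text);
        int before = st.countTokens(); // counting does not advance the position
        st.nextToken();                // consume one token
        int after = st.countTokens();  // one fewer token remains
        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] c = counts("a b c");
        System.out.println(c[0]); // 3
        System.out.println(c[1]); // 2
    }
}
```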
      • countTokens

        public int countTokens​(java.lang.CharSequence delims)
        Calculates the number of times that this tokenizer's nextToken method can be called before it generates an exception using the given set of delimiters. The delimiters given will be used for future calls to nextToken() unless new delimiters are given. The current position is not advanced.
        Parameters:
        delims - the new set of delimiters.
        Returns:
        the number of tokens remaining in the text using the new delimiter set.
        See Also:
        countTokens()
      • restOfText

        public javolution.text.Text restOfText()
        Retrieves the rest of the text as a single token. After calling this method hasMoreTokens() will always return false.
        Returns:
        any part of the text that has not yet been tokenized.
      • toText

        public javolution.text.Text toText()
        Returns the same value as the nextToken() method. It exists so that this class can implement the Realtime interface.
        Specified by:
        toText in interface javolution.lang.Realtime
        Returns:
        the next token in the text.
        Throws:
        java.util.NoSuchElementException - if there are no more tokens in this tokenizer's text.
        See Also:
        Realtime, nextToken()
      • recycle

        public static void recycle​(TextTokenizer instance)
        Recycles a TextTokenizer instance immediately (on the stack when executing in a StackContext).
      • main

        public static void main​(java.lang.String[] args)
        Testing code for this class.