Class TextTokenizer
- java.lang.Object
  - jahuwaldt.js.util.TextTokenizer
- All Implemented Interfaces:
java.lang.Iterable<javolution.text.Text>, java.util.Enumeration<javolution.text.Text>, java.util.Iterator<javolution.text.Text>, javolution.lang.Realtime, javolution.lang.Reusable
public final class TextTokenizer extends java.lang.Object implements java.util.Enumeration<javolution.text.Text>, java.util.Iterator<javolution.text.Text>, java.lang.Iterable<javolution.text.Text>, javolution.lang.Realtime, javolution.lang.Reusable
The text tokenizer class allows an application to break a Text object into tokens. The tokenization method is much simpler than the one used by the StreamTokenizer class. The TextTokenizer methods do not distinguish among identifiers, numbers, and quoted strings, nor do they recognize and skip comments.
The set of delimiters (the characters that separate tokens) may be specified either at creation time or on a per-token basis.
An instance of TextTokenizer behaves in one of two ways, depending on whether it was created with the returnDelims flag having the value true or false:
- If the flag is false, delimiter characters serve to separate tokens. A token is a maximal sequence of consecutive characters that are not delimiters.
- If the flag is true, delimiter characters are themselves considered to be tokens. A token is thus either one delimiter character or a maximal sequence of consecutive characters that are not delimiters.
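Since this class is heavily based on java.util.StringTokenizer, the two returnDelims modes can be illustrated with that standard class. The sketch below uses StringTokenizer as a stand-in for TextTokenizer (which needs the Javolution library) and collects String tokens rather than javolution.text.Text; that TextTokenizer behaves identically in these cases is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Illustration only: java.util.StringTokenizer stands in for TextTokenizer
// (assumed to share these semantics), and tokens are Strings, not Text.
final class ReturnDelimsDemo {

    // Collects every token produced for the given text, delimiter set and mode.
    static List<String> tokens(String text, String delims, boolean returnDelims) {
        StringTokenizer st = new StringTokenizer(text, delims, returnDelims);
        List<String> out = new ArrayList<>();
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }
}
```

For the text "a++b" with delimiter "+", tokens(..., false) yields [a, b] because the adjacent delimiters only separate tokens, while tokens(..., true) yields [a, +, +, b], one token per delimiter character.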
A TextTokenizer object internally maintains a current position within the text to be tokenized. Some operations advance this current position past the characters processed.
A token is returned by taking a subtext of the text that was used to create the TextTokenizer object.
The following is one example of the use of the tokenizer. The code:

    TextTokenizer tt = TextTokenizer.valueOf("this is a test");
    while (tt.hasMoreTokens()) {
        System.out.println(tt.nextToken());
    }

prints the following output:

    this
    is
    a
    test

TextTokenizer is heavily based on java.util.StringTokenizer. However, there are some improvements and additional methods and capabilities.
Modified by: Joseph A. Huwaldt
- Version:
- February 23, 2025
- Author:
- Joseph A. Huwaldt Date: March 12, 2009
Method Summary
| Modifier and Type | Method | Description |
|---|---|---|
| int | countTokens() | Calculates the number of times that this tokenizer's nextToken method can be called before it generates an exception. |
| int | countTokens(java.lang.CharSequence delims) | Calculates the number of times that this tokenizer's nextToken method can be called before it generates an exception, using the given set of delimiters. |
| boolean | getHonorQuotes() | Returns true if this tokenizer honors quoted text (counts it as a single token). |
| boolean | hasMoreElements() | Returns the same value as the hasMoreTokens method. |
| boolean | hasMoreTokens() | Tests if there are more tokens available from this tokenizer's text. |
| boolean | hasNext() | Returns the same value as the hasMoreTokens() method. |
| java.util.Iterator<javolution.text.Text> | iterator() | Returns an iterator over the tokens returned by this tokenizer. |
| static void | main(java.lang.String[] args) | Testing code for this class. |
| static TextTokenizer | newInstance() | Return a text tokenizer with an initially empty string of text and with no delimiters. |
| javolution.text.Text | next() | Returns the same value as the nextToken() method. |
| javolution.text.Text | nextElement() | Returns the same value as the nextToken method. |
| javolution.text.Text | nextToken() | Returns the next token from this text tokenizer. |
| javolution.text.Text | nextToken(java.lang.CharSequence delim) | Returns the next token in this text tokenizer's text. |
| static void | recycle(TextTokenizer instance) | Recycles a TextTokenizer instance immediately (on the stack when executing in a StackContext). |
| void | remove() | This implementation always throws UnsupportedOperationException. |
| void | reset() | Resets the internal state of this object to its default values. |
| javolution.text.Text | restOfText() | Retrieves the rest of the text as a single token. |
| void | setDelimiters(java.lang.CharSequence delim) | Set the delimiters for this TextTokenizer. |
| void | setHonorQuotes(boolean honorQuotes) | Sets whether or not this tokenizer recognizes quoted text using the specified quote character. |
| void | setQuoteChar(char quote) | Set the character to use as the "quote" character. |
| void | setReturnEmptyTokens(boolean returnEmptyTokens) | Set whether empty tokens should be returned from this point in the tokenizing process onward. |
| void | setText(java.lang.CharSequence text) | Set the text to be tokenized in this TextTokenizer. |
| javolution.text.Text | toText() | Returns the same value as the nextToken() method. |
| static TextTokenizer | valueOf(java.lang.CharSequence text) | Return a text tokenizer for the specified character sequence. |
| static TextTokenizer | valueOf(java.lang.CharSequence text, java.lang.CharSequence delim) | Return a text tokenizer for the specified character sequence. |
| static TextTokenizer | valueOf(java.lang.CharSequence text, java.lang.CharSequence delim, boolean returnDelims) | Return a text tokenizer for the specified character sequence. |
Method Detail
-
newInstance
public static TextTokenizer newInstance()
Return a text tokenizer with an initially empty string of text and with no delimiters. Use setText(java.lang.CharSequence) and setDelimiters(java.lang.CharSequence) to make this instance useful.
- Returns:
- A text tokenizer with an initially empty string of text and with no delimiters.
-
reset
public void reset()
Resets the internal state of this object to its default values.
- Specified by:
reset in interface javolution.lang.Reusable
-
valueOf
public static TextTokenizer valueOf(java.lang.CharSequence text, java.lang.CharSequence delim, boolean returnDelims)
Return a text tokenizer for the specified character sequence. All characters in the delim argument are the delimiters for separating tokens.
If the returnDelims flag is true, then the delimiter characters are also returned as tokens. Each delimiter is returned as a token of length one. If the flag is false, the delimiter characters are skipped and only serve as separators between tokens.
Note that if delim is null, this method does not throw an exception. However, trying to invoke other methods on the resulting TextTokenizer may result in a NullPointerException.
- Parameters:
text - the text to be parsed.
delim - the delimiters.
returnDelims - flag indicating whether to return the delimiters as tokens.
- Returns:
- A text tokenizer for the specified character sequence.
-
valueOf
public static TextTokenizer valueOf(java.lang.CharSequence text, java.lang.CharSequence delim)
Return a text tokenizer for the specified character sequence. The characters in the delim argument are the delimiters for separating tokens. Delimiter characters themselves will not be treated as tokens.
- Parameters:
text - the text to be parsed.
delim - the delimiters.
- Returns:
- A text tokenizer for the specified character sequence.
-
valueOf
public static TextTokenizer valueOf(java.lang.CharSequence text)
Return a text tokenizer for the specified character sequence. The tokenizer uses the default delimiter set, which is " \t\n\r\f": the space character, the tab character, the newline character, the carriage-return character, and the form-feed character. Delimiter characters themselves will not be treated as tokens.
- Parameters:
text - the text to be parsed.
- Returns:
- A text tokenizer for the specified character sequence.
-
setText
public void setText(java.lang.CharSequence text)
Set the text to be tokenized in this TextTokenizer. This is useful when reusing a TextTokenizer, so that a new tokenizer does not have to be created for each string you want to tokenize.
The text will be tokenized from the beginning of the text.
- Parameters:
text - the text to be parsed.
-
setDelimiters
public void setDelimiters(java.lang.CharSequence delim)
Set the delimiters for this TextTokenizer. The position must be initialized before this method is used (setText does this, and it is called from the constructor).
- Parameters:
delim - the delimiters.
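The reuse pattern that setText and setDelimiters support can be sketched with a small hypothetical class. ReusableTokenizer is not part of this library: it works on Strings, ignores returnDelims, quotes and empty tokens, and only shows that setText rewinds the current position to the start of the new text while the delimiter set is kept.

```java
import java.util.NoSuchElementException;

// Hypothetical, simplified stand-in for TextTokenizer's reuse pattern.
// Handles only the returnDelims == false case and returns String tokens.
final class ReusableTokenizer {
    private CharSequence text = "";   // starts with empty text...
    private String delims = "";       // ...and no delimiters, like newInstance()
    private int pos = 0;              // current position within text

    // Replaces the text and rewinds to its beginning; delimiters are kept.
    void setText(CharSequence newText) {
        text = newText;
        pos = 0;
    }

    void setDelimiters(CharSequence newDelims) {
        delims = newDelims.toString();
    }

    // Returns the index of the first non-delimiter character at or after p.
    private int skipDelimiters(int p) {
        while (p < text.length() && delims.indexOf(text.charAt(p)) >= 0) {
            p++;
        }
        return p;
    }

    // Does not move the current position.
    boolean hasMoreTokens() {
        return skipDelimiters(pos) < text.length();
    }

    String nextToken() {
        pos = skipDelimiters(pos);
        if (pos >= text.length()) {
            throw new NoSuchElementException();
        }
        int start = pos;
        while (pos < text.length() && delims.indexOf(text.charAt(pos)) < 0) {
            pos++;
        }
        return text.subSequence(start, pos).toString();
    }
}
```

One instance can then be configured once with setDelimiters(",") and pointed at each input line in turn with setText(line), instead of allocating a new tokenizer per line.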
-
setQuoteChar
public void setQuoteChar(char quote)
Set the character to use as the "quote" character. When quotes are honored (see setHonorQuotes(boolean)), all text between quote characters is considered a single token. The default quote character is '"'.
- Parameters:
quote - the character to use as the "quote" character.
- See Also:
setHonorQuotes(boolean)
-
setHonorQuotes
public void setHonorQuotes(boolean honorQuotes)
Sets whether or not this tokenizer recognizes quoted text using the specified quote character. If true is passed, this tokenizer will consider any text between the specified quote characters as a single token. Honoring of quotes defaults to false.
- Parameters:
honorQuotes - treat quoted text as a single token if true.
- See Also:
setQuoteChar(char)
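Quote honoring can be illustrated with a hypothetical helper. QuoteAwareSplitter is not part of this class, and several details are assumptions not stated in this documentation: the quote characters are stripped from the returned token, a quote is recognized only at the start of a token, and an unterminated quote runs to the end of the text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of quote honoring; the exact TextTokenizer rules
// (quote stripping, unterminated quotes) are assumptions here.
final class QuoteAwareSplitter {

    static List<String> split(String text, String delims, char quote) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        int len = text.length();
        while (true) {
            // Skip any delimiters that precede the next token.
            while (pos < len && delims.indexOf(text.charAt(pos)) >= 0) {
                pos++;
            }
            if (pos >= len) {
                break;
            }
            if (text.charAt(pos) == quote) {
                // Quoted token: everything up to the closing quote, delimiters included.
                int close = text.indexOf(quote, pos + 1);
                int end = (close < 0) ? len : close;
                tokens.add(text.substring(pos + 1, end));
                pos = (close < 0) ? len : close + 1;
            } else {
                // Ordinary token: a maximal run of non-delimiter characters.
                int start = pos;
                while (pos < len && delims.indexOf(text.charAt(pos)) < 0) {
                    pos++;
                }
                tokens.add(text.substring(start, pos));
            }
        }
        return tokens;
    }
}
```

With a space delimiter and '"' as the quote character, the text open "My File.txt" now yields three tokens, with My File.txt kept together despite its embedded space.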
-
getHonorQuotes
public boolean getHonorQuotes()
Returns true if this tokenizer honors quoted text (counts it as a single token).
- Returns:
true if this tokenizer honors quoted text.
-
setReturnEmptyTokens
public void setReturnEmptyTokens(boolean returnEmptyTokens)
Set whether empty tokens should be returned from this point in the tokenizing process onward.
Empty tokens occur when two delimiters are next to each other, or when a delimiter occurs at the beginning or end of a string. If empty tokens are set to be returned, and a comma is the delimiter (with delimiters not returned as tokens), the following table shows how many tokens are in each string.

| String | Number of tokens | Notes |
|---|---|---|
| "one,two" | 2 | Normal case with no empty tokens. |
| "one,,three" | 3 | Includes the empty token in the middle. |
| "one," | 2 | Includes the empty token at the end. |
| ",two" | 2 | Includes the empty token at the beginning. |
| "," | 2 | Includes the empty tokens at the beginning and the end. |
| "" | 1 | All strings have at least one token if empty tokens are returned. |

- Parameters:
returnEmptyTokens - true if and only if empty tokens should be returned.
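When empty tokens are returned, every delimiter character ends a token, so a string always yields one more token than it has delimiter characters. The hypothetical EmptyTokenSplitter below (String-based, not part of this library) reproduces the counts in the table; it does not model returnDelims or quote handling.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: tokenizing with empty tokens returned.
final class EmptyTokenSplitter {

    static List<String> split(String text, String delims) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < text.length(); i++) {
            if (delims.indexOf(text.charAt(i)) >= 0) {
                // Each delimiter closes the current token, which may be "".
                tokens.add(text.substring(start, i));
                start = i + 1;
            }
        }
        // Whatever follows the last delimiter is always a token, possibly "".
        tokens.add(text.substring(start));
        return tokens;
    }
}
```

For example, split("one,,three", ",") returns [one, "", three], matching the second row of the table.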
-
hasMoreTokens
public boolean hasMoreTokens()
Tests if there are more tokens available from this tokenizer's text. If this method returns true, then a subsequent call to nextToken with no argument will successfully return a token.
- Returns:
true if and only if there is at least one token in the text after the current position; false otherwise.
-
nextToken
public javolution.text.Text nextToken()
Returns the next token from this text tokenizer.
- Returns:
- the next token from this text tokenizer.
- Throws:
java.util.NoSuchElementException - if there are no more tokens in this tokenizer's text.
-
nextToken
public javolution.text.Text nextToken(java.lang.CharSequence delim)
Returns the next token in this text tokenizer's text. First, the set of characters considered to be delimiters by this TextTokenizer object is changed to be the characters in the string delim. Then the next token in the text after the current position is returned. The current position is advanced beyond the recognized token. The new delimiter set remains the default after this call.
- Parameters:
delim - the new delimiters.
- Returns:
- the next token, after switching to the new delimiter set.
- Throws:
java.util.NoSuchElementException - if there are no more tokens in this tokenizer's text.
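java.util.StringTokenizer has a nextToken(String delim) method with the same documented semantics, so it can stand in for a sketch of switching delimiters mid-stream: read a command word split on spaces, then switch to commas and spaces for the argument list. The command format and the helper name commandAndArgs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Illustration with java.util.StringTokenizer standing in for TextTokenizer.
final class SwitchDelimsDemo {

    // Splits e.g. "copy a.txt,b.txt" into [copy, a.txt, b.txt].
    // Like nextToken(), throws NoSuchElementException if the command
    // or the first argument is missing.
    static List<String> commandAndArgs(String line) {
        StringTokenizer st = new StringTokenizer(line, " ");
        List<String> out = new ArrayList<>();
        out.add(st.nextToken());        // command word (delimiter: space)
        out.add(st.nextToken(", "));    // switch to comma and space for the arguments
        while (st.hasMoreTokens()) {    // ", " is still the delimiter set here
            out.add(st.nextToken());
        }
        return out;
    }
}
```

Because the new delimiter set persists after nextToken(", "), the loop needs no further delimiter arguments.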
-
hasMoreElements
public boolean hasMoreElements()
Returns the same value as the hasMoreTokens method. It exists so that this class can implement the Enumeration interface.
- Specified by:
hasMoreElements in interface java.util.Enumeration<javolution.text.Text>
- Returns:
true if there are more tokens; false otherwise.
- See Also:
Enumeration, hasMoreTokens()
-
nextElement
public javolution.text.Text nextElement()
Returns the same value as the nextToken method. It exists so that this class can implement the Enumeration interface.
- Specified by:
nextElement in interface java.util.Enumeration<javolution.text.Text>
- Returns:
- the next token in the text.
- Throws:
java.util.NoSuchElementException - if there are no more tokens in this tokenizer's text.
- See Also:
Enumeration, nextToken()
-
iterator
public java.util.Iterator<javolution.text.Text> iterator()
Returns an iterator over the tokens returned by this tokenizer.
- Specified by:
iterator in interface java.lang.Iterable<javolution.text.Text>
-
hasNext
public boolean hasNext()
Returns the same value as the hasMoreTokens() method. It exists so that this class can implement the Iterator interface.
- Specified by:
hasNext in interface java.util.Iterator<javolution.text.Text>
- Returns:
true if there are more tokens; false otherwise.
- See Also:
Iterator, hasMoreTokens()
-
next
public javolution.text.Text next()
Returns the same value as the nextToken() method. It exists so that this class can implement the Iterator interface.
- Specified by:
next in interface java.util.Iterator<javolution.text.Text>
- Returns:
- the next token in the text.
- Throws:
java.util.NoSuchElementException - if there are no more tokens in this tokenizer's text.
- See Also:
Iterator, nextToken()
-
remove
public void remove()
This implementation always throws UnsupportedOperationException. It exists so that this class can implement the Iterator interface.
- Specified by:
remove in interface java.util.Iterator<javolution.text.Text>
- Throws:
java.lang.UnsupportedOperationException - always.
- See Also:
Iterator
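Implementing Enumeration, Iterator and Iterable on the same object is what lets a tokenizer be used directly in a for-each loop. The hypothetical TokenSource below (String-based, delegating to java.util.StringTokenizer) shows the pattern; having iterator() return the object itself is an assumption about TextTokenizer, and it means the tokens can be traversed only once.

```java
import java.util.Enumeration;
import java.util.Iterator;
import java.util.StringTokenizer;

// Hypothetical adapter: one object acting as Enumeration, Iterator and Iterable.
final class TokenSource implements Enumeration<String>, Iterator<String>, Iterable<String> {
    private final StringTokenizer st;

    TokenSource(String text, String delims) {
        st = new StringTokenizer(text, delims);
    }

    boolean hasMoreTokens() { return st.hasMoreTokens(); }

    // Throws NoSuchElementException when no tokens remain.
    String nextToken() { return st.nextToken(); }

    // Enumeration methods delegate to the tokenizer methods.
    @Override public boolean hasMoreElements() { return hasMoreTokens(); }
    @Override public String nextElement() { return nextToken(); }

    // Iterator methods delegate as well; removal is not supported.
    @Override public boolean hasNext() { return hasMoreTokens(); }
    @Override public String next() { return nextToken(); }
    @Override public void remove() { throw new UnsupportedOperationException(); }

    // Assumption: returns this object, so iteration consumes the tokens.
    @Override public Iterator<String> iterator() { return this; }
}
```

Usage is then simply for (String token : new TokenSource("a b c", " ")) { ... }.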
-
countTokens
public int countTokens()
Calculates the number of times that this tokenizer's nextToken method can be called before it generates an exception. The current position is not advanced.
- Returns:
- the number of tokens remaining in the text using the current delimiter set.
- See Also:
nextToken()
-
countTokens
public int countTokens(java.lang.CharSequence delims)
Calculates the number of times that this tokenizer's nextToken method can be called before it generates an exception, using the given set of delimiters. The delimiters given will be used for future calls to nextToken() unless new delimiters are given. The current position is not advanced.
- Parameters:
delims - the new set of delimiters.
- Returns:
- the number of tokens remaining in the text using the new delimiter set.
- See Also:
countTokens()
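Because countTokens() does not advance the current position, it can be used to size an array before reading the tokens. The sketch uses java.util.StringTokenizer.countTokens(), which has the same contract; the helper name toArray is hypothetical. StringTokenizer has no counterpart to countTokens(delims), so the delimiter-switching overload is not shown.

```java
import java.util.StringTokenizer;

final class CountTokensDemo {

    // Reads all tokens into an array sized up front with countTokens().
    static String[] toArray(String text, String delims) {
        StringTokenizer st = new StringTokenizer(text, delims);
        // countTokens() leaves the current position alone, so every
        // counted token is still available to nextToken() below.
        String[] parts = new String[st.countTokens()];
        for (int i = 0; i < parts.length; i++) {
            parts[i] = st.nextToken();
        }
        return parts;
    }
}
```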
-
restOfText
public javolution.text.Text restOfText()
Retrieves the rest of the text as a single token. After calling this method, hasMoreTokens() will always return false.
- Returns:
- any part of the text that has not yet been tokenized.
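With java.util.StringTokenizer, a similar effect is available through nextToken(""): an empty delimiter set makes the next token run from the current position to the end of the text. The hypothetical helper below wraps that idiom. Note that it keeps any delimiter immediately following the last token read; whether TextTokenizer.restOfText() does the same, and what it returns when nothing remains, is not stated here, so both behaviors are assumptions.

```java
import java.util.StringTokenizer;

final class RestOfTextDemo {

    // Hypothetical helper mimicking restOfText() on a java.util.StringTokenizer.
    // Returns "" (an assumption) when no tokens remain.
    static String restOfText(StringTokenizer st) {
        // With no delimiters, the "next token" is everything that is left,
        // and afterwards hasMoreTokens() returns false.
        return st.hasMoreTokens() ? st.nextToken("") : "";
    }
}
```

For input "set title My Document", reading two tokens and then calling restOfText returns " My Document", including the leading space.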
-
toText
public javolution.text.Text toText()
Returns the same value as the nextToken() method. It exists so that this class can implement the Realtime interface.
- Specified by:
toText in interface javolution.lang.Realtime
- Returns:
- the next token in the text.
- Throws:
java.util.NoSuchElementException - if there are no more tokens in this tokenizer's text.
- See Also:
Realtime, nextToken()
-
recycle
public static void recycle(TextTokenizer instance)
Recycles a TextTokenizer instance immediately (on the stack when executing in a StackContext).
- Parameters:
instance - the instance of this class to recycle.
-
main
public static void main(java.lang.String[] args)
Testing code for this class.
- Parameters:
args - the command-line arguments.