Class TextTokenizer
- java.lang.Object
  - jahuwaldt.js.util.TextTokenizer
- All Implemented Interfaces:
  java.lang.Iterable<javolution.text.Text>, java.util.Enumeration<javolution.text.Text>, java.util.Iterator<javolution.text.Text>, javolution.lang.Realtime, javolution.lang.Reusable
public final class TextTokenizer extends java.lang.Object implements java.util.Enumeration<javolution.text.Text>, java.util.Iterator<javolution.text.Text>, java.lang.Iterable<javolution.text.Text>, javolution.lang.Realtime, javolution.lang.Reusable
The text tokenizer class allows an application to break a Text object into tokens. The tokenization method is much simpler than the one used by the StreamTokenizer class. The TextTokenizer methods do not distinguish among identifiers and numbers, nor do they recognize and skip comments. Quoted strings receive special treatment only when quote handling is enabled with setHonorQuotes(boolean).
The set of delimiters (the characters that separate tokens) may be specified either at creation time or on a per-token basis.
An instance of TextTokenizer behaves in one of two ways, depending on whether it was created with the returnDelims flag having the value true or false:
- If the flag is false, delimiter characters serve to separate tokens. A token is a maximal sequence of consecutive characters that are not delimiters.
- If the flag is true, delimiter characters are themselves considered to be tokens. A token is thus either one delimiter character or a maximal sequence of consecutive characters that are not delimiters.
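TextTokenizer is described as heavily based on java.util.StringTokenizer, and these two modes are the same ones the JDK class offers. The sketch below therefore uses the JDK class as a stand-in to show both modes on the same input. It assumes, without having checked the TextTokenizer source, that TextTokenizer produces the same tokens for these inputs. The class name ReturnDelimsDemo is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class: java.util.StringTokenizer stands in for TextTokenizer.
final class ReturnDelimsDemo {

    /** Collects every token produced for the given delimiter set and returnDelims flag. */
    static List<String> tokens(String text, String delims, boolean returnDelims) {
        StringTokenizer st = new StringTokenizer(text, delims, returnDelims);
        List<String> out = new ArrayList<>();
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // false: delimiters only separate tokens, so the adjacent ";;" yields no extra token.
        System.out.println(tokens("x=1;;y=2", "=;", false)); // [x, 1, y, 2]
        // true: each delimiter character comes back as its own one-character token.
        System.out.println(tokens("x=1;;y=2", "=;", true));  // [x, =, 1, ;, ;, y, =, 2]
    }
}
```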
A TextTokenizer object internally maintains a current position within the text to be tokenized. Some operations advance this current position past the characters processed.
A token is returned by taking a subtext of the text that was used to create the TextTokenizer object.
The following is one example of the use of the tokenizer. The code:

    TextTokenizer tt = TextTokenizer.valueOf("this is a test");
    while (tt.hasMoreTokens()) {
        System.out.println(tt.nextToken());
    }

prints the following output:

    this
    is
    a
    test
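TextTokenizer is heavily based on java.util.StringTokenizer. However, it adds some improvements and additional capabilities, such as optional handling of quoted text, optional return of empty tokens, and retrieval of the remaining text with restOfText().
Modified by: Joseph A. Huwaldt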
- Version: February 17, 2025
- Author: Joseph A. Huwaldt, Date: March 12, 2009
Method Summary
- int countTokens(): Calculates the number of times that this tokenizer's nextToken method can be called before it generates an exception.
- int countTokens(java.lang.CharSequence delims): Calculates the number of times that this tokenizer's nextToken method can be called before it generates an exception, using the given set of delimiters.
- boolean getHonorQuotes(): Returns true if this tokenizer honors quoted text (counts it as a single token).
- boolean hasMoreElements(): Returns the same value as the hasMoreTokens method.
- boolean hasMoreTokens(): Tests if there are more tokens available from this tokenizer's text.
- boolean hasNext(): Returns the same value as the hasMoreTokens() method.
- java.util.Iterator<javolution.text.Text> iterator(): Returns an iterator over the tokens returned by this tokenizer.
- static void main(java.lang.String[] args): Testing code for this class.
- static TextTokenizer newInstance(): Return a text tokenizer with an initially empty string of text and with no delimiters.
- javolution.text.Text next(): Returns the same value as the nextToken() method.
- javolution.text.Text nextElement(): Returns the same value as the nextToken method.
- javolution.text.Text nextToken(): Returns the next token from this text tokenizer.
- javolution.text.Text nextToken(java.lang.CharSequence delim): Returns the next token in this text tokenizer's text.
- static void recycle(TextTokenizer instance): Recycles a TextTokenizer instance immediately (on the stack when executing in a StackContext).
- void remove(): This implementation always throws UnsupportedOperationException.
- void reset(): Resets the internal state of this object to its default values.
- javolution.text.Text restOfText(): Retrieves the rest of the text as a single token.
- void setDelimiters(java.lang.CharSequence delim): Set the delimiters for this TextTokenizer.
- void setHonorQuotes(boolean honorQuotes): Sets whether or not this tokenizer recognizes quoted text using the current quote character.
- void setQuoteChar(char quote): Set the character to use as the "quote" character.
- void setReturnEmptyTokens(boolean returnEmptyTokens): Set whether empty tokens should be returned from this point in the tokenizing process onward.
- void setText(java.lang.CharSequence text): Set the text to be tokenized in this TextTokenizer.
- javolution.text.Text toText(): Returns the same value as the nextToken() method.
- static TextTokenizer valueOf(java.lang.CharSequence text): Return a text tokenizer for the specified character sequence.
- static TextTokenizer valueOf(java.lang.CharSequence text, java.lang.CharSequence delim): Return a text tokenizer for the specified character sequence.
- static TextTokenizer valueOf(java.lang.CharSequence text, java.lang.CharSequence delim, boolean returnDelims): Return a text tokenizer for the specified character sequence.
Method Detail
-
newInstance
public static TextTokenizer newInstance()
Return a text tokenizer with an initially empty string of text and with no delimiters. Use setText(java.lang.CharSequence) and setDelimiters(java.lang.CharSequence) to make this instance useful.
-
reset
public void reset()
Resets the internal state of this object to its default values.
- Specified by: reset in interface javolution.lang.Reusable
-
valueOf
public static TextTokenizer valueOf(java.lang.CharSequence text, java.lang.CharSequence delim, boolean returnDelims)
Return a text tokenizer for the specified character sequence. All characters in the delim argument are the delimiters for separating tokens.
If the returnDelims flag is true, then the delimiter characters are also returned as tokens. Each delimiter is returned as a token of length one. If the flag is false, the delimiter characters are skipped and only serve as separators between tokens.
Note that if delim is null, this method does not throw an exception. However, trying to invoke other methods on the resulting TextTokenizer may result in a NullPointerException.
- Parameters:
  text - the text to be parsed.
  delim - the delimiters.
  returnDelims - flag indicating whether to return the delimiters as tokens.
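java.util.StringTokenizer carries the same warning about a null delimiter set: its constructor accepts null, and the failure only appears on a later call. One defensive option is to substitute a default set before constructing the tokenizer. The sketch below shows such a guard with the JDK class. The helper name NullDelimGuard and the choice to fall back to the whitespace defaults are assumptions, not part of the TextTokenizer API.

```java
import java.util.StringTokenizer;

// Hypothetical helper: replaces a null delimiter set before the tokenizer is built.
final class NullDelimGuard {

    /** Same default set that the single-argument valueOf is documented to use. */
    static final String DEFAULT_DELIMS = " \t\n\r\f";

    static StringTokenizer tokenizer(String text, String delim, boolean returnDelims) {
        // Passing null straight through would not fail here; it may surface later as a
        // NullPointerException from hasMoreTokens() or nextToken(). Fall back instead.
        String safeDelims = (delim != null) ? delim : DEFAULT_DELIMS;
        return new StringTokenizer(text, safeDelims, returnDelims);
    }

    public static void main(String[] args) {
        System.out.println(tokenizer("a b\tc", null, false).countTokens()); // 3
        System.out.println(tokenizer("a,b", ",", true).countTokens());      // 3: a , b
    }
}
```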
-
valueOf
public static TextTokenizer valueOf(java.lang.CharSequence text, java.lang.CharSequence delim)
Return a text tokenizer for the specified character sequence. The characters in the delim argument are the delimiters for separating tokens. Delimiter characters themselves will not be treated as tokens.
- Parameters:
  text - the text to be parsed.
  delim - the delimiters.
-
valueOf
public static TextTokenizer valueOf(java.lang.CharSequence text)
Return a text tokenizer for the specified character sequence. The tokenizer uses the default delimiter set, which is " \t\n\r\f": the space character, the tab character, the newline character, the carriage-return character, and the form-feed character. Delimiter characters themselves will not be treated as tokens.
- Parameters:
  text - the text to be parsed.
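The default set is exactly the five characters listed, which is narrower than what Character.isWhitespace(char) accepts. The JDK StringTokenizer uses the same default set, so it serves as a stand-in here, assuming identical behavior. A vertical tab (U+000B), for example, is whitespace to Character.isWhitespace but is not a default delimiter, so it stays inside a token. The class name DefaultDelimsDemo is hypothetical.

```java
import java.util.StringTokenizer;

// Hypothetical demo: the JDK's one-argument StringTokenizer constructor uses " \t\n\r\f".
final class DefaultDelimsDemo {

    static int countDefaultTokens(String text) {
        return new StringTokenizer(text).countTokens();
    }

    public static void main(String[] args) {
        // Space, tab, newline, carriage return, and form feed all separate tokens.
        System.out.println(countDefaultTokens("one\ttwo\nthree  four\r\n\f")); // 4
        // A vertical tab is not in the default set, so "a<VT>b" is a single token.
        String withVerticalTab = "a" + (char) 0x0B + "b";
        System.out.println(countDefaultTokens(withVerticalTab)); // 1
        System.out.println(Character.isWhitespace((char) 0x0B)); // true
    }
}
```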
-
setText
public void setText(java.lang.CharSequence text)
Set the text to be tokenized in this TextTokenizer. This is useful for TextTokenizer reuse, so that a new tokenizer does not have to be created for each string you want to tokenize.
The text will be tokenized from the beginning of the text.
- Parameters:
  text - the text to be parsed.
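The reuse pattern amounts to one tokenizer instance, with setText called for each new string. Because TextTokenizer itself depends on the Javolution library, the sketch below uses a small hypothetical stand-in class, ReusableTokenizer, that mimics only the documented setText, setDelimiters, hasMoreTokens, and nextToken behavior with delimiters skipped. It is not the actual TextTokenizer implementation.

```java
import java.util.NoSuchElementException;

// Hypothetical stand-in for the reuse pattern; NOT the jahuwaldt TextTokenizer code.
// Only the "delimiters are skipped" mode is modeled.
final class ReusableTokenizer {
    private CharSequence text = "";
    private CharSequence delims = "";
    private int pos = 0;

    void setText(CharSequence text) {
        this.text = text;
        this.pos = 0; // tokenizing restarts at the beginning of the new text
    }

    void setDelimiters(CharSequence delims) {
        this.delims = delims;
    }

    private boolean isDelim(char c) {
        for (int i = 0; i < delims.length(); i++) {
            if (delims.charAt(i) == c) {
                return true;
            }
        }
        return false;
    }

    /** Returns the index of the first non-delimiter character at or after 'from'. */
    private int skipDelims(int from) {
        while (from < text.length() && isDelim(text.charAt(from))) {
            from++;
        }
        return from;
    }

    boolean hasMoreTokens() {
        return skipDelims(pos) < text.length();
    }

    String nextToken() {
        int start = skipDelims(pos);
        if (start >= text.length()) {
            throw new NoSuchElementException();
        }
        int end = start;
        while (end < text.length() && !isDelim(text.charAt(end))) {
            end++;
        }
        pos = end; // advance past the token just returned
        return text.subSequence(start, end).toString();
    }

    /** Counts the tokens on each line while allocating only one tokenizer. */
    static int[] countPerLine(String... lines) {
        ReusableTokenizer tt = new ReusableTokenizer();
        tt.setDelimiters(" ,");
        int[] counts = new int[lines.length];
        for (int i = 0; i < lines.length; i++) {
            tt.setText(lines[i]); // reuse the same instance for each line
            while (tt.hasMoreTokens()) {
                tt.nextToken();
                counts[i]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(countPerLine("a b, c", "one", ""))); // [3, 1, 0]
    }
}
```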
-
setDelimiters
public void setDelimiters(java.lang.CharSequence delim)
Set the delimiters for this TextTokenizer. The position must be initialized before this method is used (setText does this, and it is called from the constructor).
- Parameters:
  delim - the delimiters.
-
setQuoteChar
public void setQuoteChar(char quote)
Set the character to use as the "quote" character. When quote handling is enabled, all text between quote characters is considered a single token. The default quote character is '"'.
- See Also: setHonorQuotes(boolean)
-
setHonorQuotes
public void setHonorQuotes(boolean honorQuotes)
Sets whether or not this tokenizer recognizes quoted text using the current quote character. If true is passed, this tokenizer will consider any text between quote characters as a single token. Honoring of quotes defaults to false.
- See Also: setQuoteChar(char)
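The source does not spell out the finer details of quoted-text tokenizing, so the following is only a sketch of one plausible reading. It assumes three rules: delimiters inside a quoted run do not split it, the quote characters themselves are dropped from the token, and an unterminated quote runs to the end of the text. All three rules, and the class name QuoteAwareSplit, are assumptions; verify them against the TextTokenizer source before relying on them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of quote-aware tokenizing; the quote-stripping and
// unterminated-quote rules are assumptions, not documented TextTokenizer behavior.
final class QuoteAwareSplit {

    static List<String> tokenize(String text, String delims, char quote) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        int n = text.length();
        while (i < n) {
            // Skip a run of delimiter characters between tokens.
            while (i < n && delims.indexOf(text.charAt(i)) >= 0) {
                i++;
            }
            if (i >= n) {
                break;
            }
            if (text.charAt(i) == quote) {
                // Quoted token: everything up to the closing quote, delimiters included.
                int close = text.indexOf(quote, i + 1);
                int end = (close < 0) ? n : close; // unterminated: take the rest of the text
                tokens.add(text.substring(i + 1, end));
                i = (close < 0) ? n : close + 1;
            } else {
                // Unquoted token: a maximal run of non-delimiter characters.
                int start = i;
                while (i < n && delims.indexOf(text.charAt(i)) < 0) {
                    i++;
                }
                tokens.add(text.substring(start, i));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("open \"My Documents\" now", " ", '"'));
        // [open, My Documents, now]
    }
}
```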
-
getHonorQuotes
public boolean getHonorQuotes()
Returns true if this tokenizer honors quoted text (counts it as a single token).
-
setReturnEmptyTokens
public void setReturnEmptyTokens(boolean returnEmptyTokens)
Set whether empty tokens should be returned from this point in the tokenizing process onward. Empty tokens occur when two delimiters are next to each other or when a delimiter occurs at the beginning or end of a string. If empty tokens are set to be returned and a comma is the (non-token) delimiter, the following table shows how many tokens are in each string.

  String        Number of tokens
  "one,two"     2  normal case with no empty tokens.
  "one,,three"  3  including the empty token in the middle.
  "one,"        2  including the empty token at the end.
  ",two"        2  including the empty token at the beginning.
  ","           2  including the empty tokens at the beginning and the end.
  ""            1  all strings will have at least one token if empty tokens are returned.

- Parameters:
  returnEmptyTokens - true if and only if empty tokens should be returned.
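The counts in this table are consistent with a simple rule when delimiters are not returned as tokens: every delimiter character ends one token, so a string containing k delimiter characters yields k + 1 tokens. The hypothetical helper below, which is not part of TextTokenizer, reproduces the table with plain Java strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper reproducing the empty-token table; not the TextTokenizer implementation.
final class EmptyTokenSplit {

    /** Splits on any character in delims, keeping empty tokens between adjacent delimiters. */
    static List<String> split(String text, String delims) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < text.length(); i++) {
            if (delims.indexOf(text.charAt(i)) >= 0) {
                tokens.add(text.substring(start, i)); // "" when delimiters touch
                start = i + 1;
            }
        }
        tokens.add(text.substring(start)); // final token, "" if text ends with a delimiter
        return tokens;
    }

    public static void main(String[] args) {
        String[] samples = {"one,two", "one,,three", "one,", ",two", ",", ""};
        // Prints token counts 2, 3, 2, 2, 2, 1, matching the table.
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + split(s, ",").size());
        }
    }
}
```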
-
hasMoreTokens
public boolean hasMoreTokens()
Tests if there are more tokens available from this tokenizer's text. If this method returns true, then a subsequent call to nextToken with no argument will successfully return a token.
- Returns: true if and only if there is at least one token in the text after the current position; false otherwise.
-
nextToken
public javolution.text.Text nextToken()
Returns the next token from this text tokenizer.
- Returns: the next token from this text tokenizer.
- Throws: java.util.NoSuchElementException - if there are no more tokens in this tokenizer's text.
-
nextToken
public javolution.text.Text nextToken(java.lang.CharSequence delim)
Returns the next token in this text tokenizer's text. First, the set of characters considered to be delimiters by this TextTokenizer object is changed to be the characters in the string delim. Then the next token in the text after the current position is returned. The current position is advanced beyond the recognized token. The new delimiter set remains the default after this call.
- Parameters:
  delim - the new delimiters.
- Returns: the next token, after switching to the new delimiter set.
- Throws: java.util.NoSuchElementException - if there are no more tokens in this tokenizer's text.
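java.util.StringTokenizer has a nextToken(String delim) method with the same "switch and keep" semantics, so it can illustrate the idea: read a command word on whitespace, then switch to comma-and-space delimiters for the argument list. This assumes TextTokenizer.nextToken(CharSequence) behaves the same way; the demo class SwitchDelimsDemo and its input format are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo using the JDK tokenizer's nextToken(String delim).
final class SwitchDelimsDemo {

    /** Splits "command arg1,arg2, arg3"; assumes the line holds at least two tokens. */
    static List<String> parseCommand(String line) {
        StringTokenizer st = new StringTokenizer(line); // default " \t\n\r\f"
        List<String> parts = new ArrayList<>();
        parts.add(st.nextToken());     // command word, split on whitespace
        parts.add(st.nextToken(", ")); // switch delimiters; returns the first argument
        while (st.hasMoreTokens()) {
            parts.add(st.nextToken()); // ", " is still the delimiter set
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(parseCommand("copy a.txt,b.txt, c.txt"));
        // [copy, a.txt, b.txt, c.txt]
    }
}
```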
-
hasMoreElements
public boolean hasMoreElements()
Returns the same value as the hasMoreTokens method. It exists so that this class can implement the Enumeration interface.
- Specified by: hasMoreElements in interface java.util.Enumeration<javolution.text.Text>
- Returns: true if there are more tokens; false otherwise.
- See Also: Enumeration, hasMoreTokens()
-
nextElement
public javolution.text.Text nextElement()
Returns the same value as the nextToken method. It exists so that this class can implement the Enumeration interface.
- Specified by: nextElement in interface java.util.Enumeration<javolution.text.Text>
- Returns: the next token in the text.
- Throws: java.util.NoSuchElementException - if there are no more tokens in this tokenizer's text.
- See Also: Enumeration, nextToken()
-
iterator
public java.util.Iterator<javolution.text.Text> iterator()
Returns an iterator over the tokens returned by this tokenizer.
- Specified by: iterator in interface java.lang.Iterable<javolution.text.Text>
-
hasNext
public boolean hasNext()
Returns the same value as the hasMoreTokens() method. It exists so that this class can implement the Iterator interface.
- Specified by: hasNext in interface java.util.Iterator<javolution.text.Text>
- Returns: true if there are more tokens; false otherwise.
- See Also: Iterator, hasMoreTokens()
-
next
public javolution.text.Text next()
Returns the same value as the nextToken() method. It exists so that this class can implement the Iterator interface.
- Specified by: next in interface java.util.Iterator<javolution.text.Text>
- Returns: the next token in the text.
- Throws: java.util.NoSuchElementException - if there are no more tokens in this tokenizer's text.
- See Also: Iterator, nextToken()
-
remove
public void remove()
This implementation always throws UnsupportedOperationException. It exists so that this class can implement the Iterator interface.
- Specified by: remove in interface java.util.Iterator<javolution.text.Text>
- Throws: java.lang.UnsupportedOperationException - always thrown.
- See Also: Iterator
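The Enumeration and Iterator methods above are thin wrappers around hasMoreTokens() and nextToken(), with remove() unsupported. The sketch below shows that bridging pattern on top of java.util.StringTokenizer, so the tokens can be consumed in an enhanced for loop. The class TokenIterable is hypothetical, and having iterator() return this (which makes the object single-pass) is an assumption of the sketch, not documented TextTokenizer behavior.

```java
import java.util.Iterator;
import java.util.StringTokenizer;

// Hypothetical adapter illustrating the Iterator/Iterable bridging pattern.
final class TokenIterable implements Iterable<String>, Iterator<String> {
    private final StringTokenizer st;

    TokenIterable(String text) {
        this.st = new StringTokenizer(text);
    }

    @Override
    public boolean hasNext() {
        return st.hasMoreTokens(); // same value as hasMoreTokens()
    }

    @Override
    public String next() {
        return st.nextToken(); // throws NoSuchElementException when exhausted
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException("remove"); // tokens cannot be removed
    }

    @Override
    public Iterator<String> iterator() {
        return this; // single pass: iterating consumes the tokens
    }

    /** Joins the tokens of text with sep using an enhanced for loop. */
    static String joinTokens(String text, String sep) {
        StringBuilder sb = new StringBuilder();
        for (String token : new TokenIterable(text)) {
            if (sb.length() > 0) {
                sb.append(sep);
            }
            sb.append(token);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinTokens("this is  a test", "|")); // this|is|a|test
    }
}
```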
-
countTokens
public int countTokens()
Calculates the number of times that this tokenizer's nextToken method can be called before it generates an exception. The current position is not advanced.
- Returns: the number of tokens remaining in the text using the current delimiter set.
- See Also: nextToken()
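Because countTokens() does not advance the current position, it can be called repeatedly, for example to size an array before the tokens are consumed. The JDK StringTokenizer documents the same contract, so it stands in for TextTokenizer in this sketch; the class CountTokensDemo is hypothetical.

```java
import java.util.StringTokenizer;

// Hypothetical demo: countTokens() sizes an array without consuming any tokens.
final class CountTokensDemo {

    static String[] toArray(String text) {
        StringTokenizer st = new StringTokenizer(text);
        String[] tokens = new String[st.countTokens()]; // position is unchanged
        for (int i = 0; i < tokens.length; i++) {
            tokens[i] = st.nextToken();
        }
        return tokens;
    }

    public static void main(String[] args) {
        StringTokenizer st = new StringTokenizer("a b c");
        System.out.println(st.countTokens()); // 3
        st.nextToken();
        System.out.println(st.countTokens()); // 2: one token has been consumed
        System.out.println(String.join(",", toArray("x y  z"))); // x,y,z
    }
}
```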
-
countTokens
public int countTokens(java.lang.CharSequence delims)
Calculates the number of times that this tokenizer's nextToken method can be called before it generates an exception, using the given set of delimiters. The delimiters given will be used for future calls to nextToken() unless new delimiters are given. The current position is not advanced.
- Parameters:
  delims - the new set of delimiters.
- Returns: the number of tokens remaining in the text using the new delimiter set.
- See Also: countTokens()
-
restOfText
public javolution.text.Text restOfText()
Retrieves the rest of the text as a single token. After calling this method, hasMoreTokens() will always return false.
- Returns: any part of the text that has not yet been tokenized.
-
toText
public javolution.text.Text toText()
Returns the same value as the nextToken() method. It exists so that this class can implement the Realtime interface.
- Specified by: toText in interface javolution.lang.Realtime
- Returns: the next token in the text.
- Throws: java.util.NoSuchElementException - if there are no more tokens in this tokenizer's text.
- See Also: Realtime, nextToken()
-
recycle
public static void recycle(TextTokenizer instance)
Recycles a TextTokenizer instance immediately (on the stack when executing in a StackContext).
-
main
public static void main(java.lang.String[] args)
Testing code for this class.