public final class TextTokenizer extends java.lang.Object implements java.util.Enumeration<javolution.text.Text>, java.util.Iterator<javolution.text.Text>, java.lang.Iterable<javolution.text.Text>, javolution.lang.Realtime, javolution.lang.Reusable
The TextTokenizer class allows an application to break a Text object into tokens. The tokenization method is much simpler than the one used by the StreamTokenizer class. The TextTokenizer methods do not distinguish among identifiers, numbers, and quoted strings, nor do they recognize and skip comments.
The set of delimiters (the characters that separate tokens) may be specified either at creation time or on a per-token basis.
An instance of TextTokenizer behaves in one of two ways, depending on whether it was created with the returnDelims flag set to true or false:

- If the flag is false, delimiter characters serve to separate tokens. A token is a maximal sequence of consecutive characters that are not delimiters.
- If the flag is true, delimiter characters are themselves considered to be tokens. A token is thus either one delimiter character, or a maximal sequence of consecutive characters that are not delimiters.
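Because TextTokenizer is based on java.util.StringTokenizer, the JDK class can illustrate the two modes without any Javolution dependency. This sketch assumes that TextTokenizer.valueOf(text, delim, returnDelims) splits text the same way as new StringTokenizer(text, delim, returnDelims); the ReturnDelimsDemo class and its tokens helper are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

final class ReturnDelimsDemo {
    // Drains a java.util.StringTokenizer (a stand-in for TextTokenizer) into a list.
    static List<String> tokens(String text, String delims, boolean returnDelims) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, delims, returnDelims);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // returnDelims == false: delimiters only separate tokens.
        System.out.println(tokens("a,b;c", ",;", false)); // prints [a, b, c]
        // returnDelims == true: each delimiter comes back as a one-character token.
        System.out.println(tokens("a,b;c", ",;", true));  // prints [a, ,, b, ;, c]
        // Adjacent delimiters do not produce empty tokens in either mode.
        System.out.println(tokens("a,,b", ",", false));   // prints [a, b]
    }
}
```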
A TextTokenizer object internally maintains a current position within the text to be tokenized. Some operations advance this current position past the characters processed.
A token is returned by taking a subtext of the text that was used to create the TextTokenizer object.
The following is one example of the use of the tokenizer. The code:

    TextTokenizer tt = TextTokenizer.valueOf("this is a test");
    while (tt.hasMoreTokens()) {
        System.out.println(tt.nextToken());
    }

prints the following output:

    this
    is
    a
    test
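TextTokenizer is heavily based on java.util.StringTokenizer. However, it adds several improvements and capabilities, including optional handling of quoted text (setHonorQuotes, setQuoteChar), optional empty tokens (setReturnEmptyTokens), counting tokens with a new set of delimiters (countTokens(CharSequence)), Iterator and Iterable support, and instance reuse (setText, reset, recycle).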
Modified by: Joseph A. Huwaldt
Modifier and Type | Method | Description |
---|---|---|
int | countTokens() | Calculates the number of times that this tokenizer's nextToken method can be called before it generates an exception. |
int | countTokens(java.lang.CharSequence delims) | Calculates the number of times that this tokenizer's nextToken method can be called before it generates an exception, using the given set of delimiters. |
boolean | getHonorQuotes() | Returns true if this tokenizer honors quoted text (counts it as a single token). |
boolean | hasMoreElements() | Returns the same value as the hasMoreTokens method. |
boolean | hasMoreTokens() | Tests if there are more tokens available from this tokenizer's text. |
boolean | hasNext() | Returns the same value as the hasMoreTokens() method. |
java.util.Iterator<javolution.text.Text> | iterator() | Returns an iterator over the tokens returned by this tokenizer. |
static void | main(java.lang.String[] args) | Testing code for this class. |
static TextTokenizer | newInstance() | Returns a text tokenizer with an initially empty string of text and with no delimiters. |
javolution.text.Text | next() | Returns the same value as the nextToken() method. |
javolution.text.Text | nextElement() | Returns the same value as the nextToken method. |
javolution.text.Text | nextToken() | Returns the next token from this text tokenizer. |
javolution.text.Text | nextToken(java.lang.CharSequence delim) | Returns the next token in this text tokenizer's text, using the given delimiters. |
static void | recycle(TextTokenizer instance) | Recycles a TextTokenizer instance immediately (on the stack when executing in a StackContext). |
void | remove() | This implementation always throws UnsupportedOperationException. |
void | reset() | Resets the internal state of this object to its default values. |
javolution.text.Text | restOfText() | Retrieves the rest of the text as a single token. |
void | setDelimiters(java.lang.CharSequence delim) | Sets the delimiters for this TextTokenizer. |
void | setHonorQuotes(boolean honorQuotes) | Sets whether or not this tokenizer recognizes quoted text using the specified quote character. |
void | setQuoteChar(char quote) | Sets the character to use as the "quote" character. |
void | setReturnEmptyTokens(boolean returnEmptyTokens) | Sets whether empty tokens should be returned from this point in the tokenizing process onward. |
void | setText(java.lang.CharSequence text) | Sets the text to be tokenized in this TextTokenizer. |
javolution.text.Text | toText() | Returns the same value as the nextToken() method. |
static TextTokenizer | valueOf(java.lang.CharSequence text) | Returns a text tokenizer for the specified character sequence, using the default delimiters. |
static TextTokenizer | valueOf(java.lang.CharSequence text, java.lang.CharSequence delim) | Returns a text tokenizer for the specified character sequence and delimiters. |
static TextTokenizer | valueOf(java.lang.CharSequence text, java.lang.CharSequence delim, boolean returnDelims) | Returns a text tokenizer for the specified character sequence and delimiters, optionally returning the delimiters as tokens. |
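public static TextTokenizer newInstance()

Returns a text tokenizer with an initially empty string of text and with no delimiters. Use setText(java.lang.CharSequence) and setDelimiters(java.lang.CharSequence) to make this instance useful.

public void reset()

Resets the internal state of this object to its default values.

Specified by:
reset in interface javolution.lang.Reusable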
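The reuse cycle described above (create an empty instance, supply text and delimiters, tokenize, then reset) can be sketched without Javolution. ReusableTokenizer below is a hypothetical, much-simplified stand-in: it borrows only the names newInstance, setText, setDelimiters, reset, hasMoreTokens and nextToken, ignores quotes, empty tokens and returned delimiters, and is not the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for the TextTokenizer reuse life cycle; not Javolution code.
final class ReusableTokenizer {
    private CharSequence text = "";
    private CharSequence delims = "";
    private int pos;

    // Like newInstance(): empty text and no delimiters.
    static ReusableTokenizer newInstance() {
        return new ReusableTokenizer();
    }

    void setText(CharSequence text) {
        this.text = text;
        this.pos = 0; // tokenizing restarts at the beginning of the new text
    }

    void setDelimiters(CharSequence delims) {
        this.delims = delims;
    }

    // Returns the instance to the state newInstance() produces.
    void reset() {
        text = "";
        delims = "";
        pos = 0;
    }

    private boolean isDelim(char c) {
        for (int i = 0; i < delims.length(); i++) {
            if (delims.charAt(i) == c) return true;
        }
        return false;
    }

    private int skipDelims(int i) {
        while (i < text.length() && isDelim(text.charAt(i))) i++;
        return i;
    }

    boolean hasMoreTokens() {
        return skipDelims(pos) < text.length();
    }

    CharSequence nextToken() {
        int start = skipDelims(pos);
        if (start >= text.length()) throw new NoSuchElementException();
        int end = start;
        while (end < text.length() && !isDelim(text.charAt(end))) end++;
        pos = end;
        return text.subSequence(start, end); // a sub-sequence of the original text
    }

    static List<String> drain(ReusableTokenizer t) {
        List<String> out = new ArrayList<>();
        while (t.hasMoreTokens()) out.add(t.nextToken().toString());
        return out;
    }

    public static void main(String[] args) {
        ReusableTokenizer t = ReusableTokenizer.newInstance();
        t.setDelimiters(" ");
        for (String line : new String[] {"one two", "three four five"}) {
            t.setText(line); // same instance, new text
            System.out.println(drain(t));
        }
        t.reset();
        System.out.println(t.hasMoreTokens()); // no text after reset
    }
}
```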
public static TextTokenizer valueOf(java.lang.CharSequence text, java.lang.CharSequence delim, boolean returnDelims)

Returns a text tokenizer for the specified character sequence. The characters in the delim argument are the delimiters for separating tokens.

If the returnDelims flag is true, then the delimiter characters are also returned as tokens. Each delimiter is returned as a token of length one. If the flag is false, the delimiter characters are skipped and only serve as separators between tokens.

Note that if delim is null, this method does not throw an exception. However, trying to invoke other methods on the resulting TextTokenizer may result in a NullPointerException.

Parameters:
text - the text to be parsed.
delim - the delimiters.
returnDelims - flag indicating whether to return the delimiters as tokens.

public static TextTokenizer valueOf(java.lang.CharSequence text, java.lang.CharSequence delim)
Returns a text tokenizer for the specified character sequence. The characters in the delim argument are the delimiters for separating tokens. Delimiter characters themselves will not be treated as tokens.

Parameters:
text - the text to be parsed.
delim - the delimiters.

public static TextTokenizer valueOf(java.lang.CharSequence text)

Returns a text tokenizer for the specified character sequence, using the default delimiter set " \t\n\r\f": the space character, the tab character, the newline character, the carriage-return character, and the form-feed character. Delimiter characters themselves will not be treated as tokens.

Parameters:
text - the text to be parsed.

public void setText(java.lang.CharSequence text)
Sets the text to be tokenized in this TextTokenizer. This is useful for TextTokenizer reuse, so that a new tokenizer does not have to be created for each string you want to tokenize. The text will be tokenized from the beginning.

Parameters:
text - the text to be parsed.

public void setDelimiters(java.lang.CharSequence delim)
Sets the delimiters for this TextTokenizer.

Parameters:
delim - the delimiters.

public void setQuoteChar(char quote)
Sets the character to use as the "quote" character. The default is '"'.

See Also:
setHonorQuotes(boolean)
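public void setHonorQuotes(boolean honorQuotes)

Sets whether or not this tokenizer recognizes quoted text using the specified quote character. If true is passed, this tokenizer will consider any text between the specified quote characters as a single token. Honoring of quotes defaults to false.

See Also:
setQuoteChar(char)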
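To make the effect of setHonorQuotes concrete, here is a hypothetical helper, QuoteAwareSplit.split, that mimics the described behavior. Two details are assumptions rather than documented behavior: the quote characters are dropped from the token, and an unterminated quote runs to the end of the text. Only a token that begins with the quote character is treated as quoted.

```java
import java.util.ArrayList;
import java.util.List;

final class QuoteAwareSplit {
    // Hypothetical helper: splits on any character in delims. When honorQuotes is true,
    // a token that starts with the quote character runs to the next quote character,
    // so delimiters inside it are kept (quotes themselves are dropped: an assumption).
    static List<String> split(String text, String delims, char quote, boolean honorQuotes) {
        List<String> tokens = new ArrayList<>();
        int i = 0, n = text.length();
        while (i < n) {
            char c = text.charAt(i);
            if (delims.indexOf(c) >= 0) {
                i++; // skip delimiter runs
            } else if (honorQuotes && c == quote) {
                int close = text.indexOf(quote, i + 1);
                if (close < 0) close = n; // unterminated quote runs to the end (assumption)
                tokens.add(text.substring(i + 1, close));
                i = close + 1;
            } else {
                int start = i;
                while (i < n && delims.indexOf(text.charAt(i)) < 0) i++;
                tokens.add(text.substring(start, i));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "open \"My Documents\" now";
        System.out.println(split(line, " ", '"', false)); // prints [open, "My, Documents", now]
        System.out.println(split(line, " ", '"', true));  // prints [open, My Documents, now]
    }
}
```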
public boolean getHonorQuotes()
Returns:
true if this tokenizer honors quoted text (counts it as a single token).

public void setReturnEmptyTokens(boolean returnEmptyTokens)
Sets whether empty tokens should be returned from this point in the tokenizing process onward.

Empty tokens occur when two delimiters are next to each other or a delimiter occurs at the beginning or end of a string. If empty tokens are set to be returned, and a comma is the delimiter (not returned as a token), the following table shows how many tokens are in each string.

String | Number of tokens |
---|---|
"one,two" | 2 (normal case with no empty tokens) |
"one,,three" | 3, including the empty token in the middle |
"one," | 2, including the empty token at the end |
",two" | 2, including the empty token at the beginning |
"," | 2, including the empty tokens at the beginning and the end |
"" | 1 (all strings have at least one token if empty tokens are returned) |
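The counts in the table above can be reproduced with the JDK alone: String.split with a negative limit keeps leading, inner and trailing empty strings. For contrast, java.util.StringTokenizer, on which TextTokenizer is based, never returns empty tokens. The EmptyTokenCounts class is illustrative and is not part of TextTokenizer.

```java
import java.util.StringTokenizer;

final class EmptyTokenCounts {
    // With a negative limit, String.split keeps leading, inner and trailing empty strings.
    // Note: split's first argument is a regular expression; "," matches a literal comma.
    static int countWithEmpty(String s) {
        return s.split(",", -1).length;
    }

    // java.util.StringTokenizer never returns empty tokens: runs of delimiters collapse.
    static int countWithoutEmpty(String s) {
        return new StringTokenizer(s, ",").countTokens();
    }

    public static void main(String[] args) {
        String[] samples = {"one,two", "one,,three", "one,", ",two", ",", ""};
        for (String s : samples) {
            System.out.println("\"" + s + "\": " + countWithEmpty(s) + " with empty tokens, "
                    + countWithoutEmpty(s) + " without");
        }
    }
}
```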
Parameters:
returnEmptyTokens - true if and only if empty tokens should be returned.

public boolean hasMoreTokens()
Tests if there are more tokens available from this tokenizer's text.

Returns:
true if and only if there is at least one token in the text after the current position; false otherwise.

public javolution.text.Text nextToken()
Returns the next token from this text tokenizer.

Throws:
java.util.NoSuchElementException - if there are no more tokens in this tokenizer's text.

public javolution.text.Text nextToken(java.lang.CharSequence delim)
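Returns the next token in this text tokenizer's text, using the given delimiters.

Parameters:
delim - the new delimiters.
Throws:
java.util.NoSuchElementException - if there are no more tokens in this tokenizer's text.

public boolean hasMoreElements()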
Returns the same value as the hasMoreTokens method. It exists so that this class can implement the Enumeration interface.

Specified by:
hasMoreElements in interface java.util.Enumeration<javolution.text.Text>
Returns:
true if there are more tokens; false otherwise.
See Also:
Enumeration, hasMoreTokens()
public javolution.text.Text nextElement()

Returns the same value as the nextToken method. It exists so that this class can implement the Enumeration interface.

Specified by:
nextElement in interface java.util.Enumeration<javolution.text.Text>
Throws:
java.util.NoSuchElementException - if there are no more tokens in this tokenizer's text.
See Also:
Enumeration, nextToken()
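Implementing Enumeration lets a tokenizer be passed to Enumeration-based utilities such as java.util.Collections.list. The sketch uses java.util.StringTokenizer, which implements Enumeration<Object>; for a TextTokenizer, which implements Enumeration<Text>, the same call would presumably yield an ArrayList<Text>. EnumerationDemo is an illustrative name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.StringTokenizer;

final class EnumerationDemo {
    // Collections.list drains any Enumeration into an ArrayList.
    // StringTokenizer implements Enumeration<Object>, so the element type is Object.
    static ArrayList<Object> asList(String text) {
        return Collections.list(new StringTokenizer(text));
    }

    public static void main(String[] args) {
        System.out.println(asList("this is a test")); // prints [this, is, a, test]
    }
}
```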
public java.util.Iterator<javolution.text.Text> iterator()

Returns an iterator over the tokens returned by this tokenizer.

Specified by:
iterator in interface java.lang.Iterable<javolution.text.Text>
public boolean hasNext()

Returns the same value as the hasMoreTokens() method. It exists so that this class can implement the Iterator interface.

Specified by:
hasNext in interface java.util.Iterator<javolution.text.Text>
Returns:
true if there are more tokens; false otherwise.
See Also:
Iterator, hasMoreTokens()
public javolution.text.Text next()

Returns the same value as the nextToken() method. It exists so that this class can implement the Iterator interface.

Specified by:
next in interface java.util.Iterator<javolution.text.Text>
Throws:
java.util.NoSuchElementException - if there are no more tokens in this tokenizer's text.
See Also:
Iterator, nextToken()
public void remove()

This implementation always throws UnsupportedOperationException. It exists so that this class can implement the Iterator interface.

Specified by:
remove in interface java.util.Iterator<javolution.text.Text>
Throws:
java.lang.UnsupportedOperationException - always.
See Also:
Iterator
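The Iterator and Iterable methods above are what allow a tokenizer to appear in an enhanced for loop. IterableTokens below is a hypothetical adapter over java.util.StringTokenizer, not TextTokenizer's source. In particular, having iterator() return the object itself (so the tokens can be traversed only once) is an assumption; the documentation above does not say what iterator() returns.

```java
import java.util.Iterator;
import java.util.StringTokenizer;

// Hypothetical adapter: one object acting as both Iterator and Iterable over its tokens.
final class IterableTokens implements Iterator<String>, Iterable<String> {
    private final StringTokenizer st;

    IterableTokens(String text) {
        this.st = new StringTokenizer(text); // default whitespace delimiters
    }

    @Override
    public boolean hasNext() {
        return st.hasMoreTokens(); // same value as hasMoreTokens()
    }

    @Override
    public String next() {
        return st.nextToken(); // throws NoSuchElementException when exhausted
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException(); // tokens cannot be removed from the text
    }

    // Assumption: returns itself, so a second traversal sees no tokens.
    @Override
    public Iterator<String> iterator() {
        return this;
    }

    public static void main(String[] args) {
        for (String token : new IterableTokens("this is a test")) {
            System.out.println(token);
        }
    }
}
```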
public int countTokens()

Calculates the number of times that this tokenizer's nextToken method can be called before it generates an exception. The current position is not advanced.

See Also:
nextToken()
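Because countTokens does not advance the current position, it can be called repeatedly, and it always counts from wherever the tokenizer currently is. java.util.StringTokenizer.countTokens documents the same behavior, so the sketch uses it as a stand-in; CountTokensDemo is an illustrative name.

```java
import java.util.StringTokenizer;

final class CountTokensDemo {
    // Returns {count, count again, count after consuming one token}.
    static int[] countsAroundNextToken(String text) {
        StringTokenizer st = new StringTokenizer(text);
        int first = st.countTokens();
        int second = st.countTokens(); // unchanged: counting does not consume tokens
        st.nextToken();                // consumes one token
        int third = st.countTokens();  // counts from the new current position
        return new int[] {first, second, third};
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(countsAroundNextToken("this is a test"))); // prints [4, 4, 3]
    }
}
```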
public int countTokens(java.lang.CharSequence delims)

Calculates the number of times that this tokenizer's nextToken method can be called before it generates an exception, using the given set of delimiters. The delimiters given will be used for future calls to nextToken() unless new delimiters are given. The current position is not advanced.

Parameters:
delims - the new set of delimiters.
See Also:
countTokens()
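java.util.StringTokenizer has no countTokens(String) overload, but its nextToken(String) method has the same sticky-delimiter behavior described above: the new set remains in effect for later nextToken() calls. The sketch assumes TextTokenizer treats delimiters passed to countTokens(CharSequence) the same way; StickyDelimitersDemo is an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

final class StickyDelimitersDemo {
    // Starts with ";" as the delimiter set, then switches to "," on the first call.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, ";");
        out.add(st.nextToken(","));  // the delimiter set is now ","
        while (st.hasMoreTokens()) {
            out.add(st.nextToken()); // still ",": ';' is now ordinary text
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("1,2;3,4")); // prints [1, 2;3, 4]
    }
}
```

Had the ";" set stayed in effect, the same text would have produced the tokens 1,2 and 3,4 instead.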
public javolution.text.Text restOfText()

Retrieves the rest of the text as a single token.
public javolution.text.Text toText()

Returns the same value as the nextToken() method. It exists so that this class can implement the Realtime interface.

Specified by:
toText in interface javolution.lang.Realtime
Throws:
java.util.NoSuchElementException - if there are no more tokens in this tokenizer's text.
See Also:
Realtime, nextToken()
public static void recycle(TextTokenizer instance)

Recycles a TextTokenizer instance immediately (on the stack when executing in a StackContext).

public static void main(java.lang.String[] args)

Testing code for this class.