@Beta @GwtCompatible public final class Utf8 extends java.lang.Object
The variant of UTF-8 implemented by this class is the restricted definition of UTF-8 introduced in Unicode 3.1. One implication of this is that it rejects "non-shortest form" byte sequences, even though the JDK decoder may accept them.
Modifier | Constructor and Description |
---|---|
private | Utf8() |
Modifier and Type | Method and Description |
---|---|
static int | encodedLength(java.lang.CharSequence sequence) Returns the number of bytes in the UTF-8-encoded form of sequence. |
private static int | encodedLengthGeneral(java.lang.CharSequence sequence, int start) |
static boolean | isWellFormed(byte[] bytes) Returns true if bytes is a well-formed UTF-8 byte sequence according to Unicode 6.0. |
static boolean | isWellFormed(byte[] bytes, int off, int len) Returns whether the given byte array slice is a well-formed UTF-8 byte sequence, as defined by isWellFormed(byte[]). |
private static boolean | isWellFormedSlowPath(byte[] bytes, int off, int end) |
private static java.lang.String | unpairedSurrogateMsg(int i) |
public static int encodedLength(java.lang.CharSequence sequence)
Returns the number of bytes in the UTF-8-encoded form of sequence. For a string, this method is equivalent to string.getBytes(UTF_8).length, but is more efficient in both time and space.
Throws:
java.lang.IllegalArgumentException - if sequence contains ill-formed UTF-16 (unpaired surrogates)
private static int encodedLengthGeneral(java.lang.CharSequence sequence, int start)
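The equivalence stated for encodedLength can be illustrated with the JDK alone. The following is a minimal sketch, not this class's implementation: the class name EncodedLengthSketch and the method utf8Length are hypothetical, and the loop simply counts one to four bytes per code point, throwing IllegalArgumentException on an unpaired surrogate as the documentation describes.

```java
import java.nio.charset.StandardCharsets;

class EncodedLengthSketch {
    /**
     * Counts the UTF-8 bytes needed for a CharSequence without allocating
     * a byte array. Hypothetical helper, written only for illustration.
     */
    public static int utf8Length(CharSequence sequence) {
        int bytes = 0;
        for (int i = 0; i < sequence.length(); i++) {
            char c = sequence.charAt(i);
            if (c < 0x80) {
                bytes += 1; // ASCII
            } else if (c < 0x800) {
                bytes += 2; // two-byte form
            } else if (Character.isHighSurrogate(c)) {
                // A high surrogate must be followed by a low surrogate.
                if (i + 1 >= sequence.length()
                        || !Character.isLowSurrogate(sequence.charAt(i + 1))) {
                    throw new IllegalArgumentException("Unpaired surrogate at index " + i);
                }
                bytes += 4; // supplementary code point
                i++;        // skip the low surrogate just consumed
            } else if (Character.isLowSurrogate(c)) {
                throw new IllegalArgumentException("Unpaired surrogate at index " + i);
            } else {
                bytes += 3; // rest of the Basic Multilingual Plane
            }
        }
        return bytes;
    }

    public static void main(String[] args) {
        String s = "h\u00e9llo \u20ac \uD83D\uDE00";
        // Both lines should print the same number.
        System.out.println(utf8Length(s));
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

One difference worth noting: String.getBytes(UTF_8) silently substitutes a replacement byte for an unpaired surrogate, whereas the documented contract of encodedLength is to throw.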
public static boolean isWellFormed(byte[] bytes)
Returns true if bytes is a well-formed UTF-8 byte sequence according to Unicode 6.0. Note that this is a stronger criterion than simply whether the bytes can be decoded. For example, some versions of the JDK decoder will accept "non-shortest form" byte sequences, but encoding never reproduces these. Such byte sequences are not considered well-formed.
This method returns true if and only if Arrays.equals(bytes, new String(bytes, UTF_8).getBytes(UTF_8)) does, but is more efficient in both time and space.
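The round-trip expression given for isWellFormed(byte[]) can be tried directly with the JDK. The sketch below assumes nothing beyond java.lang.String and java.util.Arrays; the class name WellFormedSketch and the method roundTrips are hypothetical, and this is the slow reference check from the description, not the optimized implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class WellFormedSketch {
    /** Decodes the bytes as UTF-8, re-encodes them, and compares with the input. */
    public static boolean roundTrips(byte[] bytes) {
        byte[] reencoded =
                new String(bytes, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(bytes, reencoded);
    }

    public static void main(String[] args) {
        byte[] valid = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        // Non-shortest (two-byte) form of U+0000; never produced by an encoder.
        byte[] overlong = {(byte) 0xC0, (byte) 0x80};
        // Three-byte encoding of a lone surrogate, which is not valid UTF-8.
        byte[] surrogate = {(byte) 0xED, (byte) 0xA0, (byte) 0x80};

        System.out.println(roundTrips(valid));     // well-formed input survives the round trip
        System.out.println(roundTrips(overlong));  // re-encoding cannot reproduce these bytes
        System.out.println(roundTrips(surrogate)); // likewise for the encoded surrogate
    }
}
```

The round trip allocates a String and a second byte array, which is the time and space overhead the description says isWellFormed avoids.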
public static boolean isWellFormed(byte[] bytes, int off, int len)
Returns whether the given byte array slice is a well-formed UTF-8 byte sequence, as defined by isWellFormed(byte[]). Note that this can be false even when isWellFormed(bytes) is true.
Parameters:
bytes - the input buffer
off - the offset in the buffer of the first byte to read
len - the number of bytes to read from the buffer
private static boolean isWellFormedSlowPath(byte[] bytes, int off, int end)
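The note that a slice can be ill-formed even when the whole array is well-formed follows from multi-byte sequences being cut at the slice boundary. A minimal JDK-only sketch, assuming the same decode and re-encode check described for isWellFormed(byte[]); the class SliceSketch and the method sliceRoundTrips are hypothetical names, and bounds handling here is whatever Arrays.copyOfRange does, which may differ from this class.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class SliceSketch {
    /** Applies the decode/re-encode round-trip check to bytes[off, off + len). */
    public static boolean sliceRoundTrips(byte[] bytes, int off, int len) {
        byte[] slice = Arrays.copyOfRange(bytes, off, off + len);
        byte[] reencoded =
                new String(slice, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(slice, reencoded);
    }

    public static void main(String[] args) {
        // A single accented letter encoded as two bytes: 0xC3 0xA9.
        byte[] bytes = "\u00e9".getBytes(StandardCharsets.UTF_8);

        // The full array is a complete two-byte sequence.
        System.out.println(sliceRoundTrips(bytes, 0, bytes.length));
        // The first byte alone is a lead byte with no continuation byte.
        System.out.println(sliceRoundTrips(bytes, 0, 1));
        // The second byte alone is a continuation byte with no lead byte.
        System.out.println(sliceRoundTrips(bytes, 1, 1));
    }
}
```

Callers that read from a buffer in fixed-size chunks would therefore need to align chunk boundaries with character boundaries before validating each chunk on its own.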
private static java.lang.String unpairedSurrogateMsg(int i)