The DataInput interface provides for reading bytes from a binary
stream and reconstructing from them data in any of the Java
primitive types. There is also a facility for reconstructing a
String from data in modified UTF-8 format.
It is generally true of all the reading
routines in this interface that if end of
file is reached before the desired number
of bytes has been read, an EOFException
(which is a kind of IOException)
is thrown. If any byte cannot be read for
any reason other than end of file, an IOException
other than EOFException is
thrown. In particular, an IOException
may be thrown if the input stream has been
closed.
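
The end-of-file rule above can be illustrated with a minimal sketch. It assumes java.io.DataInputStream as the DataInput implementation and an in-memory byte array as the stream; the class name EofDemo and the helper tryReadInt are hypothetical names chosen for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

public class EofDemo {
    // Hypothetical helper: tries to read one 4-byte int from the given bytes
    // and reports whether the read succeeded or hit end of file first.
    static String tryReadInt(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int value = in.readInt();
            return "read " + value;
        } catch (EOFException e) {
            // Fewer than four bytes were available before end of file.
            return "EOF";
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tryReadInt(new byte[] {0, 0, 0, 42})); // prints "read 42"
        System.out.println(tryReadInt(new byte[] {0, 0, 42}));    // prints "EOF"
    }
}
```

Because EOFException is a kind of IOException, a caller that only catches IOException will also see the end-of-file case; catching EOFException first, as here, is one way to tell the two apart.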
Implementations of the DataInput and DataOutput interfaces represent
Unicode strings in a format that is a slight modification of UTF-8.
(For information regarding the standard UTF-8 format, see section
3.9 Unicode Encoding Forms of The Unicode Standard, Version
4.0).
Note that in the following tables, the most significant bit appears in the
far left-hand column.
All characters in the range '\u0001' to
'\u007F' are represented by a single byte:
          Bit Values
  Byte 1  0  bits 6-0
The null character '\u0000' and characters in the
range '\u0080' to '\u07FF' are
represented by a pair of bytes:
          Bit Values
  Byte 1  1  1  0  bits 10-6
  Byte 2  1  0  bits 5-0
char values in the range '\u0800' to
'\uFFFF' are represented by three bytes:
          Bit Values
  Byte 1  1  1  1  0  bits 15-12
  Byte 2  1  0  bits 11-6
  Byte 3  1  0  bits 5-0
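
The three bit layouts above can be sketched as a per-char encoder. This is an illustration of the tables only, not the code used by any particular implementation; the class name ModifiedUtf8Sketch and the method encodeChar are hypothetical.

```java
public class ModifiedUtf8Sketch {
    // Hypothetical helper: encodes a single char according to the
    // one-, two-, and three-byte layouts described in the tables.
    static byte[] encodeChar(char c) {
        if (c >= 0x0001 && c <= 0x007F) {
            // One byte: 0 followed by bits 6-0.
            return new byte[] { (byte) c };
        } else if (c <= 0x07FF) {
            // Two bytes: 110 + bits 10-6, then 10 + bits 5-0.
            // This branch also handles the null character 0x0000.
            return new byte[] {
                (byte) (0xC0 | ((c >> 6) & 0x1F)),
                (byte) (0x80 | (c & 0x3F))
            };
        } else {
            // Three bytes: 1110 + bits 15-12, 10 + bits 11-6, 10 + bits 5-0.
            return new byte[] {
                (byte) (0xE0 | ((c >> 12) & 0x0F)),
                (byte) (0x80 | ((c >> 6) & 0x3F)),
                (byte) (0x80 | (c & 0x3F))
            };
        }
    }

    public static void main(String[] args) {
        for (char c : new char[] { 'A', (char) 0, '\u00E9', '\u20AC' }) {
            StringBuilder hex = new StringBuilder();
            for (byte b : encodeChar(c)) {
                hex.append(String.format("%02X ", b & 0xFF));
            }
            System.out.println(String.format("U+%04X -> %s", (int) c, hex.toString().trim()));
        }
    }
}
```

For example, 'A' (U+0041) takes one byte (41), U+00E9 takes two (C3 A9), and U+20AC takes three (E2 82 AC).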
The differences between this format and the
standard UTF-8 format are the following:
The null byte '\u0000' is encoded in 2-byte format
rather than 1-byte, so that the encoded strings never have
embedded nulls.
Only the 1-byte, 2-byte, and 3-byte formats are used.
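
The first difference can be observed directly by comparing the output of DataOutputStream.writeUTF with standard UTF-8 for a string containing a null character. This is a small sketch, not a definitive test: the class name NullEncodingDemo and its helpers are hypothetical, and it relies on writeUTF prefixing the encoded bytes with a two-byte length, which the helper strips.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class NullEncodingDemo {
    // Hypothetical helper: returns the modified UTF-8 bytes of s as written
    // by writeUTF, without the leading two-byte length prefix.
    static byte[] modifiedUtf8(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // Hypothetical helper: formats bytes as space-separated upper-case hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        String s = "a\u0000b";
        System.out.println(hex(modifiedUtf8(s)));                     // prints "61 C0 80 62"
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8)));  // prints "61 00 62"
    }
}
```

The modified form contains no zero byte, whereas the standard UTF-8 encoding of the same string does.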