com.glaforge.i18n.io
Class SmartEncodingInputStream

java.lang.Object
  extended by java.io.InputStream
      extended by com.glaforge.i18n.io.SmartEncodingInputStream
All Implemented Interfaces:
java.io.Closeable

public class SmartEncodingInputStream
extends java.io.InputStream

com.glaforge.i18n.io.SmartEncodingInputStream extends InputStream with a dedicated constructor and a dedicated method for handling text files encoded in different charsets.

It wraps any ordinary InputStream (a FileInputStream, for example). It reads a buffer of a defined length, then passes this byte buffer to the class com.glaforge.i18n.io.CharsetToolkit, which parses it and guesses the encoding. All of these steps are performed within the constructor. Once the object is constructed, you can call the method getReader() to retrieve a Reader created with the charset guessed from the first bytes of the file. This Reader reads from the com.glaforge.i18n.io.SmartEncodingInputStream itself: it reads the internal buffer first and, once the end of the buffer is reached, reads the underlying InputStream with the default read method.
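The buffer-then-delegate behaviour described above can be sketched with the standard library alone. PrefixReplayStream and its members are hypothetical names invented for this illustration; this is not the source of SmartEncodingInputStream, and the charset guessing done by CharsetToolkit is left out entirely.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical stand-in (not the library class): buffers a prefix, then replays it. */
final class PrefixReplayStream extends InputStream {
    private final InputStream in;  // the wrapped stream
    private final byte[] buffer;   // bytes consumed up front for charset sniffing
    private final int filled;      // number of valid bytes in the buffer
    private int position = 0;      // next buffered byte to hand out

    PrefixReplayStream(InputStream in, int bufferLength) throws IOException {
        this.in = in;
        this.buffer = new byte[bufferLength];
        int total = 0;
        // A single read may return fewer bytes than requested, so loop until full or EOF.
        while (total < bufferLength) {
            int n = in.read(buffer, total, bufferLength - total);
            if (n < 0) {
                break;
            }
            total += n;
        }
        this.filled = total;
    }

    /** Copy of the sniffed prefix: the bytes a charset detector would inspect. */
    byte[] prefix() {
        return Arrays.copyOf(buffer, filled);
    }

    @Override
    public int read() throws IOException {
        if (position < filled) {
            return buffer[position++] & 0xFF;  // replay buffered bytes as unsigned values
        }
        return in.read();                      // buffer exhausted: delegate to the wrapped stream
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    /** Demo helper: wraps the data, drains the wrapper, and returns every byte it produced. */
    static byte[] roundTrip(byte[] data, int bufferLength) {
        try (InputStream s = new PrefixReplayStream(new ByteArrayInputStream(data), bufferLength)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int b;
            while ((b = s.read()) != -1) {
                out.write(b);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "hello world".getBytes(StandardCharsets.US_ASCII);
        // The 4-byte prefix is consumed by the constructor yet still appears in the output.
        System.out.println(new String(roundTrip(data, 4), StandardCharsets.US_ASCII));
    }
}
```

Because the prefix is replayed before the wrapped stream is touched again, a caller sees exactly the same byte sequence as if nothing had been read ahead.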

Usage:


 FileInputStream fis = new FileInputStream("utf-8.txt");
 com.glaforge.i18n.io.SmartEncodingInputStream smartIS = new com.glaforge.i18n.io.SmartEncodingInputStream(fis);
 Reader reader = smartIS.getReader();
 BufferedReader bufReader = new BufferedReader(reader);

 String line;
 while ((line = bufReader.readLine()) != null)
 {
     System.out.println(line);
 }

 
Date: 23 July 2002


Field Summary
static int BUFFER_LENGTH_2KB
           
static int BUFFER_LENGTH_4KB
           
static int BUFFER_LENGTH_8KB
           
 
Constructor Summary
SmartEncodingInputStream(java.io.InputStream is)
          Constructor of the com.glaforge.i18n.io.SmartEncodingInputStream.
SmartEncodingInputStream(java.io.InputStream is, int bufferLength)
          Constructor of the com.glaforge.i18n.io.SmartEncodingInputStream.
SmartEncodingInputStream(java.io.InputStream is, int bufferLength, java.nio.charset.Charset defaultCharset)
          Constructor of the com.glaforge.i18n.io.SmartEncodingInputStream.
SmartEncodingInputStream(java.io.InputStream is, int bufferLength, java.nio.charset.Charset defaultCharset, boolean enforce8Bit)
          Constructor of the com.glaforge.i18n.io.SmartEncodingInputStream class.
 
Method Summary
 java.nio.charset.Charset getEncoding()
          Retrieves the Charset as guessed from the underlying InputStream.
 java.io.Reader getReader()
          Gets a Reader with the right Charset as guessed by reading the beginning of the underlying InputStream.
static void main(java.lang.String[] args)
           
 int read()
          Implements the method read() as defined in the abstract class InputStream.
 
Methods inherited from class java.io.InputStream
available, close, mark, markSupported, read, read, reset, skip
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

BUFFER_LENGTH_2KB

public static final int BUFFER_LENGTH_2KB
See Also:
Constant Field Values

BUFFER_LENGTH_4KB

public static final int BUFFER_LENGTH_4KB
See Also:
Constant Field Values

BUFFER_LENGTH_8KB

public static final int BUFFER_LENGTH_8KB
See Also:
Constant Field Values
Constructor Detail

SmartEncodingInputStream

public SmartEncodingInputStream(java.io.InputStream is,
                                int bufferLength,
                                java.nio.charset.Charset defaultCharset,
                                boolean enforce8Bit)
                         throws java.io.IOException

Constructor of the com.glaforge.i18n.io.SmartEncodingInputStream class. The larger the buffer, the more reliable the guess of the encoding of the InputStream you want to get a Reader from.

It is possible to defined

Parameters:
is - the InputStream of which we want to create a Reader with the encoding guessed from the first buffer of the file.
bufferLength - the length of the buffer that is used to guess the encoding.
defaultCharset - specifies the default Charset to use when an 8-bit Charset is guessed. This parameter may be null; in that case the default system charset is used, as defined by the system property "file.encoding" and read by the method getDefaultSystemCharset() of the class com.glaforge.i18n.io.CharsetToolkit.
enforce8Bit - enforce the use of the specified default Charset in case the encoding US-ASCII is recognized.
Throws:
java.io.IOException
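One way to read the interplay of defaultCharset and enforce8Bit described above is sketched below. CharsetFallbackSketch and resolve are hypothetical names, a null guess stands in for "an unidentified 8-bit charset", and Charset.defaultCharset() stands in for CharsetToolkit.getDefaultSystemCharset(); the library's actual decision logic may differ.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of the fallback rules; not the library's implementation. */
final class CharsetFallbackSketch {

    /**
     * Picks the charset to decode with.
     *
     * @param guessed        the detector's result, or null for an unidentified 8-bit charset (assumption)
     * @param defaultCharset the caller's preferred 8-bit charset, may be null
     * @param enforce8Bit    when true, a US-ASCII guess is replaced by the default charset
     */
    static Charset resolve(Charset guessed, Charset defaultCharset, boolean enforce8Bit) {
        // A null default falls back to the JVM default charset.
        Charset fallback = (defaultCharset != null) ? defaultCharset : Charset.defaultCharset();
        if (guessed == null) {
            return fallback;
        }
        if (enforce8Bit && StandardCharsets.US_ASCII.equals(guessed)) {
            return fallback;
        }
        return guessed;
    }

    public static void main(String[] args) {
        System.out.println(resolve(StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1, true));
        System.out.println(resolve(StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1, false));
    }
}
```

Enforcing an 8-bit charset on a US-ASCII guess is a defensive choice: the sniffed buffer may contain only ASCII bytes even though accented characters appear later in the file.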

SmartEncodingInputStream

public SmartEncodingInputStream(java.io.InputStream is,
                                int bufferLength,
                                java.nio.charset.Charset defaultCharset)
                         throws java.io.IOException
Constructor of the com.glaforge.i18n.io.SmartEncodingInputStream. With this constructor, the enforce8Bit flag does not need to be specified.

Parameters:
is - the InputStream of which we want to create a Reader with the encoding guessed from the first buffer of the file.
bufferLength - the length of the buffer that is used to guess the encoding.
defaultCharset - specifies the default Charset to use when an 8-bit Charset is guessed. This parameter may be null; in that case the default system charset is used, as defined by the system property "file.encoding" and read by the method getDefaultSystemCharset() of the class com.glaforge.i18n.io.CharsetToolkit.
Throws:
java.io.IOException

SmartEncodingInputStream

public SmartEncodingInputStream(java.io.InputStream is,
                                int bufferLength)
                         throws java.io.IOException
Constructor of the com.glaforge.i18n.io.SmartEncodingInputStream. With this constructor, the default Charset used when an 8-bit encoding is guessed does not need to be specified. The default system charset will be used instead.

Parameters:
is - the InputStream of which we want to create a Reader with the encoding guessed from the first buffer of the file.
bufferLength - the length of the buffer that is used to guess the encoding.
Throws:
java.io.IOException

SmartEncodingInputStream

public SmartEncodingInputStream(java.io.InputStream is)
                         throws java.io.IOException
Constructor of the com.glaforge.i18n.io.SmartEncodingInputStream. With this constructor, the default Charset used when an 8-bit encoding is guessed does not need to be specified. The default system charset will be used instead. The buffer length does not need to be specified either. A default buffer length of 4 KB is used.

Parameters:
is - the InputStream of which we want to create a Reader with the encoding guessed from the first buffer of the file.
Throws:
java.io.IOException
Method Detail

read

public int read()
         throws java.io.IOException
Implements the method read() as defined in the abstract class InputStream. Because a certain number of bytes have already been read from the underlying InputStream into the internal buffer, the bytes of this buffer are returned first; once the buffer is exhausted, the rest of the stream is read directly from the underlying InputStream.

Specified by:
read in class java.io.InputStream
Returns:
the next byte of data, or -1 if there is no more data because the end of the stream has been reached.
Throws:
java.io.IOException
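As a reminder of the java.io.InputStream contract that read() follows, the small demo below uses a plain ByteArrayInputStream rather than this class. ReadContractDemo and readValues are names made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;

/** Illustrates the single-byte read() contract with a standard-library stream. */
final class ReadContractDemo {

    /** Collects every value returned by read(), including the final -1. */
    static List<Integer> readValues(byte[] data) {
        // ByteArrayInputStream.read() declares no checked exception, which keeps the demo short.
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        List<Integer> values = new ArrayList<>();
        int b;
        do {
            b = in.read();   // an unsigned byte value in 0..255, or -1 at end of stream
            values.add(b);
        } while (b != -1);
        return values;
    }

    public static void main(String[] args) {
        System.out.println(readValues(new byte[] {(byte) 0xE9, 0x41}));  // prints [233, 65, -1]
    }
}
```

The byte 0xE9 comes back as 233 rather than -23, which is why -1 can serve unambiguously as the end-of-stream marker.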

getReader

public java.io.Reader getReader()
Gets a Reader with the right Charset as guessed by reading the beginning of the underlying InputStream.

Returns:
a Reader defined with the right encoding.
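Why the charset passed to a Reader matters can be shown with the standard InputStreamReader. The sketch below decodes the same two bytes with two different charsets; ReaderCharsetDemo and decode are illustrative names, and nothing here is claimed about how getReader() is implemented internally.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Decodes the same bytes with different charsets to show the effect of a wrong guess. */
final class ReaderCharsetDemo {

    /** Reads all characters from the bytes using a Reader bound to the given charset. */
    static String decode(byte[] bytes, Charset charset) {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            StringBuilder text = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] utf8Bytes = {(byte) 0xC3, (byte) 0xA9};  // UTF-8 encoding of U+00E9 (e with acute accent)
        // Correct charset: one character. Wrong charset: two unrelated characters.
        System.out.println(decode(utf8Bytes, StandardCharsets.UTF_8).length());       // prints 1
        System.out.println(decode(utf8Bytes, StandardCharsets.ISO_8859_1).length());  // prints 2
    }
}
```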

getEncoding

public java.nio.charset.Charset getEncoding()
Retrieves the Charset as guessed from the underlying InputStream.

Returns:
the Charset guessed.
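This page does not describe how CharsetToolkit arrives at its guess. As a rough illustration of what guessing from the first bytes can look like, the sketch below checks only for a Unicode byte order mark; BomSniffSketch and guessFromBom are hypothetical names, and BOM detection is assumed here as the simplest possible heuristic, not as the library's algorithm.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical, deliberately minimal guesser: looks at byte order marks only. */
final class BomSniffSketch {

    /** Returns the charset implied by a leading BOM, or the fallback when none is present. */
    static Charset guessFromBom(byte[] prefix, Charset fallback) {
        if (startsWith(prefix, 0xEF, 0xBB, 0xBF)) {
            return StandardCharsets.UTF_8;
        }
        if (startsWith(prefix, 0xFE, 0xFF)) {
            return StandardCharsets.UTF_16BE;
        }
        // Note: a UTF-32LE BOM also starts with FF FE; this sketch ignores UTF-32.
        if (startsWith(prefix, 0xFF, 0xFE)) {
            return StandardCharsets.UTF_16LE;
        }
        return fallback;
    }

    /** True when the data begins with the given unsigned byte values. */
    private static boolean startsWith(byte[] data, int... expected) {
        if (data.length < expected.length) {
            return false;
        }
        for (int i = 0; i < expected.length; i++) {
            if ((data[i] & 0xFF) != expected[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 0x41};
        System.out.println(guessFromBom(withBom, StandardCharsets.ISO_8859_1));  // prints UTF-8
    }
}
```

Files without a BOM need content-based heuristics, which is where a larger sniffing buffer helps.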

main

public static void main(java.lang.String[] args)
                 throws java.io.IOException
Throws:
java.io.IOException