Class LanguageIdentifier


  • public class LanguageIdentifier
    extends Object
    Identifies the language of a piece of content, based on statistical analysis.
    Author:
    Sami Siren, Jérôme Charron, Hendrik Schreiber
    See Also:
    ISO 639 Language Codes
    • Method Detail

      • getMinLength

        public int getMinLength()
      • getMaxLength

        public int getMaxLength()
      • getAnalyzeLength

        public int getAnalyzeLength()
      • identify

        public String identify(String content)
        Identifies the language of the given content.
        Parameters:
        content - the content to analyze.
        Returns:
        The two-letter ISO 639 language code (en, fi, sv, ...) of the language that best matches the specified content.
      • identify

        public String identify(StringBuilder content)
        Identifies the language of the given content.
        Parameters:
        content - the content to analyze.
        Returns:
        The two-letter ISO 639 language code (en, fi, sv, ...) of the language that best matches the specified content.
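The returned two-letter code can be handed to standard Java APIs. The sketch below is only an illustration: the class name LanguageCodeExample and the helper toLocale are hypothetical, and the hard-coded code stands in for a value that would normally come from a call such as identify(content). It uses nothing beyond java.util.Locale.

```java
import java.util.Locale;

public class LanguageCodeExample {

    /**
     * Hypothetical helper: turns a two-letter ISO 639 code such as "fi"
     * into a java.util.Locale.
     */
    public static Locale toLocale(String isoCode) {
        return Locale.forLanguageTag(isoCode);
    }

    public static void main(String[] args) {
        // Assumption: in real use this value would be the result of identify(...).
        String code = "fi";
        Locale locale = toLocale(code);
        // Prints the English display name of the detected language.
        System.out.println(locale.getDisplayLanguage(Locale.ENGLISH));
    }
}
```

Converting to a Locale early lets the rest of an application work with a typed value instead of passing bare strings around.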
      • identify

        public String identify(InputStream is)
                        throws IOException
        Identifies the language of the content read from an input stream. This method reads the stream using the platform default encoding. To specify an encoding, use the identify(InputStream, String) method.
        Parameters:
        is - the input stream to analyze.
        Returns:
        The two-letter ISO 639 language code (en, fi, sv, ...) of the language that best matches the content of the specified input stream.
        Throws:
        IOException - if an error occurs while reading the input stream.
      • identify

        public String identify(InputStream is,
                               String charset)
                        throws IOException
        Identifies the language of the content read from an input stream, decoding it with the given charset.
        Parameters:
        is - the input stream to analyze.
        charset - the charset to use to read the input stream.
        Returns:
        The two-letter ISO 639 language code (en, fi, sv, ...) of the language that best matches the content of the specified input stream.
        Throws:
        IOException - if an error occurs while reading the input stream.
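
The charset argument matters because the same bytes decode to different text under different encodings, and any statistical analysis would then see different characters. The following sketch is not this class's implementation. CharsetDecodingExample and decode are hypothetical names, and the code relies only on java.io to show the effect of decoding UTF-8 bytes with the right and the wrong charset.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class CharsetDecodingExample {

    /**
     * Hypothetical helper: reads the whole stream as text using the named charset.
     * Closing the reader also closes the underlying stream.
     */
    public static String decode(InputStream is, String charset) throws IOException {
        StringBuilder out = new StringBuilder();
        try (Reader reader = new InputStreamReader(is, charset)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                out.append(buf, 0, n);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // "Jérôme" written with Unicode escapes so the source file encoding does not matter.
        String original = "J\u00e9r\u00f4me";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        String right = decode(new ByteArrayInputStream(utf8), "UTF-8");
        String wrong = decode(new ByteArrayInputStream(utf8), "ISO-8859-1");

        System.out.println(right.equals(original)); // decoded with the matching charset
        System.out.println(wrong.equals(original)); // decoded with a mismatched charset
    }
}
```

When the encoding of a stream is known, passing it explicitly avoids depending on the platform default, which can differ between machines.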