Package com.tagtraum.core.lang
Class LanguageIdentifier
- java.lang.Object
  - com.tagtraum.core.lang.LanguageIdentifier
public class LanguageIdentifier extends Object
Identifies the language of a piece of content, based on statistical analysis.
Author:
- Sami Siren, Jérôme Charron, Hendrik Schreiber
See Also:
- ISO 639 Language Codes
Method Summary
Modifier and Type          Method                                    Description
int                        getAnalyzeLength()
static LanguageIdentifier  getInstance()
int                        getMaxLength()
int                        getMinLength()
String                     identify(InputStream is)                  Identify language from input stream.
String                     identify(InputStream is, String charset)  Identify language from input stream.
String                     identify(String content)                  Identify language of a content.
String                     identify(StringBuilder content)           Identify language of a content.
Method Detail
-
getInstance
public static LanguageIdentifier getInstance()
-
getMinLength
public int getMinLength()
-
getMaxLength
public int getMaxLength()
-
getAnalyzeLength
public int getAnalyzeLength()
-
identify
public String identify(String content)
Identifies the language of the given content.
Parameters:
- content - the content to analyze.
Returns:
- The 2-letter ISO 639 language code (en, fi, sv, ...) of the language that best matches the specified content.
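Because the return value is documented as a 2-letter ISO 639 code, a caller may want to check it and turn it into a readable name. The following is a minimal sketch using only the JDK's java.util.Locale. The class LanguageCodeSketch and its two helpers are hypothetical and not part of this package; the commented call shows where identify(String) would supply the code. What identify returns for unrecognizable input is not documented here, so the check is purely defensive.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper, not part of com.tagtraum.core.lang.
class LanguageCodeSketch {

    // True if the code is one of the 2-letter ISO 639 codes known to the JDK.
    static boolean isIso639Code(String code) {
        return code != null && Arrays.asList(Locale.getISOLanguages()).contains(code);
    }

    // English display name for a known 2-letter code; unknown input is returned unchanged.
    static String displayName(String code) {
        if (!isIso639Code(code)) {
            return code;
        }
        return Locale.forLanguageTag(code).getDisplayLanguage(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        // With the library on the classpath, the code would come from:
        //   String code = LanguageIdentifier.getInstance().identify("This is an English sentence.");
        String code = "en"; // stand-in value for this sketch
        System.out.println(code + " -> " + displayName(code));
    }
}
```

The lookup relies on the JDK's own list of ISO 639 codes, so no extra dependency is needed.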
-
identify
public String identify(StringBuilder content)
Identifies the language of the given content.
Parameters:
- content - the content to analyze.
Returns:
- The 2-letter ISO 639 language code (en, fi, sv, ...) of the language that best matches the specified content.
-
identify
public String identify(InputStream is) throws IOException
Identifies the language of content read from an input stream. This method reads the stream using the platform default encoding. To read with a specific encoding, use the identify(InputStream, String) method.
Parameters:
- is - the input stream to analyze.
Returns:
- The 2-letter ISO 639 language code (en, fi, sv, ...) of the language that best matches the content of the specified input stream.
Throws:
- IOException - if an error occurs while reading the input stream.
-
identify
public String identify(InputStream is, String charset) throws IOException
Identifies the language of content read from an input stream, using the given charset.
Parameters:
- is - the input stream to analyze.
- charset - the charset to use when reading the input stream.
Returns:
- The 2-letter ISO 639 language code (en, fi, sv, ...) of the language that best matches the content of the specified input stream.
Throws:
- IOException - if an error occurs while reading the input stream.
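The two stream overloads differ only in how bytes are turned into text before analysis. The sketch below uses only JDK classes to show the same bytes decoding to different text under two charsets, which suggests passing the charset explicitly whenever the encoding is known. The class CharsetSketch and its decode helper are hypothetical; how LanguageIdentifier reads the stream internally is not documented here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of com.tagtraum.core.lang.
class CharsetSketch {

    // Reads the whole stream and decodes it with the named charset.
    static String decode(InputStream is, String charset) throws IOException {
        return new String(is.readAllBytes(), charset);
    }

    public static void main(String[] args) throws IOException {
        // "\u00e4" is a-umlaut, which takes two bytes in UTF-8.
        byte[] utf8 = "p\u00e4iv\u00e4".getBytes(StandardCharsets.UTF_8);

        String right = decode(new ByteArrayInputStream(utf8), "UTF-8");
        String wrong = decode(new ByteArrayInputStream(utf8), "ISO-8859-1");

        // The wrong charset turns each two-byte umlaut into two separate characters.
        System.out.println(right.length() + " vs " + wrong.length());

        // With the library on the classpath, the explicit-charset call would be:
        //   String code = LanguageIdentifier.getInstance()
        //           .identify(new ByteArrayInputStream(utf8), "UTF-8");
    }
}
```

Text decoded with the wrong charset contains character sequences that do not occur in the original language, which is likely to mislead a statistical identifier.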