Package com.tagtraum.core.lang
Class NGramProfile
- java.lang.Object
  - com.tagtraum.core.lang.NGramProfile
public class NGramProfile extends Object
This class runs an ngram analysis over submitted text; the results can be used for automatic language identification. The similarity calculation is experimental. You have been warned. Methods are provided to build new profiles.
- Author:
- Sami Siren, Jerome Charron - http://frutch.free.fr/, Hendrik Schreiber
Constructor Summary
Constructor | Description
NGramProfile(String name, int minlen, int maxlen) | Construct a new ngram profile
-
Method Summary
Modifier and Type | Method | Description
void | add(StringBuffer word) | Add ngrams from a single word to this profile
void | analyze(StringBuilder text) | Analyze a piece of text
static NGramProfile | create(String name, InputStream is, String encoding) | Create a new language profile from a (preferably quite large) text file
String | getName() |
List<com.tagtraum.core.lang.NGramProfile.NGramEntry> | getSorted() | Return a sorted list of ngrams (sorted by 1. frequency, 2. sequence)
void | load(InputStream is) | Loads an ngram profile from an InputStream (assumes UTF-8 encoded content)
static void | main(String[] args) | main method used for testing only
protected void | normalize() | Normalize the profile (calculates the ngram frequencies)
void | save(OutputStream os) | Writes the NGramProfile content to an OutputStream, using UTF-8 encoding
String | toString() |
Constructor Detail
-
NGramProfile
public NGramProfile(String name, int minlen, int maxlen)
Construct a new ngram profile.
- Parameters:
name - the name of the profile
minlen - the minimum length of ngram sequences
maxlen - the maximum length of ngram sequences
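To illustrate what minlen and maxlen bound, here is a minimal sketch that lists every character sequence of length minlen through maxlen in a single word. It is not the library's code: the class NGramLengthSketch and its method ngrams are hypothetical names, and whether NGramProfile pads words or treats word boundaries specially is not stated on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of com.tagtraum.core.lang.
final class NGramLengthSketch {

    // Collects all substrings of word whose length lies in [minlen, maxlen].
    // Assumes 1 <= minlen <= maxlen.
    static List<String> ngrams(String word, int minlen, int maxlen) {
        List<String> result = new ArrayList<>();
        for (int n = minlen; n <= maxlen; n++) {
            for (int start = 0; start + n <= word.length(); start++) {
                result.add(word.substring(start, start + n));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // With minlen=1 and maxlen=2, "abc" yields [a, b, c, ab, bc]
        System.out.println(ngrams("abc", 1, 2));
    }
}
```

A wider range produces more, and more specific, sequences per word, at the cost of a larger profile.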
Method Detail
-
getName
public String getName()
- Returns:
- Returns the name.
-
add
public void add(StringBuffer word)
Add ngrams from a single word to this profile.
- Parameters:
word - the word to add
-
analyze
public void analyze(StringBuilder text)
Analyze a piece of text.
- Parameters:
text - the text to be analyzed
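How analyze splits text into words is not documented here. As a rough sketch under stated assumptions (words are runs of letters, text is lower-cased, and counts are kept in a plain map), analyzing a piece of text could look like the following. AnalyzeSketch and countNGrams are hypothetical names, not part of this class.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; the real analyze(StringBuilder) may tokenize differently.
final class AnalyzeSketch {

    // Counts ngrams of length minlen..maxlen in every word of text.
    static Map<String, Integer> countNGrams(CharSequence text, int minlen, int maxlen) {
        Map<String, Integer> counts = new HashMap<>();
        // Assumption: words are runs of letters; everything else separates words.
        String[] words = text.toString().toLowerCase(Locale.ROOT).split("[^\\p{L}]+");
        for (String word : words) {
            for (int n = minlen; n <= maxlen; n++) {
                for (int start = 0; start + n <= word.length(); start++) {
                    // Increment the count for this ngram, starting at 1.
                    counts.merge(word.substring(start, start + n), 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

For example, `AnalyzeSketch.countNGrams(new StringBuilder("to be, to"), 1, 2)` counts "to" twice and "be" once.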
-
normalize
protected void normalize()
Normalize the profile (calculates the ngram frequencies).
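The exact formula is not given on this page. One common reading of "calculates the ngram frequencies" is relative frequency, that is, each ngram's count divided by the total count. The sketch below assumes that reading and a single overall total; the library may instead normalize per ngram length. NormalizeSketch and frequencies are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; assumes frequency = count / total of all counts.
final class NormalizeSketch {

    static Map<String, Double> frequencies(Map<String, Integer> counts) {
        long total = 0;
        for (int count : counts.values()) {
            total += count;
        }
        Map<String, Double> result = new HashMap<>();
        if (total == 0) {
            return result; // nothing to normalize
        }
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            result.put(entry.getKey(), entry.getValue() / (double) total);
        }
        return result;
    }
}
```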
-
getSorted
public List<com.tagtraum.core.lang.NGramProfile.NGramEntry> getSorted()
Return a sorted list of ngrams (sorted by 1. frequency, 2. sequence).
- Returns:
- sorted list of ngrams
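The two-level ordering can be expressed with a chained comparator. The sketch below assumes descending frequency with the character sequence as an ascending tie-breaker; the sort direction is not stated on this page. Entry is a hypothetical stand-in for NGramProfile.NGramEntry, whose fields are not documented here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not the library's implementation.
final class SortedSketch {

    // Stand-in for NGramProfile.NGramEntry (field names are assumptions).
    static final class Entry {
        final String seq;
        final double frequency;

        Entry(String seq, double frequency) {
            this.seq = seq;
            this.frequency = frequency;
        }
    }

    // Returns a sorted copy: highest frequency first, ties broken by sequence.
    static List<Entry> sorted(List<Entry> entries) {
        List<Entry> copy = new ArrayList<>(entries);
        copy.sort(Comparator.comparingDouble((Entry e) -> e.frequency).reversed()
                .thenComparing(e -> e.seq));
        return copy;
    }
}
```

Breaking ties by sequence keeps the order deterministic when several ngrams share a frequency.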
-
load
public void load(InputStream is) throws IOException
Loads an ngram profile from an InputStream (assumes UTF-8 encoded content).
- Parameters:
is - the InputStream to read
- Throws:
IOException
-
create
public static NGramProfile create(String name, InputStream is, String encoding)
Create a new language profile from a (preferably quite large) text file.
- Parameters:
name - the name of the profile
is - the stream to read
encoding - the encoding of the stream
-
save
public void save(OutputStream os) throws IOException
Writes the NGramProfile content to an OutputStream, using UTF-8 encoding.
- Parameters:
os - the stream to write to
- Throws:
IOException
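Both load and save state that profile content is UTF-8 encoded, but the file layout itself is not documented here. The sketch below shows a UTF-8 round trip through streams using an assumed one-entry-per-line layout of "ngram count". That layout, and the names ProfileIoSketch, save and load as used in the sketch, are hypothetical and may not match the real profile format.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; the real profile file format may differ.
final class ProfileIoSketch {

    // Writes one "ngram count" line per entry, encoded as UTF-8.
    static void save(Map<String, Integer> counts, OutputStream os) throws IOException {
        Writer writer = new BufferedWriter(new OutputStreamWriter(os, StandardCharsets.UTF_8));
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            writer.write(entry.getKey() + " " + entry.getValue() + "\n");
        }
        writer.flush(); // flush, but leave closing the stream to the caller
    }

    // Reads the lines written by save, decoding them as UTF-8.
    static Map<String, Integer> load(InputStream is) throws IOException {
        Map<String, Integer> counts = new LinkedHashMap<>();
        BufferedReader reader = new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8));
        String line;
        while ((line = reader.readLine()) != null) {
            int sep = line.lastIndexOf(' ');
            if (sep <= 0) {
                continue; // skip blank or malformed lines
            }
            counts.put(line.substring(0, sep), Integer.parseInt(line.substring(sep + 1)));
        }
        return counts;
    }
}
```

Naming the charset explicitly on both sides avoids depending on the platform default encoding, which matters for profiles of languages with non-ASCII characters.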
-
main
public static void main(String[] args)
main method used for testing only.
- Parameters:
args - args