Tag Archives: java

Generate n-gram Language Profile using the Language Detection Library for Java

To create n-gram data from XML-formatted Wikipedia abstract files for use with the language-detection library for Java/Processing:
  • In the same folder as the downloaded file, create a folder called profiles.
  • In Terminal, cd into the folder that contains the downloaded file.
  • Run the following command, replacing the last argument with the language code of your downloaded file (here, tr for Turkish):

java -jar /[PathToLangDetectFile]/langdetect.jar --genprofile -d ./ tr

More information: http://code.google.com/p/language-detection/wiki/Tools

Language codes: http://code.google.com/p/language-detection/wiki/LanguageList
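To make "n-gram data" more concrete, here is a minimal Java sketch that counts the character n-grams in a string. It is only an illustration and not the library's own code: the class name NGramCounter and the method countNGrams are hypothetical, the upper length of 3 used in main is an assumption, and the character normalization a real profile generator would apply is left out.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: counts character n-grams of length 1..maxN in a string.
// NGramCounter and countNGrams are hypothetical names, not part of the
// language-detection library, and no text normalization is performed.
class NGramCounter {

    static Map<String, Integer> countNGrams(String text, int maxN) {
        // TreeMap keeps the n-grams sorted, which makes the output easy to read.
        Map<String, Integer> counts = new TreeMap<>();
        for (int start = 0; start < text.length(); start++) {
            // Take every substring of length 1..maxN that begins at 'start'.
            for (int n = 1; n <= maxN && start + n <= text.length(); n++) {
                String gram = text.substring(start, start + n);
                counts.merge(gram, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // maxN = 3 is an assumed upper bound chosen for this example.
        Map<String, Integer> counts = countNGrams("naber abi", 3);
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.println("\"" + entry.getKey() + "\" -> " + entry.getValue());
        }
    }
}
```

A language profile is, roughly speaking, a table of counts like this one gathered over a large body of text in a single language, which is why the command above is pointed at a Wikipedia abstract file.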


Language Detection in Processing

Simple sketch using the Language Detection Library for Java:

import com.cybozu.labs.langdetect.*;

Detector detector;

void setup() {
  // Load the language profiles once, then create a detector from them.
  try {
    DetectorFactory.loadProfile( "/path/to/Documents/Processing/mysketch/code/profiles" );
    detector = DetectorFactory.create();
  }
  catch ( Exception e ) {
    println( e.getMessage() );
    return; // without loaded profiles there is nothing to detect with
  }

  // Time how long detection takes for a short Turkish sentence.
  int start = millis();
  try {
    detector.append( "naber abi, nasılsın?" );
    String lang = detector.detect();
    println( ( millis() - start ) + " milliseconds to compute -> " + lang );
  }
  catch ( Exception e ) {
    println( e.getMessage() );
  }
}