Jacson
java.lang.Object
  de.spieleck.app.lang.StemmerDE
A stemmer for German words. The algorithm is based on the report "A Fast and Simple Stemming Algorithm for German Words" by Jörg Caumanns (joerg.caumanns@isst.fhg.de). This implementation is based on code by Gerhard Schwarz for the Apache Lucene project.
Field Summary
static char | CH_TOKEN
static char | EI_TOKEN
static char | IE_TOKEN
static char | IG_TOKEN
static char | REP_TOKEN
static char | SCH_TOKEN
static char | ST_TOKEN
Constructor Summary
StemmerDE()
Method Summary
protected void | deleteParticleDenotion(java.lang.StringBuffer term)
    Removes a particle denotation ("ge") from a term.
protected boolean | isStemmable(java.lang.String term)
    Checks whether a term can be stemmed.
protected void | optimizations(java.lang.StringBuffer term)
    Does some optimizations on the term.
protected void | resubstituteSpecialChars(java.lang.StringBuffer term)
    Undoes the changes made by substituteSpecialChars().
java.lang.String | stem(java.lang.String term)
    Stems the given term to a unique discriminator.
protected void | stripSuffixes(java.lang.StringBuffer term, boolean lowerCase)
    Performs suffix stripping (stemming) on the current term.
protected void | substituteSpecialChars(java.lang.StringBuffer term)
    Does some substitutions on the term to reduce overstemming:
    - Substitutes umlauts with their corresponding vowel (äöü -> aou); "ß" is substituted by "ss".
    - Substitutes the second character of a pair of equal characters with an asterisk: ??
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
public static final char REP_TOKEN
public static final char SCH_TOKEN
public static final char CH_TOKEN
public static final char EI_TOKEN
public static final char IE_TOKEN
public static final char IG_TOKEN
public static final char ST_TOKEN
Constructor Detail
public StemmerDE()
Method Detail
public java.lang.String stem(java.lang.String term)
Specified by: stem in interface Stemmer
Parameters: term - The term that should be stemmed.
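As an illustration of what "stemming to a unique discriminator" is useful for, the following sketch groups words that share a stem. It is not part of this API: the nested `Stemmer` interface is a local stand-in that only mirrors the documented `stem(String)` signature, `groupByStem` is a hypothetical helper, and the lowercasing lambda is a placeholder rather than a real stemmer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class StemUsageSketch {
    /** Local stand-in: only the stem(String) signature is taken from the documentation. */
    interface Stemmer {
        String stem(String term);
    }

    /** Hypothetical helper: groups words by the discriminator the stemmer returns. */
    static Map<String, List<String>> groupByStem(Stemmer stemmer, List<String> words) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String word : words) {
            // Words that map to the same stem end up in the same bucket.
            groups.computeIfAbsent(stemmer.stem(word), k -> new ArrayList<>()).add(word);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Placeholder only: lowercasing is NOT stemming. With the library on the
        // classpath, one could presumably pass new StemmerDE()::stem instead.
        Stemmer placeholder = term -> term.toLowerCase(Locale.ROOT);
        System.out.println(groupByStem(placeholder, Arrays.asList("Haus", "haus", "Baum")));
    }
}
```

Passing the stemmer as a method reference keeps the sketch independent of the package in which the real `Stemmer` interface lives, which this page does not state.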
protected boolean isStemmable(java.lang.String term)
protected void stripSuffixes(java.lang.StringBuffer term, boolean lowerCase)
protected void optimizations(java.lang.StringBuffer term)
protected void deleteParticleDenotion(java.lang.StringBuffer term)
protected void substituteSpecialChars(java.lang.StringBuffer term)
protected void resubstituteSpecialChars(java.lang.StringBuffer term)
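The substitutions described for substituteSpecialChars() can be sketched roughly as follows. This is a minimal illustration written from the summary text alone, not the class's actual code: the `substitute` method and `REPEAT_MARK` constant are hypothetical names, only lowercase umlauts are handled, the token substitutions behind the `*_TOKEN` fields are left out because their values are not documented here, and leaving the "ss" produced from "ß" unmarked is an assumption.

```java
public class SubstitutionSketch {
    // Assumed stand-in for the repeat marker; the summary only says "asterisk".
    static final char REPEAT_MARK = '*';

    /** Returns a copy of term with umlauts, sharp s, and doubled characters substituted. */
    static String substitute(String term) {
        StringBuilder out = new StringBuilder(term.length() + 2);
        char previous = 0; // last character emitted so far
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '\u00df') {          // sharp s is written out as "ss"
                out.append("ss");
                previous = 's';
                continue;
            }
            switch (c) {                  // lowercase umlauts become their base vowel
                case '\u00e4': c = 'a'; break;
                case '\u00f6': c = 'o'; break;
                case '\u00fc': c = 'u'; break;
                default: break;
            }
            if (i > 0 && c == previous) {
                // Second character of an equal pair: emit the marker instead.
                out.append(REPEAT_MARK);
                previous = REPEAT_MARK;
            } else {
                out.append(c);
                previous = c;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(substitute("m\u00fctter")); // prints "mut*er"
        System.out.println(substitute("stra\u00dfe")); // prints "strasse"
    }
}
```

The non-ASCII characters are written as Unicode escapes so the sketch compiles regardless of the source file encoding.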
Spieleck