All Implemented Interfaces: Closeable, AutoCloseable
public final class CJKBigramFilter
extends org.apache.lucene.analysis.TokenFilter
Forms bigrams of CJK terms that are generated from StandardTokenizer or ICUTokenizer.
CJK types are set by these tokenizers, but you can also use CJKBigramFilter(TokenStream, int) to explicitly control which of the CJK scripts are turned into bigrams.
In all cases, all non-CJK input is passed through unmodified.
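To make "turned into bigrams" concrete, the following is a minimal plain-Java sketch of overlapping bigram formation over a run of CJK characters. It is not the Lucene implementation: the class `BigramSketch` and its `bigrams` method are hypothetical names used only for illustration, and the handling of a lone character as a unigram is an assumption based on the SINGLE_TYPE field described on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; hypothetical class, not part of Lucene.
class BigramSketch {

    // Returns the overlapping bigrams of a run of characters.
    // Works on code points so that supplementary characters stay intact.
    // Assumption: a run with a single code point is returned as a unigram.
    static List<String> bigrams(String run) {
        int[] cps = run.codePoints().toArray();
        List<String> out = new ArrayList<>();
        if (cps.length == 1) {
            out.add(run);
            return out;
        }
        for (int i = 0; i + 1 < cps.length; i++) {
            // Each bigram pairs a code point with its immediate successor.
            out.add(new String(cps, i, 2));
        }
        return out;
    }

    public static void main(String[] args) {
        // Four Han ideographs (U+4E00, U+4E8C, U+4E09, U+56DB) yield three bigrams.
        System.out.println(bigrams("\u4E00\u4E8C\u4E09\u56DB"));
    }
}
```

Because the bigrams overlap, a run of n characters produces n - 1 tokens, so every adjacent pair in the original text is covered by some token.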
Modifier and Type | Field | Description |
---|---|---|
static String | DOUBLE_TYPE | When a bigram is emitted, it is marked as this type |
static int | HAN | Bigram flag for Han ideographs |
static int | HANGUL | Bigram flag for Hangul |
static int | HIRAGANA | Bigram flag for Hiragana |
static int | KATAKANA | Bigram flag for Katakana |
static String | SINGLE_TYPE | When a unigram is emitted, it is marked as this type |
Constructor | Description |
---|---|
CJKBigramFilter(org.apache.lucene.analysis.TokenStream in) | |
CJKBigramFilter(org.apache.lucene.analysis.TokenStream in, int flags) | Create a new CJKBigramFilter, specifying which writing systems should be bigrammed. |
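The `flags` argument is a single int, which suggests that the script constants are meant to be combined as bit flags. The sketch below shows that pattern in plain Java under that assumption. The class `FlagSketch`, the `enabled` helper, and the numeric values are placeholders for illustration; this page does not list the actual constant values.

```java
// Illustrative sketch only; hypothetical class with placeholder values.
class FlagSketch {
    // Placeholder bit values, one distinct bit per script (assumed, not from the library).
    static final int HAN = 1;
    static final int HIRAGANA = 2;
    static final int KATAKANA = 4;
    static final int HANGUL = 8;

    // True when the given script flag is present in the combined mask.
    static boolean enabled(int flags, int script) {
        return (flags & script) != 0;
    }

    public static void main(String[] args) {
        // Select Han and Hiragana only by OR-ing the two flags together.
        int flags = HAN | HIRAGANA;
        System.out.println(enabled(flags, HAN));
        System.out.println(enabled(flags, HANGUL));
        // With Lucene on the classpath, the equivalent call would presumably be:
        //   new CJKBigramFilter(in, CJKBigramFilter.HAN | CJKBigramFilter.HIRAGANA)
    }
}
```

With the real class, the constants declared on this page would be used directly rather than redefined.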
Modifier and Type | Method | Description |
---|---|---|
boolean | incrementToken() | |
void | reset() | |
Methods inherited from class org.apache.lucene.util.AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
public static final int HAN
public static final int HIRAGANA
public static final int KATAKANA
public static final int HANGUL
public static final String DOUBLE_TYPE
public static final String SINGLE_TYPE
public CJKBigramFilter(org.apache.lucene.analysis.TokenStream in)
public boolean incrementToken() throws IOException
Specified by: incrementToken in class org.apache.lucene.analysis.TokenStream
Throws: IOException
public void reset() throws IOException
Overrides: reset in class org.apache.lucene.analysis.TokenFilter
Throws: IOException
Copyright © 2000-2018 Apache Software Foundation. All Rights Reserved.