All Implemented Interfaces: Closeable, AutoCloseable
@Deprecated public class ArabicLetterTokenizer extends LetterTokenizer
Deprecated. Use StandardTokenizer instead.

The problem with the standard Letter tokenizer is that it fails on diacritics. Similar handling is necessary for Indic scripts, Hebrew, Thaana, etc.

You must specify the required Version compatibility when creating ArabicLetterTokenizer:

- CharTokenizer uses an int based API to normalize and detect token characters. See isTokenChar(int) and CharTokenizer.normalize(int) for details.

Nested classes inherited from class AttributeSource: AttributeSource.AttributeFactory, AttributeSource.State
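
To illustrate why an int based API matters, the sketch below counts letters in a string once per `char` and once per code point. It uses only `java.lang.Character` and does not touch Lucene; the class name `CodePointSketch` and both method names are hypothetical, chosen for this example. A character outside the Basic Multilingual Plane is stored as two surrogate `char` values, neither of which is a letter on its own, so only the code point walk recognizes it.

```java
class CodePointSketch {
    // Counts letters by inspecting each UTF-16 code unit separately.
    static int countLettersByChar(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (Character.isLetter(text.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    // Counts letters by inspecting whole code points (the int based view).
    static int countLettersByCodePoint(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isLetter(cp)) {
                count++;
            }
            i += Character.charCount(cp); // advance by 1 or 2 chars
        }
        return count;
    }

    public static void main(String[] args) {
        // 'a' followed by U+10400 (a supplementary-plane letter, two chars in UTF-16).
        String text = "a\uD801\uDC00";
        System.out.println(countLettersByChar(text));
        System.out.println(countLettersByCodePoint(text));
    }
}
```

The two counts differ for this input because the per-`char` loop sees the surrogate halves in isolation.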
Constructor | Description |
---|---|
ArabicLetterTokenizer(Reader in) | Deprecated. Use ArabicLetterTokenizer(Version, Reader) instead. |
ArabicLetterTokenizer(AttributeSource.AttributeFactory factory, Reader in) | Deprecated. |
ArabicLetterTokenizer(AttributeSource source, Reader in) | Deprecated. |
ArabicLetterTokenizer(Version matchVersion, Reader in) | Deprecated. Construct a new ArabicLetterTokenizer. |
ArabicLetterTokenizer(Version matchVersion, AttributeSource.AttributeFactory factory, Reader in) | Deprecated. Construct a new ArabicLetterTokenizer using a given AttributeSource.AttributeFactory. |
ArabicLetterTokenizer(Version matchVersion, AttributeSource source, Reader in) | Deprecated. Construct a new ArabicLetterTokenizer using a given AttributeSource. |
Modifier and Type | Method | Description |
---|---|---|
protected boolean | isTokenChar(int c) | Deprecated. Allows for Letter category or NonspacingMark category. |
Methods inherited from class AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString

Methods inherited from class CharTokenizer: end, incrementToken, isTokenChar, normalize, normalize, reset

Methods inherited from class Object: clone, finalize, getClass, notify, notifyAll, wait, wait, wait

Methods inherited from class Tokenizer: close, correctOffset

Methods inherited from class TokenStream: reset
public ArabicLetterTokenizer(Version matchVersion, Reader in)
Construct a new ArabicLetterTokenizer.
Parameters:
matchVersion - Lucene version to match; see the Version compatibility note above
in - the input to split up into tokens

public ArabicLetterTokenizer(Version matchVersion, AttributeSource source, Reader in)
Construct a new ArabicLetterTokenizer using a given AttributeSource.
Parameters:
matchVersion - Lucene version to match; see the Version compatibility note above
source - the attribute source to use for this Tokenizer
in - the input to split up into tokens

public ArabicLetterTokenizer(Version matchVersion, AttributeSource.AttributeFactory factory, Reader in)
Construct a new ArabicLetterTokenizer using a given AttributeSource.AttributeFactory.
Parameters:
matchVersion - Lucene version to match; see the Version compatibility note above
factory - the attribute factory to use for this Tokenizer
in - the input to split up into tokens

@Deprecated public ArabicLetterTokenizer(Reader in)
Deprecated. Use ArabicLetterTokenizer(Version, Reader) instead. This will be removed in Lucene 4.0.

@Deprecated public ArabicLetterTokenizer(AttributeSource source, Reader in)
Deprecated. Use ArabicLetterTokenizer(Version, AttributeSource, Reader) instead. This will be removed in Lucene 4.0.
Construct a new ArabicLetterTokenizer using a given AttributeSource.

@Deprecated public ArabicLetterTokenizer(AttributeSource.AttributeFactory factory, Reader in)
Deprecated. Use ArabicLetterTokenizer(Version, AttributeSource.AttributeFactory, Reader) instead. This will be removed in Lucene 4.0.
Construct a new ArabicLetterTokenizer using a given AttributeSource.AttributeFactory.

protected boolean isTokenChar(int c)
Allows for Letter category or NonspacingMark category.
Overrides: isTokenChar in class LetterTokenizer
See Also: LetterTokenizer.isTokenChar(int)
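
The rule "Letter category or NonspacingMark category" can be approximated with `java.lang.Character` alone. The following is a minimal sketch under that assumption, not the Lucene source: the class `LetterOrMarkSketch`, its predicates, and the `split` helper are hypothetical names, and the sketch ignores normalization, offsets, and any token length limit that CharTokenizer applies. It splits the same vocalized Arabic word with a letters-only predicate and with a letters-or-nonspacing-marks predicate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

class LetterOrMarkSketch {
    // Letters only: the behaviour that breaks on diacritics.
    static boolean lettersOnly(int c) {
        return Character.isLetter(c);
    }

    // Letters plus nonspacing marks (Unicode general category Mn).
    static boolean letterOrMark(int c) {
        return Character.isLetter(c)
            || Character.getType(c) == Character.NON_SPACING_MARK;
    }

    // Splits text into maximal runs of code points accepted by the predicate.
    static List<String> split(String text, IntPredicate isTokenChar) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isTokenChar.test(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Kaf, fatha, teh, fatha, beh, fatha: three letters, each followed by a diacritic.
        String word = "\u0643\u064E\u062A\u064E\u0628\u064E";
        System.out.println(split(word, LetterOrMarkSketch::lettersOnly).size());
        System.out.println(split(word, LetterOrMarkSketch::letterOrMark).size());
    }
}
```

With the letters-only predicate each fatha (U+064E) ends the current token, so the word falls apart into single letters; accepting nonspacing marks keeps it as one token.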
Copyright © 2000-2018 Apache Software Foundation. All Rights Reserved.