Default Strategy
This module defines the DefaultStrategy class, a concrete implementation of the LemmatizationStrategy protocol.
It lemmatizes tokens by combining several techniques, including dictionary lookup, hyphen removal, rule-based lemmatization, prefix decomposition, and affix decomposition.
Classes
DefaultStrategy
Bases: LemmatizationStrategy
This class combines several lemmatization techniques into a single strategy.
It implements the LemmatizationStrategy protocol.
Source code in simplemma/strategies/default.py
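Since LemmatizationStrategy is described as a protocol, any object exposing a compatible get_lemma(token, lang) method should be usable wherever a strategy is expected. The sketch below illustrates that idea with Python's typing.Protocol. StrategyLike and TrailingSStrategy are hypothetical names invented for this example; they are not part of simplemma, and the real protocol may declare more than is shown here.

```python
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class StrategyLike(Protocol):
    """Hypothetical stand-in for the LemmatizationStrategy protocol."""

    def get_lemma(self, token: str, lang: str) -> Optional[str]:
        ...


class TrailingSStrategy:
    """Toy strategy (assumption): strips a trailing "s" from English tokens."""

    def get_lemma(self, token: str, lang: str) -> Optional[str]:
        if lang == "en" and len(token) > 3 and token.endswith("s"):
            return token[:-1]
        return None  # no lemma found


strategy = TrailingSStrategy()

# Structural typing: no inheritance is needed to satisfy the protocol.
print(isinstance(strategy, StrategyLike))
print(strategy.get_lemma("tokens", "en"))
print(strategy.get_lemma("tokens", "de"))
```

The toy class never inherits from the protocol, yet it passes the isinstance check because it provides a get_lemma method.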
Functions
__init__(greedy=False, dictionary_factory=DefaultDictionaryFactory())
Initialize the Default Strategy.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| greedy | bool | Whether to use a greedy approach for dictionary lookup. | False |
| dictionary_factory | DictionaryFactory | A factory for creating dictionaries. | DefaultDictionaryFactory() |
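One consequence of the signature above follows from Python itself: a default argument such as DefaultDictionaryFactory() is evaluated once, when the function is defined, so instances created without an explicit factory share the same factory object. The sketch below demonstrates this with hypothetical stand-ins (ToyFactory, ToyStrategy, and the get_dictionary method are assumptions made for illustration, not simplemma's classes). Whether the sharing is intentional in the library is not stated here.

```python
from typing import Dict


class ToyFactory:
    """Hypothetical stand-in for a dictionary factory."""

    def __init__(self) -> None:
        self.cache: Dict[str, Dict[str, str]] = {}

    def get_dictionary(self, lang: str) -> Dict[str, str]:
        # Create an empty per-language dictionary on first access.
        return self.cache.setdefault(lang, {})


class ToyStrategy:
    """Hypothetical stand-in mirroring the __init__ signature shown above."""

    def __init__(
        self,
        greedy: bool = False,
        dictionary_factory: ToyFactory = ToyFactory(),  # evaluated once
    ) -> None:
        self.greedy = greedy
        self.dictionary_factory = dictionary_factory


a = ToyStrategy()
b = ToyStrategy(greedy=True)
c = ToyStrategy(dictionary_factory=ToyFactory())

print(a.dictionary_factory is b.dictionary_factory)  # same default object
print(a.dictionary_factory is c.dictionary_factory)  # explicit factory differs
```

Passing an explicit factory, as with `c`, is the way to give a strategy its own dictionaries in this sketch.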
get_lemma(token, lang)
Get the lemma for a given token and language by combining the different lemmatization techniques.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| token | str | The token to lemmatize. | required |
| lang | str | The language of the token. | required |
Returns:
| Type | Description |
|---|---|
| str \| None | The lemma of the token, or None if no lemma is found. |
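One plausible way to combine several techniques behind a single get_lemma call is a fallback chain: try each technique in turn and return the first lemma found, or None if none succeeds. The sketch below is a minimal illustration under that assumption and is not simplemma's actual implementation. The toy dictionary, the toy rule, and the evaluation order are all hypothetical, and prefix and affix decomposition are left out for brevity.

```python
from typing import Callable, List, Optional

# Hypothetical toy data; the real strategy obtains its dictionaries
# from the dictionary factory.
TOY_DICTIONARY = {"en": {"cats": "cat", "email": "email"}}

Technique = Callable[[str, str], Optional[str]]


def dictionary_lookup(token: str, lang: str) -> Optional[str]:
    """Return the dictionary entry for the token, if any."""
    return TOY_DICTIONARY.get(lang, {}).get(token)


def hyphen_removal(token: str, lang: str) -> Optional[str]:
    """Retry the lookup with hyphens removed (toy version)."""
    if "-" not in token:
        return None
    return dictionary_lookup(token.replace("-", ""), lang)


def toy_rule(token: str, lang: str) -> Optional[str]:
    """Toy English rule: "ponies" becomes "pony"."""
    if lang == "en" and len(token) > 4 and token.endswith("ies"):
        return token[:-3] + "y"
    return None


# Assumed order: cheapest and most reliable technique first.
TECHNIQUES: List[Technique] = [dictionary_lookup, hyphen_removal, toy_rule]


def get_lemma(token: str, lang: str) -> Optional[str]:
    """Return the first lemma any technique finds, else None."""
    for technique in TECHNIQUES:
        lemma = technique(token, lang)
        if lemma is not None:
            return lemma
    return None


for word in ("cats", "e-mail", "ponies", "xyzzy"):
    print(word, get_lemma(word, "en"))
```

Returning None rather than the unchanged token matches the documented return type and lets the caller decide how to handle unknown tokens.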