Greedy Dictionary Lookup Strategy
This module defines the GreedyDictionaryLookupStrategy class, a concrete implementation of the LemmatizationStrategy protocol.
It lemmatizes tokens through greedy dictionary lookup.
Classes
GreedyDictionaryLookupStrategy
Bases: LemmatizationStrategy
A lemmatization strategy that resolves tokens through greedy dictionary lookup.
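To illustrate what implementing a protocol means here, the following is a minimal sketch, not the library's actual definition. `LemmatizationStrategySketch`, `LowercaseStrategy`, and `lemmatize_all` are hypothetical names introduced for illustration; only the `get_lemma(token, lang)` signature is taken from this page.

```python
from typing import List, Protocol


class LemmatizationStrategySketch(Protocol):
    """Hypothetical stand-in for the LemmatizationStrategy protocol."""

    def get_lemma(self, token: str, lang: str) -> str:
        ...


class LowercaseStrategy:
    """Toy strategy (an assumption for illustration): lowercases the token."""

    def get_lemma(self, token: str, lang: str) -> str:
        return token.lower()


def lemmatize_all(
    tokens: List[str], lang: str, strategy: LemmatizationStrategySketch
) -> List[str]:
    # Any object exposing a matching get_lemma method satisfies the protocol;
    # no explicit inheritance is needed for structural typing.
    return [strategy.get_lemma(token, lang) for token in tokens]
```

Because the caller depends only on the method signature, a strategy such as GreedyDictionaryLookupStrategy could be swapped in wherever the protocol is expected.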
Source code in simplemma/strategies/greedy_dictionary_lookup.py
Functions
__init__(dictionary_factory=DefaultDictionaryFactory(), steps=1, distance=5)
Initialize the Greedy Dictionary Lookup Strategy.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dictionary_factory | DictionaryFactory | The dictionary factory used to obtain language dictionaries. | DefaultDictionaryFactory() |
steps | int | The maximum number of lemmatization steps to perform. | 1 |
distance | int | The maximum allowed Levenshtein distance between candidate lemmas. | 5 |
Source code in simplemma/strategies/greedy_dictionary_lookup.py
get_lemma(token, lang)
Get the lemma of a token using the greedy dictionary lookup strategy.
This method looks up the token in the language-specific dictionary and then repeatedly looks up the result, treating each dictionary entry as a new candidate lemma. Each candidate is checked for length and for Levenshtein distance. The loop ends after the configured number of steps or as soon as a candidate fails a check, and the last accepted candidate is returned.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
token | str | The input token to lemmatize. | required |
lang | str | The language code for the token's language. | required |
Returns:
Name | Type | Description |
---|---|---|
str | str | The lemma for the token. |
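The greedy loop described above can be sketched as follows. This is an illustration under stated assumptions, not the library's source: `greedy_lookup` and `levenshtein` are hypothetical helpers, a plain `dict` stands in for the language dictionary, and the exact checks (reject a candidate that is longer than the current form or farther away than `distance`) are an assumed reading of the "length and Levenshtein distance" conditions. The real method may apply further rules not shown here.

```python
from typing import Dict


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic two-row dynamic-programming table."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution
                )
            )
        previous = current
    return previous[-1]


def greedy_lookup(
    token: str, dictionary: Dict[str, str], steps: int = 1, distance: int = 5
) -> str:
    """Follow dictionary entries for up to `steps` lookups (hypothetical helper)."""
    candidate = token
    for _ in range(steps):
        new_candidate = dictionary.get(candidate)
        if new_candidate is None:
            break  # nothing further to look up
        # Assumed checks: stop if the new form is longer or too far away.
        if (
            len(new_candidate) > len(candidate)
            or levenshtein(new_candidate, candidate) > distance
        ):
            break
        candidate = new_candidate
    return candidate


# Toy dictionary (made up for illustration) chaining two reductions.
toy = {"walkings": "walking", "walking": "walk"}
one_step = greedy_lookup("walkings", toy, steps=1)
two_steps = greedy_lookup("walkings", toy, steps=2)
limited = greedy_lookup("walkings", toy, steps=2, distance=1)
```

With the toy dictionary, a single step stops at the first entry, a second step follows the chain once more, and a small `distance` blocks the second, larger jump.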
Source code in simplemma/strategies/greedy_dictionary_lookup.py