To Lowercase Strategy
This module defines the ToLowercaseFallbackStrategy class, which is a concrete implementation of the LemmatizationFallbackStrategy protocol. It represents a fallback strategy that converts tokens to lowercase for specific languages.
Classes
ToLowercaseFallbackStrategy
Bases: LemmatizationFallbackStrategy
A concrete implementation of the LemmatizationFallbackStrategy protocol. It acts as a fallback that converts a token to lowercase when the token's language belongs to a configured set of languages, and leaves the token unchanged otherwise.
Source code in simplemma/strategies/fallback/to_lowercase.py (lines 12-47)
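The relationship between the protocol and the class can be illustrated with a minimal, self-contained sketch. This is not the code from the source file: FallbackStrategySketch, ToLowercaseSketch, and EXAMPLE_LANGS_TO_LOWER are hypothetical stand-ins, and the language codes are illustrative rather than the actual contents of BETTER_LOWER.

```python
from typing import Protocol, Set

# Illustrative language codes only (an assumption for this sketch); the real
# contents of BETTER_LOWER are defined by the library and not reproduced here.
EXAMPLE_LANGS_TO_LOWER: Set[str] = {"es", "pt"}


class FallbackStrategySketch(Protocol):
    """Hypothetical stand-in for the LemmatizationFallbackStrategy protocol."""

    def get_lemma(self, token: str, lang: str) -> str:
        ...


class ToLowercaseSketch:
    """Hypothetical stand-in for ToLowercaseFallbackStrategy."""

    def __init__(self, langs_to_lower: Set[str] = EXAMPLE_LANGS_TO_LOWER) -> None:
        # Remember which languages should have their tokens lowercased.
        self._langs_to_lower = langs_to_lower

    def get_lemma(self, token: str, lang: str) -> str:
        # Lowercase only for opted-in languages; otherwise pass the token through.
        return token.lower() if lang in self._langs_to_lower else token


# The sketch class satisfies the protocol structurally (no inheritance needed).
strategy: FallbackStrategySketch = ToLowercaseSketch()
print(strategy.get_lemma("Madrid", "es"))  # language is in the set
print(strategy.get_lemma("Berlin", "de"))  # language is not in the set
```

Passing a different set to the constructor, for example ToLowercaseSketch({"de"}), changes which languages are lowercased without touching the method logic.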
Functions
__init__(langs_to_lower=BETTER_LOWER)
Initialize the ToLowercaseFallbackStrategy with the specified set of languages to convert to lowercase.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
langs_to_lower | Set[str] | The set of languages for which tokens should be converted to lowercase. Defaults to BETTER_LOWER. | BETTER_LOWER |
Source code in simplemma/strategies/fallback/to_lowercase.py (lines 20-29)
get_lemma(token, lang)
Convert the token to lowercase if the language is in the set of languages to convert.
This method is called when the lemma of a token cannot be determined using other lemmatization strategies. It converts the token to lowercase if the language is in the set of languages specified during initialization.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
token | str | The token for which the lemma could not be determined. | required |
lang | str | The language of the token. | required |
Returns:
Name | Type | Description |
---|---|---|
str | str | The lowercase version of the token if the language is in the set of languages to convert, otherwise returns the original token. |
Source code in simplemma/strategies/fallback/to_lowercase.py (lines 31-47)
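To show where a fallback like this sits in a lemmatization flow, here is a rough sketch under stated assumptions. TOY_LEMMAS, LANGS_TO_LOWER, to_lowercase_fallback, and lemmatize_with_fallback are all hypothetical names invented for illustration; the library's real lookup and dispatch logic may differ.

```python
from typing import Callable, Dict, Set

# Hypothetical toy dictionary standing in for a real lemmatization lookup.
TOY_LEMMAS: Dict[str, str] = {"casas": "casa"}

# Illustrative set of languages to lowercase (an assumption for this sketch).
LANGS_TO_LOWER: Set[str] = {"es"}


def to_lowercase_fallback(token: str, lang: str) -> str:
    """Lowercase the token only if its language is in the configured set."""
    return token.lower() if lang in LANGS_TO_LOWER else token


def lemmatize_with_fallback(
    token: str,
    lang: str,
    fallback: Callable[[str, str], str],
) -> str:
    """Try the primary lookup first; call the fallback only when it fails."""
    lemma = TOY_LEMMAS.get(token)
    if lemma is not None:
        return lemma
    return fallback(token, lang)


# Known token: the primary lookup answers, so the fallback is never called.
print(lemmatize_with_fallback("casas", "es", to_lowercase_fallback))
# Unknown token, language in the set: the fallback lowercases it.
print(lemmatize_with_fallback("Zaragoza", "es", to_lowercase_fallback))
# Unknown token, language not in the set: the token is returned unchanged.
print(lemmatize_with_fallback("Zaragoza", "de", to_lowercase_fallback))
```

The third call reflects the documented return behavior: when the language is not in the set, the original token comes back as-is.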