Rules Strategy

This module defines the RulesStrategy class, a concrete implementation of the LemmatizationStrategy protocol. It lemmatizes tokens by applying predefined, language-specific rules.

Classes

RulesStrategy

Bases: LemmatizationStrategy

This class is a lemmatization strategy that applies a predefined rule function for each supported language. It implements the LemmatizationStrategy protocol.
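
RulesStrategy subclasses LemmatizationStrategy explicitly. With `typing.Protocol`, however, conformance is structural: any object with a matching `get_lemma(token, lang)` method satisfies the protocol. The sketch below illustrates this with a self-contained stand-in protocol (`SketchStrategy`, marked `@runtime_checkable` so `isinstance` works) and a toy class (`ToyRules`). Both names are hypothetical, and the real LemmatizationStrategy definition may differ in detail.

```python
from typing import Callable, Dict, Optional, Protocol, runtime_checkable


@runtime_checkable
class SketchStrategy(Protocol):
    """Stand-in for a lemmatization protocol (assumed shape, not simplemma's source)."""

    def get_lemma(self, token: str, lang: str) -> Optional[str]:
        ...


class ToyRules:
    """Does not inherit from SketchStrategy, yet matches it structurally."""

    def __init__(self, rules: Dict[str, Callable[[str], Optional[str]]]) -> None:
        self._rules = rules

    def get_lemma(self, token: str, lang: str) -> Optional[str]:
        # Same dispatch as RulesStrategy.get_lemma: unknown language -> None.
        if lang not in self._rules:
            return None
        return self._rules[lang](token)


# Toy English rule: drop a final "s", otherwise report "no lemma".
toy = ToyRules({"en": lambda t: t[:-1] if t.endswith("s") else None})

print(isinstance(toy, SketchStrategy))  # True
print(toy.get_lemma("dogs", "en"))      # dog
print(toy.get_lemma("dogs", "de"))      # None
```

Note that `isinstance` against a runtime-checkable protocol only verifies that the method exists; it does not check the signature.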

Source code in simplemma/strategies/rules.py
class RulesStrategy(LemmatizationStrategy):
    """
    This class represents a lemmatization strategy that performs lemmatization by applying pre-defined rules for each language.
    It implements the `LemmatizationStrategy` protocol.
    """

    __slots__ = ["_rules"]

    def __init__(
        self, rules: Dict[str, Callable[[str], Optional[str]]] = DEFAULT_RULES
    ):
        """
        Initialize the Rules Strategy.

        Args:
            rules (Dict[str, Callable[[str], Optional[str]]]): A dictionary of pre-defined rules for various languages.
                Defaults to `DEFAULT_RULES`.

        """
        self._rules = rules

    def get_lemma(self, token: str, lang: str) -> Optional[str]:
        """
        Get Lemma using Rules Strategy

        This method performs lemmatization by applying pre-defined rules for each language.
        It checks if the language has pre-defined rules defined.
        If rules are defined, it applies the corresponding rule on the token to get the lemma.
        If a lemma is found, it is returned.
        If no rules are defined for the language or no lemma is found, None is returned.

        Args:
            token (str): The input token to lemmatize.
            lang (str): The language code for the token's language.

        Returns:
            Optional[str]: The lemma for the token, or None if no lemma is found.

        """
        if lang not in self._rules:
            return None

        return self._rules[lang](token)

Functions

__init__(rules=DEFAULT_RULES)

Initialize the Rules Strategy.

Parameters:

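| Name  | Type                                      | Description                                                                                                   | Default       |
|-------|-------------------------------------------|---------------------------------------------------------------------------------------------------------------|---------------|
| rules | Dict[str, Callable[[str], Optional[str]]] | A dictionary mapping language codes to rule functions. Each function takes a token and returns its lemma or None. | DEFAULT_RULES |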
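
The `rules` argument is a plain mapping from language code to a function that returns a lemma or None. Purely as an illustration, the sketch below builds such a function from a suffix-replacement table with a hypothetical helper, `make_suffix_rule`. The suffix pairs are toy data, not simplemma's rules; a dictionary like `custom_rules` would be passed to the constructor as `RulesStrategy(rules=custom_rules)`.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Rule = Callable[[str], Optional[str]]


def make_suffix_rule(pairs: Sequence[Tuple[str, str]], min_stem: int = 3) -> Rule:
    """Hypothetical helper: return a rule that rewrites the first matching suffix."""

    def rule(token: str) -> Optional[str]:
        for suffix, replacement in pairs:
            # Require a minimum stem length so short words are left alone.
            if token.endswith(suffix) and len(token) - len(suffix) >= min_stem:
                return token[: len(token) - len(suffix)] + replacement
        return None  # no pair applied: signal "no lemma" to the caller

    return rule


# Toy English-like suffix table; longer suffixes are listed first.
custom_rules: Dict[str, Rule] = {
    "en": make_suffix_rule([("sses", "ss"), ("ies", "y"), ("ss", "ss"), ("s", "")]),
}

print(custom_rules["en"]("ponies"))   # pony
print(custom_rules["en"]("classes"))  # class
print(custom_rules["en"]("bus"))      # None (stem too short)
```

The strategy never inspects the rule functions themselves; it only looks up `lang` and calls whatever callable is stored there, so any function with this signature works.
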
Source code in simplemma/strategies/rules.py
def __init__(
    self, rules: Dict[str, Callable[[str], Optional[str]]] = DEFAULT_RULES
):
    """
    Initialize the Rules Strategy.

    Args:
        rules (Dict[str, Callable[[str], Optional[str]]]): A dictionary of pre-defined rules for various languages.
            Defaults to `DEFAULT_RULES`.

    """
    self._rules = rules

get_lemma(token, lang)

Get Lemma using Rules Strategy

This method looks up the rule function registered for lang. If no rule is registered for that language, it returns None. Otherwise it applies the rule to the token and returns the rule's result: the lemma, or None if the rule finds no lemma.
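
The three possible outcomes of get_lemma can be traced with a stub rule table. The sketch below re-creates the method's two-step logic as a standalone function (`get_lemma_sketch`, a hypothetical name) so that it runs without simplemma installed. The `"de"` rule is a toy, not one of simplemma's German rules.

```python
from typing import Callable, Dict, Optional

Rule = Callable[[str], Optional[str]]


def get_lemma_sketch(rules: Dict[str, Rule], token: str, lang: str) -> Optional[str]:
    # Step 1: no rule registered for this language -> no lemma.
    if lang not in rules:
        return None
    # Step 2: delegate to the language's rule; it may itself return None.
    return rules[lang](token)


def toy_de_rule(token: str) -> Optional[str]:
    """Toy rule: strip a final "-en" from words of at least 6 letters."""
    if len(token) >= 6 and token.endswith("en"):
        return token[:-2]
    return None


rules: Dict[str, Rule] = {"de": toy_de_rule}

for token, lang in [("Frauen", "de"), ("Haus", "de"), ("Frauen", "fr")]:
    print(token, lang, get_lemma_sketch(rules, token, lang))
# Frauen de Frau   (rule applied)
# Haus de None     (rule registered, but it found no lemma)
# Frauen fr None   (no rule registered for "fr")
```

Both failure paths return None, so a caller cannot tell an unsupported language from a token that no rule matched. If that distinction matters, check membership in the rules mapping passed to the constructor.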

Parameters:

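| Name  | Type | Description                                 | Default  |
|-------|------|---------------------------------------------|----------|
| token | str  | The input token to lemmatize.               | required |
| lang  | str  | The language code for the token's language. | required |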

Returns:

| Type          | Description                                            |
|---------------|--------------------------------------------------------|
| Optional[str] | The lemma for the token, or None if no lemma is found. |
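
Because the return type is Optional[str], callers need a fallback for the None case. One common pattern, shown here as a sketch rather than simplemma's own Lemmatizer logic, is to try several strategy-like callables in order and keep the token unchanged if none produces a lemma. `lemmatize_with_fallback`, `rules_like` and `lookup_like` are hypothetical names.

```python
from typing import Callable, Optional, Sequence

# A strategy is modelled here as any callable (token, lang) -> Optional[str].
StrategyFn = Callable[[str, str], Optional[str]]


def lemmatize_with_fallback(token: str, lang: str, strategies: Sequence[StrategyFn]) -> str:
    for strategy in strategies:
        lemma = strategy(token, lang)
        # Test "is not None" rather than truthiness, so an empty-string
        # result is not silently treated as "no lemma".
        if lemma is not None:
            return lemma
    return token  # nothing matched: keep the token as-is


def rules_like(token: str, lang: str) -> Optional[str]:
    # Toy suffix rule, English only.
    if lang == "en" and token.endswith("s"):
        return token[:-1]
    return None


def lookup_like(token: str, lang: str) -> Optional[str]:
    # Toy dictionary lookup, English only.
    return {"went": "go"}.get(token) if lang == "en" else None


chain = [rules_like, lookup_like]
print(lemmatize_with_fallback("went", "en", chain))  # go
print(lemmatize_with_fallback("dogs", "en", chain))  # dog
print(lemmatize_with_fallback("dogs", "fr", chain))  # dogs
```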

Source code in simplemma/strategies/rules.py
def get_lemma(self, token: str, lang: str) -> Optional[str]:
    """
    Get Lemma using Rules Strategy

    This method performs lemmatization by applying pre-defined rules for each language.
    It checks if the language has pre-defined rules defined.
    If rules are defined, it applies the corresponding rule on the token to get the lemma.
    If a lemma is found, it is returned.
    If no rules are defined for the language or no lemma is found, None is returned.

    Args:
        token (str): The input token to lemmatize.
        lang (str): The language code for the token's language.

    Returns:
        Optional[str]: The lemma for the token, or None if no lemma is found.

    """
    if lang not in self._rules:
        return None

    return self._rules[lang](token)