
Dictionary Lookup Strategy

This module defines the DictionaryLookupStrategy class, a concrete implementation of the LemmatizationStrategy protocol. It lemmatizes tokens by looking them up in language-specific dictionaries.
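To show what "implementing the protocol" means, here is a minimal sketch that assumes LemmatizationStrategy is a `typing.Protocol` with a single `get_lemma(token, lang) -> Optional[str]` method, matching the signature documented below. `ToyDictionaryStrategy` and its nested-dict data are hypothetical and not part of simplemma. It satisfies the protocol structurally, without inheriting from it; DictionaryLookupStrategy instead lists LemmatizationStrategy as an explicit base, which Protocol classes also allow.

```python
from typing import Dict, Optional, Protocol, runtime_checkable


@runtime_checkable
class LemmatizationStrategy(Protocol):
    """Sketch of the strategy interface: one lookup method, None on failure."""

    def get_lemma(self, token: str, lang: str) -> Optional[str]:
        ...


class ToyDictionaryStrategy:
    """Hypothetical strategy backed by a plain dict per language.

    It does not inherit from LemmatizationStrategy; it matches the
    protocol structurally by providing a compatible get_lemma method.
    """

    def __init__(self, data: Dict[str, Dict[str, str]]) -> None:
        self._data = data

    def get_lemma(self, token: str, lang: str) -> Optional[str]:
        # Unknown language or unknown token both yield None.
        return self._data.get(lang, {}).get(token)


strategy = ToyDictionaryStrategy({"en": {"mice": "mouse"}})
print(isinstance(strategy, LemmatizationStrategy))  # True
print(strategy.get_lemma("mice", "en"))  # mouse
print(strategy.get_lemma("cats", "en"))  # None
```

Note that `isinstance` against a `runtime_checkable` protocol only checks that the method exists, not that its signature matches.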

Classes

DictionaryLookupStrategy

Bases: LemmatizationStrategy

Dictionary Lookup Strategy

Source code in simplemma/strategies/dictionary_lookup.py
class DictionaryLookupStrategy(LemmatizationStrategy):
    """Dictionary Lookup Strategy"""

    __slots__ = ["_dictionary_factory"]

    def __init__(
        self, dictionary_factory: DictionaryFactory = DefaultDictionaryFactory()
    ):
        """
        Initialize the Dictionary Lookup Strategy.

        Args:
            dictionary_factory (DictionaryFactory): The dictionary factory used to obtain language dictionaries.
                Defaults to [`DefaultDictionaryFactory()`][simplemma.strategies.dictionaries.dictionary_factory.DefaultDictionaryFactory].
        """
        self._dictionary_factory = dictionary_factory

    def get_lemma(self, token: str, lang: str) -> Optional[str]:
        """
        Get Lemma using Dictionary Lookup

        This method performs lemmatization by looking up the token in the language-specific dictionary.
        It returns the lemma if found, or `None` if not found.

        Args:
            token (str): The input token to lemmatize.
            lang (str): The language code for the token's language.

        Returns:
            Optional[str]: The lemma for the token, or `None` if not found in the dictionary.

        """
        # Search the language data, reverse case to extend coverage.
        dictionary = self._dictionary_factory.get_dictionary(lang)
        if token in dictionary:
            return dictionary[token]
        # Try upper or lowercase.
        token = token.lower() if token[0].isupper() else token.capitalize()
        return dictionary.get(token)
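The lookup logic above can be exercised without loading real language data. The sketch below assumes only what the listed code uses from a dictionary factory: a `get_dictionary(lang)` method returning a string-to-string mapping. `InMemoryDictionaryFactory` and `DictionaryLookupSketch` are hypothetical stand-ins, and the German entries are toy data, not simplemma's dictionaries.

```python
from typing import Dict, Mapping, Optional


class InMemoryDictionaryFactory:
    """Hypothetical stand-in for a DictionaryFactory: serves hard-coded data."""

    def __init__(self, data: Dict[str, Dict[str, str]]) -> None:
        self._data = data

    def get_dictionary(self, lang: str) -> Mapping[str, str]:
        # Unknown languages get an empty mapping in this sketch.
        return self._data.get(lang, {})


class DictionaryLookupSketch:
    """Mirrors the lookup logic of DictionaryLookupStrategy.get_lemma."""

    def __init__(self, dictionary_factory: InMemoryDictionaryFactory) -> None:
        self._dictionary_factory = dictionary_factory

    def get_lemma(self, token: str, lang: str) -> Optional[str]:
        dictionary = self._dictionary_factory.get_dictionary(lang)
        # 1. Exact match.
        if token in dictionary:
            return dictionary[token]
        # 2. One retry with the case switched, using the same rule as the listed code.
        token = token.lower() if token[0].isupper() else token.capitalize()
        return dictionary.get(token)


factory = InMemoryDictionaryFactory({"de": {"Häuser": "Haus", "ging": "gehen"}})
strategy = DictionaryLookupSketch(factory)
print(strategy.get_lemma("Häuser", "de"))  # Haus (exact match)
print(strategy.get_lemma("häuser", "de"))  # Haus (retried as "Häuser")
print(strategy.get_lemma("Ging", "de"))    # gehen (retried as "ging")
print(strategy.get_lemma("xyz", "de"))     # None
```

Keeping the factory behind a constructor parameter is what makes this kind of substitution possible: the strategy never loads data itself.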

Functions

__init__(dictionary_factory=DefaultDictionaryFactory())

Initialize the Dictionary Lookup Strategy.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| dictionary_factory | DictionaryFactory | The dictionary factory used to obtain language dictionaries. | DefaultDictionaryFactory() |
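The default value `DefaultDictionaryFactory()` is a call in the signature, and Python evaluates default values once, when the `def` statement runs. Every DictionaryLookupStrategy created without an explicit factory therefore shares one factory instance. The sketch below demonstrates that rule with a hypothetical `CountingFactory` that caches per language; whether DefaultDictionaryFactory caches in the same way is an assumption here, not something this page states.

```python
from typing import Dict


class CountingFactory:
    """Hypothetical factory that caches dictionaries and counts loads."""

    def __init__(self) -> None:
        self.loads = 0
        self._cache: Dict[str, Dict[str, str]] = {}

    def get_dictionary(self, lang: str) -> Dict[str, str]:
        if lang not in self._cache:
            self.loads += 1  # a real factory would read data from disk here
            self._cache[lang] = {}
        return self._cache[lang]


class Strategy:
    # Same pattern as DictionaryLookupStrategy.__init__: the default is
    # evaluated once, when the def statement runs, not on every call.
    def __init__(self, dictionary_factory: CountingFactory = CountingFactory()) -> None:
        self._dictionary_factory = dictionary_factory


a, b = Strategy(), Strategy()
print(a._dictionary_factory is b._dictionary_factory)  # True: one shared factory
a._dictionary_factory.get_dictionary("en")
b._dictionary_factory.get_dictionary("en")
print(a._dictionary_factory.loads)  # 1: the second call hit the shared cache

c = Strategy(CountingFactory())  # explicit injection gets its own instance
print(c._dictionary_factory is a._dictionary_factory)  # False
```

For a factory that caches loaded data, this sharing avoids reloading dictionaries per strategy instance; passing a factory explicitly opts out of it.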
get_lemma(token, lang)

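Get the lemma using dictionary lookup.

This method looks up the token in the dictionary for the given language and returns the stored lemma if the token is present. If the exact form is not found, it retries once with the case switched: a token whose first character is uppercase is lowercased entirely, and any other token is capitalized (first character uppercased, the rest lowercased). It returns None if neither form is in the dictionary.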
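The retry rule is easy to misread as "flip the first letter", so the sketch below isolates it in a hypothetical helper, `case_variant`, which the listed code does not define (it inlines the same expression). One difference is deliberate: the listed code indexes `token[0]`, so an empty token that misses the exact lookup would raise IndexError; the helper returns None for an empty token instead.

```python
from typing import Optional


def case_variant(token: str) -> Optional[str]:
    """Return the single alternative spelling that get_lemma retries.

    Hypothetical helper: the listed code inlines this expression. Unlike the
    listed code, it returns None for an empty token instead of raising
    IndexError on token[0].
    """
    if not token:
        return None
    return token.lower() if token[0].isupper() else token.capitalize()


for t in ["Haus", "haus", "HAUS", "hAUS", "1st"]:
    print(repr(t), "->", repr(case_variant(t)))
```

Because `str.lower()` and `str.capitalize()` act on the whole string, "HAUS" is retried as "haus" and "hAUS" as "Haus": mixed-case tokens are normalized, not merely toggled at the first letter. A token starting with a non-letter, such as "1st", is capitalized, which leaves it unchanged here.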

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| token | str | The input token to lemmatize. | required |
| lang | str | The language code for the token's language. | required |

Returns:

| Type | Description |
|---|---|
| Optional[str] | The lemma for the token, or None if not found in the dictionary. |
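Returning None rather than the unchanged token lets a caller tell "no lemma found" apart from "the lemma equals the token", and fall through to another strategy. The sketch below shows one way a caller might consume the Optional[str] result; `first_lemma`, the toy data, and the naive suffix rule are hypothetical illustrations, and simplemma's own lemmatizer may combine strategies differently.

```python
from typing import Callable, Iterable, Optional

# Hypothetical: any callable (token, lang) -> Optional[str] stands in for a strategy.
StrategyFn = Callable[[str, str], Optional[str]]

LOOKUP = {"en": {"mice": "mouse"}}


def dictionary_fn(token: str, lang: str) -> Optional[str]:
    """Toy dictionary lookup: None when the token is unknown."""
    return LOOKUP.get(lang, {}).get(token)


def suffix_fn(token: str, lang: str) -> Optional[str]:
    """Toy rule: strip a trailing "s", otherwise give up with None."""
    return token[:-1] if token.endswith("s") else None


def first_lemma(strategies: Iterable[StrategyFn], token: str, lang: str) -> str:
    """Try each strategy in order; fall back to the token itself."""
    for strategy in strategies:
        lemma = strategy(token, lang)
        if lemma is not None:  # explicit check: "" would be falsy but is a real result
            return lemma
    return token


print(first_lemma([dictionary_fn, suffix_fn], "mice", "en"))  # mouse
print(first_lemma([dictionary_fn, suffix_fn], "cats", "en"))  # cat
print(first_lemma([dictionary_fn, suffix_fn], "ran", "en"))   # ran
```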
