
medspacy.target_matcher.concept_tagger

ConceptTagger

ConceptTagger is a component for setting an attribute on tokens contained in spans matched by TargetRules. It can be used for tasks such as semantic labeling or token normalization, which makes downstream extraction simpler.

A common use case is when a single concept can have many synonyms or variants and downstream rules would be simplified by matching on a unified token tag for those synonyms rather than including the entire synonym list in each downstream rule.
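
To make the synonym use case concrete, here is a simplified pure-Python sketch of the idea. It does not use medspacy or spaCy: tokens are plain strings, the returned list of tags stands in for the token attribute ConceptTagger sets, and `SYNONYM_RULES`, `tag_concepts`, and `mentions_mi` are hypothetical names chosen for illustration. Several surface forms share the category `"MI"`, so the downstream check needs only one comparison instead of the full synonym list.

```python
# Simplified illustration only; this is NOT medspacy's API.
# Tokens are plain strings and the returned tag list stands in for
# the token attribute that ConceptTagger would set.
from typing import Dict, List, Tuple

# Hypothetical synonym rules: each phrase (as a lowercase token tuple) maps to one category.
SYNONYM_RULES: Dict[Tuple[str, ...], str] = {
    ("heart", "attack"): "MI",
    ("myocardial", "infarction"): "MI",
    ("mi",): "MI",
}


def tag_concepts(tokens: List[str]) -> List[str]:
    """Return one tag per token; untagged tokens get "" (mirroring the extension default)."""
    lowered = [t.lower() for t in tokens]
    tags = [""] * len(tokens)
    for phrase, category in SYNONYM_RULES.items():
        n = len(phrase)
        for start in range(len(tokens) - n + 1):
            if tuple(lowered[start:start + n]) == phrase:
                # Tag every token inside the matched span, as ConceptTagger does.
                for i in range(start, start + n):
                    tags[i] = category
    return tags


def mentions_mi(tokens: List[str]) -> bool:
    # The downstream rule checks a single unified tag, not every synonym.
    return "MI" in tag_concepts(tokens)
```

For example, `tag_concepts("history of heart attack".split())` returns `["", "", "MI", "MI"]`, and `mentions_mi` is true for "heart attack", "myocardial infarction", and "MI" alike.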

Source code in medspacy/target_matcher/concept_tagger.py
```python
@Language.factory("medspacy_concept_tagger")
class ConceptTagger:
    """ConceptTagger is a component for setting an attribute on tokens contained in spans extracted by TargetRules. This
    can be used for tasks such as semantic labeling or for normalizing tokens, making downstream extraction simpler.

    A common use case is when a single concept can have many synonyms or variants and downstream rules would be
    simplified by matching on a unified token tag for those synonyms rather than including the entire synonym list in
    each downstream rule.
    """

    def __init__(
        self,
        nlp: Language,
        name: str = "medspacy_concept_tagger",
        attr_name: str = "concept_tag",
    ):
        """
        Creates a new ConceptTagger.

        Args:
            nlp: A spaCy Language model.
            name: The name of the ConceptTagger component. Must be a valid python variable name.
            attr_name: The name of the attribute to set to tokens.
        """
        self.nlp = nlp
        self.name = name
        self._attr_name = attr_name
        self.__matcher = MedspacyMatcher(nlp, name=name)

        # If the token attribute hasn't been registered, add it now
        # If it has already been set, then we can pass.
        # This will happen, for example, if you've already instantiated
        # the ConceptTagger and it registered the attribute.
        if not Token.has_extension(attr_name):
            Token.set_extension(attr_name, default="")

    @property
    def attr_name(self) -> str:
        """
        The name of the attribute that will be set on each matched token.

        Returns:
            The attribute name.
        """
        return self._attr_name

    def add(self, rules: Union[TargetRule, List[TargetRule]]):
        """
        Adds a single TargetRule or a list of TargetRules to the ConceptTagger.

        Args:
            rules: A single TargetRule or a collection of TargetRules.
        """
        self.__matcher.add(rules)

    def __call__(self, doc: Doc) -> Doc:
        """
        Call ConceptTagger on a doc. Matches spans and assigns attributes to all tokens contained in those spans, but
        does not preserve the spans themselves.

        Args:
            doc: The spaCy Doc to process.

        Returns:
            The spaCy Doc processed.
        """
        matches = self.__matcher(doc)
        for (rule_id, start, end) in matches:
            rule = self.__matcher.rule_map[self.nlp.vocab.strings[rule_id]]
            for i in range(start, end):
                setattr(doc[i]._, self.attr_name, rule.category)

        return doc
```

attr_name property

The name of the attribute that will be set on each matched token.

Returns:

| Type | Description |
|------|-------------|
| `str` | The attribute name. |
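
Because `attr_name` is declared as a property with no setter, it is read-only once the component has been constructed. The hypothetical `ReadOnlyAttrDemo` class below reproduces just that pattern (it is not part of medspacy) to show what happens if code tries to reassign it.

```python
# Hypothetical class (not medspacy's) reproducing the read-only property pattern:
# attr_name is stored privately and exposed through a getter with no setter.
class ReadOnlyAttrDemo:
    def __init__(self, attr_name: str = "concept_tag"):
        self._attr_name = attr_name

    @property
    def attr_name(self) -> str:
        """The name of the attribute that will be set on each matched token."""
        return self._attr_name


demo = ReadOnlyAttrDemo()
print(demo.attr_name)  # concept_tag

try:
    demo.attr_name = "other_tag"  # no setter is defined, so this fails
except AttributeError as err:
    print("cannot reassign:", type(err).__name__)  # cannot reassign: AttributeError
```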

__call__(doc)

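Calls the ConceptTagger on a doc. Matches spans using the added TargetRules and sets the attribute on every token contained in those spans to the matching rule's category. The spans themselves are not preserved.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `doc` | `Doc` | The spaCy Doc to process. | required |

Returns:

| Type | Description |
|------|-------------|
| `Doc` | The processed spaCy Doc. |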
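
The tagging loop in `__call__` can be sketched without spaCy as follows. This is a simplified stand-in, not medspacy code: tokens are dicts, each match is a `(rule_id, start, end)` triple with an exclusive end, and `rule_map` maps directly to a category string rather than to a TargetRule. Because tags are written in sequence, a token covered by overlapping matches keeps the category of whichever match is processed last; in the real component that order depends on what the internal matcher returns.

```python
# Simplified sketch of the loop in ConceptTagger.__call__ (hypothetical stand-ins):
#   - each token is a dict; its tag is stored under the key named by attr_name
#   - matches are (rule_id, start, end) triples with an exclusive end index
#   - rule_map maps a rule_id straight to a category string
from typing import Dict, List, Tuple

Match = Tuple[str, int, int]


def apply_matches(
    tokens: List[Dict[str, str]],
    matches: List[Match],
    rule_map: Dict[str, str],
    attr_name: str = "concept_tag",
) -> List[Dict[str, str]]:
    for rule_id, start, end in matches:
        category = rule_map[rule_id]
        # Every token in [start, end) gets the rule's category; the span itself is discarded.
        for i in range(start, end):
            tokens[i][attr_name] = category
    return tokens


tokens = [{"text": t, "concept_tag": ""} for t in "left lower lobe pneumonia".split()]
apply_matches(tokens, [("r1", 0, 3), ("r2", 2, 4)], {"r1": "LOCATION", "r2": "PROBLEM"})
print([t["concept_tag"] for t in tokens])  # ['LOCATION', 'LOCATION', 'PROBLEM', 'PROBLEM']
```

Here "lobe" falls inside both matches and ends up tagged `PROBLEM` because the second match overwrites the first.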

__init__(nlp, name='medspacy_concept_tagger', attr_name='concept_tag')

Creates a new ConceptTagger.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `nlp` | `Language` | A spaCy Language model. | required |
| `name` | `str` | The name of the ConceptTagger component. Must be a valid Python variable name. | `'medspacy_concept_tagger'` |
| `attr_name` | `str` | The name of the attribute to set on tokens. | `'concept_tag'` |

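
The `has_extension` check in `__init__` makes construction safe to repeat: spaCy's `Token.set_extension` raises an error when an extension name is already registered, unless `force=True` is passed. The sketch below mimics that behavior with a hypothetical `ExtensionRegistry` class (not spaCy's implementation) to show why building a second ConceptTagger with the same `attr_name` does not fail.

```python
# Hypothetical stand-in for spaCy's class-level Token extension registry.
# Like spaCy, re-registering an existing name without force raises an error.
from typing import Any, Dict


class ExtensionRegistry:
    # Shared across all uses, mimicking how extensions are registered globally on Token.
    _extensions: Dict[str, Any] = {}

    @classmethod
    def has_extension(cls, name: str) -> bool:
        return name in cls._extensions

    @classmethod
    def set_extension(cls, name: str, default: Any = None, force: bool = False) -> None:
        if name in cls._extensions and not force:
            raise ValueError(f"Extension '{name}' already exists")
        cls._extensions[name] = default


def register_tag_attribute(attr_name: str = "concept_tag") -> None:
    # Same guard as ConceptTagger.__init__: register only if missing,
    # with "" as the default value for untagged tokens.
    if not ExtensionRegistry.has_extension(attr_name):
        ExtensionRegistry.set_extension(attr_name, default="")


register_tag_attribute()
register_tag_attribute()  # second "instantiation" is a no-op instead of an error
```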

add(rules)

Adds a single TargetRule or a list of TargetRules to the ConceptTagger.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `rules` | `Union[TargetRule, List[TargetRule]]` | A single TargetRule or a collection of TargetRules. | required |

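
`add` itself simply forwards `rules` to the component's internal matcher. The sketch below shows one common way a parameter typed `Union[Rule, List[Rule]]` can be normalized before storing; `SimpleRule` and `RuleStore` are hypothetical stand-ins, not medspacy's `TargetRule` or `MedspacyMatcher`.

```python
# Hypothetical sketch: accept either one rule or a list of rules,
# mirroring the Union[TargetRule, List[TargetRule]] signature of add().
from dataclasses import dataclass
from typing import List, Union


@dataclass
class SimpleRule:
    literal: str
    category: str


class RuleStore:
    def __init__(self) -> None:
        self.rules: List[SimpleRule] = []

    def add(self, rules: Union[SimpleRule, List[SimpleRule]]) -> None:
        # Wrap a single rule in a list so both call styles share one code path.
        if isinstance(rules, SimpleRule):
            rules = [rules]
        self.rules.extend(rules)


store = RuleStore()
store.add(SimpleRule("pneumonia", "PROBLEM"))                              # single rule
store.add([SimpleRule("heart attack", "MI"), SimpleRule("mi", "MI")])     # list of rules
```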