
medspacy.context

ConText

The ConText component for spaCy processing.

This component matches modifiers in a Doc, defines their scope, and identifies edges between targets and modifiers. Sets two spaCy extensions:

- Span._.modifiers: a list of ConTextModifier objects which modify a target Span
- Doc._.context_graph: a ConText graph object which contains the targets, modifiers, and edges between them.

Source code in medspacy/context/context.py
@Language.factory("medspacy_context")
class ConText:
    """
    The ConText for spaCy processing.

    This component matches modifiers in a Doc, defines their scope, and identifies edges between targets and modifiers.
    Sets two spaCy extensions:
            - Span._.modifiers: a list of ConTextModifier objects which modify a target Span
            - Doc._.context_graph: a ConText graph object which contains the targets,
                modifiers, and edges between them.
    """

    def __init__(
        self,
        nlp: Language,
        name: str = "medspacy_context",
        rules: Optional[str] = "default",
        language_code: str = 'en',
        phrase_matcher_attr: str = "LOWER",
        allowed_types: Optional[Set[str]] = None,
        excluded_types: Optional[Set[str]] = None,
        terminating_types: Optional[Dict[str, Iterable[str]]] = None,
        max_scope: Optional[int] = None,
        max_targets: Optional[int] = None,
        prune_on_modifier_overlap: bool = True,
        prune_on_target_overlap: bool = False,
        span_attrs: Union[
            Literal["default"], Dict[str, Dict[str, Any]], None
        ] = "default",
        input_span_type: Union[Literal["ents", "group"]] = "ents",
        span_group_name: str = "medspacy_spans",
    ):
        """
        Creates a new ConText object.

        Args:
            nlp: A SpaCy Language object.
            name: The name of the component.
            rules: The rules to load. Default is "default", loads rules packaged with medspaCy that are derived from
                original ConText rules and years of practical applications at the US Department of Veterans Affairs.  If
                None, no rules are loaded. Otherwise, must be a path to a json file containing rules. Add ConTextRules
                directly through `ConText.add`.
            language_code: Language code to use (ISO code) as a default for loading resources.  See documentation
                and also the /resources directory to see which resources might be available in each language.
                Default is "en" for English.
            phrase_matcher_attr: The token attribute to use for PhraseMatcher for rules where `pattern` is None. Default
                is 'LOWER'.
            allowed_types: A global list of types included by context. Rules will operate on only spans with these
                labels.
            excluded_types: A global list of types excluded by context. Rules will not operate on spans with these
                labels.
            terminating_types: A global map of types to the types that can terminate them. This can be used to apply
                terminations to all rules of a particular type rather than adding to every rule individually in the
                ContextRule object.
            max_scope: The number of tokens around a modifier within which a target can be modified. Default value is
                None, in which case ConText uses the sentence boundaries. If a value greater than zero, applies the
                window globally. Both options will be overridden by a more specific value in a ConTextRule.
            max_targets: The maximum number of targets a modifier can modify. Default value is None, context will modify
                all targets in its scope. If a value greater than zero, applies this value globally. Both options will
                be overridden by a more specific value in a ContextRule.
            prune_on_modifier_overlap: Whether to prune modifiers which are substrings of another modifier. If True,
                will drop substrings completely. For example, if "no history of" and "history of" are both
                ConTextRules, both will match the text "no history of afib", but only "no history of" should modify
                afib. Default True.
            prune_on_target_overlap: Whether to remove any matched modifiers which overlap with target entities. If
                False, any overlapping modifiers will not modify the overlapping entity but will still modify any other
                targets in its scope. Default False.
            span_attrs: The optional span attributes to modify. Default option "default" uses attributes in
                `DEFAULT_ATTRIBUTES`. If a dictionary, format is mapping context modifier categories to a dictionary
                containing the attribute name and the value to set the attribute to when a  span is modified by a
                modifier of that category. If None, no attributes will be modified.
            input_span_type: "ents" or "group". Where to look for targets. "ents" will modify attributes of spans
                in doc.ents. "group" will modify attributes of spans in the span group specified by `span_group_name`.
            span_group_name: The name of the span group used when `input_span_type` is "group". Default is
                "medspacy_spans".
        """
        self.nlp = nlp
        self.name = name
        self.prune_on_modifier_overlap = prune_on_modifier_overlap
        self.prune_on_target_overlap = prune_on_target_overlap
        self.input_span_type = input_span_type
        self.span_group_name = span_group_name
        self.context_attributes_mapping = None

        self.DEFAULT_RULES_FILEPATH = path.join(
            Path(__file__).resolve().parents[2], "resources", language_code.lower(), "context_rules.json"
        )

        self.__matcher = MedspacyMatcher(
            nlp,
            name=name,
            phrase_matcher_attr=phrase_matcher_attr,
            prune=prune_on_modifier_overlap,
        )

        if span_attrs == "default":
            self.context_attributes_mapping = DEFAULT_ATTRIBUTES
            self.register_default_attributes()
        elif span_attrs:
            for _, attr_dict in span_attrs.items():
                for attr_name in attr_dict.keys():
                    if not Span.has_extension(attr_name):
                        raise ValueError(
                            f"Custom extension {attr_name} has not been set. Please ensure Span.set_extension is "
                            f"called for your pipeline's custom extensions."
                        )
            self.context_attributes_mapping = span_attrs

        self.register_graph_attributes()

        if max_scope is not None:
            if not (isinstance(max_scope, int) and max_scope > 0):
                raise ValueError(
                    f"'max_scope' must be an integer greater than 0, not the current value: {max_scope}"
                )
        self.max_scope = max_scope

        self.allowed_types = allowed_types
        self.excluded_types = excluded_types
        self.max_targets = max_targets

        self.terminating_types = dict()
        if terminating_types:
            self.terminating_types = {
                k.upper(): v for (k, v) in terminating_types.items()
            }

        rule_path = None
        if rules == "default":
            rule_path = self.DEFAULT_RULES_FILEPATH
        else:
            rule_path = rules

        if rule_path:
            self.add(ConTextRule.from_json(rule_path))

    @property
    def rules(self):
        """
        Returns list of ConTextRules available to context.
        """
        return self.__matcher.rules

    @property
    def categories(self):
        """
        Returns list of categories available that Context might produce.
        """
        return self.__matcher.labels

    @property
    def input_span_type(self):
        """
        The input source of entities for the component. Must be either "ents" corresponding to doc.ents or "group" for
        a spaCy span group.

        Returns:
            The input type, "ents" or "group".
        """
        return self._input_span_type

    @input_span_type.setter
    def input_span_type(self, val):
        if not (val == "ents" or val == "group"):
            raise ValueError('input_type must be "ents" or "group".')
        self._input_span_type = val

    @property
    def span_group_name(self) -> str:
        """
        The name of the span group used by this component. If `input_span_type` is "group", calling this component
        will use spans in the span group with this name.

        Returns:
            The span group name.
        """
        return self._span_group_name

    @span_group_name.setter
    def span_group_name(self, name: str):
        if not name or not isinstance(name, str):
            raise ValueError("Span group name must be a string.")
        self._span_group_name = name

    def add(self, rules):
        """
        Adds ConTextRules to Context.

        Args:
            rules: A single ConTextRule or a collection of ConTextRules to add to ConText.
        """
        if isinstance(rules, ConTextRule):
            rules = [rules]
        for rule in rules:
            if not isinstance(rule, ConTextRule):
                raise TypeError(f"Rules must be of type ConTextRule, not {type(rule)}.")

            # If global attributes like allowed_types and max_scope are defined,
            # check if the ConTextRule has them defined. If not, set to the global
            for attr in (
                "allowed_types",
                "excluded_types",
                "max_scope",
                "max_targets",
            ):
                value = getattr(self, attr)
                if value is None:  # No global value set
                    continue
                if (
                    getattr(rule, attr) is None
                ):  # If the rule itself has it defined, don't override
                    setattr(rule, attr, value)

            # Check custom termination points
            if rule.category.upper() in self.terminating_types:
                for other_modifier in self.terminating_types[rule.category.upper()]:
                    rule.terminated_by.add(other_modifier.upper())

        self.__matcher.add(rules)

    @classmethod
    def register_graph_attributes(cls):
        """
        Registers spaCy attribute extensions: Span._.modifiers and Doc._.context_graph.
        """
        try:
            Span.set_extension("modifiers", default=(), force=True)
            Doc.set_extension("context_graph", default=None, force=True)
        except ValueError:  # Extension already set
            pass

    @classmethod
    def register_default_attributes(cls):
        """
        Registers the default values for the Span attributes defined in `DEFAULT_ATTRIBUTES`.
        """
        for attr_name in [
            "is_negated",
            "is_uncertain",
            "is_historical",
            "is_hypothetical",
            "is_family",
        ]:
            try:
                Span.set_extension(attr_name, default=False)
            except ValueError:  # Extension already set
                pass

    def set_context_attributes(self, edges):
        """
        Adds Span-level attributes to targets with modifiers.

        Args:
            edges: The edges of the ContextGraph to modify.
        """
        for (target, modifier) in edges:
            if modifier.category in self.context_attributes_mapping:
                attr_dict = self.context_attributes_mapping[modifier.category]
                for attr_name, attr_value in attr_dict.items():
                    setattr(target._, attr_name, attr_value)

    def __call__(self, doc, targets: str = None) -> Doc:
        """
        Applies the ConText algorithm to a Doc.

        Args:
            doc: The spaCy Doc to process.
            targets: The optional custom attribute extension on doc to run over. Must contain an iterable of Span objects

        Returns:
            The processed spaCy Doc.
        """
        if not targets and self.input_span_type == "ents":
            targets = doc.ents
        elif not targets and self.input_span_type == "group":
            targets = doc.spans[self.span_group_name]
        elif targets:
            targets = getattr(doc._, targets)
        # Store data in ConTextGraph object
        # TODO: move some of this over to ConTextGraph
        context_graph = ConTextGraph(
            prune_on_modifier_overlap=self.prune_on_target_overlap
        )

        context_graph.targets = targets

        context_graph.modifiers = []
        matches = self.__matcher(doc)

        for (match_id, start, end) in matches:
            # Get the ConTextRule object defining this modifier
            rule = self.__matcher.rule_map[self.nlp.vocab[match_id].text]
            modifier = ConTextModifier(rule, start, end, doc, max_scope=self.max_scope)
            context_graph.modifiers.append(modifier)

        context_graph.update_scopes()
        context_graph.apply_modifiers()

        # Link targets to their modifiers
        for target, modifier in context_graph.edges:
            target._.modifiers += (modifier,)

        # If attributes need to be modified
        if self.context_attributes_mapping:
            self.set_context_attributes(context_graph.edges)

        doc._.context_graph = context_graph

        return doc
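
The `max_scope` argument documented in the constructor limits how far from a modifier a target may sit. The following is a minimal, standalone sketch of that idea, not medspaCy's implementation: it assumes the window is counted in tokens on both sides of the modifier and clipped to the sentence, and the function name `scope_window` is hypothetical. In medspaCy the actual scope also depends on each rule's direction and on terminating modifiers.

```python
from typing import Optional, Tuple


def scope_window(
    mod_start: int,
    mod_end: int,
    sent_start: int,
    sent_end: int,
    max_scope: Optional[int] = None,
) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Return the (left, right) token ranges a modifier could cover.

    All indices are token offsets with exclusive ends. With max_scope=None
    the scope runs to the sentence boundaries; otherwise each side is capped
    at max_scope tokens (an assumption made for this sketch).
    """
    if max_scope is not None and not (isinstance(max_scope, int) and max_scope > 0):
        raise ValueError(f"max_scope must be an integer greater than 0, not {max_scope!r}")

    left_start, right_end = sent_start, sent_end
    if max_scope is not None:
        # Clip the window so it never leaves the sentence.
        left_start = max(sent_start, mod_start - max_scope)
        right_end = min(sent_end, mod_end + max_scope)
    return (left_start, mod_start), (mod_end, right_end)
```

For a modifier at tokens 5 to 7 in a 20-token sentence, `scope_window(5, 7, 0, 20, max_scope=3)` yields the ranges `(2, 5)` and `(7, 10)`.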

categories property

Returns list of categories available that Context might produce.

input_span_type property writable

The input source of entities for the component. Must be either "ents" corresponding to doc.ents or "group" for a spaCy span group.

Returns:

The input type, "ents" or "group".

rules property

Returns list of ConTextRules available to context.

span_group_name property writable

The name of the span group used by this component. If input_span_type is "group", calling this component will use spans in the span group with this name.

Returns:

- str: The span group name.

__call__(doc, targets=None)

Applies the ConText algorithm to a Doc.

Parameters:

- doc (required): The spaCy Doc to process.
- targets (str, default None): The optional custom attribute extension on doc to run over. Must contain an iterable of Span objects.

Returns:

- Doc: The processed spaCy Doc.

Source code in medspacy/context/context.py
def __call__(self, doc, targets: str = None) -> Doc:
    """
    Applies the ConText algorithm to a Doc.

    Args:
        doc: The spaCy Doc to process.
        targets: The optional custom attribute extension on doc to run over. Must contain an iterable of Span objects

    Returns:
        The processed spaCy Doc.
    """
    if not targets and self.input_span_type == "ents":
        targets = doc.ents
    elif not targets and self.input_span_type == "group":
        targets = doc.spans[self.span_group_name]
    elif targets:
        targets = getattr(doc._, targets)
    # Store data in ConTextGraph object
    # TODO: move some of this over to ConTextGraph
    context_graph = ConTextGraph(
        prune_on_modifier_overlap=self.prune_on_target_overlap
    )

    context_graph.targets = targets

    context_graph.modifiers = []
    matches = self.__matcher(doc)

    for (match_id, start, end) in matches:
        # Get the ConTextRule object defining this modifier
        rule = self.__matcher.rule_map[self.nlp.vocab[match_id].text]
        modifier = ConTextModifier(rule, start, end, doc, max_scope=self.max_scope)
        context_graph.modifiers.append(modifier)

    context_graph.update_scopes()
    context_graph.apply_modifiers()

    # Link targets to their modifiers
    for target, modifier in context_graph.edges:
        target._.modifiers += (modifier,)

    # If attributes need to be modified
    if self.context_attributes_mapping:
        self.set_context_attributes(context_graph.edges)

    doc._.context_graph = context_graph

    return doc
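
The first branch of `__call__` decides where the targets come from: a named custom extension on the Doc, `doc.ents`, or a span group. The sketch below isolates that selection logic so it can be run without spaCy. The helper `resolve_targets` and the stand-in `doc` built from `SimpleNamespace` are illustrative assumptions, not part of medspaCy.

```python
from types import SimpleNamespace


def resolve_targets(doc, targets=None, input_span_type="ents", span_group_name="medspacy_spans"):
    """Pick the spans ConText should consider, mirroring the branch order above."""
    if targets:
        # A custom extension name was passed: read it from doc._
        return getattr(doc._, targets)
    if input_span_type == "ents":
        return doc.ents
    if input_span_type == "group":
        return doc.spans[span_group_name]
    raise ValueError('input_span_type must be "ents" or "group".')


# Stand-in for a spaCy Doc; strings take the place of Span objects.
doc = SimpleNamespace(
    ents=("afib",),
    spans={"medspacy_spans": ("pneumonia",)},
    _=SimpleNamespace(my_targets=("stroke",)),
)

from_ents = resolve_targets(doc)
from_group = resolve_targets(doc, input_span_type="group")
from_extension = resolve_targets(doc, targets="my_targets")
```

Note that an explicit `targets` name takes precedence over `input_span_type`, which matches the `elif targets:` branch in the listing.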

__init__(nlp, name='medspacy_context', rules='default', language_code='en', phrase_matcher_attr='LOWER', allowed_types=None, excluded_types=None, terminating_types=None, max_scope=None, max_targets=None, prune_on_modifier_overlap=True, prune_on_target_overlap=False, span_attrs='default', input_span_type='ents', span_group_name='medspacy_spans')

Creates a new ConText object.

Parameters:

- nlp (Language, required): A spaCy Language object.
- name (str, default 'medspacy_context'): The name of the component.
- rules (Optional[str], default 'default'): The rules to load. The default, "default", loads the rules packaged with medspaCy, which are derived from the original ConText rules and years of practical application at the US Department of Veterans Affairs. If None, no rules are loaded. Otherwise, must be a path to a JSON file containing rules. Add ConTextRules directly through ConText.add.
- language_code (str, default 'en'): Language code to use (ISO code) as a default for loading resources. See the documentation and the /resources directory for the resources available in each language. Default is "en" for English.
- phrase_matcher_attr (str, default 'LOWER'): The token attribute to use for PhraseMatcher for rules where pattern is None. Default is 'LOWER'.
- allowed_types (Optional[Set[str]], default None): A global set of types included by context. Rules will operate only on spans with these labels.
- excluded_types (Optional[Set[str]], default None): A global set of types excluded by context. Rules will not operate on spans with these labels.
- terminating_types (Optional[Dict[str, Iterable[str]]], default None): A global map of types to the types that can terminate them. This can be used to apply terminations to all rules of a particular type rather than adding them to every ConTextRule individually.
- max_scope (Optional[int], default None): The number of tokens around a modifier within which a target can be modified. If None, ConText uses the sentence boundaries. If a value greater than zero, applies the window globally. Both options will be overridden by a more specific value in a ConTextRule.
- max_targets (Optional[int], default None): The maximum number of targets a modifier can modify. If None, context will modify all targets in its scope. If a value greater than zero, applies this value globally. Both options will be overridden by a more specific value in a ConTextRule.
- prune_on_modifier_overlap (bool, default True): Whether to prune modifiers which are substrings of another modifier. If True, will drop substrings completely. For example, if "no history of" and "history of" are both ConTextRules, both will match the text "no history of afib", but only "no history of" should modify afib.
- prune_on_target_overlap (bool, default False): Whether to remove any matched modifiers which overlap with target entities. If False, any overlapping modifiers will not modify the overlapping entity but will still modify any other targets in its scope.
- span_attrs (Union[Literal['default'], Dict[str, Dict[str, Any]], None], default 'default'): The optional span attributes to modify. The default option "default" uses the attributes in DEFAULT_ATTRIBUTES. If a dictionary, it maps context modifier categories to a dictionary containing the attribute name and the value to set the attribute to when a span is modified by a modifier of that category. If None, no attributes will be modified.
- input_span_type (Union[Literal['ents', 'group']], default 'ents'): "ents" or "group". Where to look for targets. "ents" will modify attributes of spans in doc.ents. "group" will modify attributes of spans in the span group specified by span_group_name.
- span_group_name (str, default 'medspacy_spans'): The name of the span group used when input_span_type is "group".

Source code in medspacy/context/context.py
def __init__(
    self,
    nlp: Language,
    name: str = "medspacy_context",
    rules: Optional[str] = "default",
    language_code: str = 'en',
    phrase_matcher_attr: str = "LOWER",
    allowed_types: Optional[Set[str]] = None,
    excluded_types: Optional[Set[str]] = None,
    terminating_types: Optional[Dict[str, Iterable[str]]] = None,
    max_scope: Optional[int] = None,
    max_targets: Optional[int] = None,
    prune_on_modifier_overlap: bool = True,
    prune_on_target_overlap: bool = False,
    span_attrs: Union[
        Literal["default"], Dict[str, Dict[str, Any]], None
    ] = "default",
    input_span_type: Union[Literal["ents", "group"]] = "ents",
    span_group_name: str = "medspacy_spans",
):
    """
    Creates a new ConText object.

    Args:
        nlp: A SpaCy Language object.
        name: The name of the component.
        rules: The rules to load. Default is "default", loads rules packaged with medspaCy that are derived from
            original ConText rules and years of practical applications at the US Department of Veterans Affairs.  If
            None, no rules are loaded. Otherwise, must be a path to a json file containing rules. Add ConTextRules
            directly through `ConText.add`.
        language_code: Language code to use (ISO code) as a default for loading resources.  See documentation
            and also the /resources directory to see which resources might be available in each language.
            Default is "en" for English.
        phrase_matcher_attr: The token attribute to use for PhraseMatcher for rules where `pattern` is None. Default
            is 'LOWER'.
        allowed_types: A global list of types included by context. Rules will operate on only spans with these
            labels.
        excluded_types: A global list of types excluded by context. Rules will not operate on spans with these
            labels.
        terminating_types: A global map of types to the types that can terminate them. This can be used to apply
            terminations to all rules of a particular type rather than adding to every rule individually in the
            ContextRule object.
        max_scope: The number of tokens around a modifier within which a target can be modified. Default value is
            None, in which case ConText uses the sentence boundaries. If a value greater than zero, applies the
            window globally. Both options will be overridden by a more specific value in a ConTextRule.
        max_targets: The maximum number of targets a modifier can modify. Default value is None, context will modify
            all targets in its scope. If a value greater than zero, applies this value globally. Both options will
            be overridden by a more specific value in a ContextRule.
        prune_on_modifier_overlap: Whether to prune modifiers which are substrings of another modifier. If True,
            will drop substrings completely. For example, if "no history of" and "history of" are both
            ConTextRules, both will match the text "no history of afib", but only "no history of" should modify
            afib. Default True.
        prune_on_target_overlap: Whether to remove any matched modifiers which overlap with target entities. If
            False, any overlapping modifiers will not modify the overlapping entity but will still modify any other
            targets in its scope. Default False.
        span_attrs: The optional span attributes to modify. Default option "default" uses attributes in
            `DEFAULT_ATTRIBUTES`. If a dictionary, format is mapping context modifier categories to a dictionary
            containing the attribute name and the value to set the attribute to when a  span is modified by a
            modifier of that category. If None, no attributes will be modified.
        input_span_type: "ents" or "group". Where to look for targets. "ents" will modify attributes of spans
            in doc.ents. "group" will modify attributes of spans in the span group specified by `span_group_name`.
        span_group_name: The name of the span group used when `input_span_type` is "group". Default is
            "medspacy_spans".
    """
    self.nlp = nlp
    self.name = name
    self.prune_on_modifier_overlap = prune_on_modifier_overlap
    self.prune_on_target_overlap = prune_on_target_overlap
    self.input_span_type = input_span_type
    self.span_group_name = span_group_name
    self.context_attributes_mapping = None

    self.DEFAULT_RULES_FILEPATH = path.join(
        Path(__file__).resolve().parents[2], "resources", language_code.lower(), "context_rules.json"
    )

    self.__matcher = MedspacyMatcher(
        nlp,
        name=name,
        phrase_matcher_attr=phrase_matcher_attr,
        prune=prune_on_modifier_overlap,
    )

    if span_attrs == "default":
        self.context_attributes_mapping = DEFAULT_ATTRIBUTES
        self.register_default_attributes()
    elif span_attrs:
        for _, attr_dict in span_attrs.items():
            for attr_name in attr_dict.keys():
                if not Span.has_extension(attr_name):
                    raise ValueError(
                        f"Custom extension {attr_name} has not been set. Please ensure Span.set_extension is "
                        f"called for your pipeline's custom extensions."
                    )
        self.context_attributes_mapping = span_attrs

    self.register_graph_attributes()

    if max_scope is not None:
        if not (isinstance(max_scope, int) and max_scope > 0):
            raise ValueError(
                f"'max_scope' must be an integer greater than 0, not the current value: {max_scope}"
            )
    self.max_scope = max_scope

    self.allowed_types = allowed_types
    self.excluded_types = excluded_types
    self.max_targets = max_targets

    self.terminating_types = dict()
    if terminating_types:
        self.terminating_types = {
            k.upper(): v for (k, v) in terminating_types.items()
        }

    rule_path = None
    if rules == "default":
        rule_path = self.DEFAULT_RULES_FILEPATH
    else:
        rule_path = rules

    if rule_path:
        self.add(ConTextRule.from_json(rule_path))
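
The `prune_on_modifier_overlap` example in the docstring ("no history of" versus "history of") can be illustrated with a small standalone function. This is a sketch under a stated assumption, namely a longest-match-wins policy over token offsets; the pruning that `MedspacyMatcher` performs internally may differ in its details, and `prune_overlapping` is a hypothetical name.

```python
from typing import List, Tuple

# (rule text, start token, end token exclusive)
Match = Tuple[str, int, int]


def prune_overlapping(matches: List[Match]) -> List[Match]:
    """Keep the longest match among overlapping ones and drop the rest."""
    kept: List[Match] = []
    # Visit longer matches first so they claim their tokens before shorter ones.
    for match in sorted(matches, key=lambda m: (-(m[2] - m[1]), m[1])):
        start, end = match[1], match[2]
        overlaps = any(start < k_end and k_start < end for _, k_start, k_end in kept)
        if not overlaps:
            kept.append(match)
    # Return the survivors in document order.
    return sorted(kept, key=lambda m: m[1])


# "no history of afib": both rules match, but only the longer one survives.
pruned = prune_overlapping([("history of", 1, 3), ("no history of", 0, 3)])
```

Here `pruned` contains only the `"no history of"` match, so a single modifier is left to act on the target.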

add(rules)

Adds ConTextRules to Context.

Parameters:

- rules (required): A single ConTextRule or a collection of ConTextRules to add to ConText.

Source code in medspacy/context/context.py
def add(self, rules):
    """
    Adds ConTextRules to Context.

    Args:
        rules: A single ConTextRule or a collection of ConTextRules to add to ConText.
    """
    if isinstance(rules, ConTextRule):
        rules = [rules]
    for rule in rules:
        if not isinstance(rule, ConTextRule):
            raise TypeError(f"Rules must be of type ConTextRule, not {type(rule)}.")

        # If global attributes like allowed_types and max_scope are defined,
        # check if the ConTextRule has them defined. If not, set to the global
        for attr in (
            "allowed_types",
            "excluded_types",
            "max_scope",
            "max_targets",
        ):
            value = getattr(self, attr)
            if value is None:  # No global value set
                continue
            if (
                getattr(rule, attr) is None
            ):  # If the rule itself has it defined, don't override
                setattr(rule, attr, value)

        # Check custom termination points
        if rule.category.upper() in self.terminating_types:
            for other_modifier in self.terminating_types[rule.category.upper()]:
                rule.terminated_by.add(other_modifier.upper())

    self.__matcher.add(rules)
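
The precedence rule in `add` is that a value set on an individual rule always wins, and a global value only fills in attributes the rule left as None. The sketch below reproduces that behaviour with a stand-in dataclass so it runs without medspaCy. `ToyRule` and `apply_globals` are hypothetical names, and the termination map is assumed to have upper-case keys already, as the constructor arranges.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set


@dataclass
class ToyRule:
    """Stand-in for a ConTextRule with only the fields this sketch needs."""
    category: str
    max_scope: Optional[int] = None
    max_targets: Optional[int] = None
    terminated_by: Set[str] = field(default_factory=set)


def apply_globals(
    rule: ToyRule,
    global_values: Dict[str, Optional[int]],
    terminating_types: Dict[str, Iterable[str]],
) -> ToyRule:
    for attr, value in global_values.items():
        # Fill in the global only when one exists and the rule has no value.
        if value is not None and getattr(rule, attr) is None:
            setattr(rule, attr, value)
    # Merge global termination categories for this rule's category.
    for other in terminating_types.get(rule.category.upper(), ()):
        rule.terminated_by.add(other.upper())
    return rule


rule = apply_globals(
    ToyRule("negated_existence", max_scope=2),
    {"max_scope": 5, "max_targets": 1},
    {"NEGATED_EXISTENCE": ["conj"]},
)
```

After the call, `rule.max_scope` is still 2 (the rule's own value), `rule.max_targets` is 1 (filled from the global), and `rule.terminated_by` is `{"CONJ"}`.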

register_default_attributes() classmethod

Registers the default values for the Span attributes defined in DEFAULT_ATTRIBUTES.

Source code in medspacy/context/context.py
@classmethod
def register_default_attributes(cls):
    """
    Registers the default values for the Span attributes defined in `DEFAULT_ATTRIBUTES`.
    """
    for attr_name in [
        "is_negated",
        "is_uncertain",
        "is_historical",
        "is_hypothetical",
        "is_family",
    ]:
        try:
            Span.set_extension(attr_name, default=False)
        except ValueError:  # Extension already set
            pass

register_graph_attributes() classmethod

Registers spaCy attribute extensions: Span._.modifiers and Doc._.context_graph.

Source code in medspacy/context/context.py
@classmethod
def register_graph_attributes(cls):
    """
    Registers spaCy attribute extensions: Span._.modifiers and Doc._.context_graph.
    """
    try:
        Span.set_extension("modifiers", default=(), force=True)
        Doc.set_extension("context_graph", default=None, force=True)
    except ValueError:  # Extension already set
        pass

set_context_attributes(edges)

Adds Span-level attributes to targets with modifiers.

Parameters:

- edges (required): The edges of the ConTextGraph to modify.

Source code in medspacy/context/context.py
def set_context_attributes(self, edges):
    """
    Adds Span-level attributes to targets with modifiers.

    Args:
        edges: The edges of the ContextGraph to modify.
    """
    for (target, modifier) in edges:
        if modifier.category in self.context_attributes_mapping:
            attr_dict = self.context_attributes_mapping[modifier.category]
            for attr_name, attr_value in attr_dict.items():
                setattr(target._, attr_name, attr_value)
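
To make the mapping format concrete, here is a standalone sketch of the same loop applied to stand-in objects. The mapping shown is illustrative only; the actual contents of `DEFAULT_ATTRIBUTES` are not reproduced on this page, so the category names below are assumptions. `SimpleNamespace` takes the place of spaCy Spans and ConTextModifiers.

```python
from types import SimpleNamespace

# Illustrative mapping: modifier category -> {attribute name: value to set}.
ATTRIBUTE_MAPPING = {
    "NEGATED_EXISTENCE": {"is_negated": True},
    "HISTORICAL": {"is_historical": True},
}


def set_attributes(edges, mapping):
    """Set attributes on each target according to its modifier's category."""
    for target, modifier in edges:
        # Categories missing from the mapping leave the target unchanged.
        for name, value in mapping.get(modifier.category, {}).items():
            setattr(target._, name, value)


# Stand-in target with extension-like attributes defaulting to False.
target = SimpleNamespace(_=SimpleNamespace(is_negated=False, is_historical=False))
edges = [
    (target, SimpleNamespace(category="NEGATED_EXISTENCE")),
    (target, SimpleNamespace(category="UNMAPPED_CATEGORY")),
]
set_attributes(edges, ATTRIBUTE_MAPPING)
```

Only `is_negated` flips to True here; the edge with an unmapped category has no effect, which matches the `if modifier.category in ...` guard in the listing.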

ConTextGraph

The ConTextGraph class defines the internal structure of the ConText algorithm. It stores a collection of modifiers, matched with ConTextRules, and targets from some other source such as the TargetMatcher or a spaCy NER model.

Each modifier can have some number of associated targets that it modifies. This relationship is stored as edges of the graph.
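
As a rough picture of that data model, the following sketch stores edges as (target, modifier) pairs, the same order in which `ConText.__call__` unpacks `context_graph.edges`. It is a simplified stand-in, not the real class: `ToyGraph` and its two lookup helpers are hypothetical, and plain strings take the place of Span and ConTextModifier objects.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ToyGraph:
    """Simplified picture of a ConText graph: nodes plus (target, modifier) edges."""
    targets: List[str] = field(default_factory=list)
    modifiers: List[str] = field(default_factory=list)
    edges: List[Tuple[str, str]] = field(default_factory=list)

    def modifiers_of(self, target: str) -> List[str]:
        # All modifiers linked to one target.
        return [m for t, m in self.edges if t == target]

    def targets_of(self, modifier: str) -> List[str]:
        # All targets linked to one modifier.
        return [t for t, m in self.edges if m == modifier]


graph = ToyGraph(
    targets=["afib", "stroke"],
    modifiers=["no history of"],
    edges=[("afib", "no history of"), ("stroke", "no history of")],
)
```

One modifier can therefore appear in several edges, one per target it modifies.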

Source code in medspacy/context/context_graph.py
class ConTextGraph:
    """
    The ConTextGraph class defines the internal structure of the ConText algorithm. It stores a collection of modifiers,
    matched with ConTextRules, and targets from some other source such as the TargetMatcher or a spaCy NER model.

    Each modifier can have some number of associated targets that it modifies. This relationship is stored as edges of
    the graph.
    """

    def __init__(
        self,
        targets: Optional[List[Span]] = None,
        modifiers: Optional[List[ConTextModifier]] = None,
        edges: Optional[List] = None,
        prune_on_modifier_overlap: bool = False,
    ):
        """
        Creates a new ConTextGraph object.

        Args:
            targets: A list of spans that context might modify.
            modifiers: A list of ConTextModifiers that might modify the targets.
            edges: A list of edges between targets and modifiers representing the modification relationship.
            prune_on_modifier_overlap: Whether to prune modifiers that overlap a target span before edges are created.
        """
        self.targets = targets if targets is not None else []
        self.modifiers = modifiers if modifiers is not None else []
        self.edges = edges if edges is not None else []
        self.prune_on_modifier_overlap = prune_on_modifier_overlap

    def update_scopes(self):
        """
        Updates the scope of every ConTextModifier.

        For each modifier in a list of ConTextModifiers, check against each other
        modifier to see if one of the modifiers should update the other.
        This allows neighboring similar modifiers to extend each other's
        scope and allows "terminate" modifiers to end a modifier's scope.
        """
        for i in range(len(self.modifiers) - 1):
            modifier1 = self.modifiers[i]
            for j in range(i + 1, len(self.modifiers)):
                modifier2 = self.modifiers[j]
                # TODO: Add modifier -> modifier edges
                modifier1.limit_scope(modifier2)
                modifier2.limit_scope(modifier1)

    def apply_modifiers(self):
        """
        Checks each target/modifier pair. If modifier modifies target,
        create an edge between them.
        """
        if self.prune_on_modifier_overlap:
            for i in range(len(self.modifiers) - 1, -1, -1):
                modifier = self.modifiers[i]
                for target in self.targets:
                    if tuple_overlaps(
                        (target.start, target.end), modifier.modifier_span
                    ):
                        self.modifiers.pop(i)
                        break

        edges = []
        for target in self.targets:
            for modifier in self.modifiers:
                if modifier.modifies(target):
                    modifier.modify(target)

        # Now do a second pass and reduce the number of targets
        # for any modifiers with a max_targets int
        for modifier in self.modifiers:
            modifier.reduce_targets()
            for target in modifier._targets:
                edges.append((target, modifier))

        self.edges = edges

    def __repr__(self):
        return f"<ConTextGraph> with {len(self.targets)} targets and {len(self.modifiers)} modifiers"

    def serialized_representation(self) -> Dict[str, Any]:
        """
        Returns the serialized representation of the ConTextGraph
        """
        return self.__dict__

    @classmethod
    def from_serialized_representation(cls, serialized_representation) -> ConTextGraph:
        """
        Creates the ConTextGraph from the serialized representation
        """
        context_graph = ConTextGraph(**serialized_representation)

        return context_graph

__init__(targets=None, modifiers=None, edges=None, prune_on_modifier_overlap=False)

Creates a new ConTextGraph object.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| targets | Optional[List[Span]] | A list of spans that context might modify. | None |
| modifiers | Optional[List[ConTextModifier]] | A list of ConTextModifiers that might modify the targets. | None |
| edges | Optional[List] | A list of edges between targets and modifiers representing the modification relationship. | None |
| prune_on_modifier_overlap | bool | Whether to prune modifiers that overlap a target span before edges are created. | False |

Source code in medspacy/context/context_graph.py
def __init__(
    self,
    targets: Optional[List[Span]] = None,
    modifiers: Optional[List[ConTextModifier]] = None,
    edges: Optional[List] = None,
    prune_on_modifier_overlap: bool = False,
):
    """
    Creates a new ConTextGraph object.

    Args:
        targets: A list of spans that context might modify.
        modifiers: A list of ConTextModifiers that might modify the targets.
        edges: A list of edges between targets and modifiers representing the modification relationship.
        prune_on_modifier_overlap: Whether to prune modifiers that overlap a target span before edges are created.
    """
    self.targets = targets if targets is not None else []
    self.modifiers = modifiers if modifiers is not None else []
    self.edges = edges if edges is not None else []
    self.prune_on_modifier_overlap = prune_on_modifier_overlap

apply_modifiers()

Checks each target/modifier pair. If modifier modifies target, create an edge between them.

Source code in medspacy/context/context_graph.py
def apply_modifiers(self):
    """
    Checks each target/modifier pair. If modifier modifies target,
    create an edge between them.
    """
    if self.prune_on_modifier_overlap:
        for i in range(len(self.modifiers) - 1, -1, -1):
            modifier = self.modifiers[i]
            for target in self.targets:
                if tuple_overlaps(
                    (target.start, target.end), modifier.modifier_span
                ):
                    self.modifiers.pop(i)
                    break

    edges = []
    for target in self.targets:
        for modifier in self.modifiers:
            if modifier.modifies(target):
                modifier.modify(target)

    # Now do a second pass and reduce the number of targets
    # for any modifiers with a max_targets int
    for modifier in self.modifiers:
        modifier.reduce_targets()
        for target in modifier._targets:
            edges.append((target, modifier))

    self.edges = edges
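
The pruning step at the top of `apply_modifiers` walks the modifier list backwards so that `pop(i)` never shifts an index that still has to be visited. A minimal sketch of that pattern on bare `(start, end)` tuples follows. `spans_overlap` and `prune_overlapping` are hypothetical helpers, and the half-open overlap test is an assumption about what `tuple_overlaps` checks, not a copy of it.

```python
def spans_overlap(a, b):
    """Assumed semantics: half-open (start, end) token ranges share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def prune_overlapping(modifier_spans, target_spans):
    """Drop, in place, every modifier span that overlaps any target span."""
    # Iterate from the last index down to 0 so removals do not disturb
    # the positions of the items that are still to be checked.
    for i in range(len(modifier_spans) - 1, -1, -1):
        if any(spans_overlap(t, modifier_spans[i]) for t in target_spans):
            modifier_spans.pop(i)
    return modifier_spans


modifiers = [(0, 3), (5, 6), (8, 10)]
targets = [(2, 4), (9, 12)]
print(prune_overlapping(modifiers, targets))
```

Here the first and last modifier spans each share a token with a target, so only `(5, 6)` survives.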

from_serialized_representation(serialized_representation) classmethod

Creates the ConTextGraph from the serialized representation

Source code in medspacy/context/context_graph.py
@classmethod
def from_serialized_representation(cls, serialized_representation) -> ConTextGraph:
    """
    Creates the ConTextGraph from the serialized representation
    """
    context_graph = ConTextGraph(**serialized_representation)

    return context_graph

serialized_representation()

Returns the serialized representation of the ConTextGraph

Source code in medspacy/context/context_graph.py
def serialized_representation(self) -> Dict[str, Any]:
    """
    Returns the serialized representation of the ConTextGraph
    """
    return self.__dict__
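
Because the method returns `self.__dict__`, the caller receives the object's live attribute dictionary rather than a copy, and the round trip through `from_serialized_representation` works because the keys match the `__init__` parameter names. The toy `Graph` class below is a stand-in written for this illustration, not medspaCy's class.

```python
class Graph:
    """Toy stand-in whose __init__ arguments mirror its attribute names."""

    def __init__(self, targets=None, modifiers=None, edges=None, prune_on_modifier_overlap=False):
        self.targets = targets if targets is not None else []
        self.modifiers = modifiers if modifiers is not None else []
        self.edges = edges if edges is not None else []
        self.prune_on_modifier_overlap = prune_on_modifier_overlap

    def serialized_representation(self):
        return self.__dict__

    @classmethod
    def from_serialized_representation(cls, data):
        return cls(**data)


graph = Graph(targets=["CHF"], prune_on_modifier_overlap=True)
data = graph.serialized_representation()
restored = Graph.from_serialized_representation(data)

print(sorted(data))
print(data is graph.__dict__)  # the same dict object, not a copy
print(restored.targets, restored.prune_on_modifier_overlap)
```

The restored object is a new instance, but it shares the same list objects as the original, so callers who need isolation would have to copy the values themselves.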

update_scopes()

Updates the scope of every ConTextModifier.

For each modifier in a list of ConTextModifiers, check against each other modifier to see if one of the modifiers should update the other. This allows neighboring similar modifiers to extend each other's scope and allows "terminate" modifiers to end a modifier's scope.

Source code in medspacy/context/context_graph.py
def update_scopes(self):
    """
    Updates the scope of every ConTextModifier.

    For each modifier in a list of ConTextModifiers, check against each other
    modifier to see if one of the modifiers should update the other.
    This allows neighboring similar modifiers to extend each other's
    scope and allows "terminate" modifiers to end a modifier's scope.
    """
    for i in range(len(self.modifiers) - 1):
        modifier1 = self.modifiers[i]
        for j in range(i + 1, len(self.modifiers)):
            modifier2 = self.modifiers[j]
            # TODO: Add modifier -> modifier edges
            modifier1.limit_scope(modifier2)
            modifier2.limit_scope(modifier1)
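
The nested loops visit each unordered pair of modifiers exactly once and then call `limit_scope` in both directions for that pair. The sketch below only reproduces the index pattern and shows that it matches `itertools.combinations`; `index_pairs` is a hypothetical helper name.

```python
from itertools import combinations


def index_pairs(n):
    """Pairs (i, j) with i < j, in the order the nested loops visit them."""
    pairs = []
    for i in range(n - 1):
        for j in range(i + 1, n):
            pairs.append((i, j))
    return pairs


print(index_pairs(3))
print(index_pairs(3) == list(combinations(range(3), 2)))
```

With fewer than two modifiers the outer range is empty, so no pair is compared.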

ConTextModifier

Represents a concept found by ConText in a document. An instance of this class is the result of ConTextRule matching text in a Doc.

Source code in medspacy/context/context_modifier.py
class ConTextModifier:
    """
    Represents a concept found by ConText in a document. An instance of this class is the result of ConTextRule matching
    text in a Doc.
    """

    def __init__(
        self,
        context_rule: ConTextRule,
        start: int,
        end: int,
        doc: Doc,
        scope_start: Optional[int] = None,
        scope_end: Optional[int] = None,
        max_scope: Optional[int] = None,
    ):
        """
        Create a new ConTextModifier from a document span. Each modifier represents a span in the text and a surrounding
        window. Spans such as entities or other members of span groups that occur within this window can be modified by
        this ConTextModifier.

        Args:
            context_rule: The ConTextRule object which defines the modifier.
            start: The start token index.
            end: The end token index (non-inclusive).
            doc: The spaCy Doc which contains this span. This is needed to initialize the modifier but is not
                maintained.
            scope_start: The start token index of the scope.
            scope_end: The end index of the scope.
            max_scope: Whether to use scope values rather than sentence boundaries for modifications.
        """
        self._context_rule = context_rule
        self._start = start
        self._end = end

        self._targets = []
        self._num_targets = 0

        self._max_scope = max_scope
        self._scope_start = scope_start
        self._scope_end = scope_end
        if doc is not None and (self._scope_end is None or self._scope_start is None):
            self.__set_scope(doc)

    @property
    def modifier_span(self) -> Tuple[int, int]:
        """
        The (start, end) token indices covered by this match.
        """
        return self._start, self._end

    @property
    def rule(self) -> ConTextRule:
        """
        Returns the associated context rule.
        """
        return self._context_rule

    @property
    def direction(self) -> str:
        """
        Returns the associated direction.
        """
        return self.rule.direction

    @property
    def category(self) -> str:
        """
        Returns the associated category.
        """
        return self.rule.category

    @property
    def scope_span(self) -> Tuple[int, int]:
        """
        Returns the associated scope.
        """
        return self._scope_start, self._scope_end

    @property
    def allowed_types(self) -> Set[str]:
        """
        Returns the associated allowed types.
        """
        return self.rule.allowed_types

    @property
    def excluded_types(self) -> Set[str]:
        """
        Returns the associated excluded types.
        """
        return self.rule.excluded_types

    @property
    def num_targets(self) -> int:
        """
        Returns the associated number of targets.
        """
        return self._num_targets

    @property
    def max_targets(self) -> Union[int, None]:
        """
        Returns the associated maximum number of targets.
        """
        return self.rule.max_targets

    @property
    def max_scope(self) -> Union[int, None]:
        """
        Returns the associated maximum scope.
        """
        return self.rule.max_scope

    def __set_scope(self, doc: Doc):
        """
        Applies the direction of the ConTextRule which generated this ConTextModifier to define a scope. If
        self._max_scope is None, the default scope is the part of the containing sentence that lies in the direction
        given by self.direction. For example, if the direction is "forward", the scope will be
        [self.end: sentence.end]. If the direction is "backward", it will be [sentence.start: self.start].

        If self.max_scope is not None and the length of the default scope is longer than self.max_scope, it will be
        reduced to self.max_scope.

        Args:
            doc: The spaCy doc to use to set scope.
        """
        # If ConText is set to use defined windows, do that instead of sentence splitting
        if self._max_scope:
            full_scope_span = doc[self._start : self._end]._.window(
                n=self.rule.max_scope
            )
        # Otherwise, use the sentence
        else:
            full_scope_span = doc[self._start].sent
            if full_scope_span is None:
                raise ValueError(
                    "ConText failed because sentence boundaries have not been set. Add an upstream component such as the "
                    "dependency parser, Sentencizer, or PyRuSH to detect sentence boundaries or initialize ConText with "
                    "`max_scope` set to a value greater than 0."
                )

        if self.direction.lower() == "forward":
            self._scope_start, self._scope_end = self._end, full_scope_span.end
            if (
                self.max_scope is not None
                and (self._scope_end - self._scope_start) > self.max_scope
            ):
                self._scope_end = self._end + self.max_scope

        elif self.direction.lower() == "backward":
            self._scope_start, self._scope_end = (
                full_scope_span.start,
                self._start,
            )
            if (
                self.max_scope is not None
                and (self._scope_end - self._scope_start) > self.max_scope
            ):
                self._scope_start = self._start - self.max_scope

        else:  # bidirectional
            self._scope_start, self._scope_end = (
                full_scope_span.start,
                full_scope_span.end,
            )

            # Set the max scope on either side
            # Backwards
            if (
                self.max_scope is not None
                and (self._start - self._scope_start) > self.max_scope
            ):
                self._scope_start = self._start - self.max_scope
            # Forwards
            if (
                self.max_scope is not None
                and (self._scope_end - self._end) > self.max_scope
            ):
                self._scope_end = self._end + self.max_scope

    def update_scope(self, span: Span):
        """
        Changes the scope of self to be the given spaCy span.

        Args:
            span: a spaCy Span which contains the scope which a modifier should cover.
        """
        self._scope_start = span.start
        self._scope_end = span.end

    def limit_scope(self, other: ConTextModifier) -> bool:
        """
        If self and other have the same category or if other has a directionality of 'terminate', use the span of other
        to update the scope of self. Limiting the scope of two modifiers of the same category reduces the number of
        modifiers. For example, in 'no evidence of CHF, no pneumonia', 'pneumonia' will only be modified by 'no', not
        'no evidence of'. 'terminate' modifiers limit the scope of a modifier like 'no evidence of' in 'no evidence of
        CHF, but there is pneumonia'

        Args:
            other: The modifier to check against.

        Returns:
            Whether the other modifier modified the scope of self.
        """
        if not tuple_overlaps(self.scope_span, other.scope_span):
            return False
        if self.direction.upper() == "TERMINATE":
            return False
        # Check if the other modifier is a type which can modify self
        # or if they are the same category. If not, don't reduce scope.
        if (
            (other.direction.upper() != "TERMINATE")
            and (other.category.upper() not in self.rule.terminated_by)
            and (other.category.upper() != self.category.upper())
        ):
            return False

        # If two modifiers have the same category but modify different target types,
        # don't limit scope.
        if self.category == other.category and (
            (self.allowed_types != other.allowed_types)
            or (self.excluded_types != other.excluded_types)
        ):
            return False

        orig_scope = self.scope_span
        if self.direction.lower() in ("forward", "bidirectional"):
            if other > self:
                self._scope_end = min(self._scope_end, other.modifier_span[0])
        if self.direction.lower() in ("backward", "bidirectional"):
            if other < self:
                self._scope_start = max(self._scope_start, other.modifier_span[1])
        return orig_scope != self.scope_span

    def modifies(self, target: Span) -> bool:
        """
        Checks whether the target is within the modifier scope and if self is allowed to modify target.

        Args:
            target: a spaCy span representing a target concept.

        Returns:
            Whether the target is within `modifier_scope` and if self is allowed to modify the target.
        """
        # If the target and modifier overlap, meaning at least one token
        # was extracted as both a target and modifier, return False
        # to avoid self-modifying concepts

        if tuple_overlaps(
            self.modifier_span, (target.start, target.end)
        ):  # self.overlaps(target):
            return False
        if self.direction.upper() in ("TERMINATE", "PSEUDO"):
            return False
        if not self.allows(target.label_.upper()):
            return False

        if tuple_overlaps(self.scope_span, (target.start, target.end)):
            if not self.on_modifies(target):
                return False
            else:
                return True
        return False

    def allows(self, target_label: str) -> bool:
        """
        Returns whether a modifier is able to modify a target type.

        Args:
            target_label: The target type to check.

        Returns:
            Whether the modifier is allowed to modify a target of the specified type. True if `target_label` in
            `self.allowed_types` or if `target_label` not in `self.excluded_types`. False otherwise.
        """
        if self.allowed_types is not None:
            return target_label in self.allowed_types
        if self.excluded_types is not None:
            return target_label not in self.excluded_types
        return True

    def on_modifies(self, target: Span) -> bool:
        """
        If the ConTextRule used to define a ConTextModifier has an `on_modifies` callback function, evaluate and return
        either True or False.

        Args:
            target: The spaCy span to evaluate.

        Returns:
            The result of the `on_modifies` callback for the rule. True if the callback is None.
        """
        if self.rule.on_modifies is None:
            return True
        # Find the span in between the target and modifier
        start = min(target.end, self._end)
        end = max(target.start, self._start)
        span_between = target.doc[start:end]
        rslt = self.rule.on_modifies(
            target, target.doc[self._start : self._end], span_between
        )
        if rslt not in (True, False):
            raise ValueError(
                "The on_modifies function must return either True or False indicating "
                "whether a modifier modifies a target. Actual value: {0}".format(rslt)
            )
        return rslt

    def modify(self, target: Span):
        """
        Add target to the list of self._targets and increment self._num_targets.

        Args:
            target: The spaCy span to add.
        """
        self._targets.append(target)
        self._num_targets += 1

    def reduce_targets(self):
        """
        Reduces the number of targets to the n-closest targets based on the value of `self.max_targets`. If
        `self.max_targets` is None, no pruning is done.
        """
        if self.max_targets is None or self.num_targets <= self.max_targets:
            return

        target_dists = []
        for target in self._targets:
            dist = min(abs(self._start - target.end), abs(target.start - self._end))
            target_dists.append((target, dist))
        srtd_targets, _ = zip(*sorted(target_dists, key=lambda x: x[1]))
        self._targets = srtd_targets[: self.max_targets]
        self._num_targets = len(self._targets)

    def __gt__(self, other: ConTextModifier):
        return self._start > other.modifier_span[0]

    def __ge__(self, other):
        return self._start >= other.modifier_span[0]

    def __lt__(self, other):
        return self._end < other.modifier_span[1]

    def __le__(self, other):
        return self._end <= other.modifier_span[1]

    def __len__(self):
        return self._end - self._start

    def __repr__(self):
        return f"<ConTextModifier> [{self._start}, {self._end}, {self.category}]"

    def serialized_representation(self):
        """
        Serialized Representation of the modifier
        """
        dict_repr = dict()
        dict_repr["context_rule"] = self.rule.to_dict()
        dict_repr["start"] = self._start
        dict_repr["end"] = self._end
        dict_repr["max_scope"] = self._max_scope
        dict_repr["scope_start"] = self._scope_start
        dict_repr["scope_end"] = self._scope_end

        return dict_repr

    @classmethod
    def from_serialized_representation(
        cls, serialized_representation
    ) -> ConTextModifier:
        """
        Instantiates the class from the serialized representation
        """
        rule = ConTextRule.from_dict(serialized_representation["context_rule"])

        serialized_representation["context_rule"] = rule
        serialized_representation["doc"] = None

        return ConTextModifier(**serialized_representation)
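
The `reduce_targets` method in the class above keeps only the nearest targets when a rule sets `max_targets`. A minimal standalone sketch of that selection follows, using `(start, end)` tuples in place of spaCy spans. `closest_targets` is a hypothetical helper name, not part of medspaCy, and unlike the method it returns a new list instead of mutating state.

```python
def closest_targets(modifier_span, target_spans, max_targets=None):
    """Keep the max_targets target spans nearest to the modifier span."""
    if max_targets is None or len(target_spans) <= max_targets:
        return list(target_spans)
    m_start, m_end = modifier_span

    def distance(span):
        t_start, t_end = span
        # Same distance measure as in reduce_targets: the smaller gap between
        # the modifier's edges and the target's edges.
        return min(abs(m_start - t_end), abs(t_start - m_end))

    # sorted() is stable, so equally distant targets keep their original order.
    return sorted(target_spans, key=distance)[:max_targets]


targets = [(0, 1), (3, 4), (8, 9), (15, 17)]
print(closest_targets((5, 7), targets, max_targets=2))
```

For a modifier at tokens 5 to 7, the targets at `(3, 4)` and `(8, 9)` are each one token away and are the two that remain.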

allowed_types property

Returns the associated allowed types.

category property

Returns the associated category.

direction property

Returns the associated direction.

excluded_types property

Returns the associated excluded types.

max_scope property

Returns the associated maximum scope.

max_targets property

Returns the associated maximum number of targets.

modifier_span property

The (start, end) token indices covered by this match.

num_targets property

Returns the associated number of targets.

rule property

Returns the associated context rule.

scope_span property

Returns the associated scope.

__init__(context_rule, start, end, doc, scope_start=None, scope_end=None, max_scope=None)

Create a new ConTextModifier from a document span. Each modifier represents a span in the text and a surrounding window. Spans such as entities or other members of span groups that occur within this window can be modified by this ConTextModifier.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| context_rule | ConTextRule | The ConTextRule object which defines the modifier. | required |
| start | int | The start token index. | required |
| end | int | The end token index (non-inclusive). | required |
| doc | Doc | The spaCy Doc which contains this span. This is needed to initialize the modifier but is not maintained. | required |
| scope_start | Optional[int] | The start token index of the scope. | None |
| scope_end | Optional[int] | The end index of the scope. | None |
| max_scope | Optional[int] | Whether to use scope values rather than sentence boundaries for modifications. | None |

Source code in medspacy/context/context_modifier.py
def __init__(
    self,
    context_rule: ConTextRule,
    start: int,
    end: int,
    doc: Doc,
    scope_start: Optional[int] = None,
    scope_end: Optional[int] = None,
    max_scope: Optional[int] = None,
):
    """
    Create a new ConTextModifier from a document span. Each modifier represents a span in the text and a surrounding
    window. Spans such as entities or other members of span groups that occur within this window can be modified by
    this ConTextModifier.

    Args:
        context_rule: The ConTextRule object which defines the modifier.
        start: The start token index.
        end: The end token index (non-inclusive).
        doc: The spaCy Doc which contains this span. This is needed to initialize the modifier but is not
            maintained.
        scope_start: The start token index of the scope.
        scope_end: The end index of the scope.
        max_scope: Whether to use scope values rather than sentence boundaries for modifications.
    """
    self._context_rule = context_rule
    self._start = start
    self._end = end

    self._targets = []
    self._num_targets = 0

    self._max_scope = max_scope
    self._scope_start = scope_start
    self._scope_end = scope_end
    if doc is not None and (self._scope_end is None or self._scope_start is None):
        self.__set_scope(doc)

__set_scope(doc)

Applies the direction of the ConTextRule which generated this ConTextModifier to define a scope. If self._max_scope is None, the default scope is the part of the containing sentence that lies in the direction given by self.direction. For example, if the direction is "forward", the scope will be [self.end: sentence.end]. If the direction is "backward", it will be [sentence.start: self.start].

If self.max_scope is not None and the length of the default scope is longer than self.max_scope, it will be reduced to self.max_scope.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| doc | Doc | The spaCy doc to use to set scope. | required |

Source code in medspacy/context/context_modifier.py
def __set_scope(self, doc: Doc):
    """
    Applies the direction of the ConTextRule which generated this ConTextModifier to define a scope. If
    self._max_scope is None, the default scope is the part of the containing sentence that lies in the direction
    given by self.direction. For example, if the direction is "forward", the scope will be
    [self.end: sentence.end]. If the direction is "backward", it will be [sentence.start: self.start].

    If self.max_scope is not None and the length of the default scope is longer than self.max_scope, it will be
    reduced to self.max_scope.

    Args:
        doc: The spaCy doc to use to set scope.
    """
    # If ConText is set to use defined windows, do that instead of sentence splitting
    if self._max_scope:
        full_scope_span = doc[self._start : self._end]._.window(
            n=self.rule.max_scope
        )
    # Otherwise, use the sentence
    else:
        full_scope_span = doc[self._start].sent
        if full_scope_span is None:
            raise ValueError(
                "ConText failed because sentence boundaries have not been set. Add an upstream component such as the "
                "dependency parser, Sentencizer, or PyRuSH to detect sentence boundaries or initialize ConText with "
                "`max_scope` set to a value greater than 0."
            )

    if self.direction.lower() == "forward":
        self._scope_start, self._scope_end = self._end, full_scope_span.end
        if (
            self.max_scope is not None
            and (self._scope_end - self._scope_start) > self.max_scope
        ):
            self._scope_end = self._end + self.max_scope

    elif self.direction.lower() == "backward":
        self._scope_start, self._scope_end = (
            full_scope_span.start,
            self._start,
        )
        if (
            self.max_scope is not None
            and (self._scope_end - self._scope_start) > self.max_scope
        ):
            self._scope_start = self._start - self.max_scope

    else:  # bidirectional
        self._scope_start, self._scope_end = (
            full_scope_span.start,
            full_scope_span.end,
        )

        # Set the max scope on either side
        # Backwards
        if (
            self.max_scope is not None
            and (self._start - self._scope_start) > self.max_scope
        ):
            self._scope_start = self._start - self.max_scope
        # Forwards
        if (
            self.max_scope is not None
            and (self._scope_end - self._end) > self.max_scope
        ):
            self._scope_end = self._end + self.max_scope
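
The direction handling above reduces to integer arithmetic on token indices. The sketch below restates the sentence-based branch as a pure function so the three cases can be tried without a spaCy Doc. `compute_scope` and its parameter names are hypothetical, and the window-based branch (`_.window`) is deliberately left out.

```python
def compute_scope(start, end, sent_start, sent_end, direction, max_scope=None):
    """Return (scope_start, scope_end) for a modifier at [start, end) in a sentence."""
    direction = direction.lower()
    if direction == "forward":
        scope_start, scope_end = end, sent_end
        if max_scope is not None and scope_end - scope_start > max_scope:
            scope_end = end + max_scope
    elif direction == "backward":
        scope_start, scope_end = sent_start, start
        if max_scope is not None and scope_end - scope_start > max_scope:
            scope_start = start - max_scope
    else:  # any other direction is treated as bidirectional, as in the code above
        scope_start, scope_end = sent_start, sent_end
        if max_scope is not None and start - scope_start > max_scope:
            scope_start = start - max_scope
        if max_scope is not None and scope_end - end > max_scope:
            scope_end = end + max_scope
    return scope_start, scope_end


# Modifier occupies tokens [4, 6) in a sentence spanning tokens [0, 12).
print(compute_scope(4, 6, 0, 12, "FORWARD"))
print(compute_scope(4, 6, 0, 12, "BACKWARD"))
print(compute_scope(4, 6, 0, 12, "BIDIRECTIONAL", max_scope=2))
```

In the bidirectional case `max_scope` is applied separately on each side of the modifier, so the total scope can span up to twice `max_scope` tokens plus the modifier itself.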

allows(target_label)

Returns whether a modifier is able to modify a target type.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| target_label | str | The target type to check. | required |

Returns:

| Type | Description |
| --- | --- |
| bool | Whether the modifier is allowed to modify a target of the specified type. True if target_label in self.allowed_types or if target_label not in self.excluded_types. False otherwise. |

Source code in medspacy/context/context_modifier.py
def allows(self, target_label: str) -> bool:
    """
    Returns whether a modifier is able to modify a target type.

    Args:
        target_label: The target type to check.

    Returns:
        Whether the modifier is allowed to modify a target of the specified type. True if `target_label` in
        `self.allowed_types` or if `target_label` not in `self.excluded_types`. False otherwise.
    """
    if self.allowed_types is not None:
        return target_label in self.allowed_types
    if self.excluded_types is not None:
        return target_label not in self.excluded_types
    return True
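
To make the precedence explicit, here is a minimal standalone sketch of the same check. The free function and its keyword arguments are hypothetical stand-ins for the method and the modifier's attributes.

```python
from typing import Optional, Set


def allows(
    target_label: str,
    allowed_types: Optional[Set[str]] = None,
    excluded_types: Optional[Set[str]] = None,
) -> bool:
    """Mirror the precedence above: allow-list first, then deny-list, else True."""
    if allowed_types is not None:
        return target_label in allowed_types
    if excluded_types is not None:
        return target_label not in excluded_types
    return True


print(allows("PROBLEM", allowed_types={"PROBLEM"}))        # True
print(allows("MEDICATION", allowed_types={"PROBLEM"}))     # False
print(allows("MEDICATION", excluded_types={"PROBLEM"}))    # True
print(allows("ANYTHING"))                                  # True
```

The comparison is case-sensitive. In the listings on this page, `ConTextRule` upper-cases both sets and `modifies` passes `target.label_.upper()`, so labels are expected to be upper-case by the time they reach this check.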

from_serialized_representation(serialized_representation) classmethod

Instantiates the class from the serialized representation

Source code in medspacy/context/context_modifier.py
@classmethod
def from_serialized_representation(
    cls, serialized_representation
) -> ConTextModifier:
    """
    Instantiates the class from the serialized representation
    """
    rule = ConTextRule.from_dict(serialized_representation["context_rule"])

    serialized_representation["context_rule"] = rule
    serialized_representation["doc"] = None

    return ConTextModifier(**serialized_representation)

limit_scope(other)

If self and other have the same category or if other has a directionality of 'terminate', use the span of other to update the scope of self. Limiting the scope of two modifiers of the same category reduces the number of modifiers. For example, in 'no evidence of CHF, no pneumonia', 'pneumonia' will only be modified by 'no', not 'no evidence of'. 'terminate' modifiers limit the scope of a modifier like 'no evidence of' in 'no evidence of CHF, but there is pneumonia'

Parameters:

Name Type Description Default
other ConTextModifier

The modifier to check against.

required

Returns:

Type Description
bool

Whether the other modifier modified the scope of self.

Source code in medspacy/context/context_modifier.py
def limit_scope(self, other: ConTextModifier) -> bool:
    """
    If self and other have the same category or if other has a directionality of 'terminate', use the span of other
    to update the scope of self. Limiting the scope of two modifiers of the same category reduces the number of
    modifiers. For example, in 'no evidence of CHF, no pneumonia', 'pneumonia' will only be modified by 'no', not
    'no evidence of'. 'terminate' modifiers limit the scope of a modifier like 'no evidence of' in 'no evidence of
    CHF, but there is pneumonia'

    Args:
        other: The modifier to check against.

    Returns:
        Whether the other modifier modified the scope of self.
    """
    if not tuple_overlaps(self.scope_span, other.scope_span):
        return False
    if self.direction.upper() == "TERMINATE":
        return False
    # Check if the other modifier is a type which can modify self
    # or if they are the same category. If not, don't reduce scope.
    if (
        (other.direction.upper() != "TERMINATE")
        and (other.category.upper() not in self.rule.terminated_by)
        and (other.category.upper() != self.category.upper())
    ):
        return False

    # If two modifiers have the same category but modify different target types,
    # don't limit scope.
    if self.category == other.category and (
        (self.allowed_types != other.allowed_types)
        or (self.excluded_types != other.excluded_types)
    ):
        return False

    orig_scope = self.scope_span
    if self.direction.lower() in ("forward", "bidirectional"):
        if other > self:
            self._scope_end = min(self._scope_end, other.modifier_span[0])
    if self.direction.lower() in ("backward", "bidirectional"):
        if other < self:
            self._scope_start = max(self._scope_start, other.modifier_span[1])
    return orig_scope != self.scope_span
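
The trimming step at the end of `limit_scope` can be illustrated with plain `(start, end)` token offsets. This sketch covers only the geometry, not the category and type checks that precede it. The name `trim_scope` is hypothetical, and it assumes that `other > self` means the other modifier comes after this one in the text.

```python
from typing import Tuple

Offsets = Tuple[int, int]  # half-open (start, end) token offsets


def trim_scope(
    direction: str, modifier: Offsets, scope: Offsets, other_modifier: Offsets
) -> Offsets:
    """Cut `scope` so that it stops at `other_modifier`."""
    scope_start, scope_end = scope
    direction = direction.lower()
    # Another modifier to the right truncates a forward-looking scope.
    if direction in ("forward", "bidirectional") and other_modifier[0] >= modifier[1]:
        scope_end = min(scope_end, other_modifier[0])
    # Another modifier to the left truncates a backward-looking scope.
    if direction in ("backward", "bidirectional") and other_modifier[1] <= modifier[0]:
        scope_start = max(scope_start, other_modifier[1])
    return scope_start, scope_end


# Tokens: no(0) evidence(1) of(2) CHF(3) ,(4) no(5) pneumonia(6)
no_evidence_of = (0, 3)
second_no = (5, 6)
print(trim_scope("forward", no_evidence_of, (3, 7), second_no))  # (3, 5)
```

After trimming, the scope of 'no evidence of' ends before the second 'no', so 'pneumonia' at token 6 falls only within the scope of 'no'.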

modifies(target)

Checks whether the target is within the modifier scope and if self is allowed to modify target.

Parameters:

Name Type Description Default
target Span

a spaCy span representing a target concept.

required

Returns:

Type Description
bool

Whether the target is within modifier_scope and if self is allowed to modify the target.

Source code in medspacy/context/context_modifier.py
def modifies(self, target: Span) -> bool:
    """
    Checks whether the target is within the modifier scope and if self is allowed to modify target.

    Args:
        target: a spaCy span representing a target concept.

    Returns:
        Whether the target is within `modifier_scope` and if self is allowed to modify the target.
    """
    # If the target and modifier overlap, meaning at least one token
    # was extracted as both a target and modifier, return False
    # to avoid self-modifying concepts

    if tuple_overlaps(
        self.modifier_span, (target.start, target.end)
    ):  # self.overlaps(target):
        return False
    if self.direction in ("TERMINATE", "PSEUDO"):
        return False
    if not self.allows(target.label_.upper()):
        return False

    if tuple_overlaps(self.scope_span, (target.start, target.end)):
        return self.on_modifies(target)
    return False
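
The order of checks in `modifies` can be sketched with plain offsets. The helper `spans_overlap` is a hypothetical stand-in for `tuple_overlaps`, whose implementation is not shown here; it assumes half-open token ranges. The `on_modifies` callback step is left out, and `label_allowed` stands in for the result of `allows`.

```python
from typing import Tuple

Offsets = Tuple[int, int]  # half-open (start, end) token offsets


def spans_overlap(a: Offsets, b: Offsets) -> bool:
    """True if two half-open ranges share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def modifies(
    direction: str,
    modifier: Offsets,
    scope: Offsets,
    target: Offsets,
    label_allowed: bool = True,
) -> bool:
    if spans_overlap(modifier, target):  # avoid self-modifying concepts
        return False
    if direction.upper() in ("TERMINATE", "PSEUDO"):  # these never modify
        return False
    if not label_allowed:
        return False
    return spans_overlap(scope, target)


# Modifier 'no evidence of' at [0, 3) with a forward scope of [3, 5).
print(modifies("FORWARD", (0, 3), (3, 5), (3, 4)))    # True: target inside scope
print(modifies("FORWARD", (0, 3), (3, 5), (6, 7)))    # False: target outside scope
print(modifies("TERMINATE", (0, 3), (3, 5), (3, 4)))  # False: direction never modifies
```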

modify(target)

Add target to the list of self._targets and increment self._num_targets.

Parameters:

Name Type Description Default
target Span

The spaCy span to add.

required
Source code in medspacy/context/context_modifier.py
def modify(self, target: Span):
    """
    Add target to the list of self._targets and increment self._num_targets.

    Args:
        target: The spaCy span to add.
    """
    self._targets.append(target)
    self._num_targets += 1

on_modifies(target)

If the ConTextRule used to define a ConTextModifier has an on_modifies callback function, evaluate and return either True or False.

Parameters:

Name Type Description Default
target Span

The spaCy span to evaluate.

required

Returns:

Type Description
bool

The result of the on_modifies callback for the rule. True if the callback is None.

Source code in medspacy/context/context_modifier.py
def on_modifies(self, target: Span) -> bool:
    """
    If the ConTextRule used to define a ConTextModifier has an `on_modifies` callback function, evaluate and return
    either True or False.

    Args:
        target: The spaCy span to evaluate.

    Returns:
        The result of the `on_modifies` callback for the rule. True if the callback is None.
    """
    if self.rule.on_modifies is None:
        return True
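    # Find the span in between the target and modifier. Using the modifier's
    # start for the upper bound keeps the modifier's own tokens out of the span
    # when the target precedes the modifier.
    start = min(target.end, self._end)
    end = max(target.start, self._start)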
    span_between = target.doc[start:end]
    rslt = self.rule.on_modifies(
        target, target.doc[self._start : self._end], span_between
    )
    if rslt not in (True, False):
        raise ValueError(
            "The on_modifies function must return either True or False indicating "
            "whether a modifier modifies a target. Actual value: {0}".format(rslt)
        )
    return rslt
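
As an illustration of what such a callback might look like, the sketch below blocks an edge whenever a semicolon sits between the modifier and the target. The function name and the blocking criterion are invented for this example. The `SimpleNamespace` objects are stand-ins for spaCy Spans so that the snippet runs without spaCy, and only the `.text` attribute is used.

```python
from types import SimpleNamespace


def no_semicolon_between(target, modifier, span_between) -> bool:
    """Allow the edge only if no semicolon separates modifier and target."""
    return ";" not in span_between.text


# Stand-ins for spaCy Spans; a real callback would receive Span objects.
target = SimpleNamespace(text="pneumonia")
modifier = SimpleNamespace(text="no evidence of")

print(no_semicolon_between(target, modifier, SimpleNamespace(text="CHF or")))        # True
print(no_semicolon_between(target, modifier, SimpleNamespace(text="CHF; chronic")))  # False
```

A callback like this would be passed as `on_modifies=no_semicolon_between` when creating a `ConTextRule`. It returns a strict `bool`, which matters because the method above raises a `ValueError` for anything other than True or False.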

reduce_targets()

Reduces the number of targets to the n-closest targets based on the value of self.max_targets. If self.max_targets is None, no pruning is done.

Source code in medspacy/context/context_modifier.py
def reduce_targets(self):
    """
    Reduces the number of targets to the n-closest targets based on the value of `self.max_targets`. If
    `self.max_targets` is None, no pruning is done.
    """
    if self.max_targets is None or self.num_targets <= self.max_targets:
        return

    target_dists = []
    for target in self._targets:
        dist = min(abs(self._start - target.end), abs(target.start - self._end))
        target_dists.append((target, dist))
    srtd_targets, _ = zip(*sorted(target_dists, key=lambda x: x[1]))
    # Keep a list (not a tuple) so that later calls to modify() can still append
    self._targets = list(srtd_targets[: self.max_targets])
    self._num_targets = len(self._targets)
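
The n-closest selection can be sketched over plain `(start, end)` offsets. `closest_targets` is a hypothetical helper, not part of the library; it uses the same distance as the listing above, the smaller of the two gaps between the modifier's edges and the target's edges.

```python
from typing import List, Optional, Tuple

Offsets = Tuple[int, int]  # half-open (start, end) token offsets


def closest_targets(
    modifier: Offsets, targets: List[Offsets], max_targets: Optional[int]
) -> List[Offsets]:
    """Keep at most `max_targets` targets, nearest to the modifier first."""
    if max_targets is None or len(targets) <= max_targets:
        return list(targets)
    mod_start, mod_end = modifier

    def distance(target: Offsets) -> int:
        return min(abs(mod_start - target[1]), abs(target[0] - mod_end))

    # sorted() is stable, so ties keep their original order.
    return sorted(targets, key=distance)[:max_targets]


targets = [(0, 1), (3, 4), (7, 8), (12, 13)]
print(closest_targets((5, 6), targets, max_targets=2))     # [(3, 4), (7, 8)]
print(closest_targets((5, 6), targets, max_targets=None))  # all four, unchanged
```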

serialized_representation()

Returns a serialized (dictionary) representation of the modifier.

Source code in medspacy/context/context_modifier.py
def serialized_representation(self):
    """
    Returns a serialized (dictionary) representation of the modifier.
    """
    dict_repr = dict()
    dict_repr["context_rule"] = self.rule.to_dict()
    dict_repr["start"] = self._start
    dict_repr["end"] = self._end
    dict_repr["max_scope"] = self._max_scope
    dict_repr["scope_start"] = self._scope_start
    dict_repr["scope_end"] = self._scope_end

    return dict_repr

update_scope(span)

Changes the scope of self to be the given spaCy span.

Parameters:

Name Type Description Default
span Span

a spaCy Span which contains the scope which a modifier should cover.

required
Source code in medspacy/context/context_modifier.py
def update_scope(self, span: Span):
    """
    Changes the scope of self to be the given spaCy span.

    Args:
        span: a spaCy Span which contains the scope which a modifier should cover.
    """
    self._scope_start = span.start
    self._scope_end = span.end

ConTextRule

Bases: BaseRule

A ConTextRule defines a ConText modifier. ConTextRules are rules which define which spans are extracted as modifiers and how they behave, such as the phrase to be matched, the category/semantic class, the direction of the modifier in the text, and what types of target spans can be modified.

Source code in medspacy/context/context_rule.py
class ConTextRule(BaseRule):
    """
    A ConTextRule defines a ConText modifier. ConTextRules are rules which define which spans are extracted as modifiers
    and how they behave, such as the phrase to be matched, the category/semantic class, the direction of the modifier in
    the text, and what types of target spans can be modified.
    """

    _ALLOWED_DIRECTIONS = (
        "FORWARD",
        "BACKWARD",
        "BIDIRECTIONAL",
        "TERMINATE",
        "PSEUDO"
    )
    _ALLOWED_KEYS = {
        "literal",
        "direction",
        "pattern",
        "category",
        "metadata",
        "allowed_types",
        "excluded_types",
        "max_targets",
        "max_scope",
    }

    def __init__(
        self,
        literal: str,
        category: str,
        pattern: Optional[Union[str, List[Dict[str, str]]]] = None,
        direction: str = "BIDIRECTIONAL",
        on_match: Optional[
            Callable[[Matcher, Doc, int, List[Tuple[int, int, int]]], Any]
        ] = None,
        on_modifies: Optional[Callable[[Span, Span, Span], bool]] = None,
        allowed_types: Optional[Set[str]] = None,
        excluded_types: Optional[Set[str]] = None,
        max_scope: Optional[int] = None,
        max_targets: Optional[int] = None,
        terminated_by: Optional[Set[str]] = None,
        metadata: Optional[Dict[Any, Any]] = None,
    ):
        """
        Creates a ConTextRule object.

        The primary arguments of `literal`, `category`, and `direction` define the span of text to be matched, the
        semantic category, and the direction within the sentence in which the modifier operates.
        Other arguments specify additional custom logic such as:
            - Additional control over what text can be matched as a modifier (pattern and on_match)
            - Which types of targets can be modified (allowed_types, excluded_types)
            - The scope size and number of targets that a modifier can modify (max_targets, max_scope)
            - Other logic for terminating a span or for allowing a modifier to modify a target (on_modifies,
            terminated_by)

        Args:
            literal: The string representation of a concept. If `pattern` is None, this string will be lower-cased and
                matched to the lower-case string. If `pattern` is not None, this argument will not be used for matching
                but can be used as a reference as the rule name.
            category: The semantic class of the matched span. This corresponds to the `label_` attribute of an entity.
            pattern: A list or string to use as a spaCy pattern rather than `literal`. If a list, will use spaCy
                token-based pattern matching to match using token attributes. If a string, will use medspaCy's
                RegexMatcher. If None, will use `literal` as the pattern for phrase matching. For more information, see
                https://spacy.io/usage/rule-based-matching.
            direction: The directionality or action of a modifier. This defines which part of a sentence a modifier will
                include as its scope. Entities within the scope will be considered to be modified.
                Valid values are:
                - "FORWARD": Scope will begin after the end of a modifier and move to the right
                - "BACKWARD": Scope will begin before the beginning of a modifier and move to the left
                - "BIDIRECTIONAL": Scope will expand on either side of a modifier
                - "TERMINATE": A special direction to limit any other modifiers if this phrase is in its scope. Example:
                    "no evidence of chf but there is pneumonia": "but" will prevent "no evidence of" from modifying
                    "pneumonia"
                - "PSEUDO": A special direction which will not modify any targets. This can be used for differentiating
                    superstrings of modifiers. Example: A modifier with literal="negative attitude" will prevent the
                    phrase "negative" in "She has a negative attitude about her treatment" from being extracted as a
                    modifier.
            on_match: An optional callback function or other callable which takes 4 arguments: `(matcher, doc, i,
                matches)`. For more information, see https://spacy.io/usage/rule-based-matching#on_match
            on_modifies: Callback function to run when building an edge between a target and a modifier. This allows
                specifying custom logic for allowing or preventing certain modifiers from modifying certain targets. The
                callable should take 3 arguments:
                    target: The spaCy Span from doc.ents (e.g., 'Evidence of pneumonia')
                    modifier: The spaCy Span covered in a resulting modifier (e.g., 'no evidence of')
                    span_between: The Span between the target and modifier in question.
                Should return either True or False. If returns False, then the modifier will not modify the target.
            allowed_types: A collection of target labels to allow a modifier to modify. If None, will apply to any type
                not specifically excluded in excluded_types. Only one of allowed_types and excluded_types can be used.
                An error will be thrown if both are not None.
            excluded_types: A collection of target labels which this modifier cannot modify. If None, will apply to all
                target types unless allowed_types is not None.
            max_scope: A number of tokens to explicitly limit the size of the modifier's scope. If None, the scope will
                include the entire sentence in the direction of `direction` and the entire sentence for "BIDIRECTIONAL".
                This is useful for requiring modifiers be very close to a concept in the text or for preventing long
                modifier ranges caused by sentence splitting problems.
            max_targets: The maximum number of targets which a modifier can modify. If None, will modify all targets in
                its scope.
            terminated_by: An optional collection of other modifier categories which will terminate the scope of this
                modifier. If None, only "TERMINATE" will do this. Example: if a ConTextRule defining "positive for" has
                terminated_by={"NEGATED_EXISTENCE"}, then in the sentence "positive for flu, negative for RSV", the
                positive modifier will modify "flu" but will be terminated by "negative for" and will not modify "RSV".
                This helps prevent multiple conflicting modifiers from distributing too far across a sentence.
            metadata: Optional dictionary of any extra metadata.
        """
        super().__init__(literal, category.upper(), pattern, on_match, metadata)
        self.on_modifies = on_modifies

        if allowed_types is not None and excluded_types is not None:
            raise ValueError(
                "A ConTextRule was instantiated with non-null values for both allowed_types and excluded_types. "
                "Only one of these can be non-null."
            )
        if allowed_types is not None:
            self.allowed_types = {label.upper() for label in allowed_types}
        else:
            self.allowed_types = None
        if excluded_types is not None:
            self.excluded_types = {label.upper() for label in excluded_types}
        else:
            self.excluded_types = None

        if max_targets is not None and max_targets <= 0:
            raise ValueError("max_targets must be > 0 or None.")
        self.max_targets = max_targets
        if max_scope is not None and max_scope <= 0:
            raise ValueError("max_scope must be > 0 or None.")
        self.max_scope = max_scope
        if terminated_by is None:
            terminated_by = set()
        else:
            if isinstance(terminated_by, str):
                raise ValueError(
                    f"terminated_by must be an iterable, such as a list or set, not {terminated_by}."
                )
            terminated_by = {string.upper() for string in terminated_by}

        self.terminated_by = terminated_by

        self.metadata = metadata

        if direction.upper() not in self._ALLOWED_DIRECTIONS:
            raise ValueError(
                "Direction {0} not recognized. Must be one of: {1}".format(
                    direction, self._ALLOWED_DIRECTIONS
                )
            )
        self.direction = direction.upper()

    @classmethod
    def from_json(cls, filepath) -> List[ConTextRule]:
        """
        Reads in a lexicon of modifiers from a JSON file under the key `context_rules`.

        Args:
            filepath: The .json file containing modifier rules. Must contain `context_rules` key containing the rule
                JSONs.

        Returns:
            A list of ConTextRule objects read from the JSON.
        """

        with open(filepath) as file:
            modifier_data = json.load(file)
        context_rules = []
        for data in modifier_data["context_rules"]:
            context_rules.append(ConTextRule.from_dict(data))
        return context_rules

    @classmethod
    def from_dict(cls, rule_dict) -> ConTextRule:
        """
        Reads a dictionary into a ConTextRule.

        Args:
            rule_dict: The dictionary to convert.

        Returns:
            The ConTextRule created from the dictionary.
        """
        keys = set(rule_dict.keys())
        invalid_keys = keys.difference(cls._ALLOWED_KEYS)
        if invalid_keys:
            msg = (
                "JSON object contains invalid keys: {0}.\n"
                "Must be one of: {1}".format(invalid_keys, cls._ALLOWED_KEYS)
            )
            raise ValueError(msg)
        rule = ConTextRule(**rule_dict)
        return rule

    def to_dict(self):
        """
        Converts a ConTextRule to a Python dictionary. Used when writing context rules to a JSON file.

        Returns:
            The dictionary containing the ConTextRule info.
        """

        rule_dict = {}
        for key in self._ALLOWED_KEYS:
            value = self.__dict__.get(key)
            if isinstance(value, set):
                value = list(value)
            if value is not None:
                rule_dict[key] = value
        return rule_dict

    @classmethod
    def to_json(cls, context_rules: List[ConTextRule], filepath: str):
        """
        Writes ConTextRules to a JSON file.

        Args:
            context_rules: A list of ConTextRules that will be written to a file.
            filepath: The .json file to contain modifier rules.
        """
        import json

        data = {"context_rules": [rule.to_dict() for rule in context_rules]}
        with open(filepath, "w") as file:
            json.dump(data, file, indent=4)

    def __repr__(self):
        return (
            f"ConTextRule(literal='{self.literal}', category='{self.category}', pattern={self.pattern}, "
            f"direction='{self.direction}')"
        )
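
The file layout that `from_json` and `to_json` expect can be sketched with the standard library alone. The snippet writes a small rules file under the `context_rules` key and reads it back with the same key validation as `from_dict`. `load_rule_dicts` is a hypothetical helper that returns plain dictionaries rather than `ConTextRule` objects, and the `CONJ` category is an illustrative label.

```python
import json
import os
import tempfile

# Same keys as ConTextRule._ALLOWED_KEYS in the listing above.
ALLOWED_KEYS = {
    "literal", "direction", "pattern", "category", "metadata",
    "allowed_types", "excluded_types", "max_targets", "max_scope",
}


def load_rule_dicts(filepath):
    """Read rule dictionaries from `context_rules` and reject unknown keys."""
    with open(filepath) as file:
        data = json.load(file)
    rules = data["context_rules"]
    for rule in rules:
        invalid = set(rule) - ALLOWED_KEYS
        if invalid:
            raise ValueError(f"Invalid keys: {sorted(invalid)}")
    return rules


rules = {
    "context_rules": [
        {"literal": "no evidence of", "category": "NEGATED_EXISTENCE", "direction": "FORWARD"},
        {"literal": "but", "category": "CONJ", "direction": "TERMINATE"},
    ]
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "context_rules.json")
    with open(path, "w") as file:
        json.dump(rules, file, indent=4)
    loaded = load_rule_dicts(path)

print([rule["literal"] for rule in loaded])  # ['no evidence of', 'but']
```

Note that `terminated_by`, `on_match`, and `on_modifies` are not in `_ALLOWED_KEYS` as listed above, so a rule dictionary containing any of them would be rejected by `from_dict`, and `to_dict` would not write them out.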

__init__(literal, category, pattern=None, direction='BIDIRECTIONAL', on_match=None, on_modifies=None, allowed_types=None, excluded_types=None, max_scope=None, max_targets=None, terminated_by=None, metadata=None)

Creates a ConTextRule object.

The primary arguments of literal, category, and direction define the span of text to be matched, the semantic category, and the direction within the sentence in which the modifier operates. Other arguments specify additional custom logic such as:

- Additional control over what text can be matched as a modifier (pattern and on_match)
- Which types of targets can be modified (allowed_types, excluded_types)
- The scope size and number of targets that a modifier can modify (max_targets, max_scope)
- Other logic for terminating a span or for allowing a modifier to modify a target (on_modifies, terminated_by)

Parameters:

Name Type Description Default
literal str

The string representation of a concept. If pattern is None, this string will be lower-cased and matched to the lower-case string. If pattern is not None, this argument will not be used for matching but can be used as a reference as the rule name.

required
category str

The semantic class of the matched span. This corresponds to the label_ attribute of an entity.

required
pattern Optional[Union[str, List[Dict[str, str]]]]

A list or string to use as a spaCy pattern rather than literal. If a list, will use spaCy token-based pattern matching to match using token attributes. If a string, will use medspaCy's RegexMatcher. If None, will use literal as the pattern for phrase matching. For more information, see https://spacy.io/usage/rule-based-matching.

None
direction str

The directionality or action of a modifier. This defines which part of a sentence a modifier will include as its scope. Entities within the scope will be considered to be modified. Valid values are:

- "FORWARD": Scope will begin after the end of a modifier and move to the right
- "BACKWARD": Scope will begin before the beginning of a modifier and move to the left
- "BIDIRECTIONAL": Scope will expand on either side of a modifier
- "TERMINATE": A special direction to limit any other modifiers if this phrase is in its scope. Example: "no evidence of chf but there is pneumonia": "but" will prevent "no evidence of" from modifying "pneumonia"
- "PSEUDO": A special direction which will not modify any targets. This can be used for differentiating superstrings of modifiers. Example: A modifier with literal="negative attitude" will prevent the phrase "negative" in "She has a negative attitude about her treatment" from being extracted as a modifier.

'BIDIRECTIONAL'
on_match Optional[Callable[[Matcher, Doc, int, List[Tuple[int, int, int]]], Any]]

An optional callback function or other callable which takes 4 arguments: (matcher, doc, i, matches). For more information, see https://spacy.io/usage/rule-based-matching#on_match

None
on_modifies Optional[Callable[[Span, Span, Span], bool]]

Callback function to run when building an edge between a target and a modifier. This allows specifying custom logic for allowing or preventing certain modifiers from modifying certain targets. The callable should take 3 arguments:

- target: The spaCy Span from doc.ents (e.g., 'Evidence of pneumonia')
- modifier: The spaCy Span covered in a resulting modifier (e.g., 'no evidence of')
- span_between: The Span between the target and modifier in question.

Should return either True or False. If it returns False, then the modifier will not modify the target.

None
allowed_types Optional[Set[str]]

A collection of target labels to allow a modifier to modify. If None, will apply to any type not specifically excluded in excluded_types. Only one of allowed_types and excluded_types can be used. An error will be thrown if both are not None.

None
excluded_types Optional[Set[str]]

A collection of target labels which this modifier cannot modify. If None, will apply to all target types unless allowed_types is not None.

None
max_scope Optional[int]

A number of tokens to explicitly limit the size of the modifier's scope. If None, the scope will include the entire sentence in the direction of direction and the entire sentence for "BIDIRECTIONAL". This is useful for requiring modifiers be very close to a concept in the text or for preventing long modifier ranges caused by sentence splitting problems.

None
max_targets Optional[int]

The maximum number of targets which a modifier can modify. If None, will modify all targets in its scope.

None
terminated_by Optional[Set[str]]

An optional collection of other modifier categories which will terminate the scope of this modifier. If None, only "TERMINATE" will do this. Example: if a ConTextRule defining "positive for" has terminated_by={"NEGATED_EXISTENCE"}, then in the sentence "positive for flu, negative for RSV", the positive modifier will modify "flu" but will be terminated by "negative for" and will not modify "RSV". This helps prevent multiple conflicting modifiers from distributing too far across a sentence.

None
metadata Optional[Dict[Any, Any]]

Optional dictionary of any extra metadata.

None
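
To illustrate how `terminated_by` interacts with `direction` and `category`, the sketch below reproduces the eligibility test that `limit_scope` applies before trimming a scope. `can_limit` is a hypothetical helper, and `POSITIVE_EXISTENCE` is an assumed category name for the "positive for" rule. The scope-overlap and allowed/excluded-type checks are left out.

```python
from typing import Set


def can_limit(
    self_category: str,
    self_terminated_by: Set[str],
    other_category: str,
    other_direction: str,
) -> bool:
    """True if the other modifier is eligible to cut this modifier's scope."""
    return (
        other_direction.upper() == "TERMINATE"
        or other_category.upper() in self_terminated_by
        or other_category.upper() == self_category.upper()
    )


# "positive for flu, negative for RSV"
print(can_limit("POSITIVE_EXISTENCE", {"NEGATED_EXISTENCE"}, "NEGATED_EXISTENCE", "FORWARD"))  # True
print(can_limit("POSITIVE_EXISTENCE", set(), "NEGATED_EXISTENCE", "FORWARD"))                  # False
print(can_limit("POSITIVE_EXISTENCE", set(), "CONJ", "TERMINATE"))                             # True
```

With `terminated_by={"NEGATED_EXISTENCE"}`, "negative for" becomes eligible to end the scope of "positive for". Without it, only a "TERMINATE" modifier or another modifier of the same category would be.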
Source code in medspacy/context/context_rule.py
def __init__(
    self,
    literal: str,
    category: str,
    pattern: Optional[Union[str, List[Dict[str, str]]]] = None,
    direction: str = "BIDIRECTIONAL",
    on_match: Optional[
        Callable[[Matcher, Doc, int, List[Tuple[int, int, int]]], Any]
    ] = None,
    on_modifies: Optional[Callable[[Span, Span, Span], bool]] = None,
    allowed_types: Optional[Set[str]] = None,
    excluded_types: Optional[Set[str]] = None,
    max_scope: Optional[int] = None,
    max_targets: Optional[int] = None,
    terminated_by: Optional[Set[str]] = None,
    metadata: Optional[Dict[Any, Any]] = None,
):
    """
    Creates a ConTextRule object.

    The primary arguments of `literal`, `category`, and `direction` define the span of text to be matched, the
    semantic category, and the direction within the sentence in which the modifier operates.
    Other arguments specify additional custom logic such as:
        - Additional control over what text can be matched as a modifier (pattern and on_match)
        - Which types of targets can be modified (allowed_types, excluded_types)
        - The scope size and number of targets that a modifier can modify (max_targets, max_scope)
        - Other logic for terminating a span or for allowing a modifier to modify a target (on_modifies,
        terminated_by)

    Args:
        literal: The string representation of a concept. If `pattern` is None, this string will be lower-cased and
            matched to the lower-case string. If `pattern` is not None, this argument will not be used for matching
            but can be used as a reference as the rule name.
        category: The semantic class of the matched span. This corresponds to the `label_` attribute of an entity.
        pattern: A list or string to use as a spaCy pattern rather than `literal`. If a list, will use spaCy
            token-based pattern matching to match using token attributes. If a string, will use medspaCy's
            RegexMatcher. If None, will use `literal` as the pattern for phrase matching. For more information, see
            https://spacy.io/usage/rule-based-matching.
        direction: The directionality or action of a modifier. This defines which part of a sentence a modifier will
            include as its scope. Entities within the scope will be considered to be modified.
            Valid values are:
            - "FORWARD": Scope will begin after the end of a modifier and move to the right
            - "BACKWARD": Scope will begin before the beginning of a modifier and move to the left
            - "BIDIRECTIONAL": Scope will expand on either side of a modifier
            - "TERMINATE": A special direction to limit any other modifiers if this phrase is in its scope. Example:
                "no evidence of chf but there is pneumonia": "but" will prevent "no evidence of" from modifying
                "pneumonia"
            - "PSEUDO": A special direction which will not modify any targets. This can be used for differentiating
                superstrings of modifiers. Example: A modifier with literal="negative attitude" will prevent the
                phrase "negative" in "She has a negative attitude about her treatment" from being extracted as a
                modifier.
        on_match: An optional callback function or other callable which takes 4 arguments: `(matcher, doc, i,
            matches)`. For more information, see https://spacy.io/usage/rule-based-matching#on_match
        on_modifies: Callback function to run when building an edge between a target and a modifier. This allows
            specifying custom logic for allowing or preventing certain modifiers from modifying certain targets. The
            callable should take 3 arguments:
                target: The spaCy Span from doc.ents (e.g., 'Evidence of pneumonia')
                modifier: The spaCy Span covered in a resulting modifier (e.g., 'no evidence of')
                span_between: The Span between the target and modifier in question.
            Should return either True or False. If returns False, then the modifier will not modify the target.
        allowed_types: A collection of target labels to allow a modifier to modify. If None, will apply to any type
            not specifically excluded in excluded_types. Only one of allowed_types and excluded_types can be used.
            An error will be thrown if both are not None.
        excluded_types: A collection of target labels which this modifier cannot modify. If None, will apply to all
            target types unless allowed_types is not None.
        max_scope: The maximum number of tokens in the modifier's scope. If None, the scope extends to the sentence
            boundary in the direction given by `direction`, or covers the entire sentence for "BIDIRECTIONAL".
            This is useful for requiring modifiers to be very close to a concept in the text or for preventing long
            modifier ranges caused by sentence splitting problems.
        max_targets: The maximum number of targets which a modifier can modify. If None, will modify all targets in
            its scope.
        terminated_by: An optional collection of other modifier categories which will terminate the scope of this
            modifier. If None, only "TERMINATE" will do this. Example: if a ConTextRule defining "positive for" has
            terminated_by={"NEGATED_EXISTENCE"}, then in the sentence "positive for flu, negative for RSV", the
            positive modifier will modify "flu" but will be terminated by "negative for" and will not modify "RSV".
            This helps prevent multiple conflicting modifiers from distributing too far across a sentence.
        metadata: Optional dictionary of any extra metadata.
    """
    super().__init__(literal, category.upper(), pattern, on_match, metadata)
    self.on_modifies = on_modifies

    if allowed_types is not None and excluded_types is not None:
        raise ValueError(
            "A ConTextRule was instantiated with non-null values for both allowed_types and excluded_types. "
            "Only one of these can be non-null."
        )
    if allowed_types is not None:
        self.allowed_types = {label.upper() for label in allowed_types}
    else:
        self.allowed_types = None
    if excluded_types is not None:
        self.excluded_types = {label.upper() for label in excluded_types}
    else:
        self.excluded_types = None

    if max_targets is not None and max_targets <= 0:
        raise ValueError("max_targets must be > 0 or None.")
    self.max_targets = max_targets
    if max_scope is not None and max_scope <= 0:
        raise ValueError("max_scope must be > 0 or None.")
    self.max_scope = max_scope
    if terminated_by is None:
        terminated_by = set()
    else:
        if isinstance(terminated_by, str):
            raise ValueError(
                f"terminated_by must be an iterable, such as a list or set, not {terminated_by}."
            )
        terminated_by = {string.upper() for string in terminated_by}

    self.terminated_by = terminated_by

    self.metadata = metadata

    if direction.upper() not in self._ALLOWED_DIRECTIONS:
        raise ValueError(
            "Direction {0} not recognized. Must be one of: {1}".format(
                direction, self._ALLOWED_DIRECTIONS
            )
        )
    self.direction = direction.upper()
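
The `on_modifies` callback described in the docstring above can be sketched as follows. This is a minimal illustration rather than medspaCy code: `block_across_semicolon` is a hypothetical name, and `SimpleNamespace` objects stand in for spaCy Spans so the example runs without spaCy installed. It assumes only that a Span exposes a `.text` attribute.

```python
from types import SimpleNamespace


def block_across_semicolon(target, modifier, span_between):
    """Hypothetical on_modifies callback.

    Returns False (the modifier will not modify the target) when a
    semicolon appears in the text between the modifier and the target.
    """
    return ";" not in span_between.text


# Stand-ins for spaCy Spans (assumption: only `.text` is needed here).
target = SimpleNamespace(text="pneumonia")
modifier = SimpleNamespace(text="no evidence of")
same_clause = SimpleNamespace(text="")
other_clause = SimpleNamespace(text="fever; chest x-ray shows")

allowed = block_across_semicolon(target, modifier, same_clause)
blocked = block_across_semicolon(target, modifier, other_clause)
```

A function like this would be passed as `on_modifies=block_across_semicolon` when constructing a ConTextRule.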

from_dict(rule_dict) classmethod

Reads a dictionary into a ConTextRule.

Parameters:

Name Type Description Default
rule_dict

The dictionary to convert.

required

Returns:

Type Description
ConTextRule

The ConTextRule created from the dictionary.

Source code in medspacy/context/context_rule.py
@classmethod
def from_dict(cls, rule_dict) -> ConTextRule:
    """
    Reads a dictionary into a ConTextRule.

    Args:
        rule_dict: The dictionary to convert.

    Returns:
        The ConTextRule created from the dictionary.
    """
    keys = set(rule_dict.keys())
    invalid_keys = keys.difference(cls._ALLOWED_KEYS)
    if invalid_keys:
        msg = (
            "JSON object contains invalid keys: {0}.\n"
            "Must be one of: {1}".format(invalid_keys, cls._ALLOWED_KEYS)
        )
        raise ValueError(msg)
    rule = ConTextRule(**rule_dict)
    return rule
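
The key check in `from_dict` is a plain set difference, which can be sketched without medspaCy. The value of `_ALLOWED_KEYS` is not shown in this section, so `ASSUMED_ALLOWED_KEYS` below is an assumption based on the constructor parameters documented above, and `find_invalid_keys` is a hypothetical helper name.

```python
# Assumed stand-in for ConTextRule._ALLOWED_KEYS; the real set is defined
# in medspacy/context/context_rule.py and may differ.
ASSUMED_ALLOWED_KEYS = {
    "literal", "category", "pattern", "direction", "on_match", "on_modifies",
    "allowed_types", "excluded_types", "max_scope", "max_targets",
    "terminated_by", "metadata",
}


def find_invalid_keys(rule_dict, allowed_keys=ASSUMED_ALLOWED_KEYS):
    """Return the keys of rule_dict that are not in allowed_keys."""
    return set(rule_dict.keys()).difference(allowed_keys)


good = {"literal": "no evidence of", "category": "NEGATED_EXISTENCE", "direction": "FORWARD"}
bad = {"literal": "no evidence of", "category": "NEGATED_EXISTENCE", "rule": "FORWARD"}

good_invalid = find_invalid_keys(good)  # empty: from_dict would go on to build the rule
bad_invalid = find_invalid_keys(bad)    # non-empty: from_dict would raise ValueError
```

Rejecting unknown keys up front gives a clearer error than the `TypeError` that `ConTextRule(**rule_dict)` would otherwise raise for an unexpected keyword argument.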

from_json(filepath) classmethod

Reads in a lexicon of modifiers from a JSON file under the key context_rules.

Parameters:

Name Type Description Default
filepath

The .json file containing modifier rules. It must contain a context_rules key holding the list of rule objects.

required

Returns:

Type Description
List[ConTextRule]

A list of ConTextRule objects read from the JSON file.

Source code in medspacy/context/context_rule.py
@classmethod
def from_json(cls, filepath) -> List[ConTextRule]:
    """
    Reads in a lexicon of modifiers from a JSON file under the key `context_rules`.

    Args:
        filepath: The .json file containing modifier rules. It must contain a `context_rules` key holding the
            list of rule objects.

    Returns:
        A list of ConTextRule objects read from the JSON file.
    """

    with open(filepath) as file:
        modifier_data = json.load(file)
    context_rules = []
    for data in modifier_data["context_rules"]:
        context_rules.append(ConTextRule.from_dict(data))
    return context_rules
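
The file layout that `from_json` expects can be sketched with the standard `json` module alone. The rule contents and the helper name `has_rules_key` below are illustrative assumptions; only the top-level `context_rules` key is taken from the source above.

```python
import json

# Illustrative file content: a top-level "context_rules" key holding a list
# of rule objects, each of which would be passed to ConTextRule.from_dict.
file_text = """
{
    "context_rules": [
        {"literal": "no evidence of", "category": "NEGATED_EXISTENCE", "direction": "FORWARD"},
        {"literal": "history of", "category": "HISTORICAL", "direction": "FORWARD"}
    ]
}
"""

modifier_data = json.loads(file_text)
rule_dicts = modifier_data["context_rules"]  # the same lookup from_json performs


def has_rules_key(text):
    """Return True when the JSON text has the top-level key that from_json indexes."""
    return "context_rules" in json.loads(text)
```

Because `from_json` indexes the key directly, a file without `context_rules` fails with a `KeyError`. A check like `has_rules_key` is one way to report that case more clearly before loading.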

to_dict()

Converts a ConTextRule to a Python dictionary. Used when writing context rules to a JSON file.

Returns:

Type Description

The dictionary containing the ConTextRule info.

Source code in medspacy/context/context_rule.py
def to_dict(self):
    """
    Converts a ConTextRule to a Python dictionary. Used when writing context rules to a JSON file.

    Returns:
        The dictionary containing the ConTextRule info.
    """

    rule_dict = {}
    for key in self._ALLOWED_KEYS:
        value = self.__dict__.get(key)
        if isinstance(value, set):
            value = list(value)
        if value is not None:
            rule_dict[key] = value
    return rule_dict
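
The conversion in `to_dict` can be sketched on a plain dictionary. `serializable_rule_dict` is a hypothetical stand-in that follows the same three steps (restrict to allowed keys, turn sets into lists, skip `None`). The attribute values are illustrative and the key list is an assumption, not the library's `_ALLOWED_KEYS`.

```python
def serializable_rule_dict(attributes, allowed_keys):
    """Mimic to_dict on a plain dict: keep allowed keys, turn sets into lists, skip None."""
    rule_dict = {}
    for key in allowed_keys:
        value = attributes.get(key)
        if isinstance(value, set):
            value = list(value)  # JSON has no set type
        if value is not None:
            rule_dict[key] = value
    return rule_dict


attributes = {
    "literal": "positive for",
    "category": "POSITIVE_EXISTENCE",
    "direction": "FORWARD",
    "terminated_by": {"NEGATED_EXISTENCE"},
    "max_scope": None,
    "on_modifies": None,
}
keys = ["literal", "category", "direction", "terminated_by", "max_scope", "on_modifies"]
result = serializable_rule_dict(attributes, keys)
```

Because sets are unordered, the order of items in a converted list is not guaranteed when the set has more than one element.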

to_json(context_rules, filepath) classmethod

Writes ConTextRule objects to a JSON file.

Parameters:

Name Type Description Default
context_rules List[ConTextRule]

A list of ConTextRule objects to write to the file.

required
filepath str

The .json file that will contain the modifier rules.

required

Source code in medspacy/context/context_rule.py
@classmethod
def to_json(cls, context_rules: List[ConTextRule], filepath: str):
    """
    Writes ConTextRule objects to a JSON file.

    Args:
        context_rules: A list of ConTextRule objects to write to the file.
        filepath: The .json file that will contain the modifier rules.
    """
    import json

    data = {"context_rules": [rule.to_dict() for rule in context_rules]}
    with open(filepath, "w") as file:
        json.dump(data, file, indent=4)
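
The output written by `to_json` can be previewed with the standard library alone. In this sketch, plain dictionaries stand in for the results of `rule.to_dict()`, and the file name `rules.json` is hypothetical.

```python
import json
import os
import tempfile

# Plain dicts standing in for [rule.to_dict() for rule in context_rules].
rule_dicts = [
    {"literal": "no evidence of", "category": "NEGATED_EXISTENCE", "direction": "FORWARD"}
]
data = {"context_rules": rule_dicts}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "rules.json")  # hypothetical file name
    with open(path, "w") as file:
        json.dump(data, file, indent=4)  # same call to_json makes
    with open(path) as file:
        written_text = file.read()

round_tripped = json.loads(written_text)
first_lines = written_text.splitlines()[:2]
```

The file begins with `{` followed by an indented `"context_rules": [` line, which is the layout `from_json` reads back.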