Single Page
All docs as a single page.
medspaCy
_extensions
This module sets extension attributes and methods for medspaCy. Examples include custom methods such as span._.window().
any_context_attribute(span)
Return True if any of the ConText assertion attributes (is_negated, is_historical, etc.) is True.
Source code in medspacy/_extensions.py
data_to_rows(data)
Unzip column-wise data from doc._.data into rows
Source code in medspacy/_extensions.py
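A minimal sketch of what "unzipping column-wise data into rows" means, assuming (this is not stated above) that the column-wise data is a dict mapping each column name to an equal-length list of values. The helper columns_to_rows is hypothetical and is not medspaCy's implementation.

```python
from typing import Any, Dict, List, Tuple


def columns_to_rows(data: Dict[str, List[Any]]) -> List[Tuple[Any, ...]]:
    """Hypothetical helper: turn {"col": [v1, v2], ...} into [(v1, ...), (v2, ...)]."""
    # zip(*columns) pairs the i-th value of every column into one row tuple.
    return list(zip(*data.values()))


data = {"ent": ["pneumonia", "chf"], "is_negated": [True, False]}
rows = columns_to_rows(data)
print(rows)  # [('pneumonia', True), ('chf', False)]
```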
get_context_attributes(span)
Return a dict of all ConText assertion attributes (is_negated, is_historical, etc.) and their values.
Source code in medspacy/_extensions.py
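A minimal sketch of how any_context_attribute relates to get_context_attributes, assuming the assertion attributes have already been collected into a plain dict of booleans. Only is_negated and is_historical are named above, so the example uses just those two; the helper any_attribute_true is hypothetical.

```python
from typing import Dict


def any_attribute_true(attributes: Dict[str, bool]) -> bool:
    """Hypothetical stand-in for any_context_attribute on a precomputed dict."""
    # True as soon as at least one ConText assertion flag is set.
    return any(attributes.values())


attrs = {"is_negated": False, "is_historical": True}
print(any_attribute_true(attrs))  # True
```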
get_extensions()
Get a list of medspaCy extensions for Token, Span, and Doc classes.
Source code in medspacy/_extensions.py
get_span_literal(span)
Get the literal value from an entity's TargetRule, which is set when an entity is extracted by TargetMatcher. If the span does not have a TargetRule, it returns the lower-cased text.
Source code in medspacy/_extensions.py
get_window_span(span, n=1, left=True, right=True)
Get a Span of a window of text containing a span.
Args:
- n (int): Number of tokens on each side of the span to return. Default 1.
- left (bool): Whether to include the tokens preceding the span. Default True.
- right (bool): Whether to include the tokens following the span. Default True.

Returns: a spaCy Span
Source code in medspacy/_extensions.py
get_window_token(token, n=1, left=True, right=True)
Get a Span of a window of text containing a token.
Args:
- n (int): Number of tokens on each side of the token to return. Default 1.
- left (bool): Whether to include the tokens preceding the token. Default True.
- right (bool): Whether to include the tokens following the token. Default True.

Returns: a spaCy Span
Source code in medspacy/_extensions.py
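A minimal sketch of the window arithmetic behind get_window_span and get_window_token, written on plain token indices so it runs without spaCy. The helper window_bounds is hypothetical; it assumes the window is clamped at the Doc boundaries and that a single token at index i behaves like the span [i, i + 1). In spaCy the resulting pair would be used as doc[start:end].

```python
from typing import Tuple


def window_bounds(start: int, end: int, doc_len: int, n: int = 1,
                  left: bool = True, right: bool = True) -> Tuple[int, int]:
    """Hypothetical helper: (start, end) token indices of a window around tokens [start, end)."""
    # Expand n tokens to the left/right without running past the Doc edges.
    new_start = max(0, start - n) if left else start
    new_end = min(doc_len, end + n) if right else end
    return new_start, new_end


tokens = ["no", "evidence", "of", "pneumonia", "today"]
s, e = window_bounds(3, 4, len(tokens), n=2)
print(tokens[s:e])  # ['evidence', 'of', 'pneumonia', 'today']
```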
set_extensions()
Set custom medspaCy extensions for Token, Span, and Doc classes.
Source code in medspacy/_extensions.py
common
base_rule
BaseRule
BaseRule is the basic class for the rules contained in the MedspacyMatcher class. It contains the basic structure for a rule to be used by the spaCy matchers or by the RegexMatcher class in order to produce match tuples for processing by a component such as the Sectionizer, ContextComponent or TargetMatcher.
Source code in medspacy/common/base_rule.py
__init__(literal, category, pattern=None, on_match=None, metadata=None)
Base class for medspaCy rules such as TargetRule and ConTextRule.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| literal | str | The plaintext form of the pattern. Can be a human-readable form of a more complex pattern or, if | required |
| category | str | The category for the match. Corresponds to ent.label_ for entities. | required |
| pattern | Optional[Union[str, List[Dict[str, str]]]] | A list or string to use as a spaCy pattern rather than | None |
| on_match | Optional[Callable[[Matcher, Doc, int, List[Tuple[int, int, int]]], Any]] | An optional callback function or other callable which takes 4 arguments: | None |
| metadata | Optional[Dict[Any, Any]] | Optional dictionary of any extra metadata. | None |
Source code in medspacy/common/base_rule.py
medspacy_matcher
MedspacyMatcher
MedspacyMatcher is a class which combines spaCy's Matcher and PhraseMatcher classes with medspaCy's RegexMatcher and acts as a single matcher using 3 different types of rules:
- Exact phrases
- List of dictionaries for matching on token attributes (see https://spacy.io/usage/rule-based-matching#matcher)
- Regular expression matches. Note that regular-expression matching is not natively supported by spaCy and could result in unexpected matched spans if match boundaries do not align with token boundaries.

Rules can be defined by any class which inherits from medspacy.common.BaseRule, such as:
- medspacy.target_matcher.TargetRule
- medspacy.context.ConTextRule
Source code in medspacy/common/medspacy_matcher.py
labels
property
The set of labels available to the matcher.
Returns:
| Type | Description |
|---|---|
| Set[str] | A set of labels containing the labels for all the rules added to the matcher. |
rule_map
property
The dictionary mapping a rule's id to the rule object.
Returns:
| Type | Description |
|---|---|
| Dict[str, BaseRule] | A dictionary mapping the rule's id to the rule. |
rules
property
The list of rules used by the MedspacyMatcher.
Returns:
| Type | Description |
|---|---|
| List[BaseRule] | A list of rules, all of which inherit from BaseRule. |
__call__(doc)
Call MedspacyMatcher on a doc and return a single list of matches. If self.prune is True, in the case of overlapping matches the longest will be returned.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| doc | Doc | The spaCy Doc to process. | required |
Returns:
| Type | Description |
|---|---|
| List[Tuple[int, int, int]] | A list of tuples, each containing 3 ints representing the individual match (match_id, start, end). |
Source code in medspacy/common/medspacy_matcher.py
__init__(nlp, name='medspacy_matcher', phrase_matcher_attr='LOWER', prune=True)
Creates a MedspacyMatcher.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| nlp | Language | A spaCy Language model. | required |
| name | str | The name of the component. | 'medspacy_matcher' |
| phrase_matcher_attr | str | The attribute to use for spaCy's PhraseMatcher. Default is 'LOWER'. | 'LOWER' |
| prune | bool | Whether to prune matches that overlap or are substrings of another match. For example, if "no history of" and "history of" are both matches, setting prune to True would drop "history of". Default is True. | True |
Source code in medspacy/common/medspacy_matcher.py
add(rules)
Adds a collection of rules to the matcher. Rules must inherit from medspacy.common.BaseRule.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rules | Iterable[BaseRule] | A collection of rules. Each rule must inherit from | required |
Source code in medspacy/common/medspacy_matcher.py
regex_matcher
RegexMatcher
The RegexMatcher is an alternative to spaCy's native Matcher and PhraseMatcher classes and allows matching with regular expressions over the underlying doc text rather than over spaCy token attributes.
This allows more traditional text matching, but can lead to issues if the matched spans in the text do not line up with spaCy token boundaries. In this case, the RegexMatcher will by default resolve to the nearest token boundaries by expanding to the left and right. This behavior can be configured using resolve_start and resolve_end. To avoid the issue entirely, consider using a list of token-attribute dicts, as with a spaCy Matcher.
For more information, see: https://spacy.io/usage/rule-based-matching
Examples of resolve_start/resolve_end: In the string 'SERVICE: Radiology' the pattern 'ICE: Rad' would match in the middle of the tokens 'SERVICE' and 'Radiology'. spaCy would normally return None for such a span (for example from Doc.char_span). The RegexMatcher will expand in the following ways:
- resolve_start='left': the resulting span will start at 'SERVICE' -> 'SERVICE: Radiology'
- resolve_start='right': the resulting span will start at ':' -> ': Radiology'
- resolve_end='left': the resulting span will end at ':' -> 'SERVICE:'
- resolve_end='right': the resulting span will end at 'Radiology' -> 'SERVICE: Radiology'
Source code in medspacy/common/regex_matcher.py
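The following sketch reproduces the resolve_start/resolve_end behavior on the 'SERVICE: Radiology' example using a hand-written tokenization, so it runs without spaCy. The token offsets and the helpers snap_start, snap_end, and span_text are assumptions for illustration; this is not the RegexMatcher implementation, which works on a real spaCy Doc.

```python
import re
from typing import List, Optional, Tuple

# Assumed tokenization of TEXT as (start_char, end_char) pairs: "SERVICE", ":", "Radiology".
TEXT = "SERVICE: Radiology"
TOKENS: List[Tuple[int, int]] = [(0, 7), (7, 8), (9, 18)]


def snap_start(char_idx: int, mode: str) -> Optional[int]:
    """Token index where a match starting at char_idx should begin."""
    if mode == "left":
        # Last token starting at or before char_idx: expands the span leftward.
        hits = [i for i, (start, _) in enumerate(TOKENS) if start <= char_idx]
        return hits[-1] if hits else None
    # First token starting at or after char_idx: shrinks the span.
    hits = [i for i, (start, _) in enumerate(TOKENS) if start >= char_idx]
    return hits[0] if hits else None


def snap_end(char_idx: int, mode: str) -> Optional[int]:
    """Exclusive token index where a match ending at char_idx should stop."""
    if mode == "right":
        # First token ending at or after char_idx: expands the span rightward.
        hits = [i for i, (_, end) in enumerate(TOKENS) if end >= char_idx]
        return hits[0] + 1 if hits else None
    # Last token ending at or before char_idx: shrinks the span.
    hits = [i for i, (_, end) in enumerate(TOKENS) if end <= char_idx]
    return hits[-1] + 1 if hits else None


def span_text(start_tok: int, end_tok: int) -> str:
    """Text covered by tokens [start_tok, end_tok)."""
    return TEXT[TOKENS[start_tok][0]:TOKENS[end_tok - 1][1]]


match = re.search(r"ICE: Rad", TEXT)  # characters 4..12, misaligned at both ends
for start_mode, end_mode in [("left", "right"), ("right", "right"), ("left", "left")]:
    start_tok = snap_start(match.start(), start_mode)
    end_tok = snap_end(match.end(), end_mode)
    print(start_mode, end_mode, repr(span_text(start_tok, end_tok)))
```

The three printed spans correspond to the 'SERVICE: Radiology', ': Radiology', and 'SERVICE:' cases listed above.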
__call__(doc)
Call the RegexMatcher on a spaCy Doc.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| doc | Doc | The spaCy doc to process. | required |
Returns:
| Type | Description |
|---|---|
| List[Tuple[int, int, int]] | The list of match tuples (match_id, start, end). |
Source code in medspacy/common/regex_matcher.py
__init__(vocab, flags=re.IGNORECASE, resolve_start='left', resolve_end='right')
Creates a new RegexMatcher.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| vocab | Vocab | A spaCy model vocabulary. | required |
| flags | RegexFlag | Regular expression flag. Default re.IGNORECASE. | re.IGNORECASE |
| resolve_start | str | How to resolve if the start character index of a match does not align with spaCy token boundaries. If 'left', will find the nearest token boundary to the left of the unmatched character index, leading to a longer than expected span. If 'right', will find the nearest token boundary to the right of the unmatched character index, leading to a shorter than expected span. Default 'left'. | 'left' |
| resolve_end | str | How to resolve if the end character index of a match does not align with spaCy token boundaries. If 'left', will find the nearest token boundary to the left of the unmatched character index, leading to a shorter than expected span. If 'right', will find the nearest token boundary to the right of the unmatched character index, leading to a longer than expected span. Default 'right'. | 'right' |
Source code in medspacy/common/regex_matcher.py
add(match_id, regex_rules, on_match=None)
Add a rule with one or more regex patterns to one match id.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| match_id | str | The name of the pattern. | required |
| regex_rules | Iterable[str] | The list of regex strings to associate with | required |
| on_match | Optional[Callable[[Matcher, Doc, int, List[Tuple[int, int, int]]], Any]] | An optional callback function or other callable which takes 4 arguments: | None |
Source code in medspacy/common/regex_matcher.py
util
This module contains helper functions and classes for common clinical processing tasks used by medspaCy's matcher objects.
get_token_for_char(doc, char_idx, resolve='left')
Get the token index that best matches a particular character index. Because regex find returns a character index and spaCy matches must align with token boundaries, each character index must be converted into a token index.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| doc | Doc | The spaCy Doc to search in. | required |
| char_idx | int | The character index to find the corresponding token for. | required |
| resolve | str | The resolution type. "left" will snap character to the token index to the left which precede the | 'left' |
Returns:
| Type | Description |
|---|---|
| Union[Token, None] | The token that best fits the character index based on the resolution type. |
Source code in medspacy/common/util.py
matches_to_spans(doc, matches, set_label=True)
Converts all identified matches to spans.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| doc | Doc | The spaCy doc corresponding to the matches. | required |
| matches | List[Tuple[int, int, int]] | The list of match tuples (match_id, start, end). | required |
| set_label | bool | Whether to assign a label to the span based on the source rule. Default is True. | True |
Returns:
| Type | Description |
|---|---|
| List[Span] | A list of spaCy spans corresponding to the input matches. |
Source code in medspacy/common/util.py
overlaps(a, b)
Checks whether two match tuples produced by spaCy matchers overlap.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| a | Tuple[int, int, int] | A match tuple (match_id, start, end). | required |
| b | Tuple[int, int, int] | A match tuple (match_id, start, end). | required |
Returns:
| Type | Description |
|---|---|
| bool | Whether the tuples overlap. |
Source code in medspacy/common/util.py
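A minimal sketch of an overlap test on match tuples, assuming spaCy's convention that end is exclusive, so two ranges overlap only if they share at least one token. The name tuples_overlap is hypothetical and the code is not taken from medspaCy.

```python
from typing import Tuple

Match = Tuple[int, int, int]  # (match_id, start, end), end exclusive


def tuples_overlap(a: Match, b: Match) -> bool:
    """Two half-open token ranges overlap if each starts before the other ends."""
    _, a_start, a_end = a
    _, b_start, b_end = b
    return a_start < b_end and b_start < a_end


print(tuples_overlap((1, 0, 3), (2, 2, 4)))  # True: token 2 is shared
print(tuples_overlap((1, 0, 3), (2, 3, 5)))  # False: adjacent, no shared token
```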
prune_overlapping_matches(matches, strategy='longest')
Prunes overlapping matches from a list of spaCy match tuples (match_id, start, end).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| matches | List[Tuple[int, int, int]] | A list of match tuples of form (match_id, start, end). | required |
| strategy | str | The pruning strategy to use. Currently the only available option is "longest", which keeps the longer of any two overlapping spans. Other behavior will be added in a future update. | 'longest' |
Returns:
| Type | Description |
|---|---|
| List[Tuple[int, int, int]] | The pruned list of matches. |
Source code in medspacy/common/util.py
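A minimal sketch of a "longest" pruning strategy as a greedy pass: visit matches from longest to shortest and keep a match only if it does not overlap anything already kept. The helper prune_longest and its tie-breaking (earlier start wins) are assumptions; medspaCy's own implementation may order ties differently.

```python
from typing import List, Tuple

Match = Tuple[int, int, int]  # (match_id, start, end), end exclusive


def prune_longest(matches: List[Match]) -> List[Match]:
    """Keep longer matches and drop anything that overlaps a kept match."""
    kept: List[Match] = []
    # Longest first; ties broken by earlier start (an assumption).
    for match in sorted(matches, key=lambda m: (-(m[2] - m[1]), m[1])):
        _, start, end = match
        if all(end <= k_start or k_end <= start for _, k_start, k_end in kept):
            kept.append(match)
    # Return survivors in document order.
    return sorted(kept, key=lambda m: m[1])


# "no history of" (tokens 0-3) vs. "history of" (tokens 1-3), plus an unrelated match.
matches = [(10, 1, 3), (11, 0, 3), (12, 5, 6)]
print(prune_longest(matches))  # [(11, 0, 3), (12, 5, 6)]
```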
span_contains(span, target, regex=True, case_insensitive=True)
Return True if a Span object contains a target phrase.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| span | Union[Doc, Span] | A spaCy Doc or Span, such as an entity in doc.ents | required |
| target | str | A target phrase or iterable of phrases to check in span.text.lower(). | required |
| regex | bool | Whether to search the span using a regular expression rather than a literal string. Default is True. | True |
| case_insensitive | bool | Whether the matching is case-insensitive. Default is True. | True |
Source code in medspacy/common/util.py
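A minimal sketch of the span_contains logic applied to a plain string instead of a spaCy Span, so it runs without spaCy. The function text_contains is hypothetical, and it assumes that regex=False means the phrase is matched literally (implemented here by escaping it).

```python
import re
from typing import Iterable, Union


def text_contains(text: str, target: Union[str, Iterable[str]],
                  regex: bool = True, case_insensitive: bool = True) -> bool:
    """Return True if any target phrase occurs in text."""
    targets = [target] if isinstance(target, str) else list(target)
    flags = re.IGNORECASE if case_insensitive else 0
    for phrase in targets:
        # Escape the phrase when it should be matched literally rather than as a regex.
        pattern = phrase if regex else re.escape(phrase)
        if re.search(pattern, text, flags):
            return True
    return False


print(text_contains("History of CHF", "chf"))  # True
print(text_contains("pt is afebrile", [r"fe+ver", "chills"]))  # False
```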
context
ConText
The ConText component for spaCy processing.
This component matches modifiers in a Doc, defines their scope, and identifies edges between targets and modifiers. It sets two spaCy extensions:
- Span._.modifiers: a list of ConTextModifier objects which modify a target Span
- Doc._.context_graph: a ConText graph object which contains the targets, modifiers, and edges between them.
Source code in medspacy/context/context.py
categories
property
Returns the list of categories that ConText might produce.
input_span_type
property
writable
The input source of entities for the component. Must be either "ents" corresponding to doc.ents or "group" for a spaCy span group.
Returns:
| Type | Description |
|---|---|
| | The input type, "ents" or "group". |
rules
property
Returns the list of ConTextRules available to ConText.
span_group_name
property
writable
The name of the span group used by this component. If input_span_type is "group", calling this component will use spans in the span group with this name.
Returns:
| Type | Description |
|---|---|
| str | The span group name. |
__call__(doc, targets=None)
Applies the ConText algorithm to a Doc.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| doc | | The spaCy Doc to process. | required |
| targets | str | The optional custom attribute extension on doc to run over. Must contain an iterable of Span objects | None |
Returns:
| Type | Description |
|---|---|
| Doc | The processed spaCy Doc. |
Source code in medspacy/context/context.py
__init__(nlp, name='medspacy_context', rules='default', language_code='en', phrase_matcher_attr='LOWER', allowed_types=None, excluded_types=None, terminating_types=None, max_scope=None, max_targets=None, prune_on_modifier_overlap=True, prune_on_target_overlap=False, span_attrs='default', input_span_type='ents', span_group_name='medspacy_spans')
Creates a new ConText object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| nlp | Language | A spaCy Language object. | required |
| name | str | The name of the component. | 'medspacy_context' |
| rules | Optional[str] | The rules to load. Default is "default", which loads rules packaged with medspaCy that are derived from the original ConText rules and years of practical applications at the US Department of Veterans Affairs. If None, no rules are loaded. Otherwise, must be a path to a json file containing rules. Add ConTextRules directly through | 'default' |
| language_code | str | Language code (ISO code) to use as a default for loading resources. See the documentation and the /resources directory for the resources available in each language. Default is "en" for English. | 'en' |
| phrase_matcher_attr | str | The token attribute to use for PhraseMatcher for rules where | 'LOWER' |
| allowed_types | Optional[Set[str]] | A global set of types included by ConText. Rules will only operate on spans with these labels. | None |
| excluded_types | Optional[Set[str]] | A global set of types excluded by ConText. Rules will not operate on spans with these labels. | None |
| terminating_types | Optional[Dict[str, Iterable[str]]] | A global map of types to the types that can terminate them. This can be used to apply terminations to all rules of a particular type rather than adding them to every rule individually in the ConTextRule object. | None |
| max_scope | Optional[int] | The number of tokens around a modifier within which a target can be modified. Default is None, in which case ConText uses the sentence boundaries. If greater than zero, the window is applied globally. Both options are overridden by a more specific value in a ConTextRule. | None |
| max_targets | Optional[int] | The maximum number of targets a modifier can modify. Default is None, in which case a modifier modifies all targets in its scope. If greater than zero, the value is applied globally. Both options are overridden by a more specific value in a ConTextRule. | None |
| prune_on_modifier_overlap | bool | Whether to prune modifiers which are substrings of another modifier. If True, substrings are dropped completely. For example, if "no history of" and "history of" are both ConTextRules, both will match the text "no history of afib", but only "no history of" should modify afib. Default True. | True |
| prune_on_target_overlap | bool | Whether to remove any matched modifiers which overlap with target entities. If False, any overlapping modifiers will not modify the overlapping entity but will still modify any other targets in its scope. Default False. | False |
| span_attrs | Union[Literal['default'], Dict[str, Dict[str, Any]], None] | The optional span attributes to modify. Default option "default" uses attributes in | 'default' |
| input_span_type | Union[Literal['ents', 'group']] | "ents" or "group". Where to look for targets. "ents" will modify attributes of spans in doc.ents. "group" will modify attributes of spans in the span group specified by | 'ents' |
| span_group_name | str | The name of the span group used when | 'medspacy_spans' |
Source code in medspacy/context/context.py
add(rules)
Adds ConTextRules to ConText.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rules | | A single ConTextRule or a collection of ConTextRules to add to ConText. | required |
Source code in medspacy/context/context.py
register_default_attributes()
classmethod
Registers the default values for the Span attributes defined in DEFAULT_ATTRIBUTES.
Source code in medspacy/context/context.py
register_graph_attributes()
classmethod
Registers spaCy attribute extensions: Span._.modifiers and Doc._.context_graph.
Source code in medspacy/context/context.py
set_context_attributes(edges)
Adds Span-level attributes to targets with modifiers.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| edges | | The edges of the ConTextGraph to modify. | required |
Source code in medspacy/context/context.py
ConTextGraph
The ConTextGraph class defines the internal structure of the ConText algorithm. It stores a collection of modifiers, matched with ConTextRules, and targets from some other source such as the TargetMatcher or a spaCy NER model.
Each modifier can have some number of associated targets that it modifies. This relationship is stored as edges of the graph.
Source code in medspacy/context/context_graph.py
__init__(targets=None, modifiers=None, edges=None, prune_on_modifier_overlap=False)
Creates a new ConTextGraph object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| targets | Optional[List[Span]] | A list of spans that ConText might modify. | None |
| modifiers | Optional[List[ConTextModifier]] | A list of ConTextModifiers that might modify the targets. | None |
| edges | Optional[List] | A list of edges between targets and modifiers representing the modification relationship. | None |
| prune_on_modifier_overlap | bool | Whether to prune modifiers when one modifier completely covers another. | False |
Source code in medspacy/context/context_graph.py
apply_modifiers()
Checks each target/modifier pair. If the modifier modifies the target, creates an edge between them.
Source code in medspacy/context/context_graph.py
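A minimal sketch of the pairwise edge-building step, using plain dataclasses instead of spaCy Spans and ConTextModifiers. Target, Modifier, and build_edges are hypothetical, the category name is illustrative, and the rule that a target must lie fully inside the modifier's scope is an assumption; the real check also involves type filtering and callbacks.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Target:
    label: str
    start: int  # token index
    end: int    # exclusive


@dataclass(frozen=True)
class Modifier:
    category: str
    scope_start: int
    scope_end: int  # exclusive


def build_edges(targets: List[Target], modifiers: List[Modifier]) -> List[Tuple[Target, Modifier]]:
    """Pair every target with every modifier whose scope contains it."""
    edges = []
    for target in targets:
        for modifier in modifiers:
            # Assumption: a target is in scope only if it lies fully inside the scope.
            if modifier.scope_start <= target.start and target.end <= modifier.scope_end:
                edges.append((target, modifier))
    return edges


# "no evidence of chf , but there is pneumonia": the scope of "no evidence of" ends at "but".
chf = Target("CONDITION", 3, 4)
pneumonia = Target("CONDITION", 8, 9)
negation = Modifier("NEGATED_EXISTENCE", 3, 5)
edges = build_edges([chf, pneumonia], [negation])
print([(t.start, m.category) for t, m in edges])  # [(3, 'NEGATED_EXISTENCE')]
```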
from_serialized_representation(serialized_representation)
classmethod
Creates the ConTextGraph from the serialized representation
Source code in medspacy/context/context_graph.py
serialized_representation()
Returns the serialized representation of the ConTextGraph
Source code in medspacy/context/context_graph.py
update_scopes()
Updates the scope of all ConTextModifiers.
For each modifier in a list of ConTextModifiers, check against each other modifier to see if one of the modifiers should update the other. This allows neighboring similar modifiers to extend each other's scope and allows "terminate" modifiers to end a modifier's scope.
Source code in medspacy/context/context_graph.py
ConTextModifier
Represents a concept found by ConText in a document. An instance of this class is the result of ConTextRule matching text in a Doc.
Source code in medspacy/context/context_modifier.py
allowed_types
property
Returns the associated allowed types.
category
property
Returns the associated category.
direction
property
Returns the associated direction.
excluded_types
property
Returns the associated excluded types.
max_scope
property
Returns the associated maximum scope.
max_targets
property
Returns the associated maximum number of targets.
modifier_span
property
The spaCy Span object, which is a view of self.doc, covered by this match.
num_targets
property
Returns the associated number of targets.
rule
property
Returns the associated context rule.
scope_span
property
Returns the associated scope.
__init__(context_rule, start, end, doc, scope_start=None, scope_end=None, max_scope=None)
Create a new ConTextModifier from a document span. Each modifier represents a span in the text and a surrounding window. Spans such as entities or other members of span groups that occur within this window can be modified by this ConTextModifier.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| context_rule | ConTextRule | The ConTextRule object which defines the modifier. | required |
| start | int | The start token index. | required |
| end | int | The end token index (non-inclusive). | required |
| doc | Doc | The spaCy Doc which contains this span. This is needed to initialize the modifier but is not maintained. | required |
| scope_start | Optional[int] | The start token index of the scope. | None |
| scope_end | Optional[int] | The end index of the scope. | None |
| max_scope | Optional[int] | Whether to use scope values rather than sentence boundaries for modifications. | None |
Source code in medspacy/context/context_modifier.py
__set_scope(doc)
Applies the direction of the ConTextRule which generated this ConTextModifier to define a scope. If self._max_scope is None, the default scope is the part of the sentence in which the modifier occurs, in the direction defined by self.direction. For example, if the direction is "forward", the scope will be [self.end : sentence.end]. If the direction is "backward", it will be [sentence.start : self.start].
If self.max_scope is not None and the default scope is longer than self.max_scope, it is reduced to self.max_scope.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| doc | Doc | The spaCy doc to use to set scope. | required |
Source code in medspacy/context/context_modifier.py
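A minimal sketch of the scope rule described for __set_scope, on plain token indices. The function set_scope is hypothetical and covers only the "forward" and "backward" directions described above; it assumes that max_scope counts tokens outward from the modifier.

```python
from typing import Optional, Tuple


def set_scope(start: int, end: int, sent_start: int, sent_end: int,
              direction: str, max_scope: Optional[int] = None) -> Tuple[int, int]:
    """Scope (start, end) token indices for a modifier covering tokens [start, end)."""
    if direction == "forward":
        # Default: from the end of the modifier to the end of its sentence.
        scope_start, scope_end = end, sent_end
        if max_scope is not None:
            scope_end = min(scope_end, end + max_scope)
    elif direction == "backward":
        # Default: from the start of the sentence up to the modifier.
        scope_start, scope_end = sent_start, start
        if max_scope is not None:
            scope_start = max(scope_start, start - max_scope)
    else:
        raise ValueError("this sketch only handles 'forward' and 'backward'")
    return scope_start, scope_end


print(set_scope(2, 3, 0, 10, "forward"))               # (3, 10)
print(set_scope(2, 3, 0, 10, "forward", max_scope=4))  # (3, 7)
```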
allows(target_label)
Returns whether the modifier is able to modify a target type.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| target_label | str | The target type to check. | required |
Returns:
| Type | Description |
|---|---|
| bool | Whether the modifier is allowed to modify a target of the specified type. True if |
Source code in medspacy/context/context_modifier.py
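A minimal sketch of type filtering of the kind allows performs, treating allowed_types as a whitelist and excluded_types as a blacklist. The standalone function and the precedence (the whitelist wins if both are set) are assumptions for illustration, not medspaCy's code.

```python
from typing import Optional, Set


def allows(target_label: str, allowed_types: Optional[Set[str]] = None,
           excluded_types: Optional[Set[str]] = None) -> bool:
    """Return whether a modifier with these type settings may modify target_label."""
    if allowed_types is not None:
        return target_label in allowed_types  # whitelist: only listed labels
    if excluded_types is not None:
        return target_label not in excluded_types  # blacklist: everything else
    return True  # no restriction configured


print(allows("PROBLEM", allowed_types={"PROBLEM"}))      # True
print(allows("TREATMENT", excluded_types={"TREATMENT"}))  # False
```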
from_serialized_representation(serialized_representation)
classmethod
Instantiates the class from the serialized representation
Source code in medspacy/context/context_modifier.py
limit_scope(other)
If self and other have the same category, or if other has a directionality of 'terminate', use the span of other to update the scope of self. Limiting the scope of two modifiers of the same category reduces the number of modifiers applied to each target. For example, in 'no evidence of CHF, no pneumonia', 'pneumonia' will only be modified by 'no', not 'no evidence of'. 'terminate' modifiers limit the scope of a modifier like 'no evidence of' in 'no evidence of CHF, but there is pneumonia'.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| other | ConTextModifier | The modifier to check against. | required |
Returns:
| Type | Description |
|---|---|
| bool | Whether the other modifier modified the scope of self. |
Source code in medspacy/context/context_modifier.py
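A minimal sketch of the limit_scope behavior on the two examples above, using a simplified modifier with token indices. The Mod dataclass, the lowercase direction strings, the category names, and the restriction to forward-looking scopes are all assumptions for illustration; this is not the ConTextModifier implementation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Mod:
    """Hypothetical, simplified modifier: token span plus a (start, end) scope."""
    category: str
    direction: str
    start: int
    end: int
    scope: Tuple[int, int]


def limit_scope(self_mod: Mod, other: Mod) -> bool:
    """Cut self_mod's forward scope where other begins; return True if it changed."""
    if self_mod.direction != "forward":
        return False  # this sketch handles forward scopes only
    if other.category != self_mod.category and other.direction != "terminate":
        return False
    scope_start, scope_end = self_mod.scope
    if scope_start <= other.start < scope_end:
        self_mod.scope = (scope_start, other.start)
        return True
    return False


# "no evidence of CHF , no pneumonia": tokens 0-6.
no_evidence_of = Mod("NEGATED_EXISTENCE", "forward", 0, 3, (3, 7))
no = Mod("NEGATED_EXISTENCE", "forward", 5, 6, (6, 7))
print(limit_scope(no_evidence_of, no), no_evidence_of.scope)  # True (3, 5)

# "no evidence of CHF , but there is pneumonia": "but" terminates the scope.
negation = Mod("NEGATED_EXISTENCE", "forward", 0, 3, (3, 9))
but = Mod("CONJ", "terminate", 5, 6, (5, 6))
print(limit_scope(negation, but), negation.scope)  # True (3, 5)
```

In both cases 'pneumonia' falls outside the shortened scope of 'no evidence of'.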
modifies(target)
Checks whether the target is within the modifier scope and if self is allowed to modify target.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| target | Span | A spaCy span representing a target concept. | required |
Returns:
| Type | Description |
|---|---|
| bool | Whether the target is within |
Source code in medspacy/context/context_modifier.py
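A minimal sketch of how the checks could combine, under stated assumptions: spans are half-open token ranges, a target is treated as in scope when its first or last token falls inside the modifier's scope (an assumption about the overlap rule), and `modifies_sketch` and `in_scope` are hypothetical helpers rather than medspaCy functions.

```python
from typing import Callable, Optional


def in_scope(target_start: int, target_end: int, scope_start: int, scope_end: int) -> bool:
    """Assumed rule: the target's first or last token lies in [scope_start, scope_end)."""
    first_token, last_token = target_start, target_end - 1
    return scope_start <= first_token < scope_end or scope_start <= last_token < scope_end


def modifies_sketch(
    target_start: int,
    target_end: int,
    scope_start: int,
    scope_end: int,
    label_allowed: bool,
    on_modifies: Optional[Callable[[], bool]] = None,
) -> bool:
    """Combine the type check, the scope check and an optional callback."""
    if not label_allowed:
        return False
    if not in_scope(target_start, target_end, scope_start, scope_end):
        return False
    # If the rule defines an on_modifies callback, it has the final say.
    return on_modifies() if on_modifies is not None else True


print(modifies_sketch(3, 4, 3, 5, label_allowed=True))  # True: "CHF" is in scope
print(modifies_sketch(6, 7, 3, 5, label_allowed=True))  # False: "pneumonia" is outside
```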
modify(target)
Add target to the list of self._targets and increment self._num_targets.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| target | Span | The spaCy span to add. | required |
Source code in medspacy/context/context_modifier.py
on_modifies(target)
If the ConTextRule used to define a ConTextModifier has an on_modifies callback function, evaluate it and return either True or False.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| target | Span | The spaCy span to evaluate. | required |
Returns:
| Type | Description |
|---|---|
| bool | The result of the |
Source code in medspacy/context/context_modifier.py
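As an illustration of the callback contract documented for ConTextRule (three arguments, target, modifier and span_between, returning a bool), here is a hypothetical `no_semicolon_between` rule. In a pipeline the arguments are spaCy Spans; the `SimpleNamespace` stand-ins below only supply the `.text` attribute the callback reads, so the sketch runs without medspaCy installed.

```python
from types import SimpleNamespace


def no_semicolon_between(target, modifier, span_between) -> bool:
    """Hypothetical on_modifies callback: block the edge if a ';' separates the two spans.

    In real use all three arguments are spaCy Spans; only `.text` is read here.
    """
    return ";" not in span_between.text


# With medspaCy, the callback would be attached to a rule, for example:
# ConTextRule("no evidence of", "NEGATED_EXISTENCE", direction="FORWARD",
#             on_modifies=no_semicolon_between)

target = SimpleNamespace(text="pneumonia")
modifier = SimpleNamespace(text="no evidence of")
print(no_semicolon_between(target, modifier, SimpleNamespace(text="chf ,")))  # True
print(no_semicolon_between(target, modifier, SimpleNamespace(text="chf ;")))  # False
```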
reduce_targets()
Reduces the number of targets to the n closest targets, where n is self.max_targets. If self.max_targets is None, no pruning is done.
Source code in medspacy/context/context_modifier.py
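Keeping the n closest targets can be sketched with a sort on token distance. The distance measure used here (the number of tokens between the modifier span and the target span) is an assumption, and `reduce_targets_sketch` and `token_gap` are hypothetical helpers working on (start, end) token offsets.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def token_gap(a: Span, b: Span) -> int:
    """Tokens strictly between two spans; 0 if they touch or overlap."""
    if b[0] >= a[1]:  # b is to the right of a
        return b[0] - a[1]
    if a[0] >= b[1]:  # b is to the left of a
        return a[0] - b[1]
    return 0


def reduce_targets_sketch(modifier: Span, targets: List[Span], max_targets: Optional[int]) -> List[Span]:
    """Keep the max_targets targets closest to the modifier; keep all if max_targets is None."""
    if max_targets is None:
        return list(targets)
    # sorted() is stable, so equally distant targets keep their original order.
    return sorted(targets, key=lambda t: token_gap(modifier, t))[:max_targets]


print(reduce_targets_sketch((0, 3), [(3, 4), (8, 9), (5, 6)], 2))  # [(3, 4), (5, 6)]
```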
serialized_representation()
Returns the serialized representation of the modifier.
Source code in medspacy/context/context_modifier.py
update_scope(span)
Changes the scope of self to be the given spaCy span.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| span | Span | A spaCy Span containing the scope which the modifier should cover. | required |
Source code in medspacy/context/context_modifier.py
ConTextRule
Bases: BaseRule
A ConTextRule defines a ConText modifier. ConTextRules are rules which define which spans are extracted as modifiers and how they behave, such as the phrase to be matched, the category/semantic class, the direction of the modifier in the text, and what types of target spans can be modified.
Source code in medspacy/context/context_rule.py
__init__(literal, category, pattern=None, direction='BIDIRECTIONAL', on_match=None, on_modifies=None, allowed_types=None, excluded_types=None, max_scope=None, max_targets=None, terminated_by=None, metadata=None)
Creates a ConTextRule object.
The primary arguments literal, category, and direction define the span of text to be matched, the semantic category, and the direction within the sentence in which the modifier operates.
Other arguments specify additional custom logic such as:
- Additional control over what text can be matched as a modifier (pattern and on_match)
- Which types of targets can be modified (allowed_types, excluded_types)
- The scope size and number of targets that a modifier can modify (max_targets, max_scope)
- Other logic for terminating a span or for allowing a modifier to modify a target (on_modifies, terminated_by)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| literal | str | The string representation of a concept. If | required |
| category | str | The semantic class of the matched span. This corresponds to the | required |
| pattern | Optional[Union[str, List[Dict[str, str]]]] | A list or string to use as a spaCy pattern rather than | None |
| direction | str | The directionality or action of a modifier. This defines which part of a sentence a modifier will include as its scope; entities within the scope are considered to be modified. Valid values are: "FORWARD": the scope begins after the end of the modifier and extends to the right; "BACKWARD": the scope begins before the start of the modifier and extends to the left; "BIDIRECTIONAL": the scope extends on both sides of the modifier; "TERMINATE": a special direction that limits any other modifier whose scope contains this phrase (example: in "no evidence of chf but there is pneumonia", "but" prevents "no evidence of" from modifying "pneumonia"); "PSEUDO": a special direction that does not modify any targets, used to differentiate superstrings of modifiers (example: a modifier with literal="negative attitude" prevents "negative" in "She has a negative attitude about her treatment" from being extracted as a modifier). | 'BIDIRECTIONAL' |
| on_match | Optional[Callable[[Matcher, Doc, int, List[Tuple[int, int, int]]], Any]] | An optional callback function or other callable which takes 4 arguments: | None |
| on_modifies | Optional[Callable[[Span, Span, Span], bool]] | Callback function to run when building an edge between a target and a modifier. This allows specifying custom logic for allowing or preventing certain modifiers from modifying certain targets. The callable should take 3 arguments: target: the spaCy Span from doc.ents (e.g., 'pneumonia'); modifier: the spaCy Span covered by the resulting modifier (e.g., 'no evidence of'); span_between: the Span between the target and the modifier in question. It should return True or False; if it returns False, the modifier will not modify the target. | None |
| allowed_types | Optional[Set[str]] | A collection of target labels that the modifier is allowed to modify. If None, it applies to any type not specifically excluded in excluded_types. Only one of allowed_types and excluded_types can be used; an error is raised if both are set. | None |
| excluded_types | Optional[Set[str]] | A collection of target labels which this modifier cannot modify. If None, it applies to all target types unless allowed_types is set. | None |
| max_scope | Optional[int] | A number of tokens to explicitly limit the size of the modifier's scope. If None, the scope will include the entire sentence in the direction of | None |
| max_targets | Optional[int] | The maximum number of targets which a modifier can modify. If None, it will modify all targets in its scope. | None |
| terminated_by | Optional[Set[str]] | An optional collection of other modifier categories which will terminate the scope of this modifier. If None, only modifiers with direction "TERMINATE" will do this. Example: if a ConTextRule defining "positive for" has terminated_by={"NEGATED_EXISTENCE"}, then in the sentence "positive for flu, negative for RSV", the positive modifier will modify "flu" but will be terminated by "negative for" and will not modify "RSV". This helps prevent conflicting modifiers from extending too far across a sentence. | None |
| metadata | Optional[Dict[Any, Any]] | Optional dictionary of any extra metadata. | None |
Source code in medspacy/context/context_rule.py
from_dict(rule_dict)
classmethod
Reads a dictionary into a ConTextRule.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rule_dict | | The dictionary to convert. | required |
Returns:
| Type | Description |
|---|---|
| ConTextRule | The ConTextRule created from the dictionary. |
Source code in medspacy/context/context_rule.py
from_json(filepath)
classmethod
Reads in a lexicon of modifiers from a JSON file under the key context_rules.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | | The .json file containing modifier rules. Must contain | required |
Returns:
| Type | Description |
|---|---|
| List[ConTextRule] | A list of ConTextRule objects read from the JSON file. |
Source code in medspacy/context/context_rule.py
to_dict()
Converts the ConTextRule to a Python dictionary. Used when writing ConText rules to a JSON file.
Returns:
| Type | Description |
|---|---|
| | The dictionary containing the ConTextRule info. |
Source code in medspacy/context/context_rule.py
to_json(context_rules, filepath)
classmethod
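Writes ConTextRules to a JSON file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| context_rules | | A list of ConTextRules to write to the file. | required |
| filepath | | The .json file that will contain the modifier rules. | required |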
Source code in medspacy/context/context_rule.py
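A rules file for from_json needs its rules under the key context_rules. The sketch below writes and re-reads such a file with the standard json module. The per-rule keys are an assumption (they mirror the ConTextRule.__init__ parameter names), and the CONJ category for "but" is illustrative; compare against a file produced by to_json before relying on the exact layout.

```python
import json
import os
import tempfile

# Assumed layout: a top-level "context_rules" list whose entries use
# ConTextRule.__init__ parameter names as keys.
lexicon = {
    "context_rules": [
        {"literal": "no evidence of", "category": "NEGATED_EXISTENCE", "direction": "FORWARD"},
        {"literal": "but", "category": "CONJ", "direction": "TERMINATE"},
    ]
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "context_rules.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(lexicon, f, indent=2)
    # With medspaCy installed, the file would be loaded with:
    # rules = ConTextRule.from_json(path)
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)

print([rule["literal"] for rule in loaded["context_rules"]])  # ['no evidence of', 'but']
```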
context
The ConText definition.
ConText
The ConText for spaCy processing.
This component matches modifiers in a Doc, defines their scope, and identifies edges between targets and modifiers. It sets two spaCy extensions:
- Span._.modifiers: a list of ConTextModifier objects which modify a target Span
- Doc._.context_graph: a ConTextGraph object which contains the targets, modifiers, and edges between them.
Source code in medspacy/context/context.py
categories
property
Returns the list of categories that ConText might produce.
input_span_type
property
writable
The input source of entities for the component. Must be either "ents" corresponding to doc.ents or "group" for a spaCy span group.
Returns:
| Type | Description |
|---|---|
| | The input type, "ents" or "group". |
rules
property
Returns the list of ConTextRules available to ConText.
span_group_name
property
writable
The name of the span group used by this component. If input_span_type is "group", calling this component will use spans in the span group with this name.
Returns:
| Type | Description |
|---|---|
| str | The span group name. |
__call__(doc, targets=None)
Applies the ConText algorithm to a Doc.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| doc | | The spaCy Doc to process. | required |
| targets | str | The optional custom attribute extension on doc to run over. Must contain an iterable of Span objects. | None |
Returns:
| Type | Description |
|---|---|
| Doc | The processed spaCy Doc. |
Source code in medspacy/context/context.py
__init__(nlp, name='medspacy_context', rules='default', language_code='en', phrase_matcher_attr='LOWER', allowed_types=None, excluded_types=None, terminating_types=None, max_scope=None, max_targets=None, prune_on_modifier_overlap=True, prune_on_target_overlap=False, span_attrs='default', input_span_type='ents', span_group_name='medspacy_spans')
Creates a new ConText object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| nlp | Language | A spaCy Language object. | required |
| name | str | The name of the component. | 'medspacy_context' |
| rules | Optional[str] | The rules to load. The default, "default", loads rules packaged with medspaCy that are derived from the original ConText rules and years of practical application at the US Department of Veterans Affairs. If None, no rules are loaded. Otherwise, must be a path to a JSON file containing rules. Add ConTextRules directly through | 'default' |
| language_code | str | Language code (ISO code) to use as a default for loading resources. See the documentation and the /resources directory for the resources available in each language. Default is "en" for English. | 'en' |
| phrase_matcher_attr | str | The token attribute to use for PhraseMatcher for rules where | 'LOWER' |
| allowed_types | Optional[Set[str]] | A global set of target labels included by ConText. Rules will operate only on spans with these labels. | None |
| excluded_types | Optional[Set[str]] | A global set of target labels excluded by ConText. Rules will not operate on spans with these labels. | None |
| terminating_types | Optional[Dict[str, Iterable[str]]] | A global map from types to the types that can terminate them. This applies terminations to all rules of a particular type rather than adding them to every rule individually in the ConTextRule object. | None |
| max_scope | Optional[int] | The number of tokens around a modifier within which a target can be modified. If None (the default), ConText uses the sentence boundaries. If greater than zero, the window is applied globally. Either setting is overridden by a more specific value in a ConTextRule. | None |
| max_targets | Optional[int] | The maximum number of targets a modifier can modify. If None (the default), a modifier modifies all targets in its scope. If greater than zero, the value is applied globally. Either setting is overridden by a more specific value in a ConTextRule. | None |
| prune_on_modifier_overlap | bool | Whether to prune modifiers which are substrings of another modifier. If True, substrings are dropped completely. For example, if "no history of" and "history of" are both ConTextRules, both will match the text "no history of afib", but only "no history of" should modify afib. Default True. | True |
| prune_on_target_overlap | bool | Whether to remove any matched modifiers which overlap with target entities. If False, overlapping modifiers will not modify the overlapping entity but will still modify any other targets in their scope. Default False. | False |
| span_attrs | Union[Literal['default'], Dict[str, Dict[str, Any]], None] | The optional span attributes to modify. Default option "default" uses attributes in | 'default' |
| input_span_type | Union[Literal['ents', 'group']] | "ents" or "group": where to look for targets. "ents" modifies attributes of spans in doc.ents; "group" modifies attributes of spans in the span group specified by | 'ents' |
| span_group_name | str | The name of the span group used when | 'medspacy_spans' |
Source code in medspacy/context/context.py
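The precedence described for max_scope and max_targets (a ConTextRule value overrides the global ConText value, and None means no limit) can be sketched as a small resolver. `resolve_settings` is hypothetical, and merging the global terminating_types entry for a category into the rule's own terminated_by set is an assumption about how the two combine.

```python
from typing import Dict, Iterable, Optional, Set


def resolve_settings(
    category: str,
    rule_max_scope: Optional[int],
    rule_max_targets: Optional[int],
    rule_terminated_by: Optional[Set[str]],
    global_max_scope: Optional[int] = None,
    global_max_targets: Optional[int] = None,
    terminating_types: Optional[Dict[str, Iterable[str]]] = None,
) -> dict:
    """Hypothetical resolver: rule-level values win, None falls back to the global value."""
    max_scope = rule_max_scope if rule_max_scope is not None else global_max_scope
    max_targets = rule_max_targets if rule_max_targets is not None else global_max_targets
    # Assumed: global terminations for this category are added to the rule's own set.
    terminated_by = set(rule_terminated_by or ())
    terminated_by |= set((terminating_types or {}).get(category, ()))
    return {"max_scope": max_scope, "max_targets": max_targets, "terminated_by": terminated_by}


settings = resolve_settings(
    "POSITIVE_EXISTENCE",
    rule_max_scope=None,
    rule_max_targets=2,
    rule_terminated_by=None,
    global_max_scope=5,
    global_max_targets=10,
    terminating_types={"POSITIVE_EXISTENCE": ["NEGATED_EXISTENCE"]},
)
print(settings["max_scope"], settings["max_targets"])  # 5 2
print(settings["terminated_by"])  # {'NEGATED_EXISTENCE'}
```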
add(rules)
Adds ConTextRules to ConText.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rules | | A single ConTextRule or a collection of ConTextRules to add to ConText. | required |
Source code in medspacy/context/context.py
register_default_attributes()
classmethod
Registers the default values for the Span attributes defined in DEFAULT_ATTRIBUTES.
Source code in medspacy/context/context.py
register_graph_attributes()
classmethod
Registers spaCy attribute extensions: Span._.modifiers and Doc._.context_graph.
Source code in medspacy/context/context.py
set_context_attributes(edges)
Adds Span-level attributes to targets with modifiers.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| edges | | The edges of the ConTextGraph to modify. | required |
Source code in medspacy/context/context.py
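Conceptually, setting attributes walks the edges and copies the values configured for each modifier category onto the target. In this sketch the mapping has the documented span_attrs shape (category to {attribute: value}), but its entries are hypothetical, plain dicts stand in for span._, and edges are assumed to be (target, category) pairs.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical mapping in the documented span_attrs shape:
# {modifier category: {attribute name: value to set on the target}}.
SPAN_ATTRS: Dict[str, Dict[str, Any]] = {
    "NEGATED_EXISTENCE": {"is_negated": True},
    "HISTORICAL": {"is_historical": True},
}


def set_attributes_sketch(
    edges: List[Tuple[Dict[str, Any], str]],
    span_attrs: Dict[str, Dict[str, Any]],
) -> None:
    """For each (target, modifier_category) edge, copy that category's values onto the target."""
    for target, category in edges:
        # Categories without an entry in span_attrs leave the target untouched.
        for attr, value in span_attrs.get(category, {}).items():
            target[attr] = value


chf = {"text": "chf"}  # a plain dict standing in for span._
flu = {"text": "flu"}
set_attributes_sketch([(chf, "NEGATED_EXISTENCE"), (flu, "SOME_OTHER_CATEGORY")], SPAN_ATTRS)
print(chf)  # {'text': 'chf', 'is_negated': True}
```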
context_graph
ConTextGraph
The ConTextGraph class defines the internal structure of the ConText algorithm. It stores a collection of modifiers, matched with ConTextRules, and targets from some other source such as the TargetMatcher or a spaCy NER model.
Each modifier can have some number of associated targets that it modifies. This relationship is stored as edges of the graph.
Source code in medspacy/context/context_graph.py
__init__(targets=None, modifiers=None, edges=None, prune_on_modifier_overlap=False)
Creates a new ConTextGraph object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| targets | Optional[List[Span]] | A list of spans that ConText might modify. | None |
| modifiers | Optional[List[ConTextModifier]] | A list of ConTextModifiers that might modify the targets. | None |
| edges | Optional[List] | A list of edges between targets and modifiers representing the modification relationship. | None |
| prune_on_modifier_overlap | bool | Whether to prune modifiers when one modifier completely covers another. | False |
Source code in medspacy/context/context_graph.py
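Pruning on modifier overlap can be sketched as dropping any modifier span that lies entirely inside a longer one, as in the "no history of" / "history of" example given for ConText. Spans here are hypothetical (start, end) token pairs; how partial overlaps are handled is not documented in this section, so the sketch simply keeps both spans in that case.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def prune_overlapping(modifiers: List[Span]) -> List[Span]:
    """Drop every modifier span that lies entirely inside a different, longer span."""
    kept: List[Span] = []
    for i, (start, end) in enumerate(modifiers):
        covered = any(
            j != i and s2 <= start and end <= e2 and (e2 - s2) > (end - start)
            for j, (s2, e2) in enumerate(modifiers)
        )
        if not covered:
            kept.append((start, end))
    return kept


# "no history of afib": "no history of" is [0, 3), "history of" is [1, 3).
print(prune_overlapping([(0, 3), (1, 3)]))  # [(0, 3)]
```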
apply_modifiers()
Checks each target/modifier pair. If the modifier modifies the target, creates an edge between them.
Source code in medspacy/context/context_graph.py
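The pairwise check in apply_modifiers amounts to a nested loop. `build_edges` is a hypothetical helper that takes the `modifies` predicate as a parameter; recording edges as (target, modifier) tuples is an assumption about the edge format.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")  # target type
M = TypeVar("M")  # modifier type


def build_edges(
    targets: List[T], modifiers: List[M], modifies: Callable[[M, T], bool]
) -> List[Tuple[T, M]]:
    """Check every target/modifier pair and record an edge when the modifier applies."""
    edges: List[Tuple[T, M]] = []
    for target in targets:
        for modifier in modifiers:
            if modifies(modifier, target):
                edges.append((target, modifier))
    return edges


# Targets are (start, end) token spans; each modifier is represented by its scope.
def scope_contains(scope: Tuple[int, int], target: Tuple[int, int]) -> bool:
    return scope[0] <= target[0] and target[1] <= scope[1]


print(build_edges([(3, 4), (6, 7)], [(3, 5)], scope_contains))  # [((3, 4), (3, 5))]
```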
from_serialized_representation(serialized_representation)
classmethod
Creates the ConTextGraph from the serialized representation
Source code in medspacy/context/context_graph.py
serialized_representation()
Returns the serialized representation of the ConTextGraph
Source code in medspacy/context/context_graph.py
update_scopes()
Updates the scope of all ConTextModifiers.
For each modifier in a list of ConTextModifiers, check against each other modifier to see if one of the modifiers should update the other. This allows neighboring similar modifiers to extend each other's scope and allows "terminate" modifiers to end a modifier's scope.
Source code in medspacy/context/context_graph.py
context_modifier
ConTextModifier
Represents a concept found by ConText in a document. An instance of this class is the result of ConTextRule matching text in a Doc.
Source code in medspacy/context/context_modifier.py
allowed_types
property
Returns the associated allowed types.
category
property
Returns the associated category.
direction
property
Returns the associated direction.
excluded_types
property
Returns the associated excluded types.
max_scope
property
Returns the associated maximum scope.
max_targets
property
Returns the associated maximum number of targets.
modifier_span
property
The spaCy Span object, which is a view of self.doc, covered by this match.
num_targets
property
Returns the associated number of targets.
rule
property
Returns the associated context rule.
scope_span
property
Returns the associated scope.
__init__(context_rule, start, end, doc, scope_start=None, scope_end=None, max_scope=None)
Create a new ConTextModifier from a document span. Each modifier represents a span in the text and a surrounding window. Spans such as entities or other members of span groups that occur within this window can be modified by this ConTextModifier.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| context_rule | ConTextRule | The ConTextRule object which defines the modifier. | required |
| start | int | The start token index. | required |
| end | int | The end token index (non-inclusive). | required |
| doc | Doc | The spaCy Doc which contains this span. This is needed to initialize the modifier but is not maintained. | required |
| scope_start | Optional[int] | The start token index of the scope. | None |
| scope_end | Optional[int] | The end token index of the scope. | None |
| max_scope | Optional[int] | Optional maximum scope size, in tokens. When set, scope values are used rather than sentence boundaries for modifications. | None |
Source code in medspacy/context/context_modifier.py
__set_scope(doc)
Applies the direction of the ConTextRule which generated this ConTextModifier to define a scope. If self._max_scope is None, then the default scope is the sentence in which the modifier occurs, in whichever direction is defined by self.direction. For example, if the direction is "forward", the scope will be [self.end: sentence.end]. If the direction is "backward", it will be [sentence.start: self.start].
If self.max_scope is not None and the length of the default scope is longer than self.max_scope, it will be reduced to self.max_scope.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| doc | Doc | The spaCy Doc used to set the scope. | required |
Source code in medspacy/context/context_modifier.py
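The direction rules above can be sketched on token offsets. `default_scope` is a hypothetical helper, not the private method itself; applying max_scope as a limit on each covered side of the modifier is an assumption, and the TERMINATE and PSEUDO directions are left out of this sketch.

```python
from typing import Optional, Tuple


def default_scope(
    start: int,
    end: int,
    sent_start: int,
    sent_end: int,
    direction: str,
    max_scope: Optional[int] = None,
) -> Tuple[int, int]:
    """Return a (scope_start, scope_end) token range for a modifier at [start, end)."""
    direction = direction.upper()
    if direction == "FORWARD":
        scope_start, scope_end = end, sent_end  # [self.end: sentence.end]
    elif direction == "BACKWARD":
        scope_start, scope_end = sent_start, start  # [sentence.start: self.start]
    elif direction == "BIDIRECTIONAL":
        scope_start, scope_end = sent_start, sent_end
    else:
        raise ValueError(f"direction not handled in this sketch: {direction}")
    if max_scope is not None:
        # Assumed: max_scope caps how far the scope reaches on each covered side.
        if direction in ("FORWARD", "BIDIRECTIONAL"):
            scope_end = min(scope_end, end + max_scope)
        if direction in ("BACKWARD", "BIDIRECTIONAL"):
            scope_start = max(scope_start, start - max_scope)
    return scope_start, scope_end


# "no evidence of chf or pneumonia ." is tokens 0..6; "no evidence of" is [0, 3).
print(default_scope(0, 3, 0, 7, "FORWARD"))  # (3, 7)
print(default_scope(0, 3, 0, 7, "FORWARD", max_scope=2))  # (3, 5)
```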
allows(target_label)
Returns whether a modifier is able to modify a target type.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| target_label | str | The target type to check. | required |
Returns:
| Type | Description |
|---|---|
| bool | Whether the modifier is allowed to modify a target of the specified type. True if |
| bool | |
Source code in medspacy/context/context_modifier.py
from_serialized_representation(serialized_representation)
classmethod
Instantiates the class from the serialized representation
Source code in medspacy/context/context_modifier.py
limit_scope(other)
If self and other have the same category, or if other has a directionality of 'terminate', use the span of other to update the scope of self. Limiting the scope of two modifiers of the same category reduces the number of modifiers applied to each target. For example, in 'no evidence of CHF, no pneumonia', 'pneumonia' will only be modified by 'no', not by 'no evidence of'. 'terminate' modifiers limit the scope of a modifier like 'no evidence of' in 'no evidence of CHF, but there is pneumonia', so that 'pneumonia' is not modified.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| other | ConTextModifier | The modifier to check against. | required |
Returns:
| Type | Description |
|---|---|
| bool | Whether the other modifier modified the scope of self. |
Source code in medspacy/context/context_modifier.py
modifies(target)
Checks whether the target is within the modifier scope and if self is allowed to modify target.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| target | Span | A spaCy span representing a target concept. | required |
Returns:
| Type | Description |
|---|---|
| bool | Whether the target is within |
Source code in medspacy/context/context_modifier.py
modify(target)
Add target to the list of self._targets and increment self._num_targets.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| target | Span | The spaCy span to add. | required |
Source code in medspacy/context/context_modifier.py
on_modifies(target)
If the ConTextRule used to define a ConTextModifier has an on_modifies callback function, evaluate it and return either True or False.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| target | Span | The spaCy span to evaluate. | required |
Returns:
| Type | Description |
|---|---|
| bool | The result of the |
Source code in medspacy/context/context_modifier.py
reduce_targets()
Reduces the number of targets to the n closest targets, where n is self.max_targets. If self.max_targets is None, no pruning is done.
Source code in medspacy/context/context_modifier.py
serialized_representation()
Returns the serialized representation of the modifier.
Source code in medspacy/context/context_modifier.py
update_scope(span)
Changes the scope of self to be the given spaCy span.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| span | Span | A spaCy Span containing the scope which the modifier should cover. | required |
Source code in medspacy/context/context_modifier.py
context_rule
ConTextRule
Bases: BaseRule
A ConTextRule defines a ConText modifier. ConTextRules are rules which define which spans are extracted as modifiers and how they behave, such as the phrase to be matched, the category/semantic class, the direction of the modifier in the text, and what types of target spans can be modified.
Source code in medspacy/context/context_rule.py
__init__(literal, category, pattern=None, direction='BIDIRECTIONAL', on_match=None, on_modifies=None, allowed_types=None, excluded_types=None, max_scope=None, max_targets=None, terminated_by=None, metadata=None)
Creates a ConTextRule object.
The primary arguments literal, category, and direction define the span of text to be matched, the semantic category, and the direction within the sentence in which the modifier operates.
Other arguments specify additional custom logic such as:
- Additional control over what text can be matched as a modifier (pattern and on_match)
- Which types of targets can be modified (allowed_types, excluded_types)
- The scope size and number of targets that a modifier can modify (max_targets, max_scope)
- Other logic for terminating a span or for allowing a modifier to modify a target (on_modifies, terminated_by)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
literal
|
str
|
The string representation of a concept. If |
required |
category
|
str
|
The semantic class of the matched span. This corresponds to the |
required |
pattern
|
Optional[Union[str, List[Dict[str, str]]]]
|
A list or string to use as a spaCy pattern rather than |
None
|
direction
|
str
|
The directionality or action of a modifier. This defines which part of a sentence a modifier will include as its scope. Entities within the scope will be considered to be modified. Valid values are: - "FORWARD": Scope will begin after the end of a modifier and move to the right - "BACKWARD": Scope will begin before the beginning of a modifier and move to the left - "BIDIRECTIONAL": Scope will expand on either side of a modifier - "TERMINATE": A special direction to limit any other modifiers if this phrase is in its scope. Example: "no evidence of chf but there is pneumonia": "but" will prevent "no evidence of" from modifying "pneumonia" - "PSEUDO": A special direction which will not modify any targets. This can be used for differentiating superstrings of modifiers. Example: A modifier with literal="negative attitude" will prevent the phrase "negative" in "She has a negative attitude about her treatment" from being extracted as a modifier. |
'BIDIRECTIONAL'
|
on_match
|
Optional[Callable[[Matcher, Doc, int, List[Tuple[int, int, int]]], Any]]
|
An optional callback function or other callable which takes 4 arguments: |
None
|
on_modifies
|
Optional[Callable[[Span, Span, Span], bool]]
|
Callback function to run when building an edge between a target and a modifier. This allows specifying custom logic for allowing or preventing certain modifiers from modifying certain targets. The callable should take 3 arguments: target: The spaCy Span from doc.ents (ie., 'Evidence of pneumonia') modifier: The spaCy Span covered in a resulting modifier (ie., 'no evidence of') span_between: The Span between the target and modifier in question. Should return either True or False. If returns False, then the modifier will not modify the target. |
None
|
allowed_types
|
Optional[Set[str]]
|
A collection of target labels to allow a modifier to modify. If None, will apply to any type not specifically excluded in excluded_types. Only one of allowed_types and excluded_types can be used. An error will be thrown if both are not None. |
None
|
excluded_types
|
Optional[Set[str]]
|
A collection of target labels which this modifier cannot modify. If None, will apply to all target types unless allowed_types is not None. |
None
|
max_scope
|
Optional[int]
|
A number of tokens to explicitly limit the size of the modifier's scope. If None, the scope will include the entire sentence in the direction given by direction. |
None
|
max_targets
|
Optional[int]
|
The maximum number of targets which a modifier can modify. If None, will modify all targets in its scope. |
None
|
terminated_by
|
Optional[Set[str]]
|
An optional collection of other modifier categories which will terminate the scope of this modifier. If None, only "TERMINATE" will do this. Example: if a ConTextRule defining "positive for" has terminated_by={"NEGATED_EXISTENCE"}, then in the sentence "positive for flu, negative for RSV", the positive modifier will modify "flu" but will be terminated by "negative for" and will not modify "RSV". This helps prevent multiple conflicting modifiers from distributing too far across a sentence. |
None
|
metadata
|
Optional[Dict[Any, Any]]
|
Optional dictionary of any extra metadata. |
None
|
Source code in medspacy/context/context_rule.py
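The interaction of direction, max_scope, and TERMINATE can be illustrated with a small pure-Python model of scope resolution over a tokenized sentence. This is a simplified sketch, not medspaCy's implementation. The function `modifier_scope`, the half-open `[start, end)` token offsets, and the treatment of terminators as single-token positions that cut the scope are all assumptions made for illustration.

```python
from typing import Optional, Tuple


# Hypothetical, simplified model of ConText scope; not medspaCy's actual code.
def modifier_scope(
    sent_len: int,
    start: int,
    end: int,
    direction: str,
    max_scope: Optional[int] = None,
    terminators: Tuple[int, ...] = (),
) -> Tuple[int, int]:
    """Return the (begin, stop) token offsets covered by a modifier spanning [start, end)."""
    if direction == "FORWARD":
        begin, stop = end, sent_len
    elif direction == "BACKWARD":
        begin, stop = 0, start
    elif direction == "BIDIRECTIONAL":
        begin, stop = 0, sent_len
    else:
        # Assumption: "TERMINATE" and "PSEUDO" modifiers get an empty scope.
        return (start, start)
    if max_scope is not None:
        # Limit the scope to max_scope tokens on each side the modifier looks at.
        if direction in ("FORWARD", "BIDIRECTIONAL"):
            stop = min(stop, end + max_scope)
        if direction in ("BACKWARD", "BIDIRECTIONAL"):
            begin = max(begin, start - max_scope)
    for t in terminators:
        if t >= end:
            # A terminating token to the right cuts the forward scope.
            stop = min(stop, t)
        elif t < start:
            # A terminating token to the left cuts the backward scope.
            begin = max(begin, t + 1)
    return (begin, stop)


tokens = "no evidence of chf but there is pneumonia".split()
# "no evidence of" covers tokens 0-2; "but" (index 4) acts as a terminator.
begin, stop = modifier_scope(len(tokens), 0, 3, "FORWARD", terminators=(4,))
print(tokens[begin:stop])  # ['chf']
```

Without the terminator, the same call returns (3, 8), so "pneumonia" would fall inside the scope.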
from_dict(rule_dict)
classmethod
Reads a dictionary into a ConTextRule.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rule_dict | | The dictionary to convert. | required |
Returns:
| Type | Description |
|---|---|
| ConTextRule | The ConTextRule created from the dictionary. |
Source code in medspacy/context/context_rule.py
from_json(filepath)
classmethod
Reads in a lexicon of modifiers from a JSON file under the key context_rules.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | | The .json file containing modifier rules. Must contain a context_rules key. | required |
Returns:
| Type | Description |
|---|---|
| List[ConTextRule] | A list of ConTextRule objects read from the JSON file. |
Source code in medspacy/context/context_rule.py
to_dict()
Converts a ConTextRule to a Python dictionary. Used when writing context rules to a JSON file.
Returns:
| Type | Description |
|---|---|
| | The dictionary containing the ConTextRule info. |
Source code in medspacy/context/context_rule.py
to_json(context_rules, filepath)
classmethod
Writes ConTextRules to a JSON file.
Args: context_rules: A list of ConTextRules to write to the file. filepath: The .json file that will contain the modifier rules.
Source code in medspacy/context/context_rule.py
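The to_dict/from_dict and to_json/from_json methods describe a round trip through a JSON file whose rules sit under the context_rules key. The sketch below mimics that shape with a hypothetical `SimpleRule` dataclass and helper functions. The field names mirror some of the ConTextRule constructor parameters documented above, but the exact dictionary layout medspaCy writes is an assumption.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class SimpleRule:
    # Hypothetical stand-in for ConTextRule holding a few of its parameters.
    literal: str
    category: str
    direction: str = "BIDIRECTIONAL"

    def to_dict(self) -> dict:
        return asdict(self)

    @classmethod
    def from_dict(cls, rule_dict: dict) -> "SimpleRule":
        return cls(**rule_dict)


def rules_to_json(rules: List[SimpleRule], filepath: str) -> None:
    # Store every rule under the top-level "context_rules" key.
    with open(filepath, "w") as f:
        json.dump({"context_rules": [rule.to_dict() for rule in rules]}, f, indent=2)


def rules_from_json(filepath: str) -> List[SimpleRule]:
    with open(filepath) as f:
        data = json.load(f)
    return [SimpleRule.from_dict(d) for d in data["context_rules"]]


rules = [SimpleRule("no evidence of", "NEGATED_EXISTENCE", "FORWARD")]
path = os.path.join(tempfile.mkdtemp(), "context_rules.json")
rules_to_json(rules, path)
print(rules_from_json(path) == rules)  # True
```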
util
This module contains helper functions and classes for common clinical processing tasks used in medspaCy's ConText implementation.
is_modified_by(span, modifier_label)
Check whether a span has a modifier of a specific type.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| span | Span | The span to examine. | required |
| modifier_label | str | The type of modifier to check for. | required |
Returns:
| Type | Description |
|---|---|
| bool | Whether span has a modifier of type modifier_label. |
Source code in medspacy/context/util.py
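Conceptually, is_modified_by scans the modifiers attached to a span and compares each one's category with modifier_label. The sketch below uses hypothetical `FakeModifier` and `FakeSpan` classes in place of spaCy objects. The `modifiers` attribute and the case-insensitive comparison are assumptions, not confirmed details of medspacy.context.util.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeModifier:
    # Hypothetical stand-in for a ConText modifier; only the category matters here.
    category: str


@dataclass
class FakeSpan:
    # Hypothetical stand-in for a spaCy Span carrying its ConText modifiers.
    text: str
    modifiers: List[FakeModifier] = field(default_factory=list)


def is_modified_by(span: FakeSpan, modifier_label: str) -> bool:
    """Return True if any modifier attached to span has the category modifier_label."""
    return any(m.category.upper() == modifier_label.upper() for m in span.modifiers)


pneumonia = FakeSpan("pneumonia", [FakeModifier("NEGATED_EXISTENCE")])
print(is_modified_by(pneumonia, "negated_existence"))  # True
print(is_modified_by(pneumonia, "HISTORICAL"))  # False
```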
custom_tokenizer
create_medspacy_tokenizer(nlp)
Generates a custom tokenizer that augments the default spaCy tokenizer for situations commonly seen in clinical text. This includes:
* Punctuation infixes. For example, this allows the following examples to be tokenized more aggressively:
"Patient complains of c/o" -> [..., 'c', '/', 'o']
"chf+cp" -> ['chf', '+', 'cp']
@param nlp: A spaCy language model
Source code in medspacy/custom_tokenizer.py
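You can approximate the effect of the extra infix rules with a regular expression that splits tokens on '/' and '+'. This is only a stand-in for illustration. The real function configures a spaCy tokenizer, and the hypothetical `INFIX_RE` below covers only the characters shown in the two examples above.

```python
import re
from typing import List

# Hypothetical infix pattern covering only the characters in the examples above.
INFIX_RE = re.compile(r"([/+])")


def split_infixes(text: str) -> List[str]:
    """Whitespace-tokenize text, then split each token around '/' and '+'."""
    tokens: List[str] = []
    for chunk in text.split():
        # re.split keeps the captured delimiter; drop any empty strings it produces.
        tokens.extend(piece for piece in INFIX_RE.split(chunk) if piece)
    return tokens


print(split_infixes("chf+cp"))  # ['chf', '+', 'cp']
print(split_infixes("Patient complains of c/o"))  # ['Patient', 'complains', 'of', 'c', '/', 'o']
```

In spaCy itself, infix patterns are typically compiled with spacy.util.compile_infix_regex and assigned to nlp.tokenizer.infix_finditer, rather than applied as a separate splitting pass.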
io
DbConnect
DbConnect is a wrapper for either a pyodbc or sqlite3 connection. It can then be passed into the DbReader and DbWriter classes to retrieve/store document data.
Source code in medspacy/io/db_connect.py
__init__(driver=None, server=None, db=None, user=None, pwd=None, conn=None)
Create a new DbConnect object. You can pass in either information for a pyodbc connection string or directly pass in a sqlite or pyodbc connection object.
If conn is None, all other arguments must be supplied. If conn is passed in, all other arguments will be ignored.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| db | | | None |
Source code in medspacy/io/db_connect.py
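For the pyodbc route, the driver, server, db, user, and pwd arguments have to be combined into a connection string. The hypothetical helper below sketches that step together with the rule that every argument is required when conn is None. The DRIVER/SERVER/DATABASE/UID/PWD keys follow the common ODBC convention; the exact string medspaCy builds is an assumption. For SQLite, you pass an existing sqlite3 connection as conn instead.

```python
import sqlite3
from typing import Optional


def build_odbc_conn_str(
    driver: Optional[str] = None,
    server: Optional[str] = None,
    db: Optional[str] = None,
    user: Optional[str] = None,
    pwd: Optional[str] = None,
) -> str:
    """Hypothetical helper: assemble an ODBC connection string from its parts."""
    parts = {"driver": driver, "server": server, "db": db, "user": user, "pwd": pwd}
    missing = [name for name, value in parts.items() if value is None]
    if missing:
        # Mirrors the documented rule: without conn, all arguments must be supplied.
        raise ValueError(f"Missing connection arguments: {missing}")
    return f"DRIVER={{{driver}}};SERVER={server};DATABASE={db};UID={user};PWD={pwd}"


print(build_odbc_conn_str("SQL Server", "myhost", "notes", "me", "secret"))
# DRIVER={SQL Server};SERVER=myhost;DATABASE=notes;UID=me;PWD=secret

# The SQLite route skips the string: open a connection and pass it as conn,
# e.g. DbConnect(conn=conn) per the signature above.
conn = sqlite3.connect(":memory:")
```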
DbWriter
DbWriter is a utility class for writing structured data back to a database.
Source code in medspacy/io/db_writer.py
__init__(db_conn, destination_table, cols=None, col_types=None, doc_dtype='ents', create_table=False, drop_existing=False, write_batch_size=100)
Create a new DbWriter object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| db_conn | | A medspacy.io.DbConnect object. | required |
| destination_table | | The name of the table to write to. | required |
| cols | optional | The names of the columns of the destination table. These should align with attributes extracted by DocConsumer and stored in doc._.data. A set of default values can be accessed by: | None |
| col_types | optional | The SQL data types of the table columns. They should correspond 1:1 with cols. A set of default values can be accessed by: | None |
| doc_dtype | | The type of data from DocConsumer to write from a doc. One of "ents", "section", "context", or "doc". | 'ents' |
| create_table | bool | Whether to create a table. | False |
Source code in medspacy/io/db_writer.py
write(docs)
Write a single doc or a list of docs to a database.
Source code in medspacy/io/db_writer.py
write_doc(doc)
Write a doc to a database.
Source code in medspacy/io/db_writer.py
write_docs(docs, batch_size=800)
Write a list of docs to the database through bulk inserts.
Source code in medspacy/io/db_writer.py
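write_docs performs bulk inserts in batches. The following sketch shows that general pattern with the standard-library sqlite3 module. Column-wise data, shaped like an entry of doc._.data, is zipped into rows and written with executemany in batches of batch_size. The helper, table, and column names are hypothetical; this is not DbWriter's implementation.

```python
import sqlite3
from typing import Dict, List


def write_rows_in_batches(
    conn: sqlite3.Connection,
    table: str,
    data: Dict[str, List],
    batch_size: int = 800,
) -> int:
    """Insert column-wise data into table in batches; return the number of rows written."""
    cols = list(data.keys())
    # Turn {"col": [v1, v2], ...} into [(v1, ...), (v2, ...)].
    rows = list(zip(*(data[c] for c in cols)))
    placeholders = ", ".join("?" for _ in cols)
    # Table and column names are interpolated, so they must come from trusted code.
    query = f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})"
    for i in range(0, len(rows), batch_size):
        conn.executemany(query, rows[i : i + batch_size])
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ents (ent_text TEXT, label_ TEXT)")
data = {"ent_text": ["pneumonia", "chf"], "label_": ["CONDITION", "CONDITION"]}
written = write_rows_in_batches(conn, "ents", data, batch_size=1)
print(written)  # 2
```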
DocConsumer
A DocConsumer object will consume a spaCy doc and output rows based on a configuration provided by the user.
This component extracts structured information from a Doc. Information is stored in doc._.data, which is a nested dictionary. The outer keys represent the data type and can be one or more of:
- "ents": data about the spans in doc.ents such as the text, label, context attributes, section information, or custom attributes
- "group": data about spans in the span group named span_group_name
- "section": data about sections, such as section text and category
- "context": data about entity-modifier pairs extracted by ConText
- "doc": a single doc-level representation. By default only doc.text is extracted, but other attributes may be specified
Once processed, a doc's data can be accessed in any of the following ways:
- doc._.data
- doc._.get_data(dtype=...)
- doc._.ent_data
- doc._.to_dataframe(dtype=...)
Source code in medspacy/io/doc_consumer.py
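Because doc._.data stores each data type column-wise, as a dictionary of attribute names to lists of values, turning it into rows is a zip over the columns. medspaCy's _extensions module documents a data_to_rows helper for this purpose. The function below is an independent sketch with a hypothetical name (`columns_to_rows`) and made-up attribute names and values, not medspaCy's code.

```python
from typing import Any, Dict, List, Tuple


def columns_to_rows(data: Dict[str, List[Any]]) -> List[Tuple[Any, ...]]:
    """Unzip column-wise data into a list of row tuples, in column order."""
    return list(zip(*data.values()))


# Hypothetical shape of doc._.data after a DocConsumer runs with dtypes=("ents",).
doc_data = {
    "ents": {
        "text": ["pneumonia", "chf"],
        "label_": ["CONDITION", "CONDITION"],
        "is_negated": [False, True],
    }
}
for row in columns_to_rows(doc_data["ents"]):
    print(row)
# ('pneumonia', 'CONDITION', False)
# ('chf', 'CONDITION', True)
```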
__call__(doc)
Call the doc consumer on a doc and assign the data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| doc | | The Doc to process. | required |
Returns:
| Type | Description |
|---|---|
| | The processed Doc. |
Source code in medspacy/io/doc_consumer.py
__init__(nlp, name='medspacy_doc_consumer', dtypes=('ents',), dtype_attrs=None, span_group_name='medspacy_spans')
Creates a new DocConsumer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| nlp | | A spaCy model. | required |
| dtypes | Tuple | Either a tuple of data types to collect or the string "all". Default ("ents",). Valid options are: "ents", "group", "section", "context", "doc". | ('ents',) |
| dtype_attrs | Dict | An optional dictionary mapping the data types in dtypes to a list of attributes. If None, will set defaults for each dtype. Attributes for "ents", "group", and "doc" may be customized by adding either native or custom attributes (i.e., ent._...). "context" and "section" are not customizable at this time. Default values for each dtype can be retrieved by the class method `DocConsumer.get_default_attrs()`. | None |
| span_group_name | str | The name of the span group used when dtypes contains "group". At this time, only one span group is supported. | 'medspacy_spans' |
Source code in medspacy/io/doc_consumer.py
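The dtypes argument accepts either a tuple of data types or the string "all". Below is a minimal validation sketch. It assumes that "all" expands to every valid option listed above, that a lone string is treated as a one-element tuple, and that unknown types raise a ValueError. medspaCy's actual handling may differ, and `normalize_dtypes` is a hypothetical name.

```python
from typing import Tuple, Union

VALID_DTYPES = ("ents", "group", "section", "context", "doc")


def normalize_dtypes(dtypes: Union[str, Tuple[str, ...]]) -> Tuple[str, ...]:
    """Expand "all" and reject unknown data types (illustrative sketch)."""
    if dtypes == "all":
        return VALID_DTYPES
    if isinstance(dtypes, str):
        # Assumption: a single string other than "all" becomes a one-element tuple.
        dtypes = (dtypes,)
    invalid = [d for d in dtypes if d not in VALID_DTYPES]
    if invalid:
        raise ValueError(f"Invalid dtypes {invalid}; expected any of {VALID_DTYPES}")
    return tuple(dtypes)


print(normalize_dtypes(("ents", "context")))  # ('ents', 'context')
```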
_set_default_attrs()
Sets the default attributes.
Source code in medspacy/io/doc_consumer.py
get_default_attrs(dtypes=None)
classmethod
Gets the default attributes available to each type specified.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dtypes | Optional[Tuple] | Optional tuple containing "ents", "group", "context", "section", or "doc". If None, all will be returned. | None |
Returns:
| Type | Description |
|---|---|
| | The attributes the doc consumer will output for each of the specified types in dtypes. |
Source code in medspacy/io/doc_consumer.py
validate_section_attrs(attrs)
Validate that section attributes are either not specified or are valid attribute names.
Source code in medspacy/io/doc_consumer.py
Pipeline
The Pipeline class executes a batch process of reading texts, processing them with a spaCy model, and writing the results back to a database.
Source code in medspacy/io/pipeline.py
__init__(nlp, reader, writer, name='medspacy_pipeline', dtype='ent')
Create a new Pipeline object. Args: reader: A DbReader object. writer: A DbWriter object. nlp: A spaCy model. dtype: The DocConsumer data type to write to a database. Default "ent". Valid options are "ent", "section", "context", and "doc".
Source code in medspacy/io/pipeline.py
process()
Run a pipeline by reading a set of texts from a source table, processing them with nlp, and writing doc._.data back to the destination table.
Source code in medspacy/io/pipeline.py
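The read, process, and write loop that Pipeline.process describes can be sketched end to end with sqlite3 and a plain Python function standing in for the spaCy model. The table names, the `fake_nlp` function, and the output columns are all hypothetical. In medspaCy, DbReader and DbWriter handle the reading and writing, and DocConsumer handles the extraction.

```python
import sqlite3
from typing import Callable, List, Tuple


def run_pipeline(
    conn: sqlite3.Connection,
    nlp: Callable[[str], List[Tuple[str, str]]],
    source_table: str,
    destination_table: str,
) -> None:
    """Read texts, run nlp on each, and write (doc_id, ent_text, label) rows."""
    # Table names are interpolated directly, so they must come from trusted code.
    texts = conn.execute(f"SELECT doc_id, text FROM {source_table}").fetchall()
    rows = []
    for doc_id, text in texts:
        for ent_text, label in nlp(text):
            rows.append((doc_id, ent_text, label))
    conn.executemany(f"INSERT INTO {destination_table} VALUES (?, ?, ?)", rows)
    conn.commit()


def fake_nlp(text: str) -> List[Tuple[str, str]]:
    # Hypothetical stand-in for nlp(text) plus DocConsumer: tag known condition words.
    return [(w, "CONDITION") for w in text.lower().split() if w in {"pneumonia", "chf"}]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (doc_id INTEGER, text TEXT)")
conn.execute("CREATE TABLE ents (doc_id INTEGER, ent_text TEXT, label TEXT)")
conn.executemany("INSERT INTO notes VALUES (?, ?)", [(1, "History of CHF"), (2, "No pneumonia")])
run_pipeline(conn, fake_nlp, "notes", "ents")
print(conn.execute("SELECT * FROM ents ORDER BY doc_id").fetchall())
# [(1, 'chf', 'CONDITION'), (2, 'pneumonia', 'CONDITION')]
```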
postprocess
PostprocessingPattern
A PostprocessingPattern wraps a callable condition together with the value that condition must return. PostprocessingRules combine these patterns to decide when the Postprocessor should act on an entity.
Source code in medspacy/postprocess/postprocessing_pattern.py
__call__(ent)
Call the PostprocessingPattern on the span specified.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ent | Span | The span to process. | required |
Returns:
| Type | Description |
|---|---|
| bool | Whether calling condition(ent, **kwargs) returned success_value. |
Source code in medspacy/postprocess/postprocessing_pattern.py
__init__(condition, success_value=True, **kwargs)
A PostprocessingPattern defines a single condition to check against an entity.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| condition | Callable | A function to call on an entity. If the result of the function call equals success_value, then the pattern passes. | required |
| success_value | Any | The value which should be returned by condition(ent) in order for the pattern to pass. Must have == defined for condition(ent) == success_value. | True |
| kwargs | | Optional keyword arguments to call with condition(ent, **kwargs). | {} |
Source code in medspacy/postprocess/postprocessing_pattern.py
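A pattern passes when condition(ent, **kwargs) == success_value. The class below is a stripped-down re-creation of that contract for illustration only. `SimplePattern` and `ends_with` are hypothetical names, and plain strings stand in for spaCy spans.

```python
from typing import Any, Callable


class SimplePattern:
    """Illustrative stand-in for PostprocessingPattern."""

    def __init__(self, condition: Callable[..., Any], success_value: Any = True, **kwargs: Any):
        self.condition = condition
        self.success_value = success_value
        self.kwargs = kwargs

    def __call__(self, ent: Any) -> bool:
        # The pattern passes only if the condition's result equals success_value.
        return self.condition(ent, **self.kwargs) == self.success_value


def ends_with(ent: str, suffix: str) -> bool:
    return ent.lower().endswith(suffix)


# Passes when the entity text does NOT end with "itis".
not_inflammation = SimplePattern(ends_with, success_value=False, suffix="itis")
print(not_inflammation("pneumonia"))  # True
print(not_inflammation("bronchitis"))  # False
```

Setting success_value=False is a convenient way to express a negated condition without writing a second function.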
PostprocessingRule
Source code in medspacy/postprocess/postprocessing_rule.py
__call__(ent, i, debug=False)
Iterate through all the patterns in the rule. If any pattern does not pass (i.e., does not return True), return False. If they all pass, execute self.action and return True.
Source code in medspacy/postprocess/postprocessing_rule.py
__init__(patterns, action, name=None, description=None, span_group_name='medspacy_spans', **kwargs)
A PostprocessingRule checks conditions of a spaCy Span entity and executes some action if all of its patterns pass.
patterns: A list of PostprocessingPatterns, each of which checks a condition of an entity.
action: A function to call with the entity as an argument. This function should take the following arguments:
- ent: The spaCy span
- i: The index of ent
- input_span_type: "ents" or "group". Describes where to look for spans.
- span_group_name: The name of the span group used when input_span_type is "group".
- kwargs: Any additional keyword arguments for action.
name: Optional name of the rule.
description: Optional description of the rule.
kwargs: Optional keyword arguments to send to action.
Source code in medspacy/postprocess/postprocessing_rule.py
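A PostprocessingRule runs its action only when every pattern passes. The sketch below models that logic with plain callables and a list of strings in place of doc.ents. `apply_rule` and `drop_ent` are hypothetical helpers, and the action takes only (ents, i) here rather than the full argument list documented above.

```python
from typing import Callable, List


def apply_rule(
    ents: List[str],
    i: int,
    patterns: List[Callable[[str], bool]],
    action: Callable[[List[str], int], None],
) -> bool:
    """Run action on ents[i] if all patterns pass; return whether it ran."""
    ent = ents[i]
    # all() stops at the first pattern that fails, matching the early return False.
    if not all(pattern(ent) for pattern in patterns):
        return False
    action(ents, i)
    return True


def drop_ent(ents: List[str], i: int) -> None:
    # Hypothetical action: delete the entity in place.
    del ents[i]


ents = ["pneumonia", "family history of chf"]
ran = apply_rule(ents, 1, [lambda e: "family" in e], drop_ent)
print(ran, ents)  # True ['pneumonia']
```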
Postprocessor
Source code in medspacy/postprocess/postprocessor.py
input_span_type
property
writable
The input source of entities for the component. Must be either "ents" corresponding to doc.ents or "group" for a spaCy span group.
Returns:
| Type | Description |
|---|---|
| | The input type, "ents" or "group". |
rules
property
Gets the rules.
Returns:
| Type | Description |
|---|---|
| List[PostprocessingRule] | The list of PostprocessingRules available to the Postprocessor. |
span_group_name
property
writable
The name of the span group used by this component. If input_span_type is "group", calling this component will use spans in the span group with this name.
Returns:
| Type | Description |
|---|---|
| str | The span group name. |
__call__(doc)
Calls the Postprocessor on a spaCy doc. This will call each PostprocessingRule on the doc.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| doc | Doc | The Doc to process. | required |
Returns:
| Type | Description |
|---|---|
| | The processed Doc. |
Source code in medspacy/postprocess/postprocessor.py
add(rules)
Adds PostprocessingRules to the Postprocessor.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rules | Union[PostprocessingRule, Iterable[PostprocessingRule]] | A single PostprocessingRule or a collection of PostprocessingRules to add to the Postprocessor. | required |
Source code in medspacy/postprocess/postprocessor.py
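When an action removes entities, walking the indices from last to first keeps the remaining indices valid. This sketch shows that idea with a list of strings and a hypothetical rule signature. Whether medspaCy's Postprocessor uses exactly this iteration order, and whether it stops checking an entity once one rule fires, are assumptions not stated above.

```python
from typing import Callable, List

Rule = Callable[[List[str], int], bool]


def run_rules(ents: List[str], rules: List[Rule]) -> List[str]:
    """Apply each rule to each entity, walking indices from last to first."""
    ents = list(ents)
    for i in range(len(ents) - 1, -1, -1):
        for rule in rules:
            # Stop checking this entity once a rule has fired, since it may be gone.
            if rule(ents, i):
                break
    return ents


def remove_if_short(ents: List[str], i: int) -> bool:
    # Hypothetical rule: remove entities shorter than four characters.
    if len(ents[i]) < 4:
        del ents[i]
        return True
    return False


print(run_rules(["cp", "pneumonia", "uti", "sepsis"], [remove_if_short]))
# ['pneumonia', 'sepsis']
```

Iterating forward instead would skip the entity that slides into a removed entity's position.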
postprocessing_functions
This module contains some simple functions that can be used as action or condition functions for postprocessing rules.
ent_contains(ent, target, regex=True)
Check if an entity contains a target string. Case-insensitive.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ent | Span | The span to check. | required |
| target | Union[str, Iterable[str]] | A string or a collection of strings that will be searched for inside ent. | required |
| regex | bool | Whether target should be treated as a regex pattern. | True |
Returns:
| Type | Description |
|---|---|
| bool | Whether the target is contained in the ent. |
Source code in medspacy/postprocess/postprocessing_functions.py
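A case-insensitive containment check with an optional regex mode can be written with the re module. This hypothetical re-implementation works on plain strings standing in for ent.text. Using re.escape when regex=False is an assumption about how literal matching behaves, not a confirmed detail of medspaCy.

```python
import re
from typing import Iterable, Union


def text_contains(text: str, target: Union[str, Iterable[str]], regex: bool = True) -> bool:
    """Return True if any target occurs in text, ignoring case."""
    targets = [target] if isinstance(target, str) else list(target)
    for t in targets:
        # With regex=False, escape the target so characters like "+" match literally.
        pattern = t if regex else re.escape(t)
        if re.search(pattern, text, flags=re.IGNORECASE):
            return True
    return False


print(text_contains("Pneumonia, RLL", r"\brll\b"))  # True
print(text_contains("chf+cp", "chf+", regex=False))  # True
print(text_contains("chf+cp", ["copd", "asthma"]))  # False
```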
is_family(span)
Returns whether a span is marked as family.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| span | Span | The span to check. | required |
Returns:
| Type | Description |
|---|---|
| bool | Whether the specified span has span._.is_family set to True. |
Source code in medspacy/postprocess/postprocessing_functions.py
is_followed_by(ent, target, window=1)
Checks if an entity is followed by a target word within a certain window. If any phrases in target are more than one token long, this may not capture it if window is smaller than the number of tokens. Case-insensitive.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ent | Span | The span to check. | required |
| target | Union[str, Iterable[str]] | A string or a collection of strings that will be searched for in the text following ent. | required |
| window | int | The number of tokens after ent to search for target. | 1 |
Returns:
| Type | Description |
|---|---|
| bool | Whether the entity specified is followed by a target. |
Source code in medspacy/postprocess/postprocessing_functions.py
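You can sketch the window semantics of is_followed_by over a list of tokens; is_preceded_by is the same check mirrored to the left. Here the entity's end is given as a hypothetical token offset, and phrases are compared word by word. This also reproduces the caveat above: a two-token target cannot match inside a one-token window. The function `followed_by` is illustrative, not medspaCy's implementation.

```python
from typing import Iterable, List, Union


def followed_by(tokens: List[str], end: int, target: Union[str, Iterable[str]], window: int = 1) -> bool:
    """True if a target phrase appears within `window` tokens after position `end`."""
    targets = [target] if isinstance(target, str) else list(target)
    following = [tok.lower() for tok in tokens[end : end + window]]
    for phrase in targets:
        words = phrase.lower().split()
        # Slide the phrase across the window; a phrase longer than the window never fits.
        for j in range(len(following) - len(words) + 1):
            if following[j : j + len(words)] == words:
                return True
    return False


tokens = "Pneumonia was ruled out today".split()
# The entity "Pneumonia" covers token 0, so the following text starts at index 1.
print(followed_by(tokens, 1, "ruled out", window=1))  # False
print(followed_by(tokens, 1, "ruled out", window=3))  # True
```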
is_historical(span)
Returns whether a span is marked as historical.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
span
|
Span
|
The span to check. |
required |
Returns:
| Type | Description |
|---|---|
bool
|
Whether the specified span has span._.is_historical set to True. |
Source code in medspacy/postprocess/postprocessing_functions.py
is_hypothetical(span)
Returns whether a span is marked as hypothetical.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
span
|
Span
|
The span to check. |
required |
Returns:
| Type | Description |
|---|---|
bool
|
Whether the specified span has span._.is_hypothetical set to True. |
Source code in medspacy/postprocess/postprocessing_functions.py
is_modified_by_category(span, category)
Returns whether a span is modified by a ConTextModifier of the specified category.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| span | Span | The span to check. | required |
| category | str | The modifier category to check for. | required |
Returns:
| Type | Description |
|---|---|
| bool | Whether the specified span has a modifier of the specified category. |
Source code in medspacy/postprocess/postprocessing_functions.py
is_modified_by_text(span, target, regex=True)
Returns whether a span is modified by a ConTextModifier with the specified text.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| span | Span | The span to check. | required |
| target | Union[str, Iterable[str]] | A string or a collection of strings to search for in the text of the span's modifiers. | required |
| regex | bool | Whether target should be treated as a regex pattern. | True |
Returns:
| Type | Description |
|---|---|
| bool | Whether the specified span has a modifier with the specified text. |
Source code in medspacy/postprocess/postprocessing_functions.py
is_negated(span)
Returns whether a span is marked as negated.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
span
|
Span
|
The span to check. |
required |
Returns:
| Type | Description |
|---|---|
bool
|
Whether the specified span has span._.is_negated set to True. |
Source code in medspacy/postprocess/postprocessing_functions.py
is_preceded_by(ent, target, window=1)
Checks whether an entity is preceded by a target word within a window of tokens. Matching is case-insensitive. If a phrase in target is longer than one token, it may not be found when window is smaller than the phrase's number of tokens.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ent | Span | The span to check. | required |
| target | Union[str, Iterable[str]] | A string or a collection of strings to search for in the text preceding ent. | required |
| window | int | The number of preceding tokens to search. | 1 |
Returns:
| Type | Description |
|---|---|
| bool | Whether the entity is preceded by a target. |
Source code in medspacy/postprocess/postprocessing_functions.py
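The window limitation is easier to see with a small example. This stdlib sketch works on a plain list of token strings instead of a spaCy Span. The parameters `tokens` and `ent_start` are hypothetical and stand in for the entity's document and its start index. It is not medspaCy's implementation. It only shows why a two-token phrase cannot be found when `window=1`.

```python
from typing import Iterable, List, Union


def is_preceded_by(tokens: List[str], ent_start: int,
                   target: Union[str, Iterable[str]], window: int = 1) -> bool:
    """Return True if any target phrase occurs within the `window` tokens before tokens[ent_start]."""
    targets = [target] if isinstance(target, str) else list(target)
    # Lower-case the tokens inside the window so the comparison is case-insensitive.
    preceding = [t.lower() for t in tokens[max(0, ent_start - window):ent_start]]
    for phrase in targets:
        words = phrase.lower().split()
        n = len(words)
        # A multi-token phrase can only match if all of its tokens fit inside the window.
        for k in range(len(preceding) - n + 1):
            if preceding[k:k + n] == words:
                return True
    return False


tokens = ["Patient", "has", "no", "signs", "of", "pneumonia"]
print(is_preceded_by(tokens, 5, "signs of"))            # False: window=1 sees only "of"
print(is_preceded_by(tokens, 5, "signs of", window=2))  # True
```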
is_uncertain(span)
Returns whether a span is marked as uncertain.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| span | Span | The span to check. | required |
Returns:
| Type | Description |
|---|---|
| bool | Whether the specified span has span._.is_uncertain set to True. |
Source code in medspacy/postprocess/postprocessing_functions.py
remove_ent(ent, i, input_type='ents', span_group_name='medspacy_spans')
Remove the entity at position i from doc.ents, or from the span group named span_group_name when input_type is "group".
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ent | Span | The entity to remove. | required |
| i | int | The index of ent. | required |
| input_type | Literal['ents', 'group'] | The source of the entity, either "ents" or "group". | 'ents' |
| span_group_name | str | The name of the span group used when input_type is "group". | 'medspacy_spans' |
Source code in medspacy/postprocess/postprocessing_functions.py
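In spaCy, `doc.ents` is a tuple and cannot be edited in place, so removal means rebuilding it without position `i`. The sketch below shows that idea. `FakeDoc` is a hypothetical stand-in that uses plain lists for span groups, and the Doc is passed in explicitly. Neither detail reflects medspaCy's actual signature.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a spaCy Doc: ents is a tuple, spans maps group names to lists."""
    ents: Tuple[str, ...] = ()
    spans: Dict[str, List[str]] = field(default_factory=dict)


def remove_ent(doc: FakeDoc, i: int, input_type: str = "ents",
               span_group_name: str = "medspacy_spans") -> None:
    if input_type == "ents":
        # Rebuild the tuple without position i instead of mutating it in place.
        doc.ents = doc.ents[:i] + doc.ents[i + 1:]
    elif input_type == "group":
        group = doc.spans[span_group_name]
        doc.spans[span_group_name] = group[:i] + group[i + 1:]
    else:
        raise ValueError(f"input_type must be 'ents' or 'group', got {input_type!r}")


doc = FakeDoc(ents=("pneumonia", "fever", "cough"), spans={"medspacy_spans": ["pneumonia", "fever"]})
remove_ent(doc, 1)
remove_ent(doc, 0, input_type="group")
print(doc.ents)                     # ('pneumonia', 'cough')
print(doc.spans["medspacy_spans"])  # ['fever']
```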
sentence_contains(ent, target, regex=True)
Check whether an entity occurs in the same sentence as another span of text.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ent | Span | The span to check. | required |
| target | Union[str, Iterable[str]] | A string or a collection of strings to search for in the text of the sentence containing ent. | required |
| regex |  | If the | True |
Source code in medspacy/postprocess/postprocessing_functions.py
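Here is a stdlib sketch of the sentence-level search. It takes the sentence text directly; with spaCy you would pass something like `ent.sent.text`. How the `regex` flag changes the matching is an assumption here: regex search versus a plain case-insensitive substring test.

```python
import re
from typing import Iterable, Union


def sentence_contains(sentence_text: str, target: Union[str, Iterable[str]], regex: bool = True) -> bool:
    """Return True if any target occurs in the sentence text."""
    targets = [target] if isinstance(target, str) else list(target)
    for t in targets:
        if regex:
            # Treat the target as a regular expression, searched case-insensitively.
            if re.search(t, sentence_text, flags=re.IGNORECASE):
                return True
        elif t.lower() in sentence_text.lower():
            # Plain, case-insensitive substring test.
            return True
    return False


sentence = "Mother has a history of breast cancer."
print(sentence_contains(sentence, [r"\bfather\b", r"\bmother\b"]))  # True
print(sentence_contains(sentence, "hist.ry", regex=False))          # False
```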
set_family(ent, i, value=True)
Set the value of ent._.is_family to value.
Source code in medspacy/postprocess/postprocessing_functions.py
set_historical(ent, i, value=True)
Set the value of ent._.is_historical to value.
Source code in medspacy/postprocess/postprocessing_functions.py
set_hypothetical(ent, i, value=True)
Set the value of ent._.is_hypothetical to value.
Source code in medspacy/postprocess/postprocessing_functions.py
set_label(ent, i, input_type='ents', span_group_name='medspacy_spans', **kwargs)
Creates a copy of the entity with a new label.
WARNING: This is not fully safe. spaCy does not allow modifying the label of an existing span, so this function creates a new copy and attempts to copy the existing attributes, which is not always reliable.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ent |  | The entity to modify. | required |
| i |  | The index of ent. | required |
| input_type | Literal['ents', 'group'] | The source of the entity, either "ents" or "group". | 'ents' |
| span_group_name | str | The name of the span group used when input_type is "group". | 'medspacy_spans' |
Source code in medspacy/postprocess/postprocessing_functions.py
set_negated(ent, i, value=True)
Set the value of ent._.is_negated to value.
Source code in medspacy/postprocess/postprocessing_functions.py
set_uncertain(ent, i, value=True)
Set the value of ent._.is_uncertain to value.
Source code in medspacy/postprocess/postprocessing_functions.py
postprocessing_pattern
PostprocessingPattern
A PostprocessingPattern wraps a callable condition together with the value it must return. PostprocessingRules combine one or more patterns, and the Postprocessor uses them as triggers when it applies those rules.
Source code in medspacy/postprocess/postprocessing_pattern.py
__call__(ent)
Call the PostprocessingPattern on the specified span.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ent | Span | The span to process. | required |
Returns:
| Type | Description |
|---|---|
| bool | Whether calling condition(ent, **kwargs) returns success_value. |
Source code in medspacy/postprocess/postprocessing_pattern.py
__init__(condition, success_value=True, **kwargs)
A PostprocessingPattern defines a single condition to check against an entity.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| condition | Callable | A function to call on an entity. If the result of the function call equals success_value, then the pattern passes. | required |
| success_value | Any | The value which should be returned by condition(ent) in order for the pattern to pass. Must have == defined for condition(ent) == success_value. | True |
| kwargs |  | Optional keyword arguments to call with condition(ent, **kwargs). | {} |
Source code in medspacy/postprocess/postprocessing_pattern.py
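The behavior described above can be re-implemented in a few lines. The pattern stores a condition, its keyword arguments, and an expected value, and it passes only when `condition(ent, **kwargs) == success_value`. This is an illustrative sketch, not the medspaCy class. Plain strings stand in for entities, and `ends_with` is a hypothetical condition.

```python
from typing import Any, Callable


class PostprocessingPattern:
    """Minimal sketch: a condition plus the value it must return for the pattern to pass."""

    def __init__(self, condition: Callable[..., Any], success_value: Any = True, **kwargs: Any):
        self.condition = condition
        self.success_value = success_value
        self.kwargs = kwargs

    def __call__(self, ent: Any) -> bool:
        # The pattern passes only when condition(ent, **kwargs) == success_value.
        return self.condition(ent, **self.kwargs) == self.success_value


def ends_with(ent: str, suffix: str) -> bool:
    return ent.endswith(suffix)


is_itis = PostprocessingPattern(ends_with, suffix="itis")
not_itis = PostprocessingPattern(ends_with, success_value=False, suffix="itis")
print(is_itis("appendicitis"), not_itis("appendicitis"))  # True False
```

Setting `success_value=False` turns the same condition into a negative check, so there is no need to write a second function.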
postprocessing_rule
PostprocessingRule
Source code in medspacy/postprocess/postprocessing_rule.py
__call__(ent, i, debug=False)
Iterate through all of the rule's patterns. If any pattern does not pass (i.e., does not return True), return False. If they all pass, execute self.action and return True.
Source code in medspacy/postprocess/postprocessing_rule.py
__init__(patterns, action, name=None, description=None, span_group_name='medspacy_spans', **kwargs)
A PostprocessingRule checks conditions of a spaCy Span entity and executes an action if all of them are met.
patterns: A list of PostprocessingPatterns, each of which checks a condition of an entity.
action: A function to call with the entity as an argument. This function should take the following arguments:
    ent: The spaCy Span.
    i: The index of ent.
    input_span_type: "ents" or "group". Describes where to look for spans.
    span_group_name: The name of the span group used when input_span_type is "group".
    kwargs: Any additional keyword arguments for action.
name: Optional name of the rule.
description: Optional description of the rule.
kwargs: Optional keyword arguments to send to action.
Source code in medspacy/postprocess/postprocessing_rule.py
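The all-patterns-must-pass logic can be sketched as follows. Patterns are any callables that return a bool, and the action is called as `action(ent, i, **kwargs)`. This sketch omits the `input_span_type` and `span_group_name` arguments that medspaCy passes to actions. The dict entity and `set_negated` helper in the usage are hypothetical.

```python
from typing import Any, Callable, List, Optional


class PostprocessingRule:
    """Minimal sketch: run every pattern on an entity; if all pass, run the action."""

    def __init__(self, patterns: List[Callable[[Any], bool]], action: Callable[..., Any],
                 name: Optional[str] = None, description: Optional[str] = None, **kwargs: Any):
        self.patterns = patterns
        self.action = action
        self.name = name
        self.description = description
        self.kwargs = kwargs

    def __call__(self, ent: Any, i: int, debug: bool = False) -> bool:
        for pattern in self.patterns:
            if pattern(ent) is not True:
                # A single failing pattern is enough to skip the action.
                if debug:
                    print(f"{self.name}: pattern failed on {ent!r}")
                return False
        self.action(ent, i, **self.kwargs)
        return True


def set_negated(ent: dict, i: int, value: bool = True) -> None:
    ent["is_negated"] = value


rule = PostprocessingRule([lambda e: e["text"] == "pneumonia"], set_negated,
                          name="negate pneumonia", value=True)
ent = {"text": "pneumonia", "is_negated": False}
print(rule(ent, 0), ent["is_negated"])  # True True
```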
postprocessor
Postprocessor
Source code in medspacy/postprocess/postprocessor.py
input_span_type
property
writable
The input source of entities for the component. Must be either "ents", corresponding to doc.ents, or "group", for a spaCy span group.
Returns:
| Type | Description |
|---|---|
|  | The input type, "ents" or "group". |
rules
property
Gets the rules.
Returns:
| Type | Description |
|---|---|
| List[PostprocessingRule] | The list of PostprocessingRules available to the Postprocessor. |
span_group_name
property
writable
The name of the span group used by this component. If input_span_type is "group", calling this component will use spans in the span group with this name.
Returns:
| Type | Description |
|---|---|
| str | The span group name. |
__call__(doc)
Calls the Postprocessor on a spaCy doc. This applies each PostprocessingRule to the entities in the doc.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| doc | Doc | The Doc to process. | required |
Returns:
| Type | Description |
|---|---|
|  | The processed Doc. |
Source code in medspacy/postprocess/postprocessor.py
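Rules receive both an entity and its index, and an action such as remove_ent deletes by index. That raises an ordering question. One safe approach is to visit entities from last to first, so that deleting `ents[i]` never shifts an entity that has not yet been visited. The stdlib sketch below uses that approach. It is an assumption about a sound design, not a description of medspaCy's loop. `apply_rules` and `drop_fever` are hypothetical, and the sketch assumes at most one rule removes a given entity.

```python
from typing import Any, Callable, List

Rule = Callable[[Any, int], bool]


def apply_rules(ents: List[Any], rules: List[Rule]) -> List[Any]:
    """Apply every rule to every entity, visiting entities from last to first."""
    for i in range(len(ents) - 1, -1, -1):
        ent = ents[i]
        for rule in rules:
            # Each rule receives the entity and its index, like PostprocessingRule.__call__(ent, i).
            rule(ent, i)
    return ents


ents = ["fever", "fever", "cough"]


def drop_fever(ent: Any, i: int) -> bool:
    # Example action that deletes the entity at index i.
    if ent == "fever":
        del ents[i]
        return True
    return False


print(apply_rules(ents, [drop_fever]))  # ['cough']
```

Walking forward instead would break. Deleting index 0 shifts the second "fever" into a slot that has already been visited, and the loop then runs past the shortened list.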
add(rules)
Adds PostprocessingRules to the Postprocessor.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rules | Union[PostprocessingRule, Iterable[PostprocessingRule]] | A single PostprocessingRule or a collection of PostprocessingRules to add to the Postprocessor. | required |
Source code in medspacy/postprocess/postprocessor.py
preprocess
PreprocessingRule
This is a rule for handling preprocessing in the medspaCy Preprocessor. This class does not inherit from BaseRule, as it cannot be used in a spaCy pipeline. The Preprocessor and PreprocessingRules are designed to preprocess text before entering a spaCy pipeline to allow for destructive preprocessing, such as stripping or replacing text.
Source code in medspacy/preprocess/preprocessing_rule.py
__call__(text)
Apply the preprocessing rule to text. If the rule's callback is None, the text is rewritten with a regular-expression substitution using repl. If callback is not None, the callback function is executed with the resulting match as an argument.
Source code in medspacy/preprocess/preprocessing_rule.py
__init__(pattern, repl='', flags=re.IGNORECASE, callback=None, desc=None)
Creates a new PreprocessingRule. Preprocessing rules define spans of text to be removed, and optionally replaced, in the text before it is turned into a Doc.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pattern | str | The text pattern to match and replace in a doc. Must be a string, which will be compiled as a regular expression. The pattern produces re.Match objects. | required |
| repl | Union[str, Callable[[Match], Any]] | The text to replace a matched string with. By default, repl is an empty string. If repl is a function, it is passed to re.sub and called on each Match object. See https://docs.python.org/3/library/re.html#re.sub | '' |
| flags | RegexFlag | A regex compilation flag. Default is re.IGNORECASE. | IGNORECASE |
| callback | Optional[Callable[[str, Match], str]] | An optional callable which takes the raw text and a Match and returns the new copy of the text, rather than just replacing the matched text. This allows larger text manipulation, such as stripping out an entire section based on its header. | None |
| desc | Optional[str] | An optional description. | None |
Source code in medspacy/preprocess/preprocessing_rule.py
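The three ways to use a rule can be illustrated with a stripped-down re-implementation: a string `repl`, a function `repl`, and a `callback` that rewrites the whole text. This is a sketch, not the medspaCy source. In particular, it assumes the callback receives the full text and only the first match.

```python
import re
from typing import Callable, Optional, Union


class PreprocessingRule:
    """Minimal sketch of a destructive regex rewrite; not the medspaCy implementation."""

    def __init__(
        self,
        pattern: str,
        repl: Union[str, Callable[[re.Match], str]] = "",
        flags: re.RegexFlag = re.IGNORECASE,
        callback: Optional[Callable[[str, re.Match], str]] = None,
        desc: Optional[str] = None,
    ):
        self.pattern = re.compile(pattern, flags)
        self.repl = repl
        self.callback = callback
        self.desc = desc

    def __call__(self, text: str) -> str:
        if self.callback is None:
            # Substitute every match; repl may be a string or a function of each Match.
            return self.pattern.sub(self.repl, text)
        # Assumption: the callback receives the full text and the first match only.
        match = self.pattern.search(text)
        return text if match is None else self.callback(text, match)


expand = PreprocessingRule(r"\bh/o\b", repl="history of")
mask = PreprocessingRule(r"\d+", repl=lambda m: "#" * len(m.group()))
drop_fh = PreprocessingRule(r"family history:", callback=lambda text, m: text[: m.start()].rstrip())

print(expand("Pt h/o CHF"))                       # Pt history of CHF
print(mask("MRN 12345"))                          # MRN #####
print(drop_fh("HPI: cough. Family History: DM"))  # HPI: cough.
```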
from_dict(d)
classmethod
Creates a PreprocessingRule from a dictionary.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| d | Dict | The dict to read. | required |
Returns:
| Type | Description |
|---|---|
| PreprocessingRule | A PreprocessingRule from the dictionary. |
Source code in medspacy/preprocess/preprocessing_rule.py
from_json(filepath)
classmethod
Read a JSON file containing PreprocessingRule data at the key "preprocessing_rules".
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath |  | The filepath of the JSON to read. | required |
Returns:
| Type | Description |
|---|---|
|  | A list of PreprocessingRules from the JSON file. |
Source code in medspacy/preprocess/preprocessing_rule.py
to_dict()
Writes a preprocessing rule to a dictionary. Useful for later writing all rules to a JSON file.
Returns:
| Type | Description |
|---|---|
|  | A dictionary containing the PreprocessingRule's data. |
Source code in medspacy/preprocess/preprocessing_rule.py
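from_json reads rules stored under the key "preprocessing_rules". The stdlib sketch below shows a file with that layout. The per-rule keys (`pattern`, `repl`, `desc`) are hypothetical, and the sketch does not call medspaCy's own to_dict or from_dict.

```python
import json
import os
import tempfile
from typing import Any, Dict, List

# Hypothetical per-rule keys; only the top-level "preprocessing_rules" key comes from the docs.
RULES: List[Dict[str, Any]] = [
    {"pattern": r"\bh/o\b", "repl": "history of", "desc": "expand abbreviation"},
    {"pattern": r"\s{2,}", "repl": " ", "desc": "collapse repeated whitespace"},
]


def write_rules(path: str, rule_dicts: List[Dict[str, Any]]) -> None:
    # Store every rule dictionary under a single top-level key.
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"preprocessing_rules": rule_dicts}, f, indent=2)


def read_rules(path: str) -> List[Dict[str, Any]]:
    with open(path, encoding="utf-8") as f:
        return json.load(f)["preprocessing_rules"]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "preprocessing_rules.json")
    write_rules(path, RULES)
    print(read_rules(path) == RULES)  # True
```

A callable repl or callback has no JSON representation, so in this sketch only string replacements survive the round trip.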
Preprocessor
This is the medspacy Preprocessor class. It is designed as a wrapper for destructive preprocessing rules such as stripping or replacing text in a document before the text enters a spaCy pipeline.
This is NOT a spaCy component and cannot be added to a spaCy pipeline. Use the Preprocessor before calling nlp("your text here"). spaCy only allows non-destructive processing of text, which is not ideal for every project, so the Preprocessor enables destructive preprocessing when it is required.
Source code in medspacy/preprocess/preprocessor.py
__call__(text, tokenize=True)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text |  |  | required |
| tokenize |  |  | True |
Returns:
Source code in medspacy/preprocess/preprocessor.py
__init__(tokenizer)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tokenizer |  |  | required |
Source code in medspacy/preprocess/preprocessor.py
add(rules)
Adds a PreprocessingRule or collection of PreprocessingRules to the Preprocessor.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rules | Union[PreprocessingRule, Iterable[PreprocessingRule]] | A single PreprocessingRule or a collection of PreprocessingRules to add. | required |
Source code in medspacy/preprocess/preprocessor.py
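The intended workflow is to rewrite the raw text first and tokenize it afterwards. A minimal sketch follows. Rules are plain `str -> str` callables, and `str.split` stands in for a tokenizer; with spaCy you would pass `nlp.tokenizer`. The behavior of `tokenize=False` (returning the rewritten string) is an assumption, since the parameter is undocumented above.

```python
import re
from typing import Any, Callable, Iterable, List, Union

TextRule = Callable[[str], str]


class Preprocessor:
    """Minimal sketch: run destructive text rules in order, then optionally tokenize."""

    def __init__(self, tokenizer: Callable[[str], Any]):
        self.tokenizer = tokenizer
        self.rules: List[TextRule] = []

    def add(self, rules: Union[TextRule, Iterable[TextRule]]) -> None:
        # Accept one rule or a collection of rules.
        if callable(rules):
            self.rules.append(rules)
        else:
            self.rules.extend(rules)

    def __call__(self, text: str, tokenize: bool = True) -> Any:
        for rule in self.rules:
            text = rule(text)
        # Assumption: tokenize=True returns the tokenizer's output, otherwise the rewritten string.
        return self.tokenizer(text) if tokenize else text


pre = Preprocessor(tokenizer=str.split)
pre.add(lambda t: re.sub(r"\bh/o\b", "history of", t, flags=re.IGNORECASE))
pre.add([lambda t: re.sub(r"\s{2,}", " ", t)])
print(pre("Pt  h/o CHF", tokenize=False))  # Pt history of CHF
print(pre("Pt  h/o CHF"))                  # ['Pt', 'history', 'of', 'CHF']
```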
preprocessing_rule
PreprocessingRule
This is a rule for handling preprocessing in the medspaCy Preprocessor. This class does not inherit from BaseRule, as it cannot be used in a spaCy pipeline. The Preprocessor and PreprocessingRules are designed to preprocess text before entering a spaCy pipeline to allow for destructive preprocessing, such as stripping or replacing text.
Source code in medspacy/preprocess/preprocessing_rule.py
preprocessor
Preprocessor
This is the medspacy Preprocessor class. It is designed as a wrapper for destructive preprocessing rules such as stripping or replacing text in a document before the text enters a spaCy pipeline.
This is NOT a spaCy component and cannot be added to a spaCy pipeline. Please use the preprocessor before
calling nlp("your text here"). SpaCy only allows for non-destructive processing on the text, but that is not
always advisable for every project, so this enables destructive preprocessing when required.
Source code in medspacy/preprocess/preprocessor.py
section_detection
Section
Bases: object
Section is the object that stores the result of processing by the Sectionizer class. A Section contains information describing the section's category, title span, body span, parent, and the rule that created it.
Section category is equivalent to label_ in a basic spaCy entity. It is a normalized name for the section type, determined on initialization, either manually or through the Sectionizer pipeline component.
Section title, defined with title_start, title_end, and title_span, represents the section title or header matched with the rule. In the text "Past medical history: stroke and high blood pressure", "Past medical history:" would be the title.
Section body is defined with body_start, body_end, and body_span. It represents the text between the end of the current section's title and either the start of the next Section's title or the limit set by the scope in the rule or by the Sectionizer. In the text "Past medical history: stroke and high blood pressure", "stroke and high blood pressure" would be the body.
Parent is a string that represents the conceptual "parent" section in a section -> subsection -> subsubsection hierarchy. Candidates are determined by category in the rule and matched at runtime.
Source code in medspacy/section_detection/section.py
body_span
property
Gets the span of the section body.
Returns:
| Type | Description |
|---|---|
|  | A tuple (int, int) containing the start and end indexes of the section body. |
section_span
property
Gets the span of the entire section, from title start to body end.
Returns:
| Type | Description |
|---|---|
|  | A tuple (int, int) containing the start index of the section title and the end index of the section body. |
title_span
property
Gets the span of the section title.
Returns:
| Type | Description |
|---|---|
|  | A tuple (int, int) containing the start and end indexes of the section title. |
__init__(category, title_start, title_end, body_start, body_end, parent=None, rule=None)
Create a new Section object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| category | Union[str, None] | A normalized name for the section. Equivalent to label_ in a basic spaCy entity. | required |
| title_start | int | Index of the first token of the section title. | required |
| title_end | int | Index of the last token of the section title. | required |
| body_start | int | Index of the first token of the section body. | required |
| body_end | int | Index of the last token of the section body. | required |
| parent | Optional[str] | The category of the parent section. | None |
| rule | Optional[SectionRule] | The SectionRule that generated the section. | None |
Source code in medspacy/section_detection/section.py
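The three span properties are simple index pairs built from the constructor arguments. The whole section runs from the title start to the body end. The sketch below is a dataclass illustration, not the medspaCy class. The example uses exclusive end indexes so the pairs can serve directly as slice bounds; this is an assumption, so check how your version defines title_end and body_end.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Section:
    """Minimal sketch of the Section container (token indexes); not the medspaCy source."""
    category: Optional[str]
    title_start: int
    title_end: int
    body_start: int
    body_end: int
    parent: Optional[str] = None

    @property
    def title_span(self) -> Tuple[int, int]:
        return self.title_start, self.title_end

    @property
    def body_span(self) -> Tuple[int, int]:
        return self.body_start, self.body_end

    @property
    def section_span(self) -> Tuple[int, int]:
        # The whole section runs from the start of the title to the end of the body.
        return self.title_start, self.body_end


tokens = "Past medical history : stroke and high blood pressure".split()
pmh = Section("past_medical_history", title_start=0, title_end=4, body_start=4, body_end=9)
print(" ".join(tokens[slice(*pmh.body_span)]))  # stroke and high blood pressure
print(pmh.section_span)                         # (0, 9)
```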
from_serialized_representation(serialized_representation)
classmethod
Load the section from a json-serialized form.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| serialized_representation | Dict[str, str] | The dictionary form of the section object to load. | required |
Returns:
| Type | Description |
|---|---|
|  | A Section object containing the data from the dictionary provided. |
Source code in medspacy/section_detection/section.py
serialized_representation()
Serialize the Section.
Returns:
| Type | Description |
|---|---|
|  | A JSON-serialized representation of the section. |
Source code in medspacy/section_detection/section.py
SectionRule
Bases: BaseRule
SectionRule defines rules for matching section headers in text using the Sectionizer.
Source code in medspacy/section_detection/section_rule.py
__init__(literal, category, pattern=None, on_match=None, max_scope=None, parents=None, parent_required=False, metadata=None)
Class for defining rules for detecting sections in text using the Sectionizer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| literal | str | The string representation of a concept. If | required |
| category | str | The semantic class of the matched span. This corresponds to the | required |
| pattern | Optional[Union[List[Dict[str, str]], str]] | A list or string to use as a spaCy pattern rather than | None |
| on_match | Optional[Callable[[Matcher, Doc, int, List[Tuple[int, int, int]]], Any]] | An optional callback function or other callable which takes 4 arguments: a Matcher, a Doc, an int, and a list of (int, int, int) match tuples. | None |
| max_scope | Optional[int] | A number of tokens to explicitly limit the size of a section body. If None, the scope will include the entire doc up until either the next section header or the end of the doc. This limit can also be set globally as `Sectionizer(nlp, max_section_length=...)`, but if the attribute is set here, the rule scope takes precedence. If not None, this is the number of tokens following the matched section header. Example: In the text "Past Medical History: Pt has hx of pneumonia", SectionRule("Past Medical History:", "pmh", max_scope=None) will include the entire doc, but SectionRule("Past Medical History:", "pmh", max_scope=2) will limit the section to "Past Medical History: Pt has". This can be useful for limiting certain sections which are known to be short, or for allowing others to be longer than the global max_section_length. | None |
| parents | Optional[List[str]] | A list of candidate parents for determining subsections. | None |
| parent_required | bool | Whether a parent is required for the section to exist in the final output. If True and no parent is identified, the section will be removed. | False |
| metadata | Optional[Dict[Any, Any]] | Optional dictionary of any extra metadata. | None |
Source code in medspacy/section_detection/section_rule.py
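The max_scope example above can be reproduced with simple arithmetic. The section ends at whichever comes first: the next header, or `max_scope` tokens after the header. The sketch below assumes a crude regex tokenizer in place of spaCy's and a header starting at token 0. The helpers `tokenize` and `section_tokens` are hypothetical.

```python
import re
from typing import List, Optional


def tokenize(text: str) -> List[str]:
    # Crude word/punctuation tokenizer standing in for spaCy's.
    return re.findall(r"\w+|[^\w\s]", text)


def section_tokens(tokens: List[str], header_len: int, next_header: Optional[int] = None,
                   max_scope: Optional[int] = None) -> List[str]:
    """Return the tokens of a section whose header occupies tokens[0:header_len]."""
    end = len(tokens) if next_header is None else next_header
    if max_scope is not None:
        # max_scope counts tokens after the header, so the section is capped at header_len + max_scope.
        end = min(end, header_len + max_scope)
    return tokens[:end]


toks = tokenize("Past Medical History: Pt has hx of pneumonia")
print(" ".join(section_tokens(toks, header_len=4, max_scope=2)))  # Past Medical History : Pt has
```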
from_dict(rule_dict)
classmethod
Reads a dictionary into a SectionRule. Used when reading from a JSON file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rule_dict |  | The dictionary to convert. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| item |  | The SectionRule created from the dictionary. |
Source code in medspacy/section_detection/section_rule.py
from_json(filepath)
classmethod
Read in a list of SectionRules from a JSON file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath |  | The .json file containing section rules. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| section_rules | List[SectionRule] | A list of SectionRule objects. |
Source code in medspacy/section_detection/section_rule.py
to_dict()
Converts the SectionRule to a Python dictionary. Used when writing section rules to a JSON file.
Returns:
| Name | Type | Description |
|---|---|---|
| rule_dict |  | The dictionary containing the SectionRule info. |
Source code in medspacy/section_detection/section_rule.py
Sectionizer
The Sectionizer searches for spans in the text which match section header rules, such as 'Past Medical History:'. Sections are represented in custom attributes as:
    category: A normalized title of the section. Example: 'past_medical_history'
    section_title: The Span of the doc which was matched as a section header. Example: 'Past Medical History:'
    section_span: The entire section of the note, starting with section_header and continuing up to the end of the section, which is either the start of the next section header or some pre-specified scope. Example: 'Past Medical History: Type II DM'
Section attributes are registered for each Doc, Span, and Token in the following attributes:
    Doc._.sections: A list of namedtuples of type Section with 4 elements: section_title, section_header, section_parent, section_span.
    A Doc also has attributes corresponding to lists of each (i.e., Doc._.section_titles, Doc._.section_headers, Doc._.section_parents, Doc._.section_list).
    (Span|Token)._.section_title
    (Span|Token)._.section_header
    (Span|Token)._.section_parent
    (Span|Token)._.section_span
Source code in medspacy/section_detection/sectionizer.py
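The core idea is that each section runs from its header to the next header or the end of the text. The stdlib sketch below shows this using character offsets. `HEADER_RULES` is a hypothetical category-to-pattern mapping, not medspaCy's packaged rules. The sketch ignores text before the first header, overlapping matches, hierarchy, and max scope, all of which the real component handles through its rules and options.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical header rules: category -> header pattern (not medspaCy's packaged rules).
HEADER_RULES: Dict[str, str] = {
    "past_medical_history": r"past medical history:",
    "medications": r"medications:",
    "allergies": r"allergies:",
}


def sectionize(text: str, rules: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (category, title, body) triples; each body runs until the next header or end of text."""
    matches = []
    for category, pattern in rules.items():
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            matches.append((m.start(), m.end(), category))
    matches.sort()  # order headers by their position in the text
    sections = []
    for k, (start, end, category) in enumerate(matches):
        next_start = matches[k + 1][0] if k + 1 < len(matches) else len(text)
        sections.append((category, text[start:end], text[end:next_start].strip()))
    return sections


note = "Past Medical History: Type II DM Medications: metformin Allergies: none"
for category, title, body in sectionize(note, HEADER_RULES):
    print(category, "|", title, "|", body)
```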
input_span_type
property
writable
The input source of entities for the component. Must be either "ents", corresponding to doc.ents, or "group", for a spaCy span group.
Returns:
| Type | Description |
|---|---|
|  | The input type, "ents" or "group". |
rules
property
Gets list of rules associated with the Sectionizer.
Returns:
| Type | Description |
|---|---|
| List[SectionRule] | The list of SectionRules associated with the Sectionizer. |
section_categories
property
Gets the set of categories used in the Sectionizer.
Returns:
| Type | Description |
|---|---|
| Set[str] | The set of all section categories available to the Sectionizer. |
span_group_name
property
writable
The name of the span group used by this component. If input_span_type is "group", calling this component will use spans in the span group with this name.
Returns:
| Type | Description |
|---|---|
| str | The span group name. |
__call__(doc)
Call the Sectionizer on a spaCy doc. The Sectionizer identifies sections using the provided rules, evaluates any section hierarchy as needed, creates section spans, and sets attributes on existing spans based on the sections in which those entities fall.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| doc | Doc | The Doc to process. | required |
Returns:
| Type | Description |
|---|---|
| Doc | The processed spaCy Doc. |
Source code in medspacy/section_detection/sectionizer.py
__init__(nlp, name='medspacy_sectionizer', rules='default', language_code='en', max_section_length=None, phrase_matcher_attr='LOWER', require_start_line=False, require_end_line=False, newline_pattern='[\\n\\r]+[\\s]*$', input_span_type='ents', span_group_name='medspacy_spans', span_attrs='default', apply_sentence_boundary=False)
Create a new Sectionizer component.
Args:
nlp: A spaCy Language object.
name: The name of the component.
rules: The rules to load. The default, "default", loads the rules packaged with medspaCy, which are derived from
SecTag, MIMIC-III, and practical refinement at the US Department of Veterans Affairs. If None, no rules
are loaded. Otherwise, must be a path to a JSON file containing rules. SectionRules can also be added directly
through `Sectionizer.add`.
language_code: Language code to use (ISO code) as a default for loading resources. See documentation
and also the /resources directory to see which resources might be available in each language.
Default is "en" for English.
max_section_length: Optional argument specifying the maximum number of tokens following a section header
which can be included in a section body. This can be useful if you think your section rules are
incomplete and want to prevent sections from running too long in the note. Default is None, meaning that
the scope of a section will be until either the next section header or the end of the document.
phrase_matcher_attr: The token attribute to use for PhraseMatcher for rules where `pattern` is None. Default
is 'LOWER'.
require_start_line: Optionally require a section header to start on a new line. Default False.
require_end_line: Optionally require a section header to end with a new line. Default False.
newline_pattern: Regular expression to match the new line either preceding or following a header
if either require_start_line or require_end_line is True. Default is r"[\n\r]+[\s]*$".
span_attrs: The optional span attributes to modify. The default, "default", uses the attributes in
DEFAULT_ATTRIBUTES. To use custom attributes, pass a dictionary mapping each section category to a dictionary
of attribute names and the values to set when a span is contained in a section of that category. Custom
attributes must be registered with Span.set_extension before creating the Sectionizer. If None, the Sectionizer
will not modify span attributes.
input_span_type: "ents" or "group". Where to look for spans when modifying attributes of spans
contained in a section if span_attrs is not None. "ents" will modify attributes of spans in doc.ents.
"group" will modify attributes of spans in the span group specified by span_group_name.
span_group_name: The name of the span group used when input_span_type is "group". Default is
"medspacy_spans".
apply_sentence_boundary: Optionally set sentence boundaries immediately before and after each section header,
so that the header is treated as its own sentence. Default False.
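The nested span_attrs dictionary format is easier to see with a concrete example. The following plain-Python sketch assumes the shape "section category to {attribute name: value}" and shows how such a mapping could be applied to a span that falls inside a section. The category names, the attribute names, and the apply_span_attrs helper are hypothetical. SimpleNamespace stands in for a spaCy Span's `._` extension namespace. This is not medspaCy's implementation.

```python
from types import SimpleNamespace
from typing import Any, Dict, Optional

# Hypothetical span_attrs value: section category -> {attribute name: value}.
# Categories and attribute names are illustrative, not medspaCy's DEFAULT_ATTRIBUTES.
SPAN_ATTRS: Dict[str, Dict[str, Any]] = {
    "past_medical_history": {"is_historical": True},
    "family_history": {"is_family": True},
}


def apply_span_attrs(
    span_ext: Any, category: Optional[str], span_attrs: Dict[str, Dict[str, Any]]
) -> None:
    """Set every attribute configured for `category` on `span_ext`.

    With real spaCy objects, `span_ext` would be a Span's `._` namespace.
    Spans outside any section, or in an unconfigured category, are left unchanged.
    """
    if category is None or category not in span_attrs:
        return
    for attr_name, value in span_attrs[category].items():
        setattr(span_ext, attr_name, value)


# Stand-in for span._ holding the extensions' default values
entity_ext = SimpleNamespace(is_historical=False, is_family=False)
apply_span_attrs(entity_ext, "past_medical_history", SPAN_ATTRS)
print(entity_ext.is_historical, entity_ext.is_family)  # True False
```

As the docstring above notes, any custom attribute used this way must already be registered with Span.set_extension. Otherwise, setting it on a real span fails.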
Source code in medspacy/section_detection/sectionizer.py
add(rules)
Adds SectionRules to the Sectionizer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rules |  | A single SectionRule or a collection of SectionRules to add to the Sectionizer. | required |
Source code in medspacy/section_detection/sectionizer.py
filter_end_lines(doc, matches)
Filter a list of matches to only contain spans whose last token is followed by a new line.
Returns:
| Type | Description |
|---|---|
| List[Tuple[int, int, int]] | A list of match tuples (match_id, start, end) that meet the filter criteria. |
Source code in medspacy/section_detection/sectionizer.py
filter_start_lines(doc, matches)
Filter a list of matches to only contain spans where the start token is the beginning of a new line.
Returns:
| Type | Description |
|---|---|
| List[Tuple[int, int, int]] | A list of match tuples (match_id, start, end) that meet the filter criteria. |
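The two line filters, together with the line checks in the util module, can be sketched in plain Python. This is a minimal sketch with two assumptions. First, a list of token strings (each token's text_with_ws) stands in for a spaCy Doc. Second, a newline becomes its own whitespace token, which is what spaCy's tokenizer typically produces. The helper names mirror the documented methods, but their bodies are assumptions, not medspaCy's source.

```python
import re
from typing import List, Tuple

Match = Tuple[int, int, int]  # (match_id, start, end); end is exclusive, as in spaCy

# Default newline_pattern from the Sectionizer signature
NEWLINE_PATTERN = re.compile(r"[\n\r]+[\s]*$")


def is_start_line(idx: int, tokens: List[str]) -> bool:
    # The first token starts a line; otherwise the previous token must be a newline
    return idx == 0 or NEWLINE_PATTERN.search(tokens[idx - 1]) is not None


def is_end_line(idx: int, tokens: List[str]) -> bool:
    # The last token ends a line; otherwise the next token must be a newline
    return idx == len(tokens) - 1 or NEWLINE_PATTERN.search(tokens[idx + 1]) is not None


def filter_start_lines(tokens: List[str], matches: List[Match]) -> List[Match]:
    # Keep matches whose first token begins a line
    return [m for m in matches if is_start_line(m[1], tokens)]


def filter_end_lines(tokens: List[str], matches: List[Match]) -> List[Match]:
    # Keep matches whose last token (end - 1) ends a line
    return [m for m in matches if is_end_line(m[2] - 1, tokens)]


# Each string is one token's text_with_ws; "\n" is its own token
tokens = ["Pt ", "has ", "history", ": ", "none", "\n", "History", ":", "\n", "Stroke"]
matches = [(1, 2, 4), (1, 6, 8)]  # both spans match "history :"
print(filter_start_lines(tokens, matches))  # [(1, 6, 8)]
print(filter_end_lines(tokens, matches))    # [(1, 6, 8)]
```

Only the second "History:" sits on its own line, so it is the only match that survives both require_start_line and require_end_line.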
Source code in medspacy/section_detection/sectionizer.py
register_default_attributes()
classmethod
Register the default values for the Span attributes defined in DEFAULT_ATTRIBUTES.
Source code in medspacy/section_detection/sectionizer.py
set_assertion_attributes(spans)
Add Span-level attributes to entities based on which section they occur in.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| spans | Iterable[Span] | The spans to modify. | required |
Source code in medspacy/section_detection/sectionizer.py
set_parent_sections(sections)
Determine the legal parent-child section relationships from the in-order list of sections in a document and the possible parents of each section, as specified when the rules were created.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sections | List[Tuple[int, int, int]] | A list of spaCy match tuples found in the doc. | required |
Returns:
| Type | Description |
|---|---|
| List[Tuple[int, int, int, int]] | A list of tuples (match_id, start, end, parent_idx), where the first three elements are the same as in the input and the added parent_idx is the index in this list of the parent section. The list may be shorter than the input due to pruning with … |
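The parent assignment and pruning described above can be illustrated with a simplified sketch. This is not medspaCy's algorithm. It only attaches each section to the nearest preceding kept section whose category is one of its candidate parents. It drops a section when a parent is required but none is found. The dictionaries mapping match_id to category, candidate parents, and parent_required are hypothetical stand-ins for the rule lookup, and parent_idx is None when there is no parent.

```python
from typing import Dict, List, Optional, Tuple

Match = Tuple[int, int, int]                     # (match_id, start, end)
WithParent = Tuple[int, int, int, Optional[int]]


def set_parent_sections(
    sections: List[Match],
    categories: Dict[int, str],          # match_id -> section category
    parents: Dict[str, List[str]],       # category -> candidate parent categories
    parent_required: Dict[str, bool],    # category -> is a parent required?
) -> List[WithParent]:
    result: List[WithParent] = []
    for match_id, start, end in sections:
        category = categories[match_id]
        candidates = parents.get(category, [])
        parent_idx: Optional[int] = None
        # Look back through the sections kept so far for the nearest candidate parent
        for i in range(len(result) - 1, -1, -1):
            if categories[result[i][0]] in candidates:
                parent_idx = i
                break
        if parent_idx is None and parent_required.get(category, False):
            continue  # prune: this section needs a parent but none precedes it
        result.append((match_id, start, end, parent_idx))
    return result


categories = {0: "medications", 1: "inpatient_meds", 2: "allergies"}
parents = {"inpatient_meds": ["medications"]}
required = {"inpatient_meds": True}
sections = [(1, 0, 2), (0, 5, 7), (1, 10, 12), (2, 20, 21)]
print(set_parent_sections(sections, categories, parents, required))
# [(0, 5, 7, None), (1, 10, 12, 0), (2, 20, 21, None)]
```

The first "inpatient_meds" header is pruned because no "medications" section precedes it. Because pruning happens before a section is appended, each parent_idx points into the returned list, not the input list.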
Source code in medspacy/section_detection/sectionizer.py
section
Section
Bases: object
Section is the object that stores the result of processing by the Sectionizer class. A Section contains information describing the section's category, title span, body span, parent, and the rule that created it.
Section category is equivalent to label_ in a basic spaCy entity. It is a normalized name for the section type
determined on initialization, either created manually or through the Sectionizer pipeline component.
The section title, defined by title_start, title_end, and title_span, represents the section title or header
matched by the rule. In the text "Past medical history: stroke and high blood pressure", "Past medical history:"
would be the title.
The section body is defined by body_start, body_end, and body_span. It represents the text between the end of
the current section's title and either the start of the next section's title or the end of the scope set in the rule or by
the Sectionizer. In the text "Past medical history: stroke and high blood pressure", "stroke and high blood
pressure" would be the body.
Parent is a string that represents the conceptual "parent" section in a section->subsection->subsubsection hierarchy. Candidates are determined by category in the rule and matched at runtime.
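The relationship between titles and bodies can be sketched in plain Python. In this minimal sketch, each body starts where its title ends and runs until the next title starts or the document ends. An optional max_scope, mirroring the SectionRule parameter of that name, caps the body length. SimpleSection and build_sections are hypothetical. Token offsets use exclusive ends as in spaCy slicing, which is an assumption about how the Section indexes are read.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SimpleSection:
    """Hypothetical stand-in for Section; all offsets are token indexes, ends exclusive."""
    category: str
    title_start: int
    title_end: int
    body_start: int
    body_end: int


def build_sections(
    headers: List[Tuple[str, int, int]], doc_len: int, max_scope: Optional[int] = None
) -> List[SimpleSection]:
    """headers: (category, start, end) for each matched title, sorted by start."""
    sections = []
    for i, (category, start, end) in enumerate(headers):
        # The body runs until the next title starts, or to the end of the doc
        body_end = headers[i + 1][1] if i + 1 < len(headers) else doc_len
        if max_scope is not None:
            # Limit the body to max_scope tokens after the title
            body_end = min(body_end, end + max_scope)
        sections.append(SimpleSection(category, start, end, end, body_end))
    return sections


tokens = "Past Medical History : Pt has hx of pneumonia".split()
(pmh,) = build_sections([("past_medical_history", 0, 4)], len(tokens), max_scope=2)
print(" ".join(tokens[pmh.title_start:pmh.body_end]))  # Past Medical History : Pt has
```

This reproduces the max_scope=2 example given for SectionRule: the section covers the title plus the two tokens that follow it.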
Source code in medspacy/section_detection/section.py
body_span
property
Gets the span of the section body.
Returns:
| Type | Description |
|---|---|
|  | A tuple (int, int) containing the start and end indexes of the section body. |
section_span
property
Gets the span of the entire section, from title start to body end.
Returns:
| Type | Description |
|---|---|
|  | A tuple (int, int) containing the start index of the section title and the end index of the section body. |
title_span
property
Gets the span of the section title.
Returns:
| Type | Description |
|---|---|
|  | A tuple (int, int) containing the start and end indexes of the section title. |
__init__(category, title_start, title_end, body_start, body_end, parent=None, rule=None)
Create a new Section object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| category | Union[str, None] | A normalized name for the section. Equivalent to label_ in a basic spaCy entity. | required |
| title_start | int | Index of the first token of the section title. | required |
| title_end | int | Index of the last token of the section title. | required |
| body_start | int | Index of the first token of the section body. | required |
| body_end | int | Index of the last token of the section body. | required |
| parent | Optional[str] | The category of the parent section. | None |
| rule | Optional[SectionRule] | The SectionRule that generated the section. | None |
Source code in medspacy/section_detection/section.py
from_serialized_representation(serialized_representation)
classmethod
Load a Section from its JSON-serialized form.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| serialized_representation | Dict[str, str] | The dictionary form of the Section object to load. | required |
Returns:
| Type | Description |
|---|---|
|  | A Section object containing the data from the provided dictionary. |
Source code in medspacy/section_detection/section.py
serialized_representation()
Serialize the Section.
Returns:
| Type | Description |
|---|---|
|  | A JSON-serialized representation of the section. |
Source code in medspacy/section_detection/section.py
section_rule
SectionRule
Bases: BaseRule
SectionRule defines rules for identifying sections in text using the Sectionizer.
Source code in medspacy/section_detection/section_rule.py
__init__(literal, category, pattern=None, on_match=None, max_scope=None, parents=None, parent_required=False, metadata=None)
Class for defining rules for identifying sections in text using the Sectionizer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| literal | str | The string representation of a concept. If … | required |
| category | str | The semantic class of the matched span. This corresponds to the … | required |
| pattern | Optional[Union[List[Dict[str, str]], str]] | A list or string to use as a spaCy pattern rather than … | None |
| on_match | Optional[Callable[[Matcher, Doc, int, List[Tuple[int, int, int]]], Any]] | An optional callback function or other callable which takes 4 arguments: (matcher, doc, i, matches). | None |
| max_scope | Optional[int] | A number of tokens to explicitly limit the size of a section body. If None, the scope will include the entire doc up until either the next section header or the end of the doc. The limit can also be set globally with Sectionizer(nlp, max_section_length=...), but if it is set here, the rule's scope takes precedence. If not None, this is the number of tokens following the matched section header. Example: in the text "Past Medical History: Pt has hx of pneumonia", SectionRule("Past Medical History:", "pmh", max_scope=None) will include the entire doc, but SectionRule("Past Medical History:", "pmh", max_scope=2) will limit the section to "Past Medical History: Pt has". This can be useful for limiting certain sections that are known to be short, or for allowing others to be longer than the global limit. | None |
| parents | Optional[List[str]] | A list of candidate parent categories for determining subsections. | None |
| parent_required | bool | Whether a parent is required for the section to exist in the final output. If True and no parent is identified, the section will be removed. | False |
| metadata | Optional[Dict[Any, Any]] | Optional dictionary of any extra metadata. | None |
Source code in medspacy/section_detection/section_rule.py
from_dict(rule_dict)
classmethod
Reads a dictionary into a SectionRule. Used when reading from a JSON file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rule_dict |  | The dictionary to convert. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| item |  | The SectionRule created from the dictionary. |
Source code in medspacy/section_detection/section_rule.py
from_json(filepath)
classmethod
Read a list of SectionRules from a JSON file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath |  | The .json file containing section rules. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| section_rules | List[SectionRule] | A list of SectionRule objects. |
Source code in medspacy/section_detection/section_rule.py
to_dict()
Converts the SectionRule to a Python dictionary. Used when writing section rules to a JSON file.
Returns:
| Name | Type | Description |
|---|---|---|
| rule_dict |  | The dictionary containing the SectionRule info. |
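The to_dict / from_dict / from_json trio implies a dictionary round trip through JSON. The sketch below shows that pattern with a hypothetical SectionRuleSketch dataclass that mirrors the documented constructor fields. It is not the real SectionRule class. The top-level "section_rules" key is an assumption about the file layout, and on_match is omitted because callables cannot be serialized to JSON.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class SectionRuleSketch:
    """Hypothetical mirror of SectionRule's constructor fields.

    on_match is left out because callables cannot be written to JSON.
    """
    literal: str
    category: str
    pattern: Optional[Any] = None
    max_scope: Optional[int] = None
    parents: Optional[List[str]] = None
    parent_required: bool = False
    metadata: Optional[Dict[str, Any]] = None

    def to_dict(self) -> Dict[str, Any]:
        # Omit unset (None) fields to keep the JSON compact
        return {k: v for k, v in asdict(self).items() if v is not None}

    @classmethod
    def from_dict(cls, rule_dict: Dict[str, Any]) -> "SectionRuleSketch":
        # Missing keys fall back to the dataclass defaults
        return cls(**rule_dict)


rules = [
    SectionRuleSketch("Past Medical History:", "past_medical_history"),
    SectionRuleSketch("Inpatient:", "inpatient_meds", parents=["medications"], parent_required=True),
]
# Assumed file layout: {"section_rules": [rule_dict, ...]}
payload = json.dumps({"section_rules": [r.to_dict() for r in rules]}, indent=2)
loaded = [SectionRuleSketch.from_dict(d) for d in json.loads(payload)["section_rules"]]
print(loaded == rules)  # True
```

Dropping None-valued fields keeps hand-edited rule files short. Defaults are restored on load, so the round trip is lossless for these fields.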
Source code in medspacy/section_detection/section_rule.py
sectionizer
Sectionizer
The Sectionizer searches for spans in the text that match section header rules, such as 'Past Medical History:'. Sections are represented in custom attributes as:
category: A normalized title of the section. Example: 'past_medical_history'
section_title: The Span of the doc that was matched as a section header. Example: 'Past Medical History:'
section_span: The entire section of the note, starting with section_header and running until the end of the section, which is either the start of the next section header or some pre-specified scope. Example: 'Past Medical History: Type II DM'
Section attributes will be registered for each Doc, Span, and Token in the following attributes:
Doc._.sections: A list of namedtuples of type Section with 4 elements: section_title, section_header, section_parent, and section_span. A Doc will also have attributes corresponding to lists of each (i.e., Doc._.section_titles, Doc._.section_headers, Doc._.section_parents, Doc._.section_list).
(Span|Token)._.section_title, (Span|Token)._.section_header, (Span|Token)._.section_parent, (Span|Token)._.section_span
Source code in medspacy/section_detection/sectionizer.py
util
This module contains helper functions and classes for common clinical processing tasks used by medspaCy's Sectionizer.
is_end_line(idx, doc, pattern)
Check whether the token at idx occurs at the end of the line.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| idx | int | The token index to check. | required |
| doc | Doc | The doc to check in. | required |
| pattern | Pattern | The newline pattern to check with. | required |
Returns:
| Type | Description |
|---|---|
| bool | Whether the token occurs at the end of a line. |
Source code in medspacy/section_detection/util.py
is_start_line(idx, doc, pattern)
Check whether the token at idx occurs at the start of the line.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| idx | int | The token index to check. | required |
| doc | Doc | The doc to check in. | required |
| pattern | Pattern | The newline pattern to check with. | required |
Returns:
| Type | Description |
|---|---|
| bool | Whether the token occurs at the start of a line. |
Source code in medspacy/section_detection/util.py
sentence_splitting
PySBDSentenceSplitter
Source code in medspacy/sentence_splitting.py
__call__(doc)
spaCy component based on pySBD (https://github.com/nipunsadvilkar/pySBD), updated to work with spaCy 3.0.
Source code in medspacy/sentence_splitting.py
target_matcher
concept_tagger
ConceptTagger
ConceptTagger is a component for setting an attribute on tokens contained in spans extracted by TargetRules. This can be used for tasks such as semantic labeling or for normalizing tokens, making downstream extraction simpler.
A common use case is when a single concept can have many synonyms or variants and downstream rules would be simplified by matching on a unified token tag for those synonyms rather than including the entire synonym list in each downstream rule.
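The synonym-normalization idea can be sketched without spaCy. In this plain-Python sketch, every token covered by any synonym phrase receives the same tag, so downstream rules can match one tag instead of the whole synonym list. The concept_tag function, the synonym table, and the longest-phrase-first policy are assumptions for illustration. The real component matches TargetRules and stores the tag in the token attribute named by attr_name (default concept_tag) instead of returning a list.

```python
from typing import Dict, List, Optional


def concept_tag(tokens: List[str], rules: Dict[str, str]) -> List[Optional[str]]:
    """Return one tag per token; tokens inside a matched phrase get that phrase's tag.

    rules maps a lower-cased, space-separated phrase to a concept tag.
    Longer phrases are tried first, and tokens are tagged at most once.
    """
    lowered = [t.lower() for t in tokens]
    tags: List[Optional[str]] = [None] * len(tokens)
    phrases = sorted(((p.split(), tag) for p, tag in rules.items()), key=lambda pt: -len(pt[0]))
    for phrase, tag in phrases:
        n = len(phrase)
        for i in range(len(tokens) - n + 1):
            window = range(i, i + n)
            # Only tag windows that match and are not already covered by a longer phrase
            if lowered[i:i + n] == phrase and all(tags[j] is None for j in window):
                for j in window:
                    tags[j] = tag
    return tags


synonyms = {"heart attack": "mi", "myocardial infarction": "mi", "mi": "mi"}
tokens = "Pt had a heart attack and an MI last year".split()
print(concept_tag(tokens, synonyms))
# [None, None, None, 'mi', 'mi', None, None, 'mi', None, None]
```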
Source code in medspacy/target_matcher/concept_tagger.py
attr_name
property
The name of the attribute that will be set on each matched token.
Returns:
| Type | Description |
|---|---|
| str | The attribute name. |
__call__(doc)
Call ConceptTagger on a doc. Matches spans and assigns attributes to all tokens contained in those spans, but does not preserve the spans themselves.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
doc
|
Doc
|
The spaCy Doc to process. |
required |
Returns:
| Type | Description |
|---|---|
Doc
|
The spaCy Doc processed. |
Source code in medspacy/target_matcher/concept_tagger.py
__init__(nlp, name='medspacy_concept_tagger', attr_name='concept_tag')
Creates a new ConceptTagger.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| nlp | Language | A spaCy Language model. | required |
| name | str | The name of the ConceptTagger component. Must be a valid Python variable name. | 'medspacy_concept_tagger' |
| attr_name | str | The name of the attribute to set on tokens. | 'concept_tag' |
Source code in medspacy/target_matcher/concept_tagger.py
add(rules)
Adds a single TargetRule or a list of TargetRules to the ConceptTagger.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
rules
|
Union[TargetRule, List[TargetRule]]
|
A single TargetRule or a collection of TargetRules. |
required |
Source code in medspacy/target_matcher/concept_tagger.py
target_matcher
TargetMatcher
TargetMatcher is a component for advanced rule-based text extraction. Rules are defined using
medspacy.target_matcher.TargetRule.
A TargetMatcher will use the added TargetRule objects to identify matches in the text and apply labels or modify
attributes. It will either modify the input spaCy Doc with the result or return the spans as a list.
In addition to extracting spans of text and setting labels, TargetRules can also define setting custom attributes and metadata. Additionally, each resulting span has an attribute span._.target_rule which maps a span to the TargetRule which set it.
Source code in medspacy/target_matcher/target_matcher.py
labels
property
Gets the set of labels for the TargetMatcher, based on the rules added to it.
Returns:
| Type | Description |
|---|---|
| Set[str] | The set of all labels that the TargetMatcher can produce. |
result_type
property
writable
The result type of the TargetMatcher. "ents" indicates that calling TargetMatcher will store the results in
doc.ents, "group" indicates that the results will be stored in the span group indicated by span_group_name,
and None indicates that spans will be returned in a list.
Returns:
| Type | Description |
|---|---|
| Union[str, None] | The result type string. |
rules
property
Gets the list of TargetRules for the TargetMatcher.
Returns:
| Type | Description |
|---|---|
| List[TargetRule] | A list of TargetRules. |
span_group_name
property
writable
The name of the span group used by this component. If result_type is "group", calling this component will
place results in the span group with this name.
Returns:
| Type | Description |
|---|---|
| str | The span group name. |
__call__(doc)
Calls TargetMatcher on a Doc. By default and when result_type is "ents", adds results to doc.ents. If
result_type is "group", adds results to the span group specified by span_group_name. If result_type is
None, then returns a list of the matched Spans.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| doc | Doc | The spaCy Doc to process. | required |
Returns:
| Type | Description |
|---|---|
| Union[Doc, List[Span]] | The modified Doc if result_type is "ents" or "group", or a list of the matched Spans if result_type is None. |
Source code in medspacy/target_matcher/target_matcher.py
__init__(nlp, name='medspacy_target_matcher', rules=None, phrase_matcher_attr='LOWER', result_type='ents', span_group_name='medspacy_spans', prune=True)
Creates a new TargetMatcher.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| nlp | Language | A spaCy Language model. | required |
| name | str | The name of the TargetMatcher component. | 'medspacy_target_matcher' |
| rules | Optional[str] | An optional path to a JSON file of TargetRules. If None, no rules will be added. Default None. | None |
| phrase_matcher_attr | str | The token attribute to use for the PhraseMatcher for rules where pattern is None. | 'LOWER' |
| result_type | Union[Literal['ents', 'group'], None] | "ents" (default), "group", or None. Determines where TargetMatcher puts the matched spans. "ents" adds spans to doc.ents alongside any existing entities; if conflicts appear, existing entities take precedence. "group" adds spans to doc.spans under the specified group name. None returns the list of spans rather than saving them to the Doc. | 'ents' |
| span_group_name | str | The name of the span group used to store results when result_type is "group". Default is "medspacy_spans". | 'medspacy_spans' |
Source code in medspacy/target_matcher/target_matcher.py
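To make the three result_type modes concrete, here is a minimal plain-Python sketch of where matches end up. It is not medspaCy's implementation: spans are modeled as hypothetical (start, end, label) tuples, doc.ents as a list and doc.spans as a dict, and the helper names store_results and overlaps are invented for illustration. The conflict rule follows the description above, where existing entities win over new matches.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for a spaCy Span: (start_token, end_token, label).
Span = Tuple[int, int, str]


def overlaps(a: Span, b: Span) -> bool:
    # End-exclusive token ranges overlap when each starts before the other ends.
    return a[0] < b[1] and b[0] < a[1]


def store_results(
    matches: List[Span],
    ents: List[Span],
    span_groups: Dict[str, List[Span]],
    result_type: Optional[str] = "ents",
    span_group_name: str = "medspacy_spans",
) -> Optional[List[Span]]:
    """Route matches the way result_type is documented to (sketch only)."""
    if result_type is None:
        # None: nothing is written to the "doc"; the matches are returned.
        return list(matches)
    if result_type == "group":
        # "group": append to the named group; spaCy span groups may overlap.
        span_groups.setdefault(span_group_name, []).extend(matches)
        return None
    if result_type == "ents":
        # "ents": keep existing entities and add only non-conflicting matches.
        # (In this sketch, earlier matches also block later overlapping ones.)
        kept = list(ents)
        for match in matches:
            if not any(overlaps(match, other) for other in kept):
                kept.append(match)
        ents[:] = sorted(kept)
        return None
    raise ValueError(f"Unknown result_type: {result_type!r}")
```

In the real component the "ents" and "group" branches hand back the modified Doc rather than None.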
add(rules)
Adds a single TargetRule or a list of TargetRules to the TargetMatcher.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rules | Union[TargetRule, Iterable[TargetRule]] | A single TargetRule or a collection of TargetRules. | required |
Source code in medspacy/target_matcher/target_matcher.py
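The add() contract (one rule or an iterable of rules) and the labels property can be sketched with a toy class. ToyMatcher and the Rule namedtuple are hypothetical; they keep only the literal and category fields needed to show that labels is the set of distinct rule categories.

```python
from collections import namedtuple
from typing import Iterable, List, Set, Union

# Hypothetical stand-in carrying only the fields this sketch needs.
Rule = namedtuple("Rule", ["literal", "category"])


class ToyMatcher:
    def __init__(self) -> None:
        self._rules: List[Rule] = []

    def add(self, rules: Union[Rule, Iterable[Rule]]) -> None:
        # Wrap a single rule so both call styles share one code path.
        if isinstance(rules, Rule):
            rules = [rules]
        for rule in rules:
            if not isinstance(rule, Rule):
                raise TypeError(f"Expected a Rule, got {type(rule).__name__}")
            self._rules.append(rule)

    @property
    def rules(self) -> List[Rule]:
        # Return a copy so callers cannot mutate the internal list.
        return list(self._rules)

    @property
    def labels(self) -> Set[str]:
        # One label per distinct rule category.
        return {rule.category for rule in self._rules}
```

The single-rule check has to come before iteration here because a namedtuple is itself iterable.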
target_rule
TargetRule
Bases: BaseRule
TargetRule defines rules for extracting entities from text using the TargetMatcher.
Source code in medspacy/target_matcher/target_rule.py
__init__(literal, category, pattern=None, on_match=None, attributes=None, metadata=None)
Creates a new TargetRule.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| literal | str | The string representation of a concept. If pattern is None, this string is used for matching. | required |
| category | str | The semantic class of the matched span. This corresponds to the label_ attribute of the resulting span. | required |
| pattern | Optional[Union[List[Dict[str, str]], str]] | A list or string to use as a spaCy pattern rather than literal. | None |
| on_match | Optional[Callable[[Matcher, Doc, int, List[Tuple[int, int, int]]], Any]] | An optional callback function or other callable which takes 4 arguments: (matcher, doc, i, matches), as in spaCy Matcher on_match callbacks. | None |
| attributes | Optional[Dict[str, Any]] | Optional custom attribute names to set for a Span matched by the rule. These attribute names are stored under Span._.[attribute_name]. For example, if attributes={"is_historical": True}, any span matched by this rule will have span._.is_historical set to True. | None |
| metadata | Optional[Dict[Any, Any]] | Optional dictionary of any extra metadata. | None |
Source code in medspacy/target_matcher/target_rule.py
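How a literal rule (pattern=None) is matched case-insensitively can be sketched as follows. This is a hypothetical toy, assuming whitespace tokenization and the default phrase_matcher_attr='LOWER'; medspaCy itself relies on spaCy's PhraseMatcher and its own tokenizer. The names ToyRule, ToyMatch and match_literal are invented. The rule's category becomes the match label and its attributes are copied onto each match, mirroring the parameter descriptions above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ToyRule:
    # Hypothetical stand-in for TargetRule(literal, category, attributes=...).
    literal: str
    category: str
    attributes: Optional[Dict[str, Any]] = None


@dataclass
class ToyMatch:
    start: int  # index of the first matched token
    end: int  # index one past the last matched token
    label: str  # copied from rule.category
    ext: Dict[str, Any] = field(default_factory=dict)  # plays the role of span._


def match_literal(text: str, rule: ToyRule) -> List[ToyMatch]:
    """Find every case-insensitive occurrence of rule.literal (sketch only)."""
    # Compare lower-cased tokens, as phrase_matcher_attr='LOWER' implies.
    tokens = [tok.lower() for tok in text.split()]
    target = rule.literal.lower().split()
    if not target:
        return []
    n = len(target)
    matches = []
    for i in range(len(tokens) - n + 1):
        if tokens[i : i + n] == target:
            # Each match gets its own copy of the rule's attributes.
            matches.append(ToyMatch(i, i + n, rule.category, dict(rule.attributes or {})))
    return matches
```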
from_dict(rule_dict)
classmethod
Reads a dictionary into a TargetRule. Used when reading from a JSON file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rule_dict | Dict | The dictionary to convert. | required |
Returns:
| Type | Description |
|---|---|
| TargetRule | The TargetRule created from the dictionary. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the JSON is invalid. |
Source code in medspacy/target_matcher/target_rule.py
from_json(filepath)
classmethod
Reads a list of TargetRules from a JSON file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | str | The .json file containing target rules. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| context_item | List[TargetRule] | A list of TargetRule objects. |
Raises:
| Type | Description |
|---|---|
| KeyError | If the dictionary contains any keys other than those accepted by TargetRule.__init__. |
Source code in medspacy/target_matcher/target_rule.py
to_dict()
Converts a TargetRule to a Python dictionary. Used when writing target rules to a JSON file.
Returns:
| Type | Description |
|---|---|
| dict | The dictionary containing the TargetRule info. |
Source code in medspacy/target_matcher/target_rule.py
to_json(target_rules, filepath)
classmethod
Writes TargetRules to a JSON file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| target_rules | List[TargetRule] | A list of TargetRules that will be written to a file. | required |
| filepath | str | The .json file to contain the target rules. | required |
Source code in medspacy/target_matcher/target_rule.py
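The four serialization methods above form a round trip: to_dict/to_json write rules out and from_dict/from_json read them back. The sketch below mimics that round trip with a hypothetical ToyTargetRule class. The top-level "target_rules" key, omitting unset fields, and raising ValueError on unknown keys are assumptions of this sketch, not a description of medspaCy's file format; on_match is left out because callables cannot be stored in JSON.

```python
import json
import os
import tempfile
from typing import Any, Dict, List, Optional


class ToyTargetRule:
    # Hypothetical, JSON-friendly subset of TargetRule's constructor arguments.
    FIELDS = ("literal", "category", "pattern", "attributes", "metadata")

    def __init__(
        self,
        literal: str,
        category: str,
        pattern: Optional[Any] = None,
        attributes: Optional[Dict[str, Any]] = None,
        metadata: Optional[Dict[Any, Any]] = None,
    ) -> None:
        self.literal = literal
        self.category = category
        self.pattern = pattern
        self.attributes = attributes
        self.metadata = metadata

    def to_dict(self) -> Dict[str, Any]:
        # Skip fields left as None so the JSON stays compact.
        return {f: getattr(self, f) for f in self.FIELDS if getattr(self, f) is not None}

    @classmethod
    def from_dict(cls, rule_dict: Dict[str, Any]) -> "ToyTargetRule":
        # Reject keys the constructor would not accept.
        unknown = set(rule_dict) - set(cls.FIELDS)
        if unknown:
            raise ValueError(f"Unexpected keys: {sorted(unknown)}")
        return cls(**rule_dict)

    @classmethod
    def to_json(cls, target_rules: List["ToyTargetRule"], filepath: str) -> None:
        with open(filepath, "w") as f:
            json.dump({"target_rules": [r.to_dict() for r in target_rules]}, f, indent=2)

    @classmethod
    def from_json(cls, filepath: str) -> List["ToyTargetRule"]:
        with open(filepath) as f:
            data = json.load(f)
        return [cls.from_dict(d) for d in data["target_rules"]]


def round_trip(rules: List[ToyTargetRule]) -> List[ToyTargetRule]:
    # Write the rules to a temporary file and read them back before cleanup.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "target_rules.json")
        ToyTargetRule.to_json(rules, path)
        return ToyTargetRule.from_json(path)
```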
util
This module contains helper functions and classes for common clinical processing tasks that are used across many medspaCy components.
_build_pipe_names(enable, disable=None)
Builds the sets of enabled and disabled pipeline components from the names given in enable and disable. If enable and disable are both None, the default pipe names are loaded. Otherwise, components can be selected individually.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| enable | Union[str, Iterable[str]] | "all" loads components from ALL_PIPE_NAMES. "default" loads components from DEFAULT_PIPE_NAMES. Otherwise, the given names are loaded as the list of components. | required |
| disable | Optional[Iterable[str]] | The optional list of components to disable. These are removed (set difference) from the components selected by enable. | None |
Returns:
| Type | Description |
|---|---|
| Tuple[Set[str], Set[str]] | The sets of enabled and disabled components. Together they list every component, and their intersection is empty. |
Source code in medspacy/util.py
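The enable/disable resolution described above amounts to a few set operations. The sketch below is hypothetical: build_pipe_names is an invented name, the two name sets only stand in for medspaCy's DEFAULT_PIPE_NAMES and ALL_PIPE_NAMES (their real contents may differ), and treating any other single string as one component name is an assumption.

```python
from typing import Iterable, Optional, Set, Tuple, Union

# Hypothetical stand-ins; medspaCy defines its own DEFAULT_PIPE_NAMES and ALL_PIPE_NAMES.
DEFAULT_PIPE_NAMES = {
    "medspacy_tokenizer",
    "medspacy_pyrush",
    "medspacy_target_matcher",
    "medspacy_context",
}
ALL_PIPE_NAMES = DEFAULT_PIPE_NAMES | {"medspacy_sectionizer", "medspacy_quickumls"}


def build_pipe_names(
    enable: Union[str, Iterable[str], None],
    disable: Optional[Iterable[str]] = None,
) -> Tuple[Set[str], Set[str]]:
    """Resolve enable/disable into (enabled, disabled) sets (sketch only)."""
    if enable is None or enable == "default":
        enabled = set(DEFAULT_PIPE_NAMES)
    elif enable == "all":
        enabled = set(ALL_PIPE_NAMES)
    elif isinstance(enable, str):
        # Assumption: any other string names a single component.
        enabled = {enable}
    else:
        enabled = set(enable)
    # Explicitly disabled components are removed with a set difference.
    enabled -= set(disable or ())
    # Every known component that is not enabled counts as disabled,
    # so the two sets never intersect.
    disabled = ALL_PIPE_NAMES - enabled
    return enabled, disabled
```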
load(model='default', medspacy_enable='default', medspacy_disable=None, language_code='en', load_rules=True, quickumls_path=None, **model_kwargs)
Load a spaCy language object with medSpaCy pipeline components.
By default, the base model will be a blank 'en' model with the
following components:
- "medspacy_tokenizer": A customized, more aggressive tokenizer than the default spaCy tokenizer. This is set to
nlp.tokenizer and is not loaded as a pipeline component.
- "medspacy_pyrush": PyRuSH Sentencizer for sentence splitting
- "medspacy_target_matcher": TargetMatcher for extended pattern matching
- "medspacy_context": ConText for attribute assertion
- "medspacy_quickumls": QuickUMLS for UMLS concept mapping
Args:
model: The base spaCy model to load. If 'default', will instantiate from a blank 'en' model. If it is a spaCy
language model, then it will simply add medspaCy components to the existing pipeline. If it is a string
other than 'default', passes the string to spacy.load(model, **model_kwargs).
medspacy_enable: Specifies which components to enable in the medspacy pipeline. If "default", will load all components
found in DEFAULT_PIPE_NAMES. These represent the simplest components used in a clinical NLP pipeline:
tokenization, sentence detection, concept identification, and ConText. If "all", all components in medspaCy
will be loaded. If a collection of strings, the components specified will be loaded.
medspacy_disable: A collection of component names to exclude. Requires that medspacy_enable is "all".
language_code: Language code to use (ISO code) as a default for loading additional resources. See documentation
and also the /resources directory to see which resources might be available in each language.
Default is "en" for English.
load_rules: Whether to include default rules for available components. If True, sectionizer and context will
both be loaded with default rules. Default is True.
quickumls_path: Path to QuickUMLS dictionaries if it is included in the pipeline.
model_kwargs: Optional model keyword arguments to pass to spacy.load().
Returns:
| Type | Description |
|---|---|
| Language | A spaCy Language object containing the specified medspacy components. |
Source code in medspacy/util.py
tuple_overlaps(a, b)
Calculates whether two tuples overlap. Assumes the tuples are ordered like spans, i.e. (start, end).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| a | Tuple[int, int] | A tuple representing a span (start, end). | required |
| b | Tuple[int, int] | A tuple representing a span (start, end). | required |
Returns:
| Type | Description |
|---|---|
| bool | Whether the tuples overlap. |
Source code in medspacy/util.py
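A sketch of the overlap test, assuming end-exclusive (start, end) pairs like spaCy's token offsets. The function name half_open_overlaps is hypothetical, and medspaCy's own boundary handling may differ from this version.

```python
from typing import Tuple


def half_open_overlaps(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """Return True if (start, end) ranges a and b share at least one position."""
    # With end-exclusive ranges, a and b overlap exactly when each starts
    # before the other ends; ranges that merely touch do not overlap.
    return a[0] < b[1] and b[0] < a[1]
```

The two comparisons also cover containment, such as (2, 3) inside (0, 10), without special-casing it.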
visualization
MedspaCyVisualizerWidget
Source code in medspacy/visualization.py
__init__(docs, target_span_type='ents', span_group_name='medspacy_spans')
Create an IPython widget Box displaying medspaCy's visualizers. The widget provides a selector for the visualization style ("Ent", "Dep", or "Both") and a slider for choosing the index of the doc to display.
For more information on IPython widgets, see: https://ipywidgets.readthedocs.io/en/latest/index.html
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| docs |  | A list of docs processed by a medspaCy pipeline. | required |
Source code in medspacy/visualization.py
display()
Display the Box widget in the current IPython cell.
Source code in medspacy/visualization.py
set_docs(docs)
Replace the list of docs to be visualized.
Source code in medspacy/visualization.py
_create_color_generator()
Create a generator which cycles through a list of default matplotlib colors.
Source code in medspacy/visualization.py
visualize_dep(doc, jupyter=True)
Create a dependency-style visualization for ConText targets and modifiers in doc. This shows the relationships between entities in doc and contextual modifiers.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| doc | Doc | The spaCy Doc to visualize. | required |
| jupyter | bool | Whether it is being rendered in a Jupyter notebook. | True |
Returns:
| Type | Description |
|---|---|
| str | The visualization. |
Source code in medspacy/visualization.py
visualize_ent(doc, context=True, sections=True, jupyter=True, colors=None, target_span_type='ents', span_group_name='medspacy_spans')
Creates a NER-style visualization for targets and modifiers in Doc.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| doc | Doc | A spaCy Doc to visualize. | required |
| context | bool | Whether to display the modifiers generated by medspaCy's cycontext. If the doc has not been processed by context, this will be automatically changed to False. Default True. | True |
| sections | bool | Whether to display the section titles generated by medspaCy's sectionizer (still in development). If the doc has not been processed by the sectionizer, this will be automatically changed to False. This may also overlap with cycontext, in which case duplicate spans will be displayed. Default True. | True |
| jupyter | bool | If True, will render directly in a Jupyter notebook. If False, will return the HTML. Default True. | True |
| colors | Dict[str, str] | An optional dictionary which maps labels of targets and modifiers to color strings to be rendered. If None, will create a generator which cycles through the default matplotlib colors for ent and modifier labels and uses a light gray for section headers. Default None. | None |
Returns:
| Type | Description |
|---|---|
| str | The visualization. |
Source code in medspacy/visualization.py
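When colors is None, visualize_ent builds its own label-to-color mapping, with _create_color_generator supplying the cycling colors. A plain-Python sketch of that idea follows. build_colors, color_generator and the "lightgray" section color are hypothetical choices; the hex codes are matplotlib's default tab10 cycle, hard-coded so the sketch needs no matplotlib. The result has the same shape as the colors argument, so a mapping like this could also be passed in explicitly.

```python
from itertools import cycle
from typing import Dict, Iterable, Iterator, Optional

# Matplotlib's default "tab10" color cycle.
TAB10 = [
    "#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd",
    "#8c564b", "#e377c2", "#7f7f7f", "#bcbd22", "#17becf",
]


def color_generator() -> Iterator[str]:
    # Endless iterator that starts over after the tenth color.
    return cycle(TAB10)


def build_colors(
    labels: Iterable[str],
    section_labels: Iterable[str] = (),
    colors: Optional[Dict[str, str]] = None,
    section_color: str = "lightgray",
) -> Dict[str, str]:
    """Map each label to a color, keeping any user-supplied colors (sketch only)."""
    result = dict(colors or {})
    gen = color_generator()
    for label in labels:
        # Only labels without a user-chosen color consume the cycle.
        if label not in result:
            result[label] = next(gen)
    for label in section_labels:
        # Section headers share one neutral color.
        result.setdefault(label, section_color)
    return result
```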