medspacy.section_detection.section
Section
Bases: object
Section is the object that stores the result of processing by the Sectionizer class. A Section contains information describing the section's category, title span, body span, parent, and the rule that created it.
Section category is equivalent to label_ in a basic spaCy entity. It is a normalized name for the section type, determined on initialization, either manually or through the Sectionizer pipeline component.
Section title, defined with title_start, title_end, and title_span, represents the section title or header matched by the rule. In the text "Past medical history: stroke and high blood pressure", "Past medical history:" would be the title.
Section body is defined with body_start, body_end, and body_span. It represents the text from the end of the current section's title up to the start of the next section's title (or the end of the document, for the last section), unless a scope set in the rule or by the Sectionizer ends it sooner. In the text "Past medical history: stroke and high blood pressure", "stroke and high blood pressure" would be the body.
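The boundary rule described above can be sketched in plain Python. This is not medspacy's implementation: split_sections, the (category, title_start, title_end) match tuples, and the scope argument (read here as a maximum body length in tokens) are hypothetical. Following the __init__ parameter table on this page, title_end and body_end are treated as the index of the last token, so the ranges are inclusive.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical title match: (category, title_start, title_end), with an inclusive end index.
Match = Tuple[str, int, int]


def split_sections(
    n_tokens: int, matches: List[Match], scope: Optional[int] = None
) -> List[Dict[str, object]]:
    """Hypothetical sketch: give each title a body that ends just before the
    next title, at the last token of the document, or after `scope` tokens."""
    ordered = sorted(matches, key=lambda m: m[1])
    sections = []
    for i, (category, title_start, title_end) in enumerate(ordered):
        body_start = title_end + 1
        if i + 1 < len(ordered):
            # Stop just before the next section's title begins.
            body_end = ordered[i + 1][1] - 1
        else:
            # The last section runs to the final token of the document.
            body_end = n_tokens - 1
        if scope is not None:
            # A scope caps the body at `scope` tokens.
            body_end = min(body_end, body_start + scope - 1)
        sections.append(
            {
                "category": category,
                "title_span": (title_start, title_end),
                "body_span": (body_start, body_end),
            }
        )
    return sections


tokens = "Past medical history : stroke and high blood pressure".split()
for sec in split_sections(len(tokens), [("past_medical_history", 0, 3)]):
    t0, t1 = sec["title_span"]
    b0, b1 = sec["body_span"]
    print(sec["category"], "|", " ".join(tokens[t0 : t1 + 1]), "|", " ".join(tokens[b0 : b1 + 1]))
# past_medical_history | Past medical history : | stroke and high blood pressure
```

The text is pre-split on whitespace (with the colon as its own token) as a rough stand-in for spaCy's tokenizer.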
Parent is a string that represents the conceptual "parent" section in a section->subsection->subsubsection hierarchy. Candidate parents are specified by category in the rule and resolved at runtime.
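One way to picture the parent lookup: each rule lists the categories allowed as parents, and a section's parent becomes the nearest preceding section whose category is on that list. The sketch below (assign_parents and the candidates mapping are hypothetical) illustrates that reading only; medspacy's actual matching may apply further conditions.

```python
from typing import Dict, List, Optional


def assign_parents(
    categories: List[str], candidates: Dict[str, List[str]]
) -> List[Optional[str]]:
    """Hypothetical sketch: for each section (in document order), return the
    category of the nearest earlier section listed as an allowed parent, or None."""
    parents: List[Optional[str]] = []
    for i, category in enumerate(categories):
        allowed = candidates.get(category, [])
        parent = None
        # Walk backwards to find the closest preceding allowed parent.
        for earlier in reversed(categories[:i]):
            if earlier in allowed:
                parent = earlier
                break
        parents.append(parent)
    return parents


sections = ["past_medical_history", "cardiac_history", "surgical_history"]
rules = {"cardiac_history": ["past_medical_history"]}
print(assign_parents(sections, rules))
# [None, 'past_medical_history', None]
```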
Source code in medspacy/section_detection/section.py
body_span
property
Gets the span of the section body.
Returns:
| Type | Description |
|---|---|
| Tuple[int, int] | The start and end indexes of the section body. |
section_span
property
Gets the span of the entire section, from title start to body end.
Returns:
| Type | Description |
|---|---|
| Tuple[int, int] | The start index of the section title and the end index of the section body. |
title_span
property
Gets the span of the section title.
Returns:
| Type | Description |
|---|---|
| Tuple[int, int] | The start and end indexes of the section title. |
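Since each property returns a (start, end) tuple, section_span is simply title_start paired with body_end. A minimal stand-in class shows the relationship; SpanDemo is a hypothetical name, not part of medspacy, and the index values follow the last-token convention from the __init__ parameter table.

```python
from typing import Tuple


class SpanDemo:
    """Hypothetical stand-in that mirrors the documented span properties."""

    def __init__(self, title_start: int, title_end: int, body_start: int, body_end: int):
        self.title_start = title_start
        self.title_end = title_end
        self.body_start = body_start
        self.body_end = body_end

    @property
    def title_span(self) -> Tuple[int, int]:
        return self.title_start, self.title_end

    @property
    def body_span(self) -> Tuple[int, int]:
        return self.body_start, self.body_end

    @property
    def section_span(self) -> Tuple[int, int]:
        # The whole section runs from the start of the title to the end of the body.
        return self.title_start, self.body_end


s = SpanDemo(title_start=0, title_end=3, body_start=4, body_end=8)
print(s.title_span, s.body_span, s.section_span)  # (0, 3) (4, 8) (0, 8)
```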
__init__(category, title_start, title_end, body_start, body_end, parent=None, rule=None)
Create a new Section object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| category | Union[str, None] | A normalized name for the section. Equivalent to label_ in a basic spaCy entity. | required |
| title_start | int | Index of the first token of the section title. | required |
| title_end | int | Index of the last token of the section title. | required |
| body_start | int | Index of the first token of the section body. | required |
| body_end | int | Index of the last token of the section body. | required |
| parent | Optional[str] | The category of the parent section. | None |
| rule | Optional[SectionRule] | The SectionRule that generated the section. | None |
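Only category and the four indexes are required; parent and rule default to None. A dataclass with the same parameter order makes those defaults visible. SectionSketch is hypothetical, and rule is typed as Any here instead of SectionRule so the sketch runs without medspacy installed.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SectionSketch:
    """Hypothetical stand-in with the same constructor parameters as Section."""

    category: Optional[str]
    title_start: int
    title_end: int
    body_start: int
    body_end: int
    parent: Optional[str] = None
    rule: Optional[Any] = None  # stands in for a SectionRule


# Only the category and the four indexes are required.
pmh = SectionSketch("past_medical_history", 0, 3, 4, 8)
# A subsection can name its parent section's category.
cardiac = SectionSketch("cardiac_history", 9, 10, 11, 15, parent="past_medical_history")
print(pmh.parent, cardiac.parent)  # None past_medical_history
```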
from_serialized_representation(serialized_representation)
classmethod
Load a section from its JSON-serializable dictionary form.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| serialized_representation | Dict[str, str] | The dictionary form of the section object to load. | required |

Returns:
| Type | Description |
|---|---|
| Section | A Section object containing the data from the dictionary provided. |
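Loading typically starts from JSON text that has been parsed back into a dictionary. The sketch below assumes the dictionary keys mirror the constructor parameter names; SectionSketch and from_dict are hypothetical, and the rule field is omitted. The int() calls accept string values as well, matching the Dict[str, str] annotation.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class SectionSketch:
    """Hypothetical stand-in for Section; the SectionRule field is omitted."""

    category: Optional[str]
    title_start: int
    title_end: int
    body_start: int
    body_end: int
    parent: Optional[str] = None

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "SectionSketch":
        # Assumes the keys mirror the constructor parameter names.
        return cls(
            category=data["category"],
            title_start=int(data["title_start"]),
            title_end=int(data["title_end"]),
            body_start=int(data["body_start"]),
            body_end=int(data["body_end"]),
            parent=data.get("parent"),
        )


payload = (
    '{"category": "past_medical_history", "title_start": 0, "title_end": 3, '
    '"body_start": 4, "body_end": 8, "parent": null}'
)
section = SectionSketch.from_dict(json.loads(payload))
print(section.category, section.body_start, section.body_end)  # past_medical_history 4 8
```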
serialized_representation()
Serialize the Section.
Returns:
| Type | Description |
|---|---|
| Dict[str, str] | A JSON-serializable dictionary representation of the section. |
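Serializing to a dictionary of built-in types is what makes a JSON round trip possible. In this hypothetical sketch, SectionSketch.to_dict uses dataclasses.asdict and leaves out the rule field; the key names are an assumption, not the library's documented format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class SectionSketch:
    """Hypothetical stand-in for Section; the SectionRule field is omitted."""

    category: Optional[str]
    title_start: int
    title_end: int
    body_start: int
    body_end: int
    parent: Optional[str] = None

    def to_dict(self) -> Dict[str, Any]:
        # A plain dict of built-in types, so json.dumps can handle it directly.
        return asdict(self)


original = SectionSketch("past_medical_history", 0, 3, 4, 8)
text = json.dumps(original.to_dict())
restored = SectionSketch(**json.loads(text))
print(restored == original)  # True
```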