medspacy.section_detection.section

Section

Bases: object

Section is the object that stores the result of processing by the Sectionizer class. A Section contains information describing the section's category, title span, body span, parent, and the rule that created it.

Section category is equivalent to label_ in a basic spaCy entity. It is a normalized name for the section type, set on initialization whether the Section is created manually or by the Sectionizer pipeline component.

Section title, defined with title_start, title_end, and title_span, represents the section title or header matched by the rule. In the text "Past medical history: stroke and high blood pressure", "Past medical history:" would be the title.

Section body is defined with body_start, body_end, and body_span. It represents the text from the end of the current section's title up to the start of the next Section's title, or up to the end of the scope when a scope is set in the rule or by the Sectionizer. In the text "Past medical history: stroke and high blood pressure", "stroke and high blood pressure" would be the body.
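To make the title/body split concrete, here is a minimal sketch that tokenizes the example sentence and slices out the title and body by token index. It uses a simple regular-expression tokenizer as a stand-in for spaCy's tokenizer (both split the colon into its own token here), and it assumes end indexes are exclusive, as in Python and spaCy slicing; neither detail is stated on this page.

```python
import re

text = "Past medical history: stroke and high blood pressure"

# Stand-in tokenizer (not spaCy): words and punctuation become separate tokens.
tokens = re.findall(r"\w+|[^\w\s]", text)

# Assumed exclusive end indexes: the title covers tokens 0-3 ("Past" .. ":"),
# and the body starts where the title ends and runs to the end of the text.
title_start, title_end = 0, 4
body_start, body_end = title_end, len(tokens)

title = " ".join(tokens[title_start:title_end])
body = " ".join(tokens[body_start:body_end])
print(title)  # Past medical history :
print(body)   # stroke and high blood pressure
```

Joining tokens with spaces puts a space before the colon; a real spaCy Span would keep the original whitespace.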

Parent is a string naming the category of the conceptual "parent" section in a section->subsection->subsubsection hierarchy. Candidate parents are specified by category in the rule and matched at runtime.
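Because parent holds a category name rather than a reference to another Section object, walking up the hierarchy means looking that name up among the other sections. The sketch below does this with a hypothetical parent_chain helper over lightweight stand-in objects (SimpleNamespace rather than real Section instances). It assumes a parent is the nearest preceding section with the matching category; the category names are illustrative only.

```python
from types import SimpleNamespace
from typing import List, Sequence


def parent_chain(sections: Sequence, index: int) -> List[str]:
    """Hypothetical helper: categories above sections[index], nearest first.

    Each parent name is resolved to the closest earlier section whose
    category matches it (an assumption, not documented medspacy behavior).
    """
    chain: List[str] = []
    current, i = sections[index], index
    while current.parent is not None:
        # Search backwards for the nearest section with the parent's category.
        for j in range(i - 1, -1, -1):
            if sections[j].category == current.parent:
                chain.append(sections[j].category)
                current, i = sections[j], j
                break
        else:
            break  # parent category not found among earlier sections
    return chain


# Stand-in objects with only the attributes the helper reads.
sections = [
    SimpleNamespace(category="physical_exam", parent=None),
    SimpleNamespace(category="cardiovascular", parent="physical_exam"),
    SimpleNamespace(category="heart_sounds", parent="cardiovascular"),
]
print(parent_chain(sections, 2))  # ['cardiovascular', 'physical_exam']
```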

Source code in medspacy/section_detection/section.py
class Section(object):
    """
    Section is the object that stores the result of processing by the Sectionizer class. A Section contains information
    describing the section's category, title span, body span, parent, and the rule that created it.

    Section `category` is equivalent to `label_` in a basic spaCy entity. It is a normalized name for the section type,
    set on initialization whether the Section is created manually or by the Sectionizer pipeline component.

    Section title, defined with `title_start`, `title_end`, and `title_span`, represents the section title or header
    matched by the rule. In the text "Past medical history: stroke and high blood pressure", "Past medical history:"
    would be the title.

    Section body is defined with `body_start`, `body_end`, and `body_span`. It represents the text from the end of the
    current section's title up to the start of the next Section's title, or up to the end of the scope when a scope is
    set in the rule or by the Sectionizer. In the text "Past medical history: stroke and high blood pressure", "stroke
    and high blood pressure" would be the body.

    Parent is a string naming the category of the conceptual "parent" section in a section->subsection->subsubsection
    hierarchy. Candidate parents are specified by category in the rule and matched at runtime.
    """

    def __init__(
        self,
        category: Union[str, None],
        title_start: int,
        title_end: int,
        body_start: int,
        body_end: int,
        parent: Optional[str] = None,
        rule: Optional[SectionRule] = None,
    ):
        """
        Create a new Section object.

        Args:
            category: A normalized name for the section. Equivalent to `label_` for basic spaCy entities.
            title_start: Index of the first token of the section title.
            title_end: Index of the last token of the section title.
            body_start: Index of the first token of the section body.
            body_end: Index of the last token of the section body.
            parent: The category of the parent section.
            rule: The SectionRule that generated the section.
        """
        self.category = category
        self.title_start = title_start
        self.title_end = title_end
        self.body_start = body_start
        self.body_end = body_end
        self.parent = parent
        self.rule = rule

    def __repr__(self):
        return (
            f"Section(category={self.category} at {self.title_start} : {self.title_end} in the doc with a body at "
            f"{self.body_start} : {self.body_end} based on the rule {self.rule})"
        )

    @property
    def title_span(self):
        """
        Gets the span of the section title.

        Returns:
            A tuple (int,int) containing the start and end indexes of the section title.
        """
        return self.title_start, self.title_end

    @property
    def body_span(self):
        """
        Gets the span of the section body.

        Returns:
            A tuple (int,int) containing the start and end indexes of the section body.
        """
        return self.body_start, self.body_end

    @property
    def section_span(self):
        """
        Gets the span of the entire section, from title start to body end.

        Returns:
            A tuple (int,int) containing the start index of the section title and the end index of the section body.
        """
        return self.title_start, self.body_end

    def serialized_representation(self):
        """
        Serialize the Section.

        Returns:
            A JSON-serializable dictionary representation of the section.
        """
        rule = self.rule

        return {
            "category": self.category,
            "title_start": self.title_start,
            "title_end": self.title_end,
            "body_start": self.body_start,
            "body_end": self.body_end,
            "parent": self.parent,
            "rule": rule.to_dict() if rule is not None else None,
        }

    @classmethod
    def from_serialized_representation(cls, serialized_representation: Dict[str, str]):
        """
        Load the section from the dictionary form produced by `serialized_representation`.

        Args:
            serialized_representation: The dictionary form of the section object to load.

        Returns:
            A Section object containing the data from the dictionary provided.
        """
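        rule_dict = serialized_representation.get("rule")
        # serialized_representation stores None when there is no rule; don't pass None to from_dict.
        rule = SectionRule.from_dict(rule_dict) if rule_dict is not None else None
        section = cls(
            **{k: v for k, v in serialized_representation.items() if k != "rule"}
        )
        section.rule = rule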

        return section

body_span property

Gets the span of the section body.

Returns:

A tuple (int, int) containing the start and end indexes of the section body.

section_span property

Gets the span of the entire section, from title start to body end.

Returns:

A tuple (int, int) containing the start index of the section title and the end index of the section body.

title_span property

Gets the span of the section title.

Returns:

A tuple (int, int) containing the start and end indexes of the section title.
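The three span properties return plain (start, end) tuples, so they can be unpacked straight into a slice. The sketch below uses a minimal stand-in class that copies only the property logic shown in the source above (it is not the medspacy class, and SpanStandIn is a hypothetical name); it also assumes exclusive end indexes, Python-slice style.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SpanStandIn:
    """Stand-in that mirrors only the span properties of Section."""

    title_start: int
    title_end: int
    body_start: int
    body_end: int

    @property
    def title_span(self) -> Tuple[int, int]:
        return self.title_start, self.title_end

    @property
    def body_span(self) -> Tuple[int, int]:
        return self.body_start, self.body_end

    @property
    def section_span(self) -> Tuple[int, int]:
        # From title start to body end, covering title and body together.
        return self.title_start, self.body_end


tokens = ["Past", "medical", "history", ":", "stroke", "and", "high", "blood", "pressure"]
sec = SpanStandIn(title_start=0, title_end=4, body_start=4, body_end=9)

print(tokens[slice(*sec.title_span)])  # ['Past', 'medical', 'history', ':']
print(tokens[slice(*sec.body_span)])   # ['stroke', 'and', 'high', 'blood', 'pressure']
print(sec.section_span)                # (0, 9)
```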

__init__(category, title_start, title_end, body_start, body_end, parent=None, rule=None)

Create a new Section object.

Parameters:

category (Union[str, None], required): A normalized name for the section. Equivalent to label_ for basic spaCy entities.

title_start (int, required): Index of the first token of the section title.

title_end (int, required): Index of the last token of the section title.

body_start (int, required): Index of the first token of the section body.

body_end (int, required): Index of the last token of the section body.

parent (Optional[str], default None): The category of the parent section.

rule (Optional[SectionRule], default None): The SectionRule that generated the section.

Source code in medspacy/section_detection/section.py
def __init__(
    self,
    category: Union[str, None],
    title_start: int,
    title_end: int,
    body_start: int,
    body_end: int,
    parent: Optional[str] = None,
    rule: Optional[SectionRule] = None,
):
    """
    Create a new Section object.

    Args:
        category: A normalized name for the section. Equivalent to `label_` for basic spaCy entities.
        title_start: Index of the first token of the section title.
        title_end: Index of the last token of the section title.
        body_start: Index of the first token of the section body.
        body_end: Index of the last token of the section body.
        parent: The category of the parent section.
        rule: The SectionRule that generated the section.
    """
    self.category = category
    self.title_start = title_start
    self.title_end = title_end
    self.body_start = body_start
    self.body_end = body_end
    self.parent = parent
    self.rule = rule
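The constructor stores its arguments without checking them, so a manually built Section can end up with indexes that do not fit together. A caller who wants a guard could run something like the hypothetical validate_bounds below before constructing the object. The ordering it enforces (starts not after ends, title ending at or before the body start) is an assumption about sensible input, not a rule stated by medspacy.

```python
def validate_bounds(title_start: int, title_end: int,
                    body_start: int, body_end: int) -> None:
    """Hypothetical pre-check for Section constructor arguments."""
    if not 0 <= title_start <= title_end:
        raise ValueError(f"bad title bounds: {title_start}..{title_end}")
    if not body_start <= body_end:
        raise ValueError(f"bad body bounds: {body_start}..{body_end}")
    if title_end > body_start:
        raise ValueError("title must end at or before the body start")


validate_bounds(0, 4, 4, 9)  # title then body: passes silently

try:
    validate_bounds(0, 4, 2, 9)  # body would start inside the title
except ValueError as err:
    message = str(err)
print(message)  # title must end at or before the body start
```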

from_serialized_representation(serialized_representation) classmethod

Load the section from the dictionary form produced by serialized_representation.

Parameters:

serialized_representation (Dict[str, str], required): The dictionary form of the section object to load.

Returns:

A Section object containing the data from the provided dictionary.

Source code in medspacy/section_detection/section.py
@classmethod
def from_serialized_representation(cls, serialized_representation: Dict[str, str]):
    """
    Load the section from the dictionary form produced by `serialized_representation`.

    Args:
        serialized_representation: The dictionary form of the section object to load.

    Returns:
        A Section object containing the data from the dictionary provided.
    """
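    rule_dict = serialized_representation.get("rule")
    # serialized_representation stores None when there is no rule; don't pass None to from_dict.
    rule = SectionRule.from_dict(rule_dict) if rule_dict is not None else None
    section = cls(
        **{k: v for k, v in serialized_representation.items() if k != "rule"}
    )
    section.rule = rule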

    return section
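Because serialized_representation writes "rule": None for a section without a rule, a loader has to skip the rule-decoding step in that case. The sketch below shows that guard on a plain dictionary decoded from JSON. load_fields is a hypothetical helper, and the decode_rule callable stands in for SectionRule.from_dict so the example runs without medspacy.

```python
import json
from typing import Any, Callable, Dict, Optional


def load_fields(data: Dict[str, Any],
                decode_rule: Callable[[Dict[str, Any]], Any]) -> Dict[str, Any]:
    """Hypothetical loader: keep plain fields, decode the rule only if present."""
    fields = {k: v for k, v in data.items() if k != "rule"}
    rule_data: Optional[Dict[str, Any]] = data.get("rule")
    # Only decode when a rule was actually serialized.
    fields["rule"] = decode_rule(rule_data) if rule_data is not None else None
    return fields


raw = json.dumps({"category": "past_medical_history", "title_start": 0,
                  "title_end": 4, "body_start": 4, "body_end": 9,
                  "parent": None, "rule": None})
fields = load_fields(json.loads(raw), decode_rule=dict)  # dict stands in for from_dict
print(fields["rule"])  # None
```

The returned keys match the Section constructor's parameter names, so with medspacy installed the dictionary could be passed as Section(**fields).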

serialized_representation()

Serialize the Section.

Returns:

A JSON-serializable dictionary representation of the section.

Source code in medspacy/section_detection/section.py
def serialized_representation(self):
    """
    Serialize the Section.

    Returns:
        A JSON-serializable dictionary representation of the section.
    """
    rule = self.rule

    return {
        "category": self.category,
        "title_start": self.title_start,
        "title_end": self.title_end,
        "body_start": self.body_start,
        "body_end": self.body_end,
        "parent": self.parent,
        "rule": rule.to_dict() if rule is not None else None,
    }
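serialized_representation returns a dictionary, not a JSON string; to store a section as JSON text, pass the dictionary through json.dumps. The values below are illustrative (what the method would be expected to emit for the example section with no rule), not captured output.

```python
import json

# Illustrative dictionary with the keys serialized_representation() emits.
serialized = {
    "category": "past_medical_history",
    "title_start": 0,
    "title_end": 4,
    "body_start": 4,
    "body_end": 9,
    "parent": None,
    "rule": None,  # rule.to_dict() would appear here when a rule is set
}

text = json.dumps(serialized, sort_keys=True)
print(text)

# The JSON round trip preserves every field, including None values (JSON null).
assert json.loads(text) == serialized
```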