medspacy.preprocess.preprocessing_rule
PreprocessingRule
This is a rule for handling preprocessing in the medspaCy Preprocessor. This class does not inherit from BaseRule, as it cannot be used in a spaCy pipeline. The Preprocessor and PreprocessingRules are designed to preprocess text before it enters a spaCy pipeline, which allows for destructive preprocessing such as stripping or replacing text.
Source code in medspacy/preprocess/preprocessing_rule.py
__call__(text)
Applies the preprocessing rule to text. If the rule's callback attribute is None, returns the string produced by regular-expression substitution of repl for each match of the pattern. If callback is not None, the callback function is executed with the resulting match as an argument.
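The two branches described above can be sketched with the standard library alone. This is a minimal illustration, not medspaCy's implementation: `apply_rule` and `cut_at_match` are hypothetical names, and the handling of multiple matches or of text with no match in the callback branch is an assumption.

```python
import re
from typing import Callable, Optional


def apply_rule(
    text: str,
    pattern: str,
    repl: str = "",
    flags: int = re.IGNORECASE,
    callback: Optional[Callable[[str, re.Match], str]] = None,
) -> str:
    """Hypothetical stand-in that mirrors the documented __call__ behavior."""
    compiled = re.compile(pattern, flags)
    if callback is None:
        # No callback: substitute repl for every match of the pattern.
        return compiled.sub(repl, text)
    # Callback given: pass it the raw text and the match.
    # Assumption: only the first match is used, and text without a
    # match is returned unchanged.
    match = compiled.search(text)
    if match is None:
        return text
    return callback(text, match)


def cut_at_match(text: str, match: re.Match) -> str:
    # Keep only the text that precedes the match.
    return text[: match.start()].rstrip()


note = "Patient denies fever. DISCLAIMER: generated by template."
replaced = apply_rule(note, r"denies", repl="reports no")
truncated = apply_rule(note, r"disclaimer:", callback=cut_at_match)
```

Here `replaced` swaps one phrase in place, while `truncated` drops everything from the disclaimer onward, which a plain substitution could only do with a greedier pattern.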
__init__(pattern, repl='', flags=re.IGNORECASE, callback=None, desc=None)
Creates a new PreprocessingRule. Preprocessing rules define spans of text to be removed and optionally replaced from the text underneath a doc.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pattern | str | The text pattern to match and replace in a doc. Must be a string, which will be compiled as a regular expression. The patterns will lead to re.Match objects. | required |
| repl | Union[str, Callable[[Match], Any]] | The text to replace a matched string with. By default, repl is an empty string. If repl is a function, sends function to re.sub and it will be called on each Match object. More info here https://docs.python.org/3/library/re.html#re.sub | '' |
| flags | RegexFlag | A regex compilation flag. Default is re.IGNORECASE. | IGNORECASE |
| callback | Optional[Callable[[str, Match], str]] | An optional callable which takes the raw text and a Match and returns the new copy of the text, rather than just replacing strings for the matched text. This can allow larger text manipulation, such as stripping out an entire section based on a header. | None |
| desc | Optional[str] | An optional description. | None |
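To make the parameter descriptions concrete, the sketch below uses only Python's `re` module to show a function-valued repl, the effect of the flags argument, and a callback-shaped function that strips a section by its header. The sample text and the helper names `mask_digits` and `drop_section` are hypothetical, and the code illustrates the parameter semantics rather than calling medspaCy itself.

```python
import re

text = "MRN: 1234567\nHISTORY: stable\nBILLING NOTES: code 99213\nPLAN: follow up"


# repl as a function: re.sub calls it once per Match object.
def mask_digits(match: re.Match) -> str:
    return "X" * len(match.group(0))


masked = re.sub(r"\d+", mask_digits, text, flags=re.IGNORECASE)

# flags: with re.IGNORECASE (the documented default) a lowercase pattern
# matches the uppercase header; with flags=0 it does not.
insensitive = re.search(r"history:", text, flags=re.IGNORECASE)
sensitive = re.search(r"history:", text, flags=0)


# Callback-shaped function: takes the raw text and a Match, returns new text.
def drop_section(raw: str, match: re.Match) -> str:
    # Remove everything from the header through the end of its line.
    line_end = raw.find("\n", match.end())
    end = len(raw) if line_end == -1 else line_end + 1
    return raw[: match.start()] + raw[end:]


header = re.search(r"billing notes:", text, flags=re.IGNORECASE)
stripped = drop_section(text, header)
```

A callback is the better fit for the last case because the amount of text to remove depends on where the line ends, not only on what the header pattern matched.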
from_dict(d)
classmethod
Creates a PreprocessingRule from a dictionary.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| d | Dict | The dict to read. | required |
Returns:
| Type | Description |
|---|---|
| PreprocessingRule | A PreprocessingRule from the dictionary. |
from_json(filepath)
classmethod
Read a JSON file containing PreprocessingRule data at the key "preprocessing_rules".
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | | The filepath of the JSON to read. | required |
Returns:
| Type | Description |
|---|---|
| | A list of PreprocessingRules from the JSON file. |
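As a rough picture of the file layout, the sketch below writes and re-reads a JSON file with the standard library. Only the top-level key "preprocessing_rules" comes from the description above; the per-rule fields (`pattern`, `repl`, `desc`) and the file name are assumptions chosen to mirror the constructor parameters, so check them against the library before relying on them.

```python
import json
import os
import tempfile

# Assumed per-rule fields; only the "preprocessing_rules" key is documented.
payload = {
    "preprocessing_rules": [
        {"pattern": r"\[\*\*.*?\*\*\]", "repl": "", "desc": "Strip bracketed placeholders"},
        {"pattern": r"\bdx\b", "repl": "diagnosis", "desc": "Expand an abbreviation"},
    ]
}

with tempfile.TemporaryDirectory() as tmp:
    filepath = os.path.join(tmp, "rules.json")  # hypothetical file name
    with open(filepath, "w", encoding="utf-8") as f:
        json.dump(payload, f, indent=2)
    with open(filepath, encoding="utf-8") as f:
        loaded = json.load(f)

# A reader of this layout would iterate over the list under the documented key.
rule_dicts = loaded["preprocessing_rules"]
patterns = [d["pattern"] for d in rule_dicts]
```

Regular-expression backslashes survive the round trip because `json.dump` escapes them on write and `json.load` restores them on read.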
to_dict()
Writes a preprocessing rule to a dictionary. Useful for writing all rules to a JSON file later.
Returns:
| Type | Description |
|---|---|
| | A dictionary containing the PreprocessingRule's data. |