medspacy.preprocess.preprocessor
Preprocessor
The medspacy Preprocessor class wraps destructive preprocessing rules, such as stripping or replacing text in a document, that run before the text enters a spaCy pipeline.
This is NOT a spaCy component and cannot be added to a spaCy pipeline. Apply the preprocessor before calling nlp("your text here"). spaCy itself only allows non-destructive processing of the text, which does not suit every project, so this class enables destructive preprocessing when it is required.
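To make "destructive" concrete, the sketch below uses only Python's `re` module rather than medspacy itself. The sample note and the abbreviation expansion are assumptions made up for illustration. Once the text has been rewritten, character offsets no longer line up with the original string, which is why this kind of change has to happen before `nlp(...)` is called.

```python
import re

# Hypothetical sample note; not taken from medspacy.
original = "dx: pneumonia. hx: asthma."

# A destructive rule: expand an abbreviation, which changes the text length.
cleaned = re.sub(r"\bdx:", "Diagnosis:", original)

# The same word now starts at a different character offset.
offset_before = original.index("pneumonia")
offset_after = cleaned.index("pneumonia")

print(cleaned)
print(offset_before, offset_after)
```

Any spans computed on the cleaned text refer to the cleaned text only, so projects that need to map results back to the raw document should keep that in mind.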
Source code in medspacy/preprocess/preprocessor.py
__call__(text, tokenize=True)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `text` | | | required |
| `tokenize` | | | `True` |
Returns:
Source code in medspacy/preprocess/preprocessor.py
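The listing above does not describe what the call does, so the following is only a sketch of one plausible reading of the signature: each added rule is applied in order, and the result is passed to the tokenizer unless `tokenize` is false. `SketchPreprocessor` is a hypothetical stand-in written for this example, not the medspacy class, and `str.split` stands in for a spaCy tokenizer such as `nlp.tokenizer`.

```python
import re
from typing import Callable, List, Union


class SketchPreprocessor:
    """Hypothetical stand-in for illustration; not the medspacy class."""

    def __init__(self, tokenizer: Callable[[str], List[str]]):
        self.tokenizer = tokenizer
        self._rules: List[Callable[[str], str]] = []

    def add(self, rule: Callable[[str], str]) -> None:
        self._rules.append(rule)

    def __call__(self, text: str, tokenize: bool = True) -> Union[str, List[str]]:
        # Apply every rule in the order it was added.
        for rule in self._rules:
            text = rule(text)
        # Assumed behavior: return the raw string when tokenize is False.
        return self.tokenizer(text) if tokenize else text


# str.split is a toy tokenizer standing in for a spaCy tokenizer.
pre = SketchPreprocessor(tokenizer=str.split)
pre.add(lambda t: re.sub(r"\s+", " ", t).strip())

as_text = pre("  chest   pain \n denied ", tokenize=False)
as_tokens = pre("  chest   pain \n denied ")
print(as_text)
print(as_tokens)
```

Under this reading, `tokenize=False` is useful for inspecting what the rules did to the text before it is handed to a pipeline.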
__init__(tokenizer)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `tokenizer` | | | required |
Source code in medspacy/preprocess/preprocessor.py
add(rules)
Adds a PreprocessingRule or collection of PreprocessingRules to the Preprocessor.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `rules` | `Union[PreprocessingRule, Iterable[PreprocessingRule]]` | A single PreprocessingRule or a collection of PreprocessingRules to add. | required |
Source code in medspacy/preprocess/preprocessor.py
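As a rough illustration of a parameter typed `Union[PreprocessingRule, Iterable[PreprocessingRule]]`, the sketch below shows one way a method could normalize either form into a list. It is not medspacy's implementation: `Rule` is a hypothetical stand-in for `PreprocessingRule` (here, any callable from string to string), and `normalize_rules` is a helper name invented for this example.

```python
from typing import Callable, Iterable, List, Union

# Hypothetical stand-in for PreprocessingRule: any str -> str callable.
Rule = Callable[[str], str]


def normalize_rules(rules: Union[Rule, Iterable[Rule]]) -> List[Rule]:
    """Return a list whether given one rule or a collection of rules."""
    if callable(rules):
        # A single rule was passed; wrap it so callers always get a list.
        return [rules]
    return list(rules)


single = normalize_rules(str.lower)
several = normalize_rules([str.strip, str.lower])
print(len(single), len(several))
```

Accepting both forms lets callers register one rule at a time or a whole list in a single call.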