%load_ext lab_black
Integration of nlp-insights with a FHIR Server
Although nlp-insights can be used as a standalone service, the primary intent of the service is to enhance a bundle of resources prior to posting those resources to a FHIR server. This notebook demonstrates posting enriched resources to a FHIR server, and then retrieving the insights and the evidence behind them.
Setup
This notebook was created with jupyter-lab 3.1.11 and python 3.9.6. Using a virtual environment is recommended. Python source code is formatted with Black.
Start and configure the nlp-insights service
The examples were written with the assumption that ACD is configured as the NLP backend for the nlp-insights service. You need to start and configure the nlp-insights service. Configuring the service to use QuickUMLS is also an option, although the discovered insights will differ.
Start a local FHIR server
Although health patterns defines a much more sophisticated architecture for ingestion pipelines, these examples use the IBM FHIR server running locally in a container. This keeps things simple, and allows us to focus on the value of the nlp-insights server.
The server can be started locally by running the command:
docker run -p 9443:9443 -e BOOTSTRAP_DB=true ibmcom/ibm-fhir-server
Load FHIRPath Jars
FHIRPath is an HL7 standard for navigating and extracting parts of FHIR resources. These examples evaluate FHIRPath expressions by utilizing Java code built for the IBM FHIR Server. The advantage of using FHIRPath is that the language is aware of features specific to FHIR resources, which makes the queries simpler in many cases. The python interface provided in this notebook does not expose the full functionality, but it is complete enough for these examples.
You need to download the jars from Maven Central, and store them in the local directory indicated by FHIR_PATH_JARS (defined in a later cell).
These are the steps to do that (you may need to install Apache Maven 3.5.4 or newer):
* Download the pom for the project: curl https://repo1.maven.org/maven2/com/ibm/fhir/fhir-path/4.10.2/fhir-path-4.10.2.pom > pom.xml
* Download the jars: mvn -DoutputDirectory=. -Dartifact="com.ibm.fhir:fhir-path:4.10.2" dependency:copy dependency:copy-dependencies
Third party libraries
The examples depend on a few other libraries to make processing easier. jpype1 is used to call Java code when evaluating FHIRPath expressions.
!pip install --upgrade pip
!pip install pandas==1.3.5
!pip install fhir.resources==6.1.0
!pip install jpype1==1.3.0
Requirement already satisfied: pip in ./nlp-insights/lib/python3.9/site-packages (21.3.1)
Requirement already satisfied: pandas==1.3.5 in ./nlp-insights/lib/python3.9/site-packages (1.3.5)
Requirement already satisfied: pytz>=2017.3 in ./nlp-insights/lib/python3.9/site-packages (from pandas==1.3.5) (2021.3)
Requirement already satisfied: numpy>=1.17.3 in ./nlp-insights/lib/python3.9/site-packages (from pandas==1.3.5) (1.22.0)
Requirement already satisfied: python-dateutil>=2.7.3 in ./nlp-insights/lib/python3.9/site-packages (from pandas==1.3.5) (2.8.2)
Requirement already satisfied: six>=1.5 in ./nlp-insights/lib/python3.9/site-packages (from python-dateutil>=2.7.3->pandas==1.3.5) (1.16.0)
Requirement already satisfied: fhir.resources==6.1.0 in ./nlp-insights/lib/python3.9/site-packages (6.1.0)
Requirement already satisfied: pydantic[email]>=1.7.2 in ./nlp-insights/lib/python3.9/site-packages (from fhir.resources==6.1.0) (1.9.0)
Requirement already satisfied: typing-extensions>=3.7.4.3 in ./nlp-insights/lib/python3.9/site-packages (from pydantic[email]>=1.7.2->fhir.resources==6.1.0) (4.0.1)
Requirement already satisfied: email-validator>=1.0.3 in ./nlp-insights/lib/python3.9/site-packages (from pydantic[email]>=1.7.2->fhir.resources==6.1.0) (1.1.3)
Requirement already satisfied: dnspython>=1.15.0 in ./nlp-insights/lib/python3.9/site-packages (from email-validator>=1.0.3->pydantic[email]>=1.7.2->fhir.resources==6.1.0) (2.1.0)
Requirement already satisfied: idna>=2.0.0 in ./nlp-insights/lib/python3.9/site-packages (from email-validator>=1.0.3->pydantic[email]>=1.7.2->fhir.resources==6.1.0) (3.3)
Requirement already satisfied: jpype1==1.3.0 in ./nlp-insights/lib/python3.9/site-packages (1.3.0)
import requests
import base64
import json
import urllib3
import os
import pandas as pd
import numpy as np
pd.set_option("display.max_colwidth", None)
Wrapper code to Evaluate FHIRPath expression
This code is used to call into the Java FHIRPath evaluation code. The details of how it works are outside the scope of the nlp-insights examples. If you need more details, jpype is well documented, and the Java code and documentation are available here. This seemed to be the easiest way to evaluate an expression from Python, although not all expressions are supported. If you would like to try a different implementation of FHIRPath, there are a few listed on the HL7 wiki.
###
# CHANGE THIS TO THE DIRECTORY WHERE YOU DOWNLOADED THE FHIRPath JARS!!!!!
###
FHIR_PATH_JARS = "/home/ntl/fhir/fhir-path/*"
import jpype
import jpype.imports
from jpype.types import *
print(f"looking for FHIRPath jars in {FHIR_PATH_JARS}")
if not jpype.isJVMStarted():
    jpype.startJVM(classpath=[FHIR_PATH_JARS])
looking for FHIRPath jars in /home/ntl/fhir/fhir-path/*
from java.io import ByteArrayInputStream
import java.util.Collection
import java.lang.String
import java.lang.Integer
import java.math.BigDecimal
from com.ibm.fhir.path.evaluator import FHIRPathEvaluator
from com.ibm.fhir.model.parser import FHIRParser
from com.ibm.fhir.model.format import Format
import com.ibm.fhir.path.FHIRPathElementNode
import com.ibm.fhir.path.FHIRPathResourceNode
import com.ibm.fhir.path.exception.FHIRPathException as FHIRPathException
from json import JSONDecodeError
def convert_obj(java_obj):
    """Converts a FHIRPath Java Object to a python object"""
    if java_obj is None:
        return None
    if isinstance(java_obj, com.ibm.fhir.path.FHIRPathResourceNode):
        return str(java_obj.resource().toString())
    if isinstance(java_obj, com.ibm.fhir.path.FHIRPathElementNode):
        node = java_obj.element()
        if node.hasValue():
            node = node.getValue()
        if isinstance(node, java.lang.String):
            return str(node)
        if isinstance(node, java.lang.Integer):
            return int(node)
        if isinstance(node, java.math.BigDecimal):
            return float(node)
        if isinstance(node, JArray):
            return str(node)
        try:
            return json.loads(str(node.toString()))
        except JSONDecodeError:
            return str(node.toString())
    elif isinstance(java_obj, java.util.Collection):
        return [convert_obj(obj) for obj in java_obj]
    else:
        try:
            return json.loads(str(java_obj.toString()))
        except JSONDecodeError:
            return str(java_obj.toString())
def evaluate_fhir_path(json_str, expr_str):
    """Evaluates an expression against a FHIR Resource

    Args:
        json_str - FHIR resource as a json string
        expr_str - FHIRPath expression to evaluate

    Returns: Results of the evaluation, usually a list of String values.
        May return None if no results were found
    """
    resource = FHIRParser.parser(Format.JSON).parse(
        ByteArrayInputStream(json_str.encode("utf-8"))
    )
    try:
        nodes = FHIRPathEvaluator.evaluator().evaluate(resource, expr_str)
    except FHIRPathException as ex:
        raise RuntimeError(str(ex) + "\nWith expression:\n" + expr_str) from ex
    return convert_obj(nodes)
Local Server URLs and Ports
# We can be trusting of certificates for a local container
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
fhir_server = "https://fhiruser:change-password@localhost:9443/fhir-server/api/v4"
nlp_insights_server = "http://localhost:5000"
Health Checks
fhir_health_check = requests.get(f"{fhir_server}/$healthcheck", verify=False)
fhir_health_check.raise_for_status()
insights_health_check = requests.get(f"{nlp_insights_server}/config")
insights_health_check.raise_for_status()
POST Bundle with insights into FHIR Server
The input bundle is in a json file that can be viewed here.
The bundle is loaded, sent to the nlp-insights service for enrichment, and then posted to the FHIR server.
Load Bundle (without insights)
with open("./input_bundle.json", "r") as f:
    bundle_json = json.load(f)
Input bundle Summary
We can get a rough idea of what is in our input bundle by using json_normalize to build a data frame. Using dataframes will make it easier to view the insights. Deeply nested JSON documents that represent FHIR resources are hard to read; rows and columns are more familiar for human readers. Another reason for using rows and columns is that ground truth, the insights that humans expect to be discovered, is often stored as rows and columns. The nlp-insights service does not include ground truth, nor is accuracy discussed in the documentation or tutorials, but accuracy must be measured for real use cases. Working with rows and columns here makes it easier to transition to these other types of analysis.
Using the data frame, it's easy to see which resources and text are in the initial bundle. Also notice that there are no code values in the Condition and AllergyIntolerance resources.
df = pd.json_normalize(bundle_json, record_path=["entry"])
df["report_text"] = df["resource.presentedForm"].apply(
lambda f: base64.b64decode(f[0]["data"]).decode("utf-8")
if not pd.isnull(f)
else np.NaN
)
# assert that columns for codes do not exist
assert "resource.medicationCodeableConcept.coding" not in df.columns
assert "resource.code.coding" not in df.columns
# print resource types and code text
df.loc[:, ["resource.resourceType", "resource.code.text", "report_text"]]
resource.resourceType | resource.code.text | report_text | |
---|---|---|---|
0 | Patient | NaN | NaN |
1 | DiagnosticReport | Chief complaint Narrative - Reported | The patient had a myocardial infarction in 2015 and was prescribed Losartan.The patient is taking Losartan exactly as prescribed and has had no side effects. |
2 | Condition | diabetes | NaN |
3 | AllergyIntolerance | peanut | NaN |
4 | AllergyIntolerance | amoxicillin | NaN |
Discover insights
The nlp-insights service is used to discover insights.
nlp_insights_response = requests.post(
f"{nlp_insights_server}/discoverInsights",
headers={"Content-Type": "application/fhir+json"},
json=bundle_json,
)
nlp_insights_response.raise_for_status()
enriched_bundle_json = json.loads(nlp_insights_response.text)
Enriched Bundle Summary
A quick summary of the updated bundle that was returned from the nlp-insights service verifies that a few new resources have been derived, and that the prior Condition and AllergyIntolerance resources have been enriched with additional codes.
df = pd.json_normalize(enriched_bundle_json, record_path=["entry"])
df["report_text"] = df["resource.presentedForm"].apply(
lambda f: base64.b64decode(f[0]["data"]).decode("utf-8")
if not pd.isnull(f)
else np.NaN
)
df.loc[df["resource.resourceType"] != "MedicationStatement", "codes"] = df.loc[
df["resource.resourceType"] != "MedicationStatement", "resource.code.coding"
].apply(
lambda codes: [(code["system"], code["code"]) for code in codes]
if isinstance(codes, list)
else np.NaN
)
df.loc[df["resource.resourceType"] == "MedicationStatement", "codes"] = df.loc[
df["resource.resourceType"] == "MedicationStatement",
"resource.medicationCodeableConcept.coding",
].apply(
lambda codes: [(code["system"], code["code"]) for code in codes]
if isinstance(codes, list)
else np.NaN
)
df["code_text"] = df.loc[:, "resource.code.text"].combine_first(
df.loc[:, "resource.medicationCodeableConcept.text"]
)
df.loc[
:,
[
"resource.resourceType",
"code_text",
"codes",
],
]
resource.resourceType | code_text | codes | |
---|---|---|---|
0 | Patient | NaN | NaN |
1 | DiagnosticReport | Chief complaint Narrative - Reported | NaN |
2 | Condition | diabetes | [(http://terminology.hl7.org/CodeSystem/umls, C0011849), (http://snomed.info/sct, 73211009), (http://hl7.org/fhir/sid/icd-9-cm, 250.00), (http://hl7.org/fhir/sid/icd-10-cm, E14.9)] |
3 | AllergyIntolerance | peanut | [(http://terminology.hl7.org/CodeSystem/umls, C0559470), (http://snomed.info/sct, 91935009), (http://hl7.org/fhir/sid/icd-9-cm, 995.3), (http://hl7.org/fhir/sid/icd-10-cm, Z91.010), (http://hl7.org/fhir/sid/icd-10-cm, Z91.0)] |
4 | AllergyIntolerance | amoxicillin | [(http://terminology.hl7.org/CodeSystem/umls, C0571417), (http://snomed.info/sct, 294505008), (http://hl7.org/fhir/sid/icd-9-cm, E930.0), (http://hl7.org/fhir/sid/icd-9-cm, 995.27), (http://hl7.org/fhir/sid/icd-10-cm, Z88.0)] |
5 | Condition | myocardial infarction | [(http://terminology.hl7.org/CodeSystem/umls, C0027051), (http://snomed.info/sct, 22298006), (http://hl7.org/fhir/sid/icd-9-cm, 410.90), (http://hl7.org/fhir/sid/icd-10-cm, I21.9)] |
6 | MedicationStatement | Losartan | [(http://terminology.hl7.org/CodeSystem/umls, C0126174), (http://www.nlm.nih.gov/research/umls/rxnorm, 52175)] |
Post resources with insights to the FHIR server
Posting the updated bundle creates the resources on the FHIR server and assigns identifier values to them. We extract the patient's location from the response so that we can later fetch the patient's resources from the server.
fhir_server_response = requests.post(
f"{fhir_server}/",
headers={"Content-Type": "application/fhir+json"},
json=enriched_bundle_json,
verify=False,
)
fhir_server_response.raise_for_status()
patient_loc = evaluate_fhir_path(
fhir_server_response.text,
"Bundle.entry.response.location.where(startsWith('Patient')).getValue()",
)[0]
print(f"The patient's location in the FHIR Server is: {patient_loc}")
The patient's location in the FHIR Server is: Patient/17e9d5ddf75-824e1c98-1484-4079-8330-63141202c23b/_history/1
Search for all the patient's resources
In the real world, there will be many resources. The server may respond with a page at a time, and we might be interested in only a subset of resources. For this example, we'll retrieve everything for the patient; the number of resources is small enough that paging and performance cost are not a consideration.
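This small example does not need paging, but when a server does page its results, each search bundle links to the next page with a link entry whose relation is next. A rough sketch of how the pages could be walked (get_next_link and iter_all_entries are hypothetical helpers, not part of nlp-insights; in practice fetch would wrap requests.get against the FHIR server):

```python
def get_next_link(bundle: dict):
    """Return the URL of the bundle's 'next' page link, or None if this
    is the last page."""
    for link in bundle.get("link", []):
        if link.get("relation") == "next":
            return link.get("url")
    return None


def iter_all_entries(first_page_url: str, fetch):
    """Yield every entry across all pages of a paged FHIR search.

    `fetch` maps a URL to a parsed bundle dict, e.g.
    lambda url: requests.get(url, verify=False).json()
    """
    url = first_page_url
    while url:
        bundle = fetch(url)
        yield from bundle.get("entry", [])
        url = get_next_link(bundle)
```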
all_resources_response = requests.get(
f"{fhir_server}/{patient_loc}/$everything",
headers={"Accept": "application/fhir+json"},
verify=False,
)
all_resources_response.raise_for_status()
Convert the search bundle into a DataFrame
The bundle is split into rows, where each row represents a resource in the bundle.
from fhir.resources.bundle import Bundle
resources_df = pd.DataFrame(
[
{
"resource_id": entry.resource.id,
"resource_type": type(entry.resource).__name__,
"resource_json": entry.resource.json(),
}
for entry in Bundle.parse_raw(all_resources_response.text).entry
]
)
pd.set_option("display.max_colwidth", 75)
display(resources_df)
pd.set_option("display.max_colwidth", None)
original_resources_df = resources_df # save for later
resource_id | resource_type | resource_json | |
---|---|---|---|
0 | 17e9d5ddf75-824e1c98-1484-4079-8330-63141202c23b | Patient | {"id": "17e9d5ddf75-824e1c98-1484-4079-8330-63141202c23b", "meta": {"la... |
1 | 17e9d5ddf75-bc72c640-739c-4c48-b26f-774fa5446720 | AllergyIntolerance | {"id": "17e9d5ddf75-bc72c640-739c-4c48-b26f-774fa5446720", "meta": {"ex... |
2 | 17e9d5ddf76-7293ecc8-8b6a-497a-881c-b6eeed1250be | AllergyIntolerance | {"id": "17e9d5ddf76-7293ecc8-8b6a-497a-881c-b6eeed1250be", "meta": {"ex... |
3 | 17e9d5ddf76-2a1b470e-3808-442d-a1af-857b7532b740 | MedicationStatement | {"id": "17e9d5ddf76-2a1b470e-3808-442d-a1af-857b7532b740", "meta": {"ex... |
4 | 17e9d5ddf75-b3eb69d7-fd03-4caf-8fe6-d8e4f2a73b2c | DiagnosticReport | {"id": "17e9d5ddf75-b3eb69d7-fd03-4caf-8fe6-d8e4f2a73b2c", "meta": {"la... |
5 | 17e9d5ddf75-51cfac33-b53e-4b65-95e3-781549aab119 | Condition | {"id": "17e9d5ddf75-51cfac33-b53e-4b65-95e3-781549aab119", "meta": {"ex... |
6 | 17e9d5ddf76-f5a481e8-1c92-4179-bb4d-2c668cc9bd66 | Condition | {"id": "17e9d5ddf76-f5a481e8-1c92-4179-bb4d-2c668cc9bd66", "meta": {"ex... |
Retrieve Evidence for Derived Resources
This section describes how to identify derived resources, and how to determine what information was used to derive the resource.
Extension URLs
All insight-related data is stored in FHIR extensions. These extensions are defined in the Alvearie Implementation Guide. Each type of extension is identified by the URL of the extension.
summary_ext_url = "http://ibm.com/fhir/cdm/StructureDefinition/insight-summary"
category_ext_url = "http://ibm.com/fhir/cdm/StructureDefinition/category"
insight_id_ext_url = "http://ibm.com/fhir/cdm/StructureDefinition/insight-id"
insight_ext_url = "http://ibm.com/fhir/cdm/StructureDefinition/insight"
insight_detail_ext = "http://ibm.com/fhir/cdm/StructureDefinition/insight-detail"
insight_reference_ext = "http://ibm.com/fhir/cdm/StructureDefinition/reference"
insight_reference_path = "http://ibm.com/fhir/cdm/StructureDefinition/reference-path"
insight_result_ext = "http://ibm.com/fhir/cdm/StructureDefinition/insight-result"
insight_span_ext = "http://ibm.com/fhir/cdm/StructureDefinition/span"
insight_offset_begin_ext = "http://ibm.com/fhir/cdm/StructureDefinition/offset-begin"
insight_offset_end_ext = "http://ibm.com/fhir/cdm/StructureDefinition/offset-end"
Function to pretty print a data frame
Some dataframes have multiple lines of text in a column. This function prints those more nicely for human readers.
from IPython.display import display, HTML
def print_df(df):
    """This function prints a dataframe
    that has newline characters in a column a little nicer in a notebook"""
    # https://stackoverflow.com/questions/50644066/pandas-dataframe-and-multi-line-values
    return display(HTML(df.to_html().replace("\\n", "<br>")))
Function to get the code text
This function evaluates a FHIRPath expression against a resource to return the text associated with the code. We use this to provide a quick idea of what this resource is about.
def get_code_text(resource) -> str:
    if txt := evaluate_fhir_path(
        resource,
        "Condition.code.text | "
        "AllergyIntolerance.code.text | "
        "MedicationStatement.medication.text",
    ):
        return txt[0]
    return np.NaN
Retrieve Derived Resources
When nlp-insights creates a derived resource, it adds an insight summary extension to the resource. The summary extension contains the insight id for the insight that created the resource. We need this ID to locate the details of the insight (The details are stored in the resource's meta element). The insight identifier's system and value will be used together to uniquely identify the insight.
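The nested extension layout can also be inspected by walking the resource JSON directly, without FHIRPath. A rough sketch under the assumption that the insight id extension value is an Identifier serialized as valueIdentifier (find_insight_ids is a hypothetical helper; the natural-language-processing category filter used elsewhere in this notebook is omitted for brevity):

```python
SUMMARY_EXT_URL = "http://ibm.com/fhir/cdm/StructureDefinition/insight-summary"
INSIGHT_ID_EXT_URL = "http://ibm.com/fhir/cdm/StructureDefinition/insight-id"


def find_insight_ids(resource: dict):
    """Return (system, value) tuples from each insight-summary extension."""
    ids = []
    for summary in resource.get("extension", []):
        if summary.get("url") != SUMMARY_EXT_URL:
            continue
        for inner in summary.get("extension", []):
            if inner.get("url") == INSIGHT_ID_EXT_URL:
                # Assumption: the extension value is an Identifier,
                # serialized in JSON as valueIdentifier
                ident = inner.get("valueIdentifier", {})
                ids.append((ident.get("system"), ident.get("value")))
    return ids
```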
Function to Retrieve the Insight Identifier from the summary extension
This function evaluates a FHIRPath expression to retrieve the insight id's system and value from the summary extension.
def get_derived_resource_insight_id(resource):
    """returns a string value with 'system,value' for the insight id."""
    expr_str = (
        f"extension('{summary_ext_url}').where("
        f"  extension('{category_ext_url}').value.coding.code = 'natural-language-processing'"
        f")"
        f".extension('{insight_id_ext_url}').value.select(system + ',' + value)"
    )
    insights = evaluate_fhir_path(resource, expr_str)
    return insights if insights else np.NaN
Construct Data Frame
This dataframe contains rows for derived resources. The insight identifier's system and value are included as columns. We'll use this information to retrieve the evidence for the insight that caused the resource to be derived.
In addition, the acd in the identifier's system URI tells us that these resources were derived using ACD.
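Because the backend name appears as the final path segment of the insight id system URN, it can be extracted mechanically. A minimal sketch (nlp_backend_from_system is a hypothetical helper; it assumes the URN layout shown in the output below):

```python
def nlp_backend_from_system(insight_id_system: str) -> str:
    """Extract the NLP backend name from an insight id system URN, e.g.
    'urn:alvearie.io/health_patterns/services/nlp_insights/acd' -> 'acd'.
    Assumes the backend name is the final path segment."""
    return insight_id_system.rstrip("/").rsplit("/", 1)[-1]
```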
resources_df["text"] = resources_df.loc[:, "resource_json"].apply(get_code_text)
resources_df["derived_by_insight"] = resources_df.loc[:, "resource_json"].apply(
get_derived_resource_insight_id
)
resources_df = resources_df.explode("derived_by_insight")
resources_df[["insight_id_system", "insight_id_value"]] = resources_df[
"derived_by_insight"
].str.split(",", expand=True)
resources_df = resources_df.drop(labels=["derived_by_insight"], axis="columns")
resources_df.dropna(subset=["insight_id_system"], inplace=True)
print_df(
resources_df.loc[
:,
[
"resource_id",
"resource_type",
"text",
"insight_id_system",
"insight_id_value",
],
]
)
resource_id | resource_type | text | insight_id_system | insight_id_value | |
---|---|---|---|---|---|
3 | 17e9d5ddf76-2a1b470e-3808-442d-a1af-857b7532b740 | MedicationStatement | Losartan | urn:alvearie.io/health_patterns/services/nlp_insights/acd | 2c3514d1168072dcf3bb4a5992c76e7c37e6d7ea98cac9c169d29d12 |
6 | 17e9d5ddf76-f5a481e8-1c92-4179-bb4d-2c668cc9bd66 | Condition | myocardial infarction | urn:alvearie.io/health_patterns/services/nlp_insights/acd | dc5541f39215bb39dd3619539d2655e172978ce61da98b0fa2206fe9 |
Retrieve source text that was used to derive resources
In this section, we will use the insight extension in the meta of the Resource to determine what was used to derive the resource.
Function to Retrieve Reference and Path
This function evaluates a FHIRPath expression to retrieve the resource containing the text that was used to derive this resource, and the path to that text. This information can be used to load the source text. These are referred to as the "reference" and "reference path" in Alvearie.
def get_derived_from(resource, insight_id_system, insight_id_value):
    """Returns reference;path (separated by a semicolon)"""
    # Reference and path are in the insight detail extension of the insight
    # that we are interested in
    expr_str = (
        f"meta"
        f".extension('{insight_ext_url}').where("
        f"  extension('{insight_id_ext_url}').value.where("
        f"    system = '{insight_id_system}' and "
        f"    value = '{insight_id_value}'"
        f"  ).exists()"
        f")"
        f".extension('{insight_detail_ext}')"
        f".select("
        f"  extension('{insight_reference_ext}').value.reference + ';' "
        f"  + extension('{insight_reference_path}').value "
        f")"
    )
    return evaluate_fhir_path(resource, expr_str)
Construct Data Frame
This builds a dataframe for each resource and includes the resource and path that the insight was derived from.
resources_df["from"] = resources_df.apply(
lambda row: get_derived_from(
row["resource_json"], row["insight_id_system"], row["insight_id_value"]
),
axis=1,
)
resources_df = resources_df.explode("from")
resources_df[["derived_from_resource", "derived_from_path"]] = resources_df[
"from"
].str.split(";", expand=True)
resources_df.drop(labels=["from"], axis="columns", inplace=True)
print_df(
resources_df.loc[
:,
[
"resource_id",
"resource_type",
"text",
"derived_from_resource",
"derived_from_path",
],
]
)
resource_id | resource_type | text | derived_from_resource | derived_from_path | |
---|---|---|---|---|---|
3 | 17e9d5ddf76-2a1b470e-3808-442d-a1af-857b7532b740 | MedicationStatement | Losartan | DiagnosticReport/17e9d5ddf75-b3eb69d7-fd03-4caf-8fe6-d8e4f2a73b2c | DiagnosticReport.presentedForm[0].data |
6 | 17e9d5ddf76-f5a481e8-1c92-4179-bb4d-2c668cc9bd66 | Condition | myocardial infarction | DiagnosticReport/17e9d5ddf75-b3eb69d7-fd03-4caf-8fe6-d8e4f2a73b2c | DiagnosticReport.presentedForm[0].data |
Retrieve source text
In this example, both resources were derived from the same source text in the diagnostic report. The source resource can be easily retrieved from the FHIR server, and the path expression evaluated to get the text.
source_resource, source_path = (
resources_df.loc[:, ["derived_from_resource", "derived_from_path"]]
.drop_duplicates()
.iloc[0]
)
def get_source_text(resource_loc, text_path):
    """Retrieve the resource from the FHIR server and resolve the path to the text"""
    source_resource_fhir = requests.get(
        f"{fhir_server}/{resource_loc}",
        headers={"Accept": "application/fhir+json"},
        verify=False,
    )
    source_resource_fhir.raise_for_status()
    return evaluate_fhir_path(source_resource_fhir.text, text_path)[0]
get_source_text(source_resource, source_path)
'The patient had a myocardial infarction in 2015 and was prescribed Losartan.The patient is taking Losartan exactly as prescribed and has had no side effects.'
Retrieve spans
Clinical notes are usually longer than a few sentences. It is helpful to know which words and phrases in the text caused an insight to be derived. This section shows how to retrieve the spans associated with the insight for a derived resource.
Function to retrieve spans
This function retrieves spans for a specific reference & path within an insight. The spans are returned as a list of (start-offset, end-offset) string values.
def get_spans(resource, insight_id_system, insight_id_value, reference, path):
    # spans are within
    #   -> Insight (must match expected system and id)
    #     -> insight detail (must match reference & path)
    #       -> insight result
    #         -> span (may repeat)
    expr_str = (
        f"meta"
        f".extension('{insight_ext_url}').where("
        f"  extension('{insight_id_ext_url}').value.where("
        f"    system = '{insight_id_system}' and value = '{insight_id_value}'"
        f"  ).exists()"
        f")"
        f".extension('{insight_detail_ext}').where("
        f"  extension('{insight_reference_ext}').value.reference = '{reference}' and "
        f"  extension('{insight_reference_path}').value = '{path}'"
        f")"
        f".extension('{insight_result_ext}')"
        f".extension('{insight_span_ext}')"
        f".select("
        f"  extension('{insight_offset_begin_ext}').value.toString() + ',' +"
        f"  extension('{insight_offset_end_ext}').value.toString() "
        f")"
    )
    return evaluate_fhir_path(resource, expr_str)
Construct Data Frame
There will be multiple rows for some insights/resources in this data frame, because multiple spans caused the resource to be derived.
resources_df["spans"] = resources_df.apply(
lambda row: get_spans(
row["resource_json"],
row["insight_id_system"],
row["insight_id_value"],
row["derived_from_resource"],
row["derived_from_path"],
),
axis=1,
)
resources_df = resources_df.explode("spans")
resources_df[["span_begin", "span_end"]] = resources_df["spans"].str.split(
",", expand=True
)
resources_df.drop(labels=["spans"], axis="columns", inplace=True)
resources_df.loc[
:,
[
"resource_id",
"resource_type",
"text",
"derived_from_resource",
"derived_from_path",
"span_begin",
"span_end",
],
]
resource_id | resource_type | text | derived_from_resource | derived_from_path | span_begin | span_end | |
---|---|---|---|---|---|---|---|
3 | 17e9d5ddf76-2a1b470e-3808-442d-a1af-857b7532b740 | MedicationStatement | Losartan | DiagnosticReport/17e9d5ddf75-b3eb69d7-fd03-4caf-8fe6-d8e4f2a73b2c | DiagnosticReport.presentedForm[0].data | 67 | 75 |
3 | 17e9d5ddf76-2a1b470e-3808-442d-a1af-857b7532b740 | MedicationStatement | Losartan | DiagnosticReport/17e9d5ddf75-b3eb69d7-fd03-4caf-8fe6-d8e4f2a73b2c | DiagnosticReport.presentedForm[0].data | 98 | 106 |
6 | 17e9d5ddf76-f5a481e8-1c92-4179-bb4d-2c668cc9bd66 | Condition | myocardial infarction | DiagnosticReport/17e9d5ddf75-b3eb69d7-fd03-4caf-8fe6-d8e4f2a73b2c | DiagnosticReport.presentedForm[0].data | 18 | 39 |
Display source text with spans highlighted
Once the previous dataframe has been created, it's not hard to group by the source resource and path, and display the text from that location with the spans highlighted. In this example, the spans related to medication statements are in bold, and spans related to conditions are in italics.
This type of processing is important for an application that needs to present the information that was derived from some text to a user.
def group_of_spans_to_html(group_rows):
    """Custom aggregation of a data frame with "resource_type", "span_begin"
    and "span_end" columns. The group name is a tuple of
    (derived_resource_location, derived_from_text_path)
    """
    source_text = get_source_text(group_rows.name[0], group_rows.name[1])

    # points is a series of (resource_type, begin or end, offset) tuples
    # sorted in offset ascending order
    points = (
        group_rows.apply(
            lambda row: [
                (row["resource_type"], "begin", int(row["span_begin"])),
                (row["resource_type"], "end", int(row["span_end"])),
            ],
            axis=1,
        )
        .explode()
        .sort_values(key=lambda series: [e[2] for e in series], ascending=True)
    )

    # tags is used to figure out what type of HTML to insert at a given point
    tags = {
        "Condition": {"begin": '<I><span style="color: green">', "end": "</span></I>"},
        "MedicationStatement": {
            "begin": '<B><span style="color: blue">',
            "end": "</span></B>",
        },
    }

    # build the result string
    result = []
    cur_end = 0
    for pt in points:
        result.append(source_text[cur_end : pt[2]])
        result.append(tags[pt[0]][pt[1]])
        cur_end = pt[2]
    result.append(source_text[cur_end:])
    return "".join(result)
sources = resources_df.groupby(by=["derived_from_resource", "derived_from_path"]).apply(
group_of_spans_to_html
)
sources = sources.to_frame().reset_index().rename(columns={0: "text"})
display(HTML(pd.DataFrame(sources).to_html(escape=False)))
derived_from_resource | derived_from_path | text | |
---|---|---|---|
0 | DiagnosticReport/17e9d5ddf75-b3eb69d7-fd03-4caf-8fe6-d8e4f2a73b2c | DiagnosticReport.presentedForm[0].data | The patient had a myocardial infarction in 2015 and was prescribed Losartan.The patient is taking Losartan exactly as prescribed and has had no side effects. |
Retrieve Evidence for enriched resources
When nlp-insights derives an additional code for a resource's codings, it adds a summary extension to the code element. We'll use this extension to find the derived codes, and the evidence for those codes.
# reset dataframe to all resources for the patient
resources_df = original_resources_df
resources_df["text"] = resources_df.loc[:, "resource_json"].apply(get_code_text)
Define a function to retrieve codes
This FHIRPath expression returns all the codes on a resource, derived or not. The result is a list of "system,code" strings.
def get_all_codes(resource):
    # returns system,code for each code
    expr_str = (
        "Condition.code.coding.select(system + ',' + code) | "
        "AllergyIntolerance.code.coding.select(system + ',' + code)"
    )
    return evaluate_fhir_path(resource, expr_str)
Define a function to retrieve the summary extension for a code
This uses a FHIRPath expression to look for the insight summary extension on a code, and retrieves a string of the form "insight-id-system,insight-id-value".
def get_summary_extension_for_code(resource, code_system, code_value):
    expr_str = (
        f"(Condition | AllergyIntolerance).code.coding.where("
        f"  system = '{code_system}' and code = '{code_value}'"
        f")"
        f".extension('{summary_ext_url}').where("
        f"  extension('{category_ext_url}').value.coding.code = 'natural-language-processing'"
        f")"
        f".extension('{insight_id_ext_url}').value.select(system + ',' + value)"
    )
    return evaluate_fhir_path(resource, expr_str)
Construct a data frame of Derived codes
This code constructs a dataframe that contains columns with the insight id system and insight id value. We can use this information to determine where the code was derived from.
The acd in the insight id system tells us that ACD was used to derive the code.
# Dataframe for All codes
resources_df["code"] = resources_df.apply(
lambda row: get_all_codes(row["resource_json"]), axis=1
)
resources_df = resources_df.explode("code")
resources_df.dropna(subset=["code"], inplace=True)
resources_df[["code_system", "code_value"]] = resources_df["code"].str.split(
",", expand=True
)
resources_df.drop(labels=["code"], axis="columns", inplace=True)
# Filter to only include codes with associated insights
resources_df["summary"] = resources_df.apply(
lambda row: get_summary_extension_for_code(
row["resource_json"], row["code_system"], row["code_value"]
),
axis=1,
)
resources_df = resources_df.explode("summary")
resources_df.dropna(subset=["summary"], inplace=True)
resources_df[["insight_id_system", "insight_id_value"]] = resources_df[
"summary"
].str.split(",", expand=True)
resources_df.loc[
:,
[
"resource_id",
"resource_type",
"text",
"code_system",
"code_value",
"insight_id_system",
"insight_id_value",
],
]
resource_id | resource_type | text | code_system | code_value | insight_id_system | insight_id_value | |
---|---|---|---|---|---|---|---|
1 | 17e9d5ddf75-bc72c640-739c-4c48-b26f-774fa5446720 | AllergyIntolerance | peanut | http://terminology.hl7.org/CodeSystem/umls | C0559470 | urn:alvearie.io/health_patterns/services/nlp_insights/acd | 31d8f5eaf30190bba2cab9a18c95306901c197f3e14a6fc68f9dc276 |
1 | 17e9d5ddf75-bc72c640-739c-4c48-b26f-774fa5446720 | AllergyIntolerance | peanut | http://snomed.info/sct | 91935009 | urn:alvearie.io/health_patterns/services/nlp_insights/acd | 31d8f5eaf30190bba2cab9a18c95306901c197f3e14a6fc68f9dc276 |
1 | 17e9d5ddf75-bc72c640-739c-4c48-b26f-774fa5446720 | AllergyIntolerance | peanut | http://hl7.org/fhir/sid/icd-9-cm | 995.3 | urn:alvearie.io/health_patterns/services/nlp_insights/acd | 31d8f5eaf30190bba2cab9a18c95306901c197f3e14a6fc68f9dc276 |
1 | 17e9d5ddf75-bc72c640-739c-4c48-b26f-774fa5446720 | AllergyIntolerance | peanut | http://hl7.org/fhir/sid/icd-10-cm | Z91.010 | urn:alvearie.io/health_patterns/services/nlp_insights/acd | 31d8f5eaf30190bba2cab9a18c95306901c197f3e14a6fc68f9dc276 |
1 | 17e9d5ddf75-bc72c640-739c-4c48-b26f-774fa5446720 | AllergyIntolerance | peanut | http://hl7.org/fhir/sid/icd-10-cm | Z91.0 | urn:alvearie.io/health_patterns/services/nlp_insights/acd | 31d8f5eaf30190bba2cab9a18c95306901c197f3e14a6fc68f9dc276 |
2 | 17e9d5ddf76-7293ecc8-8b6a-497a-881c-b6eeed1250be | AllergyIntolerance | amoxicillin | http://terminology.hl7.org/CodeSystem/umls | C0571417 | urn:alvearie.io/health_patterns/services/nlp_insights/acd | 6134e9e926a775975004b69d42e225f467b50690aa4992d98922715e |
2 | 17e9d5ddf76-7293ecc8-8b6a-497a-881c-b6eeed1250be | AllergyIntolerance | amoxicillin | http://snomed.info/sct | 294505008 | urn:alvearie.io/health_patterns/services/nlp_insights/acd | 6134e9e926a775975004b69d42e225f467b50690aa4992d98922715e |
2 | 17e9d5ddf76-7293ecc8-8b6a-497a-881c-b6eeed1250be | AllergyIntolerance | amoxicillin | http://hl7.org/fhir/sid/icd-9-cm | E930.0 | urn:alvearie.io/health_patterns/services/nlp_insights/acd | 6134e9e926a775975004b69d42e225f467b50690aa4992d98922715e |
2 | 17e9d5ddf76-7293ecc8-8b6a-497a-881c-b6eeed1250be | AllergyIntolerance | amoxicillin | http://hl7.org/fhir/sid/icd-9-cm | 995.27 | urn:alvearie.io/health_patterns/services/nlp_insights/acd | 6134e9e926a775975004b69d42e225f467b50690aa4992d98922715e |
2 | 17e9d5ddf76-7293ecc8-8b6a-497a-881c-b6eeed1250be | AllergyIntolerance | amoxicillin | http://hl7.org/fhir/sid/icd-10-cm | Z88.0 | urn:alvearie.io/health_patterns/services/nlp_insights/acd | 6134e9e926a775975004b69d42e225f467b50690aa4992d98922715e |
5 | 17e9d5ddf75-51cfac33-b53e-4b65-95e3-781549aab119 | Condition | diabetes | http://terminology.hl7.org/CodeSystem/umls | C0011849 | urn:alvearie.io/health_patterns/services/nlp_insights/acd | ba6da20b9ef2b1b4ed2eb1b064fb96a00126b1aacc8e9a5519df1050 |
5 | 17e9d5ddf75-51cfac33-b53e-4b65-95e3-781549aab119 | Condition | diabetes | http://snomed.info/sct | 73211009 | urn:alvearie.io/health_patterns/services/nlp_insights/acd | ba6da20b9ef2b1b4ed2eb1b064fb96a00126b1aacc8e9a5519df1050 |
5 | 17e9d5ddf75-51cfac33-b53e-4b65-95e3-781549aab119 | Condition | diabetes | http://hl7.org/fhir/sid/icd-9-cm | 250.00 | urn:alvearie.io/health_patterns/services/nlp_insights/acd | ba6da20b9ef2b1b4ed2eb1b064fb96a00126b1aacc8e9a5519df1050 |
5 | 17e9d5ddf75-51cfac33-b53e-4b65-95e3-781549aab119 | Condition | diabetes | http://hl7.org/fhir/sid/icd-10-cm | E14.9 | urn:alvearie.io/health_patterns/services/nlp_insights/acd | ba6da20b9ef2b1b4ed2eb1b064fb96a00126b1aacc8e9a5519df1050 |
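The explode-then-split pattern used above can be illustrated on toy data (the column names and values below are only for illustration, standing in for the real resources):

```python
import pandas as pd

# Toy data: each row holds a list of "system,value" strings,
# like those returned by get_all_codes for a resource
df = pd.DataFrame({"resource_id": ["r1"], "code": [["sysA,111", "sysB,222"]]})

# One row per code, then split "system,value" into two columns
df = df.explode("code")
df[["code_system", "code_value"]] = df["code"].str.split(",", expand=True)
df = df.drop(labels=["code"], axis="columns")
print(df)
```

The same explode/split/drop sequence is applied twice in the cell above: once for the codes and once for the insight summary extensions.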
Determine how the code was derived
The insight extension in the resource's meta records where the code was derived from. In the case of enrichment, this information is simple: the code is always derived from the text associated with the enclosing code structure, and the referenced resource is always the same resource that was enriched. Even so, these facts are stated explicitly in the insight extension.
We can verify this with the get_derived_from method that we created earlier for the derived-resources example.
Construct a data frame with reference resource and path
resources_df["from"] = resources_df.apply(
lambda row: get_derived_from(
row["resource_json"], row["insight_id_system"], row["insight_id_value"]
),
axis=1,
)
resources_df = resources_df.explode("from")
resources_df[["derived_from_resource", "derived_from_path"]] = resources_df[
"from"
].str.split(";", expand=True)
resources_df.drop(labels=["from"], axis="columns", inplace=True)
resources_df.loc[
:,
[
"resource_id",
"resource_type",
"text",
"code_system",
"code_value",
"derived_from_resource",
"derived_from_path",
],
]
 | resource_id | resource_type | text | code_system | code_value | derived_from_resource | derived_from_path
---|---|---|---|---|---|---|---
1 | 17e9d5ddf75-bc72c640-739c-4c48-b26f-774fa5446720 | AllergyIntolerance | peanut | http://terminology.hl7.org/CodeSystem/umls | C0559470 | AllergyIntolerance/17e9d5ddf75-bc72c640-739c-4c48-b26f-774fa5446720 | AllergyIntolerance.code.text |
1 | 17e9d5ddf75-bc72c640-739c-4c48-b26f-774fa5446720 | AllergyIntolerance | peanut | http://snomed.info/sct | 91935009 | AllergyIntolerance/17e9d5ddf75-bc72c640-739c-4c48-b26f-774fa5446720 | AllergyIntolerance.code.text |
1 | 17e9d5ddf75-bc72c640-739c-4c48-b26f-774fa5446720 | AllergyIntolerance | peanut | http://hl7.org/fhir/sid/icd-9-cm | 995.3 | AllergyIntolerance/17e9d5ddf75-bc72c640-739c-4c48-b26f-774fa5446720 | AllergyIntolerance.code.text |
1 | 17e9d5ddf75-bc72c640-739c-4c48-b26f-774fa5446720 | AllergyIntolerance | peanut | http://hl7.org/fhir/sid/icd-10-cm | Z91.010 | AllergyIntolerance/17e9d5ddf75-bc72c640-739c-4c48-b26f-774fa5446720 | AllergyIntolerance.code.text |
1 | 17e9d5ddf75-bc72c640-739c-4c48-b26f-774fa5446720 | AllergyIntolerance | peanut | http://hl7.org/fhir/sid/icd-10-cm | Z91.0 | AllergyIntolerance/17e9d5ddf75-bc72c640-739c-4c48-b26f-774fa5446720 | AllergyIntolerance.code.text |
2 | 17e9d5ddf76-7293ecc8-8b6a-497a-881c-b6eeed1250be | AllergyIntolerance | amoxicillin | http://terminology.hl7.org/CodeSystem/umls | C0571417 | AllergyIntolerance/17e9d5ddf76-7293ecc8-8b6a-497a-881c-b6eeed1250be | AllergyIntolerance.code.text |
2 | 17e9d5ddf76-7293ecc8-8b6a-497a-881c-b6eeed1250be | AllergyIntolerance | amoxicillin | http://snomed.info/sct | 294505008 | AllergyIntolerance/17e9d5ddf76-7293ecc8-8b6a-497a-881c-b6eeed1250be | AllergyIntolerance.code.text |
2 | 17e9d5ddf76-7293ecc8-8b6a-497a-881c-b6eeed1250be | AllergyIntolerance | amoxicillin | http://hl7.org/fhir/sid/icd-9-cm | E930.0 | AllergyIntolerance/17e9d5ddf76-7293ecc8-8b6a-497a-881c-b6eeed1250be | AllergyIntolerance.code.text |
2 | 17e9d5ddf76-7293ecc8-8b6a-497a-881c-b6eeed1250be | AllergyIntolerance | amoxicillin | http://hl7.org/fhir/sid/icd-9-cm | 995.27 | AllergyIntolerance/17e9d5ddf76-7293ecc8-8b6a-497a-881c-b6eeed1250be | AllergyIntolerance.code.text |
2 | 17e9d5ddf76-7293ecc8-8b6a-497a-881c-b6eeed1250be | AllergyIntolerance | amoxicillin | http://hl7.org/fhir/sid/icd-10-cm | Z88.0 | AllergyIntolerance/17e9d5ddf76-7293ecc8-8b6a-497a-881c-b6eeed1250be | AllergyIntolerance.code.text |
5 | 17e9d5ddf75-51cfac33-b53e-4b65-95e3-781549aab119 | Condition | diabetes | http://terminology.hl7.org/CodeSystem/umls | C0011849 | Condition/17e9d5ddf75-51cfac33-b53e-4b65-95e3-781549aab119 | Condition.code.text |
5 | 17e9d5ddf75-51cfac33-b53e-4b65-95e3-781549aab119 | Condition | diabetes | http://snomed.info/sct | 73211009 | Condition/17e9d5ddf75-51cfac33-b53e-4b65-95e3-781549aab119 | Condition.code.text |
5 | 17e9d5ddf75-51cfac33-b53e-4b65-95e3-781549aab119 | Condition | diabetes | http://hl7.org/fhir/sid/icd-9-cm | 250.00 | Condition/17e9d5ddf75-51cfac33-b53e-4b65-95e3-781549aab119 | Condition.code.text |
5 | 17e9d5ddf75-51cfac33-b53e-4b65-95e3-781549aab119 | Condition | diabetes | http://hl7.org/fhir/sid/icd-10-cm | E14.9 | Condition/17e9d5ddf75-51cfac33-b53e-4b65-95e3-781549aab119 | Condition.code.text |
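Each derived-from string follows the "reference;path" shape that the cell above splits apart. On a single hypothetical value (the id below is invented for illustration), the parsing looks like this:

```python
# Hypothetical derived-from value in the "reference;path" shape used above
derived_from = "Condition/example-id-123;Condition.code.text"

# Split the resource reference from the FHIRPath,
# then split the reference into resource type and id
reference, path = derived_from.split(";")
resource_type, resource_id = reference.split("/", 1)
print(resource_type, resource_id, path)
```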
Retrieve source text
We can use the path from the previous data frame to retrieve the source text. Although we could use the source resource's id to retrieve that resource from the FHIR server, we know it is always the same resource as the enriched one, so for simplicity and performance we do not fetch it again.
Finding the source text is then a simple matter of evaluating the derived_from_path from the previous data frame against the enriched resource.
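Because the enrichment paths are simple dotted paths such as Condition.code.text, a plain-Python dictionary walk would also work for these cases. The sketch below is an illustration of the idea only, not what the notebook uses; evaluate_fhir_path handles the general FHIRPath case:

```python
# Minimal lookup for simple dotted paths like "Condition.code.text".
# This is a sketch for illustration; real FHIRPath expressions need
# the full evaluator used elsewhere in this notebook.
def simple_path_lookup(resource: dict, path: str):
    parts = path.split(".")
    value = resource
    # The first segment is the resource type, so walk the remaining keys
    for part in parts[1:]:
        value = value.get(part) if isinstance(value, dict) else None
        if value is None:
            return None
    return value

resource = {"resourceType": "Condition", "code": {"text": "diabetes"}}
simple_path_lookup(resource, "Condition.code.text")  # → "diabetes"
```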
resources_df["source_text"] = resources_df.apply(
lambda row: evaluate_fhir_path(row["resource_json"], row["derived_from_path"]),
axis=1,
)
resources_df = resources_df.explode("source_text")
resources_df.loc[
:,
[
"resource_id",
"resource_type",
"text",
"code_system",
"code_value",
"derived_from_resource",
"derived_from_path",
"source_text",
],
]
 | resource_id | resource_type | text | code_system | code_value | derived_from_resource | derived_from_path | source_text
---|---|---|---|---|---|---|---|---
1 | 17e9d5ddf75-bc72c640-739c-4c48-b26f-774fa5446720 | AllergyIntolerance | peanut | http://terminology.hl7.org/CodeSystem/umls | C0559470 | AllergyIntolerance/17e9d5ddf75-bc72c640-739c-4c48-b26f-774fa5446720 | AllergyIntolerance.code.text | peanut |
1 | 17e9d5ddf75-bc72c640-739c-4c48-b26f-774fa5446720 | AllergyIntolerance | peanut | http://snomed.info/sct | 91935009 | AllergyIntolerance/17e9d5ddf75-bc72c640-739c-4c48-b26f-774fa5446720 | AllergyIntolerance.code.text | peanut |
1 | 17e9d5ddf75-bc72c640-739c-4c48-b26f-774fa5446720 | AllergyIntolerance | peanut | http://hl7.org/fhir/sid/icd-9-cm | 995.3 | AllergyIntolerance/17e9d5ddf75-bc72c640-739c-4c48-b26f-774fa5446720 | AllergyIntolerance.code.text | peanut |
1 | 17e9d5ddf75-bc72c640-739c-4c48-b26f-774fa5446720 | AllergyIntolerance | peanut | http://hl7.org/fhir/sid/icd-10-cm | Z91.010 | AllergyIntolerance/17e9d5ddf75-bc72c640-739c-4c48-b26f-774fa5446720 | AllergyIntolerance.code.text | peanut |
1 | 17e9d5ddf75-bc72c640-739c-4c48-b26f-774fa5446720 | AllergyIntolerance | peanut | http://hl7.org/fhir/sid/icd-10-cm | Z91.0 | AllergyIntolerance/17e9d5ddf75-bc72c640-739c-4c48-b26f-774fa5446720 | AllergyIntolerance.code.text | peanut |
2 | 17e9d5ddf76-7293ecc8-8b6a-497a-881c-b6eeed1250be | AllergyIntolerance | amoxicillin | http://terminology.hl7.org/CodeSystem/umls | C0571417 | AllergyIntolerance/17e9d5ddf76-7293ecc8-8b6a-497a-881c-b6eeed1250be | AllergyIntolerance.code.text | amoxicillin |
2 | 17e9d5ddf76-7293ecc8-8b6a-497a-881c-b6eeed1250be | AllergyIntolerance | amoxicillin | http://snomed.info/sct | 294505008 | AllergyIntolerance/17e9d5ddf76-7293ecc8-8b6a-497a-881c-b6eeed1250be | AllergyIntolerance.code.text | amoxicillin |
2 | 17e9d5ddf76-7293ecc8-8b6a-497a-881c-b6eeed1250be | AllergyIntolerance | amoxicillin | http://hl7.org/fhir/sid/icd-9-cm | E930.0 | AllergyIntolerance/17e9d5ddf76-7293ecc8-8b6a-497a-881c-b6eeed1250be | AllergyIntolerance.code.text | amoxicillin |
2 | 17e9d5ddf76-7293ecc8-8b6a-497a-881c-b6eeed1250be | AllergyIntolerance | amoxicillin | http://hl7.org/fhir/sid/icd-9-cm | 995.27 | AllergyIntolerance/17e9d5ddf76-7293ecc8-8b6a-497a-881c-b6eeed1250be | AllergyIntolerance.code.text | amoxicillin |
2 | 17e9d5ddf76-7293ecc8-8b6a-497a-881c-b6eeed1250be | AllergyIntolerance | amoxicillin | http://hl7.org/fhir/sid/icd-10-cm | Z88.0 | AllergyIntolerance/17e9d5ddf76-7293ecc8-8b6a-497a-881c-b6eeed1250be | AllergyIntolerance.code.text | amoxicillin |
5 | 17e9d5ddf75-51cfac33-b53e-4b65-95e3-781549aab119 | Condition | diabetes | http://terminology.hl7.org/CodeSystem/umls | C0011849 | Condition/17e9d5ddf75-51cfac33-b53e-4b65-95e3-781549aab119 | Condition.code.text | diabetes |
5 | 17e9d5ddf75-51cfac33-b53e-4b65-95e3-781549aab119 | Condition | diabetes | http://snomed.info/sct | 73211009 | Condition/17e9d5ddf75-51cfac33-b53e-4b65-95e3-781549aab119 | Condition.code.text | diabetes |
5 | 17e9d5ddf75-51cfac33-b53e-4b65-95e3-781549aab119 | Condition | diabetes | http://hl7.org/fhir/sid/icd-9-cm | 250.00 | Condition/17e9d5ddf75-51cfac33-b53e-4b65-95e3-781549aab119 | Condition.code.text | diabetes |
5 | 17e9d5ddf75-51cfac33-b53e-4b65-95e3-781549aab119 | Condition | diabetes | http://hl7.org/fhir/sid/icd-10-cm | E14.9 | Condition/17e9d5ddf75-51cfac33-b53e-4b65-95e3-781549aab119 | Condition.code.text | diabetes |
Spans
For enrichment, there are no span extensions stored in the insight detail extensions.
That's because the entire source text is used for the insight, together with the context provided by the resource. (For example, 'peanut' in an AllergyIntolerance should result in peanut allergy codes, not a code for a plant or food.)
These text fragments contain only information related to the derived codes, and the context is defined by the resource. As a result, span and confidence values are not interesting.
Summary
In this tutorial we:
- Created a bundle
- Enriched the bundle using the nlp-insights service
- Posted the resources to a FHIR Server
- Retrieved resources derived by NLP and determined what caused them to be derived
- Retrieved derived codes for enriched resources and determined what caused them to be derived