Accessing Data for Clinical Research: The Boston Medical Center Clinical Data Warehouse (CDW)

April 2019 Issue

Author(s):

  • Linda Rosen, MSEE
    Research Manager, Clinical Data Warehouse, Boston Medical Center

  • Melissa Hofman, MSIS
    Clinical Data Warehouse, Boston Medical Center

  • Introduction
  • Data in the Clinical Data Warehouse
  • The Clinical Data Warehouse Research Team
  • Types of Data Sets Available
  • Applying to use data in the Clinical Data Warehouse
  • IRB Review Process
  • Characteristics of an effective data request
  • How to submit a data request
  • Summary

 

Introduction

Information that is collected for clinical purposes plays a key role in human subjects research:  from underpinning the design of clinical trials, to identifying potential subjects for recruitment, to elucidating important relationships between patient characteristics and clinical outcomes.  In this CR TIMES article, we focus on how to gain access to the electronic health record information from patients at Boston Medical Center (BMC). 

To facilitate the analysis of electronic health information, BMC has brought those data together into a data warehouse, a repository of historical data organized for reporting and analysis. A data warehouse facilitates data access by keeping data from many sources in one place, linked together, and searchable.  Regulatory requirements ensure that the data are protected from misuse or accidental disclosure while still allowing access for important research purposes.

Created in 2005, the BMC CDW collects data spread throughout the many BMC hospital systems into a consolidated, organized and accessible database for analysis, reporting, and research purposes. Researchers can access data from clinical electronic records, EPIC, and the legacy systems (inpatient, outpatient, billing, ED, surgery, etc.) with the help of the CDW Research Team. There is a charge for using the CDW for research purposes.  Click on this link to learn more about the charges. The CDW enables data from one hospital system to be cross-referenced with data from others; facilitates searches of electronic information that would previously have been a prohibitively time-consuming (and manual) endeavor; permits counting of populations meeting specified criteria; enables the development of automated reports; and much more.

 

Data in the Clinical Data Warehouse

Data from BMC’s electronic medical record are collected and reorganized into the CDW—a distinct, comprehensive database—on a daily or weekly schedule, making the data well-suited for retrospective and even prospective research.  The data warehouse is not real-time, so it cannot be used for real-time decision making.

Clinical data from EPIC and the BMC legacy systems include the following information:

  • Patient demographics, including patient and provider contact data

  • Visit information: diagnoses (ICD-9 or ICD-10 codes), procedures (CPT codes), visit type and location, admission and discharge date/time, reason for visit, and insurance data

  • Past and future appointment information

  • Problem lists, flowsheet and observation entries

  • Labs, immunizations, allergies, orders and medications from inpatient and outpatient settings

  • Tumor Registry and surgical data

 

The Clinical Data Warehouse Research Team

Linda Rosen and Melissa Hofman are available to compile data sets for researchers.  As the liaisons to the CDW, they can help researchers discover ways to relate data and to get information that may not be as simple as date, diagnosis, or lab value.

Because the CDW contains information from a variety of BMC’s systems, data can be cross-referenced among those systems, and searching for information is more efficient.  However, to retrieve useful information one must understand the relationships among the thousands of database tables, know how and where information is stored, and know how to query a large database.  Individual researchers cannot access the database on their own, but the CDW Research Team is dedicated to helping clinical researchers find the data in which they are interested (and for which they have authorization).

 

Types of Data Sets Available

The CDW contains information in a relational database.  The term relational database refers to how data in a database are arranged. A relational database is a collection of relations (frequently called tables). Data from one set of tables can be “related” to data in another set of tables using a unique identifier like a medical record number and a date, so that information (like the results of a patient’s pathology report) can be linked to a surgical procedure and procedure date.
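The linking idea described above can be sketched in a few lines. This is a minimal illustration only, using Python's built-in sqlite3 module and an in-memory database; the table names (surgery, pathology), column names, and sample rows are hypothetical and do not reflect the actual CDW schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two small, made-up tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE surgery (mrn TEXT, procedure_date TEXT, procedure_name TEXT);
        CREATE TABLE pathology (mrn TEXT, report_date TEXT, result TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO surgery VALUES (?, ?, ?)",
        [
            ("A001", "2018-03-02", "colectomy"),
            ("A002", "2018-04-10", "biopsy"),
        ],
    )
    conn.executemany(
        "INSERT INTO pathology VALUES (?, ?, ?)",
        [
            ("A001", "2018-03-02", "adenocarcinoma"),
            ("A002", "2018-05-01", "benign"),  # different date: will not link
        ],
    )
    return conn


def link_pathology_to_surgery(conn):
    """Relate the two tables on medical record number AND date."""
    query = """
        SELECT s.mrn, s.procedure_date, s.procedure_name, p.result
        FROM surgery AS s
        JOIN pathology AS p
          ON p.mrn = s.mrn
         AND p.report_date = s.procedure_date
        ORDER BY s.mrn, s.procedure_date
    """
    return conn.execute(query).fetchall()


linked_rows = link_pathology_to_surgery(build_demo_db())
for row in linked_rows:
    print(row)
```

Only the first patient's pathology report is linked, because the join requires both the medical record number and the date to match. In real clinical data, report dates often differ from procedure dates, which is one reason a conversation with the CDW Research Team about how tables relate is valuable.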

The variety and scope of the data accessible to clinical researchers are limited only by their imaginations and by what is captured in the CDW. While much of the data from EPIC is represented in the CDW, not all of the data are trivial to extract. Categories of data requests include counts, anonymous or de-identified data, and identified or identifiable data.  The CDW can also set up recurring reports that update automatically.

Below are a few recent requests from clinical researchers:

Example Case I – simple counts

How many patients were admitted to the ICU in CY 2015 with sepsis (ICD-10 code A41.9)? What was the average length of stay? What proportion of these admissions ended in death?

Example Case II – de-identified/anonymous data

For patients referred by a pediatrician, what is the number of missed appointments at the subspecialty clinic prior to the first kept appointment? What is the number of rescheduled appointments prior to the first kept appointment?

Example Case III – identifiable data

For the provided set of medical record numbers and hospital admission dates, what was the value and date of the most recent WBC, C-Reactive Protein, and pre-albumin before the admission date?

Example Case IV – recurring report

For the list of Primary Care Physicians provided, create a recruitment report of the English-speaking patients between 50 and 75 years old (PCP contact info, patient contact info, insurance info, patient demographics) who have appointments in the next month, have not had a colonoscopy, fecal occult blood test, or flexible sigmoidoscopy and have no family history of colon cancer.
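To make the logic behind a request like Example Case III concrete, here is a minimal sketch in plain Python. It is not how the CDW Research Team runs such a query. The record layout (the mrn, test, result_date, and value keys) and the sample values are hypothetical, and the choice to count only results dated strictly before the admission date is an assumption that a real request would need to state explicitly.

```python
from datetime import date


def most_recent_before(labs, mrn, test_name, admission_date):
    """Return (result_date, value) for the latest matching lab result
    dated strictly before admission_date, or None if there is none."""
    candidates = [
        (lab["result_date"], lab["value"])
        for lab in labs
        if lab["mrn"] == mrn
        and lab["test"] == test_name
        and lab["result_date"] < admission_date  # assumption: strictly before
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[0])


# Made-up lab results for illustration only.
labs = [
    {"mrn": "A001", "test": "WBC", "result_date": date(2018, 3, 1), "value": 9.8},
    {"mrn": "A001", "test": "WBC", "result_date": date(2018, 3, 5), "value": 7.2},
    {"mrn": "A001", "test": "WBC", "result_date": date(2018, 3, 12), "value": 11.4},
    {"mrn": "A001", "test": "CRP", "result_date": date(2018, 3, 4), "value": 3.1},
    {"mrn": "A002", "test": "WBC", "result_date": date(2018, 3, 6), "value": 5.5},
]

admission = date(2018, 3, 10)
latest_wbc = most_recent_before(labs, "A001", "WBC", admission)
latest_crp = most_recent_before(labs, "A001", "CRP", admission)
latest_prealbumin = most_recent_before(labs, "A001", "pre-albumin", admission)
```

Even this small example surfaces questions worth settling in advance: whether a result on the admission date itself should count, how far back to look, and what to return when a patient has no prior result.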

 

Applying to use data in the Clinical Data Warehouse

The CDW takes very seriously the obligation to safeguard patients’ protected health information. The proposed use of CDW data must comply with HIPAA and human subjects regulations, and must undergo the IRB review appropriate to retrieving those data.  Researchers receiving any information that could potentially identify patients must have robust systems in place that comply with institutional data security standards (see link) for protecting the data from misuse and accidental release.

The researcher must be able to answer the following questions: 

  1. Do I need to have data that could identify patients?
  2. Will I potentially require a follow-up data query to find additional information after a data set is created (meaning that the data cannot be recorded anonymously and are thus identifiable)?
  3. Does my data request restrict a count to a very small number?

All CDW studies except for some of those asking for simple counts (see below) require submission of a proposal to the IRB for review.

 

IRB Review Process

The IRB review process depends on how you will use the CDW data, which falls into one of the following three major categories. It is important to read the questions in INSPIR carefully and provide all requested information to help speed up the IRB review process.

Data Counts: no IRB submission or review

If you are requesting only a count from the Clinical Data Warehouse (e.g., the number of hospitalized patients with pneumonia in 2018), then simply fill out a request at this link and it will be submitted directly to the CDW Research Team for review.  As long as the criteria for the count are not so specific as to identify patients, you are not receiving identifiable private information, so an application to the IRB is not required. However, counts fewer than 6 will not be reported because they risk identification (e.g., the number of patients with both hepatitis C and rabies seen in a 6-month period in the pediatric clinics). If this situation arises and actual counts are desired, the request will then need IRB review and perhaps a waiver of HIPAA authorization.
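The small-count rule can be illustrated with a short sketch. The threshold of 6 comes from the paragraph above; everything else (the function name, the dictionary layout, the sample counts, and the choice to represent a withheld count as None) is a hypothetical illustration rather than the CDW Research Team's actual procedure.

```python
MIN_REPORTABLE_COUNT = 6  # per the article: counts fewer than 6 are not reported


def suppress_small_counts(counts, threshold=MIN_REPORTABLE_COUNT):
    """Return a copy of counts with any value below threshold replaced by None.

    Assumption: a count of zero is treated like any other count below the
    threshold, following the literal wording "counts fewer than 6".
    """
    return {
        label: (n if n >= threshold else None)
        for label, n in counts.items()
    }


# Made-up counts for illustration only.
raw_counts = {
    "pneumonia_2018": 412,
    "hepatitis_c_and_rabies": 2,
    "exactly_six": 6,
}

reportable = suppress_small_counts(raw_counts)
for label, n in reportable.items():
    print(label, "withheld" if n is None else n)
```

Under this sketch a count of exactly 6 is still reported, since the stated rule applies only to counts fewer than 6.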

Chart Reviews: IRB submission and review

If your study consists of analyzing clinical data on individuals, whether or not they are identified, then you must submit an INSPIR application to the IRB. INSPIR has a separate application pathway for chart reviews. Depending on whether your study is externally funded and whether the data are identifiable under the HIPAA and human subjects definitions, your chart review may receive a determination of Not Human Subjects Research, an exempt determination, or approval.

Clinical Data for Subject Recruitment: IRB submission and review

If you need CDW data to identify potential subjects as part of a larger study, then use the INSPIR pathway appropriate to the larger study, not the chart review pathway. Be sure to describe your requested use of CDW data in the relevant sections of the INSPIR application (Recruitment, HIPAA). Again, the IRB will determine the proper level of review.

 

Characteristics of an effective data request

Researchers commonly find that formulating a data request reveals intricacies in complex data that can affect their research. The CDW team highly recommends that you discuss your project with them before submitting your proposal to the IRB in order to create a more informed data request, to ensure a better research project, and to avoid multiple iterative requests.  Consider the following questions (not an exhaustive list):

  • What is the question I am asking and what are the relevant data points needed that may impact the answer?
  • What data did I forget to include in my data request or not even consider?
  • How might I simplify the request?
  • What do I know as a clinician/researcher that might help the CDW team provide better, more nuanced data?
  • What do I not know as a consumer of clinical data that might help to more fully explore my research hypothesis or question?
  • How do I envision the data results for ease in my analysis or manipulation?

Formulating a precise data request often involves a consideration of “known unknowns”: consequential elements often overlooked by researchers unfamiliar with data science, but known to and addressable by the CDW team. Surfacing and addressing such elements is a key reason to consult with CDW team members before requesting data.

Turnaround time between a data request and data delivery varies, so researchers need to plan accordingly.  The CDW team uses the complexity and type of the request to determine the priority and expected delivery date of data.

Last-minute data requests will not be accommodated.

 

How to submit a data request

Request data from the CDW by filling out a request at the Office of Clinical Research Data Access web site after you have received the approvals required for the data you are requesting.  The CDW Research Team will review the request and verify that the study has appropriate approvals and that the data being requested match the data described in the proposal documents.

Someone from the CDW Research team will contact you and may schedule a meeting to clarify data requests and discuss potential issues.

Please contact Ms. Rosen or Ms. Hofman prospectively to discuss your data needs.

 

Summary

The CDW contains rich clinical information from BMC that is maintained and organized to support human subjects research. The CDW Research Team is able to facilitate gathering these data as long as the researcher has defined the data needed and obtained the appropriate level of review and approval.