Algorithms Can Replicate or Remedy Racial Biases in Healthcare Resource Allocation

A healthcare algorithm trained on cost data to predict patients’ health risk scores was found to exhibit algorithmic bias: it underrated the severity of Black patients’ health conditions relative to their white counterparts, leading to under-provision of health care to Black patients.

Reviewed by Penny Sun

Introduction

Obermeyer et al. note both the growing attention to potential racial and gender biases within algorithms and the difficulty of obtaining access to real-world algorithms – including the raw data used to design and train them – in order to understand how and why bias can appear in them. This study is important because the authors obtained access to the inputs, outputs, and real-world outcomes of a health care algorithm that performs a widely used function within the healthcare sector. Further, the algorithm is broadly representative of the type of logic used by algorithms in other social sectors. Specifically, it identifies which patients to recommend for a care management program, where they will receive additional resources. The algorithm simplifies this task into identifying the patients with the greatest care needs: patients in the top 3 percent of predicted need automatically qualify for entry into the program, and those in the top 45 percent are assessed for entry by their primary care physicians.

As prominent researchers at the nexus of machine learning and health, the authors were able to convince the manufacturer of this algorithm, a leader in the field, to consider changing it, in the hope that doing so may change the norms of the entire sector.

Methods and Findings

Obermeyer et al. collected input data on all primary care patients enrolled in risk-based contracts at a large academic hospital from 2013 to 2015. They defined the population of Black and white, non-Hispanic patients based on patient self-identification. The researchers used outcomes from electronic health records to assess patients’ health needs and insurance claims to assess patients’ costs, including all diagnoses, key quantitative laboratory studies, vital signs, utilization, outpatient visits, emergency visits, hospitalizations, and health care costs.

Primary findings:

  • Based on the number of comorbid conditions and the severity of markers of chronic disease, Black patients with the same level of predicted risk as white patients, according to the algorithm, were substantially sicker than their white counterparts. Thus, if the algorithm had identified high-risk patients solely on health needs, significantly more Black patients would have been included in the care management program. 
  • Previous health care costs were the driving force behind Black patients’ lower rate of entry into the care management program. For reasons the study does not resolve, the gap between care needed and care received is significantly larger among Black patients. Thus, a seemingly neutral factor – previous health care spending – can become a racially biased one due to external social conditions. This demonstrates the difficulty of “problem formulation” in data science: how to turn complex, interactive, and vague social conditions into a concrete, measurable variable in a dataset.
  • The study explores other ways to define the input variables the algorithm considers when distinguishing patients’ health risks: total costs only, avoidable costs (based on emergency visits and hospitalizations), and health needs (based on number of chronic conditions). All three options predict patient outcomes fairly well, but the health-needs label places almost twice as many Black patients in the highest-risk group as the total-costs label. 
  • Doctors’ judgment can marginally increase the number of Black patients who make it into the care management program compared to the cost-based algorithm. But an algorithm adjusted to take health needs into account identifies high-risk Black patients even better than doctors do. Thus, an improved algorithm has greater potential for increasing the share of Black patients rated “high-risk” than reliance on individual doctors’ judgment.  
  • The manufacturer of this algorithm independently confirmed Obermeyer et al.’s findings. Working together, the researchers and manufacturer demonstrated that adjusting the algorithm to take health needs into account reduces racial bias in outcomes by 84%.
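The label-comparison finding above can be made concrete with a small helper that measures how the choice of training label shifts the composition of the “highest-risk” group. This is a hypothetical sketch, not the authors’ code: the function name, arguments, and the 97th-percentile cutoff are illustrative assumptions; in the study, the candidate labels were total costs, avoidable costs, and number of active chronic conditions.

```python
import numpy as np

def topgroup_share(label_values, race, cutoff_pct=97):
    """Share of Black patients among those at or above the given
    percentile of a candidate training label (hypothetical helper).
    Calling this once per candidate label on the same cohort shows
    how re-labeling alone changes who counts as highest-risk."""
    cut = np.percentile(label_values, cutoff_pct)
    in_top_group = label_values >= cut
    return (race[in_top_group] == "black").mean()
```

On the study’s data, evaluating this kind of share under a health-needs label versus a total-costs label is what revealed the near-doubling of Black representation in the top-risk group.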

Conclusions

Obermeyer et al. recommend caution in choosing and defining the measures algorithms are trained to predict, to ensure that the algorithm uses truly relevant input and output data for health outcomes. Although this may seem difficult and costly, the private sector has clearly demonstrated that it is possible: the additional labor lies in validating the conceptual relationships the algorithm encodes, not in changing the statistical techniques it relies on. Given the high stakes in the health and social sectors, this kind of investment is necessary and can yield great results while minimizing harm. 

Through their new partnership with the manufacturer of this algorithm, Obermeyer et al. hope to find solutions to this type of error together, and with their combined leadership in academia and industry, they hope their findings will change the norms used to create algorithms in the healthcare sector and wider social sectors.
