Predicting Patterns for Breast Cancer Risk: How AI Maps Chemical Exposures and Health
Dr. Dimitri Abrahamson is a scientist at UCSF, working alongside Dr. Kimberly Badal on her ongoing chemical mixtures study. A computational chemist who works in virtual laboratories as well as with test tubes, Dr. Abrahamson specializes in the intersection of technology, medicine, and chemistry. Camille Sytko (Science Communications Intern) and Lianna Hartmour (Zero Breast Cancer Program and Communications Director) at the Collaborative for Health & Environment (CHE) interviewed him to better understand the important role machine learning plays in the study and the benefits it may continue to bring to the field of environmental health.
What is machine learning and how is it used in this work?
Machine learning is a type of artificial intelligence (AI) that teaches a computer to learn from experience, rather than giving it a strict list of instructions. It uses math to find meaningful patterns in large, complicated datasets, behaving like an extremely powerful calculator that can process thousands of millions of data points. Popular tools such as ChatGPT and Google Gemini are a different type of AI, one that mimics how humans speak.
In the chemical mixtures study, the researchers will measure the amounts of thousands of chemicals present in the blood of women who did not get breast cancer and compare them with the amounts in the blood of women who did. They will use a laboratory method called non-targeted analysis, which searches for every chemical present in a blood sample rather than just a specific few. It will identify millions of chemicals within one sample, resulting in a large and complex dataset. Compared with traditional environmental health studies, which often examine the toxicity of one chemical at a time, the non-targeted approach provides a more complete picture of the wide array of chemicals people are exposed to. A dataset this complicated cannot be analyzed with traditional methods that rely on the human brain alone, which creates the need for machine learning tools.
The researchers will use a machine learning model to help them find chemical exposures in the dataset that increase the risk of developing breast cancer. The study also explores whether a mixture of chemicals might be more harmful than exposure to a single chemical. Dr. Abrahamson explained, “These machine learning models enable us to find what the rules are for predicting the right outcome, just by using the data, without having to type out the rules one by one.”
Dr. Abrahamson says the study aims to use pattern recognition to answer the question, “Can we use these super large datasets and machine learning to predict whether a person will develop breast cancer?” Hopefully, the patterns in this study will reveal relationships between the chemicals detected in a patient’s blood and whether they will eventually develop breast cancer.
Implications for the future
A key outcome of this study will be the machine learning model itself, which, if all goes well, will be used to make accurate predictions about a person’s future cancer risk based on the chemical mixtures present in their blood along with other traditional risk factors. Eventually, the model could be used in clinics to inform patients of the kinds of environmental exposures that may be contributing to their cancer risk. This knowledge could improve preventative care, promote lifestyle changes in which people avoid certain chemicals, and ideally lead to policy changes that ban harmful chemicals for everyone.
When asked how he sees environmental health research changing as these tools develop, Dr. Abrahamson explained that as machine learning becomes more accurate, the technologies needed to collect environmental exposure data from people's blood (as in this study), homes, food, air, and water are also advancing.
Additionally, some chemicals have cause-and-effect relationships that are not consistent. Some chemical exposures are linked to health harms even at very low levels, while others may have a tipping point (or threshold): the chemical seems safe at low levels but becomes harmful once a certain amount is reached. Chemicals mixed together might also be more harmful than single chemicals alone. Dr. Abrahamson thinks that “where machine learning models will be super useful is when they are able to identify the rules or patterns that might not be as simple as we might have thought.”
As Dr. Abrahamson and Dr. Badal continue their work, the goal remains clear: turn massive amounts of data into actionable health protections. This intersection of computer science and chemistry isn’t just about “big data”; it’s about building a future where we can predict, and ultimately prevent, environmental health risks before they start.
--
Written by Camille Sytko, CHE Science Communications Intern, and Lianna Hartmour, MA, NBC, HWC, Zero Breast Cancer Communications and Program Director, CHE