CASE STUDY

Efficient prediction of TCGA cancer subtypes using compact multi-omic signatures for personalized medicine

Classification of non-TCGA cancer samples to TCGA molecular subtypes using compact feature sets

Kyle Ellrott, Christopher K. Wong, Christina Yau, Mauro A. A. Castro, Jordan A. Lee, Brian J. Karlberg, Jasleen K. Grewal, Vincenzo Lagani, Bahar Tercan, Verena Friedl, Toshinori Hinoue, Vladislav Uzunangelov, Lindsay Westlake, Xavier Loinaz, Ina Felau, Peggy I. Wang, Anab Kemal, Samantha J. Caesar-Johnson, Ilya Shmulevich, Alexander J. Lazar, Ioannis Tsamardinos, Katherine A. Hoadley, The Cancer Genome Atlas Analysis Network, A. Gordon Robertson, Theo A. Knijnenburg, Christopher C. Benz, Joshua M. Stuart, Jean C. Zenklusen, Andrew D. Cherniack, Peter W. Laird

Joint work with the Tumor Molecular Pathology (TMP) Analysis Working Group (AWG) of the US National Institutes of Health (NIH) Center for Cancer Genomics (CCG).

Digital Library: https://doi.org/10.1016/j.ccell.2024.12.002

Summary

Molecular subtypes, such as defined by The Cancer Genome Atlas (TCGA), delineate a cancer’s underlying biology, bringing hope to inform a patient’s prognosis and treatment plan. However, most approaches used in the discovery of subtypes are not suitable for assigning subtype labels to new cancer specimens from other studies or clinical trials. Here, we address this barrier by applying five different machine learning approaches to multi-omic data from 8,791 TCGA tumor samples comprising 106 subtypes from 26 different cancer cohorts to build models based upon small numbers of features that can classify new samples into previously defined TCGA molecular subtypes—a step toward molecular subtype application in the clinic. We validate select classifiers using external datasets. Predictive performance and classifier-selected features yield insight into the different machine-learning approaches and genomic data platforms. For each cancer and data type we provide containerized versions of the top-performing models as a public resource.

Methodology

JADBio, along with four other machine learning methods, was used to train multi-omic classifiers on TCGA tumor samples. The data comprised five types: mRNA, microRNA, copy number variation (CNV), mutation, and methylation. The goal was to build models with as few biomarkers as possible, so that compact cancer testing panels and kits can be created to clinically subtype non-TCGA patient tumor samples.

Results

Robust Cross-Platform Subtype Classification. The developed machine-learning models accurately classify non-TCGA cancer samples into TCGA-defined molecular subtypes across diverse tumor types and data platforms.

Minimal Feature Sets. Each classifier relies on a small, optimized set of omic features (genes, methylation sites, etc.), demonstrating that high accuracy doesn’t require large datasets and making practical panel construction feasible. Specifically, between 70 and 150 samples are required for cancer subtype classification.

High Predictive Performance. Models achieved strong precision, recall, and F1‑scores for most subtypes tested. While some very rare subtypes with minimal training samples were excluded, performance remained robust for the majority.
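For readers less familiar with these metrics, the following is a minimal sketch of how per-subtype precision, recall, and F1-score can be computed from true and predicted labels. The subtype labels and predictions are hypothetical toy values chosen for illustration; they are not data or results from the study, and the function name `per_class_scores` is an assumption of this sketch.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for each class label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct call for this subtype
        else:
            fp[pred] += 1    # predicted subtype was wrong
            fn[truth] += 1   # true subtype was missed

    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Hypothetical toy labels (not from the study).
y_true = ["A", "A", "A", "B", "B", "C"]
y_pred = ["A", "A", "B", "B", "B", "C"]

scores = per_class_scores(y_true, y_pred)
for label, (p, r, f1) in scores.items():
    print(f"subtype {label}: precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

Reporting the metrics per subtype, rather than as a single overall accuracy, makes it easier to see when a rare subtype is being classified poorly.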

Resource for Panel Development. The resulting feature sets are explicitly proposed as a foundation for designing compact clinical panels or kits aimed at classifying tumor subtypes—bridging the gap between genomic profiling and personalized oncology.

Conclusions

Small sets of carefully selected multi-omic features can accurately predict cancer subtypes defined by TCGA. These compact models work well across different datasets and platforms, making them a practical public resource for use in clinical settings. This approach can help create efficient diagnostic panels that support personalized cancer treatment by linking molecular subtype information to therapeutic decisions.

How did JADBio perform?

JADBio showed strong predictive performance in 96% of cases (25/26)

Had the highest performance in 16 of 26 cancer types (62%).
Tied for the best performance in 1 cancer type (Prostate, PRAD).
Performed within 1 standard deviation of the best in 8 of the remaining cancer types (31% of all 26).
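As a quick arithmetic check, the counts listed above can be tallied to confirm that they add up to the 25 of 26 cancer types reported as strong performance. This is a minimal sketch; the variable and function names are illustrative only.

```python
# Counts reported above, out of the 26 TCGA cancer cohorts.
TOTAL = 26
counts = {
    "highest performance": 16,
    "tied for best": 1,
    "within 1 SD of best": 8,
}


def share(n, total=TOTAL):
    """Return n/total as a percentage rounded to one decimal place."""
    return round(100 * n / total, 1)


strong = sum(counts.values())
for label, n in counts.items():
    print(f"{label}: {n}/{TOTAL} = {share(n)}%")
print(f"strong overall: {strong}/{TOTAL} = {share(strong)}%")
```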

JADBio’s feature selection algorithms selected the fewest biomarkers in 85% of cases (22/26)

Rapid panel creation, no specialized expertise required.
Unique and unbiased identification of multiple equally predictive subsets with high biological relevance.
Non-redundant biomarker selection that improves classification using streamlined, more parsimonious biomarker sets, surpassing traditional feature ranking methods.

Methods that provided the best predictive model in each of the 26 cancer types from the TCGA cohorts

Thymoma analysis results
