Renal AML Classification Project

Building upon my previous experience

Original writeup and relevant code can be found on [Github]

During the spring of 2022, I served as a lead researcher within the Informatics Skunkworks group at the University of Wisconsin - Madison. In this role, I led a team of undergraduate students to explore whether machine learning could accurately predict the growth patterns of renal angiomyolipomas (AMLs), a type of kidney tumor.

Research Objective & Data Engineering

The primary goal was to develop models that could either predict the specific growth rate of a tumor (regression) or categorize it into “high growth” versus “low growth” groups (classification).

  • Dataset Composition: The raw data included patient clinical info and extracted features, totaling 653 entries with 706 initial columns.

  • Pipeline Development: I developed a data transformation pipeline to clean the dataset. This involved removing medically irrelevant features, handling missing data, and performing one-hot encoding for categorical variables to prevent models from inferring an artificial order.

  • Feature Refinement: After preprocessing and removing features to prevent data leakage, the feature set was narrowed to 399 medically relevant columns.

Methodological Approach

My team and I implemented a dual-pronged analysis using several machine learning and deep learning frameworks:

Approach Models Utilized Metrics Tracked
Regression PyTorch Neural Networks (4-layer linear), XGBoost, and Gaussian Process Regressor (GPR). Mean Squared Error (MSE), R2, and Parity Plots.
Classification Support Vector Machines (SVM), XGBoost, and Neural Networks. Accuracy, F1 Score, and Geometric Mean (Gmean).
Benchmarking Naive Classifiers (Majority, Minority, Random, and Stratified). Comparison baseline for model effectiveness.

Analysis & Key Findings

Despite iterating through various architectures and hyperparameter tuning (using tools like GridSearchCV), the results highlighted the significant challenges of predicting biological growth with limited, noisy datasets.

  • Regression Hurdles: Regression models consistently underperformed, often yielding negative R2 values (e.g., GPR at −1.248 and XGBoost at −1.249), indicating that the models were not capturing the underlying growth trends effectively.

  • Classification Limits: The trained classification models failed to consistently outperform naive classifiers. For instance, while an SVM achieved an accuracy of 0.75, a simple Majority Classifier achieved a similar accuracy of 0.766, suggesting the model was not gaining meaningful predictive power from the features.

  • Threshold Impacts: We attempted to improve certainty by focusing only on tumors with high growth rates (e.g., >1 cm/year), but this further reduced the available training data, which negatively impacted model performance.

Leadership & Academic Growth

While the technical results were not “successful” in terms of predictive accuracy, this project was important to my undergraduate research career. It provided a realistic look at the complexities of medical data and the necessity of rigorous benchmarking. Leading a team required me to delegate tasks, maintain a development schedule, and foster a collaborative environment where we could critically evaluate why certain approaches were failing.

The experience taught me the value of intellectual honesty in research—knowing when to conclude that a current approach or dataset is insufficient and being able to pivot to more promising directions.