Özçelik, Duygu and Taştan, Öznur (2022) A weakly supervised clustering method for cancer subgroup identification. Balkan Journal of Electrical and Computer Engineering, 10 (2). pp. 178-186. ISSN 2147-284X
PDF (Article)
balkan.pdf
Restricted to Registered users only
Official URL: http://dx.doi.org/10.17694/bajece.1033807
Abstract
Identifying subgroups of cancer patients is important as it opens up possibilities for targeted therapeutics. A widely applied approach is to group patients with unsupervised clustering techniques based on molecular data of tumor samples. The resulting patient clusters are deemed of interest if they can be associated with a clinical outcome variable such as patient survival. However, these clinical variables of interest do not participate in the clustering decisions. We propose an approach, WSURFC (Weakly Supervised Random Forest Clustering), in which the clustering process is weakly supervised by a clinical variable of interest. The supervision is achieved by learning a similarity metric from features selected to predict this clinical variable. More specifically, WSURFC first trains a random forest classifier to predict the clinical variable, in this case the survival class. The internal nodes of the trained forest are then used to derive a random forest similarity metric between pairs of samples. In this way, the clustering step utilizes the nonlinear subspace of the original features learned in the classification step. We first demonstrate WSURFC on handwritten digit datasets, where WSURFC is able to capture salient structural similarities of digit pairs. Next, we apply WSURFC to find breast cancer subtypes using mRNA, protein, and microRNA expression as features. Our results on breast cancer show that WSURFC could identify interesting patient subgroups more effectively than widely adopted methods.
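The pipeline outlined in the abstract (train a forest, derive a pairwise similarity from its trees, then cluster on that similarity) can be illustrated with a minimal, hypothetical sketch. It is not the authors' implementation. It assumes each sample's root-to-leaf decision path in every tree is already available; in practice these could be read from a trained forest, for example via scikit-learn's `RandomForestClassifier.decision_path`. Within one tree, a pair of samples is scored by the fraction of path nodes they share. These scores are averaged over trees, and average-linkage agglomerative clustering is then applied. The function names `path_similarity`, `forest_similarity`, `similarity_matrix`, and `average_linkage` are illustrative. The shared-node scoring rule and the choice of clustering algorithm are also assumptions, since the abstract does not specify them.

```python
from itertools import combinations

# Hypothetical sketch: the shared-node rule and the clustering step below are
# illustrative assumptions, not the WSURFC paper's exact definitions.


def path_similarity(path_a, path_b):
    """Fraction of nodes shared by two root-to-leaf paths in the same tree.

    Paths that end in the same leaf score 1.0; paths that split at the root
    share only the root node.
    """
    shared = len(set(path_a) & set(path_b))
    return shared / max(len(path_a), len(path_b))


def forest_similarity(paths_per_tree, i, j):
    """Mean path similarity of samples i and j over all trees in the forest."""
    scores = [path_similarity(paths[i], paths[j]) for paths in paths_per_tree]
    return sum(scores) / len(scores)


def similarity_matrix(paths_per_tree):
    """Symmetric n x n similarity matrix with 1.0 on the diagonal."""
    n = len(paths_per_tree[0])
    sim = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        sim[i][j] = sim[j][i] = forest_similarity(paths_per_tree, i, j)
    return sim


def average_linkage(sim, k):
    """Merge the two clusters with the highest mean pairwise similarity until k remain."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            pairs = [(p, q) for p in clusters[a] for q in clusters[b]]
            mean = sum(sim[p][q] for p, q in pairs) / len(pairs)
            if best is None or mean > best[0]:
                best = (mean, a, b)
        _, a, b = best
        clusters[a] += clusters[b]
        del clusters[b]  # a < b, so index a is unaffected
    return sorted(sorted(c) for c in clusters)


# Toy decision paths (node ids per tree) for 4 samples in a 2-tree forest,
# standing in for a forest trained to predict the survival class.
paths_per_tree = [
    [[0, 1, 3], [0, 1, 3], [0, 2, 5], [0, 2, 6]],  # tree 1
    [[0, 1], [0, 1], [0, 2, 3], [0, 2, 4]],        # tree 2
]
sim = similarity_matrix(paths_per_tree)
print(average_linkage(sim, k=2))  # [[0, 1], [2, 3]]
```

Samples that travel down the same branches in many trees score high even when they are far apart in the raw feature space. This is how, in this sketch, the split structure learned for the supervising label shapes the resulting clusters.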
| Field | Value |
|---|---|
| Item Type | Article |
| Uncontrolled Keywords | Clustering, Cancer, Cancer Subgroups, Random Forest, Weakly Supervised Learning |
| Divisions | Faculty of Engineering and Natural Sciences |
| Depositing User | Öznur Taştan |
| Date Deposited | 07 Oct 2022 11:23 |
| Last Modified | 07 Oct 2022 11:23 |
| URI | https://research.sabanciuniv.edu/id/eprint/44614 |