4 Comments
Jake Shapiro

Thank you for this awesome series! Very refreshing to read about virtual cells that have a clear problem statement and concrete applications.

One feature I notice about the white-red heat map is that even in cluster Z, ~30% of patients have a yellow-orange intensity similar to the bulk of the non-responsive cluster. Coupled with the fact that there is red intensity in the non-responsive cluster, could the "concept" you are looking at be adjacent to, but not directly causing, the stratified outcomes? More broadly, how would you "draw the line" in this case of who to include/exclude into certain categories?

Abhishaike Mahajan

> could the "concept" you are looking at be adjacent to, but not directly causing, the stratified outcomes?

Absolutely! Strictly speaking, anything OCTO-VC tells us is correlational/anti-correlational. We're working on trying to bring in causality (another blog post about this soon), but right now what we have here are strong hints as to how concepts relate to outcomes. When it comes to ideating about useful drugs for humans, the oncology field really has two options: reading tea leaves of human biology, or having a perfect map of non-human biology.

I'd say we're of the opinion that the latter has clearly not worked for the cancer field for the past ~15 years, and that there's a fair bit of juice in improving how 'good the tea leaves are' in the first category. It does force us to be a bit imprecise when we lack absolute response/non-response data, but combining the hints from OCTO-VC with known human biology can often lead us in interesting directions.

> More broadly, how would you "draw the line" in this case of who to include/exclude into certain categories?

If I understand your question correctly, the answer is that it is hard :) Even beyond us, the cancer field at large has a history of biomarker cut-offs being extremely difficult to define precisely (for example, tumor mutational burden cutoffs being revisited: https://www.biorxiv.org/content/10.1101/2025.01.02.631104v1.full). I imagine we'll need response data to be very confident about how well our ML-guided categories match up with who responds, and we are getting quite a bit of that over time!

Jake Shapiro

Cool! Thanks for your response! Another curious question -- is OCTO-VC able to extract non-molecular-biology features of the samples? For example, can it predict/identify regions that are leaky and permeable vs. those that are tight? This question stems from the fact that not all >1% PD-L1 patients respond. One explanation could be that the antibodies just can't reach the tumors or the T-cells can't infiltrate because of physical features of the samples, even if the tumor expressed PD-L1 at a high level. I expect you can get the physical features of a sample from HE without ST, but maybe the ST can augment the HE images for this too!

Abhishaike Mahajan

It almost certainly can, since we've been able to annotate interesting parts of a tumor (purposefully being vague) that are a mixture of cellular/non-cellular features! We suspect that features like this are wrapped up in the embedding space of OCTO-VC, and can be taken advantage of if we have access to responder/non-responder data (as was the case in the PD1 case study: https://www.noetik.blog/p/how-do-you-use-a-virtual-cell-to).

There are some people internally doing mech interp work to pull out these sorts of higher-order features in a scalable manner so we can study more of them. It feels likely that many of these are simply unknown to the cancer field, because they are so weak or complex that smaller studies struggle to find them.
