Scaling behavior of TARIO

Feb 24

Observations from building world models of biology

Great article... proof that scaling works is pretty cool! Why would H&E -> transcriptomics -> survival (double noise) work better than H&E -> survival, though?

Abhishaike Mahajan

Feb 25

It's a good question! My mental model is that, if we had infinite H&E -> response, it would all pencil out to be the same as H&E -> transcriptomics -> response. But in practice, nobody does, so having intermediate representations closely related to the clinical problem of interest (tumor microenvironment characterization) helps a *lot*. Spatial transcriptomics is extremely clinically relevant, and may be a much richer signal to train on. H&E certainly is clinically relevant, but maybe it requires far more samples for a model to learn from. I also have a nascent belief that this 'grounding' to a spatial transcriptomics space allows the resulting model to be much less likely to cheat/over-fit to the nuances of the pathology scan...though I don't have proof for this!

David Chu

Feb 24 (edited)

Hmm, it's interesting that TARIO tokenizes at the transcript level rather than at the cell level. Is this different from OCTO-VC?

Very different!

"Because of this, we’re currently training a model to bridge that gap: converting H&E into predicted spatial transcriptomes, which would unlock TARIO-level analysis for essentially any tumor sample that’s ever been sliced and stained."

Good article! But reading this at the end makes the potential model applications feel quite underwhelming. There is only so much information encoded in morphology/H&E, which is basically only cell type (and maybe copy number in tumours) and nothing even close to cell state (if you benchmark any purported method properly). And in tumours, which tend toward de-differentiation, that makes it even less useful. What's the actual value add if you are just predicting cell type (which can be done with much smaller models and in conjunction with segmentation)? Or do you disagree on the granularity of predictions, and what makes you think you can resolve finer cell states?

Abhishaike Mahajan

Mar 2

It's a very fair question!

I guess I consider 'finer cell states' to be ultimately a proxy metric for what we really care about, which is tissue-level organizational features that capture the clinical prediction tasks we care about. Some of these are fine-grained cell states, yes, but you can imagine there are others too, like tertiary lymphoid structure formation or basically anything to do with cell-type adjacency to the tumor. Some of these are directly observable from raw H&E, but there may be others that are less visually obvious. For example, MSI-high (https://pmc.ncbi.nlm.nih.gov/articles/PMC12800188/).

Also, reposting a reply I made on another comment:

>I also have a nascent belief that this 'grounding' to a spatial transcriptomics space allows the resulting model to be much less likely to cheat/over-fit to the nuances of the pathology scan

Ultimately though, this is an empirical question! A future post will discuss the response prediction work more directly to assess the actual utility :)

ymvk

Mar 2

That's reasonable. I could see more 'macroscopic' spatial organization being a meaningful indicator of therapy response, e.g. phenotypically distinct clones w.r.t. antibody-drug conjugates. However, I'd caveat that with the fact that there is work (https://www.nature.com/articles/s41586-023-06498-3) showing that the most predictive features of the tumour microenvironment in terms of response (e.g. to immunotherapy) are not the 'macroscopic' features, but the actual physical interaction (contact) of more fine-grained cell states (e.g. CD8 substates) with the tumour cells.

Abhishaike Mahajan

Mar 2

Very possible! But we have our sights set on many non-ICB targets too, and the evidence in the literature for which spatial patterns are most important for response prediction is (I believe) quite thin, so there's lots of interesting science left to be done!

Our upcoming response prediction post will be primarily ICB-focused (due to data privacy requirements on our end), but we have both types of response data internally.