Partial reprogramming experiments using the Yamanaka pluripotency factors (SOKM; Sox2, Pou5f1/Oct4, Klf4, Myc) have demonstrated that epigenetic reprogramming can restore youthful function and reduce disease burden in aged animals and disease models. However, these factors carry neoplastic risks [1] that limit their therapeutic utility to indications that are amenable to ex vivo or local in vivo delivery.
Several experiments have shown that alternative reprogramming factors can rejuvenate the aged transcriptome while reducing neoplastic risks [2]. Alternative reprogramming payloads may therefore unlock therapeutic indications where Yamanaka Factor reprogramming is unacceptably risky.
We also have little reason to believe the Yamanaka Factor reprogramming approach is optimal for rejuvenation. The Yamanaka Factors were selected to optimize for iPSC generation in vitro (Takahashi 2006), an outcome that is directly linked to the neoplastic risks we’re trying to avoid, rather than the rejuvenation effects we’re hoping to maximize. By optimizing directly for the rejuvenative effects of the epigenetic reprogramming payload, we may be able to achieve a more effective epigenetic rejuvenation than has been observed with the Yamanaka Factors.
This argument is doubly true when we consider that different cell types may respond in distinct ways to the same epigenetic reprogramming intervention [3]. It’s likely that the optimal epigenetic reprogramming payload will depend on both the cell type and indication of interest.
We therefore require an approach to discover a specific reprogramming payload for each cell type and indication of interest.
Reprogramming factor discovery is a long-standing problem in developmental biology. Ever since Hal Weintraub discovered that the transcription factor MyoD could reprogram cell identity, scientists have been working to identify reprogramming factors that convert from any cell state into any other. Partial reprogramming is distinct from but related to the problem of converting cell identities, so reviewing the approaches taken in that field is informative.
Unfortunately, Weintraub got lucky. MyoD is one of the few natural transcription factors (TFs) that can reprogram cell identity all on its own. Reprogramming identity usually requires a combination of natural TFs, and searching through all possible combinations is intractable. To test all combinations of 6 or fewer TFs drawn from a pool of ~1000 mouse TFs, you’d need to test $\sum_{n=1}^{6} \binom{1000}{n} > 10^{15}$ combinations! Even if you narrow the pool to 100 TFs with coarse heuristics, you’d still need $\sum_{n=1}^{6} \binom{100}{n} > 10^{9}$ experiments.
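The counts above are easy to check directly. The snippet below is a minimal sketch that assumes, as the text does, a pool of 1000 candidate TFs (or a shortlist of 100) and unordered combinations of 1 to 6 factors; the function name `n_combinations` is made up for illustration.

```python
from math import comb


def n_combinations(n_tfs: int, max_size: int = 6) -> int:
    """Count unordered TF sets containing between 1 and max_size factors."""
    return sum(comb(n_tfs, n) for n in range(1, max_size + 1))


# Assumed pool sizes, taken from the numbers used in the text.
all_tfs = n_combinations(1000)   # every mouse TF considered
shortlist = n_combinations(100)  # after narrowing with coarse heuristics

print(f"1000 TFs: {all_tfs:.2e} combinations")
print(f" 100 TFs: {shortlist:.2e} combinations")
```

Almost all of the total comes from the largest combination size, so allowing even one more factor per combination grows the space far more than adding candidates to the pool at a fixed size.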
Historically, researchers have narrowed the search space using heuristics based on the role of each TF in development, the evolutionary relationship between TFs, and the expression of the TF in the source and target cell state. They then test a small number of hypotheses (e.g. $<10^2$) and report the results of any strategies that succeed. Read-outs in these experiments are typically low-dimensional: a handful of marker genes are chosen and their expression is optimized.
Even these simple approaches have yielded remarkable results, including the discovery of pluripotent reprogramming. Nonetheless, most known reprogramming strategies remain inefficient (<1%) and only a handful of source-to-target cell state conversions have been discovered.
The simple numbers above make it clear that a naïve approach to discovering reprogramming factors is sharply limited. Using the traditional strategy, researchers explore a fraction smaller than $10^{-13}$ (less than $10^{-11}\%$) of the hypothesis space for reprogramming with natural TFs alone. While synthetic TFs are in some ways simpler because they target fewer loci, the hypothesis space is just as complex. For a synthetic TF with 20 potential target loci, there are $2^{20} \approx 10^6$ combinations of guides.
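As a rough check on these figures, the sketch below divides an assumed 100 tested hypotheses (the upper end of the "$<10^2$" quoted earlier) by the number of combinations of 1 to 6 factors from 1000 TFs. It also counts guide combinations under the assumption that each of the 20 loci is either targeted or not; all variable names are illustrative.

```python
from math import comb

# Assumption: at most ~100 hypotheses are tested in a traditional screen.
hypotheses_tested = 100

# Combinations of 1 to 6 factors drawn from an assumed pool of 1000 TFs.
natural_tf_space = sum(comb(1000, n) for n in range(1, 7))

fraction_explored = hypotheses_tested / natural_tf_space
percent_explored = 100 * fraction_explored

# Assumption: each of 20 candidate loci is independently targeted or not.
guide_combinations = 2 ** 20

print(f"fraction explored: {fraction_explored:.1e}")
print(f"percent explored:  {percent_explored:.1e} %")
print(f"guide combinations for 20 loci: {guide_combinations:,}")
```

Under these assumptions the explored fraction sits below $10^{-13}$, which is below $10^{-11}$ when expressed as a percentage.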
Discovering reprogramming factors for epigenetic rejuvenation is even more challenging. Unlike reprogramming screens that have been performed for cell identity conversion, epigenetic rejuvenation cannot be measured using only a handful of marker genes. Rather, we require a much richer representation of cell state to determine if a cell’s epigenome is rejuvenated.
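To improve upon the current state of the art, we need both a richer measurement of cell state than a handful of marker genes can provide and a more efficient way to search the combinatorial hypothesis space.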
Single cell genomics offers us one approach to address the first challenge, while “guided search” algorithms from machine learning offer an approach to achieve the latter.