Introduction

Cellular reprogramming is a term that encompasses multiple techniques that enable researchers to reverse or hijack the differentiation path of mature cells. Since 2007 (refs1,2), the validation of cellular reprogramming in human cells has opened the gates for a wealth of applications in stem cell biology, disease modelling, drug discovery and regenerative medicine3. Human induced pluripotent stem cells (hiPSCs) and direct reprogramming (Box 1) have solved, at least in part, the problem of limited availability of primary cells from patients, and have facilitated a variety of studies on the recapitulation of physiological and pathological mechanisms in patient-derived lines, resulting in more accurate disease modelling platforms2,4.

The ability to generate virtually any cell type by differentiating hiPSCs in vitro is particularly relevant in the field of neuroscience, owing to the limited access to primary cells from the human CNS and peripheral nervous system. Moreover, the recent development of genome editing techniques has enabled researchers to reduce the variability between healthy and diseased hiPSC clones by genetically correcting a diseased hiPSC line to generate a matched control cell line. hiPSCs have proved to be important in a variety of fields from virology, in which applications include modelling the target organs of viruses such as HIV and SARS-CoV-2 (refs5,6), to toxicology, in which hiPSCs are used to evaluate hepatotoxicity7, cardiotoxicity8 and nephrotoxicity9. hiPSCs are also frequently used in drug discovery and safety studies, for example, in drug development for Alzheimer disease (AD) with the aim of identifying compounds that can inhibit or lower the levels of amyloid-β10. The hiPSC approach has been popular in the field of AD as evidence indicates that the characteristics of hiPSC-derived neurons from individuals with the disease reflect the biomarker changes observed in vivo11. Disease modelling platforms based on hiPSCs have also been used for drug repositioning, a practice that builds on previous toxicological and safety studies to find new applications for known drugs12. For example, disease modelling experiments on hiPSCs derived from individuals with amyotrophic lateral sclerosis (ALS) were used to identify the anti-epileptic drug ezogabine as a potential treatment for ALS13,14.

hiPSC-derived cells themselves could also have applications as treatments in the regenerative medicine field. In particular, the regenerative potential of differentiated hiPSCs could be used to stabilize the progression of neurological disease or heal traumatic injuries of the nervous system15,16,17. The use of human embryonic stem cells (hESCs) as donor cells for regenerative therapy is already in the advanced stages of testing, and hundreds of progenitor cell-based therapies are currently under investigation for a number of neurodegenerative diseases18. The advantage of hiPSCs for this kind of treatment is that they can be generated from the same individual who will receive the therapy, thus minimizing the problem of graft rejection19. Owing to the technical difficulties involved in hiPSC generation and differentiation, only a handful of clinical trials of hiPSC-derived cell transplants have been performed so far20 (Box 2; Table 1). In contrast, many more studies aim to use patient-derived iPSCs for disease modelling (as listed on ClinicalTrials.gov and the WHO International Clinical Trials Registry Platform).

Table 1 Clinical trials of therapeutic hiPSC-derived cells

This Perspective addresses the need for a new approach to the use of hiPSC lines, particularly in in vitro disease modelling. To date, disease modelling with hiPSCs has come in two ‘flavours’: recapitulation of known phenotypes, and discovery of new phenotypes (which need to be subsequently validated in patients). We believe that two more expressions of disease modelling are achievable: matching the heterogeneity of disease manifestations, and predicting phenotypes that will arise in patients. To this effect, the field should strive to integrate high-quality clinical and in vitro data to build effective integrative models that are able to predict age of onset, disease course and severity as well as the drug responsiveness in a cohort of patients. We use Huntington disease (HD) as a practical example to illustrate the feasibility of this approach.

Recognizing fatigue and complexity

A waste of resources?

A study that analysed research papers on hESCs and hiPSCs published between 2008 and 2016 found that, although the number of publications on hESCs is still much higher than the number of publications on hiPSCs, only 21 hESC lines are in use worldwide20. Indeed, two specific lines (H9 and H1) accounted for 69.9% of the hESC publications. In contrast, the authors estimated that, during the same time period, ~10,000 hiPSC lines were generated, most of which were derived from individuals with genetic conditions. Notably, derivation methods, the quality and type of the starting material, culture conditions and differentiation protocols are not standardized across laboratories, thus complicating the interpretation of hiPSC-derived modelling data. This inconsistency was also highlighted by an analysis of the literature describing the in vitro generation of dopaminergic neurons, the neuronal subtype most frequently derived from human pluripotent stem cells (both hESCs and hiPSCs)21. The authors focused on research published between November 2004 and May 2017 and identified 158 publications describing the directed differentiation of human pluripotent stem cells into dopaminergic neurons. Almost half of these publications described new differentiation protocols or substantial modifications to pre-existing protocols, resulting in a total of 74 different methods for generating human dopaminergic neurons in vitro. However, only five of these 74 methods were substantially re-used by other research groups; the two most-cited publications were those published in journals with the highest impact factors21,22,23.

The establishment of new differentiation protocols that are highly efficient and reproducible takes time, and requires a process of trial and error. Although this process is a natural part of the inefficiency inherent in a nascent field, we should be aware that it could also disperse effort and resources. Clearly, improving existing differentiation protocols and developing alternative approaches is extremely valuable. However, we should also direct resources towards the identification and use of robust indicators of the desired cell type, including genetic and protein markers, and electrophysiological characteristics. This approach would enable us to funnel research efforts towards the achievement of high-quality products, increasing the efficiency of the field.

Several other factors contribute to the complexity involved in using hiPSCs for disease modelling. First, it has been clear from very early in the reprogramming field that hiPSCs show great interline variability24,25 owing to reprogramming-induced genetic and epigenetic aberrations (Table 2). These alterations can have profound consequences; for example, the retention of DNA methylation signatures characteristic of the parent cells — known as somatic memory — can restrict the differentiation potential of hiPSCs26,27. Second, reprogramming to pluripotency almost completely erases the epigenetic age of the donor28,29,30, although some epigenetic and mitochondrial signatures carried by cells from elderly donors can still be found in hiPSCs28,31. This removal of the majority of epigenetic landmarks renders hiPSCs similar to hESCs, and thus suitable for modelling developmental mechanisms and disorders, but poses some difficulties when attempting to recreate the status of an aged cell. Last, the variability encountered in the derivation of hiPSCs from human tissue is further increased by the variability introduced by the subsequent in vitro differentiation protocols, which can differ greatly between laboratories.

Table 2 Genetic and epigenetic alterations in hiPSCs

We believe that the stem cell research community has the skills to tackle these different sources of variability and the resulting loss of time, human resources and funds, and to leverage the full potential of hiPSC disease modelling. However, reliable descriptive and predictive disease modelling can be achieved only through the establishment of a high-quality protocol along clear guidelines, enabling a higher degree of data sharing and collaboration between laboratories, clinicians and industries. In this Perspective, we discuss these requirements of reliable disease modelling and focus on their implementation in the field of neurology.

Molecular-level scrutiny

Given the high degree of variability involved in the establishment of hiPSC lines, the standardized, methodical characterization of the starting cell population and resulting hiPSC lines, along with a high degree of protocol transparency, will be essential for developing new, shareable disease modelling tools. Ideally, for any given target disease, we should strive towards a standardization of the conditions for derivation, culturing, storing and differentiation of hiPSC lines. A practical example of the relevance of accurate phenotyping comes from the Parkinson disease (PD) research field, where specific cellular markers were found to correlate with the transplantation efficiency of dopaminergic progenitors in rats. The identification and use of these markers led to improvements in transplantation outcome and reproducibility32,33.

Prestigious scientific societies and international consortia have had, and continue to have, an important role in promoting the excellence of stem cell science and its applications. For example, since 2006 the International Society for Stem Cell Research has issued hESC research guidelines with the aim of optimizing the use of these cells in preclinical and clinical studies34,35. More recently, the Innovative Medicines Initiative has led to the creation of a high-standard European Bank for induced Pluripotent Stem Cells (EBiSC) that is connected to the hiPSC registry36. Additionally, the New York Stem Cell Foundation repository contains a collection of disease-specific stem cell lines, some accompanied by a full genomic sequence, and the California Institute for Regenerative Medicine is in the process of collecting a large number of healthy and diseased tissues for the generation of an hiPSC repository with de-identified clinical and demographic information (see Related links). At the time of writing, the RIKEN BRC Cell Bank already contained 480 disease-specific hiPSC lines and 206 healthy control lines. Together with the Human Induced Pluripotent Stem Cell Initiative and the WiCell Research Institute, the repositories mentioned above constitute the largest hiPSC banks available today37. These entities all responded to the need for a unified framework to enable the sharing of high-quality cells across multiple stakeholders. Another attempt in this direction is provided by the journal Stem Cell Research, which publishes ‘Lab Resource’ articles describing the generation of new cell lines, including detailed information on line derivation methods and characteristics. These details are often omitted from conventional research articles.

We propose that the field should take the approaches described above as examples to generate one worldwide hiPSC certification system to confer the equivalent of a warranty label on each newly generated hiPSC line. In this approach, the cells would be stored in appropriate and controlled conditions in the institute where they were produced, or in a cell bank, and made readily available to the scientific community via a globally accepted process overseen by an international entity. This system would work in a similar way to the ISO-9000 quality management systems38. The cells would be delivered to the recipient, accompanied by a quality control summary containing information on key aspects such as karyotype, genomic content, pluripotency, passaging, derivation method, number of clones generated and differentiation potential (Table 3). This process would be similar to that employed by J. Toombs and colleagues for the generation of hiPSC lines from individuals in the Lothian Birth Cohort of 1936 (ref.39). The use of the quality control criteria listed in Table 3 would increase the quality of research grade hiPSCs, without the need to implement the more expensive and restrictive good manufacturing practice40 conditions that are required for the generation of clinical grade hiPSCs. Most importantly, this process would create a unified and global real-time database that, for any given cell line, synthesizes research data from a range of sources.
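As an illustration of what a machine-readable version of such a quality control summary could look like, the following minimal Python sketch encodes the aspects named above (karyotype, genomic content, pluripotency, passaging, derivation method, number of clones and differentiation potential) as a record with a simple completeness check. The class name, the field names and the passage threshold are hypothetical and are not taken from Table 3 or from any existing certification scheme.

```python
from dataclasses import dataclass, field


@dataclass
class LineQCSummary:
    """Hypothetical quality control summary accompanying one hiPSC line."""
    line_id: str
    karyotype_normal: bool
    genome_sequenced: bool
    pluripotency_confirmed: bool
    passage_number: int
    derivation_method: str
    clones_generated: int
    differentiation_potential: list[str] = field(default_factory=list)

    def issues(self, max_passage: int = 30) -> list[str]:
        """List the checks that fail; max_passage is an arbitrary example value."""
        problems = []
        if not self.karyotype_normal:
            problems.append("abnormal karyotype")
        if not self.pluripotency_confirmed:
            problems.append("pluripotency not confirmed")
        if self.passage_number > max_passage:
            problems.append("passage number above threshold")
        if self.clones_generated < 1:
            problems.append("no clones recorded")
        if not self.differentiation_potential:
            problems.append("differentiation potential not tested")
        return problems

    def certified(self) -> bool:
        """A line passes only if no check fails."""
        return not self.issues()


line = LineQCSummary(
    line_id="EXAMPLE-001",
    karyotype_normal=True,
    genome_sequenced=True,
    pluripotency_confirmed=True,
    passage_number=12,
    derivation_method="episomal",
    clones_generated=3,
    differentiation_potential=["ectoderm", "mesoderm", "endoderm"],
)
print(line.certified(), line.issues())
```

A shared record of this kind would allow recipients to filter lines programmatically; which checks are mandatory, and with what thresholds, would have to be agreed by the community.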

Table 3 Quality control panel for hiPSC lines

In addition to the creation of an hiPSC certificate, the performance of hiPSC lines in standard directed differentiation protocols should be validated. Therefore, we should define experimental end points that represent the benchmarks that a differentiating hiPSC must achieve to qualify as a differentiated neuron, hepatocyte or cardiomyocyte. The G-Force PD initiative was launched in 2014 and aims to apply stem cell-based therapies to PD. Importantly, entities participating in the initiative use different cells and protocols, but they have agreed to use common clinical end points for their first-in-human clinical trials in order for the different approaches to be comparable41. Similarly, the coalescence of specific directed differentiation benchmarks into unified experimental end points would allow the establishment of certified and recognized cell lines and protocols, which would greatly advance the knowledge on differentiated patient-derived hiPSCs as they would stem from a common high-quality and state-of-the-art pipeline.

In our opinion, the adoption of these ambitious models is the gateway to predictive disease modelling. Indeed, making a reliable prediction requires a stable and defined starting assumption, which is why we need to increase the level of information we have on the hiPSC lines currently in use.

Switching gears

After the discovery of reprogramming technology, research into neurological disorders42,43,44,45 and neurodegenerative diseases46,47,48,49 initially focused on establishing whether the reprogramming process was equally efficient in cells from healthy individuals and in cells from individuals with disease. The aim of this work was to ensure that hiPSCs would provide a valid system for modelling disease. During this phase of research, a profound transformation in the technical aspects of the procedure occurred, the most relevant aspect of which was the transition from integrative to non-integrative delivery systems3. Thereafter, a progressive interest in leveraging hiPSC technology as a disease modelling platform was accompanied by an increase in the number of studies that used multiple hiPSC lines, and more precise and robust differentiation strategies, to minimize the effect of their intrinsic variability42,43,44,45,46,47,48,49. Unfortunately, barriers to the use of hiPSC-based disease modelling for more than the straightforward comparison of control and disease-perturbed regulatory networks remain. These barriers include the use of low numbers of hiPSC and control lines, missing patient-level data, and the difficulties involved in designing and performing well-powered in vitro studies, all of which are preventing a new, bolder, modelling approach from taking hold.

First, the patient-derived hiPSCs available to preclinical researchers generally do not represent the whole spectrum of manifestations for a given disease, which can bias study results. Instead, sampling should take into account the diverse presentations that a disease can have; for example, the clinical manifestations of idiopathic diseases such as PD are influenced by the genetic background of the individual as well as environmental factors50. Similarly, individuals with monogenic diseases such as HD can exhibit different combinations of a wide range of symptoms, including motor and psychiatric disorders51. In addition to the lack of appropriate sampling, researchers conducting hiPSC studies are almost always blind to the clinical history of the donor patient, thus greatly limiting the interpretation of the resulting data. Indeed, in our experience, the majority of publications on patient-derived hiPSCs — including our own — include extremely limited information on the donor patient. This information typically consists of the donor’s disease, age at onset (when applicable), gender and age at the time of biopsy collection; very seldom could we find detailed information on the donor’s clinical symptomatology. This lack of information prevents researchers from being able to stratify hiPSC lines according to patient characteristics, confuses the interpretation of the resulting in vitro data and frustrates attempts at conducting reliable meta-analyses.

Second, only rarely do preclinical researchers have the chance to define ex ante the size of the patient-derived hiPSC population, and thus to organize their experimental pipeline to ensure a sufficiently controlled and powered study. The cells are, instead, graciously donated to research, and requesting a specific number of patient and control samples is not possible. As the number of donor patients is generally limited, in vitro hiPSC differentiation studies often seem to be underpowered. Correcting cells with gene editing to achieve a control cell line with the same genetic background as the patient cell line is an inferior substitute for a population of patient and control cells large enough for an adequately controlled and well-powered study.

If we attempt a comparison with clinical trials, we find that the 116 interventional studies for AD listed as currently active on ClinicalTrials.gov are using an average of 245 patients each. The appropriate cohort size depends on the aims of the specific study and intervention; however, until 2015 hiPSC studies of familial and sporadic AD used four or fewer distinct hiPSC lines or clones47. Unfortunately, this limitation is not unique to research into idiopathic conditions such as AD but is also observed in hiPSC studies of genetic diseases (for example, HD52) and complex psychiatric conditions (for example, schizophrenia45), both of which are usually studied in clinical trials that include large cohorts of participants who are carefully stratified to maximize signal-to-noise ratio.

In direct contrast to clinical trials, sample size estimation and power analyses are not required when planning in vitro experiments, even if those experiments use patient-derived cells. In our opinion, power analysis should be a mandatory element of the planning of hiPSC studies. In clinical trials and animal studies, ethical considerations are the main reasoning behind mandatory power analyses — it is unethical to perform an inadequately powered study. We believe that the costs involved in the derivation, maintenance, differentiation and analysis of hiPSCs are important limiting factors that should be viewed in a similar way to these ethical considerations.

Nevertheless, a priori sample sizing for cell-based in vitro studies is challenging because it needs to account for many variables, including the nature of the disease (multifactorial versus monofactorial), the differentiation protocol and the variability of the readouts, which can make the resulting estimate unreliable. Historical data or a pilot experiment can be used to estimate the variance of the system, which in turn can be used to estimate the sample size needed to detect the desired effect53. Unfortunately, in our experience the results of this computation often greatly exceed the number of available cell lines or impose a substantial economic burden on the experiment. Even so, it would be highly desirable to identify and implement power analysis methodologies that can aid the design of in vitro experiments.
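To make the pilot-based estimate concrete, the sketch below applies the standard normal-approximation formula for comparing two group means, n = 2σ²(z₁₋α/₂ + z₁₋β)²/δ² lines per group, where σ² is the pooled variance of the pilot readouts and δ is the smallest difference worth detecting. The function name and the pilot values are invented for illustration, and the formula assumes independent lines, a roughly normal readout and equal group sizes; it is not necessarily the method of ref. 53.

```python
import math
from statistics import NormalDist, variance


def lines_per_group(pilot_control, pilot_patient, min_effect,
                    alpha=0.05, power=0.80):
    """Approximate number of independent hiPSC lines needed per group.

    Uses the pooled variance of two pilot samples and the
    normal-approximation formula for a two-sided, two-sample comparison.
    """
    n1, n2 = len(pilot_control), len(pilot_patient)
    pooled_var = ((n1 - 1) * variance(pilot_control)
                  + (n2 - 1) * variance(pilot_patient)) / (n1 + n2 - 2)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired power
    n = 2 * pooled_var * (z_alpha + z_beta) ** 2 / min_effect ** 2
    return math.ceil(n)


# Invented pilot readouts (arbitrary units), three lines per group.
control = [0.0, 1.0, 2.0]
patient = [10.0, 11.0, 12.0]
print(lines_per_group(control, patient, min_effect=1.0))
print(lines_per_group(control, patient, min_effect=0.5))
```

Because the required number of lines scales with the inverse square of the effect size, halving the difference to be detected roughly quadruples the number of lines, which is one way to see why such calculations quickly exceed the number of lines available.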

The feasibility of performing hiPSC experiments with large sample sizes is substantially influenced by the available cell-handling approaches and experimental throughput. For example, the need for hiPSC-based studies with sample sizes large enough to represent the diversity of the patient population discourages the use of traditional low-content approaches. Technological advances in automated cell manipulation54,55, microfluidic systems56, 3D bioprinting57,58, organ-on-chip59 and organoids60,61 can boost experimental throughput to hundreds of lines or clones while maintaining readouts with single-cell resolution.

With the above limitations in mind, we should strive to implement best practice from the clinical research field in an attempt to realize the full potential of patient-specific in vitro disease modelling. To reach this goal, we need closer collaboration between clinicians and preclinical researchers to enable the definition of appropriate sample sizes, keeping in consideration the epidemiological characteristics of the disease of interest. Moreover, a change in policies governing the use of donor patient clinical records is highly desirable62; better access to these records for preclinical researchers would enable us to gain more information from descriptive modelling studies and move us towards predictive modelling.

Predictive modelling

Modelling diseases in vitro is key to uncovering prognostic and predictive biomarkers at relevant surrogate end points. This knowledge is important for the development of preventive treatment approaches. To this end, there is a strong need to establish the most translatable and predictive in vitro cellular models, which we believe can be achieved via the precision medicine model.

Precision medicine is a relatively new operative paradigm that strives to optimize disease prevention and therapy by taking a predictive and preventive approach, as opposed to the reactive approach that is the current standard. The final goal of precision medicine is to predict the individual disease trajectory of each patient and to precisely intervene when the disease processes are preventable or reversible63. We believe that predictive disease modelling with hiPSCs should be considered in the context of the precision medicine guidelines64,65, and should encompass longitudinal studies that involve the stratification of patients and their cells, and the use of computational models to integrate data from different time points and generate information on disease trajectory.

Stratification

Stratification is a statistical procedure that splits a population into several layers by grouping together units with common characteristics, and is a key aspect of precision medicine. The stratification of patients on the basis of a detailed molecular assessment, including biomarker analyses as well as the collection of genetic, epigenetic, phenotypic and psychosocial data66, enables the heterogeneity of complex multifactorial diseases to be broken down into simpler elements. For example, stratification based on multi-omics has proved extremely useful in oncology, where genomic data provide information on the state of key oncogenes or oncomiRs as well as the presence of fusion proteins or rearrangements, and are analysed together with proteomic and metabolic data67,68. Artificial intelligence and machine learning are being used to generate increasingly well-refined algorithms to interpret these multi-omic data and aid clinicians in tumour stratification, choice of treatment regimen, risk evaluation and prognosis69,70,71. Similar approaches are beginning to be applied in neurology; for example, genomic data have been successfully used to stratify individuals with familial ALS72 with the aim of delivering different therapies to different patient cohorts.
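In its simplest form, stratification amounts to grouping records that share the same combination of characteristics. The following Python sketch shows this for a handful of invented patient records; the field names, the banding functions and their cut-offs are hypothetical and carry no clinical meaning. Real stratification would rest on the multi-omic and clinical variables described above, typically with data-driven rather than hand-set boundaries.

```python
from collections import defaultdict


def stratify(records, key_funcs):
    """Group record IDs by the tuple of labels returned by key_funcs."""
    strata = defaultdict(list)
    for rec in records:
        key = tuple(f(rec) for f in key_funcs)
        strata[key].append(rec["id"])
    return dict(strata)


# Hypothetical banding rules; the cut-offs are arbitrary examples.
def repeat_band(rec):
    return "repeat<45" if rec["repeat_length"] < 45 else "repeat>=45"


def onset_band(rec):
    return "onset<40" if rec["onset_age"] < 40 else "onset>=40"


patients = [
    {"id": "P1", "repeat_length": 42, "onset_age": 50},
    {"id": "P2", "repeat_length": 48, "onset_age": 35},
    {"id": "P3", "repeat_length": 43, "onset_age": 55},
]

for stratum, ids in stratify(patients, [repeat_band, onset_band]).items():
    print(stratum, ids)
```

hiPSC lines derived from the donors in each stratum can then be analysed as a cohort rather than averaged across the whole patient population.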

Today, the intrinsic variability of hiPSCs and the scarcity of hiPSC lines derived from single patients or patient cohorts compels us to average data from several lines to increase the strength of the recorded biological data. However, this approach might prevent the identification of biological mechanisms that are specific to a single donor or patient cohort. Therefore, during the creation of disease-specific hiPSC libraries, accurate molecular stratification of patients should be performed to ensure that hiPSC lines are generated from clinically relevant, homogeneous cohorts of patients. In the other direction, omics-based fingerprinting of patient hiPSCs could inform clinical-level patient stratification.

Longitudinal studies

Longitudinal follow-up of patients is an essential aspect of both the development and the implementation of precision medicine. Follow-up enables the disease trajectory of a cohort of patients to be studied with the aim of predicting future decline and intervening before the development of full symptomatology. For example, the Alzheimer Precision Medicine Initiative has established several experimental cohorts that include participants at a range of disease stages, from early asymptomatic individuals to patients with late-stage AD73. Similarly, the Parkinson’s Progression Markers Initiative and the Parkinson’s Associated Risk study are following up cohorts of healthy individuals and individuals with prodromal symptoms of PD, including anosmia, abnormal dopamine transporter imaging and REM sleep behaviour disorder. The expectation is that the individuals with prodromal symptoms are likely to ‘phenoconvert’ — that is, begin to display PD motor signs — enabling the identification of new prognostic biomarkers for the disease50.

In parallel with longitudinal patient follow-up, a cohort of patient-derived hiPSCs can be followed in vitro on a much shorter timescale and with the opportunity to modify the cell environment. This approach unlocks the full potential of longitudinal studies by enabling the correlation of in vivo and in vitro readouts. Therefore, sharing patient-level data is vital not only for the correct interpretation of in vitro modelling data, as discussed above, but also for the correlation of in vitro data with clinical phenotypes to uncover new prognostic biomarkers74. Conversely, re-use of clinical trial data on known patient biomarkers can inform the design of phenotypic assays in hiPSC-derived cells, and this generates more reliable modelling platforms12. Clearly, patient-level data need to be anonymized to protect patient privacy and avoid confidentiality violations. This anonymization can be achieved through dataset de-identification and quality control, as well as data access control — users should be authorized and legally bound to data sharing agreements75.
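One building block of dataset de-identification is the replacement of direct identifiers with stable pseudonyms, so that clinical and in vitro records from the same donor can still be linked. The sketch below does this with a keyed hash (HMAC-SHA256) from the Python standard library. The field names and the list of direct identifiers are hypothetical, and the procedure is pseudonymization only: quasi-identifiers such as rare diagnoses or exact ages can still allow re-identification, so this step would complement, not replace, access control and data sharing agreements.

```python
import hashlib
import hmac

# Hypothetical set of fields treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "date_of_birth", "address"}


def deidentify(record, secret_key):
    """Drop direct identifiers and replace the patient ID with a keyed pseudonym.

    The same patient ID and key always give the same pseudonym, so records
    remain linkable; without the key the pseudonym cannot be recomputed.
    """
    digest = hmac.new(secret_key,
                      record["patient_id"].encode("utf-8"),
                      hashlib.sha256).hexdigest()
    cleaned = {k: v for k, v in record.items()
               if k not in DIRECT_IDENTIFIERS and k != "patient_id"}
    cleaned["pseudonym"] = digest[:16]  # shortened for readability
    return cleaned


record = {
    "patient_id": "HOSP-000123",
    "name": "Jane Doe",
    "date_of_birth": "1960-01-01",
    "diagnosis": "HD",
    "age_at_onset": 45,
}
print(deidentify(record, secret_key=b"example-key-kept-by-data-custodian"))
```

In practice the key would be held by the data custodian only, so that downstream users receive linkable but not directly identifiable records.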

Integrative disease modelling

The last step in predictive disease modelling is the synthesis of data from patients and in vitro cellular models to generate in silico models of biological patterns and mechanisms. This systems biology perspective represents a shift from the current reductionist approach to a network-based scheme that models complex systems in their entirety50,64. Machine learning and artificial intelligence have the power to integrate diverse, longitudinal in vitro data from multiple sources with a wealth of clinical, radiological and biochemical data from patients to make predictions on the trajectory of the target disease76. For example, this approach has been used to create disease-specific molecular maps that enable navigation of the pathogenetic pathways involved in PD77 and AD78. In this framework, the omics-hiPSC approach can yield information that, when linked to the pathophysiology of a disease and its course in vivo, can lead to the identification of highly prognostic sets of biomarkers64.

Modelling disease trajectories has characteristics common to the modelling of the trajectories of other complex systems such as climate, ecosystems, societies, economics and finance76. These highly complex non-linear systems can undergo dramatic transitions that can be traced back to specific ‘tipping points’ or critical thresholds that, once passed, result in an abrupt change to the state of the system79. For example, critical thresholds (for example, specific bursts of brain activity measured with EEG) have been identified prior to epileptic seizures, and algorithms predicting an impending seizure have been under development since 1998 (ref.80). In 2013, after 10 years of unsuccessful attempts81, a first-in-human proof-of-concept study demonstrated seizure prediction with a sensitivity of 54–100% in ten patients with intractable epilepsy82. The continuous efforts of an international group of seizure prediction laboratories together with the recent advances in network theory, computational modelling, multimodal biosensing and multi-scale electrophysiology are now opening the gates to a life-changing innovation for individuals with epilepsy83.

Clearly, one aim of precision medicine is to be able to predict critical transitions in the course of disease development. In this respect, computational biology has introduced the concept of a dynamic network biomarker; that is, a marker or molecular module the appearance of which precedes a dramatic change in state that marks the transition between pre-disease and disease84. This model-free approach takes advantage of higher-order statistical information to predict disease thresholds without the use of machine learning algorithms85.
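As a rough illustration of the kind of higher-order statistics involved, dynamic network biomarker analyses commonly look for a group of markers whose fluctuations grow, whose mutual correlations strengthen and whose correlations with the rest of the network weaken as a transition approaches. The sketch below combines these three quantities into one composite score, assuming the commonly used form (mean within-module standard deviation × mean within-module |correlation|) / mean |correlation| between module and other markers. The function names and the data are invented, and refs 84 and 85 should be consulted for the actual definitions.

```python
from itertools import combinations, product
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def dnb_score(profiles, module):
    """Composite score for a candidate module at one stage.

    profiles maps marker name -> values across samples at that stage.
    module must hold at least two markers, and at least one marker must
    lie outside it; a zero outside-correlation raises ZeroDivisionError.
    """
    others = [m for m in profiles if m not in module]
    sd_in = mean(pstdev(profiles[m]) for m in module)
    pcc_in = mean(abs(pearson(profiles[a], profiles[b]))
                  for a, b in combinations(module, 2))
    pcc_out = mean(abs(pearson(profiles[a], profiles[b]))
                   for a, b in product(module, others))
    return sd_in * pcc_in / pcc_out


# Invented readouts for three markers across four samples per stage.
stable = {"A": [1.0, 1.1, 1.0, 1.1], "B": [2.0, 2.0, 2.1, 2.1],
          "C": [3.0, 3.1, 3.1, 3.2]}
pre_transition = {"A": [1.0, 2.0, 3.0, 4.0], "B": [2.0, 4.1, 5.9, 8.0],
                  "C": [3.0, 3.2, 3.1, 3.0]}

print(dnb_score(stable, ["A", "B"]))
print(dnb_score(pre_transition, ["A", "B"]))
```

With these invented values the score for the module A, B is far higher in the pre-transition stage than in the stable stage, which is the qualitative signal such an index is designed to pick up.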

Predictive modelling of HD

A sustained, collective effort

HD, an autosomal dominant neurodegenerative condition, provides an excellent example for the potential implementation of predictive disease modelling as it is monogenic, has a range of clinical manifestations and a variable age of onset86. The basic cause of the condition is the expansion of a CAG repeat — encoding glutamine — in the 5′ end of the Huntingtin (HTT) gene. Importantly, remarkable collective efforts by geneticists, clinicians, biologists, epidemiologists, statisticians, funding agencies and patient associations have enabled the study, collection and deposition of HD-relevant information and patient-derived biological samples (peripheral cells, fluids) from several countries worldwide in a global, accessible repository87. This repository and collaborative framework provide the ideal setting for the realization of the predictive, stratified, longitudinal and integrative hiPSC-based disease modelling approach proposed in this Perspective (Fig. 1a).

Fig. 1: Proposed approach to predictive HD modelling based on patient-derived hiPSCs.

a | Patients with Huntington disease (HD) are enrolled worldwide. b | Patients are stratified into the smallest clinically relevant cohorts on the basis of biographical information, clinical history, genetic information, neuroimaging data and liquid biomarker levels; these data are collected longitudinally. c | Human induced pluripotent stem cells (hiPSCs) derived from the stratified patients undergo strict quality control and certification (QC) before being deposited in a common resource together with information on the generation and origin of the quality-controlled cell line. d | The HD hiPSCs are used for in vitro disease modelling, drug discovery and repositioning, and implementation of clinical trials ‘in-a-dish’. e | Integrative disease modelling correlates clinical history and in vitro readouts to generate a predictive model of HD progression and therapeutic potential of new drugs. f | Data from the predictive HD model enable preventive medical intervention in at-risk patients before disease onset.

In individuals with HD, the CAG repeat has expanded in length beyond the threshold of 36–39 repetitions, resulting in extensive atrophy in cortical and subcortical striatal structures of the brain, which leads to the manifestation of motor and cognitive symptoms in mid-adult life88. Evidence collected over the past 20 years has shown that CAG expansion size is positively correlated with disease severity and inversely correlated with age at onset of motor symptoms89,90,91, and that the expansion can also lengthen over generations, owing to the intrinsic genetic instability of that DNA region92. CAG expansion also occurs in somatic tissue93, particularly in those brain regions that are more vulnerable to HD pathology94.

However, CAG repeat length only accounts for ~67% of the observed variation in disease severity and age at onset in individuals with HD95. The evidence collected so far indicates that part of the remaining variability is heritable, suggesting a genetic source96. Variants within and outside the CAG repeat region have been associated with age at onset of motor symptoms97,98,99,100,101.
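
To make the notion of 'explained variation' concrete, the following minimal sketch computes the coefficient of determination (R²) for a least-squares line relating CAG repeat length to age at onset. The numbers are invented for illustration and are not patient data; the plain linear fit and the function name `r_squared` are assumptions made for brevity rather than the method used in the cited studies.

```python
from statistics import mean


def r_squared(x, y):
    """Fraction of the variance in y explained by a least-squares line on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    # For simple linear regression, R^2 equals the squared Pearson correlation.
    return (sxy ** 2) / (sxx * syy)


# Invented values for illustration only (not patient data).
cag_length = [40, 41, 42, 43, 44, 45, 46, 48, 50]
age_at_onset = [62, 55, 57, 48, 50, 41, 44, 33, 30]

explained = r_squared(cag_length, age_at_onset)
print(f"Explained by CAG length in this toy sample: {explained:.0%}")
print(f"Left to modifiers and other factors: {1 - explained:.0%}")
```

The unexplained fraction is the quantity that genetic modifiers and other factors would have to account for.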

Given this genetic complexity, patient-derived hiPSCs represent an invaluable tool with which to explore the role and potential therapeutic value of subtle genetic variants in HD. For example, hiPSC-based screens could be used to identify compounds that modulate the activity and function of these variants. In particular, we believe that the selection of relevant markers of disease progression, in combination with information on the genetic and clinical background of the donor patients, would enable researchers to stratify patient-derived hiPSCs in a clinically relevant manner and lay the foundations for predictive disease modelling in HD.

From patients to cells, and back

As discussed above, access to the detailed clinical history of donor individuals is essential for realizing the full potential of hiPSC-based disease modelling. In the case of HD, several biomarkers and cognitive and motor criteria have been approved by medical societies and regulatory bodies as tools for the definition of disease stage (both before and after onset of motor symptoms) and the prediction of disease progression in clinical practice. The main tool for assessing disease stage is the Unified Huntington’s Disease Rating Scale (UHDRS)102, which has several components that score motor, cognitive and functional abilities103. Also under development, and in early use, are rater-independent tests such as the Q-Motor assessment, which is a precalibrated, automated system that provides standardized measurements of the patient’s motor abilities to avoid rater bias104,105. On the molecular side, large efforts have been directed towards identifying fluid biomarkers, but so far only cerebrospinal fluid and plasma levels of neurofilament seem to be reliable indicators of HD progression106. By contrast, a wealth of data are available on neuroimaging markers of disease progression, most of which are based on structural MRI readouts107,108. Patients with HD can also be stratified on the basis of genetic data, that is, CAG repeat length, genetic background or haplotype, and the presence of known genetic modifiers of disease course109. These multivariate and multimodal data are necessary for the accurate staging and stratification of patients, and the longitudinal follow-up of patients is fundamental to understanding individual disease trajectories (Fig. 1b).
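
The stratification step could be sketched as a simple grouping of donors by a composite key. This is an illustration under stated assumptions, not an established staging scheme: the `Donor` fields, the haplotype labels, the three-repeat bin width and the manifest/premanifest flag are all hypothetical stand-ins for the richer clinical, imaging and biomarker data described above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Donor:
    donor_id: str
    cag_length: int
    haplotype: str        # hypothetical label, e.g. "A1"
    motor_manifest: bool  # hypothetical flag: motor symptoms present at enrolment


def stratum_key(donor, cag_bin_width=3):
    """Composite key: CAG-length bin, haplotype and motor status."""
    low = (donor.cag_length // cag_bin_width) * cag_bin_width
    status = "manifest" if donor.motor_manifest else "premanifest"
    return (f"CAG {low}-{low + cag_bin_width - 1}", donor.haplotype, status)


def stratify(donors, cag_bin_width=3):
    """Group donor IDs into cohorts that share the same composite key."""
    strata = defaultdict(list)
    for donor in donors:
        strata[stratum_key(donor, cag_bin_width)].append(donor.donor_id)
    return dict(strata)


# Invented donors for illustration only.
donors = [
    Donor("D1", 42, "A1", False),
    Donor("D2", 43, "A1", False),
    Donor("D3", 42, "A1", True),
    Donor("D4", 47, "C1", True),
]
strata = stratify(donors)
for key, ids in strata.items():
    print(key, ids)
```

Narrower bins or additional key components yield smaller cohorts, so in practice the granularity would have to be balanced against the number of donors available per stratum.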

In order for this detailed information on disease stage and trajectory to benefit hiPSC-based studies, the complete clinical data need to be gathered in a harmonized database together with data from the patient-derived cell lines, following — and going beyond — the guidelines of the EBiSC36 (Fig. 1c; Table 3). Notably, a large collection of hiPSC lines derived from individuals with HD has been banked in the EBiSC by the CHDI Foundation and CENSO Biotechnologies. Although quality control information was submitted to guarantee high-quality hiPSC lines, no clinical data on the donor individuals are available.

In the future, we believe that all clinical trials that involve gathering HD patient samples for hiPSC generation should follow global guidelines, meet very strict criteria for patient enrolment and stratification, and implement quality control processes to ensure full traceability of the recovered biological specimens as well as the ability to communicate these data to researchers requesting them62. This approach would require a sizeable effort and a high degree of cooperation, transparency and understanding of common guidelines for study design, data acquisition and processing pipelines110. The HD field has substantial expertise in this area. For example, large observational studies such as ENROLL-HD have gathered longitudinal information and biological samples (including peripheral blood mononuclear cells that could potentially be used to generate hiPSCs) from >20,000 participants worldwide, and studies nested within ENROLL-HD aim to identify new prognostic biomarkers87. In this scenario, the final, missing link is effective collaboration and data sharing between clinicians and researchers to enable predictive HD modelling with hiPSCs.

Predicting onset and progression

Researchers have already generated and studied hiPSCs from individuals with HD. In these studies, the presence of mutant HTT was associated with alterations in cell growth, cell adhesion, metabolism, apoptosis, proteasomal function, autophagic response and mitochondrial fragmentation111,112,113,114. In addition, our work and the work of other groups have identified defects in neuro-ectodermal fate acquisition, neural rosette formation and neural identity acquisition in HD hiPSCs115,116,117. This early phenotype was also observed in an isogenic hESC system118. Unfortunately, none of the above-mentioned studies included analyses of the clinical history of the donor individuals, because the data were either unavailable to the researchers or heavily fragmented.

In the case of HD, having hiPSCs accompanied by in-depth and longitudinal clinical data would enable a wide range of additional experiments, with the potential to unravel key aspects of HD pathology. For example, modelling in vitro differentiation in hiPSCs derived from patients with the same CAG length and haplotype but a different age at onset could identify novel molecular mechanisms influencing age at disease onset. Cells could also be used to test the functional impact and temporal dynamics of new genetic modifiers identified through genome-wide association studies99,101,119. Conversely, we could investigate the molecular elements responsible for the same disease course and age at onset in patients with different CAG repeat lengths. Importantly, the young epigenetic age of hiPSCs would enable the correlation between early phenotypes (which might ‘set the stage’ for the changes that lead to atrophy later in life) and the disease course observed in the donor individuals. Furthermore, the derivation of hiPSCs from individuals with HD at different stages could enable us to establish whether disease vulnerability and disease progression are mechanistically distinct. However, reprogramming to pluripotency might hinder this kind of investigation, in which case direct reprogramming (Box 1) could be a useful approach as it better conserves the biological and epigenetic age of the donor cell120. Having clinical information on hiPSC donors could also permit the identification of molecular networks and mechanisms associated with HD phenotypes, thus opening in vitro modelling to a completely new branch of research that might incorporate studies using 3D systems and microfluidics56,121,122 (Fig. 1d).
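
As an illustration of how a harmonized database could support the first of these experiments, the sketch below selects pairs of donors who share CAG length and haplotype but differ markedly in age at onset. The record fields (`id`, `cag`, `haplotype`, `onset`), the 10-year threshold and the donor values are assumptions chosen for the example.

```python
from itertools import combinations


def discordant_pairs(donors, min_gap_years=10):
    """Return (earlier-onset ID, later-onset ID, gap in years) for donors
    with identical CAG length and haplotype, largest gap first."""
    pairs = []
    for a, b in combinations(donors, 2):
        same_genotype = a["cag"] == b["cag"] and a["haplotype"] == b["haplotype"]
        gap = abs(a["onset"] - b["onset"])
        if same_genotype and gap >= min_gap_years:
            early, late = sorted((a, b), key=lambda d: d["onset"])
            pairs.append((early["id"], late["id"], gap))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Invented records for illustration only.
donors = [
    {"id": "D1", "cag": 43, "haplotype": "A1", "onset": 38},
    {"id": "D2", "cag": 43, "haplotype": "A1", "onset": 52},
    {"id": "D3", "cag": 43, "haplotype": "A1", "onset": 41},
    {"id": "D4", "cag": 43, "haplotype": "C1", "onset": 60},
    {"id": "D5", "cag": 45, "haplotype": "A1", "onset": 35},
]
for early_id, late_id, gap in discordant_pairs(donors):
    print(f"{early_id} vs {late_id}: onset differs by {gap} years")
```

hiPSC lines from each selected pair could then be differentiated side by side to look for readouts that track the difference in onset.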

Ultimately, complementing experimental data with longitudinal clinical data would constitute an important step towards the true implementation of clinical trials ‘in-a-dish’, an approach that would test the safety or efficacy of potential new therapies on cells derived from a representative sample of patients, greatly empowering the bench-to-bedside translational process. In the cardiology field, this approach has already been developed by the Comprehensive in vitro Proarrhythmia Assay (CiPA), a global initiative that ultimately aims to develop a large drug screening platform that uses cardiomyocytes derived from hiPSCs to evaluate the cardiotoxicity profile of emerging drugs123,124. The ambitious concept of clinical trials in-a-dish includes the vital term ‘clinical’, highlighting the importance of bringing together clinical and in vitro preclinical research.

Finally, correlating and integrating experimental and clinical data from multiple time points during disease progression will take hiPSC disease modelling towards prediction (Fig. 1e). In the case of HD, we expect that this approach will identify one or several molecular mechanisms or networks in patient-derived hiPSCs that show changes that correlate with the disease trajectory — that is, age at onset, disease progression and presence of comorbidities — of that patient cohort. This information could be used to generate an integrative predictive model that could then be used to predict disease trajectories in new cohorts of patients on the basis of readouts from their hiPSCs, with the aim of modifying these trajectories before they move towards manifest HD125 (Fig. 1f). The great power of this system applied to HD lies in the speed with which hiPSCs can be generated, differentiated and studied — this happens on a much shorter timescale than disease progression, thus potentially enabling personalized predictions to be made early enough for preventive therapies to be effective.
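
The logic of such an integrative model can be sketched in its simplest form: fit a relationship between an in vitro readout and a clinical outcome in a cohort with known trajectories, then check by leave-one-out validation how well it predicts a donor it has not seen. The single readout, the linear form and all values below are assumptions made for illustration; a real model would integrate many readouts and clinical variables.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares slope and intercept for y as a function of x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must vary")
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return slope, my - slope * mx


def leave_one_out_error(x, y):
    """Mean absolute error when each donor is predicted from all the others."""
    errors = []
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        errors.append(abs(slope * x[i] + intercept - y[i]))
    return mean(errors)


# Invented values: a hypothetical normalized hiPSC readout per donor
# and that donor's observed age at onset.
readout = [0.90, 0.80, 0.70, 0.55, 0.40, 0.30]
age_at_onset = [60, 56, 50, 45, 37, 33]

slope, intercept = fit_line(readout, age_at_onset)
new_donor_readout = 0.60  # hypothetical measurement from a new donor
print(f"Predicted age at onset: {slope * new_donor_readout + intercept:.1f} years")
print(f"Leave-one-out error: {leave_one_out_error(readout, age_at_onset):.1f} years")
```

The held-out error, rather than the fit to the training cohort, is what indicates whether a readout is likely to be predictive in new patients.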

Conclusions

We believe that the generation of effective and predictive in vitro model systems of human diseases is possible, and that relevant predictive modelling platforms can be used as clinical trials in-a-dish, thus greatly advancing the search for disease-modifying therapies. HD was used as an example here as it is a monogenic disease for which gene-silencing therapies are already in phase III clinical trials. The existence of these new therapies calls for biomarkers that can predict the emergence and progression of cognitive and motor decline, and thus identify the optimal point for therapeutic intervention and predict therapeutic efficacy in different patient cohorts. In our opinion, predictive modelling is also a valid approach for other diseases, monogenic and otherwise, as it relies on inclusive components of basic research that are applicable to all disease modelling approaches.

Several hiPSC lines have been generated from patients with HD to model the disease in vitro, and overall, these studies highlight the power of patient-specific cells in the discovery of pathogenic mechanisms and the remarkable flexibility of this technique, which enables the investigation of both neurodevelopmental and neurodegenerative mechanisms. However, we also highlighted a number of limitations of currently used hiPSC lines, including non-uniform reprogramming procedures, lack of standardized quality control of reprogrammed cells, underpowered representation of disease manifestations and lack of cell donor clinical history.

Now is the time to put in the effort required to make hiPSC modelling adhere to a common framework, so that maximal knowledge can be extracted from clinical and in vitro data. We strongly advocate for a change of conduct in the use of hiPSCs for disease modelling, with the aim of producing robust guidelines and resources that the community can adopt to make hiPSC research effective in the delivery of predictive models that are intended to support the discovery of much-needed disease-modifying therapies.