Previous mathematical models of tumour population genetics
Many previous studies of tumour population genetics have used non-spatial branching processes21, in which cancer clones grow exponentially without interacting. Unless driver mutations increase cell fitness by less than 1%, these models predict lower clonal diversity and lower numbers of driver mutations than typically observed in solid tumours46. Among spatial models, a popular option is the Eden growth model (or boundary-growth model), in which cells are located on a regular grid with a maximum of one cell per site, and a cell can divide only if an unoccupied neighbouring site is available to receive the new daughter cell32,47,61. Other methods with one cell per site include the voter model32,62,63 (in which cells can invade neighbouring occupied sites) and the spatial branching process47 (in which cells budge each other to make space to divide). Further mathematical models have been designed to recapitulate glandular tumour structure by allowing each grid site or deme to contain multiple cells and by simulating tumour growth via deme fission throughout the tumour5,26 or only at the tumour boundary27. A class of models in which cancer cells are organized into demes and disperse into empty space has also been proposed36,52,64. Supplementary Table 2 summarizes selected studies representing the state of the art of stochastic modelling of tumour population genetics.
Our main methodological innovations are to implement all these distinct model structures, and additional models of invasive tumours, within a common framework, and to combine them with methods for tracking driver and passenger mutations at single-cell resolution. The result is a highly flexible framework for modelling tumour population genetics that can be used to examine consequences of variation not only in mutation rates and selection coefficients, but also in spatial structure and manner of cell dispersal65.
Simulated tumours in our models are made up of patches of interacting cells located on a regular grid of sites. In keeping with the population genetics literature, we refer to these patches as demes. All demes within a model have the same carrying capacity, which can be set to any positive integer. Each cell belongs to both a deme and a genotype. If two cells belong to the same deme and the same genotype then they are identical in every respect, and hence the model state is recorded in terms of such subpopulations rather than in terms of individual cells. For the sake of simplicity, computational efficiency and mathematical tractability, we assume that cells within a deme form a well-mixed population. The well-mixed assumption is consistent with previous mathematical models of tumour evolution5,26,27,36,64 and with experimental evidence in the case of stem cells within colonic crypts66.
A simulation begins with a single tumour cell located in a deme at the centre of the grid. If the model is parameterized to include normal cells, then these are initially distributed throughout the grid such that each demes population size is equal to its carrying capacity. Otherwise, if normal cells are absent, then the demes surrounding the tumour are initially unoccupied.
The simulation stops when the number of tumour cells reaches a threshold value. Because we are interested only in tumours that reach a large size, if the tumour cell population succumbs to stochastic extinction, then results are discarded and the simulation is restarted (with a different seed for the pseudo-random number generator).
Tumour cells undergo stochastic division, death, dispersal and mutation events, whereas normal cells undergo only division and death. The within-deme death rate is density-dependent. When the deme population size is less than or equal to the carrying capacity, the death rate takes a fixed value d0 that is less than the initial division rate. When the deme population size exceeds carrying capacity, the death rate takes a different fixed value d1 that is much greater than the largest attainable division rate. Hence, all genotypes grow approximately exponentially until the carrying capacity is attained, after which point the within-deme dynamics resemble a birthdeath Moran processa standard, well characterized model of population genetics.
In all spatially structured simulations, we set d0=0 to prevent demes from becoming empty. For the non-spatial (branching process) model, we set d0>0 and dispersal rate equal to zero, so that all cells always belong to a single deme (with carrying capacity greater than the maximum tumour population size).
When a cell divides, each daughter cell inherits its parents genotype plus a number of additional mutations drawn from a Poisson distribution. Each mutation is unique, consistent with the infinite-sites assumption of canonical population genetics models. Whereas some previous studies have examined the effects of only a single driver mutation (Supplementary Table 2), in our model there is no limit on the number of mutations a cell can acquire. Most mutations are passenger mutations with no phenotypic effect. The remainder are drivers, each of which increases the cell division or dispersal rate.
The programme records the immediate ancestor of each clone (defined in terms of driver mutations) and the matrix of Hamming distances between clones (that is, for each pair of clones, how many driver mutations are found in only one clone), which together allow us to reconstruct driver phylogenetic trees. To improve efficiency, the distance matrix excludes clones that failed to grow to more than ten cells and failed to produce any other clone before becoming extinct.
Whereas previous models have typically assumed that the effects of driver mutations combine multiplicatively, this can potentially result in implausible trait values (especially in the case of division rate if the rate of acquiring drivers scales with the division rate). To remain biologically realistic, our model invokes diminishing returns epistasis, such that the average effect of driver mutations on a trait value r decreases as r increases. Specifically, the effect of a driver is to multiply the trait value r by a factor of 1+s(1r/m), where s>0 is the mutation effect and m is an upper bound. Nevertheless, because we set m to be much larger than the initial value of r, the combined effect of drivers in all models in the current study is approximately multiplicative. For each mutation, the value of the selection coefficient s is drawn from an exponential distribution.
Depending on model parameterization, dispersal occurs via either invasion or deme fission (Supplementary Table 3). In the case of invasion, the dispersal rate corresponds to the probability that a cell newly created by a division event will immediately attempt to invade a neighbouring deme. This particular formulation ensures consistency with a standard population genetics model known as the spatial Moran process. The destination deme is chosen uniformly at random from the four nearest neighbours (von Neumann neighbourhood). Invasion can be restricted to the tumour boundary, in which case the probability that a deme can be invaded is 1N/K if NK and 0 otherwise, where N is the number of tumour cells in the deme and K is the carrying capacity. If a cell fails in an invasion attempt, then it remains in its original deme. If invasion is not restricted to the tumour boundary, then invasion attempts are always successful.
In fission models, a deme can undergo fission only if its population size is greater than or equal to carrying capacity. As with invasion, deme fission immediately follows cell division (so that results for the different dispersal types are readily comparable). The probability that a deme will attempt fission is equal to the sum of the dispersal rates of its constituent cells (up to a maximum of 1). Deme fission involves moving half of the cells from the original deme into a new deme, which is placed beside the original deme. If the dividing deme contains an odd number of cells, then the split is necessarily unequal, in which case each deme has a 50% chance of receiving the larger share. Genotypes are redistributed between the two demes without bias according to a multinomial distribution. Cell division rate has only a minor effect on deme fission rate because a deme created by fission takes only a single cell generation to attain carrying capacity.
If fission is restricted to the tumour boundary, then the new demes assigned location is chosen uniformly at random from the four nearest neighbours, and if the assigned location already contains tumour cells, then the fission attempt fails. If fission is allowed throughout the tumour, then an angle is chosen uniformly at random, and demes are budged along a straight line at that angle to make space for the new deme beside the original deme.
Our particular method of cell dispersal was chosen to enable comparison between our results and those of previous studies and to facilitate mathematical analysis. In particular, when the deme carrying capacity is set to 1, our model approximates an Eden growth model (if fission is restricted to the tumour boundary, or if dispersal is restricted to the tumour boundary and normal cells are absent), a voter model (if invasion is allowed throughout the tumour) or a spatial branching process (if fission is allowed throughout).
To fairly compare different spatial structures and manners of cell dispersal, we set dispersal rates in each case such that the time taken for a tumour to grow from one cell to one million cells is approximately the same as in the neutral Eden growth model with maximal dispersal rate. This means that, across models, the cell dispersal rate decreases with increasing deme size. Given that tumour cell cycle times are on the order of a few days, the timespans of several hundred cell generations in our models realistically correspond to several years of tumour growth. More specifically, if we assume tumours take between 5 and 50 years to grow and the cell cycle time is between 1 and 10 days (both uniform priors), then the number of cell generations is between 400 and 8,000 in 95% of plausible cases. This order of magnitude is consistent with tumour ages inferred from molecular data67.
We note that, in addition to gland fission, gland fusion has been reported in normal human intestine68, which raises the possibility that gland fusion could occur during colorectal tumour development. However, the rate of crypt fission in tumours is much elevated relative to the rate in healthy tissue, and must exceed the rate of crypt fusion (or else the tumour would not grow). Therefore, even if crypt fusion occurs in human tumours, we do not expect it to have a substantial influence on evolutionary mode. This view is supported by previous computational modelling69.
We chose to conduct our study in two dimensions for two main reasons. First, the effects of deme carrying capacity on evolutionary dynamics are qualitatively similar in two and three dimensions, yet a two-dimensional model is simpler, easier to analyse, and easier to visualize. Second, we aimed to create a method that is readily reproducible using modest computational resources and yet can represent the long-term evolution of a reasonably large tumour at single-cell resolution.
One million cells in two dimensions corresponds to a cross-section of a three-dimensional tumour with many more than one million cells. Therefore, compared to a three-dimensional model, a two-dimensional model can provide richer insight into how evolutionary dynamics change over a large number of cell generations. Developing an approximate, coarse-grained analogue of our model that can efficiently simulate the population dynamics of very large tumours with different spatial structures in three dimensions is an important direction for future research.
The programme implemented Gillespies exact stochastic simulation algorithm70 for statistically correct simulation of cell events. The order of event selection is (1) deme, (2) cell type (normal or tumour), (3) genotype, and (4) event type. At each stage, the probability of selecting an item (deme, cell type, genotype or event type) is proportional to the sum of event rates for that item, within the previous item. We measured elapsed time in terms of cell generations, where a generation is equal to the expected cell cycle time of the initial tumour cell.
We surveyed the multi-region and single-cell tumour sequencing literature to identify data sets suitable for comparison with our model results. Studies published before 2015 (for example, refs. 71,72,73,74) were excluded as they were found to have insufficient sequencing depth for our purposes. We also excluded studies that reconstructed phylogenies using samples from metastases or from multifocal tumours (for example, refs. 75,76,77,78,79,80) because our model is not designed to correspond to such scenarios. The seven studies we chose to include in our comparison are characterized by either high-coverage multi-region sequencing or large-sample single-cell sequencing of several tumours.
The ccRCC investigation81 we selected involved multi-region deep sequencing, targeting a panel of more than 100 putative driver genes. Three studies of NSCLC10, mesothelioma40 and breast cancer39 conducted multi-region whole-exome sequencing (first two studies) or whole-genome sequencing (latter study), and reported putative driver mutations. We also used data from single-cell RNA sequencing studies of uveal melanoma42 and breast cancer41, in which chromosome copy number variations were used to infer clonal structure, and a study of acute myeloid leukaemia (AML) that used single-cell DNA sequencing24. All seven studies constructed phylogenetic trees, which are readily comparable to the trees predicted by our modelling. The methodological diversity of these studies contributes to demonstrating the robustness of the patterns we seek to explain.
From each of the seven cohorts, we obtained data for between three and eight tumours. In the ccRCC data set, we focused on the five tumours for which driver frequencies were reported in the original publication. For NSCLC, we used data for the five tumours for which at least six multi-region samples were sequenced. In mesothelioma, we selected the six tumours that had at least five samples taken. From the breast cancer multi-region study, we used data for the three untreated tumours that were subjected to multi-region sequencing. From the single-cell sequencing studies of uveal melanoma and breast cancer, we used all the published data (eight tumours in each case), and from the AML study, we selected a random sample of eight tumours.
In multi-region sequencing data sets, it is uncertain whether all putative driver mutations were true drivers of tumour progression. One way to interpret the data (interpretation I1) is to assume that all putative driver mutations were true drivers that occurred independently. Alternatively, the more conservative interpretation I2 assumes that each mutational cluster (a distinct peak in the variant allele frequency distribution) corresponds to exactly one driver mutation, while all other mutations are hitchhikers. Thus, I1 permits linear chains of nodes that in I2 are combined into single nodes (compare Supplementary Figs. 9 and 10), and I1 leads to a higher estimate of the mean number of driver mutations per cell (our summary index n). If both the fraction of putative driver mutations that are not true drivers (false positives) and the fraction of true driver mutations that are not counted as such (false negatives) are low, or if these fractions approximately cancel out, then interpretation I1 will give a good approximation of n whereas I2 will give a lower bound. For the ccRCC, NSCLC and breast cancer cases in our data set, I1 generates values of n in the range 310 (mean 6.1), consistent with estimates based on other methodologies13,51, whereas for I2 the range is only 14 (mean 2.5). Accordingly, we used interpretation I1.
To measure clonal diversity, we used the inverse Simpson index defined as (D=1/{sum }_{i}{p}_{i}^{2}), where pi is the frequency of the ith combination of driver mutations. For example, if the population comprises k clones of equal size, then pi=1/k for every value of i, and so D=1/(k1/k2)=k. Clonal diversity has a lower bound D=1. The inverse Simpson index is relatively robust to adding or removing rare types, which makes it appropriate for comparing data sets with differing sensitivity thresholds. Further examples are illustrated in Supplementary Fig. 11.
D is constrained by an upper bound for trees with n<2, where n is the mean number of driver mutations per cell. Indeed, n=iipip1+2(1p1)=2p1, so p12n>0, since n<2. Therefore,
$$D=frac{1}{{sum }_{i}{p}_{i}^{2}}le frac{1}{{p}_{1}^{2}}le frac{1}{{(2-n)}^{2}}.$$
To see that this bound is tight, assume 1n<2 and consider a star-shaped tree with N nodes such that p1=2n and other nodes have equal weights pi=(1p1)/(N1)=(n1)/(N1) for i2. The mean number of driver mutations per cell is p1+2(1p1)=2p1=n, and the inverse Simpson index is
$$begin{array}{l}D=frac{1}{mathop{sum }nolimits_{i = 1}^{N}{p}_{i}^{2}}=frac{1}{{p}_{1}^{2}+mathop{sum }nolimits_{i = 2}^{N}{p}_{i}^{2}}\=frac{1}{{(2-n)}^{2}+(N-1){((n-1)/(N-1))}^{2}}=frac{1}{{(2-n)}^{2}+{(n-1)}^{2}/(N-1)}.end{array}$$
This quantity goes to 1/(2n)2 as the number of nodes N goes to infinity, so the bound 1/(2n)2 may be approached arbitrarily closely.
It is informative to derive the relationship between D and n for a population that evolves via a sequence of clonal sweeps, such that each new sweep begins only after the previous sweep is complete. For a given value of n, our simulations rarely produce trees with D values below the curves of this trajectory. Suppose that a population comprises a parent type and a daughter type, with frequencies p and 1p, respectively. If the daughter has m driver mutations, then the parent must have m1 driver mutations and n must satisfy m1nm. More specifically,
$$n=(m-1)p+m(1-p)=m-p Rightarrow p=m-n=1-{n},$$
where {n} denotes the fractional part of n (or 1 if n=m). The trajectory is therefore described by
$$D=frac{1}{{p}^{2}+{(1-p)}^{2}}=frac{1}{{(1-{n})}^{2}+{{n}}^{2}}.$$
We additionally calculated a curve representing the maximum possible diversity of linear trees. In the main text and below, we refer to this curve as corresponding to trees with an intermediate degree of branching. Specifically, this intermediate-branching curve is defined such that for every point below the curve (and with D>1), there exist both linear trees and branching trees that have the corresponding values of n and D, whereas for every point above the curve there exist only branching trees. Derivation of the curves equation is provided in Supplementary Information. A first-order approximation (accurate within 1% for n2.2) is D9(2n1)/8.
To assess the extent to which clusters of points (n, D) are well separated, we calculated silhouette widths using the cluster R package82. A positive mean silhouette width indicates that clusters are distinct.
Our diversity index fulfills the same purpose as the intratumour heterogeneity (ITH) index used in the TRACERx Renal study9, defined as the ratio of the number of subclonal driver mutations to the number of clonal driver mutations. However, compared to ITH, our index has the advantages of being a continuous variable and being robust to methodological differences that affect ability to detect low-frequency mutations. In calculating ITH from sequencing data, we included all putative driver mutations, whereas ref.9 used only a subset of mutations. For model output, we classified mutations with frequency above 99% as clonal and we excluded mutations with frequency less than 1%. ITH and the inverse Simpson index are strongly correlated across our models (Spearmans =0.98, or =0.81 for cases with D>2; Extended Data Fig. 9c).
The Shannon index, defined as ({sum }_{i}{p}_{i}{{mathrm{log}}},{p}_{i}), is another alternative to the Simpson index. The exponential of this index has the same units as the inverse Simpson index (equivalent number of types). Compared to the Simpson index, the Shannon index gives more weight to rare types, which makes it somewhat less suitable for comparing data sets with differing sensitivity thresholds.
In defining regions in terms of indices D and n (Table 1 and Fig. 3c), we first noted that if a population undergoes a succession of non-overlapping clonal sweeps, then at most two clones coexist at any time, and hence D2. Allowing for some overlap between sweeps, we defined the selective sweeps region as having D<10/3 and D below the intermediate-branching curve. We put the upper boundary at D=10/3 because this intersects with the intermediate-branching curve at n=2.
We used D=20 to define the boundary between the branching and progressive diversification regions. The TRACERx Renal study9 instead categorized trees containing more than 10 clones as highly branched, as opposed to branched. It is appropriate for us to use a higher threshold because our regions are based on true tumour diversity values, rather than the typically lower values inferred from multi-region sequencing data. Finally, we defined an effectively almost neutral region containing star-shaped trees with n<2 and D above the intermediate-branching curve.
It is possible to construct trees that do not fit the labels we have assigned to regions. For example (as shown in Supplementary Information), there exist linear trees within the branching and progressive diversification regions. Such exceptions are an unavoidable consequence of representing high-dimensional objects, such as phylogenetic trees, in terms of a small number of summary indices. Our labels are appropriate for the subset of trees that we have shown to arise from tumour evolution.
Conventionally, the balance of a tree is the degree to which branching events split the tree into subtrees with the same number of leaves, or terminal nodes. A balanced tree thus indicates more equal extinction and speciation rates than an unbalanced tree83. Tree balance indices are commonly used to assert the correctness of tree reconstruction methods and to classify trees. We considered three previously defined indices, all of which are imbalance indices, which means that more balanced trees are assigned smaller values. We subtracted each of these indices from 1 to obtain measurements of tree balance.
Let T=(V,E) be a tree with a set of nodes V and edges E. Let V=N, and hence E=N1 (since each node has exactly one parent, except the root). We defined l as the number of leaves of the tree. The root is labelled 1 and the leaves are numbered from Nl+1 to N. There is only one cladogram with two leaves, which is maximally balanced according to all the previously defined indices discussed below. We also considered the single-node tree to be maximally balanced with respect to these previously defined indices. The following definitions then apply when l3.
For each leaf j, we defined j as the number of interior nodes between j and the root, which is included in the count. Then a normalized version of Sackins index, originally introduced in ref.84, is defined as
$${I}_{S,mathrm{norm}}(T)=frac{mathop{sum }limits_{j=N-l+1}^{N}{nu }_{j}-l}{frac{1}{2}(l+2)(l-1)-l},$$
where to be able to compare indices of trees on different number of leaves l, we subtracted the minimal value for a given l and divided by the range of the index on all trees on n leaves, as in ref.85.
For an interior node i of a binary tree T, we defined TL(i) as the number of leaves subtended by the left branch of Ti, the subtree rooted at i, and TR(i) the number of leaves subtended by its right branch. Then, the unnormalized Colless index86 of T is
$${I}_{C}(T)=mathop{sum }limits_{i=1}^{N-l}| {T}_{L}(i)-{T}_{R}(i)| .$$
Since Colless index is defined only for bifurcating trees, we used the default normalized Colless-like index ({{mathfrak{C}}}_{{mathrm{MDM}},,{{mathrm{ln}}}(l+e),,{mathrm{norm}}}) defined in ref.85. This consisted of measuring the dissimilarity between the subtrees (T^{prime}) rooted at a given internal node by computing the mean deviation from the median (MDM) of the f-sizes of these subtrees. In this case, (f(l)={{mathrm{ln}}}(l+e)) and the f-size of (T^{prime}) is defined as
$$mathop{sum}limits_{vin V(T^{prime} )}{mathrm{ln}}({mbox{deg}}(v)+e).$$
These dissimilarities were then summed and the result was normalized as for Sackins index.
The cophenetic value (i,j) of a pair of leaves i,j is the depth of their lowest common ancestor (such that the root has depth 0). The total cophenetic index87 of T is then the sum of the cophenetic values over all pairs of leaves, and a normalized version is
$${I}_{{{Phi }},{mathrm{norm}}}(T)=frac{mathop{sum}limits_{N-l+1le i < jle N}phi (i,j)}{left({l}atop{3}right)},$$
where here the minimal value of the cophenetic index is 0 for all l (for a star-shaped tree with l leaves).
These three balance indices were designed for analysing species phylogenies and are thus defined on cladograms, which are trees in which leaves correspond to extant species and internal nodes are hypothetical common ancestors. Conventional cladograms have no notion of node size. Cladograms also lack linear components as each internal node necessarily corresponds to a branching event. The driver phylogenetic trees reported in multi-region sequencing studies and generated by our models are instead clone trees (also known as mutation trees), in which all nodes of non-zero size represent extant clones. To apply previous balance indices to driver phylogenetic trees, we first converted the trees to cladograms by adding a leaf to each non-zero-sized internal node and collapsing linear chains of zero-sized nodes.
Whereas diversity indices such as D are relatively robust to the addition or removal of rare clones, the balance indices described above are much less robust because they treat all clones equally, regardless of population size (Supplementary Figs. 6, 7 and 8). This hampered comparison between model results and data for two reasons. First, due to sampling error, even high quality multi-region sequencing studies underestimate the number of subclonal, locally abundant driver mutations by approximately 25%81. Second, bulk sequencing cannot detect driver mutations present in only a very small fraction of cells.
To overcome the shortcomings of previous indices, we have developed a more robust tree balance index based on an extended definition: tree balance is the degree to which internal nodes split the tree into subtrees of equal size, where size refers to the sum of all node populations.
Let f(v)>0 denote the size of node v. For an internal node i, let V(Ti) denote the set of nodes of Ti, the subtree rooted at i. We then define
$$begin{array}{l}{S}_{i}=mathop{sum}limits_{vin V({T}_{i})}f(v)=,{{mathrm{the}} {mathrm{size}} {mathrm{of}}},,{T}_{i},\ {S}_{i}^{* }=mathop{sum}limits_{vin V({T}_{i})atop {vne i}}f(v)=,{{mathrm{the}} {mathrm{size}} {mathrm{of}}},,{T}_{i},,{{mathrm{without}} {mathrm{its}} {mathrm{root}}},,i.end{array}$$
For i in the set of internal nodes (widetilde{V}), and j in the set C(i) of children of i, we define ({p}_{ij}={S}_{j}/{S}_{i}^{* }). We then computed the balance score ({W}_{i}^{1}) of a node (iin widetilde{V}) as the normalized Shannon entropy of the sizes of the subtrees rooted at the children of i:
$${W}_{i}^{1}=mathop{sum}limits_{jin C(i)}{W}_{ij}^{1},quad ,{{mbox{with}}},{W}_{ij}^{1}=left{begin{array}{ll}-{p}_{ij}{{{mathrm{log}}},}_{{d}^{+}(i)}{p}_{ij}&,{{mbox{if}}},,{p}_{ij} > 0,{{mbox{and}}},,{d}^{+}(i)ge 2,\ 0&,{{mbox{otherwise,}}},end{array}right.$$
where d+(i) is the out-degree (the number of children) of node i. Finally, for each node i, we weighted the balance score by the product of ({S}_{i}^{* }) and a non-root dominance factor ({S}_{i}^{* }/{S}_{i}.) Our normalized balance index is then
$${J}^{1}:= frac{1}{{sum }_{kin widetilde{V}}{S}_{k}^{* }}mathop{sum}limits_{iin widetilde{V}}{S}_{i}^{* }frac{{S}_{i}^{* }}{{S}_{i}}{W}_{i}^{1}.$$
Supplementary Fig. 11 illustrates the calculation of J1 for four exemplary trees. We further describe the desirable properties of this index, and its relationship to other tree balance indices, in another article43.
When n2 (where n is the mean number of driver mutations per cell), the non-root dominance factor cannot exceed n1, while the other factors in J1 are at most 1, which implies J1n1 for all n2. Also for n>2, we have J11
For each time point tt, we defined a clonal turnover index as
$${{Theta }}(t)=mathop{sum}limits_{i}{left({f}_{i}(t)-{f}_{i}(t-tau )right)}^{2},$$
where fi(t) is the frequency of clone i at time t, and is 10% of the total simulation time measured in cell generations. The mean value (overline{{{Theta }}}) over time measures the total extent of clonal turnover.
To determine whether clonal turnover mostly occurred early, late or throughout tumour evolution, we calculated the weighted average
$${overline{T}}_{{{Theta }}}=frac{1}{max (t)}left(mathop{sum}limits_{t}{{Theta }}(t)tbigg/mathop{sum}limits_{t}{{Theta }}(t)right),$$
where (max (t)) denotes the final time of the simulation. This quantity takes values between 0 and 1, and is higher if clonal turnover occurs mostly late during tumour growth. If the rate of clonal turnover is constant over time, then ({overline{T}}_{{{Theta }}}approx 0.55).
We randomly selected five tumours of each of four cancer types (colorectal cancer, clear cell renal cancer, lung adenocarcinoma and breast cancer) from The Cancer Genome Atlas (TCGA) reference database (http://portal.gdc.cancer.gov). Using QuPath v0.2.0m488, we manually delineated five representative groups of tumour cells in each image and automatically counted the number of cells in each group. We defined a group as a set of tumour cells directly touching each other, separated from other groups by stroma or other non-tumour tissue (Extended Data Fig. 3).
The number of cells per group ranged from 5 to 8,485, with 50% of cases having between 53 and 387 cells (Extended Data Fig. 4a). Variation in the number of cells per group was larger between rather than within tumours, whereas cell density was relatively consistent between tumours (Extended Data Fig. 4b). Because our cell counts were derived from cross sections, they would underestimate the true populations of three-dimensional glands. On the other hand, it is unknown what proportion of cells are able to self-renew and contribute to long-term tumour growth and evolution89. On balance, therefore, it is reasonable to assume that each gland of an invasive glandular tumour can contain between a few hundred and a few thousand interacting cells. This range of values is, moreover, remarkably consistent with results of a recent study that used a very different method to infer the number of cells in tumour-originating niches. Across a range of tissue types, this study concluded that cells typically interact in communities of 3001,900 cells30. Another recent study of breast cancer applied the Louvain method for community detection to identify two-dimensional tumour communities typically in the range of 10100 cells.29
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Read the original here:
Spatial structure governs the mode of tumour evolution - Nature.com