Daily Archives: December 7, 2021

Is There a Genetic Link to Being an Extremely Good Boy? – WIRED

Posted: December 7, 2021 at 5:33 am

Flash isn't your average puppy. A yellow Labrador, named after one of the first British guide dogs from 1931, she is playful, affectionate, and loves learning new commands. Flash is enrolled in an elaborate program herself, one that takes two years and nearly $50,000 to train her to become a guide dog for the blind and visually impaired. Her temporary caregiver Melanie will make sure she maintains a healthy routine: twice-daily walks in different environments, a train ride here, a trip to a mall there to get used to other people. But Melanie has already accomplished one of her most important tasks: When Flash was five months old, she swabbed the puppy's cheek and mailed the saliva away to a team of researchers that is trying to decipher the link between dog genetics, health, and behavior.

Around half of the dogs that are bred for guiding don't end up doing that work because of health or behavioral problems. Modern dogs suffer from many genetic diseases, a side effect of keeping breeds separate and selecting them for desirable traits. Some of these purebreds might have the right looks, but not the right temperament, to become a working dog. But what if breeders could predict what makes a good guide dog and select against undesired traits, ensuring they aren't passed on to the next generation?

More than 500 traits analogous to human genetic conditions have been described in dogs; both species can suffer from cancer, eye disease, or dysplasia of the hip, to name a few. Cheap DNA tests for canines can screen for changes, known as mutations, in a single gene. The causes of many other conditions, however, are more complex. They can be linked to multiple genes or to environmental factors like exercise, food, dust, or mold spores. "We definitely want to get a handle on complex traits," says Tom Lewis, head of canine genetics at Guide Dogs. The charity breeds around 1,000 puppies a year, which spend their first year in the homes of volunteers before entering formal training.

Before joining Guide Dogs in January, Lewis worked at the Animal Health Trust and the Kennel Club in the United Kingdom, where he studied the genetic risk of hip dysplasia in breeds registered with the club. Dysplasia is one of the hereditary conditions that can be difficult to diagnose and treat. It is a malformation of the hip joint that develops during growth, though traumatic injury, being overweight, or lacking muscle strength can worsen the condition. For example, puppies raised in homes with hardwood flooring may build less muscle mass in their legs: they can't get traction on the floor, and slip and slide around, which is hard on their little joints. The constant pain can eventually turn into lameness and arthritis in grown dogs, making them unsuitable for guiding or assisting people with disabilities.

Good health is key for guide dogs, but temperament is just as important. They need to lead their owners around obstacles and other people while staying calm and obedient. They need to resist chasing after squirrels or getting too excited when meeting other dogs. Not every breed has what it takes. For example, the typical cocker spaniel is intelligent, affectionate, and a great option for families, but it is also too excitable. "Even if you give them the same training, you would never expect a spaniel to be a guide dog. They're far too temperamentally unsuited, and that's probably a genetic thing," says Lewis.

Genetic Variant Discovered in Amish Protects from Heart Disease – The Scientist

Posted: at 5:32 am

A gene variant initially uncovered in the genomes of people belonging to the Old Order Amish has been linked in a new study to lower levels of fibrinogen (a blood clotting factor) and low-density lipoprotein (LDL) cholesterol, both of which, when elevated, increase a person's risk of developing cardiovascular disease. The work, published today (December 2) in Science, not only connects a missense mutation in the enzyme-coding gene beta-1,4-galactosyltransferase 1 (B4GALT1) with heart health in humans, but confirms the link in mice.

"This is a very good example of the utility of small founder or isolated populations in predicting genetic effects of genes that could not easily be identified even in the very big human biobanks that are available worldwide," Caroline Hayward, who studies human genetics at the University of Edinburgh and did not participate in the study, writes in an email to The Scientist.

May Montasser, a genetic epidemiologist at the University of Maryland School of Medicine, and colleagues study the genomes of the Old Order Amish because approximately 35,000 people alive today can trace their family history back to a small number of founder families. Due to the small pool of genetic starting material, this population harbors less genetic diversity than the general population, meaning that variants that might disappear in larger groups can be maintained in Old Order Amish populations and therefore be easier to spot.

In the new study, Montasser and colleagues sequenced the exons of 6,890 Old Order Amish subjects and found a missense mutation in B4GALT1, which is expressed throughout the body and encodes an enzyme that transfers galactose to proteins. The variant was associated with lower LDL cholesterol and was present in six percent of the Amish genomes but is much rarer outside the Amish community. The researchers found it in only eight of 140,000 non-Amish genomes that are part of a National Heart, Lung, and Blood Institute database.

When the team looked at other factors related to cardiovascular health in people carrying the missense mutation, they found no association with triglycerides and a small association with high-density lipoprotein cholesterol. The blood clotting factor fibrinogen, which can be a risk factor for cardiovascular disease when elevated, was lower in people with the variant.

To assess whether the missense mutation in B4GALT1 was linked with overall cardiovascular health, the researchers shifted their focus outside the relatively healthy Amish population to the Geisinger Health System and the UK Biobank, two larger genomic databases. Because the specific variant they identified in the Amish is so rare in the general population, Montasser and her colleagues pulled out genomes with any variant of B4GALT1. These individuals had decreased LDL and fibrinogen, along with a 35 percent reduction in cardiovascular disease.

When the researchers generated knock-in mice with the B4GALT1 variant, the animals also had lower LDL and fibrinogen. Knocking the gene down just in the rodents' livers led to lower levels of circulating LDL. The authors explain in the paper that this finding could point to the usefulness of targeting B4GALT1 expression in the liver therapeutically to lower LDL cholesterol.

"The study provides strong evidence that this newly discovered mutation is relevant across population and species," Kari North, a genetic epidemiologist at the University of North Carolina at Chapel Hill who did not participate in the work, writes in an email to The Scientist. "B4GALT1 may represent a new drug target for decreasing LDL-cholesterol and downstream [cardiovascular disease]." However, years of work are still needed to develop this new discovery into a new pharmaceutical target, she adds.

"Before getting to the point of clinical relevance, we have to make sure that there are not any harmful side effects [associated with] having this variant," Montasser explains. "Having low LDL is great; having low fibrinogen is great, but is there anything else harmful? So far, based on all the other information we have, everything looks good. Those people look perfectly healthy, but we are doing even more deep phenotyping right now to make sure that we are not missing anything."

"Another issue is that it's not clear how we go from having this variant to having low LDL and low fibrinogen and protection from cardiovascular disease," she says. The research team is trying to characterize that mechanism in animal models and human samples now. Montasser says they will "keep working on it [and] hopefully someday we'll have some form of therapy based on this."

UAB researcher shines light on a rare disease that causes developmental and intellectual delays – The Mix

Posted: at 5:32 am

After years of researching the SON gene, Erin Eun-Young Ahn, Ph.D., may have found the cause behind an extremely rare disease.

Since the early 2000s, Erin Eun-Young Ahn, Ph.D., associate professor in the University of Alabama at Birmingham Department of Pathology's Division of Molecular & Cellular Pathology, has been studying the SON protein and gene. The SON gene makes a protein, also called SON, that is required for the body to develop and grow normally. While Ahn is one of the world's leading experts on the SON gene, she had no idea that her work would ultimately help determine the cause behind an extremely rare disease known as Zhu-Tokita-Takenouchi-Kim syndrome.

ZTTK syndrome is a severe multi-system developmental disorder characterized by delayed psycho-motor development and intellectual disability. Common clinical features of ZTTK include intellectual and developmental delays, brain malformations, muscle abnormalities and facial asymmetry. Little was known about this disease, including its cause, until Ahn discovered that ZTTK syndrome is the result of a genetic mutation of the SON gene.

Ahn's journey began in 2014 when a physician from California contacted her about a pediatric patient who suffered from developmental and intellectual disabilities. These included late milestones in language and cognitive processing. Ahn has published research showing that SON function is important in the RNA editing step, called RNA splicing. RNA delivers instructions to cells from DNA on which proteins to produce. The physician found Ahn's postdoctoral research publication and reached out for help.

The patient underwent standard genetic testing panels and tests for gene mutations on known genes, which all came back normal. Finally, the doctors ran exome sequencing, a test developed in the last decade that can identify more undiscovered variants in an unbiased way. They found that the only gene mutation this patient had was in the SON gene.

"This was the first finding of this specific gene mutation in humans," Ahn said. "We knew that SON is overexpressed in several types of cancer cells, but we didn't know whether the mutations in the DNA sequence of the SON gene that cause loss-of-function really existed in the human patient. It was really eye opening."

Ahn made a case report on this single patient, showing how in cancer there is an overproduction of SON, and with an underproduction, there are developmental and intellectual delays like those present in the first patient. The journal to which she submitted the work suggested she locate other cases. With the assistance of a website tracking undiagnosed diseases and a database of gene mutations, Ahn found a few more incidents indicating DNA sequence changes in the SON gene. She reached out to those patients' physicians and the involved researchers in the U.S. and Europe, finding they had the same symptoms as her original pediatric patient.

"We went from one case to five cases," Ahn said. "And a few months after that, the information began to spread rapidly. Clinicians and genetic counselors started talking, and the information became international."

Ahn's group eventually analyzed clinical symptoms of 20 patients who carry mutations in the SON gene. They also conducted experiments using the cells obtained from the patients and demonstrated that an insufficient amount of the SON protein leads to defective RNA splicing in patients, which in turn causes abnormal brain development and metabolism. Their findings were published in the American Journal of Human Genetics in September 2016. The publication played a major role in designating this new disease caused by SON mutations as ZTTK syndrome.

Typically, the human body houses two copies of the SON gene, and Ahn found that those with ZTTK have one copy of the normal SON gene and one copy of the mutated SON gene. This mutation in the SON gene means that the body cannot produce as much of the SON protein needed for its cells to develop properly. This lack of cell development leads to the developmental and intellectual delays found in patients with this disease. There are currently no reported cases of mutations found in both copies of the SON gene, which suggests that mutations in both copies may be extremely detrimental to human development.

"The mutations show up during development but are not inherited from the parents, which is why awareness of the syndrome by clinicians is key to connecting patients with resources," Ahn said. "They have to talk to their doctor and a genetic counselor and get the exome sequencing done to get the diagnosis."

In her role as one of the world's primary researchers on the SON gene, Ahn became a point of contact for patients and their families, connecting families all over the world. This led to the establishment of a Facebook group for the syndrome, which now has more than 245 members. Last year, several of the parents of children, together with Ahn, created the ZTTK SON-Shine Foundation that showcases personal stories of families learning how to live with this syndrome.

"We developed the ZTTK SON-Shine Foundation, because we wanted to create a sense of community among people affected by this disease so they do not feel isolated," Ahn said. "This is a way that parents of children with ZTTK can help and share information with each other in hopes of improving the quality of life for every patient with ZTTK syndrome."

In addition to connecting families with this challenging rare disease, the foundation hopes to spread awareness of the syndrome to clinicians in particular.

"One of the goals of the ZTTK SON-Shine Foundation is making a network of clinicians and researchers who can share their expertise to help the families affected by this syndrome," Ahn said. "Exome sequencing reports typically indicate the results in scientific terms, but when patients receive those, they don't understand what that means. So, I often provide the families with an explanation of what these genetic codes mean in layman's terms, so that they can have a clear understanding. I am very grateful that I have opportunities to help the patients and families."

Ahn is no longer alone in the study of ZTTK; she is working with researchers across the globe on different manifestations of this syndrome in patients, with issues ranging from bone development to kidney function to metabolism and the immune system.

"We hope some part of our finding will contribute to a medical treatment," Ahn says. "And we know our findings can provide them with information. We don't know whether we can find a cure, but if we know more about how SON mutation affects patients' metabolism, kidney issues, bone structure, neurological features, and immune system, we can do a lot for patient care and prevention and alleviate their symptoms."

Toward a genome sequence for every animal: Where are we now? – pnas.org

Posted: at 5:32 am

Abstract

In less than 25 y, the field of animal genome science has transformed from a discipline seeking its first glimpses into genome sequences across the Tree of Life to a global enterprise with ambitions to sequence genomes for all of Earth's eukaryotic diversity [H. A. Lewin et al., Proc. Natl. Acad. Sci. U.S.A. 115, 4325–4333 (2018)]. As the field rapidly moves forward, it is important to take stock of the progress that has been made to best inform the discipline's future. In this Perspective, we provide a contemporary, quantitative overview of animal genome sequencing. We identified the best available genome assemblies in GenBank, the world's most extensive genetic database, for 3,278 unique animal species across 24 phyla. We assessed taxonomic representation, assembly quality, and annotation status for major clades. We show that while tremendous taxonomic progress has occurred, stark disparities in genomic representation exist, highlighted by a systemic overrepresentation of vertebrates and underrepresentation of arthropods. In terms of assembly quality, long-read sequencing has dramatically improved contiguity, whereas gene annotations are available for just 34.3% of taxa. Furthermore, we show that animal genome science has diversified in recent years with an ever-expanding pool of researchers participating. However, the field still appears to be dominated by institutions in the Global North, which have been listed as the submitting institution for 77% of all assemblies. We conclude by offering recommendations for improving genomic resource availability and research value while also broadening global representation.

The first animal genome sequence was published 23 y ago (1). The 97 million-basepair (bp) (Mb) Caenorhabditis elegans genome assembly ushered in a new era of animal genome biology where genetic patterns and processes could be investigated at genome scales. As genome assemblies have accumulated for an increasingly diverse set of species, so too has our knowledge of how genomes vary and shape Earth's biodiversity (e.g., refs. 2 and 3). Major shifts in genome availability and quality have been driven by two key events. First, the invention of high-throughput, short-read sequencing provided an economical means to generate millions of reads for any species from which sufficient DNA could be obtained. These 100-bp short reads could be assembled into useful, albeit fragmented, genome assemblies. Later, the rise of long-read sequencing allowed for similarly economical generation of reads that are commonly orders of magnitude longer than short reads, resulting in vastly more contiguous genome assemblies (4).

We have now entered an era of genomic natural history. Building on 250 y of natural history efforts to describe and classify the morphological diversity of life on Earth, we are gaining a complementary genomic perspective of Earth's biodiversity. However, a baseline accounting of our progress toward a complete perspective of Earth's genomic natural history, in which every species has a corresponding, reference-quality genome assembly available, has not been presented. This knowledge gap is particularly important given the momentum toward sequencing all animal genomes, which is being driven by a host of sequencing consortia. For instance, the Vertebrate Genomes Project seeks to generate high-quality assemblies for all vertebrates (5), the Bird10K project seeks to generate assemblies for all extant birds (6), the i5K project plans to produce 5,000 arthropod genome assemblies (7), the Earth BioGenome Project aims to sequence all eukaryote genomes (8), and the Darwin Tree of Life project plans to sequence genomes for all eukaryotes in Britain and Ireland (https://www.darwintreeoflife.org/).

In this Perspective, we curated, quantified, and summarized genomic progress for a major component of Earth's biodiversity: kingdom Animalia (Metazoa) and its roughly 1.66 million described species (9). We show that as of June 2021, 3,278 unique animals have had their nuclear genome sequenced and the assembly made publicly available in the National Center for Biotechnology Information (NCBI) GenBank database (10). This translates to 0.2% of all animal species. When viewed through the lens of major clades, massive disparities exist. For instance, relative to the number of described species in each group, 32 times more assemblies are available for chordates than arthropods (Fig. 1).

Fig. 1. Variation in taxonomic richness and genome availability, quality, and assembly size across kingdom Animalia in GenBank (as of 28 June 2021). Taxonomic groups are clustered by phylogeny following ref. 11. Only groups with 30 or more available assemblies as of January 2021 are shown with the exception of Hominidae (n = 5 assemblies). In the tree, bold group names represent phyla and naming conventions follow those of the NCBI database. Of 34 recognized animal phyla, 10 do not have a representative genome sequence. (A) The total number of described species for each group following Zhang (9) and the references therein. (B) Genomic representation among animal groups for 3,278 species with available genome assemblies. Bars represent the magnitude of the observed minus the expected number of genomes given the proportion that each group comprises of described animal diversity. Significance was assessed with Fisher's exact tests and significantly under- or overrepresented groups (P < 0.05) are denoted with asterisks. Gray numbers indicate the total number of species with available genome assemblies for each group. The number of available assemblies is not mutually exclusive with taxonomy; that is, a carnivore genome assembly would be counted in three categories (order Carnivora, class Mammalia, phylum Chordata). (C) The percentage of described species within a group with an available genome sequence (bars) and the percentage of those assemblies that have corresponding annotations (red circles). For many groups (e.g., arthropods), only a fraction of a percent of all species have an available genome assembly, making their percentage appear near zero. (D) Assembly size for all animal genome assemblies, grouped by taxonomy. (E) Contig N50 by taxonomic group. The sequencing technology used for each assembly is denoted by circle fill color: short-read (blue), long-read (yellow), or not provided (gray). In D and E, each circle represents one genome assembly and a few notable or outlier taxa are indicated with gray text.

To construct a database of the best available genome assembly for all animals, we downloaded metadata from GenBank for all kingdom Animalia taxa using the summary genome function in v.10.9.0 of the NCBI Datasets command-line tool on 4 February 2021. Next, we used the TaxonKit (12) lineage function to retrieve taxonomic information for each taxid included in the genome metadata. To gather additional data for each assembly (e.g., sequencing technology), we used a custom web scraper script. Both this web scraper script and the scripts used to download and organize the metadata are available in this study's GitHub repository (https://github.com/pbfrandsen/metazoa_assemblies). We later supplemented this initial dataset with a second round of metadata acquisition on 28 June 2021. For the full dataset, we hand-refined the NCBI taxonomy classifications to subdivide our dataset into three categories: species, subspecies, or hybrids (Dataset S1). If replicate assemblies for a taxon were present, we defined the best available assembly as the one with the highest contig N50 (the midpoint of the contig distribution where 50% of the genome is assembled into contigs of a given length or longer).
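The contig N50 definition and the "highest contig N50 wins" rule can be illustrated with a minimal Python sketch. This is not the authors' code (their scripts live in the repository linked above); the function names and the dictionary keys `accession` and `contig_n50` are hypothetical and chosen only for illustration.

```python
from typing import Iterable


def contig_n50(contig_lengths: Iterable[int]) -> int:
    """Return the contig N50: the length L such that contigs of length
    L or longer together cover at least 50% of the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("an assembly needs at least one contig")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: the running sum always reaches half")


def best_assembly(assemblies: list[dict]) -> dict:
    """Pick the replicate assembly with the highest contig N50.
    Each record is a hypothetical dict with a 'contig_n50' key."""
    return max(assemblies, key=lambda a: a["contig_n50"])


# Toy data, not real GenBank records.
replicates = [
    {"accession": "toy_A", "contig_n50": contig_n50([10, 20, 30, 40])},
    {"accession": "toy_B", "contig_n50": contig_n50([5, 5, 90])},
]
print(best_assembly(replicates)["accession"])
```

Sorting from longest to shortest and stopping once the running total reaches half the assembly length is the usual way to compute N50; ties between replicates are resolved here by list order, which is an arbitrary choice.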

We filtered our data in several ways: We removed subspecies (unless they were the only representative for a species), hybrids, and assemblies that were shorter than 15.3 Mb [the smallest confirmed assembly size for a metazoan to date (13)] or had a contig N50 less than 1 kilobase (Kb). We also culled assemblies that were unusually short (i.e., 1 to 2.5 Mb) with information in their descriptions that indicated they were not true nuclear genome assemblies (e.g., exon capture). In total, we culled 407 assemblies based on the above criteria. The remaining assemblies were classified as short-read if only short reads (e.g., Illumina) were used, long-read if any long-read sequences (e.g., PacBio) were used, or not provided if no information was available. We defined a species as having gene annotations available if any assembly for that taxon also had annotations in GenBank. When the best available assembly did not have annotations included or when multiple assemblies had annotations, we retained the annotations for the assembly with the highest contig N50. Finally, we used the submitting institution for each assembly as a surrogate for the institution that led the genome assembly effort. Using these data, we classified assemblies to a country, region (Africa, Asia, Europe, Middle East, North America, Oceania, South America, Southeast Asia), and the Global North (e.g., Australia, Canada, Europe, United States) or Global South (e.g., Africa, Asia including China, Mexico, Middle East, South America).
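A rough Python sketch of these filtering and classification rules follows. It is an illustration under stated assumptions, not the study's pipeline: the `Assembly` record, its field names, and the keyword list for long-read platforms are all hypothetical (the text names only PacBio as a long-read example; Oxford Nanopore is added here as an assumption).

```python
from dataclasses import dataclass
from typing import Optional

MIN_ASSEMBLY_BP = 15_300_000  # 15.3 Mb threshold from the text
MIN_CONTIG_N50_BP = 1_000     # 1 Kb threshold from the text

# Assumed keyword list; only PacBio is named in the text.
LONG_READ_KEYWORDS = ("pacbio", "nanopore")


@dataclass
class Assembly:
    """Hypothetical metadata record for one genome assembly."""
    species: str
    rank: str                       # "species", "subspecies", or "hybrid"
    length_bp: int
    contig_n50_bp: int
    technologies: Optional[tuple] = None  # None means nothing was reported


def passes_filters(a: Assembly, only_representative: bool = False) -> bool:
    """Apply the removal rules: hybrids, most subspecies, tiny assemblies,
    and assemblies with a contig N50 under 1 Kb are dropped."""
    if a.rank == "hybrid":
        return False
    if a.rank == "subspecies" and not only_representative:
        return False
    if a.length_bp < MIN_ASSEMBLY_BP:
        return False
    return a.contig_n50_bp >= MIN_CONTIG_N50_BP


def read_class(a: Assembly) -> str:
    """Label an assembly as long-read if any long-read data were used,
    short-read if only other platforms were reported, else not provided."""
    if not a.technologies:
        return "not provided"
    names = [t.lower() for t in a.technologies]
    if any(key in name for name in names for key in LONG_READ_KEYWORDS):
        return "long-read"
    return "short-read"


toy = Assembly("Toy species", "species", 1_000_000_000, 2_000_000,
               ("Illumina HiSeq", "PacBio Sequel"))
print(passes_filters(toy), read_class(toy))
```

The sketch omits the manual step of reading assembly descriptions to spot non-genomic data such as exon capture, which the authors handled by inspection.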

To test if clades were under- or overrepresented in terms of genome availability relative to their species richness, we compared the observed number of species with assemblies with the expected total for the group. We obtained totals for the number of described species overall and for each group from previous studies, primarily from Zhang (9) and the references therein. We assessed significance between observed and expected representation with Fisher's exact tests (alpha = 0.05). We tested for differences in distributions of contig N50 or assembly size between short- and long-read genomes with Welch's t tests. For both display (i.e., Fig. 1) and analysis, we subdivided the dataset into the lowest taxonomic level that still contained 30 or more assemblies as of January 2021 (with the exception of hominids, which were given their own category due to their exceptionally high genomic resource quality).
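The observed-versus-expected comparison can be sketched in a few lines of Python. The example below computes only the expected count and the observed-minus-expected difference for two groups, using the chordate and arthropod figures reported in the results (1,770 and 1,115 assemblies; 3.9% and 78.5% of described species). It does not reproduce the Fisher's exact tests, and the function names are hypothetical.

```python
TOTAL_SPECIES_WITH_ASSEMBLIES = 3_278  # from the text


def expected_count(share_of_described_species: float,
                   total_assemblies: int = TOTAL_SPECIES_WITH_ASSEMBLIES) -> float:
    """Expected number of sequenced species if assemblies were spread
    across groups in proportion to described species richness."""
    return share_of_described_species * total_assemblies


def observed_minus_expected(observed: int,
                            share_of_described_species: float) -> float:
    """Positive values indicate overrepresentation, negative values
    underrepresentation."""
    return observed - expected_count(share_of_described_species)


chordates = observed_minus_expected(1_770, 0.039)
arthropods = observed_minus_expected(1_115, 0.785)
print(f"chordates:  {chordates:+.1f}")
print(f"arthropods: {arthropods:+.1f}")
```

Under this proportional-allocation assumption, chordates come out far above expectation and arthropods far below, which matches the direction of the disparities the authors report.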

Genome assemblies were available for 3,278 species representing 24 phyla, 64 classes, and 258 orders (Fig. 2A and Dataset S1). The dataset was exceptionally enriched for the phylum Chordata (which includes all vertebrates) with 1,770 assemblies for the group (54% of all assemblies) despite chordates comprising just 3.9% of animal species (P, Fisher's < 1e-5; Fig. 1). Conversely, arthropods were underrepresented with 1,115 assemblies (34% of the dataset) for a group that comprises 78.5% of animal species (P, Fisher's < 1e-5; Fig. 1). However, not all arthropods were underrepresented; five insect clades were overrepresented (Apidae [bees], Culicidae [mosquitoes], Drosophila [fruit flies], Formicidae [ants], and Lepidoptera [butterflies and moths]; all P, Fisher's < 1e-3; Fig. 1). Collectively, of the 59 animal taxonomic groups included in our dataset, 14 groups were underrepresented, 17 were represented as expected, and 28 were overrepresented (primarily chordates; Fig. 1). Ten phyla had no publicly available genome sequence (Fig. 1). Over the 17-y GenBank genome assembly record, animal assemblies have been deposited at a rate of 0.52 species assemblies per day. Over the most recent year, however, this rate increased eightfold to 4.07 assemblies per day. If the most recent rate were maintained, all currently described animals would have a genome assembly available by 3136. To achieve this goal by 2031 instead, an average of 165,614 novel animal genomes would need to be sequenced and assembled each year (112 times faster than the rate for the most recent year).
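The projection arithmetic in this paragraph can be checked with a short Python sketch. The inputs are the rounded figures quoted in the text (roughly 1.66 million described species, 3,278 sequenced, 4.07 assemblies per day); the mid-2021 start date, the 10-y horizon to 2031, and the 365.25-day year are assumptions made here, so the outputs land close to, but not exactly on, the published numbers.

```python
DESCRIBED_SPECIES = 1_660_000   # "roughly 1.66 million" in the text
SEQUENCED_SPECIES = 3_278
RECENT_RATE_PER_DAY = 4.07
LONG_TERM_RATE_PER_DAY = 0.52
DAYS_PER_YEAR = 365.25          # assumption
START_YEAR = 2021.5             # assumption: dataset snapshot in late June 2021

remaining = DESCRIBED_SPECIES - SEQUENCED_SPECIES

# Year by which every described species would have an assembly
# if the most recent deposition rate simply continued.
years_needed = remaining / (RECENT_RATE_PER_DAY * DAYS_PER_YEAR)
finish_year = START_YEAR + years_needed

# Yearly output needed to finish within an assumed 10-y horizon,
# and how much faster that is than the recent yearly output.
required_per_year = remaining / 10
speedup = required_per_year / (RECENT_RATE_PER_DAY * DAYS_PER_YEAR)

# Ratio of the recent rate to the long-term rate.
rate_increase = RECENT_RATE_PER_DAY / LONG_TERM_RATE_PER_DAY

print(f"finish year at current rate: {finish_year:.0f}")
print(f"assemblies needed per year:  {required_per_year:,.0f}")
print(f"speedup required:            {speedup:.1f}x")
print(f"recent vs long-term rate:    {rate_increase:.1f}x")
```

The small gap between these values and the reported 165,614 genomes per year and 112-fold speedup most likely reflects the authors using an unrounded species total.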

Fig. 2. Genome availability for kingdom Animalia versus taxonomic descriptions and over time. (A) The proportion of described taxonomic groups versus the number with sequenced genome assemblies from phyla to species. The gray plot (Right) is a zoomed-in perspective of the higher taxonomy-level categories in the full plot (Left). For genus through phylum, the number of described categories is based on the NCBI taxonomy. For species, the total number described is from Zhang (9). (B) The timeline of genome contiguity versus availability for animals according to the GenBank publication date (x axis; C). A rise in assembly contiguity has been precipitated by long-read sequencing. Particularly contiguous assemblies for a given time period are labeled. (C) The number of animal genome assemblies deposited in GenBank each month since February 2004. Several notable events are labeled. When specific dates are indicated, those (and the assemblies referred to) are included within that month's total. For B and C, it is important to note that when a genome assembly is updated to a newer version, its associated date is also updated. Thus, the date associated with many early animal assemblies [e.g., C. elegans (1)] has shifted to be more recent with updates.

The average animal genome assembly was 1.02 gigabases (Gb) in length (SD 1.21 Gb) with a contig N50 of 2.26 Mb (SD 25.16 Mb; Fig. 1 D and E). Two animal genome assemblies were 25 Gb longer than all other assemblies: the axolotl [32.4 Gb (14)] and Australian lungfish [34.6 Gb (15)] (Fig. 1D). The smallest genome assembly in the dataset, the mite Aculops lycopersici, was over 1,000 times smaller, spanning just 32.5 Mb (16). Still smaller is the 15.3 Mb assembly of the marine parasite Intoshia variabili, which has the smallest animal genome currently known (13). But, since the I. variabili assembly was not available in GenBank as of June 2021, it was not included in our dataset.

Contiguity varied dramatically across groups. For instance, hominid assemblies (family Hominidae, n = 5) were the most contiguous with an average contig N50 of 24.2 Mb. Bird assemblies (class Aves, n = 515) were also highly contiguous (mean contig N50 = 1.4 Mb) despite being so numerous (and accumulating over a long period of time). On the other end of the spectrum, jellyfish and related species (phylum Cnidaria) exhibited some of the least contiguous genome assemblies with a mean contig N50 of 0.18 Mb (n = 65; Fig. 1E). Roughly 34% of animals with genome assemblies had corresponding annotations in GenBank but annotation rates differed substantially among groups (Fig. 1C). For example, the rate of arthropod annotations (22.3%) lags behind that for chordates (41.3%); however, much of this disparity appeared to be driven by the low and high annotation rates of butterflies and moths (order Lepidoptera) and birds (class Aves), respectively. Of 445 assemblies, just 6.5% of lepidopteran assemblies in GenBank have corresponding annotations versus 72.8% of birds (n = 519 assemblies; Fig. 1C). Notably, since most gene models are based on sequence similarity to known functional genes and not functional data, the true rate of annotation is likely even lower than reported here.

Animal genome assemblies have been contributed by researchers at institutions on every continent with permanent inhabitants, including 52 countries. From a regional perspective, institutions in North America (n = 1,331), Europe (n = 972), and Asia (n = 828) collectively accounted for 95.5% of all assemblies (Fig. 3A). And, nearly 70% of all animal genome assemblies have been submitted by researchers in just three countries: United States (n = 1,275), China (n = 676), and Switzerland (n = 317) (Fig. 3A). When countries were grouped by their inclusion in the Global North or South, similarly stark patterns emerged. Researchers affiliated with institutions in the Global North contributed roughly 75% of animal genome assemblies (Fig. 3B). From a taxonomic perspective, researchers at North American institutions have contributed the most insect and mammal assemblies, European researchers have contributed the most fish assemblies, and Asian researchers have contributed the most bird assemblies (Fig. 3A). The first assembly in GenBank from the Global North was deposited in 2004 and the first assembly from the Global South was deposited in 2011 (Fig. 3C). Since then, the number of assemblies deposited each year has steadily risen, with the proportions from the Global North and South staying relatively constant (Fig. 3C).

Where animal genome assemblies have been produced around the world according to the submitting institutions in GenBank. (A) For each geographic region, total numbers of genome assemblies are shown by dark circles with white lettering. This total is further broken down by country and taxon. For regions where more than four countries have contributed assemblies (e.g., Europe), an "Other" category represents all other countries. The same applies to all assemblies that are not insects, birds, fish, or mammals in the taxon plots. Countries are color-coded by assignment to the Global North or South. (B) The total number of genome assemblies contributed by countries in the Global North (e.g., United States, Europe, Australia) versus the Global South (e.g., Africa, South America, China, Mexico, Middle East). (C) The rate of genome assembly deposition by major sources in the Global North (Europe, United States) and Global South (China, Southeast [SE] Asia) as well as all other countries collectively in each ("Other").

Use of long reads in genome assemblies and availability of key metadata also differ with geography. For assemblies deposited since 2018, researchers from the Global South have used long reads slightly more frequently than those from the Global North (25.7% versus 20.2%; Fig. 4A). However, researchers from the Global North were far less likely to report the types of sequence data used (unreported for 19.9% of assemblies from the Global North versus 1.4% of assemblies from the Global South; Fig. 4A). Much of this difference appears to be driven by genome assemblies deposited by researchers at European institutions (Fig. 4B). This gap in metadata may reflect an issue with data mirroring between the European Nucleotide Archive (ENA) and GenBank. For instance, many new genome assemblies being generated by the United Kingdom are part of the Wellcome Sanger Institute's Darwin Tree of Life project, which is generating exceptionally high-quality assemblies using long-read sequencing and depositing them into the ENA (Fig. 5). One region (Oceania) and three countries (Australia, Finland, India) reported long reads being used in more than 50% of deposited assemblies (Fig. 4 B and C).

Sequencing technologies used around the world (A) between the Global North versus Global South, (B) among regions, and (C) among countries. To limit bias due to the limited availability of long-read sequencing technologies before 2017 (Fig. 2B), only assemblies deposited on or after 1 January 2018 were included in the analysis and in C only countries that deposited five or more assemblies during the focal period (January 2018 to June 2021) are shown.

Examples of major contributors of genome assemblies for (A) butterflies (order Lepidoptera), (B) birds (class Aves), and (C) fish (primarily class Actinopterygii). Major contributors were defined as any consortium, organization, or project that has deposited more than 5% of all assemblies for butterflies and birds or 2.5% of all assemblies for fish.

Animal genome sequencing has dramatically progressed in the last 25 y. In that span, the field has moved from sequencing the first nuclear genome for any animal (1), a landmark achievement, to targeting the generation of genome assemblies for all of Earth's eukaryotic biodiversity (8). Here, we provided a contemporary perspective on progress toward this goal for the 1.6 million species in the animal kingdom (9). We showed that while tremendous progress has been made, major gaps and biases remain both in terms of taxonomic and geographic representation, at least within the most commonly used database of genomic resources, GenBank. For instance, a major bias exists in favor of vertebrates, which are vastly overrepresented relative to their total species diversity (Fig. 1 A–C). From the perspectives of biomedicine and human evolution, this bias is reasonable since humans are vertebrates. However, from a basic research perspective, particularly as it relates to genomic natural history and an overarching goal to sequence all animal genomes, there is a need to taxonomically diversify sequencing efforts.

At the highest taxonomic levels, 10 animal phyla still have no genomic representation. To illustrate the scale of this disparity versus other groups and the unique biology that is being overlooked, genome assemblies are available for 685 ray-finned fishes (class Actinopterygii) but none exists for phylum Nematomorpha, a clade of roughly 2,000 species of parasitic worms whose presence can dramatically alter energy budgets of entire stream ecosystems (17). Another phylum without genomic representation, Loricifera, was first described in 1983 (18). This group of small, sediment-dwelling animals includes the only examples of multicellular species that spend their entire life cycles under permanently anoxic conditions (19). Loriciferans accomplish this feat by foregoing the energy-producing mitochondria found in virtually all animals in favor of hydrogenosome-like organelles akin to those found in prokaryotes inhabiting anaerobic habitats (19). Clearly, there is much to discover in terms of genomic diversity and functional biology in clades yet to be sampled.

A few select countries (primarily the United States, several European nations, and China) have led the sequencing of the vast majority of animal genome assemblies (Fig. 3A). Aside from China, all of these countries are within the Global North. This pattern of geographic bias raises two potential issues for representation in animal genome science. First, the researcher population of animal genome sequencing likely does not reflect the global population. Second, sampling biases may exist toward the regions where most of the genome sequencing is occurring. Some of this bias is intentional and reflects funding goals for a given region. For instance, the Darwin Tree of Life project seeks to sequence the genomes of all 70,000 eukaryotic species living in Britain and Ireland. Still, similar to how sampling biases can yield skewed understanding of the natural world in other disciplines (e.g., ref. 20), so too could bias toward specific ecoregions, habitats, or other classifications skew genomic insight.

Inherently linked with questions of representation in animal genome science is the specter of "parachute science" (or "helicopter research"): the practice where international scientists, typically from wealthy nations, conduct studies in other countries that are often poorer without meaningful communication or collaboration with local people (21). Parachute science has a long history in ecological research, and signatures of these practices have been observed for genome sciences. For instance, Marks et al. (22) found that the majority of plant genome assemblies for species that are native to South America and Africa were sequenced off-continent by researchers at European, North American, or Asian institutions. Given the sheer number of animal genome assemblies that have been submitted by a small number of countries and institutions, a similar pattern likely exists for animal genomes. However, to properly assess this issue, parsing authorship to quantify collaboration, at a minimum, would need to occur, and this approach would still overlook key aspects of representation that need to be considered (e.g., if a researcher from the Global South is working at an institution in the Global North).

For the purpose of biological discovery, not all genome assemblies are created equal. As long-read sequencing technologies have matured, so too has the quality of assemblies being generated (4). In the last year alone, the largest ever animal genome assembly was deposited [Australian lungfish (15)] as well as the most complete human genome to date, a telomere-to-telomere assembly (23). Still, many species in GenBank only have low-quality assemblies available (i.e., contig N50 < 100 Kb with no corresponding gene annotations; Fig. 1). Since fragmentation and/or poor or missing gene annotations reduce the research value of an assembly, genome quality is important, particularly when the end goal is resource development for a broader community. As of April 2021, the Earth BioGenome Project sought assembly quality of 6.C.Q40 (https://www.earthbiogenome.org/assembly-standards) for reference genomes, where "6" refers to a contig N50 of 1 × 10^6 bp (i.e., 1 Mb). In our dataset, 568 assemblies (17.3%) reach this contiguity standard. That number drops to 271 assemblies (8.3%) when contig N50 ≥ 1 Mb and deposited gene annotations are both required. For reference, the "C" above refers to chromosomal-scale scaffolding and "Q40" to an error rate of less than 1/10,000. Neither of these metrics was assessed in this study.
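The two numeric parts of the standard described above can be sketched in a few lines of Python. This is only an illustration, not the method used in the study: the function names and the example contig lengths are hypothetical, and the contig N50 is computed with the common definition (the length of the shortest contig in the smallest set of longest contigs that covers at least half of the assembly).

```python
def contig_n50(lengths):
    """Contig N50: walk contigs from longest to shortest and return the
    length at which the running total first reaches half the assembly."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def phred_to_error_rate(q):
    """Convert a Phred-scaled quality value (e.g., 40 for Q40) to an
    expected per-base error probability."""
    return 10 ** (-q / 10)


def meets_contiguity_standard(lengths, exponent=6):
    """True if contig N50 is at least 10**exponent bp (the "6" in 6.C.Q40)."""
    return contig_n50(lengths) >= 10 ** exponent


# Hypothetical contig lengths in bp, for illustration only.
example_contigs = [1_500_000, 1_200_000, 300_000]
print(contig_n50(example_contigs))                 # N50 of the toy assembly
print(meets_contiguity_standard(example_contigs))  # 1 Mb threshold check
print(phred_to_error_rate(40))                     # error rate implied by Q40
```

The chromosomal-scale scaffolding requirement ("C") is not captured here, since it depends on scaffold-to-chromosome assignments and cannot be derived from contig lengths alone.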

Independent research laboratories, institutions, and consortia have contributed genome assemblies on both ends of the quality spectrum (Fig. 5). For example, among butterflies (order Lepidoptera), a bimodal quality distribution is being primarily driven by contributions made in 2021 by two submitting institutions, the Florida Museum of Natural History (e.g., ref. 24) and the Wellcome Sanger Institute (Fig. 5A). When viewing genome assembly contributions holistically across the animal Tree of Life, it is clear that two consortia, the Vertebrate Genomes Project (5) and the Darwin Tree of Life (part of the Wellcome Sanger Institute), warrant specific recognition for contributing exceptional genomic resources relative to closely related species (Fig. 5).

While animal genome science has dramatically matured in recent years, the field still rests on the cusp of massive change. Thousands of genome assemblies are now available for a wide range of taxa, a resource that can empower unprecedented scales of genomic comparison. Simultaneously, multiple consortia are building momentum toward their goals and generating some of the highest-quality genome assemblies ever produced. The field is also diversifying, with researchers around the world, particularly from the Global South, leading a rising number of efforts. These ongoing advances will yield higher-quality, more globally representative genome data for animals. As we collectively build toward this new genomic future, we offer recommendations to improve assembly quality and accessibility while also continuing to increase representation within the discipline.

The quality of a genome assembly is likely the most important factor dictating its long-term value. Genome assembly quality, however, is difficult to define. Here, we propose a holistic view on genome assembly quality that generally echoes the guidelines proposed by the Earth BioGenome Project and other consortia. Briefly, assemblies should reach minimum levels of contiguity (e.g., contig N50 > 1 Mb) and accuracy in order to be considered a reference that will likely not need to be updated for most applications. At a minimum, assemblies should also include high-quality gene annotations that perhaps take advantage of standardized pipelines [e.g., NCBI Eukaryotic Genome Annotation Pipeline (25)] to maximize compatibility across taxa. We recommend the field further improve the quality of genome assembly resources in two ways. First, refining and expanding the coordinated deposition of genome assemblies will improve the usability of the resources and reproducibility of analyses. It will also reduce duplications of effort (that is, when a group sequences a genome that has already been produced), an issue that is likely to become increasingly common.

To refine and expand coordinated resource deposition, we recommend the continued use of GenBank (10) or one of the other archives that are members of the International Nucleotide Sequence Database Collaboration (the ENA and DNA Database of Japan) as the central repositories for genome assemblies and their metadata given their tripartite data-sharing agreement. Next, we call on genetic archive administrators, consortia, and independent researchers to collectively improve the metadata submitted with each assembly and the mirroring of data across repositories. Too many assemblies lack basic information about the sequence data and methods used (e.g., Fig. 4) and, with the difficulty of linking assemblies to published studies (if available), it can be challenging or impossible to find this information. Further, an expansion of the metadata associated with each assembly, ideally to make more of the categories required and expand demographic data, would make efforts to quantify geographic representation, for instance, far more straightforward. Alternatively, the metadata associated with genome assembly accessions could be integrated with existing efforts like the Genomic Observatories Metadatabase [GeOMe (26)]. Furthermore, a set of minimum quality characteristics for a genome assembly may need to be defined. A number of exceptionally low-quality genome assemblies (e.g., with contig N50 values shorter than 1 Kb) that often cover only a small fraction of the expected total genome sequence length for a given group are present in GenBank. The presence of these assemblies raises the question: Where is the inflection point between resource quality and value to other researchers versus diluting the resources of a shared repository?

For our second recommendation, we amplify and expand the message of Buckner et al. (27) and Thompson et al. (28): Genome science needs specimen vouchers. Vouchers serve as a key physical link between taxonomy and molecular insight. Rarely, however, are vouchers referenced in publications of genome assemblies; only 11% of vertebrate assemblies included such a reference as of January 2020 (27). While vouchers represent a physical reference for assessing taxonomic classification or morphological variation, a properly stored voucher could also provide a long-term source of material for future resource improvement. If a physical specimen cannot be deposited, photographs and/or genomic DNA should be deposited in its place (e.g., ref. 29). Tied to the metadata discussion above, additional fields should be added to GenBank genome assembly accessions to directly link the assembly to a specimen, photo, or genomic DNA that has been deposited elsewhere.

Though geographic representation in animal genome science has improved in recent years, the discipline appears far from properly reflecting the global researcher pool. This issue is almost certainly multifaceted, likely stemming from a lack of infrastructure (e.g., fewer high-throughput sequencing platforms in developing countries), fewer resources for expensive molecular research, and a corresponding lack of training in genome data analysis. To bridge this gap and to empower a more diverse discipline, the nations and institutions that are devoting large amounts of resources to animal genome sequencing (e.g., China, United Kingdom, United States), and the researchers within those countries, should continue to develop meaningful collaborations with researchers within countries where their focal species reside (30). These meaningful collaborations, where all parties are valued for their expertise and involved in decision making, improve the science through transfer of local knowledge, provide a means for local researchers to expand their skillset and network while raising their scholarly profile, and, most importantly, can effectively end the practice of parachute science (30). Within-continent (or -country) initiatives also have transformative potential for people and genome research. For instance, the African-led effort to sequence 3 million African genomes over the next 10 y (the 3MAG project) will yield massive investment in African genomics, an incredible resource for understanding the full scope of human genetic diversity, and a new generation of African genome scientists (31). While focused on human genetics, the infrastructure and expertise that arise from the 3MAG project will no doubt translate to other taxa in the coming years.

A practical justification also exists for increasing representation in genome science, particularly as we seek to generate genome assemblies for every animal on Earth. The Global South is home to the bulk of the world's biodiversity (32) and, as such, researchers in these regions have greater access to key habitats and specimens. Thus, it behooves everyone, including researchers in the Global North, to deepen collaborations with peers in the Global South while also helping to build indigenous capacity for collection, storage, and sequencing of new specimens.

Animal genome science continues to grow and expand at an exceptional rate. The coming years will surely see thousands, and perhaps tens of thousands, of new genome assemblies from across the Tree of Life, technological and analytical improvements, and some of the largest-scale and most in-depth studies of animal genome biology conducted to date. However, if we are to realize the ambitious goals of efforts like the Earth BioGenome Project, a self-described "biological moonshot," the rate and mean quality of animal genome assembly production will have to increase by roughly two orders of magnitude. Regardless of rates and timelines, however, perhaps the most important goal for the future of animal genome science is that we empower a more diverse, representative researcher community in parallel with the generation of new resources.

All study data are included in the article and/or supporting information.

S.H. and J.L.K. were supported by NSF Award OPP-1906015. We thank Guangfeng Song, Eric Cox, and Anne Ketter from the Datasets development team at the NCBI for their responsiveness and receptiveness to improving this valuable tool for data science.

Author contributions: S.H., J.L.K., and P.B.F. designed research; S.H. and P.B.F. performed research; S.H. and P.B.F. analyzed data; and S.H., J.L.K., and P.B.F. wrote the paper.

The authors declare no competing interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2109019118/-/DCSupplemental.

Read the rest here:
Toward a genome sequence for every animal: Where are we now? - pnas.org

Posted in Human Genetics

A 45-year Legacy of Research and Collaboration < Yale School of Medicine – Yale School of Medicine

Posted: at 5:32 am

The first year that the National Institutes of Health (NIH) funded a group of Yale scientists to explore links between viruses and cancer, U.S. troops evacuated Vietnam, Gerald Ford was president, and the movie Jaws broke box office records.

The scientists wrote their 400-page proposal on typewriters and made 20 paper copies on Xerox machines. They put it all into a big box and sent it through the U.S. mail. It was 1975.

Their research pleased the NIH so much that the agency renewed the grant eight times over 45 years. Titled "Molecular Basis of Cancer Virus Replication, Transformation, and Innate Defense," it became the longest-running program project grant at Yale, and the third longest at the NIH. It brought more than $50 million to Yale labs and resulted in nearly 500 publications, many of them groundbreaking. The grant helped launch the careers of hundreds of scientists who trained under its leadership, including several on the Yale faculty.

Three of the grant's principals are still at Yale: Daniel DiMaio, MD, PhD, Waldemar Von Zedtwitz Professor of Genetics, professor of therapeutic radiology, professor of molecular biophysics and biochemistry, and deputy director of Yale Cancer Center; Joan Steitz, PhD, Sterling Professor of Molecular Biophysics and Biochemistry; and I. George Miller, Jr., MD, John F. Enders Professor of Pediatrics and professor of epidemiology and of molecular biophysics and biochemistry. "The grant has had a major impact on how we study viruses," said DiMaio, the principal investigator for the last 25 years. "Otherwise, it wouldn't have lasted so long."

"There's lots of competition out there. Every five years the NIH looked at us closely to see if we were still productive and still a good investment. For many cycles of renewal, they decided that we were." After 45 years, he added, the grant's three leaders decided not to reapply. "We're sun-setting it. It's time to let a new generation take over." It is also time to applaud some of the grant's research highlights. The human genome was sequenced about 20 years ago, but the first genome ever sequenced was funded by this NIH grant almost 25 years earlier, when Sherman Weissman, MD, Sterling Professor of Genetics and the grant's first principal investigator, described the genetic makeup of a virus named SV40.

"He developed some of the earliest techniques for sequencing nucleic acids," said DiMaio. "That had a profound impact on medicine, and it came from studying tumor viruses." Before his death in 2020, another biochemist on the grant, Charles M. Radding, MD, Professor of Genetics, showed how DNA molecules can recombine to alter genes and proteins, which in turn can cause cancer, a crucial discovery. A former member of the program, David C. Ward, PhD, used the program funding to develop a technology called fluorescence in situ hybridization (FISH). It allows researchers to map chromosomes by locating specific DNA sequences, and it is now a standard diagnostic and research tool in labs worldwide. Steitz is a founding member of the grant program, which helped fund her landmark discovery of small noncoding RNAs made by viruses.

"It turns out that RNAs aren't just messengers," she said, "but are also regulatory elements inside cells, and are important to be able to make an oncogenic virus. We've discovered a lot of noncoding RNAs, and each new discovery brings all sorts of insights into how viruses are able to successfully infect cells."

"Joan didn't just discover them," added DiMaio. "She figured out how they work and discovered a lot of new chemistry and structural biology. It opened up a new field." Steitz identified some of those RNAs in collaboration with Miller, another founding member of the program grant. "At the time, scientists knew that viruses caused cancer in animals," noted Miller, "but nobody believed cancers in people were caused by viruses." Miller showed that Epstein-Barr Virus (EBV), a human virus, caused lymphomas in monkeys. This was the first time a human virus had been shown to cause cancer in a primate, providing definitive evidence of its cancer-causing activity. Researchers now know that about 15 percent of all human cancers are caused by viruses. The grant also supported Miller's groundbreaking discovery about how EBV gets activated.

Miller and Steitz collaborated to characterize a related virus that causes Kaposi sarcoma. The grant also supported DiMaio's pioneering research into identifying viral oncogenes, and how turning them off stops cancer cells from growing. More recently, the grant funded his studies about how viruses get into cells.

"It sounds simple," he said, "but virus entry is a complicated process with hundreds of cellular proteins involved. We've discovered some cellular proteins that are important for infection, determined how they work to support infection, and learned some new cell biology."

These breakthroughs stemmed from the basic science supported by the grant. "Viruses educate us about every aspect of molecular biology and cell biology and immunology," said Miller. "We keep on learning things from viruses that are applicable to cancer and to many other problems. If you want to make vaccines, for instance, you have to understand what the virus is doing."

The grant brought together people from many departments. "We all look at virology from different perspectives," said Steitz. DiMaio is primarily a geneticist, Steitz a biochemist, and Miller a pediatrician. "When we get together," continued Steitz, "we have people coming in from many different disciplines and it's great."

Their collaborations introduced each other to different approaches and techniques that influenced the direction of their research. Steitz started with bacterial viruses, then moved into animal viruses after conversations with Miller. Steitz helped Miller understand the advantages of using modern molecular techniques instead of cultivating viruses.

"We've really transferred knowledge back and forth," said DiMaio. "That's something very special about this grant. We're not working in isolation; we helped each other and molded each other's careers." In turn, the partners in this program grant have molded the careers of several hundred grad students and postdocs who were trained under them and are now making their own contributions to the field and paying it forward with their own students. "It's a long legacy," said DiMaio, "like a huge extended family."

"You can see evidence of that legacy in what's happening now with COVID-19," said Steitz, whose career has helped us understand how RNA works. "A lot of work on the immunology of this disease was done here, and the most effective COVID-19 vaccines are RNA-based vaccines."

Read the original here:
A 45-year Legacy of Research and Collaboration < Yale School of Medicine - Yale School of Medicine

Posted in Human Genetics

The big idea: How much do we really want to know about our genes? – The Guardian

Posted: at 5:32 am

While at the till in a clothes shop, Ruby received a call. She recognised the woman's voice as the genetic counsellor she had recently seen, and asked if she could try again in five minutes. Ruby paid for her clothes, went to her car, and waited alone. Something about the counsellor's voice gave away what was coming.

The woman called back and said Ruby's genetic test results had come in. She did indeed carry the mutation they had been looking for. Ruby had inherited a faulty gene from her father, the one that had caused his death aged 36 from a connective tissue disorder that affected his heart. It didn't seem the right situation in which to receive such news but, then again, how else could it happen? The phone call lasted just a few minutes. The counsellor asked if Ruby had any questions, but she couldn't think of anything. She rang off, called her husband and cried. The main thing she was upset about was the thought of her children being at risk.

Over the next few weeks, she Googled, read journal articles, and tried to become an "expert patient" in what was quite a rare genetic disorder. There wasn't much to go on, and, not being a scientist herself, it was hard for her to evaluate what she did find. She learned that a link between mutations in this particular gene and connective tissue problems had only recently been discovered. A few years earlier this disease did not exist, or at least it had yet to be named.

Over time, some details emerged. Nobody had ever seen her own family's particular mutation in anyone else. So that meant it was very hard to know what to make of her situation. Her risk of a heart problem was surely increased, but nobody could say by how much.

From that initial phone call, it was six months before Ruby was seen by any other medical professional. She saw a cardiologist first, followed by a series of other specialists, since each appointment seemed to trigger a chain of others. The outcome was that Ruby would have regular body scans, and she began to take medication to lower her blood pressure, which she was told to do as a precaution for the rest of her life. She was also told to avoid anything that would cause her body to suddenly jolt. The vagueness of what this meant in practice became another source of worry. Should she carry on playing basketball, for example? She had always loved going abroad, but now travel insurance became exceptionally hard for her to get, partly because nobody knew how to categorise her.

Ruby believes that it was definitely better to have been informed of her genetic inheritance, because in her case there were things she could do to lower the risk of it becoming a real problem. But it took a long time for her to understand that she was not actually ill. She was only at risk of being ill. In fact, nothing had actually changed; she had only become aware of a possible future.

Every one of us is susceptible to one illness or another to some extent. As science progresses, many more of us will find ourselves in Ruby's situation; drowning in estimates and probabilities that play games with our mind and our identity, and require us to make difficult decisions about our health and how we live. Every one of us will be shown to be subtly suboptimal. Or every one of us will be shown to be special. It depends on how you look at it. As Andrew Solomon writes in Far from the Tree: "The general culture feels that deaf children are primarily children who lack something: they lack hearing. The Deaf culture feels that they have something: membership in a beautiful culture."

We must be very careful in defining what constitutes disease or disability, especially as our ability to link genes with human traits expands. Bill Bryson puts it like this in The Body: A Guide for Occupants: "Twenty years ago about 5,000 genetic diseases were known. Today it is 7,000. The number of genetic diseases is constant. What has changed is our ability to identify them."

Even in the hard data, things get messy. For example, someone who has inherited an immune system gene called HLA-B27 is about 300 times more likely to develop the autoimmune disease ankylosing spondylitis. Around 8% of people in the UK have this gene variant and most do not suffer from the disease. What's more, inheritance of this gene may be useful in fighting HIV. About one in 300 people infected with HIV are able to control the virus so that they don't go on to develop Aids, at least for a very long time, and HLA-B27 occurs frequently in these people. So there's a yin and yang to genetic inheritance that is hard to fathom, even for experts.

One day, a watch that can measure a few simple things about your body will be seen as a laughably primitive tool. In the future, a whole cloud of information will be available and you must decide how much you want to delve into it. The agricultural, industrial and digital revolutions affected our environments and societies, but the genetic revolution equips us individually with new powers, and each of us will need to decide for ourselves if and when to deploy them. One way we should be preparing now is by making sure society is scientifically literate, and that our children are educated to understand risk, probability, genetic diversity and health.

Perhaps the insight to hold on to is that we are not merely our genes, our cells, our microbiome or our brain. We are all these things, but we are also more. How we see ourselves and others, the stories we tell and the philosophies we live by, are going to be just as important to our wellbeing.

Daniel M Davis is a professor of immunology at the University of Manchester and author of The Secret Body.

Far from the Tree: Parents, Children and the Search for Identity by Andrew Solomon (Vintage, £18.99)

The Body: A Guide for Occupants by Bill Bryson (Black Swan, £9.99)

The Code Breaker: Jennifer Doudna, Gene Editing and the Future of the Human Race by Walter Isaacson (Simon & Schuster, £30)

Link:
The big idea: How much do we really want to know about our genes? - The Guardian

Posted in Human Genetics

Decode Genetics Publishes the Largest Ever Study of the Plasma Proteome – PRNewswire

Posted: at 5:32 am

REYKJAVIK, Iceland, Dec. 2, 2021 /PRNewswire/ -- In a study published today in Nature genetics, scientists at deCODE genetics , a subsidiary of the pharmaceutical company Amgen, demonstrate how measuring the levels of a large number of proteins in plasma at population scale when combined with data on sequence diversity and RNA expression dramatically increases insights into human diseases and other phenotypes.

Scientists at deCODE genetics have used levels of five thousand proteins in plasma, targeted on a multiplex platform at population scale, to unravel their genetic determinants and their relationship with human disease and other traits. Previous studies of the genetics of protein levels either included many fewer individuals or tested far fewer proteins than the one published today.

Using protein levels in plasma measured with the Somascan proteomics assay, scientists at deCODE genetics tested the association of 27 million sequence variants with plasma levels of 4,719 proteins in 35,559 Icelanders. They found 18,084 associations between variants in the sequence and levels of proteins, where 19% are with rare variants identified with whole-genome sequencing. Overall, 93% of the associations are novel. Additionally, they replicated 83% and 64% of the reported associations from the largest existing plasma proteomic studies, based on the Somascan method and the antibody-based Olink assay, respectively.

The levels of proteins in plasma were tested for associations with 373 diseases and other traits and yielded 257,490 such associations. They integrated associations of sequence variants with protein levels and diseases and other traits, and found that 12% of around fifty thousand variants reported to associate with diseases and other traits also associate with protein levels.

"Proteomics can assist in solving one of the major challenges in genetic studies: to determine what gene is responsible for the effect of a sequence variant on a disease. In addition, the proteome provides some measure of time because levels of proteins in blood rise and they fall as a function of time to and from events," said Kari Stefansson, CEO of deCODE genetics and one of the senior authors on the paper.

Media contact: Thora Kristin Asgeirsdottir, deCODE genetics, +354 894 1909

SOURCE deCODE genetics

View post:
Decode Genetics Publishes the Largest Ever Study of the Plasma Proteome - PRNewswire

Posted in Human Genetics | Comments Off on Decode Genetics Publishes the Largest Ever Study of the Plasma Proteome – PRNewswire

Humans were already the dominant predatory species on Earth 2 million years ago – Study Finds

Posted: at 5:32 am

TEL AVIV, Israel -- Two million years ago, were humans already the king of the hill on planet Earth? Researchers at Tel Aviv University say evidence points to early humans being apex predators, meaning they sat atop the food chain as the most formidable hunters around.

The study of prehistoric diets finds that the extinction of large mammals in many regions of the globe, along with the depletion of animal food supplies toward the close of the Stone Age, forced humans to progressively add more plants to their diet, until they had no option but to domesticate both animals and plants and become farmers.

"So far, attempts to reconstruct the diet of stone-age humans were mostly based on comparisons to 20th-century hunter-gatherer societies," explains Dr. Miki Ben-Dor in a media release. "This comparison is futile, however, because two million years ago hunter-gatherer societies could hunt and consume elephants and other large animals, while today's hunter-gatherers do not have access to such bounty. The entire ecosystem has changed, and conditions cannot be compared. We decided to use other methods to reconstruct the diet of stone-age humans: to examine the memory preserved in our own bodies, our metabolism, genetics, and physical build. Human behavior changes rapidly, but evolution is slow. The body remembers."

Dr. Ben-Dor and collaborators compiled roughly 25 examples from over 400 scholarly works addressing whether Stone Age people were specialist predators or generalized opportunistic feeders. Most of the team's evidence comes from studies of genomics, metabolic processes, physiology, and morphology of early humans.

"One prominent example is the acidity of the human stomach," Dr. Ben-Dor adds. "The acidity in our stomach is high when compared to omnivores and even to other predators. Producing and maintaining strong acidity require large amounts of energy, and its existence is evidence for consuming animal products. Strong acidity provides protection from harmful bacteria found in meat, and prehistoric humans, hunting large animals whose meat sufficed for days or even weeks, often consumed old meat containing large quantities of bacteria, and thus needed to maintain a high level of acidity."

"Another indication of being predators is the structure of the fat cells in our bodies. In the bodies of omnivores, fat is stored in a relatively small number of large fat cells, while in predators, including humans, it's the other way around: we have a much larger number of smaller fat cells," explained Dr. Ben-Dor. "Significant evidence for the evolution of humans as predators has also been found in our genome. For example, geneticists have concluded that areas of the human genome were closed off to enable a fat-rich diet, while in chimpanzees, areas of the genome were opened to enable a sugar-rich diet."

The team used archaeological findings to enhance the data gathered from human biology. As an example, studies of stable isotopes found in the remains of ancient people, together with evidence of human-specific hunting behaviors, reveal that humans were expert hunters of big and mid-sized animals with a higher percentage of body fat. With this comparison, it became clear that humans were not only hypercarnivores but that they killed huge animals and obtained over 70 percent of their calories from meat as well.

"Hunting large animals is not an afternoon hobby. It requires a great deal of knowledge, and lions and hyenas attain these abilities after long years of learning. Clearly, the remains of large animals found in countless archaeological sites are the result of humans' high expertise as hunters of large animals. Many researchers who study the extinction of the large animals agree that hunting by humans played a major role in this extinction, and there is no better proof of humans' specialization in hunting large animals," Dr. Ben-Dor explains.

Most probably, as in current-day predators, hunting itself was a focal human activity throughout most of human evolution. Other archaeological evidence, like the fact that specialized tools for obtaining and processing vegetable foods only appeared in the later stages of human evolution, also supports the centrality of large animals in the human diet throughout most of human history.

The collaborative model that scientists at Tel Aviv University (TAU) have been working on for over a decade offers a dramatic shift in the way we think about evolutionary history. In contrast to the widely held belief that humans owe their survival to their dietary flexibility, which enabled them to combine hunting animals with eating fruits and vegetables, the picture emerging here is that humans evolved primarily as predators of large animals.

"Archaeological evidence does not overlook the fact that stone-age humans also consumed plants," the study author adds. "But according to the findings of this study, plants only became a major component of the human diet toward the end of the era."

Following the discovery of genetic variations and the style of unusual primitive tools for preparing plant foods, the investigators came to the conclusion that beginning approximately 85,000 years ago in Africa and approximately 40,000 years ago in Europe and Asia, progressive growth in plant food intake and dietary diversification occurred in line with changing ecological circumstances.

Additionally, the regional distinctiveness of the stone tool way of life grew, which is comparable to the variety of hunting tools in 20th century communities in terms of its origins and development. Throughout the two-million-year time frame in which humans were the most dominant species, scientists found extensive spans of uniformity and consistency in primitive tools, no matter how different the surrounding environment was.

"Our study addresses a very great current controversy, both scientific and non-scientific," says Prof. Ran Barkai. "For many people today, the Paleolithic diet is a critical issue, not only with regard to the past but also concerning the present and future. It is hard to convince a devout vegetarian that his/her ancestors were not vegetarians, and people tend to confuse personal beliefs with scientific reality."

"Our study is both multidisciplinary and interdisciplinary. We propose a picture that is unprecedented in its inclusiveness and breadth, which clearly shows that humans were initially apex predators, who specialized in hunting large animals. As Darwin discovered, the adaptation of species to obtaining and digesting their food is the main source of evolutionary changes, and thus the claim that humans were apex predators throughout most of their development may provide a broad basis for fundamental insights into the biological and cultural evolution of humans."

This study is published in the American Journal of Physical Anthropology.

Follow this link:
Humans were already the dominant predatory species on Earth 2 million years ago - Study Finds

Posted in Human Genetics | Comments Off on Humans were already the dominant predatory species on Earth 2 million years ago – Study Finds

Fulcrum Therapeutics Announces Additional HBG mRNA Induction from Higher Dose Cohorts in Phase 1 Healthy Adult Volunteer Trial of FTX-6058 for Sickle…

Posted: at 5:32 am

Achieved mean 5.6-fold HBG mRNA induction at 20mg and mean 6.2-fold at 30mg after 14 days of once-daily dosing, further supporting potential of FTX-6058 to provide a functional cure

Continues to be well-tolerated at higher doses with no serious adverse events observed to date

New mechanism data demonstrate potent downregulation of BCL11A and MYB, key repressors of fetal hemoglobin

On track to initiate enrollment in Phase 1b clinical trial in people with sickle cell disease and to submit an IND for treatment of other hemoglobinopathies by year-end 2021

Company to review results on conference call, including guest KOL Dr. Gerd Blobel, at 8:00 am ET today

CAMBRIDGE, Mass., Dec. 06, 2021 (GLOBE NEWSWIRE) -- Fulcrum Therapeutics, Inc. (Nasdaq: FULC), a clinical-stage biopharmaceutical company focused on improving the lives of patients with genetically defined rare diseases, today announced positive results from the 20mg and 30mg dose cohorts in healthy adult volunteers in its Phase 1 clinical trial of FTX-6058. The company also shared new preclinical mechanism data showing that FTX-6058 downregulated known repressors of fetal hemoglobin (HbF). FTX-6058 is an investigational oral HbF inducer that is being developed for the treatment of sickle cell disease and other hemoglobinopathies, such as beta-thalassemia.

Data from the 20mg and 30mg dose cohorts demonstrated a mean 5.6-fold induction and a mean 6.2-fold induction in HBG mRNA, respectively, at day 14. These increases were higher than those observed in the previously reported 2, 6 and 10mg dose cohorts. In preclinical studies of FTX-6058, increases in HBG mRNA have consistently translated to the same fold increases in HbF protein. Notably, human genetics show that 2-3-fold increases in HbF are associated with significantly improved outcomes, and even functional cures, in people with sickle cell disease. FTX-6058 has now demonstrated greater than a mean 2-fold induction starting with the 6mg dose.

"Despite progress in the treatment of sickle cell disease, existing therapies either offer limited benefit or, in the case of gene therapy, are not amenable to the great majority of patients and carry certain risks," said Gerd Blobel, MD, PhD, Frank E. Weise III Endowed Chair in Pediatric Hematology at Children's Hospital of Philadelphia. "The strategy of increasing the levels of fetal hemoglobin is based on solid genetic and clinical data. It can substantially reduce mortality and morbidity, and in cases where HbF reaches greater than 25-35% of total hemoglobin, lead to asymptomatic disease. The emerging clinical data on FTX-6058, combined with the new preclinical data showing that it downregulates BCL11A and MYB, two validated HbF repressors, is encouraging."

"The data for FTX-6058 continue to exceed our expectations," said Bryan Stuart, Fulcrum's president and chief executive officer. "We believe the fold increases in HBG mRNA that we have now seen at multiple doses, starting at 6mg once-daily, have the potential to translate to levels of HbF protein that could provide a functional cure for people with sickle cell disease. Additionally, with the new insights into the mechanism of action, there's now a clear relationship between FTX-6058 and HbF induction that further affirms our conviction. We remain on track to begin enrolling people with sickle cell disease in our Phase 1b trial by year-end, with an eye toward reporting initial data, including HbF protein levels, in the second quarter of next year."

FTX-6058 Continues to be Well-Tolerated and Achieved Maximal HBG mRNA Induction at Higher Doses

The Phase 1 randomized, double-blind, placebo-controlled trial was designed to evaluate the safety, tolerability, and pharmacokinetics (PK) of ascending doses of FTX-6058 (NCT04586985). In the single-ascending dose (SAD) cohorts, healthy volunteers received one dose of either placebo or 2, 4, 10, 20, 30, 40 or 60mg of FTX-6058. In the multiple-ascending dose (MAD) cohorts, healthy volunteers received a once-daily dose of placebo or 2, 6, 10, 20 or 30mg of FTX-6058 for 14 consecutive days. Each MAD cohort had six subjects on drug and two on placebo. Food effect was also studied in a separate 20mg dose cohort. Exploratory measures were included in the MAD cohorts to assess target engagement, as well as changes in HBG mRNA and HbF-containing reticulocytes (F-reticulocytes). A 6mg dose cohort in people with sickle cell disease was recently added to this trial to further inform PK and pharmacodynamic modeling for future dose selection. All other cohorts in the trial have been completed, and data from the 2-40mg SAD cohorts and 2-10mg MAD cohorts were reported in August 2021.

Consistent with the earlier reported data, FTX-6058 has been generally well-tolerated with no serious adverse events reported to date and there were no discontinuations due to treatment-emergent adverse events (TEAEs) across all SAD and MAD cohorts. Across all cohorts, all TEAEs deemed possibly related to FTX-6058 were mild (Grade 1 or 2) and resolved. There was one Grade 4 TEAE in the 10mg MAD cohort and one Grade 3 TEAE in the food effect cohort, both of which were determined to be unrelated to FTX-6058. Data continued to show dose-proportional PK, with a mean half-life of approximately 6-7 hours in the MAD cohorts, supporting once-daily dosing, and no food effect was observed with FTX-6058. Data from the MAD cohorts continued to show robust target engagement, as evidenced by an approximately 75-95% reduction from baseline in H3K27me3 after 14 days of treatment.

The data also showed higher-fold induction of HBG mRNA at the higher doses, with FTX-6058 achieving maximal rate of HBG mRNA induction in the 20mg and 30mg cohorts. Maximal HBG induction has not yet been achieved with the higher doses of FTX-6058. Persistent HBG mRNA induction was observed for 7-10 days after treatment. F-reticulocytes also increased by a mean of 1.8-fold in the 20mg cohort and a mean of 2.4-fold in the 30mg cohort as of the safety follow up visit, which was seven to 10 days after conclusion of dosing. Increases in F-reticulocytes of any magnitude are a first indicator that HBG mRNA is translating to HbF protein production, which Fulcrum anticipates observing in the Phase 1b trial that will dose people with sickle cell disease for up to three months.

HBG mRNA Mean Fold Induction for FTX-6058 versus Placebo (each cell: mean fold induction, P-value in parentheses)

Timepoint                      2mg*            6mg*             10mg*            20mg             30mg
Day 7                          1.28 (0.3494)   1.94 (0.0135)    2.08 (0.0063)    2.06 (0.0072)    2.29 (0.0025)
Day 14                         1.20 (0.5122)   2.45 (0.0025)    3.54 (<0.0001)   5.63 (<0.0001)   6.15 (<0.0001)
Safety Follow-up (Day 21-24)   1.21 (0.3736)   2.75 (<0.0001)   3.22 (<0.0001)   6.45 (<0.0001)   6.13 (<0.0001)

F-Reticulocyte Mean Fold Increase for FTX-6058 versus Placebo (each cell: mean fold increase, P-value in parentheses)

Timepoint   2mg*            6mg*            10mg*           20mg            30mg
Day 7       0.53 (0.1070)   1.02 (0.9524)   0.83 (0.6214)   0.71 (0.3831)   1.50 (0.2928)
Day 14      0.88 (0.6881)   1.25 (0.4895)   2.23 (0.0180)   1.00 (0.9880)   1.71

More:
Fulcrum Therapeutics Announces Additional HBG mRNA Induction from Higher Dose Cohorts in Phase 1 Healthy Adult Volunteer Trial of FTX-6058 for Sickle...

Posted in Human Genetics | Comments Off on Fulcrum Therapeutics Announces Additional HBG mRNA Induction from Higher Dose Cohorts in Phase 1 Healthy Adult Volunteer Trial of FTX-6058 for Sickle…

Hub Genes as Prognostic Candidates of Thyroid Cancer | IJGM – Dove Medical Press

Posted: at 5:32 am

Introduction

Thyroid cancer (THCA) is one of the most common malignant tumors of the human endocrine system and of the head and neck [1,2]. The most common pathological type of thyroid carcinoma is papillary thyroid carcinoma (PTC), accounting for about 80% of all THCA cases [3-5]. Most thyroid cancers have a good prognosis, with a 5-year survival rate of more than 95% [6,7]. Although the incidence of THCA is increasing year by year, the molecular mechanisms of thyroid carcinogenesis and progression remain unclear [8,9]. At present, the gold standard for preoperative diagnosis of THCA is fine needle aspiration biopsy (FNAB), but 20-30% of FNAB results are indeterminate or suspicious, and these patients need diagnostic surgery to characterize the tumor [10,11]. Molecular markers are expected to improve the preoperative diagnostic accuracy of FNAB for THCA.

Weighted gene co-expression network analysis (WGCNA) is considered to be an effective network-based method, which can highlight the co-expressed gene modules and study the relationship between gene modules and phenotypes more effectively [12-14]. WGCNA has been successfully applied to explore the functional co-expression modules and central genes of different diseases, such as pancreatic cancer [15], breast cancer [16] and oral squamous cell carcinoma [17].

In this study, we used WGCNA and other analytical methods to explore RNA data and clinical information of patients with thyroid tumor. Finally, four hub genes (CCDC146, SLC4A4, TDRD9 and MUM1L1) related to prognosis and transcription level were identified and verified, which showed good diagnostic potential and clinical relevance, and could be used as molecular markers in clinical diagnosis, treatment and prognosis of THCA.

The RNA-seq expression data and related clinical traits of THCA were obtained from the TCGA database (https://portal.gdc.cancer.gov/) and the GEO database (http://www.ncbi.nlm.nih.gov/geo/). A total of 568 samples were obtained from TCGA, comprising 510 THCA samples and 58 normal samples. The raw data were preprocessed with background correction and quantile normalization, using robust multi-array average (RMA) background correction from the affy R package. A false discovery rate (FDR) <0.05 and |log2FC| ≥ 2 were used as the cut-off values to screen differentially expressed genes, which laid the foundation for further construction of the co-expression network.
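The screening rule above can be illustrated with a short sketch. This is not the authors' pipeline: it assumes the FDR is a Benjamini-Hochberg adjusted p-value (the text does not name the method), it treats the fold-change cut-off as inclusive, and the gene names and values are invented toy data.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(genes, log2_fold_changes, p_values,
                fdr_cutoff=0.05, lfc_cutoff=2.0):
    """Keep genes with adjusted p < fdr_cutoff and |log2FC| >= lfc_cutoff."""
    fdr = benjamini_hochberg(p_values)
    return [gene
            for gene, lfc, q in zip(genes, log2_fold_changes, fdr)
            if q < fdr_cutoff and abs(lfc) >= lfc_cutoff]


# Hypothetical toy input: names and numbers are illustrative only.
genes = ["geneA", "geneB", "geneC", "geneD"]
lfc = [2.5, -3.1, 0.4, 2.2]
pvals = [0.001, 0.004, 0.0005, 0.30]
degs = select_degs(genes, lfc, pvals)
print(degs)
```

Here geneC is dropped for its small fold change despite a very low p-value, and geneD is dropped because its adjusted p-value is too large, which is the point of combining the two thresholds.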

Co-expression modules are gene sets with high topological overlap similarity. The WGCNA package in R was used to construct the gene co-expression network of differentially expressed genes [11,12]. This procedure identifies highly related genes among the differentially expressed genes, and genes with the same pathway or function can be clustered into the same gene module. The cut-off value of the co-expression module was set as P < 0.05. To further explore and visualize the dissimilarity of gene modules, we selected a cut line for the module dendrogram and merged some modules.
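To make "topological overlap similarity" concrete, here is a minimal sketch of the commonly used WGCNA formulation: an adjacency a_ij = |cor(x_i, x_j)|^β followed by the topological overlap measure. It assumes an unsigned network (the text does not say whether a signed or unsigned network was used), uses the soft-thresholding power of 18 reported in the results, and runs on invented toy profiles rather than the R package itself.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def soft_threshold_adjacency(expr, beta):
    """Unsigned adjacency a_ij = |cor_ij| ** beta, with a zero diagonal."""
    n = len(expr)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a = abs(pearson(expr[i], expr[j])) ** beta
            adj[i][j] = adj[j][i] = a
    return adj


def topological_overlap(adj):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(adj)
    k = [sum(row) for row in adj]  # connectivity of each gene
    tom = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(n) if u != i and u != j)
            t = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
            tom[i][j] = tom[j][i] = t
    return tom


# Hypothetical expression profiles (one list per gene, one value per sample).
expr = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.5],
    [5.0, 1.0, 4.0, 2.0, 3.0],
]
adjacency = soft_threshold_adjacency(expr, beta=18)
tom = topological_overlap(adjacency)
dissimilarity = [[1.0 - t for t in row] for row in tom]  # input to clustering
```

Raising correlations to a high power pushes weak correlations toward zero while keeping strong ones, so the first two toy genes stay tightly connected and the third is effectively disconnected. Modules are then obtained by hierarchical clustering on the 1 - TOM dissimilarity.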

In order to explore the gene modules related to clinical features of THCA, the correlation between phenotype and module eigengenes was calculated, and the significance of each gene module was evaluated [18]. The gene modules significantly related to clinical features (P < 0.05) were selected for further study.
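The module-trait step can be sketched as follows, under the usual WGCNA convention that a module eigengene is the first principal component of the module's standardized expression profiles. The power iteration, the sign convention, the binary coding of sample type (0 = normal, 1 = THCA) and the toy values are all assumptions made for illustration; the significance test is omitted.

```python
from math import sqrt


def standardize(profile):
    """Center a gene's expression profile and scale it to unit variance."""
    n = len(profile)
    mean = sum(profile) / n
    sd = sqrt(sum((v - mean) ** 2 for v in profile) / (n - 1))
    return [(v - mean) / sd for v in profile]


def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def module_eigengene(expr, iterations=200):
    """First principal component (across samples) of a module's genes."""
    z = [standardize(g) for g in expr]
    n_samples = len(z[0])
    # Gram matrix over samples; its leading eigenvector is the eigengene.
    gram = [[sum(g[s] * g[t] for g in z) for t in range(n_samples)]
            for s in range(n_samples)]
    # Start from a centered profile: an all-ones start would map to zero.
    v = z[0][:]
    for _ in range(iterations):
        w = [sum(gram[s][t] * v[t] for t in range(n_samples))
             for s in range(n_samples)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Orient the eigengene to correlate positively with the mean profile.
    mean_profile = [sum(g[s] for g in z) / len(z) for s in range(n_samples)]
    if pearson(v, mean_profile) < 0:
        v = [-x for x in v]
    return v


# Hypothetical module of three genes over six samples.
module_expr = [
    [1.0, 1.2, 0.9, 3.0, 3.2, 2.9],
    [2.0, 2.1, 1.8, 4.1, 4.0, 4.3],
    [0.5, 0.7, 0.6, 2.0, 2.4, 2.2],
]
sample_type = [0, 0, 0, 1, 1, 1]  # assumed coding: 0 = normal, 1 = THCA
eigengene = module_eigengene(module_expr)
module_trait_correlation = pearson(eigengene, sample_type)
```

The same correlation computed between a single gene's profile and the eigengene gives that gene's module membership, the quantity used later to pick hub genes.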

To further explore the potential function of the key modules, Gene Ontology (GO) term analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis were conducted and visualized using DAVID (http://david.abcc.ncifcrf.gov/) [19-21]. The significance level was set as p-value <0.01 and FDR <0.05.

Hub genes usually have important biological functions and are highly associated with other nodes of the module. The module membership of each gene was calculated; hub genes have higher module membership values. To verify the reliability of the hub genes, GSE33630, GSE29265, GSE6004 and the Human Protein Atlas database (https://www.proteinatlas.org/) [22] were used for further validation. Kaplan-Meier plotter was used for survival analysis.
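The survival analysis relies on the Kaplan-Meier estimator, which can be sketched in a few lines. This is an illustration rather than the Kaplan-Meier plotter tool itself: the median split follows the stratification described in the Figure 5 caption, sending ties to the low group is an assumption, the log-rank comparison between groups is not implemented, and all numbers are invented.

```python
from statistics import median


def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one event.

    events[i] is 1 if the event (death) was observed at times[i],
    and 0 if the patient was censored at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def split_by_median(expression):
    """Return (low, high) index lists; values equal to the median go to low."""
    cutoff = median(expression)
    low = [i for i, v in enumerate(expression) if v <= cutoff]
    high = [i for i, v in enumerate(expression) if v > cutoff]
    return low, high


# Hypothetical follow-up times (months), event flags and expression values.
times = [5, 8, 12, 12, 20]
events = [1, 0, 1, 1, 0]
curve = kaplan_meier(times, events)

expression = [1.0, 5.0, 2.0, 8.0, 3.0]
low, high = split_by_median(expression)
```

In a real analysis one curve would be estimated per expression group and the two curves compared with a log-rank test, as the results section describes.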

Based on the differential analysis, a total of 3712 genes from TCGA were included in the co-expression analysis. We also excluded cases with incomplete clinical information. Pearson's correlation coefficient was used to cluster the samples, and we drew a sample clustering tree after removing outliers (Figure 1A). We selected β = 18 as the soft-thresholding power (Figures 1B and C). Finally, 11 modules were identified based on average-linkage hierarchical clustering and dynamic tree cutting (Figure 2). As shown in Figure 3A, the interaction between the 11 co-expression modules indicates that the gene modules are largely independent of one another in the network. The cyan and grey modules were highly correlated with sample type by Pearson's correlation analysis (Figure 3B). Therefore, these two modules were selected as clinically important modules for further analysis. In addition, the eigengene dendrogram and heatmap were drawn to explore groups of related eigengenes and the dendrogram of all modules (Figures 3C and D).

Figure 1 Clustering of samples and determination of soft-thresholding power. (A) The clustering was based on the expression data of TCGA, which contained 568 samples: 510 THCA and 58 normal samples. The color intensity was proportional to sample type (normal and THCA), sex, age and disease status. (B) Analysis of the scale-free fit index for various soft-thresholding powers (β). (C) Analysis of the mean connectivity for various soft-thresholding powers.

Figure 2 Construction of co-expression modules by WGCNA package in R. (A) The cluster dendrogram of module eigengenes. (B) The cluster dendrogram of genes in TCGA. Each branch in the figure represents one gene, and every color below represents one co-expression module.

Figure 3 Identification of Key Modules. (A) Interaction relationship analysis of co-expression genes. Different colors of horizontal axis and vertical axis represent different modules. (B) Heatmap of the correlation between module eigengenes and the sample type of THCA. (C) Hierarchical clustering of module hub genes that summarize the modules yielded in the clustering analysis. (D) Heatmap plot of the adjacencies in the hub gene network.

Moreover, GO and KEGG analyses were conducted for the two key co-expression modules. The results show that the MEcyan module was involved in important biological functions and signaling pathways related to tumorigenesis and development, such as protein binding, thyroid hormone synthesis, autophagy-animal, insulin resistance and cell proliferation (Figures 4A and B). In the MEgrey module, functions were mainly enriched in transcriptional misregulation in cancer, cell adhesion molecules (CAMs), complement and coagulation cascades, ECM-receptor interaction and signal transduction (Figures 4C and D).

Figure 4 Plot of the enriched GO and KEGG terms in two key co-expression modules. (A) GO enrichment analysis of MEcyan module. (B) KEGG pathway enrichment analysis of MEcyan module. (C) GO enrichment analysis of MEgrey module. (D) KEGG pathway enrichment analysis of MEgrey module.

We screened the MEcyan and MEgrey modules for candidate prognostic genes. Through further overall survival analysis with the log-rank test (p < 0.05), 4 hub genes (CCDC146, SLC4A4, TDRD9 and MUM1L1) were identified (Figure 5). Kaplan-Meier curves for overall survival showed that THCA patients with low expression levels of the 4 hub genes had a poor prognosis. Then, GSE33630, GSE29265, GSE6004 and the Human Protein Atlas database were used to validate the expression status of the 4 hub genes. Figures 6A and B show the volcano plot and expression heatmap of the differential RNAs in GSE33630 (45 normal samples and 60 THCA). The genes shared between the differential RNAs in GSE33630 and the MEcyan and MEgrey modules were identified by overlapping them (Figures 6C and D). The results showed that the 4 hub genes in the two key modules were also differential RNAs in GSE33630. Moreover, the transcriptional levels of the hub genes were verified in GSE29265 (Figure 7) and GSE6004 (Figure 8). In addition, the translational levels of the 4 hub genes were verified using the Human Protein Atlas database (IHC) (Figure 9).

Figure 5 Survival analysis of 4 hub genes based on the Kaplan-Meier plotter. The patients were stratified into high- and low- expression groups according to the median expression. (A) CCDC146. (B) SLC4A4. (C) TDRD9. (D) MUM1L1.

Figure 6 Validation of hub genes in GSE33630. (A) Volcano plot visualizing DEGs in GSE33630 (45 normal samples and 60 THCA). Fold Change=2, adj P=0.05. (B) Heatmap hierarchical clustering reveals DEGs in cancer groups compared with those in control groups. (C) Identification of common genes between DEGs and the MEcyan module by overlapping them. The two hub genes in the MEcyan module were also DEGs in GSE33630. (D) Identification of common genes between DEGs and the MEgrey module by overlapping them. The two hub genes in the MEgrey module were also DEGs in GSE33630.

Figure 7 Validation of 4 hub genes at the transcriptional level. (A-D) Validation of hub genes in GSE29265 (*P < 0.05, **P < 0.01, ***P < 0.001).

Figure 8 Validation of 4 hub genes at the transcriptional level. (A-D) Validation of hub genes in GSE6004 (****P < 0.0001).

Figure 9 Validation of 4 hub genes at the translational level. (A-D) Validation of 4 hub genes by The Human Protein Atlas database (IHC).

THCA is a rare malignant tumor, accounting for less than 1% of human malignant tumors [1-3]. However, it is the most common cancer of the endocrine system and accounts for most deaths from endocrine cancers [4,5]. The occurrence and development of THCA is a multifactorial disease process, involving a variety of molecular mechanisms [2]. At present, many published studies focus on the molecular mechanism of a single gene in THCA and, owing to this limitation, overlook the interactions between genes [23-26]. With the development of big data, gene networks are now used to analyze the origin and development of various cancers. Therefore, to further explore novel and accurate molecular biomarkers for prognosis, we used RNA sequencing data and clinical information from the TCGA and GEO databases to explore and verify potential key modules and hub genes through WGCNA-based bioinformatics analysis.

In this study, we screened 2 key modules (MEcyan module and MEgrey module) from the TCGA dataset by WGCNA. Four hub genes (CCDC146, SLC4A4, TDRD9 and MUM1L1) were then further screened and verified using the GEO database and survival analysis. According to the functional and pathway enrichment analysis, the two key co-expression modules were significantly enriched in thyroid hormone synthesis, autophagy-animal, cell proliferation, transcriptional misregulation in cancer, cell adhesion molecules (CAMs), ECM-receptor interaction and signal transduction. We also found that these significantly enriched functional annotations and signaling pathways have been reported in THCA and many other cancers [27-30].

At present, studies on these four hub genes have been reported, and a large number of studies have shown that their expression plays an important role in the occurrence, development and prognosis of many tumors. It has been found that SLC4A4 contributes to the occurrence and development of tumors, and its involvement in tumor biological processes is specific [31]. It is reported that mir-223-3p promotes tumor cell proliferation and metastasis by reducing the expression of SLC4A4 in renal clear cells [32]. Gerber et al. confirmed the low expression of SLC4A4 in thyroid carcinoma and its diagnostic value [31]. SLC4A4 has been reported to be associated with poor prognosis in patients with colon adenocarcinoma. The low expression of SLC4A4 is associated with lymph node invasion and distant metastasis of colon adenocarcinoma. At the same time, SLC4A4 expression is associated with the infiltration of immune cells in colon adenocarcinoma. It may be a biomarker and therapeutic target for the diagnosis and prognosis of colon adenocarcinoma [33]. Another study has reported that downregulation of expression in TDRD9-positive cell lines causes a decrease in cell proliferation, S-phase cell cycle arrest, and apoptosis, and that TDRD9 can be used as a marker for prognosis and as a potential therapeutic target in a subset of lung carcinomas [34]. More importantly, one study by Wang et al. suggested that TDRD9 was significantly related to the prognosis of THCA. CCDC146 has been proposed as a potential therapeutic target for lymph node metastasis of breast cancer [35]. MUM1L1 has not been previously reported to be associated with cancer. The results show that the occurrence and development of tumors may be regulated by multiple genes, which may provide more research strategies for the diagnosis and treatment of THCA.

There are two limitations to this study. First, the results were not verified in clinical samples. The high reliability of high-throughput sequencing expression data and the sufficient number of samples included in the study compensate for this to a certain extent, but they cannot completely replace verification in clinical samples. Second, the selected molecules have not been functionally validated. Although the signaling pathways of the differential genes were analyzed by KEGG in this study, the newly identified molecules that have not been previously reported should be functionally verified.

In summary, based on the TCGA database, we analyzed the gene expression profile of THCA and successfully identified four hub genes associated with THCA prognosis, which showed good diagnostic potential and clinical relevance as molecular markers for clinical diagnosis, treatment and prognosis of THCA.

THCA, thyroid cancer; WGCNA, weighted gene co-expression network analysis; TCGA, The Cancer Genome Atlas; GEO, Gene Expression Omnibus; PTC, papillary thyroid carcinoma; FNAB, fine needle aspiration biopsy; RMA, robust multi-array average; FDR, false discovery rate; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; CAMs, cell adhesion molecules.

This study was approved in accordance with the Ethical Standards of the Institutional Ethics Committee of University of Chinese Academy of Sciences - Shenzhen Hospital and with the 1964 Helsinki declaration and its later amendments or comparable Ethical Standards.

The results of this study are based on the data from TCGA (https://www.cancer.gov/tcga) and GEO database (http://www.ncbi.nlm.nih.gov/geo/). We thank all the authors who provided the data for this study.

This work was supported by the Startup Fund for scientific research, University of Chinese Academy of Sciences - Shenzhen Hospital (Grant No. HRF-2020012); and Guangming District Soft Science Research Project (Grant No. 2021R01063).

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

1. Kitahara CM, Sosa JA. The changing incidence of thyroid cancer. Nat Rev Endocrinol. 2016;12(11):646–653. doi:10.1038/nrendo.2016.110

2. Hu J, Yuan IJ, Mirshahidi S, Simental A, Lee SC, Yuan X. Thyroid carcinoma: phenotypic features, underlying biology and potential relevance for targeting therapy. Int J Mol Sci. 2021;22(4):1950. doi:10.3390/ijms22041950

3. Patel S, Pappoppula L, Guddati AK, et al. Analysis of race and gender disparities in incidence-based mortality in patients diagnosed with thyroid cancer from 2000 to 2016. Int J Gen Med. 2020;13:1589–1594. doi:10.2147/IJGM.S280986

4. Siegel RL, Miller KD, Fuchs HE, Jemal A. Cancer statistics, 2021. CA Cancer J Clin. 2021;71(1):7–33. doi:10.3322/caac.21654

5. Davies L, Hoang JK. Thyroid cancer in the USA: current trends and outstanding questions. Lancet Diabetes Endocrinol. 2021;9(1):11–12. doi:10.1016/S2213-8587(20)30372-7

6. Chen W, Zheng R, Zuo T, Zeng H, Zhang S, He J. National cancer incidence and mortality in China, 2012. Chin J Cancer Res. 2016;28:1–11.

7. Davies L, Welch HG. Current thyroid cancer trends in the United States. JAMA Otolaryngol Head Neck Surg. 2014;140(4):317–322. doi:10.1001/jamaoto.2014.1

8. Schlumberger MJ. Papillary and follicular thyroid carcinoma. N Engl J Med. 1998;338(5):297–306. doi:10.1056/NEJM199801293380506

9. Mazzaferri EL, Kloos RT. Clinical review 128: current approaches to primary therapy for papillary and follicular thyroid cancer. J Clin Endocrinol Metab. 2001;86(4):1447–1463. doi:10.1210/jcem.86.4.7407

10. LiVolsi VA. Papillary thyroid carcinoma: an update. Mod Pathol. 2011;24(S2):S1–S9. doi:10.1038/modpathol.2010.129

11. Nikiforov YE, Nikiforova MN. Molecular genetics and diagnosis of thyroid cancer. Nat Rev Endocrinol. 2011;7(10):569–580. doi:10.1038/nrendo.2011.142

12. Giulietti M, Occhipinti G, Principato G, Piva F. Weighted gene co-expression network analysis reveals key genes involved in pancreatic ductal adenocarcinoma development. Cell Oncol. 2016;39(4):379–388. doi:10.1007/s13402-016-0283-7

13. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinform. 2008;9(1):559. doi:10.1186/1471-2105-9-559

14. Zhang B, Horvath S. A general framework for weighted gene co-expression network analysis. Stat Appl Genet Mol Biol. 2005;4:Article17. doi:10.2202/1544-6115.1128

15. Zhou Z, Cheng Y, Jiang Y, et al. Ten hub genes associated with progression and prognosis of pancreatic carcinoma identified by co-expression analysis. Int J Biol Sci. 2018;14:124–136. doi:10.7150/ijbs.22619

16. Tang J, Yang Q, Cui Q, et al. Weighted gene correlation network analysis identifies RSAD2, HERC5, and CCL8 as prognostic candidates for breast cancer. J Cell Physiol. 2020;235(1):394–407. doi:10.1002/jcp.28980

17. Hu X, Sun G, Shi Z, Ni H, Jiang S. Identification and validation of key modules and hub genes associated with the pathological stage of oral squamous cell carcinoma by weighted gene co-expression network analysis. PeerJ. 2020;8:e8505. doi:10.7717/peerj.8505

18. Shi K, Bing ZT, Cao GQ, et al. Identify the signature genes for diagnose of uveal melanoma by weight gene co-expression network analysis. Int J Ophthalmol. 2015;8:269–274.

19. Dennis G Jr, Sherman BT, Hosack DA, et al. DAVID: database for annotation, visualization, and integrated discovery. Genome Biol. 2003;4(5):P3. doi:10.1186/gb-2003-4-5-p3

20. Ashburner M, Ball CA, Blake JA, et al. Gene ontology: tool for the unification of biology. The gene ontology consortium. Nat Genet. 2000;25:25–29. doi:10.1038/75556

21. Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res. 2012;40:D109–D114. doi:10.1093/nar/gkr988

22. Uhlén M, Fagerberg L, Hallström BM, et al. Proteomics. Tissue-based map of the human proteome. Science. 2015;347(6220):1260419. doi:10.1126/science.1260419

23. Li H, Tian Z, Qu Y, et al. SIRT7 promotes thyroid tumorigenesis through phosphorylation and activation of Akt and p70S6K1 via DBC1/SIRT1 axis. Oncogene. 2019;38(3):345–359. doi:10.1038/s41388-018-0434-6

24. Li X, Ruan X, Zhang P, et al. TBX3 promotes proliferation of papillary thyroid carcinoma cells through facilitating PRC2-mediated p57KIP2 repression. Oncogene. 2018;37(21):2773–2792. doi:10.1038/s41388-017-0090-2

25. Li Y, Yang Q, Guan H, Shi B, Ji M, Hou P. ZNF677 suppresses Akt phosphorylation and tumorigenesis in thyroid cancer. Cancer Res. 2018;78(18):5216–5228. doi:10.1158/0008-5472.CAN-18-0003

26. Ock S, Ahn J, Lee SH, et al. Thyrocyte-specific deletion of insulin and IGF-1 receptors induces papillary thyroid carcinoma-like lesions through EGFR pathway activation. Int J Cancer. 2018;143(10):2458–2469. doi:10.1002/ijc.31779

27. Ren Z, Zhang L, Ding W, et al. Development and validation of a novel survival model for head and neck squamous cell carcinoma based on autophagy-related genes. Genomics. 2021;113(1):1166–1175. doi:10.1016/j.ygeno.2020.11.017

28. Xiao H, Zhang Y, Li Z, et al. Periostin deficiency reduces diethylnitrosamine-induced liver cancer in mice by decreasing hepatic stellate cell activation and cancer cell proliferation. J Pathol. 2021;255(2):212–223. doi:10.1002/path.5756

29. Yuan Y, Cao W, Zhou H, Qian H, Wang H. H2A.Z acetylation by lincZNF337-AS1 via KAT5 implicated in the transcriptional misregulation in cancer signaling pathway in hepatocellular carcinoma. Cell Death Dis. 2021;12(6):609. doi:10.1038/s41419-021-03895-2

30. Pogorzelska-Dyrbus J, Szepietowski JC. Adhesion molecules in non-melanoma skin cancers: a comprehensive review. In Vivo (Brooklyn). 2021;35:1327–1336. doi:10.21873/invivo.12385

31. Gerber JM, Gucwa JL, Esopi D, et al. Genome-wide comparison of the transcriptomes of highly enriched normal and chronic myeloid leukemia stem and progenitor cell populations. Oncotarget. 2013;4(5):715–728. doi:10.18632/oncotarget.990

32. Xiao W, Wang X, Wang T, Xing J. MiR-223-3p promotes cell proliferation and metastasis by downregulating SLC4A4 in clear cell renal cell carcinoma. Aging. 2019;11(2):615–633. doi:10.18632/aging.101763

33. Chen X, Chen J, Feng Y, Guan W. Prognostic value of SLC4A4 and its correlation with immune infiltration in colon adenocarcinoma. Med Sci Monit. 2020;26:e925016. doi:10.12659/MSM.925016

34. Guijo M, Ceballos-Chávez M, Gómez-Marín E, Basurto-Cayuela L, Reyes JC. Expression of TDRD9 in a subset of lung carcinomas by CpG island hypomethylation protects from DNA damage. Oncotarget. 2018;9(11):9618–9631. doi:10.18632/oncotarget.22709

35. Wang Z, Liu W, Chen C, Yang X, Luo Y, Zhang B. Low mutation and neoantigen burden and fewer effector tumor infiltrating lymphocytes correlate with breast cancer metastasization to lymph nodes. Sci Rep. 2019;9(1):253. doi:10.1038/s41598-018-36319-x

See the rest here:
Hub Genes as Prognostic Candidates of Thyroid Cancer | IJGM - Dove Medical Press
