How a Google Engineer Used Her AI Smarts to Create the Ultimate Family Archive – PCMag UK

Posted on December 20, 2020 by Prof Baldwin

COVID-19 lockdowns perhaps gave a few of you some time to organize old photos that have been languishing on SD cards or in boxes, but how many of you built an AI-powered searchable archive of family videos from almost 500 hours of footage?

Dale Markowitz, an Applied AI Engineer and Developer Advocate at Google, did just that. The Texas-based Princeton grad took hours of disorganized, miniDV tape footage housed on Google Drive and turned it into an archive "that let me search my family videos by memories, not timestamps," she wrote in a July blog post. It was the ultimate Father's Day gift.

We spoke with Markowitz recently to find out how machine learning helped her get it done, and why AI is only one part of the puzzle when it comes to solving complex problems.

Although this project used a raft of Google tools, which we'll get to, it was actually not for the day job, but the coolest Father's Day gift, right?

[DM] At Google, I spend lots of time trying to think up new use cases for AI and build prototypes focused on the more business-y side. But I always wanted to work on more fun, zany stuff and, with quarantine, I finally had SO MUCH TIME. So, yes, this one was a gift for my dad who, by the way, is also a huge programmer nerd who works in machine learning.

As your dad works in machine learning, he would totally get what it took to build it out. Let's go "under the hood" with the details.

[DM] Sure. So I uploaded all of my dad's videos to a cloud storage bucket and then analyzed them with the Video Intelligence API, which returns JSON. Basically, the API does all the heavy lifting, including detecting scene changes; extracting text and timestamps on screen using computer vision; transcribing audio; tagging objects and scenes in images; and so on.

Because you needed to apply intelligence to what was probably hours of untagged material, right?

[DM] Exactly. When my dad recorded on miniDV, the clips weren't saved into separate files. They'd all be smashed into one long, three-hour recording, separated by little flashes of black and white. The API was able to pick out where those clip boundaries should have been.
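As a rough illustration of the clip-splitting step described above, the sketch below turns a list of detected shot-change timestamps into (start, end) clip ranges. The input format, the `split_into_clips` helper, and the minimum-length threshold are assumptions made for illustration; they are not the Video Intelligence API's actual response format or Markowitz's code.

```python
from typing import List, Tuple


def split_into_clips(
    boundaries: List[float],
    total_duration: float,
    min_length: float = 1.0,
) -> List[Tuple[float, float]]:
    """Turn shot-change timestamps (seconds) into (start, end) clip ranges."""
    # Keep only distinct boundaries that fall strictly inside the recording.
    cuts = sorted(t for t in set(boundaries) if 0.0 < t < total_duration)
    edges = [0.0] + cuts + [total_duration]
    clips = []
    for start, end in zip(edges, edges[1:]):
        # Drop very short segments, such as the flashes between clips
        # (the threshold is an assumed value, not taken from the source).
        if end - start >= min_length:
            clips.append((start, end))
    return clips


if __name__ == "__main__":
    # Hypothetical three-hour tape with cuts detected at these offsets.
    for start, end in split_into_clips([600.0, 600.4, 4500.0], 10800.0):
        print(f"clip from {start:.1f}s to {end:.1f}s")
```

Each resulting range could then be stored as its own record, so that a search result points at one clip rather than at a three-hour file.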

Regarding audio transcription, that must have helped in tagging, categorizing, and identifying what was on all those miniDVs.

[DM] Yes, and I found this to be the coolest part of the project, because it let me search for hyper-specific things like "Pokemon" or "Gameboy." Also, my dad was a big video narrator, so I could search his commentary for milestones.

As an applied AI engineer, you're experienced in this field, but others using the API won't need to be up on machine learning, right? Essentially, it's not quite, but almost, out-of-the-box in terms of building out the metadata and intelligence?

[DM] Confirmed. You don't need any ML expertise to build out this project. It's very developer-friendly. Having said that, there was one more AI part of this project, which was implementing search. I wanted to be able to search through all those transcripts, scene labels, and object labels, but I didn't want to have to exactly match the words.

Because you needed a proper semantic search layer for this project?

[DM] Exactly. I wanted to allow for near-matches and misspellings and even matching synonyms, such as treating the word "trash" the same as "garbage." As you know, in "semantic search," you want an algorithm that understands the semantic meaning of what you're saying regardless of the specific words and spellings you use. For that, I used a great Search as a Service tool called Algolia. I uploaded all my records (as JSON) and Algolia provided me with a smart semantic search endpoint to query those records.
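To make the idea of tolerating synonyms and misspellings concrete, here is a toy sketch using only Python's standard library. It is not how Algolia works and not Markowitz's code; the synonym table, the record shape (a dict with a `text` field), and the 0.8 similarity cutoff are all assumptions chosen for illustration.

```python
import difflib
from typing import Dict, List

# Hypothetical synonym table; a real search service would maintain its own.
SYNONYMS: Dict[str, str] = {"garbage": "trash", "rubbish": "trash"}


def normalize(word: str, vocabulary: List[str]) -> str:
    """Map a query word to a canonical term from the indexed vocabulary."""
    word = word.lower()
    word = SYNONYMS.get(word, word)
    # Tolerate misspellings by picking the closest known term, if any.
    close = difflib.get_close_matches(word, vocabulary, n=1, cutoff=0.8)
    return close[0] if close else word


def search(query: str, records: List[dict]) -> List[dict]:
    """Return records whose text contains every normalized query word."""
    # Canonicalize record words too, so "garbage" in a transcript matches "trash".
    indexed = [
        (rec, {SYNONYMS.get(w, w) for w in rec["text"].lower().split()})
        for rec in records
    ]
    vocabulary = sorted({w for _, words in indexed for w in words})
    terms = [normalize(w, vocabulary) for w in query.split()]
    return [rec for rec, words in indexed if all(t in words for t in terms)]


if __name__ == "__main__":
    records = [
        {"id": 1, "text": "taking out the trash"},
        {"id": 2, "text": "playing pokemon on gameboy"},
    ]
    print(search("garbage", records))  # synonym match
    print(search("pokmon", records))   # misspelling match
```

A hosted search service handles ranking, typo tolerance, and scale far more robustly; the sketch only shows why exact word matching is not required.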

Obviously, you've got a corporate account as a Googler to use all these tools. But what would the cost be for a non-Googler to do this? And are you sharing your GitHub code so people can replicate this?

[DM] Yep, the code is all open source. Though I should add that a lot of these features are available through Google Photos, which works with videos too, apart from the ability to search transcripts. Cost-wise, I analyzed 126GB of video (about 36 hours) and my total cost was $300. I know that seems high, but it turns out the bulk of the cost came from one single type of analysis: detecting on-screen text. Everything else amounted to just $80. As on-screen text was the least interesting attribute I extracted, I recommend leaving that out unless you really need it. Also, the first 1,000 minutes of video fall in the Google Cloud free tier. Besides the ML parts, storing my data in Algolia runs me around $50 a month for around 90,000 JSON objects. But I haven't done much optimizing, and they do have a free tier.
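The cost breakdown quoted above can be worked through in a few lines. The sketch uses only the figures from the interview; the variable names are illustrative, and it makes no assumption about how the free tier was applied to the bill.

```python
# Figures quoted in the interview.
total_cost = 300.0     # USD, all analyses combined
other_cost = 80.0      # USD, everything except on-screen text detection
hours_analyzed = 36    # approximate hours of video analyzed

# Derived values.
text_detection_cost = total_cost - other_cost
text_share = text_detection_cost / total_cost
minutes_analyzed = hours_analyzed * 60
cost_per_hour_without_text = other_cost / hours_analyzed

print(f"On-screen text detection: ${text_detection_cost:.0f} ({text_share:.0%} of total)")
print(f"Minutes analyzed: {minutes_analyzed}")
print(f"Cost per hour without text detection: ${cost_per_hour_without_text:.2f}")
```

On these numbers, on-screen text detection accounts for $220 of the $300, so skipping it brings the analysis cost down to roughly $2.22 per hour of footage.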

You're the overall host on YouTube for the new series "Making with Machine Learning." What's up next there in terms of projects?

[DM] Machine-generated recipes, automatically dubbed videos, and an AI dash cam. Well, if I can get those things to work; I never really know until I start building them. Another thing I've been fascinated with lately is ways to do machine learning with little or no data, and zero-shot learning. More on that coming soon.

We'll look out for those. Now let's do some background on you: What drew you to study computer science and why specifically at Princeton?

[DM] I originally decided to go to Princeton because I wanted to be a theoretical physicist, and I really admired Professor Richard Feynman when I was in high school. But back in 2013, when I was a sophomore in college, it really felt like computer science was the place to be: everything was developing so quickly, including Arduino, AI, and brain-computer interfaces. In retrospect, though I didn't know it then, majoring in computer science was a great decision, because there's almost no field, scientific or otherwise, that hasn't benefited from machine learning. In fact, sometimes it seems like some of the most cutting-edge work in biology and neuroscience and physics is coming from ML.

What's caught your eye recently in terms of ML?

[DM] Specifically within the field of biophysics, I'd say DeepMind's new protein folding model, AlphaFold 2, is a great example of ML.

You worked as a researcher on brain-machine interfaces to measure sustained attention. Can you give us a brief explanation of what you were doing there?

[DM] In that lab, some researchers had discovered they could (roughly) measure attention by having people do an extremely mundane task in an fMRI machine and then analyzing their brain scans. They were actually using deep learning, which was pretty revolutionary in neuroscience at the time. The problem is that fMRI machines are extremely expensive. I was investigating whether you could get similar results using an EEG machine (which is much cheaper), and specifically a portable, wireless EEG (which is much, much cheaper). The results were mixed, but I think, since then, portable EEG machines have gotten better at taking clear readings, and I have gotten better at machine learning.

You moved from data science to applied AI and your focus is on how people can apply AI, ML, etc. But do you also interface with the more theoretical AI people at Google, or only tangentially?

[DM] There is a pretty tight relationship between Google Cloud and Google Research. The field changes so quickly that there has to be. When a splashy research paper comes out, it takes almost no time before customers start asking how to get it on Google Cloud. One good example is around explainability and responsible AI. Now that machine learning is becoming more accessible, more folks can build their own models. But how do you know those models are accurate? How do you know you can trust them, and that they won't make predictions that are embarrassing or offensive? The answer is closely linked to explainability, our ability to understand why models make the predictions they do, i.e., it's hard to trust black box models.

Yeah, there's a big push for explainable AI right now.

[DM] This is a tough problem, and an active area of research across Google. But we've been working very closely with Google Research to add explainability into our customer-facing products.

At Google I/O 2019, you focused on democratizing AI, allowing developers to use Google's AI tools, like AutoML, and off-the-shelf APIs to create cool stuff. Tell us more about that.

[DM] ML has gotten way easier and more accessible for developers over the past five years. And one of the reasons that's so exciting is that more people from different backgrounds start using it and we end up with very creative projects. Sometimes people see a project I've built and they'll riff on it, which I think is super cool. For example, I built a tennis serve analyzer, and then some folks built a cricket and a badminton version. I saw a yoga pose detector, and someone built an AI Diary using some of the same tech as my video archive analyzer.

Thinking more broadly, it occurred to me that many of your AI-powered projects are applications that could help non-neurotypical people to navigate the world. For example, you engineered an AI Stylist which could illuminate social cues and help people be workplace appropriate or situation appropriate.

[DM] Interesting. On one hand, there are definitely great applications of AI for non-neurotypical folks. The most compelling one I've heard of involves using computer vision to understand facial expressions and emotions. On the flip side, I try to avoid using machine learning in situations where the result of a mistake is catastrophic.

On that note, when I interviewed Dr. Janelle Shane, she had some bizarre brownie recipes generated by one of her AIs, because that stuff is harder than most people imagine. For example, AI doesn't have "common sense," so you had to build in rules that a human wouldn't need, i.e., "I need two shoes, a left one and a right one, but only one shirt or hat." Any wardrobe mishaps with the stylist before it got it right?

[DM] Oh yes, 100%. Furthermore, I would say using a combination of ML and human rules is a pretty good design pattern. One mistake I see people make a lot is trying to solve a problem completely, end to end, with AI. It's better to use ML only for the parts of your system that really need it, such as recognizing a clothing item from an image, and then write simple rules in places where ML isn't necessary, such as combining clothing items to make an outfit. Human rules (e.g., "An outfit contains exactly two shoes") are usually easier to understand, debug, and maintain than ML models. One thing that seemed to trip up the stylist app was that I took a bunch of pictures of clothing on mannequins; my vision model was trained on pictures of people, not mannequins.
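The "ML for recognition, plain rules for assembly" pattern described above can be sketched in a few lines. In this hypothetical example, the category labels stand in for whatever an image classifier would emit, and the specific rules and names (`REQUIRED_COUNTS`, `MAX_COUNTS`, `is_valid_outfit`) are assumptions for illustration, not the stylist app's actual code.

```python
from collections import Counter
from typing import List

# Hypothetical rules: how many items of each category one outfit must contain.
REQUIRED_COUNTS = {"shoe": 2, "top": 1, "bottom": 1}
# Hypothetical optional categories with an upper limit.
MAX_COUNTS = {"hat": 1}


def is_valid_outfit(labels: List[str]) -> bool:
    """Check classifier-style labels against simple, human-written rules."""
    counts = Counter(labels)
    # Every required category must appear exactly the required number of times.
    for category, required in REQUIRED_COUNTS.items():
        if counts[category] != required:
            return False
    # Optional categories may appear, but not more often than the limit.
    for category, limit in MAX_COUNTS.items():
        if counts[category] > limit:
            return False
    return True


if __name__ == "__main__":
    print(is_valid_outfit(["shoe", "shoe", "top", "bottom", "hat"]))
    print(is_valid_outfit(["shoe", "top", "bottom"]))
```

Because the rules live in two small dictionaries, a failed check can be traced to a specific line, which is the debuggability advantage the interview points to.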

The vision model was looking for humans, not static clothes horses?

[DM] Yup. That really tricked the model. It was convinced the mannequin was a suitcase or something. By the way, I published the code on GitHub if others want to try it out.

At Google I/O, you also talked about custom sentiment analysis using natural language. Has that been deployed into something cool, like a concurrent translator that can detect irony or emotion (i.e., good for non-native speakers on business trips abroad, if we ever get to do those again)?

[DM] Interesting idea. We're still struggling with irony detection in NLP. But can you really blame a computer for not recognizing irony when lots of humans can't, either?

Good point.

[DM] I also suspect irony is largely contextual, i.e., text paired with an image, or spoken in a particular way, which makes the problem more challenging. Detecting emotion from speech is a cool idea. But I'd probably opt not to analyze just the words the person is saying (text sentiment) and focus more on their intonation. Sounds like a neat project. But like many ML problems, the challenge is finding a good training dataset.

True. So, wrapping it up, do you see the AI tools that you're working with now as a way of building a smart layer between IRL and our silicon cousins (embodied/non-bodied AIs)? For example, when I interviewed AI researcher Dr. Justin Li, we talked about AI being able to anticipate our needs before we know we have them.

[DM] In the future, yes, I think humans and AIs will work closely together. But for me what's most compelling are use cases where machine learning models are uniquely well-suited to do something that humans can't do or aren't good at. For example, people make really good assistants and companions and teachers, but they're not very good at processing millions of web pages in seconds or discovering exoplanets or predicting how proteins fold. So it's in these applications, I believe, that AI can make the most impact.

GT Gain Therapeutics SA Announces Funding from the Swiss Innovation Agency Supporting a 3-year Research Collaboration Project with the Institute for…

Posted on December 20, 2020 by Prof Baldwin

- Researchers will further develop the Site-directed Enzyme Enhancement Therapy (SEE-Tx) technology for the treatment of rare genetic and neurodegenerative diseases

- The collaborative agreement unites resources from the Institute for Research in Biomedicine (IRB)-USI; Neurocentro -Ente Ospedaliero Cantonale (EOC) & GT GAIN Therapeutics, SA

LUGANO, Switzerland, Dec. 15, 2020 (GLOBE NEWSWIRE) -- GT Gain Therapeutics SA (Gain), a subsidiary of Gain Therapeutics, Inc., a biotechnology company focused on redefining drug discovery by identifying and optimizing allosteric binding sites that have never before been targeted, along with the Institute for Research in Biomedicine (IRB, affiliated to USI Università della Svizzera italiana) and the Neurocentro announced today that Innosuisse, the Swiss Innovation Agency, has agreed to support the CHF 1.5M project by funding approximately CHF 850,000 to leverage these world class research organizations and promote continued innovation in the area of CNS diseases. The remaining support will come from Gain to cover the cost of related headcount expenses being dedicated to the project. The award specifically supports further investigation of the mechanisms of action of Gain's proprietary STAR small molecule therapeutic candidates on lysosomal dysfunction and prion-like transmission of toxic forms of protein aggregates associated with neurodegenerative diseases.

"Being recognized as an Innosuisse funded innovation project reinforces the support for our innovative approach and unites us with scientists and researchers as passionate as we are to discover new therapeutic approaches using our SEE-Tx target identification platform," said Manolo Bellotto, Ph.D., President and General Manager of Gain. "The specific know-how in protein quality control by Prof. Molinari at the IRB and the expertise in neurosciences of Dr. Paganetti from Neurocentro will certainly contribute to a further understanding of the mechanism of action of our molecules in rare and genetic diseases, thus accelerating their development towards the clinic."

Dr. Maurizio Molinari, group leader of the Protein Folding and Quality Control research team from the IRB, added, "We are honored to be collaborating with the Gain team and to evaluate Gain's novel therapeutic candidates as we work to advance new, innovative treatment options for rare lysosomal disorders and neurodegenerative diseases for which there are currently few treatment options. We are grateful to the Swiss Innovation Agency for their support and look forward to initiating this critical research program."

About Gain Therapeutics, Inc.

Gain Therapeutics, Inc. is redefining drug discovery with its SEE-Tx target identification platform. By identifying and optimizing allosteric binding sites that have never before been targeted, Gain is unlocking new treatment options for difficult-to-treat disorders characterized by protein misfolding. Gain was originally established in 2017 with the support of its founders and institutional investors such as TiVenture, 3B Future Health Fund (formerly known as Helsinn Investment Fund) and VitaTech. It has been awarded funding support from The Michael J. Fox Foundation for Parkinson's Research (MJFF) and The Silverstein Foundation for Parkinson's with GBA, as well as from the Eurostars-2 joint program with co-funding from the European Union Horizon 2020 research and Innosuisse. In July 2020, Gain Therapeutics, Inc. completed a share exchange with GT Gain Therapeutics SA, a Swiss corporation, whereby GT Gain Therapeutics SA became a wholly owned subsidiary of Gain Therapeutics, Inc. For more information, visit https://www.gaintherapeutics.com/

About the Institute for Research in Biomedicine (IRB)

The Institute for Research in Biomedicine was founded in 2000 with the clear and ambitious goal of advancing the study of human immunology, with particular emphasis on the mechanisms of host defense. The activities of the 13 research groups now extend beyond immunology to include the fields of DNA repair, rare diseases, structural and cell biology. Located in Bellinzona, capital of the Italian-speaking Canton of Ticino, the IRB is an affiliated institute of the USI Faculty of Biomedical Sciences. For more information, visit http://www.irb.usi.ch

About Neurocentro -Ente Ospedaliero Cantonale (EOC)

The EOC multisite hospital is organized and managed as a modern company at the service of the patient. It has structures with clear segregations of functions and flexible management systems that foster innovation, accountability and simplification. Our approach favors a collegial and participatory management style. General management and hospital directors form the EOC Management Coordination Conference; physicians are directly involved in EOC management through the Clinical Coordination Conference. The other professional categories actively participate in the management of the EOC within inter-hospital groups. For more information, visit http://www.eoc.ch/en/Centri-specialistici/NSI/NSI.html

Forward-Looking Statements

Any statements in this release that are not historical facts may be considered to be forward-looking statements. Forward-looking statements are based on management's current expectations and are subject to risks and uncertainties which may cause results to differ materially and adversely from the statements contained herein. Such statements include, but are not limited to, statements regarding Gain Therapeutics, Inc.'s ("Gain") expected use of the proceeds from the Series B financing round; the market opportunity for Gain's product candidates; and the business strategies and development plans of Gain. Some of the potential risks and uncertainties that could cause actual results to differ from those predicted include Gain's ability to: make commercially available its products and technologies in a timely manner or at all; enter into other strategic alliances, including arrangements for the development and distribution of its products; obtain intellectual property protection for its assets; accurately estimate its expenses and cash burn and raise additional funds when necessary. Undue reliance should not be placed on forward-looking statements, which speak only as of the date they are made. Except as required by law, Gain does not undertake any obligation to update any forward-looking statements to reflect new information, events or circumstances after the date they are made, or to reflect the occurrence of unanticipated events.

Gain Therapeutics Investor Contact:
Daniel Ferry
LifeSci Advisors
+1 617-430-7576
daniel@lifesciadvisors.com

Gain Therapeutics Media Contact:
Cait Williamson, Ph.D.
LifeSci Communications
+1 646-751-4366
cait@lifescicomms.com

Those We Lost in 2020 – The Scientist

Posted on December 20, 2020 by Prof Baldwin

Jeff McKnight, a molecular biologist at the University of Oregon, died in October at the age of 36.

McKnight's research focused on chromatin, a complex of DNA and proteins that controls when and how DNA can be accessed for replication and gene expression. He was one of the earliest researchers in the world capable of directly manipulating its structure, stemming back to his postdoctoral work at the Fred Hutchinson Cancer Research Center using the model organism Saccharomyces cerevisiae. When he started his own lab in 2016, McKnight said that his real dream was to apply his work to the dozens of human diseases that involve some level of chromatin disruption, including Parkinson's, Alzheimer's, and Huntington's.

Prior to his death from lymphoma, McKnight had spent months chronicling his diagnosis and treatment on social media, prompting an outpouring of support from fellow scientists. "He had this humility and vulnerability about him that was really endearing," David Garcia, a molecular biologist at the University of Oregon, told The Scientist.

Biologist Lynika Strozier, a researcher at the Field Museum and an instructor at Malcolm X College, died June 7 at age 35 due to complications associated with COVID-19.

After being introduced to molecular biology as an undergraduate at Truman College, Strozier developed a passion for using DNA to identify new and sometimes cryptic species. For her thesis work as a master's student at Loyola University, Strozier sequenced DNA from 200 individual birds in Madagascar thought to belong to three species. Instead, she identified several new species that were indistinguishable based only on the birds' appearance.

Her steady hand and aptitude in extracting usable genetic material from old samples earned the admiration of her colleagues. "Our entire team entrusted Lynika with extracting DNA from old dried plant material of over 15 years and only very little material from which to do so," Matt Von Konrat, the head of botanical collections at the Field Museum, told The Scientist.

Nobel laureate and biochemist Stanley Cohen, who led pioneering studies of cell growth factors, died in February. He was 97.

"Stan's work not only provided key insights into how cells grow, but it led to the development of many drugs that are used to treat cancer," Lawrence Marnett, the dean of basic sciences at Vanderbilt University, where Cohen taught for 40 years, said to The Tennessean.

Cohen's work on different types of growth factors alongside biochemist Rita Levi-Montalcini earned them the 1986 Nobel Prize in Physiology or Medicine. Cohen was honored for his discovery of epidermal growth factor, a protein that stimulates cell growth and differentiation and plays an important role in tumor development, while Levi-Montalcini was acknowledged for first isolating nerve growth factor. Growth factor receptors have since become the target of numerous drugs, such as gefitinib and cetuximab, that slow or prevent the progression of certain cancers.

Angelika Amon, a cell biologist at MIT, died on October 29 from ovarian cancer at the age of 53.

Amon dedicated her career to researching the cell cycle and how disruptions to its normal function can lead to cancer.

During her PhD at the University of Vienna and her subsequent postdoc at MIT's Koch Institute for Integrative Cancer Research, Amon used model organisms such as yeast and fruit flies to study how certain proteins and enzymes direct cells through mitosis.

Later, Amon turned her focus to the study of aneuploidy, an abnormal number of chromosomes, and chromosome segregation. She found that extra chromosomes disrupt protein folding and cell metabolism, leading to errors in those processes that can drive cancer.

"More than anyone else I've ever met, she was an absolute force of nature," Matthew Vander Heiden, an MIT biologist and close friend of Amon, told The Scientist. "She just has this larger than life personality; there's no other way to put it."

Computational biologist James Taylor, who developed a widely used bioinformatics platform, died in April. He was 40.

"James made huge contributions to open-source, accessibility, and reproducibility," genomicist Andrew Carroll tweeted following his death. "Anyone who runs a bioinformatics tool on the cloud does so thanks to James's work."

During his PhD at Penn State University, Taylor helped develop the Galaxy Project, a platform that allows researchers to share genomic data without needing to know how to program. He continued refining the platform as he moved from teaching at Emory University to Johns Hopkins University, and since then Galaxy has been used in more than 10,000 publications across disciplines. Prior to his death, Taylor spoke on Twitter of the need to make transparent, reusable and reproducible analysis pipelines to address the current pandemic, by developing resources for best practices in sharing and analyzing data.

Sleep scientist William Dement, who described a number of sleep disorders and opened one of the world's first sleep disorder clinics, died in June. He was 91.

During his graduate studies at the University of Chicago in the 1950s, Dement studied the physiology of REM sleep and its relationship to dreaming. He later joined the faculty at Stanford University, where he taught for 50 years. There, his focus became the study of sleep apnea and the effects of sleep deprivation. In 1970, he launched the Stanford Sleep Medicine Center and is credited with prompting Congress to establish the National Center on Sleep Disorders Research.

"There are not a lot of people who can say they saved the lives of hundreds of thousands of people," Emmanuel Mignot, a professor of psychiatry and behavioral sciences at Stanford University, said in an obituary. "But just by pushing this field forward, making sleep apnea recognized, as well as sleep disorders and sleep deprivation, Bill did that."

Wendy Havran, an immunologist at the Scripps Research Institute who studied the role of gamma-delta T cells in wound healing, died January 20 at the age of 64.

Havran first became interested in immunology after meeting John Cambier, an immunologist at Duke University, where she completed her undergraduate degree. While she had intended to study medicine, she became enamored of doing research. "It just clicked, and there was no going back," she told The Scientist in a 2019 profile. "I wanted to understand how the immune system worked."

During her doctoral research at the University of Chicago, Havran used monoclonal antibodies to study CD4 and CD8 surface markers on T cells. Later, as a postdoc at the University of California, Berkeley, Havran began focusing specifically on gamma-delta T cells, which had only just been described. She was able to map their abundance throughout the body, showing for the first time that they were common in the skin and intestines. In her own lab at Scripps, Havran went on to demonstrate the cells' ability to heal wounds and suppress tumor growth.

Immunologist and microbiologist Noel Rose, whose early experiments established the concept of autoimmunity, died of a stroke on July 30 at the age of 92.

Before his pioneering work, it was believed that the body was incapable of launching an immune response against itself. But as a young medical student at the University of Buffalo, Rose showed that rabbits injected with their own thyroid-derived antigens mounted an immune response that damaged or destroyed the animals' thyroid. Over the next several decades, he would further unravel the causes and mechanisms of autoimmune diseases, publishing more than 880 articles and book chapters.

"In every aspect, [Rose] is the father of autoimmunity," George Tsokos, a professor of rheumatology at Harvard Medical School, told The Scientist in a profile of Rose. "The man opened a whole chapter in the book of medicine."

Phillip Leder, a molecular geneticist at Harvard Medical School whose research furthered the fields of molecular biology, immunology, and cancer genetics, died in February. He was 85.

Working alongside NIH geneticist Marshall Nirenberg as a postdoc in the 1960s, Leder developed a technique that confirmed, for the first time, that amino acids were encoded by three nucleotides. Speaking in a 2012 interview, he recalled the excitement of those early experiments. "I would go to bed thinking about the next day's experiments and then jump out of bed in the morning and rush to the laboratory. It was a lot of work, but the intellectual excitement was enormous."

Having revealed the genetic basis of protein coding, Leder went on to map the first complete sequence of a mammalian gene, develop the first recombinant DNA vector system, and discover a cancer-causing gene that led to the development of the first mouse model of cancer, among other achievements. He established the genetics department at Harvard Medical School in 1981, only a year after joining the faculty, and served as chair for 25 years.

"Phil Leder was special. Among great scientists, he was special, and among scientists, he was an icon," David Livingston, a geneticist at Harvard, told The Scientist.

Molecular virologist Flossie Wong-Staal, a researcher at the University of California, San Diego (UCSD), who first cloned the human immunodeficiency virus (HIV), died in July at age 73 due to complications from pneumonia.

When Wong-Staal first entered the laboratory of fellow virologist Robert Gallo as a postdoc in 1973, scientists were skeptical that retroviruses could cause cancer in humans. Wong-Staal's work helped to overturn this dogma after she and her team identified the first human retrovirus (HTLV-1) and showed that it could indeed lead to cancer. Together, she and Gallo published more than 100 papers in 20 years, making Wong-Staal the most-cited woman in science during the 1980s.

As AIDS cases began to spike in the 1980s, Wong-Staal became the first person to clone HIV, the retrovirus that causes the disease, and began studying the functions of its genes, a necessary step towards developing eventual treatments. She left Gallo's lab at the National Cancer Institute in 1990 to launch the Center for AIDS Research at UCSD, where she spent the next several decades studying the virus and developing treatments, many of which are still in use today.

Has Google’s DeepMind revolutionized biology? | TheHill – The Hill

Posted on December 8, 2020 by Prof Baldwin

Every budding biologist learns about proteins and the amino acids that build them. Proteins are the building blocks of life, but knowing the sequence for the protein is only half of the story. How the protein folds onto itself determines what sections are exposed and can interact with other molecules, and therefore also what sections are hidden.

This is called the protein folding problem, and it has stumped the scientific community for about 50 years. Scores of researchers around the world are working to predict how proteins are folded, many using artificial intelligence (AI).

Biologists want to be able to predict how a protein folds because that gives insight into what it does and how it functions in the body. Geneticists and researchers have gained understanding about genes that encode for proteins, but experts have less knowledge about what happens when proteins are released to do their jobs.

One group at DeepMind, a Google AI offshoot, built an AI system that has done what others have not been able to. The group entered their algorithm, called AlphaFold, in the biennial protein-structure prediction challenge called Critical Assessment of Structure Prediction (CASP). The organizers of CASP look at the accuracy of predictions to assess how good the solutions are. The assessment is done blind, meaning the assessors don't know whose results they are looking at.

This year, AlphaFold has come out on top, beating its past performance and others in the competition.

"This is a big deal," said John Moult, who is a computational biologist at the University of Maryland in College Park and co-founded CASP in 1994, to Nature. "In some sense the problem is solved."

Research groups that don't use AI usually focus on experiments and collect data such as X-ray diffraction data. One group that was trying to figure out a bacterial protein had been studying it for a decade, while AlphaFold solved it in half an hour, according to Nature.

"This is a problem that I was beginning to think would not get solved in my lifetime," said Janet Thornton, who is a structural biologist at the European Molecular Biology Laboratory-European Bioinformatics Institute and a past assessor for CASP, to Nature.

DeepMind is mostly known for its success in chess, Go and other games. Demis Hassabis, DeepMind's founder and chief executive, said to The Guardian, "These algorithms are now becoming mature enough and powerful enough to be applicable to really challenging scientific problems."

Deep medicine: Artificial intelligence is changing the face of healthcare, daily – Yiba

Posted on December 15, 2020 by Prof Baldwin

Professor Tshilidzi Marwala is the Vice-Chancellor and Principal of the University of Johannesburg. He recently penned an opinion article that first appeared in the Daily Maverick on 07 December 2020.

This year has been a great definer. As we waged a battle against an unknown entity, proponents of artificial intelligence (AI) were swift to act. Just last week, DeepMind announced that it has cracked what is referred to as a 50-year-old scientific riddle. It has solved the protein-folding problem. In other words, it can determine a protein's 3D shape from its amino-acid sequence, making it easier to develop treatments for a range of diseases from cancer to the coronavirus.

To do this, researchers trained the DeepMind algorithm over a few weeks on a public database containing about 170,000 protein sequences and their shapes, running the equivalent of 100 to 200 graphics processing units. In recent years, DeepMind has been most recognised for its ability to beat human beings in games such as Go or Atari Classics. These were, in a sense, testing grounds for ultimately solving real-world problems.

As DeepMind's founder Demis Hassabis said at the announcement last week: "It marks an exciting moment for the field. These algorithms are now becoming mature enough and powerful enough to be applicable to really challenging scientific problems." In fact, many had expected this kind of advancement in AI only in a few decades from now.

This indicates the advent of the Fourth Industrial Revolution (4IR), the era we find ourselves in, where intelligent technologies permeate all aspects of our lives. AI, which is the most significant technology of the 4IR, is already changing how we live, work and communicate by reshaping government, education, healthcare and commerce. In his book Deep Medicine, Eric Topol distinguishes between shallow and deep medicine. Shallow medicine is a healthcare system based on observations of community groups (for example, people of African descent have a higher risk of prostate cancer than other community groups), whereas deep medicine is based on individualised medicine that is enabled by AI.

Not only do we have more access to information than ever before, but we also see a confluence of cyber, physical and biological technologies that no longer exist in labs, but impact on us every day. Proponents have long argued that the 4IR could be the key to finding solutions to some of our most deep-seated problems. The unprecedented responses to the coronavirus pandemic have been an exemplification of this.

For instance, AI has made the detection of the coronavirus easier. Alibaba's research institute, Damo Academy, has developed an AI algorithm that can detect the coronavirus in just under 20 seconds with 96% accuracy. The AI was trained using 5,000 samples from confirmed cases and can detect the virus from chest CT scans, differentiating between infected patients and general viral pneumonia cases.

South Korea was swift to act following the outbreak in China, anticipating a spread into its borders. The government organised the private sector to develop testing kits for the virus. Molecular biotech company Seegene in South Korea used AI to accelerate these kits' development. This facilitated the submission of its solution to the Korea Centers for Disease Control and Prevention (KCDC) only three weeks after scientists began working on this solution. Under normal situations, this process would have taken two to three months with an approval process of about 18 months.

It is not just pockets of AI that have cropped up in these regions. The opportunity for AI to speed up the implementation of vaccines, drugs and diagnostics is gaining traction elsewhere. Projects such as the Covid-19 Open Research Dataset provide free access to the texts of almost 25,000 research papers, while the Covid-net open access neural network is working on systems similar to those deployed by the Damo Academy.

Companies such as BenevolentAI, based in the United Kingdom, are using AI and the available data to scour through existing drugs that could be used to treat coronavirus patients until a vaccine becomes available.

Vir Biotechnology and Atomwise, start-ups in the United States, are using algorithms to identify a molecule that could facilitate treatment. Now, as various vaccines are in the final testing stages, algorithms are being used to sift through data on potential adverse reactions. Companies such as Genpact UK have signed contracts with the UK government to ensure that nothing is missed as preparations begin for mass vaccinations in the coming year. This is significant given the rapid timeline in which many of these vaccines have been developed and the various unknowns that remain.

AI solutions once thought of as futuristic and unrealistic are now commonplace. We see far more advances than we had expected at this stage, perhaps indicating the urgency that the pandemic has presented.

Similarly, there has been a shift to find AI solutions in Africa. Data science competition platform Zindi, which is based in South Africa and Ghana, has initiated a competition sponsored by the Artificial Intelligence for Development-Africa Network (AI4D-Africa), which requires data scientists to create an epidemiological model that forecasts the spread of Covid-19 throughout the globe. This is critical for both policy makers and health workers to make informed decisions and take action.

In Kenya, start-up Afya Rekod deploys AI and Blockchain to establish a health-data platform that lets users store their health records, access health information and connect to health service providers.

Of course, it is not only in the context of the coronavirus pandemic that there have been AI advances. There have been great strides in bridging some of the inequalities that exist in the healthcare system. In Rwanda, for instance, the government has collaborated with US start-up Zipline to deliver blood supplies by drones to remote areas. Where a journey would have taken three hours by car, a drone can complete the trip within six minutes. This addresses emergency medical supply requirements in rural areas.

Just last month, to improve access and quality of services to rural communities in South Africa, the Department of Health in Limpopo installed CT-Scans and Picture Archiving Communication System (PACS) in the province. The availability of this equipment at regional hospitals now improves the speed of diagnosis and management of the associated conditions and indicates an embracing of the 4IR.

This is vital because, according to the General Household Survey conducted by Statistics SA, only 17% of South Africans have medical insurance, the critical key for private healthcare. About 82% of South Africans fall outside the medical-aid net and, as a result, are largely dependent on public healthcare. According to Statistics South Africa, in 2017, 81% of households that used public healthcare services were satisfied or very satisfied with public facilities' services.

AI also addresses concerns of a shortage of doctors, particularly in the public sector. For example, the increased speed and accuracy of cancer diagnostics through analytics, which can characterise tumours and predict therapies, has not replaced doctors, but quickened their efforts and given them the space to attend to more patients. Technologies such as AI will decrease the cost of health care globally.

Almost two-thirds of healthcare costs are from non-communicable diseases such as cancer, strokes, heart failure and kidney failure that can be treated more effectively and at less cost if diagnosed early.

For example, in China, a company called Infervision developed AI algorithms that efficiently and accurately read medical images to augment radiologists in diagnosing cancer.

As Dhruv Khullar, a physician at New York-Presbyterian Hospital, said, "most fundamentally, it means recognising that humans, not machines, are still responsible for caring for patients. It is our duty to ensure that we are using AI as another tool at our disposal, not the other way around."

What is clear is that, like many other sectors, health care will be transformed by AI and we need to ready ourselves for these shifts. As Enrico Coiera aptly put it in The Lancet in 2018, "what is the fate of medicine in the time of AI? Our fate is to change."

*The views expressed in the article are those of the author/s and do not necessarily reflect those of the University of Johannesburg.

Source: UJ

More:
Deep medicine: Artificial intelligence is changing the face of healthcare, daily - Yiba

Tech.eu Podcast #198: Even more money for e-scooters, new VC funds, protein folding, and we talk to Sebastian Peck of InMotion Ventures – Tech.eu

Posted on December 8, 2020 by Prof Baldwin

The Tech.eu Podcast is a show in which we discuss some of the most interesting stories from the European technology scene and interview leading entrepreneurs and investors from across the region.

This week, we talk about what's going on in European tech, including some of the biggest funding rounds of the week, new VC funds, science and research news, and much more. We've also spoken to Sebastian Peck, managing director of InMotion Ventures.

And here are the notes and links for this week's episode:

Voi, the European micromobility rental company, raises $160 million additional equity and debt funding

UK-based HungryPanda raises $70 million to expand its online Asian food delivery business worldwide

Monzo, the UK challenger bank, picks up additional £60 million in funding

UK edtech startup MEL Science snags $14 million Series B

SoftBank buys 10.1 percent stake in Sinch after its meteoric surge

This is where Target Global wants to invest its new €300+ million fund

Firstminute Capital launches second $111 million fund, featuring a who's-who of founders as LPs

The European Investment Bank Group debuts new €150 million financing instrument to support European AI tech firms

London AI lab claims breakthrough that could accelerate drug discovery

Interview with Sebastian Peck, managing director of InMotion Ventures, a firm backed by Jaguar Land Rover

We hope you enjoy(ed) the podcast! Please feel free to email us with any questions, suggestions, and opinions to podcast@tech.eu or tweet at us @tech_eu.

Image credit: National Cancer Institute on Unsplash

Link:
Tech.eu Podcast #198: Even more money for e-scooters, new VC funds, protein folding, and we talk to Sebastian Peck of InMotion Ventures - Tech.eu

Will AI empower scientists or replace them? – Techerati

Posted on December 8, 2020 by Prof Baldwin

Google's DeepMind AI team solved a long-running biological problem

Scientists are not about to lose their jobs to more sophisticated artificial intelligence; instead it will help them work even better, an expert in the field has said following a Google breakthrough.

Last week, the tech giant's DeepMind AI specialists, based in the UK, made a leap forward in solving one of biology's biggest challenges, the five-decade-old protein folding problem.

Determining the structure of a protein opens up a world of possibilities, from understanding neurological diseases like Parkinson's, to discovering new drugs.

The problem is that there are so many, and it takes time to understand them all: we have only managed to unfold a fraction of the millions of known proteins in living things.

But what does this mean for scientists going forward?

Like many jobs touched by technology, it does not mean their skills will no longer be needed, according to Dr Aldo Faisal, professor of AI and neuroscience at Imperial College London.

Instead it will cut down on mundane tasks, allow research to be carried out faster, and enable scientists to concentrate on more in-depth experiments.

"I think what we're going to see is that AI is going to empower scientists. It's not about replacing scientists, it's about empowering them to be able to do more and effectively taking away the boring parts of the work, so to speak, that are routine and mundane, and allowing them to move quicker, discover things faster, and I think that's one of the biggest appeals of AI," Dr Faisal told the PA news agency.

"The protein folding and AlphaFold is beautiful because it shows that one can test hypotheses much, much quicker than with current conventional technologies about how protein folds, and of course how protein folds tell us something about how they can function, interact, and so this will basically save time and allow people to very quickly explore protein structures without having to do costly and slow great experiments."

Although AI has been used to revolutionise science for several years, Dr Faisal said we are seeing loads of other applications arrive and he expects more to come.

For example, earlier this year a group of scientists from Massachusetts Institute of Technology (MIT) used AI to help them uncover new types of powerful antibiotics, capable of killing some of the world's most problematic disease-causing bacteria.

"That was a very fortuitous discovery they made using AI and we're seeing loads of other applications in understanding, basically, bringing together data about health care and environment and the context in which people live in, relating that to the genes and the function of proteins inside their body," Dr Faisal continued.

Establishing these links, basically connecting healthcare data, connecting daily life data,

See the article here:
Will AI empower scientists or replace them? - Techerati

Genesis Therapeutics raises $52M A round for its AI-focused drug discovery mission – TechCrunch

Posted on December 3, 2020 by Prof Baldwin

Sifting through the trillions of molecules out there that might have powerful medicinal effects is a daunting task, but the solution biotech has found is to work smarter, not harder. Genesis Therapeutics has a new simulation approach and cross-disciplinary team that has clearly made an impression: the company just raised a $52 million A round.

Genesis competed in the Startup Battlefield at Disrupt last year, impressing judges with its potential, and obviously others saw it as well, in particular Rock Springs Capital, which led the round.

Over the last few years many companies have been formed in the drug discovery space, powered by increased computing and simulation power that lets them determine the potential of molecules in treating certain diseases. At least that's the theory. The reality is a bit messier, and while these companies can narrow the search, they can't just say "here, a cure for Parkinson's."

Founder Evan Feinberg got into the field when an illness he inherited made traditional lab work, as an intern at a big pharma company, difficult for him. The computational side of the field, however, was more accessible and ended up absorbing him entirely.

He had dabbled in the area before and arrived at what he feels is a breakthrough in how molecules are represented digitally. Machine learning has, of course, accelerated work in many fields, biochemistry among them, but he felt that the potential of the technology had not been tapped.

"I think initially the attempts were to kind of cut and paste deep learning techniques, and represent molecules a lot like images, and classify them like you'd say, this is a cat picture or this is not a cat picture," he explained in an interview. "We represent the molecules more naturally: as graphs. A set of nodes or vertices, those are atoms, and things that connect them, those are bonds. But we're representing them not just as bond or no bond, but with multiple contact types between atoms, spatial distances, more complex features."

The resulting representation is richer and more complex, a more complete picture of a molecule than you'd get from its chemical formula or a stick diagram showing the different structures and bonds. Because in the world of biochemistry, nothing is as simple as a diagram. Every molecule exists as a complicated, shifting 3D shape or conformation, where important aspects like the distance between two carbon formations or bonding sites are subject to many factors. Genesis attempts to model as many of those factors as it can.
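To make the graph idea concrete, here is a minimal Python sketch of a molecule stored as atoms (nodes) and typed contacts (edges), with spatial distances derived from 3D coordinates. It is only an illustration under stated assumptions: the class names, the contact labels and the water coordinates are hypothetical and are not taken from Genesis's platform.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Atom:
    element: str
    position: Tuple[float, float, float]  # illustrative 3D coordinates, in angstroms


@dataclass
class MolecularGraph:
    """Hypothetical container: atoms are nodes, typed contacts are edges."""
    atoms: List[Atom] = field(default_factory=list)
    # Maps an atom-index pair to a contact type label such as "covalent".
    edges: Dict[Tuple[int, int], str] = field(default_factory=dict)

    def add_atom(self, element: str, position: Tuple[float, float, float]) -> int:
        self.atoms.append(Atom(element, position))
        return len(self.atoms) - 1

    def add_contact(self, i: int, j: int, contact_type: str) -> None:
        # Store each undirected edge once, with the smaller index first.
        self.edges[(min(i, j), max(i, j))] = contact_type

    def distance(self, i: int, j: int) -> float:
        return math.dist(self.atoms[i].position, self.atoms[j].position)

    def edge_features(self) -> List[Tuple[int, int, str, float]]:
        # Each edge carries more than "bond or no bond": a type and a distance.
        return [(i, j, kind, self.distance(i, j))
                for (i, j), kind in sorted(self.edges.items())]


# Toy example: a water molecule with made-up coordinates.
water = MolecularGraph()
o = water.add_atom("O", (0.0, 0.0, 0.0))
h1 = water.add_atom("H", (0.96, 0.0, 0.0))
h2 = water.add_atom("H", (-0.24, 0.93, 0.0))
water.add_contact(o, h1, "covalent")
water.add_contact(o, h2, "covalent")

for i, j, kind, dist in water.edge_features():
    print(f"{water.atoms[i].element}-{water.atoms[j].element}: {kind}, {dist:.2f} A")
```

A non-covalent contact, such as a hydrogen bond to a neighbouring molecule, could be added with the same `add_contact` call and a different label, which is the sense in which the edges carry "multiple contact types".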

"Step one is the representation," he said, "but the logical next step is, how does one leverage that representation to learn a function that takes an input and outputs a number, like binding affinity or solubility, or a vector that predicts multiple properties at once?"
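As a rough illustration of what "a function that takes an input and outputs a number or a vector" can look like, the Python sketch below runs one round of neighbour aggregation over a tiny graph and then pools the atoms into a fixed-size prediction vector. Everything here is an assumption made for illustration: the feature vectors, the distance weighting and the weights are made up and untrained, and this is not Genesis's actual model.

```python
import math

# Hypothetical toy inputs: one feature vector per atom and (i, j, distance) contacts.
atom_features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.0, 1.0]}
contacts = [(0, 1, 0.96), (0, 2, 0.96)]


def message_pass(features, contacts):
    """One aggregation round: each atom adds its neighbours' features,
    down-weighted by distance (an illustrative choice of weighting)."""
    updated = {i: list(v) for i, v in features.items()}
    for i, j, dist in contacts:
        w = math.exp(-dist)
        for k in range(len(features[i])):
            updated[i][k] += w * features[j][k]
            updated[j][k] += w * features[i][k]
    return updated


def readout(features, weights):
    """Sum atom vectors into one molecule vector, then apply one linear
    map per predicted property, giving a vector of property estimates."""
    dim = len(next(iter(features.values())))
    pooled = [sum(v[k] for v in features.values()) for k in range(dim)]
    return [sum(w * x for w, x in zip(row, pooled)) for row in weights]


# Untrained, made-up weights: one row per property being predicted.
weights = [[0.5, -0.2], [0.1, 0.3]]
prediction = readout(message_pass(atom_features, contacts), weights)
print(prediction)
```

In a real system the weights would be learned from measured data such as binding affinities; the sum in `readout` is used here because it makes the result independent of the order in which atoms are listed.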

That's the work they've focused on as a company: not just creating a better model molecule, but being able to put a theoretical molecule into simulation and say, it will do this, it won't do this, it has this quality but not that one.

Some of this work may be done in partnerships, such as the one Genesis has struck up with Genentech, but the teams could very well find drug candidates independent of those, and for that reason the company is also establishing an internal development process.

The $52 million infusion ought to do a lot to push that forward, Feinberg wrote in an email:

These funds allow us to execute on a number of critical objectives, most importantly further pioneering AI technologies for drug development and advancing our therapeutics pipeline. We will be hiring more top notch AI researchers, software engineers, medicinal chemists and biotech talent, as well as building our own research labs.

Other companies are doing simulations as well and barking up the same tree, but Feinberg says Genesis has at least two legs up on them, despite the competition raising hundreds of millions and existing for years.

"We're the only company in the space that's working at the intersection of modern deep neural network approaches and biophysical simulation: conformational change of ligands and proteins," he said. "And we're bringing this super technical platform to experts who have taken FDA-approved drugs to market. We've seen tremendous value creation just from that; the chemists inform the AI too."

The recent breakthrough of AlphaFold, which is performing the complex task of simulating protein folding far faster than any previous system, is as exciting to Feinberg as to everyone else in the field.

"As scientists, we are incredibly excited by recent progress in protein structure prediction. It is an important basic science advance that will ultimately have important downstream benefits to the development of novel therapeutics," he wrote. "Since our Dynamic PotentialNet technology is unique in how it leverages 3D structural information of proteins, computational protein folding, similar to recent progress in cryo-EM, is a nice complementary tailwind for the Genesis AI Platform. We applaud all efforts to make protein structure more accessible such that therapeutics can be more easily developed for patients of all conditions."

Also participating in the funding round were T. Rowe Price Associates, Andreessen Horowitz (who led the seed round), Menlo Ventures and Radical Ventures.

‘Stunning advance’ on ‘protein folding’: A 50-year-old science problem solved and that could mean big things – USA TODAY

Posted on December 4, 2020 by Prof Baldwin

A breakthrough on protein folding could unlock new possibilities into disease understanding and drug discovery, among other fields. (Photo: DeepMind)

A new discovery about "protein folding" could unlock a world of possibilities into the understanding of everything from diseases to drugs, researchers say.

The breakthrough that is sending ripples of excitement through the science and medical communities this week deals with the shapes that tiny proteins in our bodies, essential to all life, fold into.

The so-called "protein-folding problem" has puzzled scientists for five decades, and the discovery this week from the London-based artificial intelligence lab DeepMind has been heralded as a major milestone.

"This computational work represents a stunning advance on the protein-folding problem, a 50-year old grand challenge in biology," said Venki Ramakrishnan, president of the U.K.'s Royal Society. "It has occurred decades before many people in the field would have predicted. It will be exciting to see the many ways in which it will fundamentally change biological research."

Proteins are essential to life, supporting practically all of its functions, according to DeepMind, which is owned by Google. They are large, complex molecules, made up of chains of amino acids, and what a protein does largely depends on its unique 3D structure.

The ability to predict protein structures accurately enables a better understanding of what they do and how they work.

When proteins are translated from their DNA codes, they quickly transform from a non-functional, unfolded state into their folded, functional state. Problems in folding can lead to diseases such as Alzheimer's and Parkinson's.

The company's breakthrough essentially means that it figured out how to use artificial intelligence to deliver relatively quick answers to questions about protein structure and function that would take many months or years to solve using currently available methods, according to STAT News.

DeepMind's program, called AlphaFold, outperformed about 100 other teams in a biennial protein-structure prediction challenge called CASP, short for Critical Assessment of Structure Prediction, according to the journal Nature.

"We have been stuck on this one problem, how do proteins fold up, for nearly 50 years," said University of Maryland professor John Moult, co-founder and chair of CASP. "To see DeepMind produce a solution for this, having worked personally on this problem for so long and after so many stops and starts wondering if we'd ever get there, is a very special moment."

Researchers from DeepMind plan to publish their results in a peer-reviewed journal in the near future.

Read or Share this story: https://www.usatoday.com/story/news/nation/2020/12/03/protein-folding-discovery-major-breakthrough-deepmind/3809693001/

Image of the Month: The right place of human Man1b1 – Baylor College of Medicine News

Posted on November 5, 2020 by Prof Baldwin

Location, location, location! It is especially important in the world of cells. The Man1b1 protein, known to be involved in regulating a balanced, functional network of cellular proteins, was assumed to localize in the endoplasmic reticulum.

Dr. Richard Sifers's group challenged this widespread view by showing that Man1b1 is actually located in the Golgi, a cellular structure functionally associated with but physically separate from the endoplasmic reticulum. The findings sharpened the appreciation of the dynamic process that regulates protein folding and the handling of misfolded, defective proteins, known to be involved in a number of conditions such as conformational diseases.

Conformational diseases include common conditions associated with accumulation of defective proteins, including neurological disorders, such as Alzheimer's disease. Human Man1b1 has been linked to the causes of multiple congenital disorders of intellectual disability and HIV infection, and to poor prognosis in patients with bladder cancer. A better understanding of how Man1b1 works can potentially open new doors into developing improved treatments.

Learn more about the research conducted at the Sifers lab here, including the recent discovery of an unexpected new function of Man1b1.

Dr. Richard Sifers is professor of pathology & immunology and member of the Dan L Duncan Comprehensive Cancer Center at Baylor College of Medicine.

By Ana María Rodríguez, Ph.D.

More:
Image of the Month: The right place of human Man1b1 - Baylor College of Medicine News

Real Progress In Crowdsourcing Scientific Tasks To Gamers – Bio-IT World

Posted on November 5, 2020 by Prof Baldwin

By Deborah Borfitz

November 4, 2020 | Gaming and science, two seemingly incompatible areas of activity, have come together nicely in the case of citizen science games such as Foldit, Phylo, and Borderlands Science, as reported by academics close to the action who presented at the recent Bio-IT World Conference & Expo Virtual. The games are all played online, involve analyzing large sets of data, and endeavor to solve real scientific problems. And players get credit individually (when willing) or as a crowd when findings appear in scholarly, peer-reviewed publications.

What's not to love about the concept? It's certainly a great way to redirect the attention of people already spending untold hours on video games, says Seth Cooper, assistant professor in the Khoury College of Computer Sciences at Northeastern University. A pioneer of the field of scientific discovery games, he has demonstrated that video game enthusiasts are able to outperform purely computational methods for certain types of structural biochemistry problems, effectively codify their strategies, and integrate with the lab to help design real synthetic proteins.

Cooper is co-creator of Foldit, where the competition is about protein folding and design. It's hard for a computer to search all the possibilities without the aid of human creativity and reason, he says. The game is built on chemistry software called Rosetta and has been out for over a decade with more than half a million players, Cooper continues. It has evolved into a multi-institutional collaboration.

The goal, as with most games, is to get a high score, Cooper says. Players compete, and often collaborate, to build the best protein structures.

The process begins with a biochemist identifying a problem that gets turned into a game or puzzle that gets posted online, he explains. Each puzzle is only available for about a week, and generally a couple are up for play at any one time. Data generated by the Foldit players continually improve the game for better scientific results, Cooper notes. The levels of play get progressively harder.

Anyone can participate and most have no formal background in biochemistry, yet they're contributing to science, he says. Back in 2011, players famously came up with an elegant, low-energy model for a monkey-virus enzyme, solving a longstanding scientific problem potentially useful for the design of retroviral drugs for AIDS, and accomplished the feat inside of three weeks.

Players have also successfully redesigned existing enzymes, Cooper adds, as well as designed several protein structures from scratch that have been confirmed by X-ray crystallography. They're now working on designing an enzyme that will bind to the spike protein of SARS-CoV-2.

Vanderbilt University is also using Foldit to design small molecules and the University of California, Davis is studying the impact of adding a narrative to the competition. In the future, Cooper says, Foldit users might start working in a virtual reality environment. An educational version of Foldit with more contextual science information is available for classroom use, says Cooper, as is a standalone version that is completely separate from the game.

Burning Task Use

At McGill University, associate professor and computational scientist Jerome Waldispuehl is championing the gamification of genomics research with citizen science video game Phylo and its newest iteration called Borderlands Science. His focus is on multiple sequence alignment, one of the most challenging problems in bioinformatics that involves discovering similarities between a set of protein or DNA sequences.

Phylo presents players with DNA puzzles where they manipulate patterns consisting of colored tiles so that they almost forget the scientific context, Waldispuehl says. The abstraction task is to minimize the mismatch of colors to avoid a penalty.
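The "minimize the mismatch" objective can be sketched in Python with sum-of-pairs scoring, a common textbook way to score a multiple sequence alignment. The scoring values and the example rows below are assumptions chosen for illustration; the article does not describe Phylo's actual scoring rules.

```python
from itertools import combinations

GAP = "-"


def column_score(column, match=1, mismatch=-1, gap=-2):
    """Score one alignment column by comparing every pair of symbols in it."""
    score = 0
    for a, b in combinations(column, 2):
        if a == GAP and b == GAP:
            continue  # two gaps facing each other are ignored
        if a == GAP or b == GAP:
            score += gap
        elif a == b:
            score += match
        else:
            score += mismatch
    return score


def alignment_score(rows, **scoring):
    """Sum-of-pairs score of an alignment whose rows all have equal length."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows must have the same length")
    return sum(column_score(col, **scoring) for col in zip(*rows))


# Sliding the gap in the second row lines up more matching "tiles".
before = ["ACGT", "AGT-"]
after = ["ACGT", "A-GT"]
print(alignment_score(before), alignment_score(after))
```

Here each letter plays the role of a colored tile: a player's move corresponds to shifting where the gaps sit in a row, and a better arrangement yields a higher score.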

Every alignment submitted by players is eventually reinserted into an existing algorithm as an optimization, says Waldispuehl. Alignments up for play contain sections of human DNA thought to be linked to various genetic disorders. Since 2010, Phylo has had 350,000 participants and generated one million solutions by improving alignments by 40%-95% over a computer algorithm, he reports.

Borderlands Science, launched in April for purposes of education and science outreach, quickly hit the one million mark with players and has come up with 50 million solutions, he adds. Collaborators include video game science company Massively Multiplayer Online Science, Gearbox Software and The Microsetta Initiative of the University of California, San Diego.

The Borderlands version of the game is played vertically rather than horizontally and rewards success with in-game currency that is important to some players, Waldispuehl says. It is currently aimed at improving 16S ribosomal RNA gene sequences from human microbiome alignments.

Go here to read the rest:
Real Progress In Crowdsourcing Scientific Tasks To Gamers - Bio-IT World

Angelika Amon, cell biologist who pioneered research on chromosome imbalance, dies at 53 – MIT News

Posted on October 31, 2020 by Prof Baldwin

Angelika Amon, professor of biology and a member of the Koch Institute for Integrative Cancer Research, died on Oct. 29 at age 53, following a two-and-a-half-year battle with ovarian cancer.

"Known for her piercing scientific insight and infectious enthusiasm for the deepest questions of science, Professor Amon built an extraordinary career and in the process, a devoted community of colleagues, students and friends," MIT President L. Rafael Reif wrote in a letter to the MIT community.

"Angelika was a force of nature and a highly valued member of our community," reflects Tyler Jacks, the David H. Koch Professor of Biology at MIT and director of the Koch Institute. "Her intellect and wit were equally sharp, and she brought unmatched passion to everything she did. Through her groundbreaking research, her mentorship of so many, her teaching, and a host of other contributions, Angelika has made an incredible impact on the world, one that will last long into the future."

A pioneer in cell biology

From the earliest stages of her career, Amon made profound contributions to our understanding of the fundamental biology of the cell, deciphering the regulatory networks that govern cell division and proliferation in yeast, mice, and mammalian organoids, and shedding light on the causes of chromosome mis-segregation and its consequences for human diseases.

Human cells have 23 pairs of chromosomes, but as they divide they can make errors that lead to too many or too few chromosomes, resulting in aneuploidy. Amon's meticulous and rigorous experiments, first in yeast and then in mammalian cells, helped to uncover the biological consequences of having too many chromosomes. Her studies determined that extra chromosomes significantly impact the composition of the cell, causing stress in important processes such as protein folding and metabolism, and leading to additional mistakes that could drive cancer. Although stress resulting from aneuploidy affects cells' ability to survive and proliferate, cancer cells, which are nearly universally aneuploid, can grow uncontrollably. Amon showed that aneuploidy disrupts cells' usual error-repair systems, allowing genetic mutations to quickly accumulate.

Aneuploidy is usually fatal, but in some instances extra copies of specific chromosomes can lead to conditions such as Down syndrome and developmental disorders including those known as Patau and Edwards syndromes. This led Amon to work to understand how these negative effects result in some of the health problems associated specifically with Down syndrome, such as acute lymphoblastic leukemia. Her expertise in this area led her to be named co-director of the recently established Alana Down Syndrome Center at MIT.

"Angelika's intellect and research were as astonishing as her bravery and her spirit. Her lab's fundamental work on aneuploidy was integral to our establishment of the center," says Li-Huei Tsai, the Picower Professor of Neuroscience and co-director of the Alana Down Syndrome Center. "Her exploration of the myriad consequences of aneuploidy for human health was vitally important and will continue to guide scientific and medical research."

Another major focus of research in the Amon lab has been on the relationship between how cells grow, divide, and age. Among other insights, this work has revealed that once cells reach a certain large size, they lose the ability to proliferate and are unable to reenter the cell cycle. Further, this growth contributes to senescence, an irreversible cell cycle arrest, and tissue aging. In related work, Amon has investigated the relationships between stem cell size, stem cell function, and tissue age. Her lab's studies have found that in hematopoietic stem cells, small size is important to cells' ability to function and proliferate (in fact, she posted recent findings on bioRxiv earlier this week), and have been examining the same questions in epithelial cells as well.

Amon lab experiments delved deep into the mechanics of the biology, trying to understand the mechanisms behind their observations. To support this work, she established research collaborations to leverage approaches and technologies developed by her colleagues at the Koch Institute, including sophisticated intestinal organoid and mouse models developed by the Yilmaz Laboratory, and a microfluidic device developed by the Manalis Laboratory for measuring physical characteristics of single cells.

The thrill of discovery

Born in 1967, Amon grew up in Vienna, Austria, in a family of six. Playing outside all day with her three younger siblings, she developed an early love of biology and animals. She could not remember a time when she was not interested in biology, initially wanting to become a zoologist. But in high school, she saw an old black-and-white film from the 1950s about chromosome segregation, and found the moment that the sister chromatids split apart breathtaking. She knew then that she wanted to study the inner workings of the cell and decided to focus on genetics at the University of Vienna in Austria.

After receiving her BS, Amon continued her doctoral work there under Professor Kim Nasmyth at the Research Institute of Molecular Pathology, earning her PhD in 1993. From the outset, she made important contributions to the field of cell cycle dynamics. Her work on yeast genetics in the Nasmyth laboratory led to major discoveries about how one stage of the cell cycle sets up for the next, revealing that cyclins, proteins that accumulate within cells as they enter mitosis, must be broken down before cells pass from mitosis to G1, a period of cell growth.

Towards the end of her doctorate, Amon became interested in fruitfly genetics and read the work of Ruth Lehmann, then a faculty member at MIT and a member of the Whitehead Institute. Impressed by the elegance of Lehmann's genetic approach, she applied and was accepted to her lab. In 1994, Amon arrived in the United States, not knowing that it would become her permanent home or that she would eventually become a professor.

While Amon's love affair with fruitfly genetics would prove short, her promise was immediately apparent to Lehmann, now director of the Whitehead Institute. "I will never forget picking Angelika up from the airport when she was flying in from Vienna to join my lab. Despite the long trip, she was just so full of energy, ready to talk science," says Lehmann. "She had read all the papers in the new field and cut through the results to hit equally on the main points."

But as Amon frequently was fond of saying, "yeast will spoil you." Lehmann explains that "because they grow so fast and there are so many tools, your brain is the only limitation. I tried to convince her of the beauty and advantages of my slower-growing favorite organism. But in the end, yeast won and Angelika went on to establish a remarkable body of work, starting with her many contributions to how cells divide and more recently to discover a cellular aneuploidy program."

In 1996, after Lehmann had left for New York University's Skirball Institute, Amon was invited to become a Whitehead Fellow, a prestigious program that offers recent PhDs resources and mentorship to undertake their own investigations. Her work on the question of how yeast cells progress through the cell cycle and partition their chromosomes would be instrumental in establishing her as one of the world's leading geneticists. While at Whitehead, her lab made key findings centered around the role of an enzyme called Cdc14 in prompting cells to exit mitosis, including that the enzyme is sequestered in a cellular compartment called the nucleolus and must be released before the cell can exit.

"I was one of those blessed to share with her a 'eureka moment,' as she would call it," says Rosella Visintin, a postdoc in Amon's lab at the time of the discovery and now an assistant professor at the European School of Molecular Medicine in Milan. "She had so many. Most of us are lucky to get just one, and I was one of the lucky ones. I'll never forget her smile and scream (neither will the entire Whitehead Institute) when she saw for the first time Cdc14 localization: 'You did it, you did it, you figured it out!' Passion, excitement, joy: everything was in that scream."

In 1999, Amon's work as a Whitehead Fellow earned her a faculty position in the MIT Department of Biology and the MIT Center for Cancer Research, the predecessor to the Koch Institute. A full professor since 2007, she also became the Kathleen and Curtis Marble Professor in Cancer Research, associate director of the Paul F. Glenn Center for Biology of Aging Research at MIT, a member of the Ludwig Center for Molecular Oncology at MIT, and an investigator of the Howard Hughes Medical Institute.

Her pathbreaking research was recognized by several awards and honors, including the 2003 National Science Foundation Alan T. Waterman Award, the 2007 Paul Marks Prize for Cancer Research, the 2008 National Academy of Sciences (NAS) Award in Molecular Biology, and the 2013 Ernst Jung Prize for Medicine. In 2019, she won the Breakthrough Prize in Life Sciences and the Vilcek Prize in Biomedical Science, and was named to the Carnegie Corporation of New York's annual list of Great Immigrants, Great Americans. This year, she was given the Human Frontier Science Program Nakasone Award. She was also a member of the NAS and the American Academy of Arts and Sciences.

Lighting the way forward

Amon's perseverance, deep curiosity, and enthusiasm for discovery served her well in her roles as teacher, mentor, and colleague. She has worked with many labs across the world and developed a deep network of scientific collaboration and friendships. She was a sought-after speaker for seminars and the many conferences she attended. In over 20 years as a professor at MIT, she has mentored more than 80 postdocs, graduate students, and undergraduates, and received the School of Science's undergraduate teaching prize.

"Angelika was an amazing, energetic, passionate, and creative scientist, an outstanding mentor to many, and an excellent teacher," says Alan Grossman, the Praecis Professor of Biology and head of MIT's Department of Biology. "Her impact and legacy will live on and be perpetuated by all those she touched."

"Angelika existed in a league of her own," explains Kristin Knouse, one of Amon's former graduate students and a current Whitehead Fellow. "She had the energy and excitement of someone who picked up a pipette for the first time, but the brilliance and wisdom of someone who had been doing it for decades. Her infectious energy and brilliant mind were matched by a boundless heart and tenacious grit. She could glance at any data and immediately deliver a sharp insight that would never have crossed any other mind. Her positive attributes were infectious, and any interaction with her, no matter how transient, assuredly left you feeling better about yourself and your science."

Taking great delight in helping young scientists find their own eureka moments, Amon was a fearless advocate for science and the rights of women and minorities and inspired others to fight as well. She was not afraid to speak out in support of the research and causes she believed strongly in. She was a role model for young female scientists and spent countless hours mentoring and guiding them in a male-dominated field. While she graciously accepted awards for women in science, including the Vanderbilt Prize and the Women in Cell Biology Senior Award, she questioned the value of prizes focused on women as women, rather than on their scientific contributions.

"Angelika Amon was an inspiring leader," notes Lehmann, "not only by her trailblazing science but also by her fearlessness to call out sexism and other -isms in our community. Her captivating laugh and unwavering mentorship and guidance will be missed by students and faculty alike. MIT and the science community have lost an exemplary leader, mentor, friend, and mensch."

Amon's wide-ranging curiosity led her to consider new ideas beyond her own field. In recent years, she has developed a love for dinosaurs and fossils, and often mentioned that she would like to study terraforming, which she considered essential for humans to live successfully on other planets.

"It was always amazing to talk with Angelika about science, because her interests were so deep and so broad, her intellect so sharp, and her enthusiasm so infectious," remembers Vivian Siegel, a lecturer in the Department of Biology and friend since Amon's postdoctoral days. "Beyond her own work in the lab, she was fascinated by so many things, including dinosaurs (dreaming of taking her daughters on a dig), lichen, and even life on Mars."

"Angelika was brilliant; she illuminated science and scientists," says Frank Solomon, professor of biology and member of the Koch Institute. "And she was intense; she warmed the people around her, and expanded what it means to be a friend."

Amon is survived by her husband, Johannes Weis; her daughters, Theresa and Clara Weis; and her three siblings and their families.

Read this article:
Angelika Amon, cell biologist who pioneered research on chromosome imbalance, dies at 53 - MIT News

If AlphaFold Is a Product of Design, Maybe Our Bodies Are Too – Walter Bradley Center for Natural and Artificial Intelligence

Posted on October 31, 2020 by Prof Baldwin

Recently, we've been looking at tech philosopher George Gilder's new Gaming AI, about what AI can, and can't, do for us. It can't do our thinking for us, but it can do many jobs we don't even try because no human being has enough time or patience to motor through all the calculations.

Which brings us to the massive complexity of the proteins that carry out our genetic instructions, better knowledge of which would help us battle many diseases.

Gilder notes that when DeepMind's AlphaGo beat humans at the board game Go in 2016, it wasn't just for the fun of winning a game. DeepMind cofounder Demis Hassabis is more interested in real-life uses such as medical research (p. 11). The human body is very complex and a researcher can be confronted with thousands of possibilities. Which ones matter?

The area the DeepMind team decided to focus on is protein folding: Human DNA has 64 codons that program little machines in our cells (ribosomes) to create specific proteins out of the standard twenty amino acids. But, to do their jobs, the proteins fold themselves into many, many different shapes. Figuring it all out is a real problem for researchers and the DeepMind crew hope that AI will help:

Over the past five decades, researchers have been able to determine shapes of proteins in labs using experimental techniques like cryo-electron microscopy, nuclear magnetic resonance and X-ray crystallography, but each method depends on a lot of trial and error, which can take years of work, and cost tens or hundreds of thousands of dollars per protein structure. This is why biologists are turning to AI methods as an alternative to this long and laborious process for difficult proteins. The ability to predict a protein's shape computationally from its genetic code alone, rather than determining it through costly experimentation, could help accelerate research.

As Gilder recounts, the biotech industry conducts annual global protein-folding competitions among molecular biologists and in 2019 DeepMind defeated all teams of relatively unaided human rivals:

Advancing from the unaided human level of two or three correct protein configurations out of forty, DeepMind calculated some thirty-three correct solutions out of forty. This spectacular advance opens the way to major biotech gains in custom-built protein molecules adapted to particular people with particular needs or diseases. It is the most significant biotech invention since the complementary CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) method for using enzymes directly to edit strands of DNA.

But now that we have found a way to tackle one aspect of the immense complexity of human bodily existence, here's an interesting problem to think about: We are told by many philosophers that life came to exist on Earth purely by chance. How likely is that, given the intricacy of the machinery that governs our bodies?

Kirk Durston, a biophysicist who studies protein folds, comments:

As we all know from probabilities, you can get lucky once, but not thousands of times

As real data shows, the probability of finding a functional sequence for one average protein family is so low, there is virtually zero chance of obtaining it anywhere in this universe over its entire history, never mind finding thousands of protein families.

Yet that's what we have. All those protein families. As we learn more about the world we live in, we may find ourselves confronting more challenges like this: We had to invent a really complex machine to even begin to figure out protein folding in our bodies, and we know that the machine did not happen by chance. So why should we believe that our bodies happened that way? Probably not.

Note: While medicine may be the most important way AI can help us, it also helps us in other areas where huge numbers of calculations are essential for success. For example, it can help recover lost languages and interpret charred scrolls. It can continuously scan the skies, sparing astronomers for more human-friendly work like interpreting the results. It can restore blurred images and help with cold case files. As with anything, the trick is to take advantage of what it can really do. We don't need the courtroom sentencing robot or the AI Jesus, but then we never did. As our information resources become larger and more complex, we do need some help with the sheer volume, and that's where AI is bound to succeed.

Regulation of chaperone function by coupled folding and oligomerization – Science Advances

Posted on October 28, 2020 by Prof Baldwin

Abstract

The homotrimeric molecular chaperone Skp of Gram-negative bacteria facilitates the transport of outer membrane proteins across the periplasm. It has been unclear how its activity is modulated during its functional cycle. Here, we report an atomic-resolution characterization of the Escherichia coli Skp monomer-trimer transition. We find that the monomeric state of Skp is intrinsically disordered and that formation of the oligomerization interface initiates folding of the α-helical coiled-coil arms via a unique "stapling" mechanism, resulting in the formation of active trimeric Skp. Native client proteins contact all three Skp subunits simultaneously, and accordingly, their binding shifts the Skp population toward the active trimer. This activation mechanism is shown to be essential for Salmonella fitness in a mouse infection model. The coupled mechanism is a unique example of how an ATP-independent chaperone can modulate its activity as a function of the presence of client proteins.

Molecular chaperones are central for the survival of the cell in all kingdoms of life (1, 2). They are involved in many cellular processes such as helping proteins to fold, preventing protein aggregation, and reducing cellular stress (3). Some chaperones can use adenosine triphosphate (ATP) binding and hydrolysis to trigger conformational changes that, in turn, regulate their functional cycle, including their interaction with client proteins (4). ATP-independent chaperones, in turn, lack this possibility. Nonetheless, some ATP-independent chaperones were found to be regulated by major conformational changes, and the transition mechanisms for the activation of ATP-independent chaperones have been classified into three categories (5): oligomer disassembly [small heat shock protein (sHSP) (6) and trigger factor (TF) (7–9)], order-to-disorder transition {Hsp33 (10), HdeA [HNS (histone-like nucleoid structuring)-dependent expression A] (11), and HdeB (12)}, and lack of major conformational change [spheroplast protein Y (Spy) (13, 14), seventeen kilodalton protein (Skp) (15), HSP40 (16), SecB (17), and survival factor A (SurA) (18)]. These mechanisms of activation are of major biological importance, because constitutively active chaperones can interfere with protein folding processes and proteostasis due to their high affinity and low specificity for client proteins, thus representing a potential hazard to cells (19–22). An example of these detrimental effects has been reported for a constitutively active variant of the chaperone Hsp33, which led to accumulation of large amounts of insoluble aggregates and severe growth disadvantages (20).

Representative of the first category, binding of chaperone sHSP to its client proteins is regulated via a shift from an inactive oligomeric ensemble toward an ensemble of smaller multimers, representing the active species (6). The monomeric species exposes a binding motif that is shielded within the oligomeric structure, making the large oligomeric state an inactive storage form that can be activated upon dissociation (23, 24). Similarly, it has been shown that binding of TF to client proteins is accompanied by a shift from the inactive dimeric state toward the active monomeric state (7–9). By contrast, the order-to-disorder activation is found for chaperones where the active form is intrinsically disordered. Thereby, to shift from the folded inactive chaperone to the unfolded active chaperone, not only the oligomeric state but also the secondary structure of the chaperone is undergoing change, triggered either by a pH drop to acidic conditions (HdeA and HdeB) or by a redox transition (Hsp33). Once stress factors are reduced, these chaperones can return to their folded/oligomeric inactive state with a release of the client (25). The third category contains chaperones for which only one conformational state is known, and therefore, these are assumed to require no major conformational changes for their activation, as well as chaperones for which activation requires only minor conformational changes. One such example is provided by the chaperone Hsp40, which has minimal structural differences between its client-bound and apo state (16). Another example is given by the chaperone SecB, for which high-resolution structures of client-bound states showed only a minor conformational change to the inactive client-free state (17). In the client-free form, helix 2 acts as a lid of the client protein binding site. Upon client binding, this helix swings outward, thereby allowing access to the client binding groove. Similarly, the chaperone SurA has been shown to have a dynamic mechanism of activation where a domain connected to the chaperone core by linkers assists client protein recognition, binding, and release (18).

The periplasmic chaperone Skp is an integral part of outer membrane protein (Omp) biogenesis, on a parallel pathway with the chaperone SurA. Skp transports Omps in their unfolded state across the periplasm toward their insertion point into the outer membrane (26–28). Yersinia skp and Salmonella skp mutants show compromised virulence in rodent infection models, indicating a crucial role of Skp in vivo (29, 30). Skp is structurally characterized by a trimeric oligomeric state with a "jellyfish"-like architecture (31, 32). Each protomer contributes three β-strands toward a nine-stranded β-barrel in the trimerization interface and a long, α-helical "arm," made of two α-helices in coiled-coil arrangement (31, 32). The combination of three arms from the individual subunits leads to formation of a cavity that can accommodate and bind unfolded Omps (15, 33).

The elongated arms of Skp are highly flexible in the apo state, and a recent molecular dynamics study has identified a pivot element to act as a hinge, allowing Skp to adapt to clients of different sizes (15, 34). Upon binding, the Skp arms undergo a rigidification and keep the bound Omps inside the cavity in the "fluid globule" state (15, 35). While Skp can accommodate differently sized protein clients, all functional complexes observed so far feature an Omp:Skp stoichiometry of 1:3 or 1:6, depending on the size of the client, suggesting that Skp always binds clients as a trimer (36). A recent study has emphasized that at physiological concentrations, Skp exists as an equilibrium between a trimeric and a monomeric form (37). The equilibrium was quantified by analytical ultracentrifugation (AUC), showing that the monomeric form is strongly dominant at 2 µM Skp, the concentration found in Escherichia coli stationary phase (38, 39). The monomeric form of Skp has been proposed to be well folded based on indirect evidence (37); however, it has so far not been possible to directly analyze its structure, because at the high concentration required for most biophysical methods, the protein is mostly trimeric. Consequently, the structural features of the Skp monomeric state and the Skp activation mechanism remain poorly understood.

Here, we bypass this analytical challenge by introducing several weakly and non-oligomerizing mutants of Skp. We characterize their monomeric states by solution nuclear magnetic resonance (NMR) spectroscopy at the atomic level. The emerging reference data can then be used to fruitfully understand monomeric Skp(WT). The data show that monomeric Skp is intrinsically disordered and inactive and that binding of a client protein triggers Skp trimerization and activation. Last, we demonstrate that this mechanism is essential for bacterial virulence under in vivo conditions in a mouse infection model. The data thus reveal an essential mechanism regulating Skp chaperone activity by a combined disorder-to-order and oligomerization transition.

To prepare samples of monomeric Skp at concentrations sufficient for structural characterization, we set out to design mutants that would destabilize the oligomerization interface to shift the oligomerization equilibrium toward the monomeric form. The structure of trimeric Skp is stabilized by a network of three β-sheets per subunit that together form the trimerization interface in the head of the molecule (Fig. 1A). We identified the conserved alanine-103 and alanine-108 as promising candidates, because they are located at the oligomerization interface with limited space for their side chains. Their replacement by a bulkier side chain such as leucine or arginine should introduce steric clashes, leading to destabilization of the trimer (Fig. 1, A and B). In addition, we designed the mutant V117P to insert a proline residue, which is a known secondary structure breaker, into the trimerization β-sheet (2). The oligomerization state of each of the Skp mutants Skp(A103L), Skp(A103R), Skp(A108L), Skp(A108R), and Skp(V117P) was determined by SEC-MALS (size-exclusion chromatography coupled to multi-angle light scattering) experiments at an elution concentration of 80 µM. At this concentration, the wild-type (WT) protein is mostly trimeric with a monomeric fraction lower than 4%. The mutant A103L was hardly distinguishable from WT, but the other Skp variants featured a gradually increased monomeric fraction, as evidenced by a smaller apparent mass, in the order A103R, A108L, V117P, and A108R (Fig. 1C). Thereby, mutants A108R and V117P were fully monomeric, and the others had effective molecular weights in between monomer and trimer, suggesting the presence of dynamic equilibria. We quantified the concentration dependence of these equilibria for Skp(A103L), Skp(A103R), and Skp(A108L) by solution NMR spectroscopy and SEC-MALS experiments (Fig. 1, C to E; Table 1; and fig. S1). Skp(WT) followed an equilibrium with C0.5 = 1.5 µM, the protein concentration at which half of the molecules are in the monomeric form, in agreement with published data (37). Skp(A103L) showed a trimer-monomer equilibrium that was essentially identical to WT (Fig. 1E and fig. S1), whereas Skp(A103R) had C0.5 = 7 ± 2 µM and Skp(A108L) had C0.5 = 80 ± 20 µM, indicating that these mutations shifted the equilibrium by about one to two orders of magnitude toward the monomer (Fig. 1, D and E, and fig. S1). The two mutants Skp(A108R) and Skp(V117P) were found to be monomeric at concentrations of even up to 1 mM (Fig. 1E and fig. S1).
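The concentration dependence described here can be illustrated with a short sketch. It assumes a simple two-state mass-action model, 3 M ⇌ T with K = [T]/[M]^3, and derives K from the definition of C0.5; this is an illustrative assumption and not necessarily the fitting equation (Eq. 4) used by the authors. The function names are hypothetical, and all concentrations are in µM.

```python
def association_constant(c_half):
    """K = [T]/[M]^3 for 3 M <-> T, derived from the half-point C0.5.

    At a total protomer concentration of C0.5, half of the protomers are
    monomeric, so [M] = C0.5/2 and [T] = C0.5/6.
    """
    return 4.0 / (3.0 * c_half ** 2)


def monomer_fraction(c_tot, c_half):
    """Fraction f of protomers in the monomeric state at total concentration c_tot."""
    if c_tot <= 0 or c_half <= 0:
        raise ValueError("concentrations must be positive")
    k = association_constant(c_half)
    lo, hi = 0.0, c_tot
    # Bisection on the mass balance [M] + 3*K*[M]^3 = c_tot (monotonic in [M]).
    for _ in range(200):
        m = 0.5 * (lo + hi)
        if m + 3.0 * k * m ** 3 > c_tot:
            hi = m
        else:
            lo = m
    return 0.5 * (lo + hi) / c_tot


def trimer_concentration(c_tot, c_half):
    """Concentration of trimer particles: one-third of the protomers in trimers."""
    return (1.0 - monomer_fraction(c_tot, c_half)) * c_tot / 3.0


# Skp(A108L) with C0.5 = 80 uM, at a few total concentrations (uM).
for c in (5.0, 20.0, 80.0, 1000.0):
    print(c, round(monomer_fraction(c, 80.0), 3))
```

By construction, the model returns a monomer fraction of 0.5 at c_tot = C0.5, and the fraction falls as the total concentration rises, which is the qualitative trend the SEC-MALS and NMR data report.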

(A) Location of the mutation sites [red boxes (I), (II), and (III)], displayed on the Skp crystal structure (Protein Data Bank: 1SG2). Secondary structure elements and termini are indicated. (B) Close-up of the interface between Skp subunits, highlighting the position of the five mutations. See text for details. (C) SEC elution profiles (solid lines, left axis) and MALS apparent molecular mass (MM) (dotted lines, right axis) at elution concentrations of 80 µM and a temperature of 25°C. Dark gray, Skp(WT); brown, Skp(A103L); green, Skp(A103R); blue, Skp(A108L); magenta, Skp(A108R); purple, Skp(V117P). Gray horizontal lines indicate the molecular masses of monomers, dimers, and trimers of Skp. (D) Experiment as in (C) for Skp(A108L) as a function of the elution concentrations: 5, 20, and 79 µM. (E) Fractional populations f of monomers in the monomer-trimer equilibrium as a function of total Skp concentration. Experimental data points from NMR and SEC-MALS are indicated by filled and open circles, respectively. These have been fitted by Eq. 4 for mutants A103R and A108L (solid lines). The corresponding fractional populations of the trimeric state, 1 − f, are shown by dashed lines. Note that the concentration of Skp trimers equals one-third of the concentration of Skp molecules in the trimeric state, i.e., [Skp]trimer = 1/3 × (1 − f) × [Skp]tot. For Skp(WT) and Skp(A103L), the data follow the WT association constant published by Sandlin et al. (37).

Error estimates have been omitted for clarity. n.d., not determined.

We then characterized the structural integrity of Skp(A103L), Skp(A103R), and Skp(A108L) in their trimeric forms by NMR spectroscopy. For each of these proteins, two-dimensional (2D) [15N,1H]-TROSY (transverse relaxation-optimized spectroscopy) fingerprint spectra show the presence of two species in slow exchange on the NMR time scale, i.e., with kinetic exchange rate constants kex < 10 s⁻¹ (fig. S1). For each of the mutants, the overlay of the NMR spectra at 25°C (fig. S2) shows a high degree of similarity with the WT protein for most resonances, with considerable chemical shift perturbations only for some residues. Those residues are all located in spatial vicinity of the mutation site, in full agreement with the expected local distortion effects of single point mutations (fig. S2). The signals of residues located in the arms are not affected by the mutations, suggesting that symmetry and structural integrity of the trimeric form of the protein are maintained in the mutants. Oligomeric states other than the monomer and the trimer were not detected. The mutations thus shift the oligomerization equilibrium while leaving the trimeric form largely intact.

The mutant Skp(A108L) with a C0.5 of 80 ± 20 µM at 25°C allowed us to prepare the monomeric state at concentrations of 100 µM and above, which is required for solution NMR spectroscopy. The NMR spectra of monomeric Skp(A108L) overlap completely with those of the monomeric, low-abundance conformation of Skp(WT) (Fig. 2A), indicating that the conformations are essentially identical and thus validating the further analysis. Increasing the temperature from 25 to 37°C shifted the equilibrium of Skp(A108L) further toward the monomer, resulting in around 95% monomer at a concentration of 1 mM and thus further increasing the NMR signal intensity (fig. S2). A primary classification of the type of conformational state of monomeric Skp was obtained from the observation of a narrow chemical shift dispersion of backbone amide NMR signals, which is characteristic for proteins with low structural propensity (fig. S3). To quantify the secondary structure elements, we established complete sequence-specific resonance assignments of the monomeric state (fig. S3) and determined backbone 13Cα and 13Cβ secondary chemical shifts (Fig. 2B). These show that the three β-sheets that constitute the oligomerization interface in the trimeric form are in random-coil conformation in the monomeric state. Furthermore, the four α-helices forming the arms of Skp are in a fast conformational exchange between folded and unfolded conformations, as evidenced by the observation of a single set of resonances in fast exchange. Taking the fully denatured form of Skp in 8 M urea solution and the folded trimer as reference points, the residual helicity can be quantified for each residue (Fig. 2, B and C). The analysis shows that the helices 1, 3.B, and 4, which are closest to the trimerization interface, feature a residual helicity of <20%, while the helices 2.A, 2.B, and 3.A located at the tip of the arms display a helical population of 20 to 30% (Fig. 2, C and D). Overall, these data show that a small amount of residual α-helical structure is present in the disordered Skp monomers, but that the complete formation of the helices requires the trimerization interface. They demonstrate that the monomeric state of Skp is intrinsically disordered with some residual helical propensity located at the tip of the arms. In the trimeric structure, the circular β-barrel interface, connecting the N- and C-terminal part of the protein, brings helices 2 and 3 close together in space and thus stabilizes their secondary structure (Fig. 2E). This unique mechanism resembles a "stapling" of the coiled-coil helices to the barrel in the head domain.
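The two-reference-point quantification of residual helicity can be sketched as a linear interpolation. This is a minimal illustration that assumes the helical population scales linearly between the urea-denatured value (0% helix) and the folded-trimer value (100% helix); the function name and all chemical shift values are hypothetical and are not taken from the paper.

```python
def residual_helicity(delta_monomer, delta_urea, delta_trimer):
    """Helical population of one residue, from secondary chemical shifts (ppm).

    Linear interpolation between the unfolded reference (urea, 0.0) and the
    folded reference (trimer, 1.0), clamped to the physical range [0, 1].
    """
    span = delta_trimer - delta_urea
    if abs(span) < 1e-9:
        raise ValueError("references coincide; helicity is undefined here")
    fraction = (delta_monomer - delta_urea) / span
    return min(1.0, max(0.0, fraction))


# Hypothetical secondary shifts (ppm) for three residues of a helical segment:
# (monomer, urea-denatured, folded trimer)
residues = [(0.7, 0.1, 3.1), (0.9, 0.0, 3.0), (0.4, 0.1, 2.6)]

populations = [residual_helicity(m, u, t) for m, u, t in residues]
average = sum(populations) / len(populations)
print([round(p, 2) for p in populations], round(average, 2))
```

Averaging the per-residue values over a helix gives a single helical population for that segment, comparable in spirit to the 10 to 30% ranges quoted above.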

(A) Sections of 2D [15N,1H]-TROSY spectra of [U-2H,15N] Skp(WT) (dark gray) and Skp(A108L) (blue) at a concentration of 1 mM and 37°C in NMR buffer (20 mM MES, pH 6.5, and 150 mM NaCl). NMR signals of the monomeric state of Skp(WT) overlay with those of Skp(A108L). The assignments of the overlapping NMR signals of the monomeric state are indicated in the panel. (B) Residue-specific secondary backbone chemical shifts of Skp(WT) in 8 M urea solution, Skp(A108L) in its monomeric form, and Skp(WT) in its trimeric form. Positive and negative values indicate α-helical and β-sheet secondary structure elements, respectively. The gray-shaded area indicates the positions of helices in the Skp trimer. (C) Percentage of helical population in the conformational ensemble of the Skp monomer. Helical regions with 10 to 20% helicity or 20 to 30% helicity are highlighted with light or dark yellow, respectively. (D) Structural model of the Skp monomer. On a configuration of Skp with α-helices formed, the degree of residual helical population present in the conformational ensemble is indicated. The large majority of monomeric Skp is disordered. (E) Schematic model of coupled oligomerization and folding mechanism of Skp. Monomeric Skp explores an ensemble of conformations with a low propensity for the formation of the arm α-helices. The formation of the oligomerization interface brings the N and C termini together (red arrows), thus stabilizing the coiled-coil structure of the α-helical arms.

It has been previously proposed that the monomeric state of Skp would be well folded rather than disordered (37). That conclusion was obtained from indirect measurements of the molar heat capacity change ΔCp of trimer formation by a van't Hoff analysis of temperature-dependent AUC data, which indicated a value of ΔCp = 0.62 ± 0.11 kcal mol⁻¹ K⁻¹ for the Skp monomer-trimer transition. Because the authors expected a value for a coupled folding and oligomerization of ΔCp = 8.01 ± 3.3 kcal mol⁻¹ K⁻¹, they concluded that only trimerization, but not folding, would take place during oligomerization. To resolve these different views, we determined ΔCp of Skp(WT) directly by differential scanning calorimetry (DSC) as ΔCp = 2.9 ± 0.4 kcal mol⁻¹ K⁻¹ at 37°C (fig. S3). Considering the average residual helicity of 21% in the monomer, this corresponds to a value of 1.1 kcal mol⁻¹ K⁻¹ for folding of one monomer subunit, which is similar to the values for proteins of the same size (40, 41). We note that ΔCp is strongly temperature dependent (fig. S3), which may have perturbed the precision of the van't Hoff analysis by Sandlin et al. (37).

Having established that Skp activation comprises an equilibrium between a folded trimer and a disordered monomer, it appears relevant to understand how this equilibrium contributes to Skp chaperone activity. As a model client, we use the native client protein tOmpA, an eight-stranded transmembrane domain of OmpA. tOmpA, when bound to Skp, adopts a conformational ensemble of rapidly reorienting conformers (15). To investigate whether tOmpA binds to the trimeric or the monomeric state of Skp, or to both, we used an activity assay with all mutants. In a first step, we measured the chaperone activity by quantifying the amount of Skp-bound tOmpA. Intriguingly, the activity correlated with the concentration of the trimer for all Skp variants, such that, e.g., Skp(A108L) has around 50% of the Skp(WT) activity and that no chaperone activity could be detected for Skp(V117P) and Skp(A108R) (Fig. 3, A and B, and fig. S4).

(A) Holdase activity of Skp variants as determined by the amount of aggregation-prone tOmpA solubilized in equilibrium. Values are normalized to the activity of Skp(WT). Error bars represent the SD of 15 individual signals of tOmpA. (B) 2D [15N,1H]-TROSY fingerprint spectra of [U-2H,15N]-tOmpA bound to unlabeled Skp(WT) or Skp(A108L). Spectra were recorded at a temperature of 37°C in NMR buffer (20 mM MES, pH 6.5, and 150 mM NaCl). A 1D 1H cross section shows the intensity of alanine-176. (C) Combined amide chemical shift differences between [U-2H,15N]-Skp(WT) and [U-2H,15N]-Skp(A108L) with bound unlabeled tOmpA. The magnitude of 2 SDs [0.053 parts per million (ppm)] is indicated by a dashed line. (D) Structural model of Skp(A108L) with bound tOmpA. Amide groups with chemical shift changes larger than 2 SDs upon binding of tOmpA to Skp(A108L) are marked in light blue. The position of A108 is indicated by a blue circle. (E and F) 2D [15N,1H]-TROSY fingerprint spectra of [U-2H,15N] Skp(A108L) in the absence (E) and presence (F) of unlabeled tOmpA. Spectra were recorded at 37°C in NMR buffer. The spectral area 7.5 to 8.5 ppm in 1H, corresponding to disordered protein states, is indicated by gray lines. 1D 1H cross sections of lysine-141 in the monomeric (M) and trimeric (T) state of Skp are shown, and the relative fractions are indicated.

We then selected Skp(A108L) to characterize the structure and arrangement of the tOmpA-Skp(A108L) complex. First, addition of tOmpA to Skp(A108L) increases the apparent molecular mass in SEC-MALS experiments (fig. S4). Second, the 2D [15N,1H]-TROSY NMR spectra of isotope-labeled tOmpA bound to unlabeled Skp(A103L), Skp(A103R), Skp(A108L), or unlabeled Skp(WT) are highly similar (Fig. 3B and fig. S4). Because the chemical shift is a population-weighted average over the individual conformers in the tOmpA ensemble, this observation indicates that the client conformational ensemble inside the chaperone is essentially unperturbed by the local structural adaptations resulting from the mutation A103L, A103R, or A108L. Third, a direct spectral comparison showed that the chemical shift perturbations that occur on the Skp trimeric state upon tOmpA binding are highly similar for Skp(WT), Skp(A103L), Skp(A103R), and Skp(A108L) (Fig. 3, C and D, and fig. S4). As in the apo state, only one set of NMR signals is present for the trimeric state, showing that the complex with tOmpA does not involve other stable oligomeric states (Fig. 4, D to G). Furthermore, for all mutants with a considerable population of the trimeric state, binding of tOmpA induces similar chemical shift perturbations, confirming a similar mode of binding (Fig. 3, C and D, and fig. S3). As a consequence, the structural description that was previously established for the Skp-tOmpA complex (15) can be assumed in good first-order approximation also for Skp(A108L), although the thermodynamics and kinetics of the ensemble are somewhat different (Fig. 3, B to D).

(A) Fitness of Salmonella strains with various chromosomal skp mutations in rich lysogeny broth. Data for individual cultures and means are shown. (B) Fitness of Salmonella strains in a mouse infection model. Each circle represents data for one mouse from a total of two independent infection experiments (****P < 0.0001 and ***P < 0.001; statistical significance of difference to values for WT based on t test with Holm-Šidák correction for multiple comparisons). Corresponding competitive index data are shown in fig. S4. (C) Functional cycle of Skp. In the absence of client proteins, Skp populates the periplasm in monomeric form up to low micromolar concentrations. These partially disordered monomers are functionally inactive. An emerging Omp client at the inner membrane recruits an active trimeric chaperone from the ensemble equilibrium. Upon release of the client, trimeric Skp dissociates and the monomers enter the pool of inactive disordered conformations. See text for details.

Then, we investigated the effect of tOmpA binding on the Skp monomer-trimer equilibrium at a temperature of 37°C, where Skp(A108L) is more than 80% in its monomeric state and Skp(A108R) and Skp(V117P) are completely monomeric (Fig. 3, E and F, and fig. S4). For Skp(A108L), binding of tOmpA resulted in a strong shift of the population levels from the monomeric toward the trimeric state, while no change was observed for Skp(A108R) and Skp(V117P) (Fig. 3, E and F, and fig. S4). Furthermore, for all Skp variants with considerable population of the monomeric state, the NMR signal positions of the monomeric state were not perturbed by the addition of tOmpA, confirming that there is no detectable interaction between monomeric Skp and the Omp client (fig. S3). This is additional evidence that only the structured trimer, but not the disordered monomer, has chaperone activity.

Because a bound tOmpA client is in direct contact with all three arms of Skp simultaneously (15), client binding contributes by avidity to the thermodynamic stability of the trimeric state of the chaperone. We quantified the difference in free energy of apo-Skp(WT) in comparison to tOmpA-Skp(WT) by a denaturation titration (fig. S4). Binding of tOmpA to Skp(WT) increased its stability by 1.7 kJ mol⁻¹, corroborating the stabilization effect of the trimeric state by the binding of its client protein. Overall, the data show that monomeric, disordered Skp does not interact with the Omp client and that client binding increases the stability of the Skp trimer by avidity, thus shifting the conformational equilibrium toward the trimeric state.

Skp is dispensable for growth of various bacterial species under rich laboratory conditions. However, bacterial pathogens such as Yersinia and Salmonella require Skp for growth in hostile host tissue. To determine whether the Skp activation mechanism that we identified is important under these physiologically relevant conditions, we engineered analogous point mutants in Salmonella enterica serovar Typhimurium. Salmonella Skp is highly homologous to E. coli Skp, with 91% identity (fig. S4). We selected three of the mutations for these experiments, the two mildest ones A103R and A103L, as well as V117P, and also engineered a strain with complete genetic deletion of the skp gene (Δskp). As expected, neither the point mutants nor a full skp deletion affects Salmonella fitness in rich lysogeny broth (Fig. 4A and Table 1). We then tested the same mutants in competitive infections in a mouse typhoid fever model. In competitive infections, mice are infected with a mixture of WT and mutant strains. Plating of bacteria retrieved from the spleens of these mice yields the fitness of mutants relative to the WT bacteria in each mouse. This approach reduces interindividual variance and offers higher statistical power with limited numbers of experimental animals compared to single-strain infections. The data reveal a slight but significant fitness defect of Salmonella skp(A103L) compared to WT and strong fitness defects for mutants skp(A103R) and skp(V117P), which are comparable to the full skp deletion (Fig. 4B and Table 1; competitive index data in fig. S4). These results show that even subtle perturbations of the Skp monomer-trimer equilibrium diminish Skp function in vivo and that perturbation of this equilibrium by less than an order of magnitude in C0.5 completely abolishes Skp function, rendering bacteria nonvirulent.

In this work, we have elucidated the activation mechanism of the molecular chaperone Skp at atomic resolution. The monomer state of Skp is intrinsically disordered, with a limited residual propensity of α-helicity in the coiled-coil tentacle arms. This low inherent stability of helices 2 and 3 is particularly interesting, because they are not involved in inter-subunit contacts in the trimer structure. The formation of the head domain trimer merely fixes the positions of the end points of the α-helices in space, thus stabilizing them by reducing the conformational entropy of the unfolded state. This unique mechanism resembles a stapling of the coiled-coil helices to the barrel in the head domain. A directly related effect is being exploited in peptide chemistry to stabilize helical conformation of small peptides by a suitably chosen covalent circularization, the so-called stapled peptides (42). Furthermore, because the tOmpA client is in simultaneous direct contact with all three Skp subunits, its binding stabilizes and shifts the oligomer equilibrium of Skp toward the trimeric state. Last, the disordered Skp monomer does not exhibit chaperone activity.

These mechanistic insights integrate into an improved picture of the functional cycle of Skp in the bacterial periplasm (Fig. 4C). Monomeric, disordered Skp molecules populate the periplasmic space. As soon as a client protein emerges from the Sec translocase, the inactive monomers fold and assemble into a trimeric state around the unfolded client protein. Skp directly or indirectly transports the chain to the Bam complex for folding and insertion in the membrane and possibly also to DegP for degradation (27, 43). The exact mechanism of client release is not understood, but besides direct migration to a higher-affinity target, one exciting possibility to enhance the release kinetics could be a destabilization of the oligomeric state of Skp or a stabilization of the monomeric state of Skp in the vicinity of the downstream receptor of the substrate. This may include negative charges on membranes or BamA (36, 44, 45). After client release, the disordered Skp monomers enter the periplasmic reservoir of individually inactive chaperones. The absence of a chaperoning activity of the monomer ensures that only Skp molecules with complete cavity bind clients, providing maximal chaperoning effect in an all-or-none fashion. At the same time, it introduces a directionality of the chaperoning effect toward the center of the cavity, avoiding spurious binding effects that would not be directed into the Skp cavity. These could potentially destabilize periplasmic proteins that are not intended client proteins. Last, the disordered nature of monomeric Skp might facilitate its import into the periplasm through the Sec complex upon its own biogenesis. Additional impact for this type of activation mechanism comes from a direct comparison to the activation mechanism of the chaperone SurA (18). SurA is constitutively active with just a dynamic modulation of its activity upon rotation of a domain connected by linkers to its chaperone core, i.e., its activity is only weakly regulated (18). 
Skp activity, in turn, is strongly regulated, with a switch between a completely inactive and an active state, as shown in this work. This stark difference matches a fundamental difference in function of these two periplasmic chaperones. Skp has high affinity for its client proteins and a strong tendency to prevent their folding and therefore presumably requires tight regulation to avoid unspecific chaperone activity under no-stress conditions, whereas SurA binds unfolded OMPs with lower affinity while promoting their folding and therefore presumably does not require a strong regulation of its chaperone activity (15, 46–49).

The Skp activation mechanism provides an elegant example of how a chaperone can regulate its functional cycle in an environment depleted of any source of energy. For ATP-independent chaperones, only three types of activation mechanisms have so far been described: an order-to-disorder transition [Hsp33 (10), HdeA (11), and HdeB (12)], oligomer disassembly [sHSP (6) and TF (7–9)], and no or minor conformational change [Spy (13, 14), HSP40 (16), SecB (17), and SurA (18)]. Skp is the first chaperone found to feature these activation mechanisms in the opposite direction and even combine them, i.e., by a disorder-to-order transition that is coupled to oligomerization. The high (nM) affinity Skp has for its client proteins and the strong tendency to prevent their folding could represent a potential hazard to the cell (15, 49). The coupled folding and oligomerization mechanism ensures that holdase function is only present in the trimer, where it is geometrically oriented only toward the chaperone cavity. Under nonstressed conditions, Skp exists as an inactive disordered monomer with a minor population of active folded trimer to avoid detrimental effects on the cells. In contrast, under stress conditions, up-regulation of the Skp concentration and binding to client proteins shift the equilibrium toward the trimeric folded active state, protecting the cells by preventing aggregation of unfolded protein. While most chaperones use strategies to cover a preexisting client binding site in their inactive state, Skp has thus evolved a more extreme mechanism where the client binding area exists only in the active state. This strong regulation allows the tight control of Skp activity while providing at the same time a fast mechanism for client release upon dissociation into the disordered monomeric state. The chaperone activity of Skp is thus regulated in dynamic response to chaperone concentration and client availability.

Skp, lacking its signal sequence, was cloned from genomic DNA through Nde I and Xho I into the pET28b expression vector (Novagen) containing a thrombin-cleavable N-terminal His6-tag (15). Skp was expressed in BL21-(λDE3)-Lemo cells [New England Biolabs (NEB)] transformed with the Skp plasmid and grown at 37°C in M9 minimal medium containing kanamycin (30 mg/ml) to OD600 (optical density at 600 nm) = 0.6, and then the expression was induced by adding 0.4 mM isopropyl-β-d-thiogalactopyranoside (IPTG) at 25°C for 12 hours. Uniformly [2H, 13C, 15N]-labeled protein was prepared by growing cells in D2O-based M9 minimal medium, with 1 g of 15NH4Cl and 2 g of [U-13C,2H] glucose per liter of medium. Cells were harvested by centrifugation at 5000g for 20 min. The pellet was resuspended in 20 ml of lysis buffer A per liter of culture [20 mM tris (pH 7.5), 500 mM NaCl, deoxyribonuclease (DNase) (0.01 mg/ml), ribonuclease (RNase) (0.02 mg/ml), and inhibitor cocktail (cOmplete EDTA-free protease inhibitor; Roche)]. Cell lysis was performed using a microfluidizer (Microfluidics) for three cycles at 4°C. The soluble bacterial lysate was separated from cell debris and other components by centrifugation at 14,000g for 60 min and loaded onto a Ni-NTA (nitrilotriacetic acid) column (Qiagen). Skp eluted at 250 mM imidazole concentration and was dialyzed against buffer [20 mM tris (pH 7.5) and 500 mM NaCl] overnight to remove the imidazole. In a final step, a size exclusion chromatography (Superdex-200 16/600 PG) step was applied to further purify the proteins and adjust the protein to its final buffer [20 mM MES (pH 6.5) and 150 mM NaCl]. Note that the His6-tag was consistently not cleaved from all Skp constructs, because in both our hands and published work by others (37), the presence of the His6-tag was found to not change the monomer-trimer equilibrium constant and because monomeric, disordered Skp was found to be sensitive to proteolytic degradation. 
Afterward, Skp was concentrated by ultrafiltration and stored at −20°C until use. Final yield of purified protein was 25 mg for Skp(WT) and mutants per liter of deuterated M9 minimal medium.

The transmembrane domain of OmpA (residues 1 to 177) was cloned through Nco I and Xho I into the pET28b expression vector without any affinity tag and lacking its signal sequence (15). BL21-(λDE3)-Lemo cells (NEB) were transformed with the tOmpA expression plasmid and grown at 37°C in medium containing kanamycin (30 µg/ml) to OD600 = 0.8. Expression was induced by 1 mM IPTG. Cells were harvested 4 hours after induction and resuspended in 20 ml of buffer B per liter of culture (20 mM tris-HCl and 5 mM EDTA, pH 8.5). Cell lysis was performed using a microfluidizer (Microfluidics) for three cycles at 4°C. Purification from inclusion bodies was done as described (50). The ion-exchange elution fractions containing tOmpA were pooled and dialyzed against buffer B. The precipitate was resuspended in 6 M Gdm/HCl and stored at −20°C until usage. Final yield of purified protein was 50 mg of tOmpA per liter of deuterated M9 minimal medium.

The QuikChange II mutagenesis protocol (Stratagene) was used to introduce the mutations A108L, A108R, A103L, A103R, or V117P into Skp. Polymerase chain reaction (PCR) primers (Table 2) were obtained from Microsynth. The expression and purification of the mutant proteins was performed as described for the WT proteins. The final yield of purified mutants was similar to WT.

Salmonella strains used in this study were based on S. enterica serovar Typhimurium SL1344 hisG xyl (51, 52). Salmonella mutants with gene deletions were obtained by two consecutive single crossovers with positive selection for resistance to kanamycin and negative selection for levansucrase-mediated sensitivity to sucrose. Salmonella was grown in lysogeny broth containing NaCl (5 g/liter; Lennox LB). Each strain was transformed with a low-copy plasmid expressing a distinct fluorescent protein (mtagBFP2, mNeonGreen, YPet, or mCherry). These plasmids have no impact on in vivo fitness (53, 54). All animal experiments were approved (license 2239, Kantonales Veterinäramt Basel) and performed according to local guidelines (Tierschutz-Verordnung, Basel) and the Swiss animal protection law (Tierschutz-Gesetz). Eight 10- to 16-week-old female BALB/c mice (Charles River Laboratories) were infected by tail vein injection of mixtures containing WT Salmonella and different combinations of three mutants with about 1000 colony-forming units (CFU) each per strain. The exact inoculum size for each strain was determined by plating. After 4 days, mice were euthanized with carbon dioxide, and Triton X-100 detergent-treated spleen homogenates were prepared as described previously (55). Total Salmonella loads were determined by plating dilution series on agar plates. Mutant-to-WT ratios were determined by flow cytometry counting of bacterial cells falling into gates indicative of the various fluorescent proteins using optical filters (55). Fitness was calculated as log2(FI), with FI corresponding to the fold increase starting from the initial spleen colonization [around 20% of the inoculum (56)] to the final spleen load for each strain. The relative fitness value of co-administered WT Salmonella was set to 100%. We also determined the more commonly used readout competitive index by dividing the output ratio (mutant/WT) by the inoculum ratio (mutant/WT).
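The two readouts defined above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis script: the function names and the CFU numbers are hypothetical, the 20% colonization fraction is the approximate value quoted in the text, and expressing relative fitness as the ratio of mutant to WT log2(FI) is an assumption about how "WT set to 100%" was implemented.

```python
import math


def fitness_log2_fi(final_spleen_cfu, inoculum_cfu, colonization_fraction=0.2):
    """log2 of the fold increase from initial spleen colonization to final load."""
    initial_cfu = colonization_fraction * inoculum_cfu
    return math.log2(final_spleen_cfu / initial_cfu)


def relative_fitness_percent(mutant_fitness, wt_fitness):
    """Mutant fitness relative to co-administered WT (WT = 100%); assumed ratio."""
    return 100.0 * mutant_fitness / wt_fitness


def competitive_index(mutant_out, wt_out, mutant_in, wt_in):
    """Output ratio (mutant/WT) divided by inoculum ratio (mutant/WT)."""
    return (mutant_out / wt_out) / (mutant_in / wt_in)


# Hypothetical mouse: 1000 CFU of each strain injected.
wt_fit = fitness_log2_fi(final_spleen_cfu=204_800, inoculum_cfu=1000)
mut_fit = fitness_log2_fi(final_spleen_cfu=6_400, inoculum_cfu=1000)
print(wt_fit, mut_fit, relative_fitness_percent(mut_fit, wt_fit))
print(competitive_index(6_400, 204_800, 1000, 1000))
```

Because both strains are measured in the same animal, the per-mouse ratio cancels much of the animal-to-animal variation, which is the statistical advantage the text attributes to competitive infections.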

Complex assembly was carried out following a modified version of the protocol published by Burmann et al. (15). A 1.5 M excess of denatured tOmpA was added to Skp(WT) or mutants in 20 ml of assembly buffer [20 mM MES (pH 6.5) and 150 mM NaCl] in a dropwise fashion under continuous stirring. The solution was then stirred for another 1 hour to ensure saturation of the chaperones. After centrifugation at 10,000g for 30 min, the supernatant fraction, containing the Skp-tOmpA complexes, was separated from the pellet, containing the precipitated tOmpA. The supernatant was exchanged by ultrafiltration to NMR buffer [20 mM MES (pH 6.5) and 150 mM NaCl], and after concentration, the volume was adjusted to 250 µl. The chaperone activity of Skp(WT) and mutants was determined by quantifying the NMR signals in 2D [15N,1H]-TROSY spectra of [U-2H,15N]-tOmpA bound to unlabeled Skp. A control sample of [U-2H,15N]-tOmpA in NMR buffer was prepared following the reference protocol, showing that, in the absence of the functional Skp(WT), less than 2% of [U-2H,15N]-tOmpA signals were observed in comparison to [U-2H,15N]-tOmpA bound to Skp(WT).

All NMR experiments for Skp-Omp complexes were performed in NMR buffer [20 mM MES (pH 6.5) and 150 mM NaCl]. The experiments were recorded at the specified temperature on a Bruker AscendII 700 MHz or Avance 800 MHz spectrometer running Topspin 3.0 and equipped with a cryogenically cooled triple-resonance probe. For the sequence-specific backbone resonance assignment of [U-99% 2H, 13C, 15N]-Skp(A108L), the following NMR experiments were recorded at 37°C: 2D [15N,1H]-TROSY, 3D TROSY-HNCA, 3D TROSY-HNCACB, 3D TROSY-HNCO, and 3D TROSY-HN(CA)CO. NMR data were processed with nmrPipe (57) and analyzed with CARA and ccpnmr (58). Secondary chemical shifts were calculated relative to the random-coil values of Kjaergaard and Poulsen (59). For the backbone assignment of the unfolded [U-2H,15N,13C]-Skp(WT), automated projection spectroscopy (APSY) experiments were recorded in NMR buffer [20 mM MES (pH 6.5) and 150 mM NaCl] containing 8 M urea at 15°C. The 5D APSY-HNCOCACB (60) was recorded with 54 transients for Skp, two scans per transient, 0.7-s recycle delay, and 1024 × 150 complex points in the direct and indirect dimensions, respectively. The 4D APSY-HNCACB (60) was recorded with 46 transients, two scans per transient, 0.7-s recycle delay, and 1024 × 180 complex points in the direct and indirect dimensions, respectively. The GAPRO (geometric analysis of projections) (60) analysis of the projection spectra was carried out with Δν = 5.0 Hz, Rmin = 15.0 Hz, S/N = 7.0, and Smin,1 = Smin,2 = 8 for the 5D APSY-HNCOCACB and with Δν = 5.0 Hz, Rmin = 15.0 Hz, S/N = 10.0, and Smin,1 = Smin,2 = 15 for the 4D APSY-HNCACB. As the signals for glycine residues within the 4D APSY-HNCACB and the signals of residues succeeding glycines within the 5D APSY-HNCOCACB have a different sign than the other resonances, the GAPRO algorithm was run twice for positive and negative peaks, respectively, and the two resulting peak lists were combined. 
The combined peak lists were assigned by using the newest version of the MATCH algorithm within the UNIO10 software package, yielding a 65% complete assignment for Skp. By using a conventional 3D TROSY-HNCACB experiment, complete backbone assignment for Skp could be obtained. NMR data were processed using PROSA (61) and analyzed with CARA and XEASY. Combined chemical shift differences of the amide resonances in 2D [15N,1H]-TROSY spectra were calculated as

$$\Delta\delta(\mathrm{HN}) = \sqrt{\left(\Delta\delta(^{1}\mathrm{H})\right)^{2} + \left(0.2\,\Delta\delta(^{15}\mathrm{N})\right)^{2}} \tag{1}$$
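The combined amide shift difference of Eq. 1, together with the 2-SD cutoff used in the figure legends to flag perturbed residues, can be sketched as follows. The function names and the example residue data are hypothetical, and taking the SD as the population standard deviation over all combined values is an assumption; the text does not state which estimator was used.

```python
import math
import statistics


def combined_shift_difference(delta_h_ppm, delta_n_ppm, n_weight=0.2):
    """Eq. 1: weighted Euclidean combination of 1H and 15N shift changes (ppm)."""
    return math.sqrt(delta_h_ppm ** 2 + (n_weight * delta_n_ppm) ** 2)


def residues_above_threshold(shift_changes, n_sd=2.0):
    """Return (threshold, residues) for residues whose combined shift exceeds n_sd * SD.

    shift_changes maps residue number -> (delta 1H, delta 15N), both in ppm.
    """
    combined = {
        residue: combined_shift_difference(d_h, d_n)
        for residue, (d_h, d_n) in shift_changes.items()
    }
    threshold = n_sd * statistics.pstdev(list(combined.values()))
    hits = sorted(r for r, value in combined.items() if value > threshold)
    return threshold, hits


# Hypothetical perturbations for four residues.
example = {101: (0.0, 0.0), 102: (0.0, 0.0), 103: (0.0, 0.0), 104: (0.1, 0.0)}
print(residues_above_threshold(example))
```

The 0.2 weight scales the larger 15N shift range down to the 1H scale, so neither nucleus dominates the combined value.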

SEC-MALS measurements of Skp were performed at 25°C in NMR buffer [20 mM MES (pH 6.5) and 150 mM NaCl] using a GE Healthcare Superdex-200 Increase 10/300 GL column on an Agilent 1260 high-performance liquid chromatography system. Elution was monitored using an Agilent multi-wavelength absorbance detector (data collected at 280 and 254 nm), a Wyatt Heleos II 8+ multiangle light-scattering detector, and a Wyatt Optilab rEX differential refractive index detector. The column was equilibrated overnight in the running buffer to obtain stable baseline signals from the detectors before data collection. Inter-detector delay volumes, band broadening corrections, and light-scattering detector normalization were calibrated using an injection of bovine serum albumin solution (2 mg/ml; ThermoPierce) and standard protocols in ASTRA 6. Weight-averaged molar mass, elution concentration, and mass distributions of the samples were calculated using the ASTRA 6 software (Wyatt Technology).

DSC data were acquired using a Microcal VP-Capillary DSC instrument (Malvern Panalytical, Malvern, UK) at a Skp trimer concentration of 24.4 µM (i.e., 73 µM concentration in terms of monomer). After centrifugation, protein concentration was determined by ultraviolet spectrophotometry using a molar extinction coefficient of 4470 M⁻¹ cm⁻¹ at 280 nm for the trimer and correcting for minor scattering contributions to apparent absorbance. The Skp sample was scanned from 15 to 105°C at a scan rate of 1°C/min, and data points were acquired at 0.1°C increments. Multiple buffer versus buffer scans, performed before the sample scan to establish the instrumental heat capacity baseline, were averaged and subtracted from the sample scan data, which were then normalized to excess molar heat capacity using the trimer concentration. Attempts to fit the complex thermogram with standard models of oligomer dissociation and denaturation proved unsuccessful, so ΔCp for folding was estimated from the difference between the slopes of the excess molar heat capacity in low- and high-temperature regions (the apparent pre- and post-transition baselines), fitted by linear regression, and extrapolated to the temperature of interest.

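The chemical equilibrium between trimeric and monomeric Skp can be described by the reaction

$$3\,S_m \rightleftharpoons S_t \tag{2}$$

where the equilibrium constant $L_{13}$ is given by

$$L_{13} = \frac{[S_t]}{[S_m]^{3}} \tag{3}$$

where $[S_t]$ and $[S_m]$ are the molar concentrations of Skp trimers and free Skp monomers, respectively, and $L_{13}$ has units of M$^{-2}$. For this equilibrium, the concentration of trimer $[S_t]$ as a function of total Skp $[S_0]$ is given by Sandlin et al. (37)

$$[S_t]([S_0]) = \frac{[S_0]}{3} + \left(\gamma + \sqrt{\gamma^{2} + \beta^{3}}\right)^{1/3} + \left(\gamma - \sqrt{\gamma^{2} + \beta^{3}}\right)^{1/3} \tag{4}$$

where $\alpha$, $\beta$, and $\gamma$ are given by

$$\alpha = \frac{9 L_{13} [S_0]^{2} + 1}{L_{13}} \tag{5}$$

$$\beta = -\frac{[S_0]^{2}}{9} + \frac{\alpha}{81} \tag{6}$$

$$\gamma = \frac{[S_0]^{3}}{18} - \frac{\alpha [S_0]}{162} \tag{7}$$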

The fraction of total Skp protein that is trimeric at any total Skp concentration equals

$$f_{S_t} = \frac{3[S_t]}{[S_0]} \tag{8}$$

and the fraction of total Skp protein that is monomeric equals

$$f_{S_m} = 1 - f_{S_t} \tag{9}$$
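The closed-form solution of the monomer-trimer equilibrium (Eqs. 2 to 9) can be checked numerically with a short sketch. It solves the mass balance [S0] = [Sm] + 3[St] together with Eq. 3 via the cubic root formula. The function names are hypothetical, and the helper that converts C0.5 into L13 assumes that C0.5 denotes the total concentration (in monomer units) at which half of the protein is trimeric; the text does not define C0.5 explicitly in this section.

```python
import math


def _cbrt(x):
    """Real cube root that also handles negative arguments."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)


def trimer_concentration(s0, l13):
    """[St] in M for total Skp s0 (M, monomer units) and L13 (M^-2), Eqs. 4 to 7."""
    if s0 == 0:
        return 0.0
    alpha = (9.0 * l13 * s0 ** 2 + 1.0) / l13
    beta = -(s0 ** 2) / 9.0 + alpha / 81.0
    gamma = s0 ** 3 / 18.0 - alpha * s0 / 162.0
    root = math.sqrt(gamma ** 2 + beta ** 3)
    return s0 / 3.0 + _cbrt(gamma + root) + _cbrt(gamma - root)


def trimer_fraction(s0, l13):
    """Fraction of total protein that is trimeric (Eq. 8)."""
    return 3.0 * trimer_concentration(s0, l13) / s0


def l13_from_c_half(c_half):
    """L13 such that half the protein is trimeric at total concentration c_half.

    Assumed definition of C0.5: [Sm] = c_half / 2 and [St] = c_half / 6.
    """
    return 4.0 / (3.0 * c_half ** 2)


l13 = l13_from_c_half(80e-6)  # 80 uM, the C0.5 quoted for Skp(A108L) at 25 C
for total in (10e-6, 80e-6, 1e-3):
    f_trimer = trimer_fraction(total, l13)
    print(f"{total * 1e6:7.1f} uM total: {100 * (1 - f_trimer):5.1f}% monomer")
```

Algebraically, Eqs. 5 to 7 reduce to β = 1/(81 L13) and γ = −[S0]/(162 L13); the code keeps the unsimplified forms so that it mirrors the equations, at the cost of some floating-point cancellation at very high L13·[S0]².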

In SEC-MALS experiments in equilibrium situations, the detected molar mass represents the concentration-weighted average mass of the species involved

$$M_w = \frac{\sum_i c_i M_i}{\sum_i c_i} \tag{10}$$

where $c_i$ is the mass concentration and $M_i$ is the molar mass of the $i$th species. Therefore, for the monomer-trimer equilibrium, by comparison with the limits for the completely monomeric or trimeric species, the weight-averaged mass reports directly on the fractional populations as

$$f_{S_m} = \frac{M_{obs} - M_{S_t}}{M_{S_m} - M_{S_t}} \tag{11}$$

where $M_{obs}$ is the detected weight-averaged mass, and $M_{S_m}$ and $M_{S_t}$ are the detected masses of the completely monomeric and trimeric state, respectively.
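As a small consistency check of Eqs. 10 and 11, the sketch below computes a weight-averaged mass for a monomer-trimer mixture and then recovers the monomer fraction from it. The function names and the masses (18 and 54 kDa) are hypothetical placeholders, not measured values from the study.

```python
def weight_averaged_mass(mass_concentrations, molar_masses):
    """Eq. 10: sum(c_i * M_i) / sum(c_i), with c_i as mass concentrations."""
    total = sum(mass_concentrations)
    if total == 0:
        raise ValueError("total mass concentration is zero")
    weighted = sum(c * m for c, m in zip(mass_concentrations, molar_masses))
    return weighted / total


def monomer_fraction_from_mass(m_obs, m_monomer, m_trimer):
    """Eq. 11: monomer mass fraction from the observed weight-averaged mass."""
    return (m_obs - m_trimer) / (m_monomer - m_trimer)


# Hypothetical sample: 25% of the protein mass is monomer, 75% trimer.
m_monomer_kda, m_trimer_kda = 18.0, 54.0
m_obs = weight_averaged_mass([0.25, 0.75], [m_monomer_kda, m_trimer_kda])
print(m_obs)  # 45.0
print(monomer_fraction_from_mass(m_obs, m_monomer_kda, m_trimer_kda))  # 0.25
```

Because the weights in Eq. 10 are mass concentrations, the recovered fraction is a fraction of total protein mass, which matches the definition of the monomer fraction in Eq. 9.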

For the estimation of the population of monomeric and trimeric states for the WT and mutants, the residue lysine-141 was chosen, because its signals are well resolved in each state and it is located in a nonstructured, locally flexible region in the trimer. The fractions were estimated by calculating the ratio of the intensity of the signals in the monomeric and trimeric state according to the equation

$$f_{S_m} = \frac{I_{S_m}}{I_{S_m} + I_{S_t}} \tag{12}$$

where $I_{S_m}$ and $I_{S_t}$ are the intensities of the residue lysine-141 in the monomeric and trimeric state, respectively. Similarly, for the denaturation titration, the fractions of folded and unfolded Skp were determined from the signals of residue lysine-141, and for each titration point, $\Delta G$ was calculated assuming a two-state model according to the equation

$$\Delta G = -RT \ln\frac{f_{S_t}}{f_{S_m}} \tag{13}$$

The data were fitted by linear regression, and ΔG was extrapolated to a concentration of 0 M urea.
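The titration analysis (Eqs. 12 and 13 followed by the linear extrapolation) can be sketched as below. The function names, the peak intensities, and the urea concentrations are hypothetical, and the sign convention ΔG = −RT ln(fSt/fSm) is the standard thermodynamic one, assumed here rather than confirmed by the source.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def delta_g_two_state(i_trimer, i_monomer, temperature_k):
    """Two-state free energy (kJ/mol) from lysine-141 peak intensities."""
    f_sm = i_monomer / (i_monomer + i_trimer)  # Eq. 12
    f_st = 1.0 - f_sm
    return -R_KJ_PER_MOL_K * temperature_k * math.log(f_st / f_sm)  # Eq. 13


def extrapolate_to_zero_urea(urea_molar, delta_g_values):
    """Ordinary least-squares line; returns (intercept at 0 M urea, slope)."""
    n = len(urea_molar)
    mean_x = sum(urea_molar) / n
    mean_y = sum(delta_g_values) / n
    sxx = sum((x - mean_x) ** 2 for x in urea_molar)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(urea_molar, delta_g_values))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical titration: (urea in M, trimer intensity, monomer intensity).
points = [(1.0, 90.0, 10.0), (2.0, 75.0, 25.0), (3.0, 50.0, 50.0)]
temperature = 310.15  # 37 C in kelvin
urea = [p[0] for p in points]
dg = [delta_g_two_state(p[1], p[2], temperature) for p in points]
intercept, slope = extrapolate_to_zero_urea(urea, dg)
print(f"dG(0 M urea) = {intercept:.2f} kJ/mol, slope = {slope:.2f} kJ/mol/M")
```

Repeating the extrapolation for apo-Skp and for the tOmpA-bound form and subtracting the two intercepts would give a stability difference of the kind reported in the text.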

Acknowledgments: We thank C. Johnson for help in setting up the DSC experiments and the Biophysics Facility of the MRC Laboratory of Molecular Biology, Cambridge, for access to the DSC instrument. Funding: This work was supported by the Swiss National Science Foundation (grants 310030B_185388 and 407240_167125 to S.H. and 310030_182315 to D.B.). Author contributions: G.M., S.H., and D.B. designed the study, analyzed the data, discussed the results, and wrote the paper. G.M. and T.S. conducted the SEC-MALS experiments. T.S. conducted the DSC experiment. B.M.B. conducted the assignment of urea-unfolded Skp(WT). B.C. engineered the Salmonella mutants and conducted the mouse experiments. G.M. conducted all other experimental work. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Additional data related to this paper may be requested from the authors. Sequence-specific resonance assignments have been submitted to the Biological Magnetic Resonance Data Bank under the following accession codes: Skp(WT) in 8 M urea, 26613; monomeric Skp(A108L), 50195.

See the article here:
Regulation of chaperone function by coupled folding and oligomerization - Science Advances

Discovery of a previously unknown biosynthetic capacity of naringenin chalcone synthase by heterologous expression of a tomato gene cluster in yeast -…

Posted on October 31, 2020 by Prof Baldwin

INTRODUCTION

Plant specialized metabolism is a rich source of structurally and functionally diverse small molecules, also known as plant natural products. These specialized metabolites play important roles in plant communication and defense and have been widely applied as phytomedicines, antibiotics, antivirals, nutraceuticals, and cosmetics (1, 2). Recent developments in synthetic biology and metabolic engineering have enabled the assembly and expression of plant genes in heterologous hosts as a sustainable and efficient alternative for production of complex chemicals, including plant natural products and their synthetic derivatives (3, 4). However, the broader potential of these engineering efforts is challenged partially due to our limited knowledge of plant biosynthetic pathways and associated enzyme activities.

The elucidation of plant specialized metabolic pathways has been challenging, particularly in comparison to the elucidation of natural product pathways in microbes. In part, this has been due to the differences in the genomic organization of these pathways, where the genes encoding the biosynthetic pathway in plants are generally dispersed across the plant genome, whereas, in contrast, those in microbes tend to be tightly clustered in operons. However, recent work has revealed that certain genes constituting a number of plant natural product pathways are colocalized in the genome in operon-like structures. These plant biosynthetic gene clusters range from ~35 to several hundred kilobases (i.e., 3 to more than 10 genes) in size (5) and comprise genes that are physically colocalized and potentially coregulated. These gene clusters encode species-specific and/or specialized biochemical pathways modifying metabolites from primary metabolism, contributing to the vast chemical space present in the plant kingdom (6). Characterization of putative gene cluster activities and their resulting products assisted by genome mining and analytical chemistry may thus provide an abundant source for the discovery of enzyme activities and compound structures (7, 8).

Gene cluster prediction in plants has been challenging because plant genomes are larger than those of bacteria and fungi, and plant genes are sparsely distributed along the genome, separated by a substantial amount of intergenic, noncoding sequences (7). A general approach for identifying plant gene clusters involves defining a cluster core by searching genome sequences for backbone-generating enzymes [e.g., nonribosomal peptide synthetase (NRPS), polyketide synthase (PKS), hybrid NRPS-PKS, and terpene synthase] and then expanding the cluster components based on catalytic domain analysis, physical colocalization, gene coexpression, and/or shared regulatory patterns (7, 8). Recently developed cluster-mining algorithms such as PhytoClust (9), PlantiSMASH (10), and PlantClusterFinder (11) have demonstrated automated detection of hundreds to thousands of putative gene clusters from various plant genomes.

Despite the increasing number of putative plant biosynthetic gene clusters arising from computational prediction tools, characterization of the potential functionality of these clusters and associated enzymes in their host organisms has been limited. In particular, in planta pathway characterization can be hindered by cryptic pathway gene expression, low concentrations of targeted compounds embedded in complex mixtures, and difficulties in genetically manipulating the native host for cluster activation (7). Facilitated by well-developed tools for genetic manipulation and pathway expression, baker's yeast (Saccharomyces cerevisiae) has proven to be a powerful platform for expression of heterologous gene clusters. Previous research has used yeast to characterize the biosynthetic activities of several gene clusters from various plant species, including triterpene biosynthetic clusters from Arabidopsis thaliana (12), a 10-gene noscapine-producing cluster from poppy (Papaver somniferum) (13), partial pathway genes for vinblastine and vincristine biosynthesis from Madagascan periwinkle (Catharanthus roseus) (14), cucurbitacin from cucurbit (Cucurbitaceae) (15), and a cyanogenic glycoside biosynthetic cluster from sorghum (Sorghum bicolor) (16). In these earlier studies, the previously identified plant gene clusters were heterologously expressed in yeast to validate the production of the compounds as expected from their plant hosts.

In this work, we use yeast as a plant natural product discovery platform to characterize the biosynthetic potential of a putative tomato gene cluster predicted from PlantClusterFinder (11), the activity of which has not been reported previously. By coexpressing the cluster genes with an early-step flavonoid pathway gene in yeast, we identified two previously unknown compounds in the yeast culture when fed p-coumaric acid, specifically 3-hydroxyanthranilic acid (3-HAA) methyl ester (1) and a hydroxycinnamic acid amide (HCAA) compound, dihydro-coumaroyl anthranilate amide (2) (Fig. 1A). Further analysis confirmed that a methyltransferase (SlMT2) catalyzes the conversion of 3-HAA, a native yeast metabolite involved in tryptophan metabolism, to (1), and a naringenin chalcone synthase (SlCHS) catalyzes the condensation of (1) and p-dihydro-coumaroyl-coenzyme A (CoA), reduced from p-coumaroyl-CoA by a yeast endogenous enoyl-CoA reductase (ECR), leading to production of (2). Knocking out the native ECR in yeast restored the production of an oxidized form of (2), coumaroyl anthranilate amide (3). Our characterization results reveal a previously uncharacterized amide synthesis activity for SlCHS. In vivo site-directed mutagenesis results suggest that SlCHS uses the same active site for synthesis of (3) and for canonical synthesis of naringenin chalcone. Our work demonstrates the potential of yeast as a characterization tool for computationally aided discovery of compound structures and enzymatic activities from plant genomes.

(A) Discovery of two previously unidentified compound structures by heterologous expression of genes from tomato cluster in yeast. Gene color: red, putative gene cluster; white, plant flavonoid pathway. (B) Validation of (1) and (2) production in yeast. CEN.PK2, wild-type yeast strain; CSY1210, strain expressing SlCHS, SlCYP, and SlMT1/2/3. (C) Characterization of (1) and (2) production with individual tomato methyltransferases in yeast. SlCHS and SlCYP are coexpressed with SlMT1 (CSY1301), SlMT2 (CSY1302), or SlMT3 (CSY1303). (D) Summary of compound production with SlMT1/2/3. (E) Proposed pathway for biosynthesis of (1) and (2) in yeast. Enzyme color: red, tomato; yellow, yeast. (F) Proposed activity of SlCHS in TSC13 knockout strains. (G) Summary of compound production by TSC13 knockout strains. TIC, total ion chromatogram; EIC, extracted ion chromatograms; ** indicates a thorough MS scan from m/z 10 to 168.0 or 316.1. +/− indicates the presence/absence of a gene or a gene fragment. Data show the mean of two biologically independent replicates, with error bars indicating the SD. Compound color: purple, (1) methyl 3-hydroxyanthranilic acid; blue, (2) dihydro-coumaroyl anthranilate amide; green, (3) coumaroyl anthranilate amide. Enzyme abbreviations: SlMT2, methyltransferase 2; Sl4CL, 4-coumarate-CoA ligase; SlCHS, naringenin chalcone synthase; ATR1, NADPH-cytochrome P450 reductase 1; ECR, enoyl-CoA reductase.

Our study investigated the biosynthetic potential of a tomato-derived putative gene cluster that was predicted to produce hydroxylated naringenin chalcone and/or methyl esters of hydroxylated naringenin chalcone, natural compounds that are found in tomato but without an elucidated pathway for biosynthesis (11). The putative tomato gene cluster predicted from PlantClusterFinder [referred to as C584_4 (11)] consists of a CHS (SlCHS, SOLYC09G091510), a putative cytochrome P450 (SlCYP, SOLYC09G091570), and three methyltransferases (SlMT1/2/3; SOLYC09G091530, SOLYC09G091540, and SOLYC09G091550). SlCHS is a well-studied type III PKS, which is known to sequentially condense one p-coumaroyl-CoA and three malonyl-CoA molecules to make naringenin chalcone, the first committed intermediate in the biosynthesis of flavonoids and anthocyanins (17). Among the three methyltransferases, SlMT3 was previously characterized as a putative salicylic acid methyltransferase potentially regulating tomato hormone emission (18). To our knowledge, no studies have been reported characterizing SlMT1, SlMT2, and SlCYP from the cluster.

We examined the biosynthetic capacity of the predicted tomato gene cluster in yeast. Yeast expression cassettes for complementary DNAs encoding the five genes identified in the cluster (SlCHS, SlCYP, and SlMT1/2/3) were designed and assembled into a yeast artificial chromosome and transformed into a wild-type yeast strain (CEN.PK2), resulting in yeast strain CSY1210. Two additional enzymes supporting the putative pathway enzymes were expressed in CSY1210 from low-copy plasmids: (i) a yeast codon-optimized 4-coumarate-CoA ligase from tomato (Sl4CL), a precursor-producing gene from the flavonoid pathway, and (ii) an Arabidopsis NADPH-cytochrome P450 reductase (AtATR1), a reductase partner to support the activity of the putative cytochrome P450 (SlCYP). We cultured CSY1210 transformed with the additional plasmids and a control strain (transformed with the plasmids but not harboring the reconstructed tomato cluster) in synthetic dropout media supplemented with 100 µM p-coumaric acid (the substrate for Sl4CL) for 72 hours at 25°C and analyzed the yeast media. The metabolites produced by the strain harboring the reconstructed tomato cluster were identified using an untargeted metabolomics analysis by qToF-MS (quadrupole time-of-flight hybrid mass spectrometry) (with a mass accuracy at 50 parts per million).

We observed two differential peaks representing compounds only produced in the strain harboring the reconstructed tomato cluster, one at mass/charge ratio (m/z) 168.0655 ([M + H]+) (1) and the other at 316.1179 ([M + H]+) (2) (fig. S1, A and B). To validate production of the two compounds in yeast, we analyzed the yeast culture media for production of (1) and (2) on liquid chromatography-tandem MS (LC-MS/MS). A product ion scan with a precursor ion set at 168.0 m/z showed two peaks at retention times of 4.291 and 5.872 min, respectively, and a product ion scan with a precursor ion set at 316.1 m/z showed a single peak at 5.872 min (Fig. 1B). On the basis of retention times and fragmentation patterns of (1) and (2) from qToF-MS analysis (fig. S1, A and B), we hypothesized that the peak at 4.291 min corresponds to (1) and that the peak at 5.872 min (for both precursor ion settings) corresponds to (2).
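To make the mass-matching step concrete, the short Python sketch below computes the m/z window implied by a 50 parts per million mass accuracy around the two observed [M + H]+ values. The function names are illustrative assumptions and are not taken from the authors' analysis software; how the instrument software actually applied the tolerance is not described in the text.

```python
def ppm_window(mz, ppm=50.0):
    """Return the (low, high) m/z bounds for a given ppm tolerance."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


def ppm_error(observed_mz, reference_mz):
    """Signed mass error of an observed peak relative to a reference, in ppm."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


# Observed [M + H]+ values reported in the text for compounds (1) and (2).
observed = {"(1)": 168.0655, "(2)": 316.1179}

for name, mz in observed.items():
    low, high = ppm_window(mz)
    print(f"{name}: m/z {mz:.4f} -> 50 ppm window {low:.4f} to {high:.4f}")
```

At 50 ppm, the window is roughly ±0.008 around m/z 168.0655 and ±0.016 around m/z 316.1179, which is why two candidate structures differing by two hydrogens (about 2.016 in m/z) are easily distinguished.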

We next identified the genes from the predicted tomato cluster and supporting flavonoid pathway (i.e., Sl4CL and AtATR1) that participated in the production of (1) and (2) in yeast. We first examined whether the methyltransferases individually participated in the biosynthesis of (1) and (2). To enable stable expression of the gene cassettes, Sl4CL, SlCHS, and SlMT1/2/3 were chromosomally integrated into the wild-type yeast strain (CEN.PK2) such that each engineered strain harbors Sl4CL, SlCHS, and one of the methyltransferases, leading to construction of CSY1301 (SlMT1), CSY1302 (SlMT2), and CSY1303 (SlMT3). As a control, we eliminated SlCYP (and AtATR1) from the integration to isolate their functions in compound synthesis. We cultured the strains in synthetic complete media supplemented with 100 µM p-coumaric acid for 72 hours at 30°C and analyzed the yeast culture media for production of (1) and (2). A product ion scan on LC-MS/MS with precursor ion set at 168.0 showed two peaks for SlMT1 and SlMT2 transformants at 4.324 and 5.864 min, respectively (Fig. 1C). A product ion scan with a precursor ion set at 316.1 showed a single peak at 5.864 min for SlMT1 and SlMT2 transformants (Fig. 1C). As previously hypothesized, the peak at 5.864 min detected at 168 m/z may be a molecular fragment of (2). Production of (1) and (2) in the absence of SlCYP (and AtATR1) indicates that SlCYP and AtATR1 are not involved in the production of the compounds. From the data, we observed production of (1) and (2) in both CSY1301 and CSY1302, and the product ion detected in CSY1302 was 14-fold greater than that in CSY1301 (Fig. 1D). The results indicate that SlMT1 and SlMT2 participate individually in the production of (1) and (2) and that SlMT2 leads to ~21-fold higher level of (1) and ~14-fold higher level of (2) than SlMT1. Since the activities of SlMT1 and SlMT2 appear to be redundant in the context of characterizing the production of (1) and (2), we focused on the activity of SlMT2 for subsequent characterizations. Together, the results of methyltransferase characterizations revealed that (1) and (2) can be produced from a minimal set of genes consisting of Sl4CL, SlCHS, and SlMT2.

We next elucidated a biosynthetic scheme for the synthesis of (1) and (2) in yeast. Low-copy plasmids encoding the expression of Sl4CL, SlCHS, and SlMT2 were cotransformed in different combinations into yeast, and the production of (1) and (2) was monitored in the presence and absence of fed p-coumaric acid after 72 hours of growth at 30°C (table S1). We first coexpressed the three genes with or without fed p-coumaric acid (groups 1 and 2). We then coexpressed all pairs of genes, e.g., SlCHS and SlMT2, SlMT2 and Sl4CL, and SlCHS and Sl4CL with fed p-coumaric acid (groups 3 to 5). Last, we expressed each single gene in the absence of fed p-coumaric acid (groups 6 to 8). We observed that (i) the removal of fed p-coumaric acid eliminates the production of (2) (groups 1 and 2), (ii) the removal of the expression of Sl4CL or SlCHS eliminates the production of (2) (groups 3 and 4), (iii) the removal of the expression of SlMT2 eliminates the production of both (1) and (2) (group 5), and (iv) the single expression of SlMT2 without fed p-coumaric acid leads to production of (1) (groups 6 to 8). The observations (i) and (ii) indicate that p-coumaric acid is a precursor for the production of (2), and both Sl4CL and SlCHS are required for the production of (2). The observations (iii) and (iv) indicate that SlMT2 is responsible for the production of (1), which is independent of fed p-coumaric acid, and that (1) is likely a substrate for the production of (2).
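The reasoning behind observations (i) to (iv) can be written as a small truth table. The Python sketch below encodes the pathway model inferred in the paragraph above (SlMT2 alone suffices for (1); all three genes plus fed p-coumaric acid are needed for (2)) and lists the predicted outcome for each group. The group numbering follows the order in which the combinations are named in the text; the order of the single-gene groups 6 to 8 is an assumption, since table S1 is not reproduced here.

```python
from typing import NamedTuple


class Group(NamedTuple):
    label: str
    sl4cl: bool
    slchs: bool
    slmt2: bool
    fed_p_coumaric_acid: bool


def predicted_products(g: Group) -> dict:
    """Predict which compounds appear under the pathway model inferred in the text."""
    # (1) needs only SlMT2, which methylates yeast-native 3-HAA.
    makes_1 = g.slmt2
    # (2) needs (1) plus the p-coumaroyl-CoA branch (fed precursor, Sl4CL) and SlCHS.
    makes_2 = makes_1 and g.sl4cl and g.slchs and g.fed_p_coumaric_acid
    return {"(1)": makes_1, "(2)": makes_2}


GROUPS = [
    Group("1", True, True, True, True),      # all three genes, fed
    Group("2", True, True, True, False),     # all three genes, not fed
    Group("3", False, True, True, True),     # SlCHS + SlMT2
    Group("4", True, False, True, True),     # SlMT2 + Sl4CL
    Group("5", True, True, False, True),     # SlCHS + Sl4CL
    Group("6", True, False, False, False),   # Sl4CL only (assumed order)
    Group("7", False, True, False, False),   # SlCHS only (assumed order)
    Group("8", False, False, True, False),   # SlMT2 only (assumed order)
]

for g in GROUPS:
    print(g.label, predicted_products(g))
```

Under this model only group 1 yields (2), while every group that expresses SlMT2 yields (1), which matches the four observations reported above.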

On the basis of the production patterns of (1) and (2) under different enzyme combinations, we proposed the sequencing of intermediates along the reconstructed pathway in yeast. Sl4CL is known to catalyze the conversion of p-coumaric acid to p-coumaroyl-CoA (19), and we observed that p-coumaric acid is an essential precursor for the production of (2) through the reconstructed pathway; thus, we hypothesized that p-coumaroyl-CoA is likely an intermediate of the pathway. A previous study reported that a group of methyltransferases from the salicylic acid benzoic acid theobromine (SABATH) enzyme family in maize is able to catalyze conversion of anthranilic acid to methyl anthranilate, a volatile methyl ester with potential function in plant defense (20). We hypothesized that SlMT2 may use an anthranilate analog from yeast native metabolism (as the pathway precursor) and catalyze its conversion to a methyl ester (as a pathway intermediate). By searching anthranilate-related yeast native metabolites in the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database, we identified 3-HAA, a primary metabolite involved in tryptophan metabolism, as a putative substrate for the SlMT2 methyltransferase and proposed the compound structure for the methyl ester (1) (Fig. 1A). We confirmed the compound structure of (1) with its chemical standard by retention time and tandem mass (MS/MS) spectrum (fig. S1C).

The data further support that (2) is the final product of the reconstructed pathway in yeast. Specifically, (2) may result from the condensation of the two identified intermediates, 3-HAA methyl ester (1) and p-coumaroyl-CoA, through the formation of an amide bond potentially catalyzed by SlCHS. However, direct condensation of the two intermediates would lead to a final m/z of 314.1023 ([M + H]+), whereas the final m/z we observed for (2) from yeast culture was m/z 316.1179 ([M + H]+). A native yeast ECR, encoded by TSC13, has been reported to reduce p-coumaroyl-CoA to p-dihydro-coumaroyl-CoA (21). We hypothesized that native Tsc13p activity in yeast may reduce p-coumaroyl-CoA to p-dihydro-coumaroyl-CoA and that SlCHS catalyzes the condensation of p-dihydro-coumaroyl-CoA with (1), leading to production of (2) (Fig. 1E).
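The 2-unit gap between the expected and observed m/z can be checked with a short monoisotopic mass calculation. In the Python sketch below, the elemental formulas (C8H9NO3 for (1), C17H15NO5 for the direct condensation product, C17H17NO5 for the reduced product (2)) are assumptions inferred from the compound names; they are not stated in the text. The atomic masses are standard monoisotopic values, and [M + H]+ is approximated by adding one proton mass.

```python
import re

# Standard monoisotopic masses (u) for the elements needed here.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.9949146196}
PROTON = 1.00727646688


def mz_protonated(formula: str) -> float:
    """m/z of the singly protonated ion [M + H]+ for a simple formula such as 'C8H9NO3'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass + PROTON


# Formulas are assumptions inferred from the compound names in the text.
candidates = {
    "(1) 3-HAA methyl ester": "C8H9NO3",
    "direct condensation product": "C17H15NO5",
    "reduced product (2)": "C17H17NO5",
}

for name, formula in candidates.items():
    print(f"{name}: {formula} -> [M + H]+ = {mz_protonated(formula):.4f}")
```

With these assumed formulas, the calculated values agree with the 168.0655, 314.1023, and 316.1179 quoted in the text, and the difference between the last two corresponds to two hydrogen atoms, consistent with reduction of the coumaroyl double bond.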

To validate our hypothesis, we used CSY1302 (which harbors chromosomally integrated SlCHS, SlMT2, and ySl4CL) to engineer TSC13 knockout strains. As deletion of TSC13 inhibited cellular growth due to its essential role in fatty acid synthesis (22), we partially disrupted Tsc13p activity by inserting three consecutive stop codons at two-thirds the length of TSC13 coding sequence, resulting in strain CSY1304. The insertion of stop codons in TSC13 may lead to low activity through a low frequency of stop-codon readthrough, enabling very low expression of Tsc13p. Stop-codon readthrough has been reported in yeast, where readthrough efficiencies can be as high as 8% (23) and be induced by stress conditions (24). We also replaced TSC13 with heterologous ECR variants from Gossypium hirsutum (GhECR2) and Malus domestica (MdECR) that were reported to have low activity on p-coumaroyl-CoA (21), resulting in CSY1305 (TSC13::GhECR2) and CSY1306 (TSC13::MdECR), respectively. We cultured CSY1304 to CSY1306 in synthetic complete media supplemented with 100 µM p-coumaric acid and 100 µM 3-HAA methyl ester for 72 hours at 30°C and analyzed the yeast culture media for production of (1) and (2) on LC-MS/MS by multiple reaction monitoring (MRM) detection. Partial disruption of the native yeast ECR Tsc13p (CSY1304) resulted in a 40% reduction in production of (2), while replacement of Tsc13p with heterologous ECR variants (CSY1305 and CSY1306) resulted in the absence of production of (2) and the presence of a previously unknown compound (3), with an expected m/z of 314.1 ([M + H]+) corresponding to the oxidized form of (2) (Fig. 1, F and G). The compound identities of (2) and (3) were validated by comparing the retention times and MS/MS spectra to those of the chemical standards (fig. S1, D and E). The results suggest that the yeast native enzyme participated in the tomato cluster activity and produced a derivative product (2); we eliminated this interference by knocking out the yeast native gene TSC13, thereby restoring the true product (3) resulting from the minimal gene cluster (Sl4CL, SlMT2, and SlCHS).
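As an illustration of the partial-disruption design, the Python sketch below inserts three consecutive in-frame stop codons at two-thirds of the length of a coding sequence. It is a toy example only: the placeholder sequence is not TSC13, the choice of TAA, TAG, and TGA as the three stop codons is an assumption (the text does not say which were used), and rounding the insertion point down to a codon boundary is likewise an assumption.

```python
def insert_stop_codons(cds: str, numerator: int = 2, denominator: int = 3,
                       stops: tuple = ("TAA", "TAG", "TGA")) -> str:
    """Insert consecutive stop codons, in frame, at a fractional position of a CDS."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    n_codons = len(cds) // 3
    # Integer arithmetic keeps the insertion point on a codon boundary.
    codon_index = (n_codons * numerator) // denominator
    position = codon_index * 3
    return cds[:position] + "".join(stops) + cds[position:]


# Hypothetical 9-codon placeholder sequence, not the real TSC13 coding sequence.
toy_cds = "ATG" + "GCT" * 7 + "TAA"
edited = insert_stop_codons(toy_cds)
print(edited)
```

In the toy sequence the stop codons land after codon 6 of 9, so translation would normally terminate there, and only rare readthrough of all three stops would yield full-length protein.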

On the basis of our in vivo functional characterization of SlMT2, the methyltransferase recognizes yeast native 3-HAA as a substrate. According to the KEGG pathway database, 3-HAA is involved in central metabolism, i.e., tryptophan metabolism, and the metabolite is also present in tomato. Since no previous studies have been reported on the functional roles of SlMT1 and SlMT2, we investigated the activities of the methyltransferases on hydroxycinnamic acids, amines, and anthranilic acids by feeding these substrates to yeast engineered to express these methyltransferases. Among the three methyltransferases predicted in the tomato cluster, SlMT3 has been reported to catalyze the methylation of salicylic acid (19). SlMT1 and SlMT2 showed high protein sequence similarity to SlMT3 (78.12 and 81.42%, respectively), indicating that they may similarly exhibit activity on salicylic acid. In addition, the three methyltransferases were initially predicted as tailoring enzymes to modify p-coumaric acid and other moieties of hydroxycinnamic acids, contributing to the production of hydroxylated naringenin chalcone and/or methyl esters of hydroxylated naringenin chalcone in tomato flavonoid metabolism (11).

We tested the activity of SlMT1/2/3 toward a variety of candidate substrates in yeast, including hydroxycinnamic acids (cinnamic, p-coumaric, caffeic, and salicylic acids), trace amines (tyramine, tryptamine, octopamine, dopamine, and serotonin), and anthranilic acid analogs (3-HAA and p-aminobenzoic acid). Low-copy plasmids encoding the expression of SlMT1/2/3 or inactive ccdB (negative control) were transformed into the wild-type yeast strain (CEN.PK2). The transformed yeast strains expressing one of the methyltransferases (or negative control protein) were cultured in synthetic dropout media fed with 100 µM of each substrate candidate for 72 hours at 30°C. The resulting yeast media was analyzed on qToF-MS for total ion scan, and the methylation products were evaluated by analyzing differential peaks detected from the transformants compared to the negative control. A methylation product is counted if the m/z ([M + H]+) of a differential peak (between the sample and the negative control) matches that of a putative methylated product of the substrate. Among all the potential substrates tested, SlMT1 and SlMT2 exhibited detectable activities toward 3-HAA, p-coumaric acid, and p-aminobenzoic acid (a primary metabolite that shares similar functional groups with 3-HAA), and SlMT3 exhibited detectable activity only toward 3-HAA. The highest level of the methylation product was observed when supplying 3-HAA to SlMT2 (Fig. 2). Among the three methyltransferases, SlMT3 showed the lowest production of the methylation product from 3-HAA, and the methylation products catalyzed from p-coumaric acid and p-aminobenzoic acid were not detected in our assay. None of SlMT1/2/3 showed detectable activity toward salicylic acid in the context of the yeast-based feeding assay. We hypothesized that either salicylic acid was not efficiently transported into yeast cells due to previously reported antagonism between salicylic acid and d-glucose (25) or the volatile salicylate methyl ester product may have evaporated. Our results indicate that all three methyltransferases (SlMT1/2/3) showed the highest activity toward 3-HAA (among the fed substrates tested) and that SlMT2 led to the highest production of 3-HAA methyl ester in the yeast-based feeding assay.

Relative production of methylation products was calculated as a percentage of the highest production by SlMT2 from substrate 3-HAA: 100% corresponds to the concentration of 3-HAA methyl ester (146 µM) catalyzed from yeast endogenous 3-HAA and 100 µM 3-HAA fed to yeast culture medium. Compounds not detected were crossed out. Data show the mean and SD of three biologically independent replicates.
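The normalization described in this caption amounts to dividing each measured concentration by the 146 µM reference. A minimal Python sketch follows; only the 146 µM reference comes from the caption, while the other sample names and concentrations are hypothetical placeholders chosen for illustration and are not data from the study.

```python
REFERENCE_UM = 146.0  # 3-HAA methyl ester produced by SlMT2 fed 3-HAA (from the caption)


def relative_production(concentrations_uM, reference_uM=REFERENCE_UM):
    """Express each concentration as a percentage of the reference concentration."""
    return {sample: 100.0 * value / reference_uM
            for sample, value in concentrations_uM.items()}


# Hypothetical values, apart from the reference itself.
example = {
    "SlMT2 + 3-HAA": 146.0,
    "hypothetical sample A": 73.0,
    "hypothetical sample B": 0.0,
}

for sample, percent in relative_production(example).items():
    print(f"{sample}: {percent:.1f}%")
```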

Our in vivo characterization results of the minimal gene cluster (Sl4CL, SlMT2, and SlCHS) indicate that SlCHS can potentially catalyze the condensation of p-coumaroyl-CoA and 3-HAA methyl ester, leading to the formation of a nitrogen-carbon (amide) bond. To our knowledge, this study is the first report of amide formation by CHS, which canonically catalyzes Claisen condensation (carbon-carbon bond formation) (26).

We further examined the amide bond catalytic activity of SlCHS by expressing SlCHS recombinantly in Escherichia coli, purifying the enzyme, and characterizing its activities via in vitro enzymatic assays. SlCHS activity was examined with both its canonical substrates (malonyl-CoA and p-coumaroyl-CoA) and the substrates identified in the context of the minimal tomato gene cluster (3-HAA methyl ester and p-coumaroyl-CoA). The reactions were performed by incubating 4 µg of purified enzyme with 200 µM malonyl-CoA or 3-HAA methyl ester and 200 µM p-coumaroyl-CoA for 4 hours and analyzed on LC-MS/MS by MRM detection. For SlCHS canonical activity characterization, we observed spontaneous conversion of naringenin chalcone to naringenin under the in vitro reaction conditions, and we confirmed the production of naringenin by comparing the resultant peak with an authentic standard of naringenin (fig. S2A). We observed the production of (3) when 3-HAA methyl ester was added to the reaction mixture by comparing the peaks with a chemically synthesized standard of (3). The chemical standard of (3) yielded a single peak when dissolved in water (retention time, 6.872 min) but resulted in a secondary peak (retention time of 7.484 min) when dissolved in acidic methanol (fig. S2B). The secondary peak was also detected in acidic methanol-quenched in vitro reaction mixtures, from which the detection of (3) is expected. A previous study compared nonenzymatic and chalcone isomerase-catalyzed conversion of chalcone to flavanone and the pH dependence of this reaction (27). We hypothesized that the secondary peak could result from an isomerized form of (3), similar to the isomerization process of converting naringenin chalcone to naringenin, possibly formed during the in vitro reaction. Together, these results validate that SlCHS is capable of amide formation.

We next examined whether the amide synthesis interferes with the canonical activity. We performed an in vitro reaction with SlCHS under similar conditions but incubated equimolar amounts (200 µM) of 3-HAA methyl ester and malonyl-CoA with 200 µM p-coumaroyl-CoA. Analysis of the reaction products showed an 85% decrease in production of (3) (Fig. 3, reactions 2 and 3) and 6% decrease in production of naringenin (Fig. 3, reactions 1 and 3). The results suggest that 3-HAA methyl ester is likely competing with malonyl-CoA for a p-coumaroyl starter molecule at the SlCHS active site, indicating that SlCHS could use the same active site for amide formation as for Claisen condensation.

+/− indicates the presence/absence of 200 µM p-coumaroyl-CoA, 200 µM 3-HAA methyl ester, 200 µM malonyl-CoA, or 4 µg of purified SlCHS protein. MRM (314.1 → 147.0) and MRM (273.0 → 152.8) detect the production of coumaroyl anthranilate amide (3) and naringenin, respectively. The ion counts are normalized by the highest ion count across reaction (rxn) 1 to 5 by each column; SD shows the percentage error among two independent replicates. Enzyme abbreviation: SlCHS, naringenin chalcone synthase.

We next investigated whether SlCHS exhibited a substrate specificity toward 3-HAA methyl ester for amide synthesis. We incubated SlCHS with 200 µM anthranilic acid analog and 200 µM p-coumaroyl-CoA with similar in vitro reaction conditions, and the reaction mixture was analyzed on LC-MS/MS by product ion scan with precursor ions set to match the m/z of expected condensation products. We tested numerous anthranilic acid analogs in this assay, including 3-HAA methyl ester, 2-amino-3/4/5-methoxybenzoic acid, 3-HAA, 2-amino-5-hydroxybenzoic acid, 3-hydroxybenzoic methyl ester, and anthranilic acid. Analysis of the m/z ([M + H]+) of the expected product for each substrate indicated product peaks with 3-HAA methyl ester, 2-amino-5-methoxybenzoic acid, and 3-hydroxybenzoic methyl ester, among which 3-HAA methyl ester yielded more than 15-fold and 49-fold higher product ion detected than those of 2-amino-5-methoxybenzoic acid and 3-hydroxybenzoic methyl ester, respectively (fig. S2C). In contrast, no amide product was observed when 3-HAA and anthranilic acid, which share a very similar molecular structure with 3-HAA methyl ester, were included in the reaction mixture. A trace amount of a possible ester product was observed when 3-hydroxybenzoic methyl ester was included as a substrate. The observed substrate preferences of SlCHS on the panel of anthranilic acid analogs tested indicate that methylation on the carboxyl group of the anthranilate may facilitate substrate access to the SlCHS active site and that SlCHS exhibits a high substrate preference toward 3-HAA methyl ester.

Last, we examined whether the observed amide synthesis activity was specific to the CHS variant from tomato (SlCHS). Specifically, we performed in vitro reaction assays with the CHS variant from Arabidopsis (AtCHS). AtCHS was recombinantly expressed in E. coli and purified, and its activities on malonyl-CoA and 3-HAA methyl ester were analyzed under the same assay conditions as were used for SlCHS. AtCHS exhibits identical patterns of catalytic activity and substrate preferences as SlCHS in vitro, i.e., highest production of amide with 3-HAA methyl ester, trace amounts of amide production with 2-amino-5-methoxybenzoic acid, and ester production with 3-hydroxybenzoic methyl ester (fig. S2, D and E). Together, the results indicate that the amide synthesis activity observed in SlCHS is not unique to this variant and could be a common secondary function in plant CHS enzymes.

Type III PKSs are characterized by a conserved cysteine-histidine-asparagine catalytic triad, which corresponds to C164-H303-N336 in SlCHS. For canonical synthesis of naringenin chalcone, C164 and H303 form an imidazolium ion pair, which initiates a nucleophilic attack on the thioester carbonyl of p-coumaroyl-CoA that completes acyl transfer onto C164 (28). H303 and N336 coordinate the orientation of the incoming malonyl-CoA moieties during the process of iterative decarboxylation and condensation of the extender malonyl-CoA molecules in formation of the polyketide intermediate. In addition, F215 is an important gatekeeper residue that is reported to separate the CoA-binding tunnel from the active site cavity and help with folding and internal orientation of the tetraketide intermediate (28–30). On the basis of our in vitro assay results, we hypothesized that SlCHS is likely to use the same active site for amide synthesis as for naringenin chalcone synthesis. We therefore investigated the catalytic mechanism of amide bond formation by examining the roles of these active site residues that are important for SlCHS canonical activity.

We first evaluated which residues could potentially interact with 3-HAA methyl ester and use the substrate for amide formation. We built a homology model for SlCHS using Phyre2 (31) and simulated the docking of 3-HAA methyl ester to the homology model structure using AutoDock Vina (32). The simulation shows that 3-HAA methyl ester favorably docks at the SlCHS active site, potentially interacting with H303, N336, and G305 by hydrogen bonding (Fig. 4A, fig. S3A). As a comparison, we simulated the docking of the canonical substrate malonyl-CoA to the SlCHS active site (fig. S3B), which shows that the substrate 3-HAA methyl ester is much smaller in size (molecular weight, 153 versus 854) than the canonical substrate and therefore can readily dock at the active site cavity.

(A) Docking of (1) to SlCHS active site. Dotted line, hydrogen bond interaction. (B to D) Production of (3) and naringenin chalcone in yeast by SlCHS for C164, H303, N336, and G305 mutants (B); F215 mutants (C); and distal [~10 Å within docking site of (1)] residue mutants (D). Data show the mean of two biologically independent replicates with error bar indicating the SD. Unpaired two-tailed t test was performed between each variant and the parent for production of (3): **P < 0.01 and ***P < 0.001 (D). Compound name: (1), methyl 3-hydroxyanthranilic acid; (3), coumaroyl anthranilate amide. Enzyme abbreviation: SlCHS, naringenin chalcone synthase.
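For readers unfamiliar with the statistics in this caption, the Python sketch below works through an unpaired two-tailed t test for the case of two replicates per group. It assumes the equal-variance (Student) form of the test, which the caption does not specify, and it uses the closed-form t distribution for 2 degrees of freedom, so it is valid only for two replicates in each group. The replicate values are hypothetical and are not data from the study.

```python
from math import sqrt
from statistics import mean, variance


def unpaired_t_two_replicates(parent, variant):
    """Student's unpaired two-tailed t test for exactly two replicates per group."""
    if len(parent) != 2 or len(variant) != 2:
        raise ValueError("this sketch only handles two replicates per group")
    # Pooled sample variance; with n1 = n2 = 2 it is the plain average.
    pooled_var = (variance(parent) + variance(variant)) / 2
    if pooled_var == 0:
        raise ValueError("t statistic is undefined when both groups have zero variance")
    standard_error = sqrt(pooled_var * (1 / 2 + 1 / 2))
    t = (mean(parent) - mean(variant)) / standard_error
    # For 2 degrees of freedom: two-tailed p = 1 - |t| / sqrt(2 + t^2).
    p = 1 - abs(t) / sqrt(2 + t * t)
    return t, p


def stars(p):
    """Significance labels as used in the caption."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    return ""


# Hypothetical normalized production values (percent of parent).
t_stat, p_value = unpaired_t_two_replicates([100.0, 96.0], [30.0, 26.0])
print(f"t = {t_stat:.2f}, p = {p_value:.4f} {stars(p_value)}")
```

With only 2 degrees of freedom, a very large t statistic is needed to reach P < 0.001, which is worth keeping in mind when reading significance stars derived from duplicate measurements.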

On the basis of the results of the docking simulation, we first investigated the roles of C164, H303, N336 (canonical catalytic triad residues), and G305 on amide synthesis. We created a SlCHS knockout strain (CSY1307) by replacing the full sequence of SlCHS with three consecutive stop codons in CSY1305 (which harbors chromosomally integrated Sl4CL, SlMT2, SlCHS, and TSC13::GhECR2). Low-copy plasmids encoding SlCHS point mutants (C164A, C164S, H303A, N336A, and G305A) were constructed and transformed into CSY1307. Transformed CSY1307 strains harboring individual SlCHS mutants were cultured in synthetic dropout media supplemented with 100 µM p-coumaric acid and 100 µM 3-HAA methyl ester for 72 hours at 30°C. Yeast culture media was analyzed for production of naringenin chalcone and (3) on LC-MS/MS by MRM detection. C164A, C164S, and H303A mutants completely eliminated both the canonical activity and the amide synthesis activity (Fig. 4B). The N336A mutant completely abolished naringenin chalcone production but resulted in an increase in the production of (3) compared to the wild-type variant, whereas the G305A mutant abolished canonical activity but exhibited only trace amounts of amide formation. The results indicate that C164 and H303 are essential for both canonical and amide synthesis, which is expected as these two residues are responsible for the loading of p-coumaroyl-CoA. The C164S mutant confirms the importance of the thiol group of cysteine for forming the imidazolium ion pair with H303 to activate acyl transfer through nucleophilic attack during loading of p-coumaroyl-CoA onto C164. Although N336 is essential for canonical activity for binding of extender malonyl-CoA, it does not contribute to binding of 3-HAA methyl ester to the active site. This result is further supported by an uninterrupted docking of 3-HAA methyl ester to the active site of a N336A mutant homology model using AutoDock Vina (fig. S3C). The increase in production of (3) observed from the N336A mutant relative to the parent enzyme is likely due to a lack of competition between 3-HAA methyl ester and malonyl-CoA for the p-coumaroyl starter moiety at the active site of the N336A mutant. Last, the removal of amide and canonical activities observed in the G305A mutant suggests that G305 potentially performs a stabilizing role in anchoring 3-HAA methyl ester (as predicted by the docking simulation) and malonyl-CoA during their respective condensation reactions.

We next examined potential effects of F215 on amide formation (Fig. 4C). We tested different mutants of the residue to conserve either the ring structure (F215W, F215Y, and F215H) or spatial occupancy (F215I) of the residue side chain. Low-copy plasmids encoding SlCHS mutants (F215A, F215W, F215Y, F215H, F215C, and F215I) were each transformed into CSY1307. The transformed CSY1307 strains were cultured under identical conditions, and production of naringenin chalcone and (3) was analyzed on LC-MS/MS by MRM detection. All F215 mutants except F215W completely abolished the canonical activity, where F215W maintained only 5% naringenin chalcone production as compared to the wild-type variant (Fig. 4C). The results support the previously proposed role of F215 in orienting malonyl-CoA and polyketide intermediates at the active site (29, 30). We also observed that all mutants except F215W led to 70% reduction in production of (3), while F215W maintained 90% production of (3) compared to the wild-type variant (Fig. 4C). The results suggest that the ring structure of residue 215 in wild-type and the F215W mutant may assist in orienting 3-HAA methyl ester at the active site to facilitate amide formation. However, the ring structure itself in the residue is not sufficient for 3-HAA methyl ester binding since decreased production of (3) was observed in F215Y and F215H (which conserved the ring structure); instead, spatial occupancy (F215I) by the residue may also contribute to substrate selection. Furthermore, reduced production of (3) observed in the F215Y and F215H mutants could result from a poorly oriented residue side chain shielding the active site, thus preventing the access of 3-HAA methyl ester to C164-bound p-coumaroyl moiety. 
We also scanned for the production of pyrone derivatives bis-noryangonin (BNY) and 4-coumaroyltriacetic acid lactone (CTAL), the former a triketide and the latter a tetraketide early-released derailment by-product (29, 33), by F215 mutants in yeast culture media. We observed proportional levels of CTAL production compared to that of naringenin chalcone and no detectable levels of BNY production (fig. S4A). The results suggest that inhibited production of (3) by F215 mutants is unlikely due to pyrone by-product accumulation at the SlCHS active site. In summary, the results indicate that although F215 likely performs a specific structural role in orienting malonyl-CoA during extension of the polyketide intermediate in canonical activity, its function is less specific for selecting 3-HAA methyl ester as a substrate.

Last, we investigated the potential effects of nonspecific binding by 3-HAA methyl ester to SlCHS protein. We mutated nine residues (T132A, S133A, S339A, S339T, I193A, T194A, L267A, V271A, and P272A) within ~10 Å of the 3-HAA methyl ester docking site and analyzed the effects of these mutations on production of (3) in yeast (Fig. 4D and fig. S3D). CSY1307 strains transformed with the mutants encoded on low-copy plasmids were cultured under identical conditions, and production of naringenin chalcone and (3) was analyzed on LC-MS/MS by MRM detection. The results showed that most of the nine tested residues did not show statistically significant effects on production of (3), except for S339A, T194A, and P272A (Fig. 4D). S339A completely abolished SlCHS activity, and the two distal residue mutants (T194A and P272A) significantly improved SlCHS activity for production of (3). Since S339 is located at a loop structure near the SlCHS active site, the mutation may have interrupted the correct folding of the active site cavity and therefore disrupted both naringenin chalcone and amide synthesis. Removal of the two distal residues (T194A and P272A) may have altered the entrance geometry of the active site cavity, which facilitated the access of 3-HAA methyl ester to the active site and therefore increased production of (3). Similarly, fluctuations in the production of naringenin chalcone observed among the mutants could be caused by an altered geometry around the active site, which affected the access of p-coumaroyl-CoA or malonyl-CoA to the active site.

The results of the site-directed mutagenesis studies suggest that SlCHS uses the same active site for canonical and amide synthesis. We performed in vitro enzymatic assays to further investigate the kinetic properties of SlCHS on 3-HAA methyl ester. Kinetic assays were performed by incubating purified SlCHS with p-coumaroyl-CoA and varying concentrations of 3-HAA methyl ester (0, 0.001, 0.005, 0.01, 0.05, 0.1, 0.2, 0.4, 0.8, 1.6, 3, 5, 10, and 15 mM). The reactions were stopped at different time points, and reaction products were analyzed on LC-MS/MS to derive the kinetic curve (Fig. 5A and fig. S5A). The kinetic data show that the amide synthesis has a Km (Michaelis-Menten constant) of 3.06 mM and a Vmax of 14.47 nM min⁻¹, resulting in a kcat of 0.362 min⁻¹ and kcat/Km of 1.18 × 10⁻⁴ µM⁻¹ min⁻¹ under the in vitro reaction conditions (Fig. 5A). As a comparison, we performed in vitro enzymatic assays to characterize the kinetic properties of SlCHS canonical activity by incubating purified SlCHS with p-coumaroyl-CoA and varying concentrations of malonyl-CoA (0, 5, 50, 100, 200, 300, and 500 µM). The canonical synthesis of naringenin chalcone has a Km of 21.34 µM and Vmax of 11.32 nM min⁻¹, resulting in a kcat of 0.0943 min⁻¹ and kcat/Km of 4.42 × 10⁻³ µM⁻¹ min⁻¹ (fig. S5B). The results show a 143-fold difference between SlCHS's Km for 3-HAA methyl ester and malonyl-CoA, indicating that the enzyme has a much higher affinity for malonyl-CoA than for 3-HAA methyl ester. The results also show a 37-fold higher catalytic efficiency (kcat/Km) of SlCHS for synthesis of naringenin chalcone than for that of amide. Together, the results indicate that amide synthesis is likely to be a less efficient secondary function of SlCHS.
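The derived constants in this paragraph follow from the reported Km and Vmax once an enzyme concentration is fixed. The short Python sketch below redoes that arithmetic. The enzyme concentrations (40 nM for amide synthesis and 120 nM for canonical activity) are taken from the kinetic assay description in the methods and are an assumption about which values underlie the reported kcat; the function and variable names are illustrative only.

```python
def kcat_per_min(vmax_nM_per_min, enzyme_nM):
    """Turnover number, kcat = Vmax / [E], in min^-1."""
    return vmax_nM_per_min / enzyme_nM


def efficiency_per_uM_per_min(kcat, km_uM):
    """Catalytic efficiency, kcat / Km, in uM^-1 min^-1."""
    return kcat / km_uM


# Reported values; enzyme concentrations are assumed from the methods.
amide_km_uM = 3060.0        # 3.06 mM for 3-HAA methyl ester
amide_vmax = 14.47          # nM min^-1
amide_enzyme_nM = 40.0      # assumption

canon_km_uM = 21.34         # for malonyl-CoA
canon_vmax = 11.32          # nM min^-1
canon_enzyme_nM = 120.0     # assumption

amide_kcat = kcat_per_min(amide_vmax, amide_enzyme_nM)
canon_kcat = kcat_per_min(canon_vmax, canon_enzyme_nM)
amide_eff = efficiency_per_uM_per_min(amide_kcat, amide_km_uM)
canon_eff = efficiency_per_uM_per_min(canon_kcat, canon_km_uM)

km_fold = amide_km_uM / canon_km_uM      # difference in Km
eff_fold = canon_eff / amide_eff         # difference in kcat/Km

print(f"amide kcat      = {amide_kcat:.3f} min^-1")
print(f"canonical kcat  = {canon_kcat:.4f} min^-1")
print(f"Km fold         = {km_fold:.0f}")
print(f"kcat/Km fold    = {eff_fold:.0f}")
```

Under these assumptions the recomputed kcat values, the 143-fold Km difference, and the 37-fold efficiency difference agree with the figures quoted in the text.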

(A) Kinetic characterization of SlCHS synthesis of coumaroyl anthranilate amide (3). (B) Kinetic characterization of SlCHS synthesis of naringenin chalcone, inhibited with 0, 3, or 5 mM 3-HAA methyl ester. (C) Proposed inhibition mechanisms of 3-HAA methyl ester to SlCHS canonical activity. E, enzyme (SlCHS); EC, enzyme-coumaroyl complex; I, inhibitor (3-HAA methyl ester); ECI, enzyme-coumaroyl-inhibitor complex; CAA, coumaroyl anthranilate amide; M, malonyl-CoA; ECM, enzyme-diketide complex; ECM2, enzyme-triketide complex; ECM3, enzyme-tetraketide complex; NC, naringenin chalcone; ECMI, enzyme-diketide-inhibitor complex; ECM2I, enzyme-triketide-inhibitor complex; ECM3I, enzyme-tetraketide-inhibitor complex. Equation notations: v0, initial velocity; Vmax, maximal velocity; Km, Michaelis-Menten constant; S, substrate (i.e., malonyl-CoA); Kc, competitive inhibition coefficient; Ku, uncompetitive inhibition coefficient; n, Hill coefficient that simulates cooperativity effect by sequential binding of malonyl-CoA to the coumaroyl-bound enzyme complex. (D and E) Analysis on mode of inhibition by 3 mM (D) and 5 mM (E) 3-HAA methyl ester. Eq. 1, no inhibition; Eq. 2, competitive inhibition; Eq. 3, uncompetitive inhibition; Eq. 4, mixed-type inhibition. Data show the mean of two independent replicates, with error bar indicating the SD.

We next examined the mechanism of 3-HAA methyl ester inhibition of SlCHS canonical activity. Kinetic assays were performed by incubating purified SlCHS with p-coumaroyl-CoA and varying concentrations of malonyl-CoA (5, 50, and 100 µM) as the substrate and 3-HAA methyl ester (0, 3, and 5 mM) as the inhibitor. The reactions were stopped at different time points, and reaction products were analyzed on LC-MS/MS to derive the kinetic curve for each inhibitor concentration (Fig. 5B). For the purpose of curve-fitting, only malonyl-CoA was considered as the substrate, since the reactions were performed under saturated concentrations of p-coumaroyl-CoA (200 µM). We first fit all data points (measured under 0, 3, and 5 mM inhibitor) to Eq. 1 (Fig. 5, B and C). By tuning the Hill coefficient, we observed that root mean square error (RMSE) is minimized for data points of 0 mM when n = 1, for data points of 3 mM when n = 1.7, and for data points of 5 mM when n = 1.5 (Fig. 5B and table S2A). The curve-fitting results suggest that the effects of cooperativity emerge only when inhibitors are present.

We then fit the data points taken under 3 and 5 mM inhibitors to competitive (Fig. 5, Eq. 2), uncompetitive (Fig. 5, Eq. 3), or mixed-type (Fig. 5, Eq. 4) inhibition modes to interpret inhibition coefficients (Kc for competitive inhibition and Ku for uncompetitive inhibition) by fixing the values for Km and kcat at those obtained at 0 mM inhibitor (Fig. 5, C to E). Here, we used the Hill coefficient n to represent the effect of cooperativity resulting from sequential binding of three molecules of malonyl-CoA to coumaroyl-bound enzyme complex. For the data points obtained under 3 and 5 mM inhibitors, we observed minimization of RMSE with the mixed-type inhibition model, and the best fits were obtained at n = 1.7 and 1.5 for 3 and 5 mM inhibitors, respectively (Fig. 5, D and E, and table S2D). For 3 mM inhibitor, Kc = 0.377 mM and Ku = 1.01 mM (Ku/Kc = 2.67). For 5 mM inhibitor, Kc = 0.341 mM and Ku = 0.897 mM (Ku/Kc = 2.63). Together, the results indicate that inhibition is dominated by competitive mode in both cases with a shift from competitive to uncompetitive mode as inhibitor concentration increases from 3 to 5 mM.
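The model comparison described here can be sketched in a few lines of Python. The rate law below is a textbook Hill-type mixed-inhibition form and is an assumption: the exact Eqs. 1 to 4 are given in Fig. 5C and may be parameterized differently. Setting the competitive coefficient Kc or the uncompetitive coefficient Ku to infinity removes the corresponding term, so one function covers all four cases. The Vmax used in the example (kcat of 0.0943 min⁻¹ times 108 nM enzyme) is illustrative.

```python
import math


def v0(s, vmax, km, n=1.0, i=0.0, kc=math.inf, ku=math.inf):
    """Initial velocity under an assumed Hill-type mixed-inhibition model.

    s, km : substrate concentration and Km (same units, e.g. uM)
    i, kc, ku : inhibitor concentration and inhibition coefficients (e.g. mM)
    kc = inf gives uncompetitive, ku = inf gives competitive,
    both inf (or i = 0) gives the uninhibited Hill equation.
    """
    sn = s ** n
    return vmax * sn / (km ** n * (1 + i / kc) + sn * (1 + i / ku))


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )


# Illustrative use with the coefficients reported for 3 mM inhibitor.
vmax = 0.0943 * 108.0  # nM min^-1, assumed from kcat and enzyme concentration
free = v0(50.0, vmax, 21.34, n=1.7)
inhibited = v0(50.0, vmax, 21.34, n=1.7, i=3.0, kc=0.377, ku=1.01)

# Ku/Kc > 1 means the competitive term dominates.
ratio_3mM = 1.01 / 0.377
ratio_5mM = 0.897 / 0.341
print(free, inhibited, ratio_3mM, ratio_5mM)
```

Fitting would then amount to choosing Kc, Ku, and n to minimize `rmse` between measured and predicted velocities at each inhibitor concentration, which is one way to reproduce the comparison reported in table S2.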

Last, we investigated the production of pyrone derivatives BNY and CTAL by SlCHS when inhibited by 3-HAA methyl ester. We scanned for BNY and CTAL production from reaction mixtures fed with 100 µM malonyl-CoA; 100 µM p-coumaroyl-CoA; and 0, 3, or 5 mM 3-HAA methyl ester inhibitor at the end of the kinetic assay time course. We detected proportional levels of CTAL production compared to that of naringenin and no detectable levels of BNY production (fig. S4, B and C). The results suggest that 3-HAA is unlikely to promote the release of derailment by-products due to early termination in extension and/or cyclization during polyketide synthesis.

We leveraged a yeast biosynthesis platform to characterize the activity of a computationally predicted biosynthetic gene cluster from tomato, which led to the discovery of a previously undocumented HCAA compound and the potential of CHS for nitrogen-carbon bond synthesis. The HCAA compound is generated by the condensation of a hydroxycinnamic acid moiety and anthranilic acid moiety through formation of an amide bond. We showed that one of the substrates for HCAA production in yeast was 3-HAA methyl ester, which was converted from the native metabolite, 3-HAA, by each of the three methyltransferases in the predicted tomato gene cluster. Among the methyltransferases, SlMT2 exhibited the highest activity toward 3-HAA in yeast. Through systematic mutagenesis, in vivo activity screens, and in vitro substrate competition assays, we showed that SlCHS uses the same active site for its canonical naringenin chalcone synthesis activity to catalyze the condensation of 3-HAA methyl ester and p-coumaroyl-CoA, leading to the production of coumaroyl anthranilate amide (3). To our knowledge, this is the first report of a type III PKS enzyme exhibiting amide bond formation activity. In vitro kinetic assays indicate that SlCHS catalyzes the formation of (3) with a Km of 3.06 mM for 3-HAA methyl ester.

To examine the catalytic mechanism of CHS for HCAA synthesis, we referred to mechanisms of other classes of enzymes that catalyze similar reactions. Specifically, the acyl-CoA N-acyltransferases are a category of benzylalcohol acetyl-, anthocyanin-O-hydroxy-cinnamoyl-, anthranilate-N-hydroxy-cinnamoyl/benzoyl-, deacetylvindoline (BAHD) acyltransferases that catalyze the formation of HCAA in plants (34–41) and share a conserved HXXXDG domain, positioned near the center of the enzyme (38). A histidine residue in the HXXXDG motif deprotonates the oxygen or nitrogen atom on the corresponding acceptor substrate, thereby allowing a nucleophilic attack on the carbonyl carbon of the CoA thioester and leading to the formation of a tetrahedral intermediate between the CoA thioester and acceptor substrate (39). The intermediate is reprotonated to release the free CoA and the acylated ester or amide. The aspartic acid residue in the conserved motif plays a structural rather than catalytic role by forming a salt bridge with a conserved arginine residue (39). Another family of enzymes, arylamine N-acetyltransferases (NATs), catalyzes a similar reaction that transfers an acetyl group from acetyl-CoA to the terminal nitrogen group of an arylamine substrate (42). The reaction is catalyzed by a cysteine-histidine-aspartic acid catalytic triad and is initiated by nucleophilic attack of the carbonyl group on acetyl-CoA by cysteine, activated by the histidine residue likely through formation of a thiolate-imidazolium ion pair (43, 44). The incoming arylamine attacks the carbonyl group bound to cysteine in forming a tetrahedral intermediate, with a general base deprotonating the amine group. Similar to BAHD acyltransferases, it has been suggested that the deprotonation in NATs is assisted by the histidine residue in the catalytic triad (43). 
The aspartic acid residue was proposed to form a low-barrier hydrogen bond with the histidine residue to increase the basicity of the histidine for cysteine activation (43).

The catalytic mechanisms for BAHD acyltransferases and NATs suggest the potential roles of histidine at the SlCHS catalytic triad (C164-H303-N336) in (i) cysteine activation before nucleophilic attack of the carbonyl group of p-coumaroyl-CoA and (ii) deprotonating the incoming amine nucleophile in formation of a tetrahedral intermediate bound to cysteine. Previous studies on CHS catalytic mechanisms support (i) that H303 and C164 form a thiolate-imidazolium ion pair, which facilitates the nucleophilic attack of the thiolate anion on the thioester carbonyl of p-coumaroyl-CoA, resulting in transfer of the acyl moiety to C164 (28). Our in vivo mutagenesis data indicate that C164 and H303 are critical for canonical and amide synthesis. Therefore, it is likely that the mechanism for cysteine activation and acyl transfer is conserved for amide formation (fig. S6, A and B). In the next step, incoming 3-HAA methyl ester forms a covalent bond with the coumaroyl moiety bound to C164 by nucleophilic attack of the amine group on the carbonyl group of the coumaroyl moiety, leading to formation of a tetrahedral intermediate (fig. S6, C and D). The positively charged amide is then deprotonated by an unidentified general base (fig. S6, D and E), followed by release of the amide product (fig. S6F). H303 may play the role of the unidentified general base in deprotonating the incoming amine nucleophile as suggested for NATs (43); however, this process requires H303 to be regenerated (deprotonation of the imidazolium) after accepting a proton from a thiol group upon acyl transfer from p-coumaroyl-CoA to cysteine, the exact mechanism for which was not determined in this study.

Prior studies on CHS activity reported that mutations in an active site residue (F215) and acidification of in vitro reaction mixtures before extraction can lead to an increase in production of BNY and CTAL (29). In this work, we observed proportional levels of CTAL production compared to that of naringenin chalcone and no detectable levels of BNY production from CHS in vitro reaction mixtures. We also did not observe increases in BNY and CTAL from the F215 mutants expressed in yeast, in contrast to previously reported in vitro characterization of F215 mutants (29). The study reported the production of BNY from F215A and F215H mutants and CTAL from F215Y mutant, where BNY production was maximized at pH 7.0, and CTAL production was prominent within a pH range of 6.0 to 6.5 (29). The absence of detectable BNY and CTAL production by F215 mutants in our work may be due to differences in characterization conditions, i.e., yeast versus in vitro, and specifically may be due to the acidic pH 5.8 of yeast synthetic complete media. The observation also indicates that inhibited production of (3) observed with F215 mutants is not likely due to pyrone by-product accumulation at the CHS active site.

We observed that CHS exhibits catalytic promiscuity by catalyzing the synthesis of two different families of compounds: polyketide through its canonical activity and HCAA through the secondary activity characterized here. The syntheses of other HCAA compounds (e.g., p-coumaroyltyramine, p-coumaroyldopamine, and feruloyldopamine) by hydroxycinnamoyl-CoA:tyramine N-hydroxycinnamoyl transferase (THT) have been reported in tomato for defense against bacterial and fungal pathogens (45, 46). There is currently limited evidence to support that this secondary activity of CHS may be adapted by the plant host for HCAA synthesis, considering that the secondary activity shows ~40-fold lower efficiency (kcat/Km) compared to the canonical activity. However, this catalytic promiscuity may indicate a starting point for evolution of the enzyme to become an alternative route for HCAA compound production (47). For example, future work can compare the amine substrate specificity of both THT and CHS for HCAA synthesis, which may indicate an evolutionary advantage of CHS to catalyze hydroxycinnamoyl anthranilate-type HCAA if CHS shows higher activity toward anthranilic acid analogs than THT. Additional future work may focus on validating a role of the gene cluster in the native host by knocking out individual genes in tomato and performing metabolomics to search for metabolites that may be associated with the gene cluster. However, if the genes in the cluster are associated with a cryptic pathway, identification of a proper elicitor treatment would be required to induce the silent gene cluster and production of the target compound(s) in the host.

As more than 1000 putative plant gene clusters have now been predicted via computational tools (7, 9–11), future advances that further streamline high-throughput characterization workflows will be critical to characterizing activities encoded within these clusters. For example, future efforts may develop systematic criteria to prioritize gene clusters for yeast-based characterization and reliable high-throughput metabolite screening methods to accelerate the exploration of previously unidentified chemical space. Parallel genomic integration of multiple gene clusters can be facilitated by multiplexed CRISPR technology (48). Yeast harboring multiple gene clusters can then be screened for compound production using high-precision metabolomics, where improved computational workflows for untargeted metabolomics analysis can enable more efficient identification of novel low-abundance metabolites and distinguish them robustly from background metabolite profiles. Thus, the integration of computational plant genome analysis, yeast-based heterologous pathway expression, and advances in analytics will allow for the streamlined characterization and discovery of biosynthetic routes that may be difficult to uncover in planta.

DNA sequences for heterologous biosynthetic enzymes were codon-optimized to improve expression in S. cerevisiae using GeneArt GeneOptimizer software (Thermo Fisher Scientific, Waltham, MA) and were synthesized as gene fragments (Twist Bioscience, San Francisco, CA). For guide RNA (gRNA)/Cas9 plasmids, 20-base pair (bp) gRNAs targeting the genomic site were synthesized as primers (TSC13 gRNA1: AACAGCTCAAATGTACGCAT; TSC13 gRNA2: ATAACTTAGCATTCCCAAAG; SlCHS gRNA: TGTTGGTACATCATCAATCT), overlap polymerase chain reaction (PCR)-amplified with tRNA promoter/hepatitis delta virus (HDV) ribozyme PCR fragment (pCS3411), trans-activating CRISPR RNA (tracrRNA)/terminator PCR fragment (pCS3414), and cloned into a SpCas9 expression vector with G418 resistance (pCS3410) via Gibson assembly (49).

Plasmids for protein expression in E. coli were constructed by inserting DNA fragments encoding At4CL, SlCHS, and AtCHS into pET28 vector via Gibson assembly, for which the PCR-amplified pET28 vector backbone and the protein inserts share a 40-bp overhang at both ends of the linear DNA components. Plasmid encoding the parent SlCHS protein in the site-directed mutagenesis study was constructed using Gibson assembly. The plasmid vector (pCS3305) was digested by restriction enzymes Xba I and Xho I, and the SlCHS gene insert was amplified from a gene fragment.

Plasmids for single amino acid mutant variants were constructed either via Gibson assembly or blunt-end ligation. For the Gibson assembly method, primers encoding the single amino acid substitution were used to amplify the parent plasmid and the linear DNA product. The linear DNA product contained a 15-bp overlap between its 5′ and 3′ ends and was annealed by Gibson assembly. For the blunt-end ligation method, a primer pair without overhang was used to amplify the parent plasmid, and the 5′ primer encoded the single amino acid substitution. The linear DNA product was then incubated with T4 nucleotide kinase [New England Biolabs (NEB), Ipswich, MA] at 37°C for 30 min and subsequently with T4 DNA ligase (NEB, Ipswich, MA) at room temperature for 2 hours.

All the primers in this work were synthesized by the Stanford Protein and Nucleic Acid Facility (Stanford, CA). PCR amplifications were performed with Q5 High-Fidelity DNA polymerase (NEB, Ipswich, MA), and PCR products were purified using the DNA Clean and Concentrator Kit (Zymo Research, Irvine, CA). Plasmids generated in this work are listed in table S3.

The chemical standard for methyl 3-hydroxy-2-(3-(4-hydroxyphenyl)propanamido)benzoate [dihydro-coumaroyl anthranilate amide (2)] and (E)-methyl 3-hydroxy-2-(3-(4-hydroxyphenyl)acrylamido)benzoate [coumaroyl anthranilate amide (3)] was purchased from Toronto Research Chemicals (Canada). Methyl 2-amino-3-hydroxybenzoate [3-HAA methyl ester (1)] was purchased from Apollo Scientific (UK). p-Coumaric acid, malonyl-CoA, 3-HAA, methyl 3-hydroxybenzoate, 2-amino-3-methoxybenzoic acid, 2-amino-4-methoxybenzoic acid, 2-amino-5-methoxybenzoic acid, 2-amino-5-hydroxybenzoic acid, and 2-anthranilic acid were purchased from Sigma-Aldrich (St. Louis, MO). Naringenin chalcone was purchased from Biosynth Carbosynth (USA). Naringenin was purchased from MedChemExpress (USA). p-coumaroyl-CoA standard was purchased from PlantMetaChem (Germany).

Yeast strains used in this study are listed in table S3. All yeast strains are haploid, derived from CEN.PK2-1D (50) (MATα URA3-52, TRP1-289, LEU2-3/112, HIS3Δ1, MAL2-8C, and SUC2), referred to as CEN.PK2. Genes in the predicted tomato cluster were codon-optimized and assembled with corresponding promoter/terminator fragments and integrated into pYES1L (Life Technologies, Carlsbad, CA). To create the minimal pathway strain, the pathway genes (SlCHS, Sl4CL, and SlMT1/2/3) were first cloned into pAG414-GDP1p/ADHt, pAG414-PGK1p/PHO5t, pAG414-PYK1p/MFA1t, or pAG414-TEF1p/CYC1t expression vector with Gibson assembly, and the linear DNA fragment for each pathway gene expression cassette with 30-bp overlap between each fragment was PCR-amplified from the pAG vectors, assembled, and integrated into YMR206W:: locus with SpHIS5 selection marker.

TSC13 and SlCHS knockout strains were created by CRISPR-Cas9 genome editing method as previously described (51). The linear DNA repair templates were PCR-amplified and harbor a 30- to 45-bp overlap with the target genomic site. Two hundred nanograms of gRNA/SpCas9 plasmid and 500 ng of linear DNA template were cotransformed into yeast competent cells prepared from the Frozen-EZ Yeast Transformation II Kit (Zymo Research, Irvine, CA), as described in the Yeast strain construction and transformation section. Colonies picked from G418 plate after 3 days were screened for metabolite production.

For yeast transformations, a single colony of the parent strain was inoculated in yeast peptone with 2% dextrose (YPD) media and incubated overnight at 30°C and 220 rpm. The saturated overnight culture was then diluted 50-fold in fresh YPD media and incubated for 4 to 6 hours. Cells (2.5 ml) were used per transformation. The cells were then harvested by centrifugation at 3500g for 4 min and prepared for transformation using the Frozen-EZ Yeast Transformation II Kit (Zymo Research, Irvine, CA). For plasmid transformations, 50 ng of DNA was used per transformation. The transformed cells were plated directly onto synthetic dropout agar plates after 45-min incubation with EZ3 solution. For Cas9-based chromosomal integrations, 100 ng of the Cas9 plasmid (encodes G418 resistance) and 500 ng of the linear DNA fragments were used per transformation, and the transformed cells were subject to a 2-hour recovery at 30°C in YPD media after 45-min incubation with EZ3 solution. The cells were plated onto synthetic dropout plates supplemented with G418 (400 mg/liter) to select for colonies with successfully integrated constructs. The plate cultures were incubated 2 to 3 days before colonies were picked for metabolite production assays.

To screen for metabolite production, two or three colonies were inoculated for each strain (or transformed strain) into 400 µl of synthetic complete or dropout media with 2% dextrose in 2-ml 96-well plates, grown for 16 to 20 hours to saturation, diluted at a 1:8 ratio into fresh media with corresponding feeding conditions, and grown for 72 hours at 25 or 30°C, as indicated, before metabolite analysis of culture supernatant on LC-MS/qToF-MS.

For targeted metabolite production assays, 100 µl of supernatant of yeast culture from 96-well plates was obtained by centrifugation at 4000g for 5 min. The sample was analyzed by an Agilent 1260 Infinity Binary high-performance LC (HPLC) paired with an Agilent 6420 Triple Quadrupole LC-MS, with a reversed-phase column (Agilent EclipsePlus C18, 2.1 × 50 mm, 1.8 µm), water with 0.1% formic acid as solvent A, and acetonitrile with 0.1% formic acid as solvent B, at a constant flow rate of 0.4 ml/min and an injection volume of 5 µl. The following gradient was used for compound separation: 0 to 6 min, 3 to 50% B; 6 to 9 min, 50 to 97% B; 9 to 10 min, 97% B; 10 to 10.5 min, 97 to 3% B; 10.5 to 11 min, equilibration with 3% B. The liquid chromatogram eluent was directed to the MS for 1 to 10 min with electrospray ionization (ESI) source in positive mode, gas temperature at 350°C, gas flow rate at 10 liters/min, and nebulizer pressure at 50 psi. LC-MS data files were analyzed in Agilent MassHunter Workstation software. The liquid chromatograms and product ion scans were extracted either by specified precursor ion from total ion current or by MRM with ion transitions and related parameters specified in table S4. All the MRM transitions in this work were derived from product ion scan with specified precursor ion, and the most abundant product ion was chosen for MRM transition quantification. For each compound, production was quantified by integrating the peak area under the ion count curve. The ion counts were calibrated to a chemical standard curve and converted to measurements of titer (ng/ml or µg/ml) and molar concentration (nM) for in vivo and in vitro assays, respectively.
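For readers who want to reproduce the targeted LC method programmatically, the gradient above can be written as a list of breakpoints and queried by time. This is a minimal Python sketch that assumes each ramp is linear between the stated breakpoints, which the text does not state explicitly; the names `GRADIENT` and `percent_b` are illustrative.

```python
# (time in min, % solvent B) breakpoints for the targeted LC-MS/MS method.
GRADIENT = [
    (0.0, 3.0),
    (6.0, 50.0),
    (9.0, 97.0),
    (10.0, 97.0),
    (10.5, 3.0),
    (11.0, 3.0),
]


def percent_b(t_min):
    """Return % solvent B at time t_min, assuming linear ramps."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time is outside the 0 to 11 min method window")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable: breakpoints cover the whole window")


for t in (0.0, 3.0, 7.5, 9.5, 11.0):
    print(f"{t:5.2f} min -> {percent_b(t):5.1f}% B")
```

Since solvent A makes up the remainder, the composition at any time is fully determined by this single value.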

For untargeted metabolite production assays, 200 µl of yeast culture from 96-well plates was flash-frozen, lyophilized overnight, and dissolved in 100 µl of 75% methanol (with 25% water) with 0.1% formic acid. The sample was analyzed by the Agilent 1260 Infinity Binary HPLC paired with an Agilent 6545 Quadrupole Time-of-Flight LC-MS, with a reversed-phase column (Agilent EclipsePlus C18, 2.1 × 50 mm, 1.8 µm), water with 0.1% formic acid as solvent A, and acetonitrile with 0.1% formic acid as solvent B, at a constant flow rate of 0.6 ml/min and an injection volume of 1 µl. The following gradient was used for compound separation: 0 to 0.40 min, 5% B; 0.40 to 8.40 min, 5 to 95% B; 8.40 to 10.40 min, 95% B; 10.40 to 10.41 min, 95 to 5% B; 10.41 to 12.00 min, 5% B. The liquid chromatogram eluent was directed to the MS for 1 to 12 min with ESI source in positive mode, gas temperature at 250°C, gas flow rate at 12 liters/min, nebulizer pressure at 10 psig, Vcap at 3500 V, fragmentor at 100 V, skimmer at 50 V, octupole 1 RF Vpp at 750 V, and acquisition scan rate at 2.50 spectra/s.

SlCHS homology model was built using Phyre2 (31) from amino acid sequence, with 85% identity with template c1cml chain A from Protein Data Bank. Docking simulation was performed by AutoDock Vina (32), and docking results were visualized using PyMOL. Geometry optimizations of substrate structures before docking simulations were conducted using Gaussian 16 (DFT, B3LYP, and LANL2DZ).

Protein expression plasmids were transformed into E. coli BL21(DE3) cells. For each protein construct, a single colony was inoculated into 5 ml of Luria-Bertani (LB) media with kanamycin (50 mg/liter) and incubated at 37°C and 220 rpm for 16 hours (overnight). Overnight culture (5 ml) was then inoculated into 1 liter of LB media with kanamycin (50 mg/liter) and incubated at 37°C and 200 rpm for around 5 hours until the optical density at 600 nm (OD600) reached 0.6. The culture was then cooled to 18°C, induced with 0.5 mM isopropyl-β-d-thiogalactopyranoside, and incubated for 16 hours at 200 rpm. The cells were harvested by centrifugation at 4000g for 15 min, and all the following steps were performed on ice with prechilled buffers and reagents. The cell pellet was first washed in 50 mM (pH 8.0) tris buffer, resuspended in lysis buffer [10 mM imidazole, 50 mM sodium phosphate, and 300 mM sodium chloride (pH 7.4)], and lysed by sonication. The cellular debris was removed from cell lysate by centrifugation at 16,000g and 4°C for 1 hour. The enzyme proteins were purified from the supernatant using Ni-NTA agarose affinity chromatography and eluted using a range of imidazole concentrations (40, 100, 150, 200, 250, and 450 mM) with the target protein most efficiently eluted at 200 mM imidazole. The purified proteins were then buffer-exchanged and concentrated to storage buffer [50 mM potassium phosphate, 100 mM NaCl, and 10% (v/v) glycerol (pH 7.5)]. The protein concentration was determined by NanoDrop and corrected by extinction coefficient. The final yield for all three proteins is ~2.2 mg/ml. Aliquots of the purified proteins were flash-frozen and stored at −80°C.

p-Coumaroyl-CoA was synthesized by a batch of in vitro reactions with purified protein (40 µg/ml) of At4CL, 400 µM p-coumaric acid, 400 µM CoA, 4 mM adenosine 5′-triphosphate, and 5 mM MgCl2, added to a buffer with 50 mM potassium phosphate and 100 mM NaCl at pH 7.5. The reaction mixture was incubated at 37°C and 500 rpm for 4 hours. Aliquots of the reaction products were stored at −20°C.

For SlCHS and AtCHS in vitro activity validation, 4 µg of purified protein and 200 µM p-coumaroyl-CoA were incubated with 200 µM malonyl-CoA and/or 3-HAA methyl ester in a 50-µl reaction volume at 30°C and 450 rpm for 4 hours in the dark. The reaction volume was quenched in equal volume of acidic methanol (with 0.1% formic acid), the mixture was centrifuged at 32,000g for 10 min, and the supernatant was used for LC-MS analysis. For the specificity assay, 4 µg of purified protein and 200 µM p-coumaroyl-CoA were incubated with 200 µM 3-HAA, methyl 3-hydroxybenzoate, 2-amino-3-methoxybenzoic acid, 2-amino-4-methoxybenzoic acid, 2-amino-5-methoxybenzoic acid, 2-amino-5-hydroxybenzoic acid, or 2-anthranilic acid, with the same incubation and extraction protocol described above.

For amide synthesis kinetic assays, 680 or 40 nM purified SlCHS protein and 200 or 500 μM p-coumaroyl-CoA were incubated with 0, 1, 5, 10, 50, 100, and 200 μM or 0, 0.4, 0.8, 1.6, 3, 5, 10, and 15 mM 3-HAA methyl ester. For the canonical activity kinetic assay, 120 nM purified SlCHS protein and 200 μM p-coumaroyl-CoA were incubated with 0, 5, 50, 100, 200, 300, and 500 μM malonyl-CoA. For each assay, duplicates were performed in 50-μl reaction volumes; incubated at 30°C and 450 rpm in the dark; and quenched by adding an equal volume of acidic methanol (with 0.1% formic acid) at 5, 10, 15, 20, and 25 min (for amide synthesis with the low concentration range of 3-HAA methyl ester); at 6, 24, 30, and 36 min (for amide synthesis with the high concentration range of 3-HAA methyl ester); or at 5, 10, 17, 24, and 31 min (for canonical activity). The samples were further diluted by adding 30 μl of water and filtered using 0.2-μm filter plates before measurements on LC-MS/MS.

For enzymatic inhibition assays, 108 nM purified SlCHS protein was incubated with 200 μM p-coumaroyl-CoA and 5, 50, or 100 μM malonyl-CoA and 0, 3, or 5 mM 3-HAA methyl ester. For each assay, duplicates were performed in 40-μl reaction volumes; incubated at 30°C and 450 rpm in the dark; and quenched by adding an equal volume of acidic methanol (with 0.1% formic acid) at 5, 10, 17, 24, and 31 min. The samples were further diluted by adding 30 μl of water and filtered using 0.2-μm filter plates before measurements on LC-MS/MS.

For untargeted metabolomic analysis, data were obtained from n = 3 biologically independent replicates. Biological independence refers to individual colonies of a yeast strain inoculated into separate culture volumes under the same feeding and growth conditions. qToF-MS data files were converted to mzXML files using MSConvert, and untargeted metabolomics differential analysis was performed using the xcms package in R (52). The differential peaks were then identified by sorting the diffreport generated from xcms differential analysis by fold parameter, with a filter set for a P value smaller than 0.01.
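The filtering step described above can be illustrated with a minimal Python sketch. The study itself used the xcms package in R; the rows below are made-up stand-ins for a diffreport export, and the field names `name`, `fold`, and `pvalue` as well as the helper `differential_peaks` are assumptions made for illustration only.

```python
# Hypothetical rows standing in for an exported xcms diffreport (values are invented).
rows = [
    {"name": "M301T412", "fold": 12.4, "pvalue": 0.0004},
    {"name": "M145T98", "fold": 1.3, "pvalue": 0.2},
    {"name": "M287T350", "fold": 48.0, "pvalue": 0.008},
    {"name": "M199T120", "fold": 95.0, "pvalue": 0.03},
]


def differential_peaks(rows, p_cutoff=0.01):
    """Keep rows with P value below the cutoff, largest fold change first."""
    kept = [r for r in rows if r["pvalue"] < p_cutoff]
    return sorted(kept, key=lambda r: r["fold"], reverse=True)


peaks = differential_peaks(rows)
for r in peaks:
    print(r["name"], r["fold"], r["pvalue"])
```

Filtering before sorting means that a peak with a very large fold change but a weak P value (the last row here) never reaches the top of the list.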

For metabolite production, each liquid chromatogram trace is representative of two biologically independent replicates. Ion count data show the mean of n = 2 or 3 biologically independent replicates, with error bar indicating the SD. Biological independence refers to individual colonies of a yeast strain inoculated into separate culture volumes under the same feeding and growth conditions. Statistical significance analysis was performed (for selected data) by unpaired two-tailed t test.

For in vitro kinetic assays, progress curve data show the mean of compound produced from n = 2 independent replicates performed simultaneously in separate reaction volumes, with error bars indicating the SD. For amide synthesis kinetic assays, initial reaction rates and error bars were calculated by fitting progress curves with a built-in linear regression tool in GraphPad Prism 7 for amide formation reactions. For the canonical activity inhibition assay by 3-HAA methyl ester, progress curves were fitted using DynaFit (53) through an ordinary differential equation (ODE)-based system derived from the kinetic model specified in fig. S5E. Because of an initial lag phase in the progress curve, the reaction rates were obtained from the first derivative of the progress curve (calculated by DynaFit) and then fitted to the general equation M(1 − exp(−ax)) in MATLAB 2017a, in which M, i.e., the plateau of the rate function, represents the reaction rate at steady state, i.e., the linear region of the progress curve. For kinetic curves, data show the slope or M obtained from progress curve data analysis, with error bars representing the relative error (%) of the slope (calculated by the GraphPad Prism 7 linear regression tool) or relative RMS (%) for progress curve fitting (calculated by DynaFit). Km and Vmax for kinetic data were estimated using the built-in Michaelis-Menten kinetic nonlinear regression tool in GraphPad Prism 7 (for amide synthesis) or MATLAB 2017b by fitting data with kinetic equations as specified in Fig. 5C (for canonical activity inhibition assay).
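The two-stage analysis (a slope from each progress curve, then Km and Vmax from the slopes) can be sketched in plain Python. This is not the authors' workflow: they used GraphPad Prism, DynaFit, and MATLAB with nonlinear regression, whereas the sketch below swaps in a Hanes-Woolf linearization so that it needs no external libraries. The product readings and the "true" Vmax and Km used to generate the synthetic rates are invented; only the time points and malonyl-CoA concentrations are taken from the text.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def michaelis_menten(s, vmax, km):
    """Rate predicted by the Michaelis-Menten equation."""
    return vmax * s / (km + s)


def hanes_woolf_fit(substrate, rates):
    """Estimate (Vmax, Km) from the line [S]/v = [S]/Vmax + Km/Vmax."""
    pairs = [(s, v) for s, v in zip(substrate, rates) if s > 0 and v > 0]
    xs = [s for s, _ in pairs]
    ys = [s / v for s, v in pairs]
    slope, intercept = linear_fit(xs, ys)
    vmax = 1.0 / slope
    return vmax, intercept * vmax


# Step 1: initial rate as the slope of one progress curve.
times_min = [5, 10, 15, 20, 25]            # quench times from the text
product_um = [1.1, 2.0, 3.1, 3.9, 5.0]     # invented product readings (uM)
initial_rate, _ = linear_fit(times_min, product_um)

# Step 2: kinetic parameters from rates at several substrate concentrations.
substrate_um = [0, 5, 50, 100, 200, 300, 500]   # malonyl-CoA series from the text
rates = [michaelis_menten(s, 2.0, 50.0) for s in substrate_um]  # synthetic, assumed Vmax=2, Km=50
vmax_est, km_est = hanes_woolf_fit(substrate_um, rates)
print(f"initial rate {initial_rate:.3f} uM/min, Vmax {vmax_est:.2f}, Km {km_est:.1f}")
```

With noise-free synthetic rates the linearization recovers the generating parameters; on real data a nonlinear fit, as used in the paper, weights the errors more appropriately.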

Acknowledgments: We thank A. Cravens for providing the Cas9/single-guide RNA plasmids (pCS3410, 3411, and 3414) for yeast genomic editing, J. Payne for performing the geometry optimizations of substrate structures for docking simulations, T. Valentic and J. Payne for training in protein purification and valuable discussions on protocol design for in vitro experiments, J. E. Jeon and X. Guan for assistance with tomato metabolomics analyses, and the Stanford ChEM-H Metabolic Chemistry Analysis Center and C. Fischer for instrument (qToF-MS) access and training. We thank E. Sattely, S. Y. Rhee, and C. Khosla for discussions and advice on experimental design. We thank T. Valentic, P. Srinivasan, and B. Kotopka for feedback during the preparation of this manuscript. Funding: This work was supported by the NIH U01GM110699 Genome to Natural Products Initiative and Chan-Zuckerberg Biohub Foundation. Author contributions: All authors designed the research, analyzed the data, and wrote the paper. D.K. and S.L. performed the research. S.L. performed untargeted metabolomics analysis and found the new compounds. D.K. and S.L. proposed and characterized the tomato cluster activity in yeast. D.K. performed and analyzed CHS in vivo site-directed mutagenesis studies and in vitro enzyme assays. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Additional data related to this paper may be requested from the authors.

Read this article:
Discovery of a previously unknown biosynthetic capacity of naringenin chalcone synthase by heterologous expression of a tomato gene cluster in yeast -...

A computer’s all you need: Folding@Home joins the race to find a COVID-19 cure – The Stanford Daily

Posted on August 21, 2020 by Prof Baldwin

"Today, you just need a computer … That's all you need. You don't need to have a fancy computer [or a] super modern computer. Anything will do," said Anton Thynell, head of collaboration and communication at Folding@Home.

Founded by chemistry, structural biology and computer science professor Vijay Pande in 2000 at Stanford, the global computing research community Folding@Home (FAH) is now joining the race to find a cure for COVID-19. Volunteers from across the globe are downloading the FAH software, which is accessible to everyone, to run simulations of protein folding in the background of their computers. The simultaneous running of these simulations contributes to researchers' efforts to find treatments for certain diseases, including COVID-19.

Folding@Home was originally a computing project that studied and simulated biomolecular systems. In 2006, collaborators from Stanford University joined the project and later increased computing performance to a level that rivaled that of a supercomputer.

Upon downloading the FAH software, volunteers are given specific proteins to run simulations on. They can then start "folding" by running their extra CPU power (the CPU being the part of the computer that executes instructions), and later they upload the results. The word "folding" comes from the process that proteins undergo when they are created. During that process, a protein molecule transforms from a long chain of amino acids into a complex shape (it "folds up"). The resulting structure allows researchers to understand the protein's properties and functions.

The FAH community aims to apply its professional knowledge, along with volunteers' computing power, to understand the role of proteins' dynamics in their function and dysfunction, and to aid in the design of new proteins and therapeutics. It is established as the world's fastest supercomputer, according to Ethan Zuo, president of a group of volunteers who contribute to the Folding@Home research project.

Thynell, who joined the Folding@Home community in 2013, said that since COVID-19 began, he and his team have created a separate project that relies on the Folding@Home concept to understand SARS-CoV-2.

"[COVID-19] was really an all-hands-on-deck situation," Thynell said. "I stopped working at my regular job and started full-time at Folding@Home. We grew our community [to] about 150 times [our past size] in three months. That's where we are today."

Thynell broke down the process and importance of understanding protein dynamics when trying to find treatments or solutions to diseases, pandemics and more.

"Most of the time, when you're studying biology, you look at proteins as a fixed structure, but they're actually moving around," Thynell said. "And there are tons of reactions happening in our cell structure all the time. So these proteins are actually like small machines … We wanted to understand more about the virus and hopefully find some hidden pockets. It's like a treasure map, and sometimes you find a treasure."

"These hidden pockets can open up for a certain period of time and you can look at them at potential[ly] druggable sites, which is very interesting for developing therapeutics," he added.

Zuo added that Folding@Home is helping researchers study spike proteins, a type of protein that is part of SARS-CoV-2 and allows the coronavirus to enter host cells. Zuo said that using extra computing power to run simulations of the virus can speed up the process of studying how these proteins work, which can then help researchers find ways to manipulate them using medicine.

"When you download our software from our website and you have Wi-Fi or internet, you connect with our servers and download the small work unit, that's a small part of a large simulation, and your computer starts crunching away at it," Thynell said. "You can decide how much computing power you want to dedicate or when you want to start. It's all up to you."

Recently, Zuo has been very active in volunteering for the Folding@Home COVID-19 project. He leaves his computer on 24 hours a day so that it can build computational models to help identify sites of the spike protein that researchers can target through a therapeutic antibody.

"[When] school shut down, everyone was doing online learning," Zuo said. "When doing online learning, I realized that everyone is using their computers for a large fraction of the day [but] not 100% of their computing potential is used. So I decided to [help] put that extra compute power to good use … Even though Folding@Home is the world's largest supercomputer, a surprising number of people don't know about [it]."

"By reaching out to more people, you'll make the supercomputer more powerful [in] finding a cure for COVID-19 more quickly and gain knowledge more effectively," he added.

Recently, Folding@Home has been working with COVID Moonshot, an organization aiming to develop inexpensive patent-free therapeutics for COVID-19, to identify key compounds that may stop the main viral protease (Mpro), an enzyme that breaks down proteins of COVID-19. As of now, over 800 compounds have been simulated and tested. Volunteers are actively participating in weekly sprints in which they donate their computing power to crunching work units to collect and generate new designs for proteins. Additionally, researchers are constantly discovering new things about the virus and are actively publishing them on their home website.

To see and measure progress within Folding@Home teams, volunteers are able to collect individual points for their contributions, which are displayed on a universal leaderboard. Depending on the computation power and system, certain amounts of points may also be awarded to teams, which puts them higher on the leaderboard.

According to Thynell, the leaderboard also shows what communities are participating in folding; these include tech companies such as Google, Reddit, Linus, NVIDIA and Intel. Global teams include China Folding@Home Power, Overclockers Australia and TSC Russia.

"What was really interesting is that Folding@Home is global," Thynell said. "We have people contributing from every part of the world. And it's really amazing to see a global community coming together and fighting the virus, with the spare computing power of your home computer. That has been really nice to see."

Contact Rachel Jiang at racheljiang310 at gmail.com


Protein domain – Wikipedia

Posted on August 21, 2020 by Prof Baldwin

Conserved part of a protein

A protein domain is a conserved part of a given protein sequence and tertiary structure that can evolve, function, and exist independently of the rest of the protein chain. Each domain forms a compact three-dimensional structure and often can be independently stable and folded. Many proteins consist of several structural domains. One domain may appear in a variety of different proteins. Molecular evolution uses domains as building blocks and these may be recombined in different arrangements to create proteins with different functions. In general, domains vary in length from about 50 to 250 amino acids.[1] The shortest domains, such as zinc fingers, are stabilized by metal ions or disulfide bridges. Domains often form functional units, such as the calcium-binding EF hand domain of calmodulin. Because they are independently stable, domains can be "swapped" by genetic engineering between one protein and another to make chimeric proteins.

The concept of the domain was first proposed in 1973 by Wetlaufer after X-ray crystallographic studies of hen lysozyme[2] and papain[3] and by limited proteolysis studies of immunoglobulins.[4][5] Wetlaufer defined domains as stable units of protein structure that could fold autonomously. In the past domains have been described as units of:

Each definition is valid and will often overlap, i.e. a compact structural domain that is found amongst diverse proteins is likely to fold independently within its structural environment. Nature often brings several domains together to form multidomain and multifunctional proteins with a vast number of possibilities.[9] In a multidomain protein, each domain may fulfill its own function independently, or in a concerted manner with its neighbours. Domains can either serve as modules for building up large assemblies such as virus particles or muscle fibres, or can provide specific catalytic or binding sites as found in enzymes or regulatory proteins.

An appropriate example is pyruvate kinase (see first figure), a glycolytic enzyme that plays an important role in regulating the flux from fructose-1,6-bisphosphate to pyruvate. It contains an all-β nucleotide-binding domain (in blue), an α/β-substrate binding domain (in grey) and an α/β-regulatory domain (in olive green),[10] connected by several polypeptide linkers.[11] Each domain in this protein occurs in diverse sets of protein families.[12]

The central α/β-barrel substrate binding domain is one of the most common enzyme folds. It is seen in many different enzyme families catalysing completely unrelated reactions.[13] The α/β-barrel is commonly called the TIM barrel, named after triose phosphate isomerase, which was the first such structure to be solved.[14] It is currently classified into 26 homologous families in the CATH domain database.[15] The TIM barrel is formed from a sequence of β-α-β motifs closed by the first and last strand hydrogen bonding together, forming an eight-stranded barrel. There is debate about the evolutionary origin of this domain. One study has suggested that a single ancestral enzyme could have diverged into several families,[16] while another suggests that a stable TIM-barrel structure has evolved through convergent evolution.[17]

The TIM-barrel in pyruvate kinase is 'discontinuous', meaning that more than one segment of the polypeptide is required to form the domain. This is likely to be the result of the insertion of one domain into another during the protein's evolution. It has been shown from known structures that about a quarter of structural domains are discontinuous.[18][19] The inserted β-barrel regulatory domain is 'continuous', made up of a single stretch of polypeptide.

The primary structure (string of amino acids) of a protein ultimately encodes its uniquely folded three-dimensional (3D) conformation.[20] The most important factor governing the folding of a protein into 3D structure is the distribution of polar and non-polar side chains.[21] Folding is driven by the burial of hydrophobic side chains into the interior of the molecule so as to avoid contact with the aqueous environment. Generally proteins have a core of hydrophobic residues surrounded by a shell of hydrophilic residues. Since the peptide bonds themselves are polar they are neutralised by hydrogen bonding with each other when in the hydrophobic environment. This gives rise to regions of the polypeptide that form regular 3D structural patterns called secondary structure. There are two main types of secondary structure: α-helices and β-sheets.

Some simple combinations of secondary structure elements have been found to frequently occur in protein structure and are referred to as supersecondary structure or motifs. For example, the β-hairpin motif consists of two adjacent antiparallel β-strands joined by a small loop. It is present in most antiparallel β structures both as an isolated ribbon and as part of more complex β-sheets. Another common super-secondary structure is the β-α-β motif, which is frequently used to connect two parallel β-strands. The central α-helix connects the C-terminus of the first strand to the N-terminus of the second strand, packing its side chains against the β-sheet and therefore shielding the hydrophobic residues of the β-strands from the surface.

Covalent association of two domains represents a functional and structural advantage since there is an increase in stability when compared with the same structures non-covalently associated.[22] Other advantages are the protection of intermediates within inter-domain enzymatic clefts that may otherwise be unstable in aqueous environments, and a fixed stoichiometric ratio of the enzymatic activity necessary for a sequential set of reactions.[23]

Structural alignment is an important tool for determining domains.

Several motifs pack together to form compact, local, semi-independent units called domains.[6] The overall 3D structure of the polypeptide chain is referred to as the protein's tertiary structure. Domains are the fundamental units of tertiary structure, each domain containing an individual hydrophobic core built from secondary structural units connected by loop regions. The packing of the polypeptide is usually much tighter in the interior than the exterior of the domain, producing a solid-like core and a fluid-like surface.[24] Core residues are often conserved in a protein family, whereas the residues in loops are less conserved, unless they are involved in the protein's function. Protein tertiary structure can be divided into four main classes based on the secondary structural content of the domain.[25]

Domains have limits on size.[27] The size of individual structural domains varies from 36 residues in E-selectin to 692 residues in lipoxygenase-1,[18] but the majority, 90%, have fewer than 200 residues[28] with an average of approximately 100 residues.[29] Very short domains, less than 40 residues, are often stabilised by metal ions or disulfide bonds. Larger domains, greater than 300 residues, are likely to consist of multiple hydrophobic cores.[30]

Many proteins have a quaternary structure, which consists of several polypeptide chains that associate into an oligomeric molecule. Each polypeptide chain in such a protein is called a subunit. Hemoglobin, for example, consists of two α and two β subunits. Each of the four chains has an all-α globin fold with a heme pocket.

Domain swapping is a mechanism for forming oligomeric assemblies.[31] In domain swapping, a secondary or tertiary element of a monomeric protein is replaced by the same element of another protein. Domain swapping can range from secondary structure elements to whole structural domains. It also represents a model of evolution for functional adaptation by oligomerisation, e.g. oligomeric enzymes that have their active site at subunit interfaces.[32]

Nature is a tinkerer and not an inventor;[33] new sequences are adapted from pre-existing sequences rather than invented. Domains are the common material used by nature to generate new sequences; they can be thought of as genetically mobile units, referred to as 'modules'. Often, the C and N termini of domains are close together in space, allowing them to easily be "slotted into" parent structures during the process of evolution. Many domain families are found in all three forms of life: Archaea, Bacteria and Eukarya.[34] Protein modules are a subset of protein domains which are found across a range of different proteins with a particularly versatile structure. Examples can be found among extracellular proteins associated with clotting, fibrinolysis, complement, the extracellular matrix, cell surface adhesion molecules and cytokine receptors.[35] Four concrete examples of widespread protein modules are the following domains: SH2, immunoglobulin, fibronectin type 3 and the kringle.[36]

Molecular evolution gives rise to families of related proteins with similar sequence and structure. However, sequence similarities can be extremely low between proteins that share the same structure. Protein structures may be similar because proteins have diverged from a common ancestor. Alternatively, some folds may be more favored than others as they represent stable arrangements of secondary structures and some proteins may converge towards these folds over the course of evolution. There are currently about 110,000 experimentally determined protein 3D structures deposited within the Protein Data Bank (PDB).[37] However, this set contains many identical or very similar structures. All proteins should be classified to structural families to understand their evolutionary relationships. Structural comparisons are best achieved at the domain level. For this reason many algorithms have been developed to automatically assign domains in proteins with known 3D structure; see 'Domain definition from structural co-ordinates'.

The CATH domain database classifies domains into approximately 800 fold families; ten of these folds are highly populated and are referred to as 'super-folds'. Super-folds are defined as folds for which there are at least three structures without significant sequence similarity.[38] The most populated is the α/β-barrel super-fold, as described previously.

The majority of proteins, two-thirds in unicellular organisms and more than 80% in metazoa, are multidomain proteins.[39] However, other studies concluded that 40% of prokaryotic proteins consist of multiple domains while eukaryotes have approximately 65% multi-domain proteins.[40]

Many domains in eukaryotic multidomain proteins can be found as independent proteins in prokaryotes,[41] suggesting that domains in multidomain proteins have once existed as independent proteins. For example, vertebrates have a multi-enzyme polypeptide containing the GAR synthetase, AIR synthetase and GAR transformylase domains (GARs-AIRs-GARt; GAR: glycinamide ribonucleotide synthetase/transferase; AIR: aminoimidazole ribonucleotide synthetase). In insects, the polypeptide appears as GARs-(AIRs)2-GARt, in yeast GARs-AIRs is encoded separately from GARt, and in bacteria each domain is encoded separately.[42]

Multidomain proteins are likely to have emerged from selective pressure during evolution to create new functions. Various proteins have diverged from common ancestors by different combinations and associations of domains. Modular units frequently move about, within and between biological systems through mechanisms of genetic shuffling:

The simplest multidomain organization seen in proteins is that of a single domain repeated in tandem.[46] The domains may interact with each other (domain-domain interaction) or remain isolated, like beads on a string. The giant 30,000 residue muscle protein titin comprises about 120 fibronectin-III-type and Ig-type domains.[47] In the serine proteases, a gene duplication event has led to the formation of a two β-barrel domain enzyme.[48] The repeats have diverged so widely that there is no obvious sequence similarity between them. The active site is located at a cleft between the two β-barrel domains, in which functionally important residues are contributed from each domain. Genetically engineered mutants of the chymotrypsin serine protease were shown to have some proteinase activity even though their active site residues were abolished and it has therefore been postulated that the duplication event enhanced the enzyme's activity.[48]

Modules frequently display different connectivity relationships, as illustrated by the kinesins and ABC transporters. The kinesin motor domain can be at either end of a polypeptide chain that includes a coiled-coil region and a cargo domain.[49] ABC transporters are built with up to four domains consisting of two unrelated modules, ATP-binding cassette and an integral membrane module, arranged in various combinations.

Not only do domains recombine, but there are many examples of a domain having been inserted into another. Sequence or structural similarities to other domains demonstrate that homologues of inserted and parent domains can exist independently. An example is that of the 'fingers' inserted into the 'palm' domain within the polymerases of the Pol I family.[50] Since a domain can be inserted into another, there should always be at least one continuous domain in a multidomain protein. This is the main difference between definitions of structural domains and evolutionary/functional domains. An evolutionary domain will be limited to one or two connections between domains, whereas structural domains can have unlimited connections, within a given criterion of the existence of a common core. Several structural domains could be assigned to an evolutionary domain.

A superdomain consists of two or more conserved domains of nominally independent origin, but subsequently inherited as a single structural/functional unit.[51] This combined superdomain can occur in diverse proteins that are not related by gene duplication alone. An example of a superdomain is the protein tyrosine phosphatase–C2 domain pair in PTEN, tensin, auxilin and the membrane protein TPTE2. This superdomain is found in proteins in animals, plants and fungi. A key feature of the PTP-C2 superdomain is amino acid residue conservation in the domain interface.

Protein folding - the unsolved problem: Since the seminal work of Anfinsen in the early 1960s,[20] the goal to completely understand the mechanism by which a polypeptide rapidly folds into its stable native conformation remains elusive. Many experimental folding studies have contributed much to our understanding, but the principles that govern protein folding are still based on those discovered in the very first studies of folding. Anfinsen showed that the native state of a protein is thermodynamically stable, the conformation being at a global minimum of its free energy.

Folding is a directed search of conformational space allowing the protein to fold on a biologically feasible time scale. The Levinthal paradox states that if an average-sized protein were to sample all possible conformations before finding the one with the lowest energy, the whole process would take billions of years.[52] Proteins typically fold within 0.1 and 1000 seconds. Therefore, the protein folding process must be directed in some way through a specific folding pathway. The forces that direct this search are likely to be a combination of local and global influences whose effects are felt at various stages of the reaction.[53]
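A back-of-the-envelope version of the Levinthal argument can be written in a few lines of Python. The three input numbers are commonly used illustrative assumptions, not values from the text: a 100-residue chain, three conformations per residue, and 10^13 conformations sampled per second.

```python
# Illustrative assumptions (not measured values).
residues = 100              # assumed chain length
states_per_residue = 3      # assumed conformations available to each residue
samples_per_second = 1e13   # assumed rate of conformational sampling

conformations = states_per_residue ** residues
seconds = conformations / samples_per_second
years = seconds / (365.25 * 24 * 3600)

print(f"{conformations:.2e} conformations, roughly {years:.1e} years to sample them all")
```

Even with these generous assumptions, an exhaustive search takes vastly longer than the observed folding times of seconds or less, which is why the search must be directed.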

Advances in experimental and theoretical studies have shown that folding can be viewed in terms of energy landscapes,[54][55] where folding kinetics is considered as a progressive organisation of an ensemble of partially folded structures through which a protein passes on its way to the folded structure. This has been described in terms of a folding funnel, in which an unfolded protein has a large number of conformational states available and there are fewer states available to the folded protein. A funnel implies that for protein folding there is a decrease in energy and loss of entropy with increasing tertiary structure formation. The local roughness of the funnel reflects kinetic traps, corresponding to the accumulation of misfolded intermediates. A folding chain progresses toward lower intra-chain free-energies by increasing its compactness. The chain's conformational options become increasingly narrowed ultimately toward one native structure.

The organisation of large proteins by structural domains represents an advantage for protein folding, with each domain being able to individually fold, accelerating the folding process and reducing a potentially large combination of residue interactions. Furthermore, given the observed random distribution of hydrophobic residues in proteins,[56] domain formation appears to be the optimal solution for a large protein to bury its hydrophobic residues while keeping the hydrophilic residues at the surface.[57][58]

However, the role of inter-domain interactions in protein folding and in the energetics of stabilisation of the native structure probably differs for each protein. In T4 lysozyme, the influence of one domain on the other is so strong that the entire molecule is resistant to proteolytic cleavage. In this case, folding is a sequential process where the C-terminal domain is required to fold independently in an early step, and the other domain requires the presence of the folded C-terminal domain for folding and stabilisation.[59]

It has been found that the folding of an isolated domain can take place at the same rate or sometimes faster than that of the integrated domain,[60] suggesting that unfavourable interactions with the rest of the protein can occur during folding. Several arguments suggest that the slowest step in the folding of large proteins is the pairing of the folded domains.[30] This is either because the domains are not folded entirely correctly or because the small adjustments required for their interaction are energetically unfavourable,[61] such as the removal of water from the domain interface.

Protein domain dynamics play a key role in a multitude of molecular recognition and signaling processes. Protein domains, connected by intrinsically disordered flexible linker domains, induce long-range allostery via protein domain dynamics. The resultant dynamic modes cannot be generally predicted from static structures of either the entire protein or individual domains. They can however be inferred by comparing different structures of a protein (as in Database of Molecular Motions). They can also be suggested by sampling in extensive molecular dynamics trajectories[62] and principal component analysis,[63] or they can be directly observed using spectra[64][65] measured by neutron spin echo spectroscopy.

The importance of domains as structural building blocks and elements of evolution has brought about many automated methods for their identification and classification in proteins of known structure. Automatic procedures for reliable domain assignment are essential for the generation of the domain databases, especially as the number of known protein structures is increasing. Although the boundaries of a domain can be determined by visual inspection, construction of an automated method is not straightforward. Problems occur when faced with domains that are discontinuous or highly associated.[66] The fact that there is no standard definition of what a domain really is has meant that domain assignments have varied enormously, with each researcher using a unique set of criteria.[67]

A structural domain is a compact, globular sub-structure with more interactions within it than with the rest of the protein.[68] Therefore, a structural domain can be determined by two visual characteristics: its compactness and its extent of isolation.[69] Measures of local compactness in proteins have been used in many of the early methods of domain assignment[70][71][72][73] and in several of the more recent methods.[28][74][75][76][77]

One of the first algorithms[70] used a Cα-Cα distance map together with a hierarchical clustering routine that considered proteins as several small segments, 10 residues in length. The initial segments were clustered one after another based on inter-segment distances; segments with the shortest distances were clustered and considered as single segments thereafter. The stepwise clustering finally included the full protein. Go[73] also exploited the fact that inter-domain distances are normally larger than intra-domain distances; all possible Cα-Cα distances were represented as diagonal plots in which there were distinct patterns for helices, extended strands and combinations of secondary structures.
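The stepwise clustering described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the published algorithm: the function names are hypothetical, the input is assumed to be a list of (x, y, z) alpha-carbon coordinates, and the inter-segment distance is assumed to be the mean pairwise distance between residues, since the text does not say how it was defined.

```python
import math
from itertools import combinations

def segment_chain(coords, seg_len=10):
    """Split the chain into consecutive segments of residue indices."""
    return [list(range(i, min(i + seg_len, len(coords))))
            for i in range(0, len(coords), seg_len)]

def mean_distance(coords, a, b):
    """Mean pairwise distance between two residue groups (assumed linkage)."""
    total = sum(math.dist(coords[i], coords[j]) for i in a for j in b)
    return total / (len(a) * len(b))

def cluster_segments(coords, seg_len=10):
    """Repeatedly merge the two closest clusters until the full protein
    is one cluster; return the list of merges in the order they happened."""
    clusters = segment_chain(coords, seg_len)
    history = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the shortest inter-cluster distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: mean_distance(coords, clusters[p[0]], clusters[p[1]]))
        history.append((clusters[i], clusters[j]))
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return history
```

Reading the merge history from the end backwards gives candidate domain splits: the last merge joins the two groups that are farthest apart, which is where a domain boundary would be expected in a two-domain protein.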

The method by Sowdhamini and Blundell clusters secondary structures in a protein based on their Cα-Cα distances and identifies domains from the pattern in their dendrograms.[66] As the procedure does not consider the protein as a continuous chain of amino acids there are no problems in treating discontinuous domains. Specific nodes in these dendrograms are identified as tertiary structural clusters of the protein; these include both super-secondary structures and domains. The DOMAK algorithm is used to create the 3Dee domain database.[75] It calculates a 'split value' from the number of each type of contact when the protein is divided arbitrarily into two parts. This split value is large when the two parts of the structure are distinct.
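To make the idea of a split value concrete, here is a minimal Python sketch. The exact DOMAK formula is not given in the text, so the form used here (internal contacts of part A times internal contacts of part B, divided by the contacts between A and B) is an assumption, as are the 8 Å contact cutoff, the function names, and the restriction to a single cut point along the chain.

```python
import math
from itertools import combinations

def contacts_within(coords, group, cutoff):
    """Count residue pairs inside one part that lie within the cutoff."""
    return sum(1 for i, j in combinations(group, 2)
               if math.dist(coords[i], coords[j]) <= cutoff)

def contacts_between(coords, group_a, group_b, cutoff):
    """Count residue pairs spanning the two parts that lie within the cutoff."""
    return sum(1 for i in group_a for j in group_b
               if math.dist(coords[i], coords[j]) <= cutoff)

def split_value(coords, cut, cutoff=8.0):
    """Assumed split value for dividing the chain at residue index `cut`.
    It grows when each part is internally well connected and the two
    parts share few contacts with each other."""
    part_a = range(cut)
    part_b = range(cut, len(coords))
    int_a = contacts_within(coords, part_a, cutoff)
    int_b = contacts_within(coords, part_b, cutoff)
    ext = contacts_between(coords, part_a, part_b, cutoff)
    # Guard against division by zero when the parts do not touch at all.
    return int_a * int_b / max(ext, 1)
```

Scanning `cut` over every position and keeping the maximum would pick the division where the two parts are most distinct, which matches the behaviour the paragraph describes.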

The method of Wodak and Janin[78] was based on the calculated interface areas between two chain segments repeatedly cleaved at various residue positions. Interface areas were calculated by comparing surface areas of the cleaved segments with that of the native structure. Potential domain boundaries can be identified at a site where the interface area was at a minimum. Other methods have used measures of solvent accessibility to calculate compactness.[28][79][80]

The PUU algorithm[19] incorporates a harmonic model used to approximate inter-domain dynamics. The underlying physical concept is that many rigid interactions will occur within each domain and loose interactions will occur between domains. This algorithm is used to define domains in the FSSP domain database.[74]

Swindells (1995) developed a method, DETECTIVE, for identification of domains in protein structures based on the idea that domains have a hydrophobic interior. Deficiencies were found to occur when hydrophobic cores from different domains continue through the interface region.

RigidFinder is a novel method for identification of protein rigid blocks (domains and loops) from two different conformations. Rigid blocks are defined as blocks where all inter-residue distances are conserved across conformations.
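The rigid-block definition above translates directly into a check on pairwise distances. The sketch below is not the RigidFinder implementation; it only tests whether a given set of residues satisfies the definition. The function name and the 0.5 Å tolerance are assumptions, and each conformation is assumed to be a list of (x, y, z) coordinates with one entry per residue, in the same order.

```python
import math
from itertools import combinations

def is_rigid_block(conf_a, conf_b, residues, tol=0.5):
    """Return True if every inter-residue distance within `residues`
    differs by at most `tol` between the two conformations."""
    return all(
        abs(math.dist(conf_a[i], conf_a[j]) - math.dist(conf_b[i], conf_b[j])) <= tol
        for i, j in combinations(residues, 2)
    )
```

Because only internal distances are compared, the check is unaffected by how the block as a whole has moved or rotated between the two conformations, so no prior superposition is needed.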

The method RIBFIND developed by Pandurangan and Topf identifies rigid bodies in protein structures by performing spatial clustering of secondary structural elements in proteins.[81] The RIBFIND rigid bodies have been used to flexibly fit protein structures into cryo-electron microscopy density maps.[82]

A general method to identify dynamical domains, that is, protein regions that behave approximately as rigid units in the course of structural fluctuations, has been introduced by Potestio et al.[62] and, among other applications, was also used to compare the consistency of the dynamics-based domain subdivisions with standard structure-based ones. The method, termed PiSQRD, is publicly available in the form of a webserver.[83] The latter allows users to optimally subdivide single-chain or multimeric proteins into quasi-rigid domains[62][83] based on the collective modes of fluctuation of the system. By default the latter are calculated through an elastic network model;[84] alternatively pre-calculated essential dynamical spaces can be uploaded by the user.

A large fraction of domains are of unknown function. A domain of unknown function (DUF) is a protein domain that has no characterized function. These families have been collected together in the Pfam database using the prefix DUF followed by a number, with examples being DUF2992 and DUF1220. There are now over 3,000 DUF families within the Pfam database representing over 20% of known families.[86]

This article incorporates text and figures from George, R. A. (2002) "Predicting Structural Domains in Proteins" Thesis, University College London, which were contributed by its author.

See original here:
Protein domain - Wikipedia

The Cyberlaw Podcast: It’s Time to Pay Attention When Attention Stops Paying – Lawfare

Posted on December 8, 2020 by Prof Baldwin

Did you ever wonder where all that tech money came from all of a sudden? Turns out, a lot of it comes from online programmatic ads, an industry that gets little attention even from the companies, such as Google, that it made wealthy. That lack of attention is pretty ironic, because lack of attention is what's going to kill the industry, according to Tim Hwang, former Google policy maven and current research fellow at the Center for Security and Emerging Technology (CSET).

In our interview, Tim Hwang explains the remarkably complex industry and the dynamics that are gradually leaching the value out of its value proposition. Tim thinks we're in an attention bubble, and the popping will be messy. I'm persuaded the bubble is here but not that its end will be disastrous outside of Silicon Valley.

Sultan Meghji and I celebrate what seems like excellent news about a practical artificial intelligence (AI) achievement in predicting protein folding. It's a big deal, and an ideal problem for AI, with one exception. The parts of the problem that AI hasn't solved would be a lot easier for humans to work on if AI could tell us how it solved the parts it did figure out. Explainability, it turns out, is the key to collaborative AI-human work.

We welcome first-time participant and long-time listener Jordan Schneider to the panel. Jordan is the host of the unmissable ChinaTalk podcast. Given his expertise, we naturally ask him about Australia. Actually, it's natural, because Australia is now the testing ground for many of China's efforts to exercise power over independent countries using cyber power along with trade. Among the highlights: Chinese tweets highlighting a report about Australian war crimes followed by ham-handed tweet-boosting bot campaigns. And in a move that ought to be featured in future justifications of the Trump administration's ban on WeChat, the platform refused to carry the Australian prime minister's criticism of the war-crimes tweet.

Sen. Ted Cruz, call your office! And this will have to be Sen. Cruz's fight, because it looks more and more as though the Trump administration has thrown in the towel. Its claim that it is negotiating a TikTok sale after ordering divestment is getting thinner; now the divestment deadline has completely disappeared, as the government simply says that negotiations continue. Nick Weaver is on track to win his bet with me that CFIUS won't make good on its order before the mess is shoveled onto President-elect Joe Biden's plate.

Whoever was in charge of beating up WeChat and TikTok may have left the government early, but the team that's sticking pins in other Chinese companies is still hard at work. Jordan and Brian Egan talk about the addition of SMIC to the amorphous defense blacklist. And Congress has passed a law (awaiting the president's signature) that will make life hard for Chinese firms listed on U.S. exchanges.

China, meanwhile, isn't taking this lying down, Jordan reports. It is mirror-imaging all the Western laws that it sees as targeting China, including bans on exports of Chinese products and technology. It is racing (on what Jordan thinks is a twenty-year pace) to create its own chip design capabilities. And with some success. Sultan takes some of the hype out of China's claims to quantum supremacy. Though even dehyped, China's achievement should be making those who rely on RSA-style crypto just a bit nervous (that's all of us, by the way).

Michael Weiner previews the still veiled state antitrust lawsuit against Facebook and promises to come back with details as soon as it's filed.

In quick hits, I explain why we haven't covered the Iranian claim that their scientist was rubbed out by an Israeli killer robot machine gun: I don't actually believe them. Brian explains that another law aimed at China and its use of Xinjiang forced labor is attracting lobbyists but likely to pass. Apple, Nike, and Coca-Cola have all taken hits for lobbying on the bill; none of them say they oppose the bill, but it turns out there's a reason for that. Lobbyists have largely picked the bones clean.

President Trump is leaving office in typical fashion, gesturing in the right direction but uninterested in actually getting there. In a Too Much Too Late negotiating move, the President has threatened to veto the defense authorization act if it doesn't include a repeal of Section 230 of the Communications Decency Act. If he's yearning to wield the veto, the Democrats and GOP alike seem willing to give him the chance. They may even override, or wait until Jan. 20 to pass it again.

Finally, I commend to interested listeners the oral argument in the Supreme Court's Van Buren case, about the Computer Fraud and Abuse Act. The solicitor general's footwork in making up quasi-textual limitations on the more sweeping readings of the act is admirable, and it may well be enough to keep Van Buren in jail, where he probably belongs for some crime, if not this one.

And more.

Download the 341st Episode (mp3)

You can subscribe to The Cyberlaw Podcast using iTunes, Google Play, Spotify, Pocket Casts, or our RSS feed. As always, The Cyberlaw Podcast is open to feedback. Be sure to engage with @stewartbaker on Twitter. Send your questions, comments, and suggestions for topics or interviewees to CyberlawPodcast@steptoe.com. Remember: If your suggested guest appears on the show, we will send you a highly coveted Cyberlaw Podcast mug!

The views expressed in this podcast are those of the speakers and do not reflect the opinions of their institutions, clients, friends, families, or pets.

Originally posted here:
The Cyberlaw Podcast: It's Time to Pay Attention When Attention Stops Paying - Lawfare

A math problem stumped experts for 50 years. This grad student from Maine solved it in days – The Boston Globe

Posted on August 21, 2020 by Prof Baldwin

The problem had to do with proving whether the Conway knot was something called "slice," an important concept in knot theory that we'll get to a little later. Of all the many thousands of knots with 12 or fewer crossings, mathematicians had been able to determine the sliceness of all but one: the Conway knot. For more than 50 years, the knot stubbornly resisted every attempt to untangle its secret, along the way achieving a kind of mythical status. A sculpture of it even adorns a gate at the University of Cambridge's Isaac Newton Institute for Mathematical Sciences.

Then, two years ago, a little-known graduate student named Lisa Piccirillo, who grew up in Maine, learned about the knot problem while attending a math conference. A speaker mentioned the Conway knot during a discussion about the challenges of studying knot theory. "For example," the speaker said, "we still don't know whether this 11-crossing knot is slice."

"That's ridiculous," Piccirillo thought while she listened. "This is 2018. We should be able to do that." A week later, she produced a proof that stunned the math world.

__________

Knot theory is a sub-specialty of a field of mathematics known as topology, which is concerned with the study of spaces. What's it used for? "The answer one memorizes is that topology is useful for understanding DNA and protein folding," Piccirillo tells me in May as we sit wearing masks and maintaining a good 10 feet of distance in an outdoor courtyard not far from where she lives in Harvard Square. "Apparently these things are very long and they like to stick to themselves, so they get all knotted up."

When topologists think of knots, however, they don't imagine a length of rope with a gnarled twist in the middle. To them, a knot is more like an extension cord in which the two ends have been plugged together and the whole thing has been tossed onto the floor in a mess of crisscrosses. It's essentially a closed loop with various places where the loop crosses over itself.

Now let's take one of these knots and think for a moment about the space in which it exists. That space has a fourth dimension, such as time, and to a topologist, our knot is a kind of sphere that sits within it. Topologists see spheres everywhere, but in a specialized way: A circle is a one-dimensional sphere, while the skin surrounding an orange is a two-dimensional sphere. And here is where minds tend to get blown: If we were to take that whole orange and glue it to another one, topologists would see the resulting object as a three-dimensional sphere, one that could be viewed as the skin of a four-dimensional orange. Don't worry if you are unable to conjure such a higher-dimension image for yourself. There are only a couple hundred specialists doing this work in the world, and not even all of them can.

Piccirillo, who graduated from Boston College in 2013, was already well on her way to joining the ranks of those specialists when, in the summer of 2018, the speaker at the math conference said something that would change the trajectory of her career.

The speaker showed a slide depicting the Conway knot and explained that mathematicians had long suspected that the knot was not, in fact, slice, but no one had been able to prove it. So what does it mean for a knot to be slice? Let's return for a moment to that four-dimensional orange. Inside of it there are disks (think of them as the surface of a plate). If a three-dimensional knot, like Conway's, can bound such a disk, then the knot is slice. If it cannot, then it is not slice.

Topologists use mathematical tools called invariants to try to determine sliceness, but for half a century, those tools had been unable to help them prove the prevailing belief that the Conway knot wasn't slice. Sitting in that lecture hall two years ago, however, Piccirillo sensed right away that the techniques she was using in a different area of topology might help these invariants better apply to the Conway knot problem. "I immediately knew that some work that I was doing for totally other reasons could at least try to answer this question," she says. She started on the problem the very next day.

__________

Piccirillo, who is 29, grew up in Greenwood, Maine, a town with a population of less than 900. She was an excellent student and her mom taught middle school math, but there was little in her interests to suggest that she would become a world-class mathematician.

"I was an overachiever," she says. "I rode dressage. I was very active in the youth group at my church. I did drama. I was in band. I did everything." Which is another way of saying that she wasn't one of those math prodigies who's programming computers and building algorithms at age 4.

When Piccirillo arrived on campus for her first year at Boston College in 2009, she was as interested in theater and other subjects as she was in math. During a calculus class that year, though, she made a connection with professor J. Elisenda Grigsby. (Disclosure: I am the editor of Boston College's alumni magazine.)

Piccirillo stood out, even if she lacked a certain polish, Grigsby recalls. "Golden-child mathematicians usually went to math camp when they were in high school and had been groomed from a young age," she says. "That wasn't Piccirillo's background, but I felt a kinship to her."

"She really encouraged me," Piccirillo says of Grigsby. "Eli really pushed me into trying another math class, and then liking the next class. I had already started on a progression." By her senior year, she was taking graduate-level topology courses. After graduating in 2013, she chose to pursue her doctorate at the University of Texas because of the university's excellent topology program and its reputation as a great place for female math students. In 2014, just 28.9 percent of math and science doctorates were awarded to women, according to the National Science Foundation, but at Texas, something like 40 percent of graduate math students were women.

By and large, Piccirillo has felt welcomed and encouraged as a female mathematician. "But now and again, things happen," she tells me. "For example, in grad school, I would receive notes in my department mailbox commenting on my appearance."

Overall, Piccirillo excelled during her six years at the University of Texas, finding both strong mentorship and a supportive research community. The time coincided with her deepening connection to the math itself. She loved to turn problems over in her mind, thinking about how one higher-dimension shape might be manipulated to resemble an entirely different one. It was thrilling, creative work, as much about aesthetic as arriving at a particular result. "When you perform a calculation, sometimes there's really clever tricks you can use or some ways that you can be an actual human and not a computer in the performing of the calculation," Piccirillo says. "But when you make a logical argument, that's entirely yours."

Outside of her studies, Piccirillo liked to make beautiful things. She carved wooden spoons for a while, as well as large-scale woodcut prints of fish and vegetables. She and her roommate, Wiley Jennings, built a dining room table together. For a while, she was obsessed with buying and repairing '70s Japanese motorcycles.

"She has a very, very strong sense of aesthetic," says James Farre, a friend of Piccirillo's from the University of Texas who specializes in geometry and is a postdoc at Yale. At Piccirillo's level, math that people like is often thought of and talked about as beautiful or deep.

The day after hearing about the Conway knot problem, Piccirillo, then 27, sat down at her desk and began looking for a solution. Because much of her graduate work involved building pairs of knots that were different but shared some 4-D properties, she already knew that any two knots that share the same 4-D space also share sliceness: they're either both slice or both not slice. Since her goal was to prove that the Conway knot wasn't slice, her first step was to "come up with an entirely different knot with the same four-dimensional space," she explains. "Then I'll try to show that the other knot isn't slice."

She spent spare time over the next several days hand-sketching and manipulating configurations of the 4-D space occupied by the Conway knot. "I didn't allow myself to work on it during the day," she told Quanta Magazine earlier this year, "because I didn't consider it to be real math. I thought it was, like, my homework."

The next step was to try to prove that the knot she drew was not slice. "There are lots of tools already in the literature for doing that," she says. She would feed the knot iterations into a computer, and "based on the data of the knot, maybe based on how its crossings look or other data that you can pull from the knot, the algorithm spits out an integer." In less than a week, Piccirillo had created a knot that hit the sweet spot: It had the same 4-D properties as the Conway knot, and it was found by the algorithm to be not slice.

She had suddenly succeeded where countless mathematicians had failed for five decades. She had solved the Conway knot problem.

__________

Not long after the breakthrough, Piccirillo attended a meeting with Cameron Gordon, a University of Texas math professor. When she mentioned her solution, Gordon was skeptical. He asked Piccirillo to walk him through the steps. "Then he made me write it down, like all up on the board," she recalls, "and then he got very excited and started yelling."

Piccirillo submitted her solution to the Annals of Mathematics, and the prestigious math journal agreed to publish her paper. When I asked James Farre, the Yale postdoc, to explain the significance of having a paper published in the Annals, he laughed for several seconds. "It's head and shoulders the most important and influential journal in mathematics," he says. "That's why I'm laughing. It's amazing and it's so cool!"

By the time Piccirillo's paper appeared in the journal about a year later, word of her solution had already spread throughout the math world. After graduating from UT in 2019, Piccirillo started her postdoctoral work at Brandeis. "The last time I saw her was in January," says Wiley Jennings, her roommate in Austin, who recently completed a doctorate at Stanford. "She was out at a faculty visit here at Stanford. To be invited, as someone who has done one year or less [of postdoc study], just finished their PhD essentially, I mean, that's insane. It's unheard of . . . I think that's when I first got a hint that like, 'Oh my gosh, she's really a hotshot.'"

Postdoc positions typically run for three or four years, but Piccirillo found herself in high demand. In July, she started a new tenure-track position as an assistant professor at MIT. It's been a whirlwind, and I wondered how her life has changed. "The practical answer is not too much," she says. She still teaches undergrads and conducts her research. She acknowledges, though, that there sometimes is a feeling of pressure, based on what she's already accomplished. "In practice, math for everyone is about trying to prove simple statements and failing, basically all of the time. So," she says, "I'm having to relearn how to be OK with the fact that most of the time I'm failing to prove really simple stuff when I'm feeling the weight of these expectations."

When I ask her about her goals, Piccirillo says one of her priorities is to help grow and broaden the mathematics community. "There certainly are many young women, people of color, non-heterosexual, or non-gender binary people who feel put at an arm's length by the institution of mathematics," she says. "It's really important to me to help mitigate that in any small ways I can." One important way to do that, she continues, is to help shatter the myth of the math prodigy.

When universities organize math conferences, she says, they should avoid inviting speakers who give talks "where they go really fast and they try to show you how smart they are and how hard their research is. That's not good for anyone, but it's especially not good for young people or people who are feeling maybe like they don't belong here." What those people in the audience don't know, she says, is that "nobody else really understands it either."

"You don't have to be really smart, whatever that means, to be a successful mathematician," Piccirillo says. "There's this idea that mathematicians are geniuses. A lot of them seem to be child prodigies that do these Olympiads. In fact, you don't have to come from that background at all to be very good at math and most mathematicians, including many of the really great ones, don't come from that sort of background."

And as Piccirillo herself proves, some of them even go on to produce work that alters the course of mathematics.

__________

John Wolfson is the editor of Boston College Magazine. Follow him on Twitter @johnwolfson and send comments to magazine@globe.com.

Link:
A math problem stumped experts for 50 years. This grad student from Maine solved it in days - The Boston Globe

Scientists discover protein linked to depression and brain disorders – The Irish Times

Posted on August 21, 2020 by Prof Baldwin

Earlier diagnosis and better treatments for people with depression and certain brain disorders may be possible following a research breakthrough involving Belfast-based scientists.

They have found how a specific protein plays a crucial role in the generation of neurons, the nerve cells that relay electrical signals in the brain. This was made possible by focusing on a specific time and location during brain development, and how its disruption can lead to intellectual disability and depression in adults.

A research team led by Queen's University Belfast (QUB) in collaboration with the Centre for Regenerative Therapies at Dresden University in Germany has published its findings in the journal Genes & Development.

"It is expected this breakthrough will have a major impact on our fundamental understanding of brain development and lead to earlier diagnosis and better treatments for people with certain brain disorders," said Dr Vijay Tiwari, who is based at the Wellcome-Wolfson Institute for Experimental Medicine at QUB.

"Our study reveals the key role this protein plays during the birth of probably one of the most important cells in our body, the neuron.

"Brain development is a highly complex process that involves generating various types of cells at defined time points and locations during embryonic development," he explained. "Any kind of interference during these processes is known to cause diseases including a range of intellectual disabilities.

"Among these brain cell types, neurons are the working unit of the brain, designed to transmit information to other nerve cells and various tissues in the body, such as the muscles as well as storage of memory in our brain," he added.

While the field has rapidly advanced, the mechanisms creating the birth of neurons from their mother cells, called neural stem cells, in time and space during development have not been well understood until now.

To conduct their study, the researchers looked at brain samples to closely determine the development of various cell types within the brain.

"The study showed how the presence of a specific protein (called Phf21b), within a defined time window of brain development and in a specific location in the brain, signals the birth of neurons from neural stem cells in the right place and at the right time," said Dr Tiwari, who is a molecular biologist working in neuroscience.

The researchers found that removal of Phf21b stopped production of neurons from neural stem cells and led to severe defects in brain development. They also found the importance of this protein, in particular in the folding of DNA in cells going on to form neurons.

"Understanding how a cell type in the brain is born at a specific point and in a specific place during development is crucial in our understanding how neurological issues arise later in life. We hope this discovery will pave the way for earlier diagnosis, earlier interventions and better treatment for people with a brain disorder, such as depression," he said.

Their research suggested screening for certain genetic variants would enable earlier diagnosis, in contrast to a scenario where depression in adults is not usually detected until a person is seriously depressed.

Here is the original post:
Scientists discover protein linked to depression and brain disorders - The Irish Times

Futurist Transhuman News Blog

Category Archives: Protein Folding
