Q&A: How machine learning helps scientists hunt for particles, wrangle floppy proteins and speed discovery – Stanford University News

For example, SLAC scientists have already used machine learning techniques to operate accelerators more efficiently, to speed up the discovery of new materials, and to uncover distortions in space-time caused by astronomical objects up to 10 million times faster than traditional methods.

The term machine learning broadly refers to techniques that let computers learn by example, inferring their own conclusions from large sets of data, as opposed to following a predetermined set of steps or rules. To take advantage of these techniques, SLAC launched a machine learning initiative in 2019 that involves researchers across virtually all of the lab's disciplines.

An accelerator physicist by training, Daniel Ratner has worked to apply machine learning approaches to accelerators at SLAC for many years and now heads up the initiative. In this Q&A, he discusses what machine learning can do and how SLAC is uniquely equipped to advance the use of machine learning in fundamental science research.

Machine learning programs solve tasks by looking for patterns in examples. This is similar to the way that humans learn. So machine learning tends to be effective at tasks that humans are good at but can't easily explain how they do.

For instance, you can teach your teenager how to drive a car by example. But it's hard to write down a set of rules for how to drive a car in every possible situation you might encounter when you're driving. That's the kind of case where machine learning has been successful. Just by watching someone drive a car for long enough, a machine learning model can begin to learn the rules of driving.

It usually boils down to learning how to do something by watching enough data.

It's a different conceptual approach to problems. Rather than writing a sophisticated computer program for an entire complex data analysis process by hand, machine learning shifts the emphasis to developing a data set of examples and a way to evaluate solutions. At that point, I can hand over my raw data to a machine learning model and train it to predict solutions for new data it hasn't seen before. You can think of it as a way of avoiding that onerous, expensive programming by extracting rules hidden in a data set.

Let's say we're doing some high-energy physics, analyzing particle tracks in a detector, which is how particle physicists learn about nature's fundamental components. Rather than writing algorithms by hand that manage each part of the analysis (removing noise, finding tracks and identifying the particles that created them) and building the analysis process up bit by bit, we can just take a big chunk of simulated data and learn how to do that entire analysis pipeline with a single big neural network, a machine learning technique inspired by human neural systems. And in practice, the machine learning methods often outperform their human-written counterparts.
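To make that contrast concrete, here is a minimal, hypothetical sketch of the second approach: one network trained end to end on simulated detector events instead of a chain of hand-written steps. The data shapes, number of particle classes and PyTorch model below are illustrative stand-ins, not SLAC's actual analysis code.

```python
import torch
import torch.nn as nn

n_events, n_channels, n_classes = 10_000, 256, 5   # hypothetical sizes
x = torch.randn(n_events, n_channels)              # simulated raw detector readout
y = torch.randint(0, n_classes, (n_events,))       # particle type provided by the simulator

# One network learns the whole pipeline instead of separate hand-coded stages
model = nn.Sequential(
    nn.Linear(n_channels, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, n_classes),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):
    opt.zero_grad()
    loss = loss_fn(model(x), y)   # compare predictions with simulator labels
    loss.backward()
    opt.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```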

In science, the area where machine learning has gotten the most press is in data analysis. A typical task would be: I have a big data set and I want to extract some science from it. And certainly we do a lot of that as well at SLAC. But there's actually a lot more that we can do.

Because we run all these big scientific facilities, we think about how machine learning applies not just to data analysis, but to how scientific experiments at these facilities work.

We can use machine learning to address questions like "How do I design a new accelerator?", "Once I've built it, how can I run it better?" and "How can I identify or even predict faults?"

For example, we're building the next-generation X-ray laser LCLS-II, which will generate terabytes of data per second. A new project led by SLAC will develop machine learning models on the facility's detectors to analyze this enormous amount of data in real time. These models can be flexible and adapt to the individual needs of every future user of LCLS-II.

Every level of a scientific experiment, from design to operations to experimental procedure to data analysis, can be changed with machine learning. I think that's a particular emphasis for a place like SLAC, where our bread and butter is running big facilities.

One example is in improving our ability to analyze how a protein molecule changes over time at the atomic level. A protein is a floppy, flexible thing, and that motion is essential to the protein's function. Rather than trying to learn the average structure of a protein by taking a blurred picture of a moving object, we would like to make a movie and actually watch how that molecule is moving. There's been some very interesting research on using machine learning models to make these protein movies.

As an example of the particle tracking idea mentioned earlier, we have a group at SLAC applying machine learning to neutrino detectors. The task here is to look for very small tracks when particles fly through huge three-dimensional detectors. The scientists have been doing that using something called sparse models, which speed up the learning process by not allocating computational resources to empty space without any tracks. These sparse models are both faster to train and more accurate compared to the standard neural networks developed for the analysis of everyday images.
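A rough illustration of that sparse idea, using plain NumPy and a toy detector volume rather than the group's actual sparse-convolution code: only the occupied voxels are kept, so downstream computation scales with the size of the track rather than the volume.

```python
import numpy as np

# A mostly empty detector volume with one hypothetical straight track in it
volume = np.zeros((256, 256, 256), dtype=np.float32)
track = np.arange(200)
volume[track, track, track] = 1.0

coords = np.argwhere(volume > 0)      # (N, 3) coordinates of occupied voxels
feats = volume[volume > 0]            # (N,) energy deposits at those voxels

print(f"dense elements:  {volume.size:,}")
print(f"sparse elements: {coords.size + feats.size:,}")
# A sparse-convolution library would operate directly on (coords, feats),
# never touching the empty space that dominates the dense volume.
```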

And it turns out that we can actually use that same concept in very different areas. For example, in materials science, you might want to be able to identify a single atom and ignore the vast area around that atom. So even with different scientific goals in mind, we can use the same boundary-pushing machine learning models. Having the machine learning initiative allows all these different people to talk to each other, share experiences and ideas, and make progress faster.

I've always been intrigued by the possibility of extracting valuable information from seemingly random data. I think this search for structure in noisy data is what draws me to science in general. There's also a lot of overlap with the non-machine learning science projects I've chosen over the last 15 years.

Science is a place where you often have large datasets and ask concrete questions, and that is exactly the setup that makes machine learning successful. And machine learning allows us to ask entirely different types of questions that we couldn't before. That's going to lead to very exciting science.

There are many people at SLAC who have been doing machine learning for a long time. Now we're codifying our lab-wide approach to machine learning and providing more structure and support for everyone who wants to apply these new tools in their research. My goal for our initiative is to provide a central locus for people to discuss, collaborate, come up with new ideas and educate themselves. This lets us scale up machine learning efforts across the lab and make everyone more effective. We have this big community of people who are actively using machine learning every day in their science research, and that number is only going to grow.

One of the things we emphasize is that the goal of machine learning is not to do what we're doing today and do it 10% better. We want to do completely new science. We want to do things 10 times better, 100 times better, a million times better. And we want to start seeing examples of that in the next couple of years and enable science at SLAC that wasn't possible before.

Machine learning projects at SLAC are supported by DOE's Office of Science and the Office of Energy Efficiency and Renewable Energy. Machine learning is a DOE priority, and the department recently established an Artificial Intelligence and Technology Office.

For questions or comments, contact the SLAC Office of Communications at communications@slac.stanford.edu.

SLAC is a vibrant multiprogram laboratory that explores how the universe works at the biggest, smallest and fastest scales and invents powerful tools used by scientists around the globe. With research spanning particle physics, astrophysics and cosmology, materials, chemistry, bio- and energy sciences and scientific computing, we help solve real-world problems and advance the interests of the nation.

SLAC is operated by Stanford University for the U.S. Department of Energy's Office of Science. The Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time.


How machine learning is bringing National Library of Scotland’s maps to life – The Scotsman

Websites belonging to Scotland's national records offices hold a treasure trove of data, but to get any value from these sites you have to know what you are looking for.

Friday, 2nd October 2020, 9:05 pm

What if machine learning meant that you didn't have to have a definitive starting point and the reams of records in the archives could be explored and enjoyed visually?

That is the vision of Martin Disley who has been creating datasets from across the National Library of Scotland's (NLS) map collection.

His project, which is part of the Creative Informatics Resident Entrepreneur project at the University of Edinburgh, curated datasets of images previously scanned by the NLS to feed a machine learning model. The newly created machine learning model then creates 'fake' versions of the images that it is trained upon.

The generated output from this process can be animated to produce visions of machines dreaming; in this case, the fake maps are animated and brought to life. This has the effect of synthesising these large collections down into short videos.

Fake maps and towns can be created from the model and then animated.

When the animation starts with a small town and ends with a large developed town, the viewer can watch the town grow in an organic manner as the model has been trained on how towns of every size grow over time.

He said: "People can view thousands of images online but this can quickly become overwhelming and it is a struggle to get people to get people to engage with the content.

"We are working on a tool that will allow users to interact with the model, to be able to control what it produces."

The technology which drives Martin's machine learning model is based on the GAN (generative adversarial network) machine learning architecture, which gained national attention when it was used to create the website thispersondoesnotexist.com.

Over 70,000 facial images from Flickr were used to train the model meaning it was able to learn patterns in human face composition and then create new faces.
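For readers curious what such training looks like in code, below is a heavily simplified GAN sketch in PyTorch. The image size, network sizes and random stand-in training data are all toy assumptions; the model described in the article is a far larger architecture in the same family, trained on scanned map images.

```python
import torch
import torch.nn as nn

latent_dim, img_dim = 64, 28 * 28                  # toy sizes for illustration

# Generator: noise vector -> flattened "image"; Discriminator: image -> real/fake score
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                  nn.Linear(256, img_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

real_images = torch.rand(512, img_dim) * 2 - 1     # stand-in for scanned map tiles
ones, zeros = torch.ones(512, 1), torch.zeros(512, 1)

for step in range(100):
    # Train the discriminator: push real images toward 1, generated images toward 0
    fake = G(torch.randn(512, latent_dim)).detach()
    d_loss = bce(D(real_images), ones) + bce(D(fake), zeros)
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Train the generator: try to make the discriminator score its output as real
    g_loss = bce(D(G(torch.randn(512, latent_dim))), ones)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```

The adversarial loop is the essential design choice: the generator improves only by fooling the discriminator, so the 'fake' images it produces drift toward the statistics of the training collection, whether faces or maps.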

Martin said: "If you consider maps, you are already starting with a fake. It is a pictorial representation of reality.

"The models I have made have learnt the grammar of these maps; you can read these fake maps like you can read any of the originals. You are able to build an internal representation of the map in your head; you can imagine what these places might look like."

Martin said the process of creating his model was one of fine-tuning, having started with a large dataset and then whittling out the images of maps that were producing bad results.

"When you are training the model you get to see the dataset in motion.

"As I go through the dataset I take out what I don't like and pick points that are producing interesting results.

"The National Library are excited about the potential for the increased public engagement that synthesising these overwhelmingly large collections into visually exciting media might bring

Martin Disley is a participant in the Creative Informatics Resident Entrepreneur project, which is delivered by the University of Edinburgh in partnership with Edinburgh Napier University, CodeBase and Creative Edinburgh, and is one of nine programmes across the UK that make up the Creative Industries Clusters Programme, funded by the Arts and Humanities Research Council as part of the UK Government's Industrial Strategy. Creative Informatics is part of the Edinburgh and South East Scotland City Region Deal initiative (DDI Programme) and is also supported by the Scottish Funding Council.


Global Machine Learning-as-a-Service (MLaaS) Market Size 2020 | Covid-19 Analysis, Trends, Top Key Players, Statistics, Growth Opportunities and…

The Global Machine Learning-as-a-Service (MLaaS) Market report explores the MLaaS industry around the globe and offers details about industry overview, classification, definition and opportunities, along with key regions and countries. This research report delivers detailed insights on every aspect of the Machine Learning-as-a-Service (MLaaS) market.

Additionally, the research study divides the market on the basis of product types, applications and end-user industries. A 360-degree summary of the competitive scenario of the Global Machine Learning-as-a-Service (MLaaS) Market is presented by Reportspedia. The recent study on the Machine Learning-as-a-Service (MLaaS) market provides information about this industry with a thorough assessment of this business.

Sample Copy of This Report:

https://www.reportspedia.com/report/others/2015-2027-global-machine-learning-as-a-service-(mlaas)-industry-market-research-report,-segment-by-player,-type,-application,-marketing-channel,-and-region/64734#request_sample

Major Players in the Machine Learning-as-a-Service (MLaaS) market are:

SAS Institute Inc., Fair Isaac Corporation (FICO), Sift Science Inc., Amazon Web Services Inc., Google LLC, Hewlett Packard Enterprise Development LP, Yottamine Analytics LLC, PurePredictive Inc., BigML Inc., IBM Corp., Microsoft Corp., Iflowsoft Solutions Inc.

Machine Learning-as-a-Service (MLaaS) market growth has been segregated into the Americas, APAC, Europe, Middle East & Africa. The Machine Learning-as-a-Service (MLaaS) market size is appropriately divided into pivotal segments in the report. A synopsis of the industry with regards to market size concerning remuneration and volume aspects along with the current Machine Learning-as-a-Service (MLaaS) market shares scenario is also offered in the report.

Ask for Discount:

https://www.reportspedia.com/discount_inquiry/discount/64734

Types covered in the Machine Learning-as-a-Service (MLaaS) industry are:

Marketing and Advertisement, Predictive Maintenance, Automated Network Management, Fraud Detection and Risk Analytics, Other

Applications covered in the report are:

IT and Telecom, Automotive, Healthcare, Aerospace and Defense, Retail, Government, BFSI, Other End Users

The study focuses on key manufacturers, the competitive landscape, and SWOT analysis for the Machine Learning-as-a-Service (MLaaS) industry. Apart from looking into the geographical regions, the report concentrates on the key trends and segments that are driving the growth of the industry. Researchers have also focused on individual growth trends and their contribution to the overall market.

These factors are expected to drive the Global Machine Learning-as-a-Service (MLaaS) Market over the forecast period. This research report covers the market landscape and its growth prospects in the near future. After studying key companies, the report focuses on new entrants contributing to the growth of the market. Most companies in the Global Machine Learning-as-a-Service (MLaaS) Market are currently adopting new technological trends.

Inquiry Before Buying:

https://www.reportspedia.com/report/others/2015-2027-global-machine-learning-as-a-service-(mlaas)-industry-market-research-report,-segment-by-player,-type,-application,-marketing-channel,-and-region/64734#inquiry_before_buying

Key highlights of the global Machine Learning-as-a-Service (MLaaS) Market research report:

Some of the key questions answered in this Machine Learning-as-a-Service (MLaaS) Market report:

Table of Contents: Machine Learning-as-a-Service (MLaaS) Market

Chapter 1: Overview of Machine Learning-as-a-Service (MLaaS) Market

Chapter 2: Global Market Status and Forecast by Regions

Chapter 3: Global Machine Learning-as-a-Service (MLaaS) Market Status and Forecast by Types

Chapter 4: Global Machine Learning-as-a-Service (MLaaS) industry Status and Forecast by Downstream Industry

Chapter 5: Machine Learning-as-a-Service (MLaaS) industry Market Driving Factor Analysis

Chapter 6: Market Competition Status by Major Manufacturers

Chapter 7: Major Manufacturers Introduction and Market Data

Chapter 8: Upstream and Downstream Machine Learning-as-a-Service (MLaaS) industry Analysis

Chapter 9: Cost and Gross Margin Analysis

Chapter 10: Marketing Status Analysis

Chapter 11: Machine Learning-as-a-Service (MLaaS) industry Market Report Conclusion

Chapter 12: Research Methodology and Reference

Get ToC for the overview of the premium report @:

https://www.reportspedia.com/report/others/2015-2027-global-machine-learning-as-a-service-(mlaas)-industry-market-research-report,-segment-by-player,-type,-application,-marketing-channel,-and-region/64734#table_of_contents


Will AI cross the proverbial chasm? Algorithmia resolves the practical pitfalls of machine learning – ZDNet

"A lot of people in academia are not very good at software engineering," says Kenny Daniel, co-founder and chief technology officer of cloud computing startup Algorithmia. "I always had more of the software engineering bent."

That, in a nutshell, is some of what makes six-year-old, Seattle-based Algorithmia uniquely focused in a world over-run with machine learning offerings.

Amazon, Microsoft, Google, IBM, Salesforce, and other large companies have for some time been offering cut-and-paste machine learning in their cloud services. Why would you want to stray to a small, young company?

No reason, unless that startup had a particular knack for hands-on support of machine learning.

That's the premise of Daniel's firm, founded with Diego Oppenheimer, a graduate of Carnegie Mellon and a veteran of Microsoft. The two became best friends in undergrad at CMU, and when Oppenheimer went to industry, Daniel went to pursue a PhD in machine learning at USC. While researching ML, Daniel realized he wanted to build things more than he wanted to just theorize.

"I had the idea for Algorithmia in grad school," Daniel recalled in an interview with ZDNet. "I saw the struggle of getting the work out into the real world; my colleagues and I were developing state-of-the-art [machine learning] models, but not really getting them adopted in the real world the way we wanted."

He dropped out of USC and hooked up with Oppenheimer to found the company. Oppenheimer had seen from the industry side that even for large companies such as Microsoft, there was a struggle to get enough talent to get things deployed and in production.

The duo initially set out to create an App Store for machine learning, a marketplace in which people could buy and sell ML models, or programs. They got seed funding from venture firm Madrona Ventures, and took up residence in Seattle's Pike Place. "There's a tremendous amount of ML talent out here, and the rents are not as crazy" as Silicon Valley, he explained.

"If companies are not getting the pay-off, if there's a lack of progress, we could be looking at another hype cycle," says Kenny Daniel, CTO and co-founder of machine learning operations service provider Algorithmia.

Their intent was to match up consumers of machine learning, companies that wanted the models, with developers. But Daniel noticed something was breaking down. The majority of customers using the service were consuming machine learning from their own teams. There was little transaction volume because companies were just trying to get stuff to work.

"We said, okay, there's something else going on here: people don't have a great way of turning their models into scalable, production-ready APIs that are highly available and resilient," he recalled having realized.

"A lot of these companies would have data scientists building models in Jupyter on their laptop, and not really having a good way to hook them up to a million iOS apps that are trying to recognize images, or a back-end data pipeline that's trying to process terabytes of data a day."

There was, in other words, "a gap there in software engineering." And so the business shifted from a focus on a marketplace to a focus on providing the infrastructure to make customers' machine learning models scale up.

The company had to solve a lot of the multi-tenant challenges that were fundamental limitations, long before those techniques became mainstream with the big cloud platforms.


"We were running functions before AWS Lambda," says Daniel, referring to Amazon's server-less offering.

Problems such as: "How do you manage GPUs? Because GPUs were not built for this kind of thing; they were built to make games run fast, not for multi-tenant users to run jobs on them."

Daniel and Oppenheimer started meeting with big financial and insurance firms, to discuss solving their deployment problems. Training a machine learning model might be fine on AWS. But when it came time to make predictions with the trained model, to put it into production for a high volume of requests, companies were running into issues.

The companies wanted their own instances of their machine learning models in virtual private clouds, on AWS or Azure, with the ability to have dedicated customer support, metrics, management and monitoring.

That led to the creation of an Algorithmia Enterprise service in 2016. That was made possible by fresh capital, an infusion of $10.5 million from Gradient Ventures, Google's AI investment operation, followed by a $25 million round last summer. In total, Algorithmia has received $37.9 million in funding.

Today, the company has seven-figure deals with large institutions, most of it for running private deployments. You could get something like what Algorithmia offers by using Amazon's SageMaker, for example. But SageMaker is all about using only Amazon's resources. The appeal with Algorithmia is that the deployments will run in multiple cloud facilities, wherever a customer needs machine learning to live.

"A number of these institutions need to have parity across wherever their data is," said Daniel. "You may have data on premise, or maybe you did acquisitions, and things are across multiple clouds; being able to have parity across those is one of the reasons people choose Algorithmia."

Amazon and other cloud giants each tout their offerings as end-to-end services, said Daniel. But that runs counter to reality, which is that there is a soup composed of many technologies that need to be brought together to make ML work.

"In the history of software, there hasn't been a clear end-to-end, be-all winner," Daniel observed. "That's why GitHub, and GitLab, and Bitbucket and all these continue to exist, and there are different CI [continuous integration] systems, and Jenkins, and different deployment systems and different container systems."

"It takes a fair amount of expertise to wire all these things together."

There is some independent support for what Daniel claims. Gartner analyst Arun Chandrasekaran puts Algorithmia in a basket that he calls "ModelOps." The application "life cycle" of artificial intelligence programs, Chandrasekaran told ZDNet, is different from that of traditional applications, "due to the sheer complexity and dynamism of the environment."

"Most organizations underestimate how long it will take to move AI and ML projects into production."


Chandrasekaran predicts the market for ModelOps will expand as more and more companies try to deploy AI and run up against the practical hurdles.

While there is the risk that cloud operators will subsume some of what Algorithmia offers, said Chandrasekaran, the need to deploy outside a single cloud supports the role of independent ModelOps vendors such as Algorithmia.

"AI deployments tend to be hybrid, both from the perspective of spanning multiple environments (on-premises, cloud) as well as the different AI techniques that customers may use," he told ZDNet.

Aside from cloud vendors, Algorithmia competitors include Datarobot, H2O.ai, RapidMiner, Hydrosphere, Modelop and Seldon.

Some companies may go 100% AWS, conceded Daniel. And some customers may be fine with generic abilities of cloud vendors. For example, Amazon has made a lot of progress with text translation technology as a service, he noted.

But industry-specific, or vertical-market, machine learning is something of a different story. One customer of Algorithmia, a large financial firm, needed to deploy an application for fraud detection. "It sounds crazy, but we had to figure out all this stuff of, how do we know this data over here is used to train this model? It's important because it's an issue of their [the client's] liability."

The immediate priority for Algorithmia is a new product version called Teams that lets companies organize an invite-only, hosted gathering of those working on a particular model. It can stretch across multiple "federated" instances of a model, said Daniel. The pricing is by compute usage, so it's a pay-as-you-go option, versus the annual billing of the Enterprise version.


To Daniel, the gulf that he observed in academia between pure research and software engineering is the thing that has always shot down AI in the past. The so-called "AI winter" periods over the decades were in large part a result of the practical obstacles, he believes.

"Those were periods when there was hype for AI and ML, and companies invested a lot of money," he said. "If companies are not getting the pay-off, if there's a lack of progress, we could be looking at another hype cycle."

By contrast, if more companies can be successful in deployment, it may lead to a flourishing of the kind of marketplace that he and Oppenheimer originally envisioned.

"It's like the Unix philosophy, these small things combining, that's the way that I see it," he said. "Ultimately, this will just enable all sorts of things, completely new scenarios, and that's incredibly valuable, things that we can make available in a free market of machine learning."


Using Machine Learning To Reduce Carbon Emissions In The Trucking Industry – Forbes

LONG BEACH, CA - 9/8: Trucks near the Ports of Long Beach and Los Angeles, the busiest port complex in the US. In September, a federal judge tentatively denied the largest trucking association from blocking the new anti-pollution measures. The $1.6 billion Clean Trucks Program was developed after years of debate between drivers, city officials, and environmentalists. It targets toxic diesel emissions and bans all 16,800 pre-1989 trucks from the ports. (Photo by David McNew/Getty Images)

In 2018, freight trucks contributed 23% of greenhouse gas emissions, and, by sector, transportation contributes an overall 28% of those emissions.

Shared truckloads on less-than-truckload (LTL) freight, which lets several shippers share space in one full semi-truck, could reduce carbon emissions, according to Lu Saenz, VP of Engineering and Product Development at Flock Freight.

Saenz says there's a huge benefit to adopting the new shared truckload model in the logistics industry.

"With the traditional LTL model, freight zigzags through the outdated hub-and-spoke system and is wasteful in its approach, both in the time it takes shipments to arrive on shelves and environmentally, but also because items are constantly getting damaged by being taken on and off trucks along the route," said Saenz.

In September 2020, Walmart announced it will require its suppliers and their carriers to deliver all orders by their must-arrive-by dates 98% of the time or be fined three percent of the cost of the goods.

Saenz says Flock Freight solves the damage and on-time delivery problems within the LTL industry with data science and machine learning in its logistics planning software.

"This optimal integration of tech into the freight industry is [why] we've seen [...] a 65% increase in its shared truckload shipping," added Saenz.

The machine learning-based product, FlockDirect, pools less-than-truckload freight consisting of a few pallets together to create full truckloads. It optimizes routes by pooling freight heading in the same direction so that trucks only stop at each drop-off, avoiding traditional terminals.

Saenz says that to create shared truckloads, their pooling algorithms sift through an enormous number of possible shipment permutations to find only those which are feasible to execute and economically advantageous for all parties.

"Origin, destination, weight, dimensions, commodity type, scheduling, shipping cost: these are just a handful of the numerous shipping constraints that our technology must account for to propose shared truckload pools that will actually work," said Saenz.
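As a rough sketch of the constraint-filtering step described above (not Flock Freight's actual algorithm), a pooling routine might enumerate candidate pairings and discard any that violate capacity or direction rules. All field names, limits and shipments below are invented for illustration.

```python
from itertools import combinations

shipments = [
    {"id": "A", "origin": "LA",  "dest": "PHX", "pallets": 10, "weight_lb": 8000},
    {"id": "B", "origin": "LA",  "dest": "PHX", "pallets": 8,  "weight_lb": 6000},
    {"id": "C", "origin": "SEA", "dest": "PDX", "pallets": 20, "weight_lb": 30000},
]

MAX_PALLETS, MAX_WEIGHT = 26, 44000   # rough full-truckload limits (assumed)

def feasible(a, b):
    """A pair can share a truck if it fits and heads the same way."""
    return (a["pallets"] + b["pallets"] <= MAX_PALLETS
            and a["weight_lb"] + b["weight_lb"] <= MAX_WEIGHT
            and a["origin"] == b["origin"]          # simplistic 'same direction' proxy
            and a["dest"] == b["dest"])

pools = [(a["id"], b["id"]) for a, b in combinations(shipments, 2) if feasible(a, b)]
print(pools)   # [('A', 'B')]
```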

Saenz says shared truckload shipping negates the need for carbon-intensive terminals. And, because shared truckload shipments only load and unload once, 99.9% of shipments arrive damage-free, eliminating the environmental harm of remanufacturing and reshipping duplicate goods.

In August 2020, Flock Freight earned a B Corporation certification. Saenz says this reinforces their commitment to sustainable freight shipping. In 2019, the company reduced CO2 from the LTL industry by 4,335 metric tons. And, for 2020, the company has committed to reducing CO2 by 5,000 metric tons.

Flock Freight has raised a total of $70.5M to date, with their latest round, a Series B of $50M, closing in early 2020.


Differential machine learning: the shape of things to come – Risk.net


Brian Huge and Antoine Savine combine automatic adjoint differentiation with modern machine learning. In addition, they introduce general machinery for training fast, accurate pricing and risk approximations, applicable to arbitrary transactions or trading books, and arbitrary stochastic models, effectively resolving the computational bottlenecks of derivatives risk reports and regulations.

Pricing approximation has proved tremendously useful with advanced



A machine learning approach to define antimalarial drug action from heterogeneous cell-based screens – Science Advances

Abstract

Drug resistance threatens the effective prevention and treatment of an ever-increasing range of human infections. This highlights an urgent need for new and improved drugs with novel mechanisms of action to avoid cross-resistance. Current cell-based drug screens are, however, restricted to binary live/dead readouts with no provision for mechanism of action prediction. Machine learning methods are increasingly being used to improve information extraction from imaging data. These methods, however, work poorly with heterogeneous cellular phenotypes and generally require time-consuming human-led training. We have developed a semi-supervised machine learning approach, combining human- and machine-labeled training data from mixed human malaria parasite cultures. Designed for high-throughput and high-resolution screening, our semi-supervised approach is robust to natural parasite morphological heterogeneity and correctly orders parasite developmental stages. Our approach also reproducibly detects and clusters drug-induced morphological outliers by mechanism of action, demonstrating the potential power of machine learning for accelerating cell-based drug discovery.

Cell-based screens have substantially advanced our ability to find new drugs (1). However, most screens are unable to predict the mechanism of action (MoA) of identified hits, necessitating years of follow-up after discovery. In addition, even the most complex screens frequently find hits against cellular processes that are already targeted (2). Limitations in finding new targets are becoming especially important in the face of rising antimicrobial resistance across bacterial and parasitic infections. This rise in resistance is driving increasing demand for screens that can intuitively find new antimicrobials with novel MoAs. Demand for innovation in drug discovery is exemplified in efforts on targeting Plasmodium falciparum, the parasite that causes malaria. Malaria continues to be a leading cause of childhood mortality, killing nearly half a million children each year (3). Drug resistance has emerged to every major antimalarial to date including rapidly emerging resistance to frontline artemisinin-based combination therapies (4). While there is a healthy pipeline of developmental antimalarials, many target common processes (5) and may therefore fail quickly because of prevalent cross-resistance. Thus, solutions are urgently sought for the rapid identification of new drugs that have a novel MoA at the time of discovery.

Parasite cell morphology within the human contains inherent MoA-predictive capacity. Intracellular parasite morphology can distinguish broad stages along the developmental continuum of the asexual parasite (responsible for all disease pathology). This developmental continuum includes early development (early and late ring form), feeding (trophozoite), genome replication or cell division (schizont), and extracellular emergence [merozoite; see (6) for definitions]. Hence, drugs targeting a particular stage should manifest a break in the continuum. Morphological variation in the parasite cell away from the continuum of typical development may also aid drug MoA prediction if higher information granularity can be generated during a cell-based screen. Innovations in automated fluorescence microscopy have markedly expanded available data content in cell imaging (7). By using multiple intracellular markers, an information-rich landscape can be generated from which morphology, and, potentially, drug MoA can be deduced. This increased data content is, however, currently inaccessible both computationally and because it requires manual expert-level analysis of cell morphology. Thus, efforts to use cell-based screens to find drugs and define their MoA in a time-efficient manner are still limited.

Machine learning (ML) methods offer a powerful alternative to manual image analysis, particularly deep neural networks (DNNs) that can learn to represent data succinctly. To date, supervised ML has been the most successful application for classifying imaging data, commonly based on binning inputs into discrete, human-defined outputs. Supervised methods using this approach have been applied to study mammalian cell morphologies (8, 9) and host-pathogen interactions (10). However, discrete outputs are poorly suited for capturing a continuum of morphological phenotypes, such as those that characterize either malaria parasite development or compound-induced outliers, since it is difficult or impossible to generate labels of all relevant morphologies a priori. A cell imaging approach is therefore needed that can function with minimal discrete human-derived training data before computationally defining a continuous analytical space, which mirrors the heterogeneous nature of biological space.

Here, we have created a semi-supervised model that discriminates diverse morphologies across the asexual life cycle continuum of the malaria parasite P. falciparum. By receiving input from a deep metric network (11) trained to represent similar consumer images as nearby points in a continuous coordinate space (an embedding), our DNN can successfully define unperturbed parasite development with a much finer information granularity than human labeling alone. The same DNN can quantify antimalarial drug effects both in terms of life cycle distribution changes [e.g., killing specific parasite stage(s) along the continuum] and morphological phenotypes or outliers not seen during normal asexual development. Combining life cycle and morphology embeddings enabled the DNN to group compounds by their MoA without directly training the model on these morphological outliers. This DNN analysis approach toward cell morphology therefore addresses the combined needs of high-throughput cell-based drug discovery that can rapidly find new hits and predict MoA at the time of identification.

Using ML, we set out to develop a high-throughput, cell-based drug screen that can define cell morphology and drug MoA from primary imaging data. From the outset, we sought to embrace asynchronous (mixed stage) asexual cultures of the human malaria parasite, P. falciparum, devising a semi-supervised DNN strategy that can analyze fluorescence microscopy images. The workflow is summarized in Fig. 1 (A to C).

(A) To ensure all life cycle stages were present during imaging and analysis, two transgenic malaria cultures, continuously expressing sfGFP, were combined (see Materials and Methods); these samples were incubated with or without drugs before being fixed and stained for automated multichannel high-resolution, high-throughput imaging. Resulting datasets (B) contained parasite nuclei (blue), cytoplasm (not shown), and mitochondrion (green) information, as well as the RBC plasma membrane (red) and brightfield (not shown). Here, canonical examples of a merozoite, ring, trophozoite, and schizont stage are shown. These images were processed for ML analysis (C) with parasites segregated from full fields of view using the nuclear stain channel, before transformation into embedding vectors. Two networks were used; the first (green) was trained on canonical examples from human-labeled imaging data, providing ML-derived labels (pseudolabels) to the second semi-supervised network (gray), which predicted life cycle stage and compound phenotype. Example images from human-labeled datasets (D) show that disagreement can occur between human labelers when categorizing parasite stages (s, schizont; t, trophozoite; r, ring; m, merozoite). Each thumbnail image shows (from top left, clockwise) merged channels, nucleus staining, cytoplasm, and mitochondria. Scale bar, 5 μm.

The P. falciparum life cycle commences when free micron-sized parasites (called merozoites; Fig. 1B, far left) target and invade human RBCs. During the first 8 to 12 hours after invasion, the parasite is referred to as a ring, describing its diamond ringlike morphology (Fig. 1B, left). The parasite then feeds extensively (trophozoite stage; Fig. 1B, right), undergoes rounds of DNA replication and eventually divides into ~20 daughter cells (the schizont stage; Fig. 1B, far right), which precedes merozoite release back into circulation (6). This discrete categorization belies a continuum of morphologies between the different stages.

The morphological continuum of asexual development represents a challenge when teaching ML models, as definitions of each stage will vary between experts (Fig. 1D and fig. S1). To embrace this, multiple human labels were collected. High-resolution three-dimensional (3D) images of a 3D7 parasite line continuously expressing superfolder green fluorescent protein (sfGFP) in the cytoplasm (3D7/sfGFP) were acquired using a widefield fluorescence microscope (see Materials and Methods), capturing brightfield, DNA [4′,6-diamidino-2-phenylindole (DAPI)], cytoplasm (constitutively expressed sfGFP), mitochondria (MitoTracker, abbreviated subsequently to MITO), and the RBC membrane [fluorophore-conjugated wheat germ agglutinin (WGA)]. 3D datasets were converted to 2D images using maximum intensity projection. Brightfield was converted to 2D using both maximum and minimum projection, resulting in six channels of data for the ML. Labels (5382) were collected from human experts, populating the categories of ring, trophozoite, schizont, merozoite, cluster-of-merozoites (multiple extracellular merozoites attached after RBC rupture), or debris. For initial validation and as a test of reproducibility between experts, an additional 448 parasites were collected, each labeled by five experts (Fig. 1D).

As demonstrated (Fig. 1D and fig. S1A), human labelers show some disagreement, particularly between ring and trophozoite stages. This disagreement is to be expected, with mature ring stage and early trophozoite stage images challenging to define even for experts. When comparing the human majority vote versus the model classification (fig. S1B and note S1), some disagreement was seen, particularly for human-labeled trophozoites being categorized as ring stages by the ML algorithm.

Image patches containing parasites within the RBC or after merozoite release were transformed into input embeddings using the deep metric network architecture originally trained on consumer images (11) and previously shown for microscopy images (12). Embeddings are vectors of floating point numbers representing a position in high-dimensional space, trained so related objects are located closer together. For our purposes, each image channel was individually transformed into an embedding of 64 dimensions before being concatenated to yield one embedding of 384 dimensions per parasite image.
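A simplified sketch of this step follows, with a random projection standing in for the deep metric network (which is not reproduced here); the point is only the shape of the computation: six channels, 64 dimensions each, concatenated into one 384-dimensional vector per parasite.

```python
import numpy as np

rng = np.random.default_rng(0)
n_channels, patch, emb_dim = 6, 64, 64

def embed_channel(img, projection):
    """Stand-in for the deep metric network: flatten the channel and project to 64 dims."""
    return img.ravel() @ projection

projections = [rng.normal(size=(patch * patch, emb_dim)) for _ in range(n_channels)]
parasite_patch = rng.random((n_channels, patch, patch))   # one segmented parasite image

embedding = np.concatenate(
    [embed_channel(parasite_patch[c], projections[c]) for c in range(n_channels)]
)
print(embedding.shape)   # (384,) -> one embedding per parasite image
```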

Embeddings generated from parasite images were next transformed using a two-stage workflow to represent either on-cycle (for mapping the parasite life cycle continuum) or off-cycle effects (for mapping morphology or drug induced outliers). Initially, an ensemble of fully connected two-layer DNN models was trained on the input embeddings to predict the categorical human life cycle labels for dimethyl sulfoxide (DMSO) controls. DMSO controls consisted of the vehicle liquid for drug treatments (DMSO) being added to wells containing no drugs. For consistency, the volume of DMSO was normalized in all wells to 0.5%. This training gave the DNN robustness to control for sample heterogeneity and, hence, sensitivity for identifying unexpected results (outliers). The ensemble was built from three pairs of fully supervised training conditions (six total models). Models only differed in the training data they received. Each network pair was trained on separate (nonoverlapping) parts of the training data, providing an unbiased estimate of the model prediction variance.

After initial training, the supervised DNN predicted its own labels (i.e., pseudolabels) for previously unlabeled examples. As with human-derived labels, DNN pseudolabeling was restricted to DMSO controls (with high confidence) to preserve the models sensitivity to off-cycle outliers (which would not properly fit into on-cycle outputs). High confidence was defined as images given the same label prediction from all six models and when all models were confident of their own prediction (defined as twice the probability of selecting the correct label at random). This baseline random probability is a fixed number for a dataset or classification and provided a suitable baseline for model performance.
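The pseudolabeling rule just described can be expressed compactly. The sketch below illustrates the stated criteria (unanimous agreement across the six models, with every confidence above twice the random baseline) using synthetic softmax outputs rather than the authors' code.

```python
import numpy as np

n_classes = 6                       # merozoite, ring, trophozoite, schizont, cluster, debris
threshold = 2.0 / n_classes         # twice the probability of a random correct guess

def pseudolabel(prob_stack):
    """prob_stack: (n_models, n_classes) softmax outputs for one unlabeled image."""
    labels = prob_stack.argmax(axis=1)
    confidences = prob_stack.max(axis=1)
    if np.all(labels == labels[0]) and np.all(confidences > threshold):
        return int(labels[0])       # high-confidence pseudolabel
    return None                     # left unlabeled

rng = np.random.default_rng(1)
example = rng.dirichlet(np.ones(n_classes), size=6)   # six models' predictions
print(pseudolabel(example))
```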

A new ensemble of models was then trained using the combination of human-derived labels and DNN pseudolabels. The predictions from this new ensemble were averaged to create the semi-supervised model.

The semi-supervised model was first used to represent the normal (on-cycle) life cycle continuum. We selected the subset of dimensions in the unnormalized final prediction layer that corresponded to merozoites, rings, trophozoites, and schizonts. This was projected into 2D space using principal components analysis (PCA) and shifted such that its centroid was at the origin. This resulted in a continuous variable where angles represent life cycle stage progression, referred to as Angle-PCA. This Angle-PCA approach permitted the full life cycle to be observed as a continuum with example images despite data heterogeneity (Fig. 2A and fig. S2) and 2D projection (Fig. 2B) following the expected developmental order of parasite stages. This ordered continuum manifested itself without specific constraints being imposed, except those provided by the categorical labels from human experts (see note S2).
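In code, the Angle-PCA construction reduces to a 2D PCA projection followed by an angle calculation. The sketch below uses synthetic class scores in place of the model's real outputs.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
scores = rng.normal(size=(500, 4))          # unnormalized outputs for merozoite,
                                            # ring, trophozoite and schizont

xy = PCA(n_components=2).fit_transform(scores)
xy -= xy.mean(axis=0)                       # shift the centroid to the origin
angles = np.arctan2(xy[:, 1], xy[:, 0])     # one angle per parasite, in radians

order = np.argsort(angles)                  # parasites sorted along the continuum
print(angles[order][:5])
```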

After learning from canonical human-labeled parasite images (for examples, please see Fig. 1B) and filtering debris and other outliers, the remaining life cycle data from asynchronous cultures was successfully ordered by the model. The parasites shown are randomly selected DMSO control parasites from multiple imaging runs, sorted by Angle PCA (A). The colored, merged images show RBC membrane (red), mitochondria (green), and nucleus (blue). For a subset of parasites on the right, the colored, merged image plus individual channels are shown: (i) merged, (ii) brightfield minimum projection, (iii) nucleus, (iv) cytoplasm, (v) mitochondria, and (vi) RBC membrane (brightfield maximum projection was also used in ML but is not shown here). The model sorts the parasites in life cycle stage order, despite heterogeneity of signal due to nuances such as imaging differences between batches. The order of the parasites within the continuum seen in (A) is calculated from the angle within the circle created by projecting model outputs using PCA, creating a 2D scatterplot (B). This represents a progression through the life cycle stages of the parasite, from individual merozoites (purple) to rings (yellow), trophozoites (green), schizonts (dark green), and finishing with a cluster of merozoites (blue). The precision-recall curve (C) shows that human labelers and the model have equivalent accuracy in determining the earlier/later parasite in pairs. The consensus of the human labelers was taken as ground truth, with individual labelers (orange) agreeing with the consensus on 89.5 to 95.8% of their answers. Sweeping through the range of too close to call values with the ML model yields the ML curve shown in black. Setting this threshold to 0.11 radians, the median angle variance across the individual models used in the ensemble yields the blue dot.

To validate the accuracy of the continuous life cycle prediction, pairs of images were shown to human labelers to define their developmental order (earlier/later) with the earliest definition being the merozoite stage. Image pairs assessed also included those considered indistinguishable (i.e., too close to call). Of the 295 pairs selected for labeling, 276 measured every possible pairing between 24 parasites, while the remaining 19 pairs were specifically selected to cross the trophozoite/schizont boundary. Human expert agreement with the majority consensus was between 89.5 and 95.8% (note S3), with parasite pairs called equal (too close to call) to 25.7 to 44.4% of the time. These paired human labels had more consensus than the categorical (merozoite, ring, trophozoite, and schizont) labels that had between 60.9 and 78.4% agreement between individual human labels and the majority consensus.

The Angle-PCA projection results provide an ordering along the life cycle continuum, allowing us to compare this sort order to that by human experts. With our ensemble of six models, we could also evaluate the consensus and variation between angle predictions for each example. The consensus between models for relative angle between two examples was greater than 96.6% (and an area under the precision-recall curve score of 0.989; see note S4 for definition), and the median angle variation across all labeled examples was 0.11 radians. The sensitivity of this measurement can be tuned by selecting a threshold for when two parasites are considered equal, resulting in a precision-recall curve (Fig. 2C). When we use the median angle variation of the model as the threshold for examples that are too close to call, we get performance (light blue point) that is representative of the human expert average. These results demonstrate that our semi-supervised model successfully identified and segregated asynchronous parasites and infected RBCs from images that contain >90% uninfected RBCs (i.e., <10% parasitaemia) and classifies parasite development logically along the P. falciparum asexual life cycle.

Having demonstrated the semi-supervised model can classify asynchronous life cycle progression consistently with fine granularity, the model was next applied to quantify on-cycle differences (i.e., life cycle stage-specific drug effects) in asynchronous, asexual cultures treated with known antimalarial drugs. Two drug treatments were initially chosen that give rise to aberrant cellular development: the ATP4ase inhibitor KAE609 (also called Cipargamin) (13) and the mitochondrial inhibiting combinational therapy of atovaquone and proguanil (14) (here referred to as Ato/Pro). KAE609 reportedly induces cell swelling (15), while Ato/Pro reduces mitochondrial membrane potential (16). Drug treatments were first tested at standard screening concentrations (2 μM) for two incubation periods (6 and 24 hours). Next, drug dilutions were carried out to test the semi-supervised model's sensitivity to lower concentrations using half-maximal inhibitory concentrations (IC50s) of each compound (table S1). IC50 and 2 μM datasets were processed through the semi-supervised model and overlaid onto DMSO control data as a histogram to explore on-cycle drug effects (Fig. 3). KAE609 treatment exhibited a consistent skew toward ring stage parasite development (8 to 12 hours after RBC invasion; Fig. 3) without an increase within this stage of development, while the Ato/Pro treatment led to reduced trophozoite stages (~12 to 30 hours after RBC invasion; Fig. 3). This demonstrates that the fine-grained continuum has the sensitivity to detect whether drugs affect specific stages of the parasite life cycle.

Asynchronous Plasmodium falciparum cultures were treated with the ATPase4 inhibitor KAE609 or the combinational MITO treatment of atovaquone and proguanil (Ato/Pro) with samples fixed and imaged 6 (A) and 24 (B) hours after drug additions. Top panels show histograms indicating the number of parasites across life cycle continuum. Compared to DMSO controls (topmost black histogram), both treatments demonstrated reduced parasite numbers after 24 hours. Shown are four drug/concentration treatment conditions: low-dose Ato/Pro (yellow), high-dose Ato/Pro (orange), low-dose KAE609 (light blue), and high-dose KAE609 (dark blue). Box plots below demonstrate life cycle classifications in the DMSO condition of images from merozoites (purple) to rings (yellow), trophozoites (green), and finishing with schizonts (dark green).

The improved information granularity was extended to test whether the model could identify drug-based morphological phenotypes (off-cycle) toward determination of MoA. Selecting the penultimate 32-dimensional layer of the semi-supervised model meant that, unlike the Angle-PCA model, outputs were not restricted to discrete on-cycle labels but instead represented both on- and off-cycle changes. This 32-dimensional representation is referred to as the morphology embedding.

Parasites were treated with 1 of 11 different compounds, targeting either PfATP4ase (ATP4) or mitochondria (MITO) and DMSO controls (table S1). The semi-supervised model was used to evaluate three conditions: random, where compound labels were shuffled; Angle-PCA, where the two PCA coordinates are used; and full embedding, where the 32-dimensional embedding was combined with the Angle-PCA. To add statistical support that enables compound level evaluation, a bootstrapping of the analysis was performed, sampling a subpopulation of parasites 100 times.
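The bootstrapping procedure amounts to repeated resampling and refitting so that each accuracy estimate carries a measure of its variability. The sketch below uses synthetic embeddings and a generic classifier purely to illustrate the mechanics, not the study's evaluation code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
X = rng.normal(size=(2000, 34))      # morphology embedding (32 dims) + Angle-PCA (2 dims)
y = rng.integers(0, 2, size=2000)    # 0 = ATP4-like, 1 = MITO-like (illustrative labels)

accuracies = []
for _ in range(100):                                     # 100 bootstrap resamples
    idx = rng.choice(len(X), size=500, replace=True)     # sampled subpopulation
    test = np.setdiff1d(np.arange(len(X)), idx)          # held-out parasites
    clf = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
    accuracies.append(clf.score(X[test], y[test]))

print(f"mean accuracy {np.mean(accuracies):.3f} +/- {np.std(accuracies):.3f}")
```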

As expected, the randomized labels led to low accuracy (Fig. 4A), serving as a baseline for the log odds (probability). When using the 2D Angle-PCA (on-cycle) information, there was a significant increase over random in the log odds ratio (Fig. 4A). This represents the upper-bound information limit for binary live/dead assays due to their insensitivity to parasite stages. When using the combined full embedding, there was a significant log odds ratio increase over both the random and Angle-PCA conditions (Fig. 4A). To validate that this improvement was not a consequence of having a larger dimensional space compared to the Angle-PCA, an equivalent embedding from the fully supervised model trained only on expert labels (and not on pseudolabels) demonstrated approximately the same accuracy and log odds ratio as Angle-PCA. Thus, our semi-supervised model can create an embedding sensitive to the phenotypic changes under distinct MoA compound treatment.

To better define drug effect on Plasmodium falciparum cultures, five mitochondrial (orange text) and five PfATP4ase (blue text) compounds were used; after a 24-hour incubation, images were collected and analyzed by the semi-supervised model. To test performance, various conditions were used (A). For random, images and drug names were scrambled, leading to the model incorrectly grouping compounds based on known MoA (B). Using life cycle stage definition (as with Fig. 3), the model generated improved grouping of compounds (C) versus random. Last, by combining the life cycle stage information with the penultimate layer (morphological information, before life cycle stage definition) of the model, it led to correct segregation of drugs based on their known MoA (D).

To better understand drug MoA, we evaluated how the various compounds were grouped together by the three approaches (random, Angle-PCA, and morphology embedding), performing a hierarchical linkage dendrogram (Fig. 4, B to D). The random approach shows that, as expected, the different compounds do not reveal MoA similarities. For the Angle-PCA output, the MITO inhibitors atovaquone and antimycin are grouped similarly, but the rest of the clusters are a mixture of compounds from the two MoA groups. Last, the morphology embedding gave rise to an accurate separation between the two groups of compounds having different MoA. One exception for grouping was atovaquone (when used alone), which was found to poorly cluster with either group (branching at the base of the dendrogram; Fig. 4D). This result is likely explained by the drug dosages used, as atovaquone is known to have a much enhanced potency when used in combination with proguanil (16).
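As an illustration of this grouping step (using random stand-in embeddings, not the study's data), per-compound profiles can be averaged and passed to a hierarchical linkage, so that compounds sharing a MoA appear as neighboring leaves of the dendrogram.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(3)
# Compound names taken from the article; the offsets fake a shared-MoA signal.
moa_offset = {"KAE609": 0.0, "atovaquone": 2.0, "proguanil": 2.0, "antimycin": 2.0}
per_compound = {name: rng.normal(loc=off, size=(100, 34))   # 32-dim morphology embedding
                for name, off in moa_offset.items()}        # + 2 Angle-PCA dimensions

profiles = np.vstack([emb.mean(axis=0) for emb in per_compound.values()])
Z = linkage(profiles, method="average", metric="euclidean")
tree = dendrogram(Z, labels=list(per_compound), no_plot=True)   # no_plot=False draws the tree
print(tree["ivl"])   # leaf order: compounds with a shared MoA end up adjacent
```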

The semi-supervised model was able to consistently cluster MITO inhibitors away from ATP4ase compounds in a dimensionality that suggested a common MoA. Our semi-supervised model can therefore successfully define drug efficacy in vitro and simultaneously assign a potential drug MoA from asynchronous (and heterogeneous) P. falciparum parasite cultures using an imaging-based screening assay with high-throughput capacity.

Driven by the need to accelerate novel antimalarial drug discovery with defined MoA from phenotypic screens, we applied ML to images of asynchronous P. falciparum cultures. This semi-supervised ensemble model could identify effective drugs and cluster them according to MoA, based on life cycle stage (on-cycle) and morphological outliers (off-cycle).

Recent image-based ML approaches have been applied to malaria cultures but have focused on automated diagnosis of gross parasite morphologies from either Giemsa- or Leishman-stained samples (17–19), rather than phenotypic screening for drug MoA. ML analyses of fluorescence microscopy images have reported malaria identification in patient-derived blood smears (20) and the use of nuclear- and mitochondrial-specific dyes for stage categorization and viability (21), although the algorithmic approach did not include deep learning. Previous unsupervised and semi-supervised ML approaches have been applied to identify phenotypic similarities in other biological systems, such as cancer cells (12, 22–24), but none have addressed the challenge of capturing the continuum of biology within the heterogeneity of control conditions. We therefore believe our study represents a key milestone in the use of high-resolution imaging data beyond diagnostics to predict the life cycle continuum of a cell type (coping with biological heterogeneity), as well as using this information to indicate drug-induced outliers and successfully group these toward drug MoA.

Through semi-supervised learning, only a small number of human-derived discrete but noisy labels from asynchronous control cultures were required for our DNN method to learn and distribute data as a continuous variable, with images following the correct developmental order. By reducing expert human input, which can lead to image identification bias (see note S2), this approach can control for interexpert disagreement and is more time efficient. This semi-supervised DNN therefore extends the classification parameters beyond human-based outputs, leading to finer information granularity learned from the data automatically through pseudolabels. This improved information, derived from high-resolution microscopy data, permits the inclusion of subtle but important features to distinguish parasite stages and phenotypes that would otherwise be unavailable.

Our single-model approach was trained on life cycle stages through embedding vectors, whose distribution allows identification of two readouts: on-cycle (sensitive to treatments that slow the life cycle or kill a specific parasite stage) and off-cycle (sensitive to treatments that cluster away from control distributions). We show that this approach with embeddings was sensitive to stage-specific effects at IC50 drug concentrations (Fig. 3), concentrations much lower than those used in standard screening assays. Drug-based outliers were grouped in a MoA-dependent manner (Fig. 4), with data from similar compounds grouped closer together than data from unrelated mechanisms.

The simplicity of fluorescence imaging means that this method could be applied to different subcellular parasite features, potentially improving discrimination of cultures treated with other compounds. In addition, imaging the sexual (gametocyte) parasite stages with and without compound treatments would address the increasing need for drugs that target multiple stages of the parasite life cycle (25). Current efforts to find drugs targeting the sexual stages of development are hampered by the challenge of defining MoA for a nonreplicating parasite life cycle stage (25). This demonstrates the potential power of assigning MoA from the outset of drug discovery, based simply on cell morphology.

In the future, we envisage that on-cycle effects could reveal the power of combination treatments (by distinguishing treatments targeting different life cycle stages) for a more complete therapy. Using off-cycle readouts, this approach could identify previously unrecognized combination treatments based on MoA. Because of the simplicity of sample preparation, this approach is also compatible with drug-resistant parasite lines.

New drugs against malaria are seen as a key component of the innovation required to bend the curve toward the disease's eradication, or risk a return to premillennium rates (3, 26). Seen in this light, application of ML-driven screens should enable rapid, large-scale screening and identification of drugs with concurrent determination of predicted MoA. Since ML-identified drugs will start from the advanced stage of a predicted MoA, they should bolster the much-needed development of new chemotherapeutics for the fight against malaria.

To generate parasite line 3D7/sfGFP, 3D7 ring stages were transfected with both plasmids pkiwi003 (p230p-bsfGFP) and pDC2-cam-co.Cas9-U6.2-hDHFR_P230p (50 µg each; fig. S3) following standard procedures (27) and selected on 4 nM WR99210 (WR) for 10 days. pDC2-cam-co.Cas9-U6.2-hDHFR_P230p encodes Cas9 and the guide RNA for the P230p locus. pkiwi003 comprises the repair sequence that integrates into the P230p locus after a successful double-strand break induced by Cas9. pkiwi003 (p230p-bsfGFP) was obtained by inserting two polymerase chain reaction (PCR) fragments, both encoding parts of P230p (PF3D7_0208900), consecutively into the pBluescript SK(−) vector with Xho I/Hind III and Not I/Sac I, respectively. sfGFP, together with the hsp70 (bip) 5′ untranslated region, was PCR-amplified from pkiwi002 and cloned into pkiwi003 with Hind III/Not I. pkiwi002 is based on pBSp230pDiCre (28), in which the FRB (binding domain of the FKBP12–rapamycin-associated protein) and Cre60 cassette (including promoter and terminator) was removed with Afe I/Spe I and the following linkers were inserted: L1_F cctttttgcccccagcgctatataactagtACAAAAAAGTATCAAG and L1_R CTTGATACTTTTTTGTactagttatatagcgctgggggcaaaaagg. In a second step, FKBP (the immunophilin FK506-binding protein) and Cre59 were removed with Nhe I/Pst I and replaced by sfGFP, which was PCR-amplified from pCK301 (29). pDC2-cam-co.Cas9-U6.2-hDHFR_P230p was obtained by inserting the guide RNA (AGGCTGATGAAGACATCGGG) into pDC2-cam-co.Cas9-U6.2-hDHFR (30) with Bbs I. Integration of pkiwi003 into the P230p locus was confirmed by PCR using primers #99 (ACCATCAACATTATCGTCAG), #98 (TCTTCATCAGCCTGGTAAC), and #56 (CATTTACACATAAATGTCACAC; fig. S3).

The transgenic 3D7/sfGFP P. falciparum asexual parasites were cultured at 37°C (under a gas mixture of 90% N2, 5% O2, and 5% CO2) in human O+ erythrocytes under standard conditions (31), with RPMI-Hepes medium supplemented with 0.5% AlbuMAX-II. Two independent stocks (culture 1 and culture 2; Fig. 1A) of 3D7/sfGFP parasites were maintained in culture and synchronized separately with 5% d-sorbitol on consecutive days to ensure acquisition of all stages of the asexual cycle on the day of sample preparation. Samples used for imaging derived from cultures harboring an approximate 1:1:1 ratio of rings, trophozoites, and schizonts, with a parasitaemia of around 10%.

Asexual cultures were diluted 50:50 in fresh media before 50 nM MitoTracker CMXRos (Thermo Fisher Scientific) was added for 20 min at 37°C. Samples were then fixed in phosphate-buffered saline (PBS) containing 4% formaldehyde and 0.25% glutaraldehyde and placed on a roller at room temperature, protected from light, for 20 min. The sample was then washed 3× in PBS before 10 nM DAPI and WGA (5 µg/ml) conjugated to Alexa Fluor 633 were added for 10 min, protected from light. The sample was then washed 1× in PBS and diluted 1:30 in PBS before pipetting 100 µl into each well of a CellVis (Mountain View, CA) 96-well plate.

Samples were imaged using a Nikon Ti-Eclipse widefield microscope and a Hamamatsu electron-multiplying charge-coupled device camera, with a 100× Plan Apo 1.4 numerical aperture (NA) oil objective lens (Nikon); the NIS-Elements JOBS software package (Nikon) was used to automate the plate-based imaging. The five channels [brightfield, DNA (DAPI), cytoplasm (sfGFP-labeled), mitochondria (MitoTracker or MITO), and RBC (WGA-633)] were collected serially at Nyquist sampling as a 6-µm z-stack, with fluorescent excitation from a CoolLED light source. To collect enough parasites per treatment, 32 fields of view (sites) were randomly generated and collected within each well, with treatments run in technical triplicate. Data were saved directly onto an external hard drive for short-term storage and processing (see below).

The 3D images were processed via a custom ImageJ macro and transformed into 2D maximum intensity projection images. Brightfield channels were also projected using the minimum intensity projection, as this was found to improve analysis of the food vacuole and of anomalies such as double infections. Converting each whole-site image into per-parasite embedding vectors was performed as previously described (12), with some modifications: the Otsu threshold was set to the minimum of the calculated threshold or 1.25× the foreground mean of the image, and centers closer than 100 pixels were pruned. Each channel image was separately fed as a grayscale image into the deep metric network for conversion into a 64-dimension embedding vector. The six embedding vectors (one from each fluorescent channel and both minimum and maximum projections of the brightfield channel) were concatenated to yield a final 384-dimension embedding vector.
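As a rough illustration of the embedding assembly just described, the snippet below shows how six 64-dimension per-channel embeddings could be concatenated into a single 384-dimension vector per parasite. The deep metric network itself is not reproduced; embed_fn and the channel names are placeholders for whatever model and staining layout are actually used.

```python
# Hedged sketch of the per-parasite embedding assembly; embed_fn stands in for the
# deep metric network (64-d output per grayscale image) and is not defined here.
import numpy as np

def project_stack(stack, mode="max"):
    """Collapse a (z, y, x) stack to 2D by maximum or minimum intensity projection."""
    return stack.max(axis=0) if mode == "max" else stack.min(axis=0)

def parasite_embedding(channels, embed_fn):
    """Concatenate six 64-d embeddings (four fluorescent channels plus max and min
    brightfield projections) into one 384-d vector for a single parasite patch."""
    parts = [
        embed_fn(project_stack(channels["dna"])),          # DAPI
        embed_fn(project_stack(channels["cytoplasm"])),    # sfGFP
        embed_fn(project_stack(channels["mito"])),         # MitoTracker
        embed_fn(project_stack(channels["rbc"])),          # WGA-633
        embed_fn(project_stack(channels["brightfield"], mode="max")),
        embed_fn(project_stack(channels["brightfield"], mode="min")),
    ]
    return np.concatenate(parts)  # shape: (384,)
```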

All labels were collected using the annotation tool originally built for collecting diabetic retinopathy labels (32). For each set of labels gathered, tiled images were stitched together to create a collage for all parasites to be labeled. These collages contained both stains in grayscale and color overlays to aid identification. Collages and a set of associated questions were uploaded to the annotation tool, and human experts (Imperial College London) provided labels (answers). In cases where multiple experts labeled the same image, a majority vote was used to determine the final label.
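Where multiple experts answered the same question, aggregation by majority vote is straightforward; the snippet below is a minimal sketch of that step (the tie-breaking behaviour is our assumption, not specified in the text).

```python
# Minimal majority-vote aggregation for one annotated collage image.
from collections import Counter

def majority_label(expert_labels):
    """Return the most frequent answer; ties fall to the first label counted."""
    return Counter(expert_labels).most_common(1)[0][0]

print(majority_label(["ring", "ring", "trophozoite"]))  # -> "ring"
```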

Initial labels for training classified parasites into 1 of 11 classes: merozoite, ring, trophozoite, schizont, cluster of merozoites, multiple infection, bad image, bad patch (region of interest) location, parasite debris, unknown parasite inside an RBC, or other. Subsequent labels were collected with parasite debris classified further into the following: small debris remnant, cluster of debris, and death inside an RBC (table S2). For training, the following labels were dropped: bad image, bad patch location, unknown parasite inside an RBC, unspecified parasite debris, and other. For these labels, five parasites were randomly sampled from each well of the experiments.

To validate the model performance, an additional 448 parasites were labeled by five experts. The parasites were selected from eight separate experimental plates using only control image data (DMSO only).

Last, paired labels were collected to validate the sort-order results. For these labels, the collage included two parasites, and experts identified which parasite was earlier in the life cycle or whether the parasites were too close to call. Here, data from the 448 parasite validation set were used, limited to cases where all experts agreed that the images were of a parasite inside an RBC. From this set, 24 parasites were selected, and all possible pairings of these 24 parasites were uploaded as questions (24 choose 2 = 276 questions uploaded). In addition, another 19 pairs were selected that were near the trophozoite/schizont boundary to enable angle resolution analysis.

To prepare the data for analysis, the patch embeddings were first joined with the ground truth labels for those patches that had labels. Six separate models were trained on embeddings to classify asexual life cycle stages and common anomalies such as multiple infection, cell death, and cellular debris. Each model was a two-layered (64 and 32 dimensions), fully connected (with ReLU nonlinearities) neural network. To create training data for the six models, the human-labeled examples were randomly split into four partitions, balanced for each life cycle category. Each model was then trained on one of the six possible pairs of the four partitions. Training was carried out with a batch size of 128 for 1000 steps using the Adam optimizer (33) with a learning rate of 2 × 10⁻⁴. Following this initial training, labels were predicted on all unlabeled data using all six models; for each class, 400 examples were selected with the highest mean probability (and at least a mean probability of 0.4) and with an SD of the probability less than 0.07 (which encompasses the majority of the predictions with labels). The training procedure was then repeated with the original human labels plus the predicted (pseudo-) labels to generate our final model. The logits are extracted from the trained model, and a subspace representing the normal life cycle stages is projected into 2D by PCA. The life cycle angle is computed as arctan(y/x), where x and y are the first and second coordinates of the projection, respectively.
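Two steps in this procedure lend themselves to a compact illustration: selecting confident pseudolabels from the six-model ensemble, and converting the 2D PCA projection of the logits into a life cycle angle. The sketch below follows the thresholds quoted above (mean probability of at least 0.4, SD below 0.07, up to 400 examples per class), but everything else, including the use of arctan2 to resolve quadrants, is our own assumption rather than the published implementation.

```python
# Hedged sketch of pseudolabel selection and life cycle angle computation.
import numpy as np

def select_pseudolabels(probs, per_class=400, min_mean=0.4, max_std=0.07):
    """probs: (n_models, n_examples, n_classes) ensemble probabilities.
    Returns, per class, the indices of up to `per_class` unlabeled examples with
    the highest mean probability, subject to the mean and SD thresholds above."""
    mean_p, std_p = probs.mean(axis=0), probs.std(axis=0)
    selected = {}
    for c in range(probs.shape[2]):
        ok = np.where((mean_p[:, c] >= min_mean) & (std_p[:, c] < max_std))[0]
        ranked = ok[np.argsort(mean_p[ok, c])[::-1]]  # most confident first
        selected[c] = ranked[:per_class]
    return selected

def life_cycle_angle(logits_2d):
    """logits_2d: (n_examples, 2) PCA projection; angle = arctan(y/x) per example.
    np.arctan2 is used here so each angle keeps its quadrant."""
    return np.arctan2(logits_2d[:, 1], logits_2d[:, 0])
```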

For each drug at a given dose and application duration, the evaluation of its effect is based on the histogram of the classified asexual life cycle stages and on finer-binned stages obtained from the estimated life cycle angle. A breakdown of labeled images for drug morphologies is given in table S3.
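As an illustration of this evaluation step, the sketch below bins estimated life cycle angles into a stage-occupancy histogram and compares a treated population against DMSO controls with a simple L1 distance; the bin count and the distance metric are our choices, not the paper's.

```python
# Illustrative comparison of treated vs. control life cycle distributions.
import numpy as np

def stage_histogram(angles, n_bins=16):
    """Bin life cycle angles into a normalised stage-occupancy histogram."""
    hist, _ = np.histogram(angles, bins=n_bins, range=(-np.pi, np.pi))
    return hist / max(hist.sum(), 1)

def treatment_shift(drug_angles, control_angles, n_bins=16):
    """L1 distance between treated and control histograms (larger = stronger effect)."""
    diff = stage_histogram(drug_angles, n_bins) - stage_histogram(control_angles, n_bins)
    return float(np.abs(diff).sum())
```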

WHO, World Malaria Report (Geneva, 2019).

J. Wang, Y. Song, T. Leung, C. Rosenberg, J. Wang, J. Philbin, B. Chen, Y. Wu, Learning Fine-Grained Image Similarity with Deep Ranking, in Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2014), pp. 1386–1393.

View original post here:
A machine learning approach to define antimalarial drug action from heterogeneous cell-based screens - Science Advances

Comprehensive Report on Artificial Intelligence and Machine Learning Market 2020 | Size, Growth, Demand, Opportunities & Forecast To 2026 |…

The Artificial Intelligence and Machine Learning Market research report is a new statistical data source added by A2Z Market Research.

The Artificial Intelligence and Machine Learning Market is growing at a high CAGR during the forecast period 2020-2026. The increasing interest of individuals in this industry is the major reason for the expansion of this market.

The Artificial Intelligence and Machine Learning Market research is an intelligence report compiled with meticulous effort to study accurate and valuable information. The data has been analyzed considering both the existing top players and upcoming competitors. Business strategies of the key players and of new market entrants are studied in detail. A well-explained SWOT analysis, revenue share, and contact information are included in this report.

Get Sample Copy (Including FULL TOC, Graphs and Tables) of this report @:

https://www.a2zmarketresearch.com/sample?reportId=321664

Note: In order to provide a more accurate market forecast, all our reports will be updated before delivery to account for the impact of COVID-19.

Top Key Players Profiled in this report are:

Google Inc., Microsoft Corporation, Amazon Web Services Inc., SAS Institute Inc., SAP SE, Baidu Inc., IBM Corporation, Hewlett Packard Enterprise Development LP (HPE), BigML Inc., Fair Isaac Corporation, Intel Corporation

The key questions answered in this report:

Various factors responsible for the market's growth trajectory are studied at length in the report. In addition, the report lists the restraints that pose a threat to the global Artificial Intelligence and Machine Learning market. It also gauges the bargaining power of suppliers and buyers, the threat from new entrants and product substitutes, and the degree of competition prevailing in the market. The influence of the latest government guidelines is also analyzed in detail. The report studies the market's trajectory between forecast periods.

Get up to 30% Discount on this Premium Report @:

https://www.a2zmarketresearch.com/discount?reportId=321664

The cost analysis of the Global Artificial Intelligence and Machine Learning Market has been performed while keeping in view manufacturing expenses, labor cost, and raw materials and their market concentration rate, suppliers, and price trend. Other factors such as Supply chain, downstream buyers, and sourcing strategy have been assessed to provide a complete and in-depth view of the market. Buyers of the report will also be exposed to a study on market positioning with factors such as target client, brand strategy, and price strategy taken into consideration.

Global Artificial Intelligence and Machine Learning Market Segmentation:

Market Segmentation by Type:

Deep Learning, Natural Language Processing, Machine Vision, Others

Market Segmentation by Application:

BFSI, Healthcare and Life Sciences, Retail, Telecommunication, Government and Defense, Manufacturing, Energy and Utilities, Others

The report provides insights on the following pointers:

Table of Contents

Global Artificial Intelligence and Machine Learning Market Research Report 2020–2026

Chapter 1 Artificial Intelligence and Machine Learning Market Overview

Chapter 2 Global Economic Impact on Industry

Chapter 3 Global Market Competition by Manufacturers

Chapter 4 Global Production, Revenue (Value) by Region

Chapter 5 Global Supply (Production), Consumption, Export, Import by Regions

Chapter 6 Global Production, Revenue (Value), Price Trend by Type

Chapter 7 Global Market Analysis by Application

Chapter 8 Manufacturing Cost Analysis

Chapter 9 Industrial Chain, Sourcing Strategy and Downstream Buyers

Chapter 10 Marketing Strategy Analysis, Distributors/Traders

Chapter 11 Market Effect Factors Analysis

Chapter 12 Global Artificial Intelligence and Machine Learning Market Forecast

Buy Exclusive Report @:

https://www.a2zmarketresearch.com/buy?reportId=321664

If you have any special requirements, please let us know and we will offer you the report as you want.

About A2Z Market Research:

The A2Z Market Research library provides syndicated reports from market researchers around the world. Ready-to-buy syndicated market research studies will help you find the most relevant business intelligence.

Our research analysts provide business insights and market research reports for large and small businesses.

The company helps clients build business policies and grow in their market area. A2Z Market Research offers not only industry reports dealing with telecommunications, healthcare, pharmaceuticals, financial services, energy, technology, real estate, logistics, F & B, media, etc., but also company data, country profiles, trends, and information and analysis on the sector of your interest.

Contact Us:

Roger Smith

1887 WHITNEY MESA DR HENDERSON, NV 89014


+1 775 237 4147

See original here:
Comprehensive Report on Artificial Intelligence and Machine Learning Market 2020 | Size, Growth, Demand, Opportunities & Forecast To 2026 |...

Machine Learning Market 2020 Will Emerge Globally And Grow upto 44.3% of CAGR by 2026 – The Daily Chronicle

The Global Machine Learning Market size is projected to reach USD 33.4 Bn by 2026 from USD 1.7 Bn in 2018, at a CAGR of 44.3% during the forecast period.

The Machine Learning Market Research Report helps market players improve their business plans and ensure long-term success. The extensive research study provides in-depth information on Global Innovations, New Business Techniques, SWOT Analysis with Key Players, Capital Investment, Technology Innovation, and Future Trends Outlook.

Browse the complete report and ToC: https://www.alltheresearch.com/report/285/Machine-Learning

The market research study covers historical data from previous years along with a forecast for upcoming years based on revenue (USD million). The Machine Learning Market reports also cover market dynamics, market overview, segmentation, market drivers, and restraints, together with the impact they have on Machine Learning demand over the forecast period. Moreover, the report also delivers a study of the opportunities available in the Machine Learning market globally. The Machine Learning market study and forecasts are provided at the worldwide and regional level.

If you are an investor/shareholder in the Machine Learning Market, the provided study will help you understand the growth model of the Machine Learning industry after the impact of COVID-19. Request a sample report (including ToC, tables, and figures with detailed information) @ https://www.alltheresearch.com/impactC19-request/285

The report assesses the key opportunities in the market and outlines the factors that are and will be driving the growth of the Machine Learning industry. Growth of the overall Machine Learning market has also been forecasted for the period 2019-2025, taking into consideration the previous growth patterns, the growth drivers and the current and future trends.

Market Segments and Sub-segments Covered in the Report are as per below:

Based on Product Type, the Machine Learning market is segmented into:

Based on Application, the Machine Learning market is segmented into:

The major players profiled in this report include:

An exclusive sample report on the Machine Learning Market is available at https://www.alltheresearch.com/sample-request/285

Regional Coverage of the Machine Learning Market:

Key Questions Answered in this Report:

What is the market size of the Machine Learning industry? This report covers the historical market size of the industry (2013-2019), and forecasts for 2020 and the next 5 years. Market size includes the total revenues of companies.

What is the outlook for the Machine Learning industry? This report has over a dozen market forecasts (2020 and the next 5 years) on the industry, including total sales, number of companies, attractive investment opportunities, operating expenses, and others.

What industry analysis/data exists for the Machine Learning industry? This report covers key segments and sub-segments, key drivers, restraints, opportunities, and challenges in the market and how they are expected to impact the Machine Learning industry. Take a look at the table of contents below to see the scope of analysis and data on the industry.

How many companies are in the Machine Learning industry? This report analyzes the historical and forecasted number of companies and locations in the industry, and breaks them down by company size over time. The report also provides company rank against competitors with respect to revenue, profit comparison, operational efficiency, cost competitiveness, and market capitalization.

What are the financial metrics for the industry? This report covers many financial metrics for the industry, including profitability, market value chain, and key trends impacting every node with reference to the company's growth, revenue, return on sales, etc.

What are the most important benchmarks for the Machine Learning industry? Some of the most important benchmarks for the industry include sales growth, productivity (revenue), operating expense breakdown, span of control, and organizational make-up, all of which you'll find in this market report.

Get a chance of a 20% extra discount if your company is listed in the above key players list: https://www.alltheresearch.com/speak-to-analyst/285

FOR ALL YOUR RESEARCH NEEDS, REACH OUT TO US AT:
Contact Name: Rohit B.
Phone: 1-888-691-6870

View original post here:
Machine Learning Market 2020 Will Emerge Globally And Grow upto 44.3% of CAGR by 2026 - The Daily Chronicle

From the Sky Above to the Sea Below – UCI News

The UCI researchers who probe the Earth and sky for answers to momentous questions about the environment, the oceans and the atmosphere have gotten smart about unlocking solutions. The key: They turn billions of pieces of data into insights by embracing artificial intelligence and machine learning.

AI, with its growing presence on campuses worldwide, has already transformed what computers can do for scientists and is now being touted as potentially revolutionizing academic research in the next few years.

"The excitement is palpable," says James Bullock, dean of the School of Physical Sciences and professor of physics & astronomy, describing the effect machine learning has had on basic physical science research. "There is a sense that we're about to experience a phase change in the way science is done."

Roughly half of the school's faculty are using or developing machine learning algorithms to drive new discoveries, he says, in areas that include climate change, particle physics, quantum simulation and materials science.

"Probably the most oversubscribed talk series in the School of Physical Sciences is our Machine Learning Nexus seminar, which brings together researchers from all of our subfields (astronomy, chemistry, climate science and physics) to share techniques and ideas in this domain," Bullock notes.

In addition to its uses in academic research, machine learning equips students with the expertise to pursue a variety of careers in well-established companies or in industrial research.

Shane Coffield, a Ph.D. candidate in Earth system science who was drawn to UCI because of its AI program, is collaborating with colleagues on predictive models for the intensity and spread of California wildfires. Photo by Steve Zylius/UCI.

Shane Coffield, a Ph.D. candidate in Earth system science, says that the university's machine learning program influenced his decision to choose UCI for his graduate education.

"It started a really fruitful collaboration," he says. "I get to work with experts who help me understand the best tools to apply to my science." Coffield and his UCI research partners received international media coverage for their wildfire predictions. The group is just one of several UCI Earth system science teams that have had studies utilizing machine learning tools published recently.

Another such team includes Associate Professor Michael Pritchard, who calls AI "a big game changer" in his field. He's a cloud guy, with office walls that sport posters of swirling white satellite images.

"I remember being enamored with clouds when I was 5 or 6 years old," he says. "My family moved around, and I went on a lot of international flights. I fought with my brothers because I always wanted to be next to the window to look at the clouds."

Pritchard worked with researchers from UCI, the Ludwig Maximilian University of Munich and Columbia University to develop deep machine learning tools that would factor clouds into climate models. Their research was published in September 2018 by the Proceedings of the National Academy of Sciences.

Clouds play a major role in the Earth's climate by transporting heat and moisture, reflecting and absorbing the sun's rays, trapping infrared heat rays, and producing precipitation, says Pritchard, a next-generation climate modeler.

He is particularly fond of stratocumulus clouds, the marine layer common in Southern California. "They're the beautiful clouds you see out your window if you fly from here to Hawaii. They're unbroken, ripply, a dazzling bright layer," he says.

But clouds frustrate climate scientists: It's difficult to factor them into predictive models because of their size and variability.

"They can be formed by eddies as small as a few hundred meters, much tinier than a standard climate model grid resolution of 50 to 100 kilometers, so simulating them appropriately takes an enormous amount of computer power and time," Pritchard says.

"It's hard to know if a warmer world will bring more low-lying clouds that shield Earth from the sun, cooling the planet, or fewer of them, warming it up," he says.

AI, which Earth system scientists find masterfully efficient, could make a difference. "If we try to simulate the whole planet's atmosphere, we're horsepower limited, because we have to simulate it for a hundred years," Pritchard says. "But with machine learning, we could speed that up; maybe we only have to simulate three months of atmosphere. Then we could really do justice to detailed cloud physics."

While Pritchard looks to the sky for his field of study, his colleague Adam Martiny, professor of Earth system science as well as ecology & evolutionary biology, focuses on the deep blue sea.

"AI offers a new way of doing science that's very exciting," says Martiny, noting that it can present the unexpected. His research has contradicted prevalent views about the effects of global warming on phytoplankton in tropical waters. The study was published in January in Nature Geoscience; his co-author was Earth system science professor Francois Primeau.

Martiny, the lead researcher, explains: "AI is a set of tools that are super useful when we're working with large amounts of data because it helps us see new patterns. In the past, based on our best theories, we had predicted that as the ocean heats, the plankton that lives there would become more stressed."

Instead, using AI, they were able to forecast that phytoplankton populations in low-latitude waters will expand by the end of the 21st century.

"We give the model tons and tons of data," Martiny says. "Artificial intelligence tools can help us challenge existing paradigms."

That's exactly what happened in this project. The researchers had a very large dataset describing the abundance of phytoplankton in various regions. "We asked what the relationship was between common environmental factors such as nutrients and temperature," Martiny says. "Much to our surprise, in the low-latitude regions and tropics, we saw a very significant positive relationship between temperature and abundance of these plankton."

It wasn't what they expected to find. "We were very puzzled," he says, because it essentially offered a very different outcome on how warming and stratification would affect these populations.

One possible explanation for the growth focuses on the life cycle of phytoplankton. "When plankton die, especially these small species, they sit around for a while longer," Martiny says. "And maybe at higher temperatures, living plankton can more easily degrade them and recycle the nutrients back to build new biomass."

What's next? "The AI and mechanistic models give such opposite results. I don't know which one is right," he says. "That's for the next round of research to figure out. But AI really opened a door."

The same is true for doctoral student Coffield and his collaborators, who have been using AI and machine learning to determine which wildfires will burn out of control. The technique they developed helps project the final size of a fire from the moment of ignition, thus allowing firefighters to more efficiently allocate scarce resources.

The researchers' analysis, which focused on Alaskan fires, is highlighted in a study published in September 2019 in the International Journal of Wildland Fire. Coffield worked with an interdisciplinary team and with co-author James Randerson, professor and Ralph J. & Carol M. Cicerone Chair in Earth System Science at UCI.

Alaska was used as the locale for the project because the state has been plagued over the past decade by a rash of concurrent fires in its boreal forests, threatening people and vulnerable ecosystems.

"In Alaska, there can be huge numbers of fires burning at the same time," explains lead author Coffield. "Our goal was to help fire managers predict the largest fires before they get out of control, kind of a triaging system."

"Machine learning is promising because we're living in a world where there's so much data," he says. "You dump in all the data you have, and it figures out the underlying patterns, whereas in the more traditional approach, you know the laws of physics, and you put in the rules and start from the bottom up."

Coffield and the UCI team are now focusing on California fires and forests. "We're building off some of the things we learned from the Alaska study," he says, "and are creating a more complex machine learning-based model for predicting fire spread in California."

"Machine learning is a really powerful tool," Coffield adds. "There's so much to be learned from it."

Continued here:
From the Sky Above to the Sea Below - UCI News