Julia and PyCaret Latest Versions, arXiv on Kaggle, UK’s AI Supercomputer And More In This Week’s Top AI News – Analytics India Magazine

Every week, we at Analytics India Magazine aggregate the most important news stories that affect the AI/ML industry. Let's take a look at all the top news stories that took place recently. The following paragraphs summarise the news, and you can click on the hyperlinks for the full coverage.

This was one of the biggest news stories of the week for all data scientists and ML enthusiasts. arXiv, the most comprehensive repository of research papers, recently announced that it is offering a free and open pipeline to its dataset, with all the relevant features such as article titles, authors, categories, abstracts, full-text PDFs, and more. With this machine-readable dataset of 1.7 million articles, the Kaggle community stands to benefit tremendously from the rich corpus of information.

The objective of the move is to promote developments in fields such as machine learning and artificial intelligence. arXiv hopes that Kaggle users can push the boundaries of innovation further using its knowledge base, and that Kaggle can be a new outlet for the research community to collaborate on machine learning innovation. arXiv has long functioned as a knowledge hub for the public and research communities by providing open access to scholarly articles.

The India Meteorological Department (IMD) is aiming to use artificial intelligence in weather forecasting. The use of AI here is particularly focused on issuing nowcasts, which can help in near real-time (3-6 hours) prediction of drastic weather episodes, Director-General Mrutunjay Mohapatra said last week. In this regard, IMD has invited research firms to evaluate how AI can add value to weather forecasting.

Weather forecasting has typically been done with physical models of the atmosphere, which are sensitive to perturbations and therefore become erroneous over longer periods. Since machine learning methods are more robust against perturbations, researchers have been investigating their application in weather forecasting to produce more precise predictions over substantial periods of time. Artificial intelligence helps in understanding past weather models, and this can make decision-making faster, Mohapatra said.

PyCaret, the open-source low-code machine learning library in Python, has released a new version, PyCaret 2.0. The design and simplicity of PyCaret is inspired by the emerging role of citizen data scientists: users who can perform both simple and moderately sophisticated analytical tasks that would previously have required more expertise.

The latest release aims to reduce the hypothesis-to-insights cycle time in an ML experiment and enables data scientists to perform end-to-end experiments quickly and efficiently. Major updates in the new release of PyCaret include a logging back-end, modular automation, a command line interface (CLI), GPU-enabled training, and parallel processing.
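
For readers who want to try the release, a minimal sketch of a PyCaret 2.0 workflow might look like the following, using one of the library's bundled sample datasets; exact defaults and behaviour can vary between releases.

```python
# A minimal sketch of a PyCaret 2.0-style workflow, assuming the library's
# bundled "juice" sample dataset; defaults may differ across releases.
from pycaret.datasets import get_data
from pycaret.classification import setup, compare_models, finalize_model, predict_model

data = get_data("juice")

# setup() infers data types and prepares the experiment; log_experiment uses
# the logging back-end introduced in 2.0.
exp = setup(data, target="Purchase", session_id=42,
            log_experiment=True, experiment_name="juice_demo")

best = compare_models()                   # train and rank several candidate models
final = finalize_model(best)              # refit the best model on the full dataset
preds = predict_model(final, data=data)   # score data with the finalized model
```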

Nokia, the global manufacturer of mobile devices and technology solutions, said it would set up a robotics lab at the Indian Institute of Science (IISc) to drive research on use cases for 5G and emerging technologies. The lab will be hosted by the Nokia Center of Excellence for Networked Robotics and serve as an interdisciplinary laboratory powering socially relevant use cases across areas like disaster and emergency management, farming, and manufacturing automation.

Apart from research activity, the lab will also promote engagement among ecosystem partners and startups in generating end-to-end use cases. This will also include Nokia student fellowships, which will be granted to select IISc students who engage in the advancement of innovative use cases.

Julia recently released its new version, Julia 1.5. The launch introduces many new features and performance enhancements, including struct layout and allocation optimisations, multithreading API stabilisation and improvements, per-module optimisation levels, latency improvements, making the Pkg protocol the default, automated rr-based bug reports, and more.

The release also brings some impressive algorithmic improvements for popular cases such as generating normally distributed double-precision floats.

In an important update relating to technology infrastructure, the Ministry of Electronics and Information Technology (MeitY) may soon launch a national policy framework for building data centres across India. Keeping in sync with the demands of India's burgeoning digital sector, the national data centre framework will make it easy for companies to establish the hardware necessary to support rising data workloads and ensure business continuity.

The data centre policy framework will focus on the usage of renewable power, state-level subsidies in electricity costs for data centres, and easing other regulations for companies. According to a report, the national framework will boost the data centre industry in India and facilitate a single-window clearance for approvals.

A new commission has been formed by Oxford University to advise world leaders on effective ways to use Artificial Intelligence (AI) and machine learning in public administration and governance.

The Oxford Commission on AI and Good Governance (OxCAIGG) will bring together academics, technology experts and policymakers to analyse the AI implementation and procurement challenges faced by governments around the world. Led by the Oxford Internet Institute, the Commission will make recommendations on how AI-related tools can be adapted and adopted by policymakers for good governance now and in the near future. Its report outlines four significant challenges relating to AI development and application that need to be overcome for AI to be put to work for good governance and leveraged as a force for good in government responses to the COVID-19 pandemic.

The University of Oxford has partnered with Atos to build the UK's AI-focused supercomputer. The AI supercomputer will be built on the Nvidia DGX SuperPOD architecture and comprise 63 nodes. The deal with Atos costs £5 million ($6.5 million) and is funded by the Engineering and Physical Sciences Research Council (EPSRC) and the Joint Academic Data Science Endeavor, a consortium of 20 universities and the Turing Institute.

Known as JADE2, the AI supercomputer aims to build on the success of the current JADE facility, a national resource in the United Kingdom that provides advanced GPU computing capabilities to AI and machine learning experts.


Vishal Chawla is a senior tech journalist at Analytics India Magazine and writes about AI, data analytics, cybersecurity, cloud computing, and blockchain. Vishal also hosts AIM's video podcast, Simulated Reality, featuring tech leaders, AI experts, and innovative startups of India. Reach out at vishal.chawla@analyticsindiamag.com


BMW, Red Hat, and Malong Share Insights on AI and Machine Learning During Transform 2020 – ENGINEERING.com

Denrie Caila Perez posted on August 07, 2020 | Executives from BMW, Red Hat and Malong discuss how AI is transforming manufacturing and retail.

(From left to right) Maribel Lopez of Lopez Research, Jered Floyd of Red Hat, Jimmy Nassif of BMW Group, and Matt Scott of Malong Technologies.

The VentureBeat Transform 2020 conference welcomed the likes of BMW Group's Jimmy Nassif, Red Hat's Jered Floyd, and Malong CEO Matt Scott, who shared their insights on the challenges of AI in their respective industries. Nassif, who deals primarily with robotics, and Scott, who works in retail, both agreed that edge computing and the Internet of Things (IoT) have become powerful in accelerating production while introducing new capabilities in operations. According to Nassif, BMW's car sales have already doubled over the past decade, reaching 2.5 million in 2019. With over 4,500 suppliers providing 203,000 unique parts, logistics problems are bound to occur. In addition, approximately 99 percent of orders are unique, which means there are over 100 end-customer options.

Thanks to platforms such as NVIDIA's Isaac, Jetson AGX Xavier, and DGX, BMW was able to come up with five navigation and manipulation robots that transport and manage parts around its warehouses. Two of the robots have already been deployed to four facilities in Germany. Using computer vision techniques, the robots are able to successfully identify parts, as well as people and potential obstacles. According to BMW, the algorithms are also constantly being optimized using NVIDIA's Omniverse simulator, which BMW engineers can access at any time from any of their global facilities.

In contrast, Malong uses machine learning in a totally different playing field: self-checkout stations in retail locations. Overhead cameras feed images of products as they pass the scanning bed to algorithms capable of detecting mis-scans. This includes mishaps such as occluded barcodes, products left in shopping carts, dissimilar products, and even ticket switching, which is when a product's barcode is switched with that of a cheaper product.

These algorithms also run on NVIDIA hardware and are trained with minimal supervision, allowing them to learn to identify products from various video feeds on their own. According to Scott, edge computing is particularly significant in this area because it is impractical to stream closed-circuit footage to the cloud. Not only that, but it enables easier scalability to thousands of stores in the long term.

"Making an AI system scalable is very different from making it run," he explained. "That's sometimes a mirage that happens when people are starting to play with these technologies."

Floyd also stressed how significant open platforms are when playing with AI and edge computing technology. "With open source, everyone can bring their best technologies forward. Everyone can come with the technologies they want to integrate and be able to immediately plug them into this enormous ecosystem of AI components and rapidly connect them to applications," he said.

Malong has been working with Open Data Hub, a platform that allows for end-to-end AI and is designed for engineers to conceptualize AI solutions without needing complicated and costly machine learning workflows. In fact, it's the very foundation of Red Hat's data science software development stack.

All three companies are looking forward to more innovation in applications and new technologies.

Visit VentureBeat's website for more information on Transform 2020. You can also watch the Transform 2020 sessions on demand here.

For more news and stories, check out how a machine learning system detects manufacturing defects using photos here.


Hypotenuse AI wants to take the strain out of copywriting for ecommerce – TechCrunch

Imagine buying a dress online because a piece of code sold you on its "flattering, feminine flair" or convinced you "romantic floral details" would "outline your figure with timeless style". The very same day your friend buys the same dress from the same website, but she's sold on a description of "vibrant tones, fresh cotton feel and statement sleeves".

This is not a detail from a sci-fi short story but the reality and big-picture vision of Hypotenuse AI, a YC-backed startup that's using computer vision and machine learning to automate product descriptions for ecommerce.

One of the two product descriptions shown below is written by a human copywriter. The other flowed from the virtual pen of the startup's AI, per an example on its website.

Can you guess which is which?* And if you think you can, well, does it matter?

Screengrab: Hypotenuse AI's website

Discussing his startup on the phone from Singapore, Hypotenuse AI's founder Joshua Wong tells us he came up with the idea to use AI to automate copywriting after helping a friend set up a website selling vegan soap.

"It took forever to write effective copy. We were extremely frustrated with the process when all we wanted to do was to sell products," he explains. "But we knew how much description and copy affect conversions and SEO so we couldn't abandon it."

Wong had been working for Amazon, as an applied machine learning scientist for its Alexa AI assistant. So he had the technical smarts to tackle the problem himself. "I decided to use my background in machine learning to kind of automate this process. And I wanted to make sure I could help other ecommerce stores do the same as well," he says, going on to leave his job at Amazon in June to go full time on Hypotenuse.

The core tech here, computer vision and natural language generation, is "extremely cutting edge", per Wong.

"What the technology looks like in the backend is that a lot of it is proprietary," he says. "We use computer vision to understand product images really well. And we use this together with any metadata that the product already has to generate a very human fluent type of description. We can do this really quickly: we can generate thousands of them within seconds."

"A lot of the work went into making sure we had machine learning models or neural network models that could speak very fluently in a very human-like manner. For that we have models that have kind of learnt how to understand and to write English really, really well. They've been trained on the Internet and all over the web so they understand language very well. Then we combine that together with our vision models so that we can generate very fluent description," he adds.
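
Hypotenuse has not published its architecture, but as a rough illustration of the general pattern Wong describes (a vision model feeding a text-generation model), a minimal sketch with an off-the-shelf image-captioning pipeline could look like this; the Hugging Face pipeline and model named below are public examples, not Hypotenuse's system.

```python
# Illustrative only: a generic vision-to-text pipeline, not Hypotenuse's stack.
from transformers import pipeline

# An open-source image-captioning model pairing a ViT encoder with a GPT-2 decoder.
captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")

result = captioner("product_photo.jpg")      # path or URL to a product image
print(result[0]["generated_text"])           # a short, fluent description of the image
```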

Image credit: Hypotenuse

Wong says the startup is building its own proprietary dataset to further help with training language models, with the aim of being able to generate something that's very specific to the image but also specific to the company's brand and writing style, so the output can be hyper-tailored to the customer's needs.

"We also have defaults of style, if they want text to be more narrative, or poetic, or luxurious, but the more interesting one is when companies want it to be tailored to their own type of branding of writing and style," he adds. "They usually provide us with some examples of descriptions that they already have and we used that and get our models to learn that type of language so it can write in that manner."

What Hypotenuse's AI is able to do (generate thousands of specifically detailed, appropriately styled product descriptions within seconds) has only been possible in very recent years, per Wong. Though he won't be drawn into laying out more architectural details, beyond saying the tech is a "completely neural network-based, natural language generation model".

"The product descriptions that we are doing now, the techniques, the data and the way that we're doing it, these techniques were not around just like over a year ago," he claims. "A lot of the companies that tried to do this over a year ago always used pre-written templates. Because, back then, when we tried to use neural network models or purely machine learning models they can go off course very quickly or they're not very good at producing language which is almost indistinguishable from human."

"Whereas now we see that people cannot even tell which was written by AI and which by human. And that wouldn't have been the case a year ago."

(See the above example again. Is A or B the robotic pen? The answer is at the foot of this post.)

Asked about competitors, Wong again draws a distinction between Hypotenuse's pure machine learning approach and others who relied on using templates to tackle this problem of copywriting or product descriptions.

"They've always used some form of templates or just joining together synonyms. And the problem is it's still very tedious to write templates. It makes the descriptions sound very unnatural or repetitive. And instead of helping conversions that actually hurts conversions and SEO," he argues. "Whereas for us we use a completely machine learning based model which has learnt how to understand language and produce text very fluently, to a human level."

There are now some pretty high-profile applications of AI that enable you to generate similar text to your input data, but Wong contends they're just not specific enough for a copywriting business purpose to represent a competitive threat to what he's building with Hypotenuse.

"A lot of these are still very generalized," he argues. "They're really great at doing a lot of things okay but for copywriting it's actually quite a nuanced space in that people want very specific things: it has to be specific to the brand, it has to be specific to the style of writing. Otherwise it doesn't make sense. It hurts conversions. It hurts SEO. So we don't worry much about competitors. We spent a lot of time and research into getting these nuances and details right so we're able to produce things that are exactly what customers want."

So what types of products doesn't Hypotenuse's AI work well for? Wong says it's a bit less relevant for certain product categories such as electronics. This is because the marketing focus there is on specs, rather than trying to evoke a mood or feeling to seal a sale. Beyond that he argues the tool has broad relevance for ecommerce. "What we're targeting it more at is things like furniture, things like fashion, apparel, things where you want to create a feeling in a user so they are convinced of why this product can help them," he adds.

The startup's SaaS offering as it is now (targeted at automating product descriptions for ecommerce sites and for copywriting shops) is actually a reconfiguration itself.

The initial idea was to build a digital personal shopper to personalize the ecommerce experience. But the team realized they were getting ahead of themselves. "We only started focusing on this two weeks ago but we've already started working with a number of ecommerce companies as well as piloting with a few copywriting companies," says Wong, discussing this initial pivot.

Building a digital personal shopper is still on the roadmap but he says they realized that a subset of creating all the necessary AI/CV components for the more complex digital shopper proposition was solving the copywriting issue. Hence dialling back to focus in on that.

"We realized that this alone was really such a huge pain-point that we really just wanted to focus on it and make sure we solve it really well for our customers," he adds.

For early adopter customers, the process right now involves a little light onboarding, typically a call to chat through what their workflow and writing style are like so Hypotenuse can prep its models. Wong says the training process then takes a few days, after which they plug in to it as software as a service.

Customers upload product images to Hypotenuse's platform or send metadata of existing products, getting corresponding descriptions back for download. The plan is to offer a more polished pipeline process for this in the future, such as by integrating with ecommerce platforms like Shopify.

Given the chaotic sprawl of Amazon's marketplace, where product descriptions can vary wildly from extensively detailed screeds to the hyper sparse and/or cryptic, there could be a sizeable opportunity to sell automated product descriptions back to Wong's former employer. And maybe even bag some strategic investment before then. However, Wong won't be drawn on whether or not Hypotenuse is fundraising right now.

On the possibility of bagging Amazon as a future customer, he'll only say "potentially in the long run that's possible".

Joshua Wong (Photo credit: Hypotenuse AI)

The more immediate priorities for the startup are expanding the range of copywriting its AI can offer to include additional formats such as advertising copy and even some listicle-style blog posts which can stand in as content marketing (unsophisticated stuff, along the lines of "10 things you can do at the beach", per Wong, or "10 great dresses for summer", etc.).

"Even as we want to go into blog posts we're still completely focused on the ecommerce space," he adds. "We won't go out to news articles or anything like that. We think that that is still something that cannot be fully automated yet."

Looking further ahead, he dangles the possibility of the AI enabling infinitely customizable marketing copy, meaning a website could parse a visitor's data footprint and generate dynamic product descriptions intended to appeal to that particular individual.

Crunch enough user data and maybe it could spot that a site visitor has a preference for vivid colors and likes to wear large hats; ergo, it could dial up relevant elements in product descriptions to better mesh with that person's tastes.

"We want to make the whole process of starting an ecommerce website super simple. So it's not just copywriting as well but all the different aspects of it," Wong goes on. "The key thing is we want to go towards personalization. Right now ecommerce customers are all seeing the same standard written content. One of the challenges there, it's hard because humans are writing it right now and you can only produce one type of copy, and if you want to test it for other kinds of users you need to write another one."

"Whereas for us, if we can do this process really well, and we are automating it, we can produce thousands of different kinds of description and copy for a website and every customer could see something different."

It's a disruptive vision for ecommerce (call it A/B testing on steroids) that is likely to either delight or terrify, depending on your view of current levels of platform personalization around content. That process can wrap users in particular bubbles of perspective, and some argue such filtering has impacted culture and politics by having a corrosive impact on the communal experiences and consensus which underpin the social contract. But the stakes with ecommerce copy aren't likely to be so high.

Still, once marketing text/copy no longer has a unit-specific production cost attached to it, and assuming ecommerce sites have access to enough user data to program tailored product descriptions, there's no real limit to the ways in which robotically generated words could be reconfigured in the pursuit of a quick sale.

"Even within a brand there is actually a factor we can tweak, which is how creative our model is," says Wong, when asked if there's any risk of the robot's copy ending up feeling formulaic. "Some of our brands have like 50 polo shirts and all of them are almost exactly the same, other than maybe slight differences in the color. We are able to produce very unique and very different types of descriptions for each of them when we cue up the creativity of our model."

"In a way it's sometimes even better than a human because humans tend to fall into very, very similar ways of writing. Whereas this, because it's learnt so much language over the web, it has a much wider range of tones and types of language that it can run through," he adds.

What about copywriting and ad creative jobs? Isn't Hypotenuse taking an axe to the very copywriting agencies his startup is hoping to woo as customers? Not so, argues Wong. "At the end of the day there are still editors. The AI helps them get to 95% of the way there. It helps them spark creativity when you produce the description, but that last step of making sure it is something that exactly the customer wants, that's usually still a final editor check," he says, advocating for the human in the AI loop. "It only helps to make things much faster for them. But we still make sure there's that last step of a human checking before they send it off."

"Seeing the way NLP [natural language processing] research has changed over the past few years it feels like we're really at an inception point," Wong adds. "One year ago a lot of the things that we are doing now was not even possible. And some of the things that we see are becoming possible today we didn't expect it for one or two years' time. So I think it could be, within the next few years, where we have models that are not just able to write language very well but you can almost speak to it and give it some information and it can generate these things on the go."

*Per Wong, Hypotenuse's robot is responsible for generating description A. Full marks if you could spot the AI's tonal pitfalls.


Machine learning in rare disease: is the future here? – PharmaLive

By Alex Garner, Chief Product Officer, Raremark

The healthcare industry is increasingly focusing on niche patient populations. Around half of FDA approvals in the past two years were for rare or orphan drugs that serve fewer than 200,000 patients in total in the US and 1 in 2,000 patients in Europe. By 2024, orphan drug sales are expected to capture one-fifth of worldwide prescription sales.

However, finding these hard-to-reach patients is difficult and keeping them engaged over time even more so. Could machine learning platforms that deliver personalized experiences for patients and caregivers be part of the answer? Patient insight over time can help brands to understand niche patient populations, informing launch strategies, which in rare conditions can feel like launching in the dark or based on conversations with just a few people.

Chances are that at some point in the last few hours you'll have used an application powered by machine learning in some form or other. Netflix, Facebook, Google and Siri all use machine learning to personalize how we experience their service. Machine learning is essentially feeding a computer lots of information for it to then find and act on patterns in the data. For example, Facebook's machine learning algorithm analyzes how each user interacts with content on the platform and then, based on that, decides what content users should see next, making my Facebook feed look very different to yours.

Building a better road to diagnosis using machine learning

For healthcare, one benefit of machine learning lies in the ability to process enormous data sets and reliably find trends or insights that can improve, and potentially disrupt, the level of care patients are currently receiving. For example, Microsoft is working on a way to automatically distinguish tumors from healthy tissue in radiological imagery. Other innovators are building prediction models to identify patients who could be at high risk of sepsis or heart failure, and some are even developing facial recognition apps that help detect genetic disorders. All of these are very much a work in progress, and we are only just scratching the surface of machine learning in healthcare. One aspect we can be sure of, though, is that machine learning relies heavily on big data sets, something not readily available in rare disease.

A huge challenge for patients with rare diseases is getting an accurate diagnosis. Patients may typically wait eight years to get one, usually down to a lack of knowledge and awareness of their disease among healthcare providers. There are around 7,000 rare diseases with small, globally dispersed populations, and detailed medical literature and research on each of these diseases is often scarce.

There are some exciting developments happening in the rare disease space where innovators are using machine learning to try and improve diagnosis journeys.

Volv, a Swiss digital health and life sciences company, has made some great strides in this area. Its prediction model can diagnose patients with a rare disease with 97% accuracy using medical health records. Volv feeds information around symptoms, patient journeys, instances of misdiagnosis, clinical decision-making and other clinical data points into its model to help it learn about a particular rare disease. Then they give it access to a huge dataset of anonymous medical records, which it analyzes to find patients at risk of a particular rare disease. The company recently shared a case study of its model in action: it found a whole new cohort of patients at risk of a rare disease who did not have it as a diagnosis on their medical record. The anonymized patients found were being treated for other conditions. This enlightened approach could dramatically accelerate the diagnostic journey for many rare conditions.

Another area where machine learning can be used is medical imaging. The award-winning breast cancer screening AI Mia, built by the British med-tech company Kheiron, uses novel deep learning methods and radiology insights to find malignancies in mammograms. Kheiron was recently granted a UK government grant to help determine the best use of Mia to increase the automation of breast screening services. Boston-based biotech FDNA is also focusing on medical imagery to improve diagnosis. Its Face2Gene tool helps researchers analyze patient faces to determine whether they have a genetic disorder. In fact, it's already being used in multiple studies investigating rare diseases, such as a 2020 study looking at Mucolipidosis type IV (ML-IV), a rare autosomal recessive lysosomal storage disease, where researchers want to see whether patients with this disease share identifiable facial features not yet described in medical literature.

How is machine learning helping rare disease patients at the moment?

Alex Garner

A new wave of online patient platforms has emerged in the past decade, aimed at bringing patients together in one place to share experiences and learn from the wisdom of the crowd. Some of these platforms are researching and applying machine learning techniques to enhance the experience of users. Our platform, Raremark, is one of them. It's the world's largest patient experience network in rare disease. Our platform makes the right information available to patients at the right time. We understand that patients who have just been diagnosed have different needs and questions than someone who has been living with the condition for many years.

Raremark is continuing to research and develop new ways to match members to content and opportunities to share their lived experiences through a novel combination of machine learning techniques and behavioural science models. We believe an effective matching algorithm for online health resources must recognise that each member will have a different set of personal characteristics. These characteristics will determine how they confront the realities of living with, or caring for someone with, a rare condition. Using these technologies to learn and automate when to recommend the right type of experiences to read or contribute on the platform helps our members build a valuable knowledge base about their disease.

We have learnt that the best way to find and engage with people affected by a rare disease is by first understanding their digital journeys and starting conversations on those channels. Once a relationship has been established, we invite them to become a Raremark member, where we begin to build their trust by listening and responding to their needs through our personalized content recommendation system. We can then go a step further and begin to study user behavior to gain insight into areas like the motivations behind taking part in research and clinical studies, or the reasons for treatment non-adherence in certain rare diseases. We keep our intentions clear and transparent: our members know that, with their explicit consent, we share certain member experiences and survey results with researchers and companies studying their disease to advance the field further.

Rare disease and the machine learning frontier

We still have a long way to go before the full potential of machine learning and AI is realized, and it's important not to overestimate their capabilities; we are still only scratching the surface. In rare disease, a challenge for all of these models is the small data sets that come with small patient populations, as well as the format of rare disease research, where insights are hidden in dense literature. Advances are already being made to extract key information from reams of text.

Despite machine learning and artificial intelligence still being in their infancy, every day we're seeing new and exciting innovations happening in health, and with every new project or setback, we're getting closer to making true artificial intelligence a reality. It's an exciting road ahead.

About the author

Alex Garner, chief product officer, is responsible for the upkeep and future development of Raremark's digital real estate. He discovered a passion for building health-tech products from over five years of implementing and designing digital applications for the NHS. Along with a master's degree in business management and innovation, Alex is a firm believer in the principles of user-centric design and constant learning.


COVID-19 Update: Global Data Science and Machine Learning Service Market is Expected to Grow at a Healthy CAGR with Top players: DataScience.com, ZS,…

The latest Data Science and Machine Learning Service market report estimates the opportunities and current market scenario, providing insights and updates about the corresponding segments involved in the global Data Science and Machine Learning Service market for the forecast period of 2020-2026. The report provides a detailed assessment of key market dynamics and comprehensive information about the structure of the Data Science and Machine Learning Service industry. This market study contains exclusive insights into how the global Data Science and Machine Learning Service market is predicted to grow during the forecast period.

The primary objective of the Data Science and Machine Learning Service market report is to provide insights regarding opportunities in the market that are supporting the transformation of global businesses associated with Data Science and Machine Learning Service. This report also provides an estimation of the Data Science and Machine Learning Service market size and corresponding revenue forecasts carried out in terms of US$. It also offers actionable insights based on future trends in the Data Science and Machine Learning Service market. Furthermore, new and emerging players in the global Data Science and Machine Learning Service market can make use of the information presented in the study for effective business decisions, which will provide momentum to their businesses as well as the global Data Science and Machine Learning Service market.

An exclusive sample copy of the Data Science and Machine Learning Service Market report is available at https://inforgrowth.com/sample-request/6317405/data-science-and-machine-learning-service-market

The study is relevant for manufacturers, suppliers, distributors, and investors in the Data Science and Machine Learning Service market. All stakeholders in the Data Science and Machine Learning Service market, as well as industry experts, researchers, journalists, and business researchers, can draw on the information and data represented in the report.

Data Science and Machine Learning Service Market 2020-2026: Segmentation

The Data Science and Machine Learning Service market report covers major market players like DataScience.com, ZS, LatentView Analytics, Mango Solutions, Microsoft, International Business Machines, Amazon Web Services, Google, BigML, FICO, Hewlett-Packard Enterprise Development, AT&T

Data Science and Machine Learning Service Market is segmented as below:

By Product Type: Consulting, Management Soluti

Breakup by Application: Banking, Insurance, Retail, Media & Entertainment, Other

Impact of COVID-19: The Data Science and Machine Learning Service Market report analyses the impact of Coronavirus (COVID-19) on the Data Science and Machine Learning Service industry. Since the COVID-19 outbreak in December 2019, the disease has spread to 180+ countries around the globe, with the World Health Organization declaring it a public health emergency. The global impacts of the coronavirus disease 2019 (COVID-19) are already starting to be felt, and will significantly affect the Data Science and Machine Learning Service market in 2020.

The outbreak of COVID-19 has had effects on many aspects, like flight cancellations; travel bans and quarantines; restaurant closures; restrictions on all indoor events; emergencies declared in many countries; massive slowing of the supply chain; stock market unpredictability; falling business confidence; growing panic among the population; and uncertainty about the future.

COVID-19 can affect the global economy in 3 main ways: by directly affecting production and demand, by creating supply chain and market disturbance, and by its financial impact on firms and financial markets.

Download the Sample ToC to understand the COVID-19 impact and be smart in redefining business strategies: https://inforgrowth.com/CovidImpact-Request/6317405/data-science-and-machine-learning-service-market

Global Data Science and Machine Learning Service Market Report Answers the Following Queries:

To know about the global trends impacting the future of market research, contact us at: https://inforgrowth.com/enquiry/6317405/data-science-and-machine-learning-service-market

Key Questions Answered in this Report:

What is the market size of the Data Science and Machine Learning Service industry? This report covers the historical market size of the industry (2013-2019), and forecasts for 2020 and the next 5 years. Market size includes the total revenues of companies.

What is the outlook for the Data Science and Machine Learning Service industry? This report has over a dozen market forecasts (2020 and the next 5 years) on the industry, including total sales, number of companies, attractive investment opportunities, operating expenses, and others.

What industry analysis/data exists for the Data Science and Machine Learning Service industry? This report covers key segments and sub-segments, key drivers, restraints, opportunities and challenges in the market, and how they are expected to impact the Data Science and Machine Learning Service industry. Take a look at the table of contents below to see the scope of analysis and data on the industry.

How many companies are in the Data Science and Machine Learning Service industry? This report analyzes the historical and forecasted number of companies and locations in the industry, and breaks them down by company size over time. The report also provides company rank against competitors with respect to revenue, profit comparison, operational efficiency, cost competitiveness, and market capitalization.

What are the financial metrics for the industry? This report covers many financial metrics for the industry, including profitability, market value chain and key trends impacting every node with reference to company growth, revenue, return on sales, etc.

What are the most important benchmarks for the Data Science and Machine Learning Service industry? Some of the most important benchmarks for the industry include sales growth, productivity (revenue), operating expense breakdown, span of control, and organizational make-up, all of which you'll find in this market report.

Get a Special Discount of UP TO 50% for this Report: https://inforgrowth.com/discount/6317405/data-science-and-machine-learning-service-market

FOR ALL YOUR RESEARCH NEEDS, REACH OUT TO US AT: Address: 6400 Village Pkwy, Suite #104, Dublin, CA 94568, USA | Contact Name: Rohan S. | Email: [emailprotected] | Phone: US: +1-909-329-2808, UK: +44 (203) 743 1898


State of the Art in Automated Machine Learning – InfoQ.com

Key Takeaways

In recent years, machine learning has been very successful in solving a wide range of problems.

In particular, neural networks have reached human, and sometimes super-human, levels of ability in tasks such as language translation, object recognition, game playing, and even driving cars.

With this growth in capability has come a growth in complexity. Data scientists and machine learning engineers must perform feature engineering, design model architectures, and optimize hyperparameters.

Since the purpose of the machine learning is to automate a task normally done by humans, naturally the next step is to automate the tasks of data scientists and engineers.

This area of research is called automated machine learning, or AutoML.

There have been many exciting developments in AutoML recently, and it's important to take a look at the current state of the art and learn about what's happening now and what's coming up in the future.

InfoQ reached out to the following subject matter experts in the industry to discuss the current state and future trends in AutoML space.

InfoQ: What is AutoML and why is it important?

Francesca Lazzeri: AutoML is the process of automating the time-consuming, iterative tasks of machine learning model development, including model selection and hyperparameter tuning. When automated systems are used, the high costs of running a single experiment (e.g. training a deep neural network) and the high sample complexity (i.e. large number of experiments required) can be decreased. AutoML is important because data scientists, analysts, and developers across industries can leverage it to:

Matthew Tovbin: Similarly to how we use software to automate repetitive or complex processes, automated machine learning is a set of techniques we apply to efficiently build predictive models without manual effort. Such techniques include methods for data processing, feature engineering, model evaluation, and model serving. With AutoML, we can focus on higher-level objectives such as answering questions and delivering business value faster while avoiding mundane tasks, e.g., data wrangling, by standardizing the methods we apply.

Adrian de Wynter: AutoML is the idea that the machine learning process, from data selection to modeling, can be automated by a series of algorithms and heuristics. In its most extreme version, AutoML is a fully automated system: you give it data, and it returns a model (or models) that generalizes to unseen data. The common hurdles that modelers face, such as tuning hyperparameters, feature selection, and even architecture selection, are handled by a series of algorithms and heuristics.

I think its importance stems from the fact that a computer does precisely what you want it to do, and it is fantastic at repetition. The large majority of the hurdles I mentioned above are precisely that: repetition. Finding a hyperparameter set that works for a problem is arduous. Finding a hyperparameter set and an architecture that works for a problem is even harder. Add to the mix data preprocessing, the time spent on debugging code, and trying to get the right environment to work, and you start wondering whether computers are actually helping you solve said problem, or just getting in the way. Then, you have a new problem, and you have to start all over again.

The key insight of AutoML is that you might be able to get away with using some things you tried out before (i.e., your prior knowledge) to speed up your modeling process. It turns out that said process is effectively an algorithm, and thus it can be written into a computer program for automation.

Leah McGuire: AutoML is machine learning experts automating themselves. Creating quality models is a complex, time-consuming process. It requires understanding the dataset and question to be answered. This understanding is then used to collect and join the needed data, select features to use, clean the data and features, transform the features into values that can be used by a model, select an appropriate model type for the question, and tune feature-engineering and model parameters. AutoML uses algorithms based on machine learning best practices to build high-quality models without time-intensive work from an expert.

AutoML is important because it makes it possible to create high quality models with less time and expertise. Companies, non-profits, and government agencies all collect vast amounts of data; in order for this data to be utilized, it needs to be synthesized to answer pertinent questions. Machine learning is an effective way of synthesizing data to answer relevant questions, particularly if you do not have the resources to employ analysts to spend huge amounts of time looking at the data. However, machine learning requires both expertise and time to implement. AutoML seeks to decrease these barriers. This means that more data can be analyzed and used to make decisions.

Marios Michailidis: Broadly speaking, I would call it the process of automatically deriving or extracting useful information from data via harnessing the power of machines. Digital data is being produced at an incredible pace. Now that companies have found ways to harness it to extract value, it has become imperative to invest in data science and machine learning. However, the supply of data scientists is not enough to meet current needs, hence making existing data scientists more productive is of the essence. This is where the notion of automated machine learning can provide the most value, via equipping existing data scientists with tools and processes that can make their work easier, quicker, and generally more efficient.

InfoQ: What parts of the ML process can be automated and what are some parts unlikely to be automated?

Lazzeri: With Automated ML, the following tasks can be automated:

However, there are a few important tasks that cannot be automated during the model development cycle, such as developing industry-specific knowledge and data acumen, which are hard to automate, and it is impossible not to keep humans in the loop. Another important aspect to consider is operationalizing machine learning models: AutoML is very useful for the machine learning model development cycle; however, for the automation of the deployment step, other tools need to be used, such as MLOps, which enables data science and IT teams to collaborate and increase the pace of model development and deployment via monitoring, validation, and governance of machine learning models.

Tovbin: Through the years of development of the machine learning domain, we have seen that a large number of tasks around data manipulation, feature engineering, feature selection, model evaluation, and hyperparameter tuning can be defined as optimization problems and, with enough computing power, efficiently automated. We can see numerous proofs of that not only in research but also in the software industry, in platform offerings and open-source libraries. All these tools use predefined methods for data processing, model training, and evaluation.

The creative approach to framing problems and applying new techniques to existing problems is the one that is not likely to be replicated by machine automation, due to a large number of possible permutations, complex context, and expertise the machine lacks. As an example, look at the design of neural net architectures and their applications, a problem where the search space is so ample, where the progress is still mostly human-driven.

de Wynter: In theory, the entire ML process is computationally hard. From fitting data to, say, a neural network, to hyperparameter selection, to neural architecture search (NAS), these are all hard problems in the general case. However, all of these components have been automated with varying degrees of success for specific problems thanks to a combination of algorithmic advances, computational power, and patience.

I would like to think that the data preprocessing step and feature selection processes are the hardest to automate, given that a machine learning model will only learn what it has seen, and its performance (and hence the solution provided by the system) is dependent on its input. That said, there is a growing body of research on that aspect, too, and I hope that it will not remain hard for many natural problems.

McGuire: I would break the process of creating a machine learning model into four main components: data ETL and cleaning, feature engineering, model selection and tuning, and model explanation and evaluation.

Data cleaning can be relatively straightforward or incredibly challenging, depending on your data set. One of the most important factors is history; if you have information about your data at every point in time, data cleaning can be automated quite well. If you have only a static representation of current state, cleaning becomes much more challenging. Older data systems designed before relatively cheap storage tend to keep only the current state of information. This means that many important datasets do not have a history of actions taken on the data. Cleaning this type of history-less data has been a challenge for AutoML in providing good-quality models for our customers.

Feature engineering is, again, a combination of easy and extremely difficult to automate steps. Some types of feature engineering are easy to automate given sufficient metadata about particular features. For example, parsing a phone number to validate it and extract the location from the area code is straightforward as long as you know that a particular string is a phone number. However, feature engineering that requires intimate, domain-specific knowledge of how a business works is unlikely to be automated. For example, if profits from a sale need to account for local taxes before being analyzed for cost-to-serve, some human input is likely required to establish this relationship (unless you have a massive amount of data to learn from). One reason deep learning has overtaken feature engineering in fields like vision and speech is the massive amount of high-quality training data. Tabular data is often quite source-specific, making it difficult to generalize, and feature engineering remains a challenge. In addition, defining the correct way to combine sources of data is often incredibly complex and labor intensive. Once you have the relationship defined, the combination can be automated, but establishing this relationship takes a fair amount of manual work and is unlikely to be automated any time soon.
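
As a hedged illustration of the metadata-driven featurization McGuire mentions, a sketch like the one below uses the open-source phonenumbers package to turn a raw phone-number string into validity and coarse-location features; the feature names are made up for the example.

```python
# Sketch only: derive simple features from a string once metadata tells us it
# is a phone number. Uses the open-source `phonenumbers` package.
import phonenumbers
from phonenumbers import geocoder

def phone_features(raw: str, default_region: str = "US") -> dict:
    try:
        parsed = phonenumbers.parse(raw, default_region)
    except phonenumbers.NumberParseException:
        return {"phone_is_valid": False, "phone_region": None}
    return {
        "phone_is_valid": phonenumbers.is_valid_number(parsed),
        # Coarse location inferred from the area code / country prefix.
        "phone_region": geocoder.description_for_number(parsed, "en") or None,
    }

print(phone_features("(415) 555-2671"))   # validity flag plus an area-code-derived location
```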

Model selection and tuning is the easiest component to automate, and many libraries already do this; there are even AutoML algorithms that find entirely new deep learning architectures. However, model selection and tuning libraries assume that the data you are using for modeling is clean and that you have a good way of evaluating the efficacy of your model. Massive data sets also help. Establishing clean datasets and evaluation frameworks still remain the biggest challenges.
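
As a minimal sketch of the automated model selection and tuning described above, assuming a clean tabular dataset and a well-defined metric, a randomized hyperparameter search with scikit-learn might look like this:

```python
# Minimal sketch of automated hyperparameter tuning on a clean dataset with a
# well-defined evaluation metric (here, ROC AUC on a bundled toy dataset).
from scipy.stats import randint
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={
        "n_estimators": randint(50, 400),
        "max_depth": randint(2, 16),
        "min_samples_leaf": randint(1, 10),
    },
    n_iter=25, scoring="roc_auc", cv=5, random_state=0,
)
search.fit(X_train, y_train)                  # explores the search space automatically
print("best params:", search.best_params_)
print("held-out AUC:", search.score(X_test, y_test))
```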

Model explanations have been an important area of research for machine learning in general. While not strictly speaking part of AutoML, the growth of AutoML makes them even more important. It is also the case that the way in which you implement automation has implications for explainability. Specifically, tracking metadata about what was tried and selected determines how deep explanations can go. Building explanations into AutoML requires a conscious effort and is very important. At some point the automation has to stop and someone will look at and use the result. The more information the model provides about how it works, the more useful it is to the end consumer.

Michailidis: I would divide the areas where automation can be applied into the following main areas:

Regarding problems which are hard to automate, the first thing that pops into my mind is anything related to translating the business problem into a machine learning problem. For AutoML to succeed, it would require mapping the business problem into a type of solvable machine learning problem. It will also need to be supported by the right data quality/relevancy. The testing of the model and the success criteria need to be defined carefully by the data scientist.

Another area where it will be hard for AutoML to succeed is when ethical dilemmas may arise from the use of machine learning. For example, if there is an accident involved due to an algorithmic error, who will be responsible? I feel this kind of situation can be a challenge for AutoML.

InfoQ: What type of problems or use cases are better candidates to use AutoML?

Lazzeri: Classification, regression, and time series forecasting are the best candidates for AutoML. Azure Machine Learning offers featurizations specifically for these tasks, such as deep neural network text featurizers for classification.

Common classification examples include fraud detection, handwriting recognition, and object detection. Unlike classification, where predicted output values are categorical, regression models predict numerical output values based on independent predictors, for example, automobile price based on features like gas mileage, safety rating, etc.

Finally, building forecasts is an integral part of any business, whether it's revenue, inventory, sales, or customer demand. Data scientists can use automated ML to combine techniques and approaches and get a recommended, high-quality time series forecast.

Tovbin: Classification or regression problems relying on structured or semi-structured data, where one can define an evaluation metric, can usually be automated. For example, predicting user churn, real estate price prediction, autocomplete.

de Wynter: It depends. Let us assume that you want the standard goal of machine learning: you need to learn an unseen probability distribution from samples. You also know that there is some AutoML system that does an excellent job for various, somewhat related tasks. There's absolutely no reason why you shouldn't automate it, especially if you don't have the time to be trying out possible solutions by yourself.

I do need to point out, however, that in theory a model that performs well for a specific problem does not have any guarantees around other problems; in fact, it is well-known that there exists at least one task where it will fail. Still, this statement is quite general and can be worked around in practice.

On the other hand, from an efficiency point of view, a problem that has been studied for years by many researchers might not be a great candidate, unless you are particularly interested in marginal improvements. This follows immediately from the fact that most AutoML results, and more concretely, NAS results, for well-known problems usually are equivalent within a small delta to the human-designed solutions. However, making the problem "interesting" (e.g., by including newer constraints such as parameter size) makes it effectively a new problem, and again perfect for AutoML.

McGuire: If you have a clean dataset that has a very well defined evaluation method, it is a good candidate for AutoML. Early advances in AutoML have focused on areas such as hyperparameter tuning. This is a well defined but time consuming problem. These AutoML solutions are essentially taking advantage of increases in computational power combined with models of the problem space to arrive at solutions that are often better than an expert could achieve with less human time input. The key here is the clean dataset with a direct and easily measurable effect on the well defined evaluation set. AutoML will maximize your evaluation criteria very well. However, if there is any mismatch between that criteria and what you are trying to do, or any confounding factors in the data, AutoML will not see that in the way a human expert (hopefully) would.

Michailidis: Well-defined problems are good use cases for AutoML. In these problems, the preparatory work has already been done. There are clear inputs and outputs and well-defined success criteria. Under these constraints, AutoML can produce the best results.

InfoQ: What are some important research problems in AutoML?

Lazzeri: An interesting open research question in AutoML is the problem of feature selection in supervised learning tasks. This is also called differentiable feature selection, a gradient-based approach to searching over features. Feature selection remains a crucial step in machine learning pipelines and continues to see active research: a few researchers from Microsoft Research are developing a feature selection method that is both statistically and computationally efficient.
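
The specific method Lazzeri refers to is not detailed here, but a toy sketch of the gradient-based idea, learning a soft gate per feature jointly with a simple model, might look like the following (PyTorch, synthetic data, all names hypothetical):

```python
# Toy sketch of "differentiable" feature selection: each feature gets a
# learnable gate in [0, 1], trained jointly with a linear model; an L1 penalty
# on the gates pushes unhelpful features toward 0.
import torch
import torch.nn as nn

torch.manual_seed(0)
n_samples, n_features = 512, 20
X = torch.randn(n_samples, n_features)
# Only the first 3 features actually matter in this synthetic task.
y = (X[:, 0] - 2 * X[:, 1] + 0.5 * X[:, 2] + 0.1 * torch.randn(n_samples)).unsqueeze(1)

gate_logits = nn.Parameter(torch.zeros(n_features))   # learnable feature gates
model = nn.Linear(n_features, 1)
opt = torch.optim.Adam(list(model.parameters()) + [gate_logits], lr=0.05)

for step in range(500):
    gates = torch.sigmoid(gate_logits)                 # soft 0/1 mask per feature
    pred = model(X * gates)
    loss = nn.functional.mse_loss(pred, y) + 0.05 * gates.abs().sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

selected = (torch.sigmoid(gate_logits) > 0.5).nonzero().flatten().tolist()
print("selected feature indices:", selected)           # ideally close to [0, 1, 2]
```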

Tovbin: The two significant ones that come to my mind are the transparency and bias of trained models.

Both experts and users often disagree or do not understand why ML systems, especially automated ones, make specific predictions. It is crucial to provide deeper insights into model predictions to allow users to gain confidence in such predictive systems. For example, when providing recommendations of products to consumers, a system can additionally highlight the contributing factors that influenced particular recommendations. In order to provide such functionality, in addition to the trained model, one would need to maintain additional metadata and expose it together with provided recommendations, which often cannot be easily achieved due to the size of the data or privacy concerns.

The same concerns apply to model bias, but the problem has different roots, e.g., incorrect data collection resulting in skewed datasets. This problem is more challenging to address because we often need to modify business processes and costly software. With applied automation, one can detect invalid datasets and sometimes even data collection practices early and allow removing bias from model predictions.

de Wynter: I think first and foremost, provably efficient and correct algorithms for hyperparameter optimization (HPO) and NAS. The issue with AutoML is that you are solving the problem of, well, problem solving (or rather, approximation), which is notoriously hard in the computational sense. We as researchers often focus on testing a few open benchmarks and call it a day, but, more often than not, such algorithms fail to generalize, and, as it was pointed out last year, they tend to not outperform a simple random search.

There is also the issue that from a computational point of view, a fully automated AutoML system will face problems that are not necessarily similar to the ones that it has seen before; or worse, they might have a similar input but completely different solutions. Normally, this is related to the field of "learning to learn", which often involves some type of reinforcement learning (or neural network) to learn how previous ML systems solved a problem, and approximately solve a new one.

McGuire: I think there is a lot of interesting work to do on automating feature engineering and data cleaning. This is where most of the time is spent in machine learning, and domain expertise can be hugely important. Add to that the fact that most real world data is extremely messy and complex, and you see that the biggest gains from automation are from automating as much data processing and transformation as possible.

Automating the data preparation work that currently takes a huge amount of human expertise and time is not a simple task. Techniques that have removed the need for custom feature engineering in fields like vision and language do not currently generalize to small messy datasets. You can use deep learning to identify pictures of cats because a cat is a cat and all you need to do is get enough labeled data to let a complex model fill in the features for you. A table tracking customer information for a bank is very different from a table tracking customer information for a clothing store. Using these datasets to build models for your business is a small data problem. Such problems cannot be solved simply by throwing enough data at a model that can capture the complexities on its own. Hand cleaning and feature engineering can use many different approaches and determining the best is currently something of an art form. Turning these steps into algorithms that can be applied across a wide range of data is a challenging but important area of research.

Being able to automatically create and more importantly explain models of such real world data is invaluable. Storage is cheap but experts are not. There is a huge amount of data being collected in the world today. Automating the cleaning and featurization of such data provides the opportunity to use it to answer important real world questions.

Michailidis: I personally find the area of (automation-aided) explainable AI and machine learning interpretability very interesting and very important for bridging the gap between black-box modelling and a model that stakeholders can comfortably trust.

Another area I am interested in is "model compression". I think it can be a huge game changer if we can automatically go from a powerful, complicated solution down to a much simpler one that can basically produce the same or similar performance, but much faster, utilizing fewer resources.

InfoQ: What are some AutoML techniques and open-source tools practitioners can use now?

Lazzeri: AutoML democratizes the machine learning model development process, and empowers its users, no matter their data science expertise, to identify an end-to-end machine learning pipeline for any problem. There are several AutoML techniques that practitioners can use now, my favorite ones are:

Tovbin: In recent years we have seen an explosion of tooling for machine learning practitioners, starting from cloud platforms (Google Cloud AutoML, Salesforce Einstein, AWS SageMaker Autopilot, H2O AutoML) to open-source software (TPOT, AutoSklearn, TransmogrifAI). Here one can find more information on these and other solutions:

de Wynter: Disclaimer: I work for Amazon. This is an active area of research, and there's quite a few well-known algorithms (with more appearing every day) focusing on different parts of the pipeline, and with well-known successes on various problems. It's hard to name them all, but some of the best-known examples are grid search, Bayesian, and gradient-based methods for HPO; and search strategies (e.g., hill climbing) and population/RL-based methods (e.g., ENAS, DARTS for one-shot NAS, and the algorithm used for AmoebaNet) for NAS. On the other hand, full end-to-end systems have achieved good results for a variety of problems.

McGuire: Well of course I need to mention our own open-source AutoML library, TransmogrifAI. We focus mainly on automating data cleaning and feature engineering, with some model selection, and we are built on top of Spark.

There are also a large number of interesting AutoML libraries coming out in Python, including Hyperopt, scikit-optimize, and TPOT.
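As a sense of how lightweight these libraries are to try, here is a minimal TPOT sketch; the dataset and settings are illustrative. TPOT searches over preprocessing steps, models, and hyperparameters and can export the winning pipeline as plain scikit-learn code.

    from tpot import TPOTClassifier
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2, random_state=0)
    tpot.fit(X_train, y_train)                 # runs the automated pipeline search
    print(tpot.score(X_test, y_test))
    tpot.export("best_pipeline.py")            # writes the discovered pipeline as Python code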

Michailidis: In the open-source space, H2O.ai has a tool called AutoML that incorporates many of the elements I mentioned in the previous questions. It is also very scalable and can be used on any OS. Other tools are auto-sklearn and Auto-WEKA.

InfoQ: What are the limitations of AutoML?

Lazzeri: AutoML raises a few challenges such as model parallelization, result collection, resource optimization, and iteration. Searching for the best model and hyperparameters is an iterative process constrained by many limitations, such as compute, money, and time. Machine learning pipelines provide a solution to answer those AutoML challenges with a clear definition of the process and automation features. An Azure Machine Learning pipeline is an independently executable workflow of a complete machine learning task. Pipelines should focus on machine learning tasks such as:

Tovbin: One problem that AutoML does not handle well is complex data types. The majority of automated methods expect certain data types, e.g., numerical, categorical, text, geo coordinates, and, therefore, specific distributions. Such methods are a poor fit to handle more complicated scenarios, such as behavioral data, e.g., online store visit sessions.

Another problem is feature engineering that needs to consider domain-specific properties of the data. For example, suppose we would like to build a system to automate email classification for an insurance sales team. Input from the sales team members defining which parts of the email are and are not necessary would usually be more valuable than a metric. When building such systems, it is essential to reinforce the system with domain expert feedback to achieve more reliable results.

de Wynter: There is the practical limitation of the sheer amount of computational resources you have to throw at a problem to get it solved. It is not a true obstacle insofar as you can always use more machines, but, environmentally speaking, there are consequences associated with such a brute-force approach. Now, not all of AutoML is brute-force (as I mentioned earlier, this is a computationally hard problem, so brute-forcing a problem will only get you so far), and it relies heavily on heuristics, but you still need sizable compute to solve a given AutoML problem, since you have to try out multiple solutions end-to-end. There's a push in the science community to obtain better, "greener" algorithms, and I think it's fantastic and the way to go.

From a theoretical point of view, the hardness of AutoML is quite interesting; ultimately, it is a statement on how intrinsically difficult the problem is, regardless of what type or number of computers you use. Add to that what I mentioned earlier, that there is (theoretically) no such thing as "one model to rule them all," and AutoML becomes a very complex computational problem.

Lastly, current AutoML systems have a well-defined model search space (e.g., neural network layers, or a mix of classifiers), which is expected to work for every input problem. This is not the case. However, the search spaces that provably generalize well for all possible problems are somewhat hard to implement in practice, so there is still an open question on how to bridge such a gap.

McGuire: I don't think AutoML is ready to replace having a human in the loop. AutoML can build a model, but as we automate more and more of modeling, developing tools to provide transparency into what the model is doing becomes more and more important. Models are only as good as the data used to build them. As we move away from having a human spending time to clean and deeply understand relationships in the data, we need to provide new tools to allow users of the model to understand what the models are doing. You need a human to take a critical look at the models and the elements of the data they use and ask: is this the right thing to predict, and is this data OK to use? Without tools to answer these questions for AutoML models, we run the risk of unintentionally shooting ourselves in the foot. We need the ability to ensure we are not using inappropriate models or perpetuating and reinforcing issues and biases in society without realizing it.

Michailidis: This was covered mostly in previous sections. Another thing I would like to mention is that performance is greatly affected by the resources allocated. More powerful machines will be able to cover a search space of potential algorithms, features, and techniques much faster.

These tools (unless they are built to support very specific applications) do not have domain knowledge, but are made to solve generic problems. For example, they would not know out of the box that if a field in the data is called "distance travelled" and another one is called "duration in time", they can be used to compute "speed", which may be an important feature for a given task. They may have a chance to generate that feature by stochastically trying different transformations on the data, but a domain expert would figure this out much quicker, so these tools will produce better results in the hands of an experienced data practitioner. Hence, these tools will be more successful if they have the option to incorporate domain knowledge coming from the expert.
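Michailidis's example amounts to a one-line feature-engineering step that a domain expert would write immediately; the column names below are illustrative.

    import pandas as pd

    trips = pd.DataFrame({
        "distance_travelled_km": [12.0, 3.5, 250.0],
        "duration_in_time_h":    [0.4, 0.25, 3.0],
    })
    # The derived feature a human adds instantly, but an AutoML tool would
    # have to stumble onto by trying transformations at random.
    trips["speed_kmh"] = trips["distance_travelled_km"] / trips["duration_in_time_h"]
    print(trips)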

The panelists agreed that AutoML is important because it saves time and resources, removing much of the manual work and allowing data scientists to deliver business value faster and more efficiently. The panelists predict, however, that AutoML is not likely to remove the need for a "human in the loop," particularly for industry-specific knowledge and the ability to translate business problems into machine-learning problems. Important research areas in AutoML include feature engineering and model explanation.

The panelists highlighted several existing commercial and open-source AutoML tools and described the different parts of the machine-learning process that can be automated. Several panelists noted that one limitation of AutoML is the amount of computational resources required, while others pointed out the need for domain knowledge and model transparency.

Francesca Lazzeri, PhD, is an experienced scientist and machine learning practitioner with over 12 years of both academic and industry experience. She is the author of a number of publications, including technology journals, conferences, and books. She currently leads an international team of cloud advocates and AI developers at Microsoft. Before joining Microsoft, she was a research fellow at Harvard University in the Technology and Operations Management Unit. Find her on Twitter: @frlazzeri and Medium: @francescalazzeri

Matthew Tovbin is a Co-Founder of Faros AI, a software automation platform for DevOps. Before founding Faros AI, he acted as Software Engineering Architect at Salesforce, developing the Salesforce Einstein AI platform, which powers the world's smartest CRM. In addition, Matthew is a creator of TransmogrifAI, co-organizer of the Scala Bay meetup, a presenter, and an active member of numerous functional programming groups. Matthew lives in the San Francisco Bay Area with his wife and kid, and enjoys photography, hiking, good whisky, and computer gaming.

Adrian de Wynter is an Applied Scientist in Alexa AI's Secure AI Foundations organization. His work can be categorized in three broad, sometimes overlapping, areas: language modeling, neural architecture search, and privacy-preserving machine learning. His research interests involve meta-learning and natural language understanding, with a special emphasis on the computational foundations of these topics.

Leah McGuire is a Machine Learning Architect at Salesforce, working on automating as many of the steps involved in machine learning as possible. This automation has been instrumental in developing and shipping a number of customer-facing machine learning offerings at Salesforce, with the goal of bringing intelligence to each customer's unique data and business goals. Before focusing on developing machine learning products, she completed a PhD and a Postdoctoral Fellowship in Computational Neuroscience at the University of California, San Francisco, and at the University of California, Berkeley, where she studied the neural encoding and integration of sensory signals.

Marios Michailidis is a competitive data scientist at H2O.ai, developing the next generation of machine learning products in the AutoML space. He holds a BSc in Accounting and Finance from the University of Macedonia in Greece, an MSc in Risk Management from the University of Southampton, and a PhD in machine learning from University College London (UCL), with a focus on ensemble modelling. He is the creator of KazAnova, a freeware GUI for credit scoring and data mining made 100% in Java, as well as the creator of the StackNet Meta-Modelling Framework. In his spare time he loves competing in data science challenges, and he was ranked 1st out of 500,000 members on the popular Kaggle.com data science platform.

Originally posted here:
State of the Art in Automated Machine Learning - InfoQ.com

The pandemic has changed how criminals hide their cash, and AI tools are trying to sniff it out – MIT Technology Review

The pandemic has forced criminal gangs to come up with new ways to move money around. In turn, this has upped the stakes for anti-money laundering (AML) teams tasked with detecting suspicious financial transactions and following them back to their source.

Key to their strategies are new AI tools. While some larger, older financial institutions have been slower to adapt their rule-based legacy systems, smaller, newer firms are using machine learning to look out for anomalous activity, whatever it might be.

It is hard to assess the exact scale of the problem. But according to the United Nations Office on Drugs and Crime, between 2% and 5% of global GDP (between $800 billion and $2 trillion at current figures) is laundered every year. Most goes undetected. Estimates suggest that only around 1% of profits earned by criminals is seized.

And that was before covid-19 hit. Fraud is up, with fears around covid-19 creating a lucrative market for counterfeit protective gear or medication. More people spending time online also creates a bigger pool for phishing attacks and other scams. And, of course, drugs are still being bought and sold.

Lockdown made it harder to hide the proceeds, at least to begin with. The problem for criminals is that many of the best businesses for laundering money were also those hit hardest by the pandemic. Small shops, restaurants, bars, and clubs are favored because they are cash-heavy, which makes it easier to mix up ill-gotten gains with legal income.

With bank branches closed, it has been harder to make large cash deposits. Wire transfer services like Western Union, which usually allow anyone to walk in off the street and send money overseas, shut their premises, too.

But criminals are nothing if not opportunistic. As the normal channels for money laundering closed, new ones opened up. Vast sums of money have started flowing into small businesses again thanks to government bailouts. This creates a flurry of financial activity that provides cover for money laundering.

The upshot is that there are more demands being placed on AML tech. Older systems rely on hand-crafted rules, such as that transactions over a certain amount should raise an alert. But these rules lead to many false flags, and real criminal transactions get lost in the noise. More recently, machine-learning-based approaches try to identify patterns of normal activity and raise flags only when outliers are detected. These are then assessed by humans, who reject or approve the alert.
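As an illustrative sketch of that outlier-based idea (not any vendor's actual system), scikit-learn's IsolationForest can learn what "normal" transactions look like and flag unusual ones for a human analyst; the features and data below are invented.

    import numpy as np
    from sklearn.ensemble import IsolationForest

    # Features per transaction: [amount, hour of day, transactions in the last 24h]
    rng = np.random.default_rng(0)
    normal_activity = rng.normal([80, 14, 3], [30, 4, 1], size=(1000, 3))

    model = IsolationForest(contamination=0.01, random_state=0).fit(normal_activity)

    new_activity = [[75, 13, 2], [9500, 3, 40]]   # an everyday payment vs. a suspicious burst
    print(model.predict(new_activity))            # 1 = looks normal, -1 = flag for human review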

This feedback can be used to tweak the AI model so that it adjusts itself over time. Some firms, including Featurespace, a firm based in the US and UK that uses machine learning to detect suspicious financial activity, and Napier, another firm that builds machine learning tools for AML, are developing hybrid approaches in which correct alerts generated by an AI can be turned into new rules that shape the overall model.

The rapid shifts in behavior in recent months have made the advantages of more adaptable systems clear. Financial regulators around the world have released new guidance on what sort of activity AML teams should look out for, but for many it was too late, says Araliya Samm, head of financial crime at Featurespace. "When something like covid happens, where everybody's payment patterns change suddenly, you don't have time to put new rules in place."

You need tech that can catch it as it is happening, she says: "Otherwise, by the time you've detected something and alerted the people who need to know, the money is gone."

For Dave Burns, chief revenue officer for Napier, covid-19 caused long-simmering problems to boil over. "This pandemic was the tipping point in many ways," he says. "It's a bit of a wake-up call that we really need to think differently." And, he adds, some of the larger players in the industry have been caught flat-footed.

But that doesn't simply mean adopting the latest tech. "You can't just do AI for AI's sake because that will spew out garbage," says Burns. What's needed, he says, is a bespoke approach for each bank or payment provider.

AML technology still has a long way to go. The pandemic has revealed cracks in existing systems that have people worried, says Burns. And that means that things could change faster than they were going to. "We're seeing a greater degree of urgency," he says. "What is traditionally very long, bureaucratic decision-making is being accelerated dramatically."

More:
The pandemic has changed how criminals hide their cash, and AI tools are trying to sniff it out - MIT Technology Review

Apple using machine learning for almost everything, and privacy-first approach actually better – 9to5Mac

Apple's artificial intelligence (AI) chief says that Apple is using machine learning in almost every aspect of how we interact with our devices, but there is much more to come.

John Giannandrea says he moved from Google to Apple because the potential of machine learning (ML) to impact people's lives is so much greater at the Cupertino company.

Giannandrea spoke with Ars Technica's Samuel Axon, outlining how Apple uses ML now.

There's a whole bunch of new experiences that are powered by machine learning. And these are things like language translation, or on-device dictation, or our new features around health, like sleep and hand washing, and stuff we've released in the past around heart health and things like this. I think there are increasingly fewer and fewer places in iOS where we're not using machine learning.

It's hard to find a part of the experience where you're not doing some predictive [work]. Like, app predictions, or keyboard predictions, or modern smartphone cameras do a ton of machine learning behind the scenes to figure out what they call "saliency," which is like, what's the most important part of the picture? Or, if you imagine doing blurring of the background, you're doing portrait mode […]

Savvy iPhone owners might also notice that machine learning is behind the Photos app's ability to automatically sort pictures into pre-made galleries, or to accurately give you photos of a friend named Jane when her name is entered into the app's search field […]

Most [augmented reality] features are made possible thanks to machine learning […]

Borchers also pointed out accessibility features as important examples. They are fundamentally made available and possible because of this, he said. Things like the sound detection capability, which is game-changing for that particular community, is possible because of the investments over time and the capabilities that are built in […]

All of these things benefit from the core machine learning features that are built into the core Apple platform. So, it's almost like, "Find me something where we're not using machine learning."

He was, though, surprised at areas where Apple had not been using ML before he joined the company.

"When I joined Apple, I was already an iPad user, and I loved the Pencil," Giannandrea (who goes by J.G. to colleagues) told me. "So, I would track down the software teams and I would say, Okay, where's the machine learning team that's working on handwriting? And I couldn't find it." It turned out the team he was looking for didn't exist, a surprise, he said, given that machine learning is one of the best tools available for the feature today.

I knew that there was so much machine learning that Apple should do that it was surprising that not everything was actually being done.

That has changed, and will continue to change, however.

"That has changed dramatically in the last two to three years," he said. "I really honestly think there's not a corner of iOS or Apple experiences that will not be transformed by machine learning over the coming few years."

It's long been thought that Apple's privacy focus (wanting to do everything on the device, and not analyzing huge volumes of personal data) means that it can't compete with Google, because it can't benefit from masses of data pulled from millions of users. Giannandrea says this is absolutely not the case.

I understand this perception of bigger models in data centers somehow are more accurate, but it's actually wrong. It's actually technically wrong. It's better to run the model close to the data, rather than moving the data around.

In other words, you get better results when an ML model learns from your usage of your device than when it relies on aggregated data from millions of users. Local processing can also be used in situations where it simply wouldn't be realistic to send data to a server, like choosing the exact moment to act on your pressing the Camera app shutter release button for the best frame.

Understandably, Giannandrea wouldn't be drawn on what Apple is working on now, but he did give one example of what might be possible when you combine the power of Apple Silicon Macs with machine learning.

"Imagine a video editor where you had a search box and you could say, Find me the pizza on the table. And it would just scrub to that frame."

The whole piece is very much worth reading.




Here is the original post:
Apple using machine learning for almost everything, and privacy-first approach actually better - 9to5Mac

Hey software developers, you're approaching machine learning the wrong way – The Next Web

I remember the first time I ever tried to learn to code. I was in middle school, and my dad, a programmer himself, pulled open a text editor and typed this on the screen:
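(What he typed was the classic Java "Hello, World!" boilerplate, something roughly like this:)

    // The canonical Java starter program, with the keywords (public, class,
    // static) that the rest of this story refers to.
    public class HelloWorld {
        public static void main(String[] args) {
            System.out.println("Hello World");
        }
    }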

"Excuse me?" I said.

"It prints Hello World," he replied.

"What's public? What's class? What's static? What's…"

"Ignore that for now. It's just boilerplate."

But I was pretty freaked out by all that so-called boilerplate I didn't understand, and so I set out to learn what each one of those keywords meant. That turned out to be complicated and boring, and pretty much put the kibosh on my young coder aspirations.

It's immensely easier to learn software development today than it was when I was in high school, thanks to sites like codecademy.com, the ease of setting up basic development environments, and a general sway towards teaching high-level, interpreted languages like Python and JavaScript. You can go from knowing nothing about coding to writing your first conditional statements in a browser in just a few minutes. No messy environment setup, installations, compilers, or boilerplate to deal with; you can head straight to the juicy bits.

This is exactly how humans learn best. First, we're taught core concepts at a high level, and only then can we appreciate and understand under-the-hood details and why they matter. We learn Python, then C, then assembly, not the other way around.

Unfortunately, lots of folks who set out to learn Machine Learning today have the same experience I had when I was first introduced to Java. They're given all the low-level details up front (layer architecture, back-propagation, dropout, etc.) and come to think ML is really complicated, that maybe they should take a linear algebra class first, and give up.

That's a shame, because in the very near future, most software developers effectively using Machine Learning aren't going to have to think or know about any of that low-level stuff. Just as we (usually) don't write assembly or implement our own TCP stacks or encryption libraries, we'll come to use ML as a tool and leave the implementation details to a small set of experts. At that point, after Machine Learning is democratized, developers will need to understand not implementation details but instead best practices in deploying these smart algorithms in the world.

Today, if you want to build a neural network that recognizes your cat's face in photos or predicts whether your next Tweet will go viral, you'd probably set off to learn either TensorFlow or PyTorch. These Python-based deep learning libraries are the most popular tools for designing neural networks today, and they're both under 5 years old.

In its short lifespan, TensorFlow has already become way, way more user-friendly than it was five years ago. In its early days, you had to understand not only Machine Learning but also distributed computing and deferred graph architectures to be an effective TensorFlow programmer. Even writing a simple print statement was a challenge.

Just earlier this fall, TensorFlow 2.0 officially launched, making the framework significantly more developer-friendly. Here's what a Hello-World-style model looks like in TensorFlow 2.0:
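(A representative sketch along those lines, a small fully-connected classifier using the Dense layers, Dropout, and sparse_categorical_crossentropy discussed just below; the exact layer sizes are illustrative:)

    import tensorflow as tf

    # A "Hello World"-style Keras model in TensorFlow 2.0: flatten a 28x28 image,
    # pass it through a dense layer with dropout, and classify into 10 classes.
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])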

If you've designed neural networks before, the code above is straightforward and readable. But if you haven't, or you're just learning, you've probably got some questions. Like, what is Dropout? What are these dense layers, how many do you need, and where do you put them? What's sparse_categorical_crossentropy? TensorFlow 2.0 removes some friction from building models, but it doesn't abstract away designing the actual architecture of those models.

So what will the future of easy-to-use ML tools look like? It's a question that everyone from Google to Amazon to Microsoft and Apple is spending clock cycles trying to answer. Also, disclaimer: it is what I spend all my time thinking about as an engineer at Google.

For one, we'll start to see many more developers using pre-trained models for common tasks, i.e. rather than collecting our own data and training our own neural networks, we'll just use Google's/Amazon's/Microsoft's models. Many cloud providers already do something like this. For example, by hitting a Google Cloud REST endpoint, you can use pre-trained neural networks for a range of common prediction tasks.
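As one hedged, hypothetical example of that pattern, labeling an image with the Cloud Vision API takes only an HTTP call; the API key and image path below are placeholders, and other pre-trained endpoints follow the same shape.

    import base64
    import requests

    API_KEY = "YOUR_API_KEY"              # placeholder credential
    with open("photo.jpg", "rb") as f:    # placeholder image
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    body = {
        "requests": [{
            "image": {"content": image_b64},
            "features": [{"type": "LABEL_DETECTION", "maxResults": 5}],
        }]
    }
    resp = requests.post(
        "https://vision.googleapis.com/v1/images:annotate",
        params={"key": API_KEY},
        json=body,
    )
    print(resp.json())                    # labels predicted by Google's pre-trained model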

You can also run pre-trained models on-device, in mobile apps, using tools like Google's ML Kit or Apple's Core ML.

The advantage to using pre-trained models over a model you build yourself in TensorFlow (besides ease of use) is that, frankly, you probably cannot personally build a model more accurate than one that Google researchers, training neural networks on a whole Internet of data and tons of GPUs and TPUs, could build.

The disadvantage to using pre-trained models is that they solve generic problems, like identifying cats and dogs in images, rather than domain-specific problems, like identifying a defect in a part on an assembly line.

But even when it comes to training custom models for domain-specific tasks, our tools are becoming much more user-friendly.

Screenshot of Teachable Machine, a tool for building vision, gesture, and speech models in the browser.

Google's free Teachable Machine site lets users collect data and train models in the browser using a drag-and-drop interface. Earlier this year, MIT released a similar code-free interface for building custom models that runs on touchscreen devices, designed for non-coders like doctors. Microsoft and startups like lobe.ai offer similar solutions. Meanwhile, Google Cloud AutoML is an automated model-training framework for enterprise-scale workloads.

As ML tools become easier to use, the skills that developers hoping to use this technology (but not become specialists) will need will change. So if you're trying to plan for where, Wayne Gretzky-style, the puck is going, what should you study now?

What makes Machine Learning algorithms distinct from standard software is that they're probabilistic. Even a highly accurate model will be wrong some of the time, which means it's not the right solution for lots of problems, especially on its own. Take ML-powered speech-to-text algorithms: it might be okay if, occasionally, when you ask Alexa to "Turn off the music," she instead sets your alarm for 4 AM. It's not OK if a medical version of Alexa thinks your doctor prescribed you Enulose instead of Adderall.

Understanding when and how models should be used in production is and will always be a nuanced problem. It's especially tricky in cases where:

Take medical imaging. We're globally short on doctors, and ML models are often more accurate than trained physicians at diagnosing disease. But would you want an algorithm to have the last say on whether or not you have cancer? Same thing with models that help judges decide jail sentences. Models can be biased, but so are people.

Understanding when ML makes sense to use, as well as how to deploy it properly, isn't an easy problem to solve, but it's one that's not going away anytime soon.

Machine Learning models are notoriously opaque. That's why they're sometimes called black boxes. It's unlikely you'll be able to convince your VP to make a major business decision with "my neural network told me so" as your only proof. Plus, if you don't understand why your model is making the predictions it is, you might not realize it's making biased decisions (e.g., denying loans to people from a specific age group or zip code).

It's for this reason that so many players in the ML space are focusing on building Explainable AI features: tools that let users more closely examine what features models are using to make predictions. We still haven't entirely cracked this problem as an industry, but we're making progress. In November, for example, Google launched a suite of explainability tools as well as something called Model Cards, a sort of visual guide for helping users understand the limitations of ML models.

Google's Facial Recognition Model Card shows the limitations of this particular model.
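Simple starting points already exist in open-source libraries; for instance, a hedged sketch of permutation importance in scikit-learn, which shuffles each feature in turn and measures how much the model's score drops (the dataset and model here are placeholders for illustration).

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    data = load_breast_cancer()
    X_train, X_val, y_train, y_val = train_test_split(data.data, data.target, random_state=0)
    model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

    # Rank the five features the model leans on most when scored on held-out data.
    result = permutation_importance(model, X_val, y_val, n_repeats=10, random_state=0)
    for i in result.importances_mean.argsort()[::-1][:5]:
        print(f"{data.feature_names[i]}: {result.importances_mean[i]:.3f}")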

There are a handful of developers good at Machine Learning, a handful of researchers good at neuroscience, and very few folks who fall in that intersection. This is true of almost any sufficiently complex field. The biggest advances we'll see from ML in the coming years likely won't be from improved mathematical methods but from people with different areas of expertise learning at least enough Machine Learning to apply it to their domains. This is mostly the case in medical imaging, for example, where the most exciting breakthroughs (being able to spot pernicious diseases in scans) are powered not by new neural network architectures but instead by fairly standard models applied to a novel problem. So if you're a software developer lucky enough to possess additional expertise, you're already ahead of the curve.

This, at least, is what I would focus on today if I were starting my AI education from scratch. Meanwhile, I find myself spending less and less time building custom models from scratch in TensorFlow, and more and more time using high-level tools like AutoML and AI APIs and focusing on application development.

This article was written by Dale Markowitz, an Applied AI Engineer at Google based in Austin, Texas, where she works on applying machine learning to new fields and industries. She also likes solving her own life problems with AI, and talks about it on YouTube.

Read more:
Hey software developers, you're approaching machine learning the wrong way - The Next Web

Moderna Announced Partnership With Amazon Web Services for Their Analytics and Machine Learning Services – Science Times

The $29 billion biotech company Moderna announced on Wednesday, August 5, that it will be partnering with Amazon Web Services, making AWS its preferred cloud partner.

Moderna is currently considered the lead COVID-19 vaccine developer, as it was the first company to reach the third phase of vaccine development, in late July.

(Photo: Maddie Meyer/Getty Images) A view of Moderna headquarters on May 8, 2020 in Cambridge, Massachusetts. Moderna was given FDA approval to continue to phase 2 of Coronavirus (COVID-19) vaccine trials with 600 participants.


Vaccine development could take years of research and lab testing before a vaccine can be administered to people. As one of the leading companies in the race for a COVID-19 vaccine, Moderna last week gave 30,000 people its first vaccine candidate to reach phase 3 of testing in the United States.

At present, Moderna has been using AWS to run its everyday operations in accounting and inventory management, and also to power its production facility, robotic tools, and engineering systems. According to the press release by the biotech company, this allows it to achieve greater efficiency and visibility across its operations.

Moderna CEO Stéphane Bancel said that with AWS, the company's researchers have the ability to quickly design and perform experiments and, in no time, uncover novel insights to produce life-saving treatments faster.

Modernizing IT infrastructure through the use of artificial intelligence is one of the things that biotech companies such as Moderna are looking into to help them in the race to develop new medicines and treatments.

The race for a COVID-19 vaccine has made the biotechnology sector a sought-after market these days. AWS's rival Microsoft Azure has also recently inked a big cloud and artificial intelligence deal with drugmaker Novartis.

According to biotech analyst Michael Yee, the vaccine test results could be made public in October.


Moderna Therapeutics' co-founder and chairman, Dr. Noubar Afeyan, said that the biotech company is the first US firm to enter Phase 3 of a clinical trial for its candidate COVID-19 vaccine.

The blind trial will include 30,000 volunteers, half of whom will receive Moderna's drug and the other half a placebo of sodium and water. The volunteers are adults aged 18 and older who are interested in participating in the clinical trial.

Afeyan said that the Food and Drug Administration's authorization would be based on how fast some 150 cases of the infection occur. If the trial proves to be successful, those people who received the vaccine should have a disproportionately lower number of cases than those who received the placebo.

At the end of the day, the FDA must ensure that the vaccine meets all the necessary safety and efficacy measures. The administration mandated at least a 50% protection value for any vaccine before considering authorizing it.

Moreover, Moderna hopes to have authorization from the FDA by the last quarter of 2020. Afeyan said the company expects to have 500 million to 1 billion doses of its vaccine ready for distribution once it gets FDA authorization.


Read the original here:
Moderna Announced Partnership With Amazon Web Services for Their Analytics and Machine Learning Services - Science Times