Automated Machine Learning is the Future of Data Science – Analytics Insight

As the fuel that powers their ongoing digital transformation efforts, organizations everywhere are searching for ways to derive as much insight as possible from their data. The accompanying increased demand for advanced predictive and prescriptive analytics has, in turn, prompted a call for more data scientists proficient with the latest artificial intelligence (AI) and machine learning (ML) tools.

However, such highly skilled data scientists are costly and hard to find. In fact, they are such a valuable asset that the phenomenon of the citizen data scientist has recently emerged to help close the skills gap. A complementary role rather than a direct replacement, citizen data scientists lack formal advanced data science expertise, yet they are capable of producing models using state-of-the-art diagnostic and predictive analytics. This capability is due in part to the arrival of accessible new technologies, such as automated machine learning (AutoML), that now automate many of the tasks once performed by data scientists.

The objective of AutoML is to shorten the cycle of trial and error and experimentation. It churns through a large number of models, and the hyperparameters used to configure those models, to determine the best model for the data at hand. This is a dull and time-consuming activity for any human data scientist, however skilled. AutoML platforms can perform this repetitive work more quickly and thoroughly, arriving at a solution faster and more effectively.
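As a concrete sketch of that search loop, the following pure-Python toy tries two made-up model families (a mean baseline and k-nearest-neighbour regression) over a small hyperparameter grid and keeps whichever scores best on a validation split. The families, grids, and data are illustrative only, not those of any real AutoML platform:

```python
def mean_model(train):
    # Baseline family: always predict the training-set mean.
    m = sum(y for _, y in train) / len(train)
    return lambda x: m

def knn_model(train, k=1):
    # Second family: k-nearest-neighbour regression on a single feature.
    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k
    return predict

def validation_error(model_fn, params, train, valid):
    # Fit on the training split, score mean squared error on validation.
    predict = model_fn(train, **params)
    return sum((predict(x) - y) ** 2 for x, y in valid) / len(valid)

def auto_ml(candidates, train, valid):
    # Try every (model family, hyperparameter) combination; keep the best.
    best = None
    for model_fn, grid in candidates:
        for params in grid:
            err = validation_error(model_fn, params, train, valid)
            if best is None or err < best[0]:
                best = (err, model_fn.__name__, params)
    return best

train = [(float(x), 2.0 * x) for x in range(10)]
valid = [(x + 0.5, 2.0 * x + 1.0) for x in range(9)]
candidates = [(mean_model, [{}]),
              (knn_model, [{"k": 1}, {"k": 3}, {"k": 5}])]
err, name, params = auto_ml(candidates, train, valid)
```

A real platform searches far larger spaces with smarter strategies (Bayesian optimization, early stopping), but the shape of the loop is the same.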

The ultimate value of AutoML tools is not to replace data scientists but to offload their routine work and streamline their process, freeing them and their teams to concentrate their energy and attention on the parts of the process that require a higher level of reasoning and creativity. As their priorities change, it is important for data scientists to understand the full life cycle so they can shift their energy to higher-value tasks and sharpen the skills that further raise their value to their companies.

Airbnb continually looks for ways to improve its data science workflow. A good share of its data science projects involve machine learning, and many parts of this workflow are tedious. Airbnb uses machine learning to build customer lifetime value (LTV) models for guests and hosts. These models allow the company to improve its decision making and its interactions with the community.

The Airbnb team has found AutoML tools most valuable for regression and classification problems involving tabular datasets, although the state of this area is advancing rapidly. In summary, they believe that in certain cases AutoML can vastly increase a data scientist's productivity, often by an order of magnitude. They have used AutoML in several ways:

Unbiased presentation of challenger models: AutoML can rapidly present a plethora of challenger models built on the same training set as the incumbent model. This helps the data scientist pick the best model family.

Identifying target leakage: Because AutoML builds candidate models extremely fast in an automated way, data leakage can be identified earlier in the modeling lifecycle.

Diagnostics: As mentioned earlier, canonical diagnostics such as learning curves, partial dependence plots, and feature importances can be generated automatically.

Tasks like exploratory data analysis, data pre-processing, hyperparameter tuning, model selection, and putting models into production can all be automated to some degree with an automated machine learning system.
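As an illustration of the target-leakage point, one cheap automated check is to flag any column whose values (almost) perfectly determine the target, since such a column is often derived from the target itself. This heuristic is hypothetical, not Airbnb's actual tooling:

```python
def leakage_check(rows, feature_idx, threshold=0.99):
    # Flag a feature whose value (almost) always determines the target.
    # Caveat: very high-cardinality features will trip this check too.
    first_seen = {}
    consistent = 0
    for *features, target in rows:
        key = features[feature_idx]
        if key in first_seen:
            consistent += first_seen[key] == target
        else:
            first_seen[key] = target
            consistent += 1
    return consistent / len(rows) >= threshold

# Column 1 is a leaked copy of the target; column 0 is a genuine feature.
rows = [("a", 0, 0), ("b", 1, 1), ("a", 1, 1), ("b", 0, 0)]
leaky = leakage_check(rows, 1)    # True
honest = leakage_check(rows, 0)   # False
```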

Companies have moved toward enhancing predictive power by coupling big data with complex automated machine learning. AutoML, which uses machine learning to create better machine learning, is billed as offering a chance to democratize the field by allowing firms with limited data science expertise to create analytical pipelines capable of tackling sophisticated business problems.

Comprising a set of algorithms that automate the writing of other ML algorithms, AutoML automates the end-to-end process of applying ML to real-world problems. By way of illustration, a standard ML pipeline consists of the following: data pre-processing, feature extraction, feature selection, feature engineering, algorithm selection, and hyperparameter tuning. The significant skill and time it takes to execute these steps mean there is a high barrier to entry.
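A toy version of such a pipeline can be sketched with each stage as a plain function. The stages and data below are invented for illustration; algorithm selection and hyperparameter tuning would follow as further stages:

```python
def preprocess(rows):
    # Data pre-processing: drop rows with missing values.
    return [r for r in rows if None not in r]

def extract_features(rows):
    # Feature extraction: derive candidate features (the value and its square).
    return [(x, x * x, y) for x, y in rows]

def select_features(rows):
    # Feature selection: keep only the feature deemed predictive (the first).
    return [(x, y) for x, _, y in rows]

def run_pipeline(rows, stages):
    # The pipeline is just the composition of its stages, applied in order.
    for stage in stages:
        rows = stage(rows)
    return rows

data = [(1, 2), (None, 0), (2, 4), (3, 6)]
clean = run_pipeline(data, [preprocess, extract_features, select_features])
```

AutoML's contribution is choosing what happens inside each stage automatically; the barrier to entry comes from having to hand-craft every one of them.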

In an article published on Forbes, Ryohei Fujimaki, the founder and CEO of dotData, contends that the discussion is lost if the emphasis of AutoML systems is on supplanting or reducing the role of the data scientist. After all, the longest and most challenging part of a typical data science workflow revolves around feature engineering: connecting data sources against a list of desired features that are evaluated against various machine learning algorithms.

Success with feature engineering requires a high level of domain expertise to identify the ideal features through a tedious, iterative process. Automation on this front allows even citizen data scientists to create streamlined use cases by drawing on their domain expertise. In short, this democratization of the data science process opens the way for new classes of developers, offering organizations a competitive advantage with minimal investment.

View post:
Automated Machine Learning is the Future of Data Science - Analytics Insight

Nothing to hide? Then add these to your ML repo, Papers with Code says DEVCLASS – DevClass

In a bid to make advances in machine learning more reproducible, Papers With Code, an ML resource and appendage of Facebook AI Research (FAIR), has introduced a code completeness checklist for machine learning papers.

It is based on the best practices the Papers with Code team has seen in popular research repositories, on the Machine Learning Reproducibility Checklist that Joelle Pineau, FAIR Managing Director, introduced in 2019, and on additional work Pineau and other researchers have done since then.

Papers with Code was started in 2018 as a hub for newly published machine learning papers that come with source code, offering researchers an easy-to-monitor platform to keep up with the current state of the art. In late 2019 it became part of FAIR "to further accelerate our growth", as founders Robert Stojnic and Ross Taylor put it at the time.

As part of FAIR, the project will get a bit of a visibility push since the new checklist will also be used in the submission process for the 2020 edition of the popular NeurIPS conference on neural information processing systems.

The ML code completeness checklist is used to assess code repositories based on the scripts and artefacts provided within them, with the aim of enhancing reproducibility and enabling others to more easily build upon published work. It includes checks for dependencies (so that those looking to replicate a paper's results have some idea of what is needed to succeed), training and evaluation scripts, pre-trained models, and results.

While all of these seem like useful things to have, Papers with Code also took a somewhat scientific approach to verify that they really are indicators of a useful repository: the team looked for correlations between the number of fulfilled checklist items and the star rating of a repository.

Their analysis showed that repositories hitting all the marks received higher ratings, implying that the checklist score is indicative of higher-quality submissions and should therefore encourage researchers to comply in order to produce useful resources. However, they also admitted that marketing and the state of documentation might play into a repo's popularity as well.

They nevertheless went on to recommend laying out the five elements mentioned and linking to external resources, which is always a good idea. Additional tips for publishing research code can be found in the project's GitHub repository and in the report on the NeurIPS reproducibility program.

View original post here:
Nothing to hide? Then add these to your ML repo, Papers with Code says DEVCLASS - DevClass

How Will the Emergence of 5G Affect Federated Learning? – IoT For All

As development teams race to build out AI tools, it is becoming increasingly common to train algorithms on edge devices. Federated learning, a subset of distributed machine learning, is a relatively new approach that allows companies to improve their AI tools without explicitly accessing raw user data.

Conceived by Google in 2017, federated learning is a decentralized learning model through which algorithms are trained on edge devices. In Google's on-device machine learning approach, the search giant pushed its predictive text algorithm to Android devices, aggregated the data, and sent a summary of the new knowledge back to a central server. To protect the integrity of the user data, the updates were protected either by homomorphic encryption or by differential privacy, the practice of adding noise to the data in order to obfuscate the results.

Generally speaking, with federated learning the AI algorithm is trained without ever recognizing any individual user's specific data; in fact, the raw data never leaves the device itself. Only aggregated model updates are sent back, and these updates are decrypted upon delivery to the central server. Test versions of the updated model are then sent back to select devices, and after this process is repeated thousands of times, the AI algorithm is significantly improved, all while never jeopardizing user privacy.
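The aggregation loop can be sketched in miniature: each simulated device fits a one-parameter model on its private data, adds Gaussian noise standing in for the differential-privacy mechanism, and the server averages only the noisy updates. This is a deliberately simplified illustration, not Google's production protocol:

```python
import random

def local_update(weights, data, lr=0.1):
    # One pass of gradient descent on a device's private (x, y) pairs for
    # the model y = w * x. The raw data never leaves this function.
    w = weights
    for x, y in data:
        grad = 2 * (w * x - y) * x
        w -= lr * grad
    return w

def add_noise(update, scale=0.01):
    # Differential-privacy-style noise to obfuscate an individual update.
    return update + random.gauss(0, scale)

def federated_round(global_w, device_datasets):
    # Each device trains locally; the server sees only noisy updates,
    # which it averages into the new global model.
    updates = [add_noise(local_update(global_w, d)) for d in device_datasets]
    return sum(updates) / len(updates)

random.seed(0)
true_w = 3.0
devices = [[(1.0, true_w * 1.0), (2.0, true_w * 2.0)] for _ in range(10)]
w = 0.0
for _ in range(50):
    w = federated_round(w, devices)
```

After 50 rounds the global weight sits close to the true value of 3.0, even though the server never saw any device's raw data.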

This technology is expected to make waves in the healthcare sector. For example, federated learning is currently being explored by medical start-up Owkin. Seeking to leverage patient data from several healthcare organizations, Owkin uses federated learning to build AI algorithms with data from various hospitals. This can have far-reaching effects, as it is invaluable for hospitals to be able to share disease progression data with each other while preserving the integrity of patient data and adhering to HIPAA regulations. By no means is healthcare the only sector employing this technology; federated learning will be increasingly used by autonomous car companies, smart cities, drones, and fintech organizations. Several other federated learning start-ups are coming to market, including Snips, S20.ai, and Xnor.ai, which was recently acquired by Apple.

Seeing as these AI algorithms are worth a great deal of money, it's expected that the models will be a lucrative target for hackers, with nefarious actors attempting to perform man-in-the-middle attacks. However, as mentioned earlier, by adding noise, aggregating data from various devices, and then encrypting the aggregate, companies can make things difficult for attackers.

Perhaps more concerning are attacks that poison the model itself. A hacker could conceivably compromise the model through his or her own device, or by taking over another users device on the network. Ironically, because federated learning aggregates the data from different devices and sends the encrypted summaries back to the central server, hackers who enter via a backdoor are given a degree of cover. Because of this, it is difficult, if not impossible, to identify where anomalies are located.

Although on-device machine learning effectively trains algorithms without exposing raw user data, it does require a ton of local power and memory. Companies attempt to circumvent this by only training their AI algorithms on the edge when devices are idle, charging, or connected to Wi-Fi; however, this is a perpetual challenge.

As 5G expands across the globe, edge devices will no longer be limited by bandwidth and processing speed constraints. According to a recent Nokia report, 4G base stations can support 100,000 devices per square kilometer, whereas the forthcoming 5G stations will support up to 1 million devices in the same area. With enhanced mobile broadband and low latency, 5G will provide energy efficiency while facilitating device-to-device communication (D2D). In fact, it is predicted that 5G will usher in a 10-100x increase in bandwidth and a 5-10x decrease in latency.

When 5G becomes more prevalent, we'll experience faster networks, more endpoints, and a larger attack surface, which may attract an influx of DDoS attacks. Also, 5G comes with a slicing feature, which allows slices (virtual networks) to be easily created, modified, and deleted based on the needs of users. According to a research manuscript on the disruptive force of 5G, it remains to be seen whether this network slicing component will allay security concerns or bring a host of new problems.

To summarize, there are new concerns from both a privacy and a security perspective; however, the fact remains: 5G is ultimately a boon for federated learning.

Read more:
How Will the Emergence of 5G Affect Federated Learning? - IoT For All

It's Time to Improve the Scientific Paper Review Process But How? – Synced


The level-headed evaluation of submitted research by other experts in the field is what grants scientific journals and academic conferences their respected positions. Peer review determines which papers get published, and that in turn can determine which academic theories are promoted, which projects are funded, and which awards are won.

In recent years, however, peer review processes have come under fire, especially from the machine learning community, with complaints of long delays, inconsistent standards, and unqualified reviewers.

A new paper proposes replacing peer review with a novel State-Of-the-Art Review (SOAR) system, a neoteric reviewing pipeline that serves as a plug-and-play replacement for peer review.

SOAR improves scaling, consistency, and efficiency, and can be easily implemented as a plugin to score papers and offer a direct read/don't read recommendation. The team explains that SOAR evaluates a paper's efficacy and novelty by counting the total occurrences in the manuscript of the terms "state-of-the-art" and "novel".

If only a solution were that simple. But yes, SOAR was an April Fools' prank.

The paper was a product of SIGBOVIK 2020, a yearly satire event of the Association for Computational Heresy and Carnegie Mellon University that presents humorous fake research in computer science. Previous studies have included Denotational Semantics of Pidgin and Creole, Artificial Stupidity, Elbow Macaroni, Rasterized Love Triangles, and Operational Semantics of Chevy Tahoes.

Seriously though, since 1998 the volume of AI papers in peer-reviewed journals has grown by more than 300 percent, according to the AI Index 2019 Report. Meanwhile major AI conferences like NeurIPS, AAAI and CVPR are setting new paper submission records every year.

This has inevitably led to a shortage of qualified peer reviewers in the machine learning community. In a previous Synced story, CVPR 2019 and ICCV 2019 Area Chair Jia-Bin Huang introduced research that used deep learning to predict whether a paper should be accepted based solely on its visual appearance. He told Synced the idea of training a classifier to recognize good/bad papers has been around since 2010.

Huang knows that although his model achieves decent classification performance it is unlikely to ever be used in an actual conference. Such analysis and classification might however be helpful for junior authors when considering how to prepare for their paper submissions.

Turing awardee Yoshua Bengio, meanwhile, believes the fundamental problem with today's peer review process lies in a "publish or perish" paradigm that can sacrifice paper depth and quality in favour of speedy publication.

Bengio blogged on the topic earlier this year, proposing a rethink of the overall publication process in the field of machine learning, with reviewing being a crucial element to safeguard research culture amid the field's exponential growth in size.

"Machine learning has almost completely switched to a conference publication model," Bengio wrote, "and we go from one deadline to the next every two months." In the lead-up to conference submission deadlines, many papers are rushed and things are not checked properly. The race to get more papers out, especially as first or co-first author, can also be crushing and counterproductive. Bengio is strongly urging the community to "take a step back, think deeply, verify things carefully, etc."

Bengio says he has been thinking of a potentially different publication model for ML, where papers are first submitted to a fast turnaround journal such as the Journal of Machine Learning Research for example, and then conference program committees select the papers they like from the list of accepted and reviewed (scored) papers.

Conferences have played a central role in ML, as they can speed up the research cycle, enable interactions between researchers, and generate a fast turnaround of ideas. And peer-reviewed journals have for decades been the backbone of the broader scientific research community. But with the growing popularity of preprint servers like arXiv and upcoming ML conferences going digital due to the COVID-19 pandemic, this may be the time to rethink, redesign and reboot the ML paper review and publication process.

Journalist: Yuan Yuan & Editor: Michael Sarazen


Go here to read the rest:
It's Time to Improve the Scientific Paper Review Process But How? - Synced

Infragistics Adds Predictive Analytics, Machine Learning and More – Patch.com

Infragistics is excited to announce a major upgrade to its embedded data analytics software, Reveal. In addition to its fast, easy integration into any platform or deployment option, Reveal's newest features address the latest trends in data analytics: predictive and advanced analytics, machine learning, R and Python scripting, big data connectors, and much more. These enhancements allow businesses to quickly analyze and gain insights from internal and external data to sharpen decision-making.

Some of these advanced functions include:

Outlier Detection: Easily detect points in your data that are anomalies and differ from much of the data set.

Time Series Forecasting: Reveal will make visual predictions based on historical data and trends, useful in applications such as sales and revenue forecasting, inventory management, and others.

Linear Regression: Reveal finds the relationship between two variables and creates a line that approximates the data, letting you easily see historical or future trends.
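The computation behind such a feature is ordinary least squares. Reveal's implementation is not public, so the following is just the textbook method applied to made-up revenue figures:

```python
def linear_fit(points):
    # Ordinary least squares for y = a * x + b on (x, y) pairs.
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# Hypothetical monthly revenue trending upward: fit a line, then
# extrapolate one period ahead to forecast the next value.
history = [(1, 10.0), (2, 12.0), (3, 14.0), (4, 16.0)]
a, b = linear_fit(history)
forecast = a * 5 + b
```

On this toy series the fit is exact (slope 2, intercept 8), so the one-step forecast is 18.0; real data would scatter around the fitted line.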

"Our new enhancements touch on the hottest topics and market trends, helping business users take actions based on predictive data," says Casey McGuigan, Reveal Product Manager. "And because Reveal is easy to use, everyday users get very sophisticated capabilities in a powerfully simple platform."

Machine Learning and Predictive Analytics

Reveal's new machine learning feature identifies and visually displays predictions from user data to enable more educated business-decision making. Reveal reads data from Microsoft Azure and Google BigQuery ML Platforms to render outputs in beautiful visualizations.

R and Python Scripting

R and Python are the leading programming languages focused on data analytics. With Reveal support, users such as citizen data scientists can leverage their knowledge around R and Python directly in Reveal to create more powerful visualizations and data stories. They only need to paste a URL to their R or Python scripts in Reveal or paste their code into the Reveal script editor.

Big Data Access

With support for Azure SQL, Azure Synapse, Google BigQuery, Salesforce, and AWS data connectors, Reveal pulls in millions of records. And it creates visualizations fast: Reveal's been tested with 100 million records in Azure Synapse, and it loads in a snap.

Additional connectors include those for Google Analytics and Microsoft SQL Server Reporting Services (SSRS). While Google Analytics offers reports and graphics, Reveal combines data from many sources, letting users build mashup-type dashboards with beautiful visualizations that tell a compelling story.

New Themes Match App's Look and Feel

The latest Reveal version includes two new themes that work in light and dark mode. They are fully customizable to match an app's look and feel when embedding Reveal into an application and provide control over colors, fonts, shapes and more.

More Information

For in-depth information about Reveal's newest features, visit the Reveal blog, Newest Reveal FeaturesPredictive Analytics, Big Data and More.

About Infragistics

Over the past 30 years, Infragistics has become the world leader in providing user interface development tools and multi-platform enterprise software products and services to accelerate application design and development, including building business solutions for BI and dashboarding. More than two million developers use Infragistics' enterprise-ready UX and UI toolkits to rapidly prototype and build high-performing applications for the cloud, web, Windows, iOS and Android devices. The company offers expert UX services and award-winning support from its locations in the U.S., U.K., Japan, India, Bulgaria and Uruguay.


Here is the original post:
Infragistics Adds Predictive Analytics, Machine Learning and More - Patch.com

6 Ways Machine Learning Is Revolutionizing the Warehouse – Robotics Tomorrow


6 Ways Machine Learning Is Revolutionizing the Warehouse

Cory Levins, Director of Business Development | Air Sea Containers

Advancements in technology are impacting the warehouse industry all the time, with new ways to track shipments, communicate, organize warehouses, and more. But machine learning is one of the newest types of technology on the block, and it's helping to improve warehouse safety and keep warehouses more organized and on top of shipments. When it comes to machine learning, it's important to remember how this is impacting human jobs as more automated machines take the place of human workers. If you own or manage a warehouse and you're interested in integrating machine learning tech, it's important to consider what will happen to employees. While machine learning offers many benefits to the company, try to move your employees around to other human-based areas of the business. Here are some ways that you can begin using machine learning in a warehouse environment.

Machine learning refers to a family of algorithms and statistical techniques that a computer uses to notice patterns and, essentially, learn how to complete a given task. It is a subset of artificial intelligence, the development of computers able to carry out tasks typically performed only by humans. Unlike robotic machines that are programmed to do one specific task or movement, machine learning lets the computer analyze and understand data so it can figure out how to do the task rather than mindlessly carry out an order. This means that machines with deep reinforcement learning (DRL) are able to sense their surroundings and react to a limited extent. One of the greatest benefits of machine learning is that it can eliminate a lot of human error, though machine learning is not perfect either.

Machine learning can vastly improve your overall supply chain because these systems are designed to pick up patterns. If a model analyzes your supply chain data, it may be able to notice areas where defects are created or identify parts of the system that can be improved, making the entire process more efficient. With a human assessor, it would take more time to inspect each product and notice a pattern of defective items. A computer can do this analysis quickly, and there is a smaller chance that the machine will accidentally skip over a defect, whereas the human eye may miss something that could become a larger problem in the future.
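One simple version of that pattern-spotting is a z-score rule over per-station defect rates; the stations and numbers here are invented for illustration:

```python
def flag_anomalies(defect_rates, z_cutoff=2.0):
    # Flag any station whose defect rate sits more than z_cutoff standard
    # deviations above the mean rate across the line.
    rates = list(defect_rates.values())
    mean = sum(rates) / len(rates)
    std = (sum((r - mean) ** 2 for r in rates) / len(rates)) ** 0.5
    if std == 0:
        return []
    return [s for s, r in defect_rates.items() if (r - mean) / std > z_cutoff]

# Hypothetical per-station defect rates; "packing" is running hot.
defect_rates = {
    "cutting": 0.010, "welding": 0.012, "painting": 0.011,
    "assembly": 0.009, "labeling": 0.013, "packing": 0.090,
}
suspect = flag_anomalies(defect_rates)  # ["packing"]
```

A production system would learn subtler patterns than a single threshold, but the principle is the same: surface the part of the chain that deviates from the rest.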

Many companies are beginning to move toward entirely automated warehouses in which machines perform the tasks of preparing packages for shipment and tracking inventory. Although this would eliminate human jobs, it would be much more efficient and, again, eliminate the chance of error. Still, there would be a need for humans to help fix machines and oversee the process, shifting the jobs from one segment of the warehouse industry to another. In the beginning, it may seem expensive to invest in the equipment, but in the long run, having machine-learned robots run the warehouse would reduce your overhead costs.

Not only can machine-learned computers package your shipments, but they are also able to organize products. From the moment a shipment enters the warehouse, these devices can scan and record it, keeping accurate track of your inventory. For employees working in warehouses, this task can be monotonous and time-consuming, but when machines are used in place of humans, the task can be completed much more quickly, leaving your employees with more time to carry out tasks that only a human can accomplish.

If your warehouse is carrying items that have a specific expiration date or food products that can go bad, you want to ensure you don't store any of these items past the sell-by date. With machines that have been conditioned with some level of artificial intelligence, it's easy to transmit data detailing when items will expire and need to be sold or disposed of. Humans can easily forget which items need to be sent out first, which causes waste when products are thrown away. Machine learning can help reduce this issue. Integrating eco-friendly packaging into your warehouse procedures can also help reduce waste by lowering your warehouse's carbon footprint.

It's always important to provide your customers with actual human customer support, as it can be frustrating trying to explain an issue to a robot. Still, there are some benefits to machine-learned customer support. If you have a website for your warehouse, you can add a support chat feature that allows people to communicate with a computer via a messaging app. This is a great way to let people ask quick and simple questions without clogging your phone lines or subjecting them to long hold times. You can also use automated customer support on your phone line to filter out simple questions, but you must always offer a human support option as well.

Warehouses can be dangerous places, with so many heavy boxes and large machinery moving around one space. Of course, you should have strict safety practices in place to keep your employees as safe as possible. When you integrate machines into the process, you can make the environment even less dangerous and improve warehouse safety. If AI robots are responsible for driving dangerous machinery and storing inventory in hard-to-reach places, it's less likely that an accident will occur. And even if one does, a human worker will not be the one to suffer the consequences.

If you're looking for ways to upgrade your warehousing procedures, consider adding some machines that have been programmed with machine learning capabilities. You'll be able to make your business more efficient and reduce the chances of workplace injuries. You don't need to automate the entire warehouse if you value the work and impact of human employees, but you'll find that machine-learned robots can speed up the process and make things easier for workers as well.



See the original post:
6 Ways Machine Learning Is Revolutionizing the Warehouse - Robotics Tomorrow

What Will Be the Future Prospects Of the Machine Learning Software Market? Trends, Factors, Opportunities and Restraints – Science In Me

Regal Intelligence has added the latest report on the Machine Learning Software Market to its offering. The global market for Machine Learning Software is expected to grow at an impressive CAGR during the forecast period. Furthermore, this report provides a complete overview of the Machine Learning Software Market, offering a comprehensive insight into historical market trends, performance, and the 2020 outlook.

The report sheds light on the highly lucrative Global Machine Learning Software Market and its dynamic nature. The report provides a detailed analysis of the market to define, describe, and forecast the global Machine Learning Software market, based on components (solutions and services), deployment types, applications, and regions with respect to individual growth trends and contributions toward the overall market.

Request a sample of Machine Learning Software Market report @ https://www.regalintelligence.com/request-sample/102477

Market Segment as follows:

The global Machine Learning Software Market report focuses closely on key industry players to identify potential growth opportunities; increased marketing activity is also projected to accelerate market growth throughout the forecast period. Additionally, the market is expected to grow immensely throughout the forecast period owing to several primary factors fuelling the growth of this global market. Finally, the report provides detailed profiles and data analysis of the leading Machine Learning Software companies.

Key Companies included in this report: Microsoft, Google, TensorFlow, Kount, Warwick Analytics, Valohai, Torch, Apache SINGA, AWS, BigML, Figure Eight, Floyd Labs

Market by Application: Application A, Application B, Application C

Market by Types: On-Premises, Cloud Based

Get Table of Contents @ https://www.regalintelligence.com/request-toc/102477

The Machine Learning Software Market research presents a study combining primary as well as secondary research. The report gives insights on the key factors concerned with generating and limiting Machine Learning Software market growth. Additionally, the report studies competitive developments such as mergers and acquisitions, new partnerships, new contracts, and new product developments in the global Machine Learning Software market. The past trends and future prospects included in this report make it highly comprehensible for analysis of the market. Moreover, the latest trends, product portfolio, demographics, geographical segmentation, and regulatory framework of the Machine Learning Software market have also been included in the study.

Global Machine Learning Software Market Research Report 2020

Buy The Report @ https://www.regalintelligence.com/buyNow/102477

To conclude, the report presents SWOT analysis to sum up the information covered in the global Machine Learning Software market report, making it easier for the customers to plan their activities accordingly and make informed decisions. To know more about the report, get in touch with Regal Intelligence.

Read more:
What Will Be the Future Prospects Of the Machine Learning Software Market? Trends, Factors, Opportunities and Restraints - Science In Me

How Microsoft Teams will use AI to filter out typing, barking, and other noise from video calls – VentureBeat

Last month, Microsoft announced that Teams, its competitor to Slack, Facebook's Workplace, and Google's Hangouts Chat, had passed 44 million daily active users. The milestone overshadowed its unveiling of a few new features coming later this year. Most were straightforward: a hand-raising feature to indicate you have something to say, offline and low-bandwidth support to read chat messages and write responses even if you have poor or no internet connection, and an option to pop chats out into a separate window. But one feature, real-time noise suppression, stood out: Microsoft demoed how the AI minimized distracting background noise during a call.

We've all been there. How many times have you asked someone to mute themselves or to relocate from a noisy area? Real-time noise suppression will filter out someone typing on their keyboard while in a meeting, the rustling of a bag of chips (as you can see in the video above), and a vacuum cleaner running in the background. AI will remove the background noise in real time so you can hear only speech on the call. But how exactly does it work? We talked to Robert Aichner, Microsoft Teams group program manager, to find out.

The use of collaboration and video conferencing tools is exploding as the coronavirus crisis forces millions to learn and work from home. Microsoft is pushing Teams as the solution for businesses and consumers as part of its Microsoft 365 subscription suite. The company is leaning on its machine learning expertise to ensure AI features are one of its big differentiators. When it finally arrives, real-time background noise suppression will be a boon for businesses and households full of distracting noises. Additionally, how Microsoft built the feature is also instructive to other companies tapping machine learning.

Of course, noise suppression has existed in the Microsoft Teams, Skype, and Skype for Business apps for years. Other communication tools and video conferencing apps have some form of noise suppression as well. But that noise suppression covers stationary noise, such as a computer fan or air conditioner running in the background. The traditional noise suppression method is to look for speech pauses, estimate the baseline of noise, assume that the continuous background noise doesn't change over time, and filter it out.
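The traditional pipeline described above (estimate a noise baseline during speech pauses, then subtract it) can be sketched as classic spectral subtraction. This is a minimal illustration of the general technique, not Microsoft's implementation; the frame length and the assumption that the opening frames contain only noise are choices made here for simplicity.

```python
import numpy as np

def spectral_subtraction(noisy, frame_len=320, noise_frames=10):
    """Stationary-noise suppression: estimate a noise floor from the
    first few frames (assumed to be a speech pause), then subtract that
    floor from every frame's magnitude spectrum."""
    # Split the signal into non-overlapping frames. A real system would
    # use overlapping windows with overlap-add reconstruction.
    n = len(noisy) // frame_len
    frames = noisy[: n * frame_len].reshape(n, frame_len)
    spectra = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spectra), np.angle(spectra)

    # Noise baseline: mean magnitude over the leading "silent" frames,
    # assumed constant for the rest of the signal (the key limitation).
    noise_floor = mag[:noise_frames].mean(axis=0)

    # Subtract the floor, clamping at zero so magnitudes stay valid,
    # and resynthesize using the original phase.
    clean_mag = np.maximum(mag - noise_floor, 0.0)
    clean = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)
    return clean.reshape(-1)
```

Because the noise estimate is frozen after those first frames, anything non-stationary (a bark, a slammed door) passes straight through, which is exactly the gap the machine learning approach addresses.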

Going forward, Microsoft Teams will suppress non-stationary noises like a dog barking or somebody shutting a door. "That is not stationary," Aichner explained. "You cannot estimate that in speech pauses. What machine learning now allows you to do is to create this big training set, with a lot of representative noises."

In fact, Microsoft open-sourced its training set earlier this year on GitHub to advance the research community in that field. While the first version is publicly available, Microsoft is actively working on extending the data sets. A company spokesperson confirmed that as part of the real-time noise suppression feature, certain categories of noises in the data sets will not be filtered out on calls, including musical instruments, laughter, and singing.

Microsoft can't simply isolate the sound of human voices because other noises also happen at the same frequencies. On a spectrogram of a speech signal, unwanted noise appears in the gaps between speech and overlapping with the speech. It's thus next to impossible to filter out the noise: if your speech and noise overlap, you can't distinguish the two. Instead, you need to train a neural network beforehand on what noise looks like and what speech looks like.

To get his points across, Aichner compared machine learning models for noise suppression to machine learning models for speech recognition. For speech recognition, you need to record a large corpus of users talking into the microphone and then have humans label that speech data by writing down what was said. Instead of mapping microphone input to written words, in noise suppression you're trying to get from noisy speech to clean speech.

"We train a model to understand the difference between noise and speech, and then the model is trying to just keep the speech," Aichner said. "We have training data sets. We took thousands of diverse speakers and more than 100 noise types. And then what we do is we mix the clean speech without noise with the noise. So we simulate a microphone signal. And then you also give the model the clean speech as the ground truth. So you're asking the model, 'From this noisy data, please extract this clean signal, and this is how it should look like.' That's how you train neural networks [in] supervised learning, where you basically have some ground truth."

For speech recognition, the ground truth is what was said into the microphone. For real-time noise suppression, the ground truth is the speech without noise. By feeding a large enough data set (in this case, hundreds of hours of data), Microsoft can effectively train its model. "It's able to generalize and reduce the noise with my voice even though my voice wasn't part of the training data," Aichner said. "In real time, when I speak, there is noise that the model would be able to extract the clean speech [from] and just send that to the remote person."

Comparing the functionality to speech recognition makes noise suppression sound much more achievable, even though it's happening in real time. So why has it not been done before? Can Microsoft's competitors quickly recreate it? Aichner listed challenges for building real-time noise suppression, including finding representative data sets, building and shrinking the model, and leveraging machine learning expertise.

We already touched on the first challenge: representative data sets. The team spent a lot of time figuring out how to produce sound files that exemplify what happens on a typical call.

They used audiobooks to represent male and female voices, since speech characteristics do differ between the two. They used YouTube data sets with labeled data specifying that a recording includes, say, typing and music. Aichner's team then combined the speech data and noise data using a synthesizer script at different signal-to-noise ratios. By amplifying the noise, they could imitate different realistic situations that can happen on a call.
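The mixing step can be sketched as follows. `mix_at_snr` is a hypothetical helper name, and the scaling math is the standard way to hit a target signal-to-noise ratio, not the team's actual synthesizer script.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that adding it to `clean` yields the requested
    signal-to-noise ratio, returning (noisy_mixture, clean) as one
    supervised training pair: model input and ground truth."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Solve clean_power / (gain^2 * noise_power) = 10^(snr_db / 10)
    # for the noise gain.
    gain = np.sqrt(clean_power / (10 ** (snr_db / 10) * noise_power))
    return clean + gain * noise, clean
```

Sweeping `snr_db` over a range (say, -5 to 20 dB) turns one clean recording and one noise clip into many training examples of varying difficulty.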

But audiobooks are drastically different from conference calls. Would that not affect the model, and thus the noise suppression?

"That is a good point," Aichner conceded. "Our team did make some recordings as well to make sure that we are not just training on synthetic data we generate ourselves, but that it also works on actual data. But it's definitely harder to get those real recordings."

Aichner's team is not allowed to look at any customer data. Additionally, Microsoft has strict privacy guidelines internally. "I can't just simply say, 'Now I record every meeting.'"

So the team couldn't use Microsoft Teams calls. Even if they could (say, if some Microsoft employees opted in to have their meetings recorded), someone would still have to mark down when exactly distracting noises occurred.

"And so that's why we right now have some smaller-scale effort of making sure that we collect some of these real recordings with a variety of devices and speakers and so on," said Aichner. "What we then do is we make that part of the test set. So we have a test set which we believe is even more representative of real meetings. And then, we see if we use a certain training set, how well does that do on the test set? So ideally yes, I would love to have a training set which is all Teams recordings and have all types of noises people are listening to. It's just that I can't easily get the same number of the same volume of data that I can by grabbing some other open source data set."

I pushed the point once more: How would an opt-in program to record Microsoft employees using Teams impact the feature?

"You could argue that it gets better," Aichner said. "If you have more representative data, it could get even better. So I think that's a good idea to potentially in the future see if we can improve even further. But I think what we are seeing so far is even with just taking public data, it works really well."

The next challenge is to figure out how to build the neural network, what the model architecture should be, and iterate. The machine learning model went through a lot of tuning. That required a lot of compute. Aichner's team was of course relying on Azure, using many GPUs. Even with all that compute, however, training a large model with a large data set could take multiple days.

"A lot of the machine learning happens in the cloud," Aichner said. "So, for speech recognition for example, you speak into the microphone, that's sent to the cloud. The cloud has huge compute, and then you run these large models to recognize your speech. For us, since it's real-time communication, I need to process every frame. Let's say it's 10 or 20 millisecond frames. I need to now process that within that time, so that I can send that immediately to you. I can't send it to the cloud, wait for some noise suppression, and send it back."
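The frame-by-frame constraint Aichner describes shapes the whole design: the denoiser is called once per 10-20 ms chunk and must return before the next chunk arrives. A minimal sketch of that loop follows; the frame size and sample rate here are illustrative choices, not Teams' actual parameters.

```python
import numpy as np

SAMPLE_RATE = 16000
FRAME_MS = 20                               # 10-20 ms frames, per the article
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000  # 320 samples per frame

def process_stream(samples, denoise):
    """Run a denoiser the way a real-time client must: one frame at a
    time, so each processed frame can be sent to the remote party
    immediately instead of waiting on a cloud round trip."""
    out = []
    for start in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN):
        frame = samples[start:start + FRAME_LEN]
        out.append(denoise(frame))  # must finish well under FRAME_MS
    return np.concatenate(out)
```

Any per-frame model that cannot reliably finish inside the frame budget on a typical consumer CPU is unusable here, no matter how well it suppresses noise.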

For speech recognition, leveraging the cloud may make sense. For real-time noise suppression, it's a nonstarter. Once you have the machine learning model, you then have to shrink it to fit on the client. You need to be able to run it on a typical phone or computer. A machine learning model only for people with high-end machines is useless.

There's another reason why the machine learning model should live on the edge rather than the cloud. Microsoft wants to limit server use. Sometimes, there isn't even a server in the equation to begin with. For one-to-one calls in Microsoft Teams, the call setup goes through a server, but the actual audio and video signal packets are sent directly between the two participants. For group calls or scheduled meetings, there is a server in the picture, but Microsoft minimizes the load on that server. Doing a lot of server processing for each call increases costs, and every additional network hop adds latency. It's more efficient from a cost and latency perspective to do the processing on the edge.

"You want to make sure that you push as much of the compute to the endpoint of the user because there isn't really any cost involved in that. You already have your laptop or your PC or your mobile phone, so now let's do some additional processing. As long as you're not overloading the CPU, that should be fine," Aichner said.

I pointed out there is a cost, especially on devices that aren't plugged in: battery life. "Yeah, battery life, we are obviously paying attention to that too," he said. "We don't want you now to have much lower battery life just because we added some noise suppression. That's definitely another requirement we have when we are shipping. We need to make sure that we are not regressing there."

It's not just regression that the team has to consider, but progression in the future as well. Because we're talking about a machine learning model, the work never ends.

"We are trying to build something which is flexible in the future because we are not going to stop investing in noise suppression after we release the first feature," Aichner said. "We want to make it better and better. Maybe for some noise tests we are not doing as good as we should. We definitely want to have the ability to improve that. The Teams client will be able to download new models and improve the quality over time whenever we think we have something better."

The model itself will clock in at a few megabytes, but it won't affect the size of the client itself. He said, "That's also another requirement we have. When users download the app on the phone or on the desktop or laptop, you want to minimize the download size. You want to help the people get going as fast as possible."

"Adding megabytes to that download just for some model isn't going to fly," Aichner said. "After you install Microsoft Teams, later in the background it will download that model. That's what also allows us to be flexible in the future that we could do even more, have different models."

All the above requires one final component: talent.

"You also need to have the machine learning expertise to know what you want to do with that data," Aichner said. "That's why we created this machine learning team in this intelligent communications group. You need experts to know what they should do with that data. What are the right models? Deep learning has a very broad meaning. There are many different types of models you can create. We have several centers around the world in Microsoft Research, and we have a lot of audio experts there too. We are working very closely with them because they have a lot of expertise in this deep learning space."

The data is open source and can be improved upon. A lot of compute is required, but any company can simply leverage a public cloud, including the leaders Amazon Web Services, Microsoft Azure, and Google Cloud. So if another company with a video chat tool had the right machine learners, could they pull this off?

"The answer is probably yes, similar to how several companies are getting speech recognition," Aichner said. "They have a speech recognizer where there's also lots of data involved. There's also lots of expertise needed to build a model. So the large companies are doing that."

Aichner believes Microsoft still has a heavy advantage because of its scale. "I think that the value is the data," he said. "What we want to do in the future is like what you said, have a program where Microsoft employees can give us more than enough real Teams calls so that we have an even better analysis of what our customers are really doing, what problems they are facing, and customize it more towards that."

Original post:
How Microsoft Teams will use AI to filter out typing, barking, and other noise from video calls - VentureBeat

With A.I., the Secret Life of Pets Is Not So Secret – The New York Times

This article is part of our latest Artificial Intelligence special report, which focuses on how the technology continues to evolve and affect our lives.

Most dog owners intuitively understand what their pet is saying. They know the difference between a bark for "I'm hungry" and one for "I'm hurt."

Soon, a device at home will be able to understand them as well.

Furbo, a streaming camera that can dispense treats for your pet, snap photos and send you a notification if your dog is barking, provides a live feed of your home that you can check on a smartphone app.

In the coming months, Furbo is expected to roll out a new feature that allows it to differentiate among kinds of barking and alert owners if a dog's behavior appears abnormal.

"That's sort of why dogs were hired in the first place, to alert you of danger," said Andrew Bleiman, the North America general manager for Tomofun, the company that makes Furbo. "So we can tell you not only is your dog barking, but also if your dog is howling or whining or frantically barking, and send you basically a real emergency alert."

The ever-expanding world of pet-oriented technology now allows owners to toss treats, snap a dog selfie and play with the cat all from afar. And the artificial intelligence used in such products is continuing to refine what we know about animal behavior.

Mr. Bleiman said the new version of Furbo was a result of machine learning from the video data of thousands of users. It relied on 10-second clips captured with its technology that users gave feedback on. (Furbo also allows users to opt out of sharing their data.)

"The real evolution of the product has been on the computer vision and bioacoustics side, so the intelligence of the software," he said. "When you have a camera that stares at a dog all day and listens to dogs all day, the amount of data is just tremendous."

The Furbo team is even able to refine the data by the breed or size of a dog: "I can tell you, for example, that on average, at least as much as our camera picks up, a Newfoundland barks four times a day and a Husky barks 36 times a day."

Petcube is another interactive pet camera, the latest iteration of which is equipped with the Amazon Alexa voice assistant.

Yaroslav Azhnyuk, the company's chief executive and co-founder, is confident that A.I. is helping pet owners better understand their animals' behavior. The company is working on being able to detect unusual behaviors.

"We started applying algorithms to understand pet behavior and understand what they might be trying to say or how they are feeling," he said. "We can warn you that, 'OK, your dog's activity is lower than usual, you should maybe check with the vet.'"

Before the coronavirus pandemic forced many pet owners to work from home during the day, they were comforted by the ability to check on their pet in real time, which had driven demand for all kinds of cameras. Mr. Bleiman said the average Furbo user would check on their pet more than 10 times a day during the workweek.

Petcube users spent about 50 minutes a week talking to their pet through the camera, Mr. Azhnyuk said.

"The same way you want to call your mom or child, you want to call your dog or cat," he said. "We've seen people using Petcubes for turtles and for snakes and chickens and pigs, all kinds of animals."

Now that she's working from home as part of measures to contain the spread of coronavirus in New York City, Patty Lynch, 43, has plenty of time to watch her dog, Sadie. When she's away from her Battery Park apartment, she uses a Google Nest to keep an eye on her. Ms. Lynch originally bought the camera three years ago to stream video of Sadie while she recovered from surgery.

"I get alerts whenever she moves around," Ms. Lynch said. "I also get noise alerts if she starts barking at something. I'll be able to go in and then see her in real time and figure out what she's doing."

"Sometimes I just like to check in on her," she said. "I just look at her and she makes me smile."

Lionel P. Robert Jr., associate professor at the University of Michigan's School of Information and a core faculty member at Michigan's Robotics Institute, said A.I.-enabled technology has so far centered on the owner's need for assurance that their pet was OK while they were away from home.

He predicted that future technology would focus more on the wellness of the pet.

"There are a lot of people using these cameras because when they see their pet they feel assured and they feel comfortable. Right now, it's less for the pet and more for the humans," he said.

"Imagine if all that data was being fed to your veterinarian in real time and they're sending back data. The idea of well-being for the pet, its weight, how far it's walking."

Mr. Robert noted that other parts of the world had gone a step further with technology: "They're actually adopting robotic pets."

While products like Petcube and Furbo are mostly used by dog owners, there are A.I. devices out there for cats as well. Many people track them throughout the day using interactive cameras, and one start-up has devised an intelligent laser for automated playtime.

Yuri Brigance came up with the idea about four years ago, after his divorce. He was away from the house, working up to 10 hours a day, and was worried about his two cats at home.

"This idea came up of using a camera to track animals, where their positions are in the room, and moving the laser intelligently instead of randomly so that they have something more real to chase," he said.

The result was Felik, a toy that can be scheduled via an app for certain playtimes and has features such as zone restriction, which designates areas in the home the laser can't go, such as on furniture.

Mr. Brigance said his product did not store video in the cloud or require an internet connection to work, unlike many video products. It analyzes data on the device.

"We use machine-learning models to perform what's called semantic segmentation, which is basically separating the background, the room and all the objects in it, from interesting objects, things that are moving, like cats or humans," Mr. Brigance explained.
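A trained segmentation network is well beyond a few lines, but the underlying idea of splitting a static room from moving "interesting objects" can be illustrated with simple frame differencing. This toy stand-in is our own sketch, not Felik's actual model:

```python
import numpy as np

def moving_object_mask(prev_frame, frame, threshold=25):
    """Mark pixels that changed between consecutive grayscale frames as
    foreground (a moving cat), leaving the static room as background.
    A real system would use a trained segmentation network instead of
    raw differencing, which is fooled by lighting changes and shadows."""
    # Promote to signed ints so the subtraction cannot wrap around.
    diff = np.abs(frame.astype(int) - prev_frame.astype(int))
    return diff > threshold
```

The resulting boolean mask gives a tracker candidate positions for the cat, from which a planner can keep the laser moving just out of reach rather than jumping randomly.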

The device then determines where the cat has been and what it is currently doing, and predicts what it is about to do next, so it can create a playful game that mirrors chasing live prey.

The laser toy, Mr. Brigance said, has provided his cats, and those of his customers, with hours upon hours of playtime.

"Some people are using it almost on a daily basis and they're recording things like where they used to have a cat that would scratch furniture, that would get really agitated if it had nothing to do, that this actually prevents them from destroying the house," he said.

"Or cats that meow in the morning and try to wake up their owners: if you set a schedule for this thing to activate in the morning, it can distract the cat and let you sleep a little bit longer."

See the original post here:
With A.I., the Secret Life of Pets Is Not So Secret - The New York Times

What Is The Hiring Process Of Data Scientists At IBM? – Analytics India Magazine

In a world that is increasingly becoming digitalized, businesses are relying more heavily on data analytics to drive decision-making. In this setting, tech giant IBM has secured a firm footing in the domain of data science. With the opportunities that the company could offer in this space, how can aspirants get a leg up on a data science career with IBM?

According to the company's Asia Pacific Leader of Technical Elite Team for Data Warehouse & AI, Vishal Chahal, demonstrating holistic skills around ML Ops as well as Data Ops can go a long way.

"As a data scientist, experience in handling Data Ops has become far more important than just a candidate's educational background," he says. "They will need to demonstrate the stack skills where they have dealt with data before. A statistical background will be considered an added bonus," he adds.

The technical skills that IBM looks for in data science candidates encompass ML Ops, which includes some of the newer skills, like debiasing and machine learning model runtime management.

"In addition to that, they need to possess adequate skills in the areas of Data Ops, data wrangling and domain knowledge, which is essentially a cross section between industry knowledge and the applicability of machine learning in those industries," says Chahal.

Although the company does not overemphasize candidates' educational background, they need to have a good grasp of the relevant competencies mentioned above. With machine learning certifications abounding on several platforms, Chahal feels that may be a good approach for data science aspirants to upskill themselves.

These certifications can verify their awareness about various platforms, tools, libraries and packages that are being used across enterprises today, as well as the familiarity or the ability to work with open source or enterprise/vendor-specific tools.

In fact, IBM also offers code patterns on data science for free, which explores the use of machine learning approaches to different industry scenarios and solution domains.

ALSO READ: Why Companies Like IBM Are Coming Up With Free Data Science Courses

Although the benefits of certifications cannot be emphasized enough, industry requirements for data scientists have evolved with changing times. While online courses have their place in the sector, the industry today looks for data science stack skills in program development, which requires certification augmented with hands-on experience from working on projects. That is, the overall requirement is dual, and this trend is being observed in the hiring practices at IBM as well.

"If you are starting off your career as a data scientist, certifications will certainly help you establish your skill," says Chahal. "But what will give you the edge is demonstrating a few projects to prove that you have applied the acquired knowledge and skills," he adds.

According to him, having published or open-sourced data science code on GitHub, or having participated in Kaggle competitions, would prove a candidate's credentials that they have hands-on experience in different fields of data science. "For an accomplished data scientist, we look for experience of having worked on a variety of projects in the data science technology stack."

Sharath Kumar RK, who has been working as a data scientist at IBM for nearly four years, concurs. "While recruiters will test aspirants' ability to solve problems on paper, the prime focus will still be on their understanding of challenges at both a micro and a macro level," he says.

READ MORE: Is Data Science For You? This IBM Data Scientist Tells How To Figure It Out

According to Chahal, once hired, data scientists at IBM, while focused on getting insights, have to adopt three important data science-related best practices:

According to Chahal, while the popular hiring trend has seen the recruitment of experts from pure data science background with little or no industry experience in the beginning, this is no longer the case.

"Lately, the trend has moved towards recruiting data scientists with stack skills, including Data Ops and ML Ops, or data scientists possessing domain knowledge," he says. "Some companies continue to recruit pure data science experts. However, they do look for additional certification, which proves their ability to work across enterprise-wide platforms or open source tools," he adds.


View post:
What Is The Hiring Process Of Data Scientists At IBM? - Analytics India Magazine