Guide: Which video call apps are safe to use? – Malay Mail


KUALA LUMPUR, May 1 - As many around the world resort to remote working conditions due to the ongoing Covid-19 pandemic, the spotlight is certainly focused on the video call applications available today.

Most of us have heard about Zoom's notorious problems with privacy and security, but what about the alternatives?

Mozilla has just published a guide that takes a look at some of the more popular apps, specifically the security and privacy features of each platform.

The guide assesses each app against Mozilla's Minimum Security Standards: encryption, security updates, strong password requirements, vulnerability management, and privacy policies. Apps that meet all five criteria are certified as meeting the Minimum Security Standards.

Of the 15 apps discussed in the guide, only three platforms didn't make the cut for Mozilla: Houseparty, Discord, and Doxy.me.

However, the guide has recently been updated after Discord strengthened its requirements for strong passwords, which means that only two apps now fail to meet Mozilla's minimum standards.

Are Houseparty and Doxy.me unsafe?

Houseparty, the platform developed by Epic Games (the creators of Fortnite), isn't really a simple, standalone video conferencing app. Instead, the idea is for friends to hang out within the app; there are built-in games and Snapchat integration on the platform.

The issue here stems from the lack of a strong password requirement. On the app, all that is required is a minimum of five characters, which means that even passwords like 12345 are accepted.

However, the app uses encryption, receives regular security updates, manages vulnerabilities, and has a privacy policy that you can view here.

Doxy.me, on the other hand, received the lowest score of all 15 apps reviewed.

Again, there is no strong password requirement, and Doxy.me does not have a bug bounty program, although there are other mechanisms in place for users to report vulnerabilities.

The biggest issue, according to Mozilla, has to do with Doxy.me's simple, browser-based platform. The service is used by medical personnel to provide medical advice remotely, and does not require an app.

This means the platform fails the security updates requirement; instead, Mozilla advises users to keep their browsers up to date if and when using Doxy.me.

What about Zoom?

To be utterly frank with you, I expected Zoom to fall within the group of apps that don't meet the required standards.

However, despite the bad press the company has drawn, Mozilla says that the platform meets all five of its security standards.

Despite the prevalence of something called Zoombombing, where strangers hijack video conferences to broadcast inappropriate content, Mozilla appears to suggest that any past mistakes are being rectified.

To Zoom's credit, the company has acknowledged its mistakes and seems to be working hard to fix them.

It's also worth noting that while encryption is present in all of the apps discussed, there are different types or levels of encryption.

The type of encryption that most users really want in their messaging apps is end-to-end encryption, the kind present in apps like WhatsApp. Additionally, all of the apps support a feature that alerts participants if a video call is being recorded.

In general, competition is definitely heating up in the video conferencing space. Mozilla says that can only be a good thing for users, which certainly makes sense.

The case study of Zoom is a perfect example of how accountability should work here; the notoriety of the platform's problems resulted in a quick response from the company, with founder Eric Yuan even apologising for missteps in an interview on CNN.

And with remote working conditions expected to continue for now, it's worth thinking about the security and privacy aspects of video calls, especially for professional purposes. - SoyaCincau


Jesse Kline on COVID-19: Keeping government secure and saving taxpayer money with open source – National Post

In this era of social distancing, many have turned to videoconferencing as a means of staying in touch with friends, family and colleagues. And although it dragged its feet for quite some time, the House of Commons has now gone virtual, as well. But why is Parliament relying on a foreign company that's selling a piece of software with a raft of known security issues, instead of finding a made-in-Canada solution that would allow us to protect our data and save taxpayer money?

On Tuesday, the full House convened for the first time over Zoom, the videoconferencing software that has become a household name during this pandemic, with its user base exploding from 10 million daily users in December to 300 million today. Zoom, however, has come under increased scrutiny over its substandard security and lax privacy controls.

The company outright lied about using end-to-end encryption. We learned that it has access to decryption keys, meaning it can potentially snoop on conversations. A team from the University of Toronto found that the software was sometimes sending encryption keys through servers located in communist China, even if none of the participants in the call were from that country. And the term Zoombombing has entered the lexicon, with many meetings being spied on or actively disrupted by people spouting racism and displaying Nazi imagery.

A parliamentary spokesperson told CBC that the version of the software being used by the House has added security features and that most parliamentary proceedings are open to the public anyway, so privacy is less of an issue (cabinet meetings are being held using something else entirely).

Fair enough. But given that the FBI has warned teachers not to use Zoom and many companies such as Daimler, Ericsson, SpaceX and Postmedia and governments including Germany, Taiwan and Singapore have banned its use outright, it seems like Parliament should have had some reservations about it.

Much has been made in recent weeks about future-proofing Canada to withstand future crises by producing more supplies here at home. As I've written previously, this is problematic because protectionism doesn't ensure we have adequate supplies of a given product, and it's impossible to predict exactly what we will need to meet the next emergency.

When it comes to software, however, it's a different matter entirely, because there is a huge variety of free and open source software packages available that are already powering much of the world's critical infrastructure and can easily be adapted to Canada's needs.

For the uninitiated, open source refers to software that is developed in the open and given away for free. It is often written by teams that can include many people, from unpaid volunteers to employees of some of the world's largest tech firms. Even if you've never heard of open source, chances are that you are running it, or using technology that is based on it.

A majority of websites run on open source. The open source Linux operating system is the basis for Google's Android and Chrome OS systems, and powers a plethora of Internet of Things devices, from routers, to smart TVs, to home automation systems.

Another videoconferencing platform that's seen a sharp increase in popularity is Jitsi. While it's run by a company called 8x8, which offers free and paid plans, it's also open source, meaning anyone can run a Jitsi server, and anyone with enough knowledge can audit its source code to figure out exactly how it works and whether there are any potential security vulnerabilities.

The advantage of the government selecting open systems, like Jitsi, instead of proprietary ones, like Zoom, is that it would allow government to run all its systems in-house, instead of relying on foreign companies to transmit and store data.

It would also give government the ability to conduct security audits of its systems, which is much easier to do when you can see the code that a software package was built with, rather than trying to figure out how a black box works without being able to open it up.

And while there would be an initial cost to purchasing the necessary hardware and ensuring the government has the proper expertise to implement and maintain it, there would be significant savings for taxpayers in the long run, as the government would be able to stop paying for costly software licenses.

Jitsi is already being used by companies like WeSchool, an Italian firm that runs online classroom software that is being used by 500,000 educators and students during this crisis. And in February, the South Korean government began switching its desktops from Windows 7 to Linux, which it expects will save it significant sums of money in the future.

Security researchers have warned the government that Zoom is a privacy disaster waiting to happen. In order to protect our critical information technology infrastructure, especially that which is tasked with running our democratic institutions, from foreign interference and espionage, we need to seriously look at running these systems in Canada, with software we can trust.

Finding open source solutions is the best way to go about doing that.

National Post | jkline@nationalpost.com | Twitter.com/accessd


Red Hat’s current and former CEOs in conversation about open source – ITWeb

Paul Cormier, CEO of Red Hat, and Jim Whitehurst, president of IBM.

The Red Hat Summit was supposed to be held in San Francisco this year, but instead took place online.

The open source company's in-person events are grand affairs - the conference centre seems to be stained red as thousands of customers, partners, staff and the media flock to what must be the largest gathering of open source enthusiasts on the planet.

One benefit of holding the conference online is that more people can attend; 38 000 signed up for Red Hat Summit 2020 Virtual Experience, which took place from 28 to 29 April.

It fell to the company's new CEO, Paul Cormier, to kick the proceedings off from his home office near Boston, Massachusetts, this week. First, he spoke to Jim Whitehurst, the former CEO, who is now president of IBM and responsible for that company's cloud and cognitive software organisation. Whitehurst also oversaw the acquisition of Red Hat by IBM for $34 billion last year.

Whitehurst joined the company in 2007, and Cormier asked him what he thought the biggest change had been over the last 12 years. Without a doubt it had been the rise of open source software, Whitehurst said.

Back then, we were trying to convince people that open source could be a viable alternative to traditional software, so that Linux could replace Unix. I got a lot of questions around whether it's secure and reliable. If you look at where we've come from, it's the default choice now. That change has been amazing.

Red Hat had fewer than 2 000 people when he joined, and now employs over 15 000. Now, he said, almost every large enterprise isn't just using open source technologies in their stacks; they're also thinking about how to implement the open source way, such as Agile or DevOps methodologies.

(They're using open source) not just in how they develop software, but in how they run their businesses. People have learned that if they're going to innovate, it requires a different way of working.

Whitehurst said the biggest topic on IBM customers' minds is innovation.

How do you innovate? We went from a world where value creation was much more about how you executed, or made things of higher quality, or cheaper. And now value creation is more about growth - how you invent new things, or how you create whole new markets that didn't exist before. How do you invent new ways to interact with your customers?

As for the future of Red Hat within IBM, and whether it could continue to operate with any degree of autonomy, Whitehurst said the two companies had shared a vision of hybrid cloud long before Red Hat was acquired. While the larger company can help to scale the adoption of hybrid cloud, it also competes with partners in the open source ecosystem.

Red Hat has to stand alone so that it can work with competitors of IBM to ensure that the platform is neutral and available to anyone.

It's critical to ensure that the industry has a horizontal platform, and IBM supports that but recognises we have to leave Red Hat separate to ensure its success.

The summits are a chance for Red Hatters, as they're known, to dress up and show their allegiance to the company. Many wear red fedoras, a nod to the company's Linux distribution project. Others wear crimson dresses, or red shoes. Whitehurst usually wears a stylish shirt of the palest pink to the summits, but this year, in a perhaps symbolic move, he chose a shirt in a shade of blue not too far from that of the logo of his new company. He did, however, include a red fedora within eyeshot of his camera's lens.


Private Internet Access announces third year of WireGuard sponsorship – Privacy News Online

Private Internet Access is happy to announce that we are sponsoring the WireGuard project as a bronze company donor. Private Internet Access first sponsored WireGuard in 2018, and with our 2020 sponsorship has now been a WireGuard sponsor for three years. Private Internet Access believes in sponsorship as a way of giving back to the community and is proud to sponsor WireGuard.

This year's WireGuard sponsorship is special because it coincides with our release of WireGuard support on all desktop clients and mobile applications. Private Internet Access first released WireGuard support as part of a beta in March 2020 and brought it to all users in April 2020. Similarly, CyberGhost and ZenMate, other VPN companies under the same parent company, KAPE, have also supported WireGuard as bronze company donors.

Private Internet Access is committed to providing a secure and private no-logging VPN service to our customers, and as part of that commitment, we sponsor open source projects and organizations that champion security, privacy, and civil liberties. WireGuard is a new VPN protocol that has been widely acclaimed, so much so that it is now part of the Linux kernel. Learn more about WireGuard in the PIA WireGuide.

Because it is a free and open source software (FOSS) project, WireGuard development is supported by developers that donate their time, as well as companies that donate funds. FOSS underlies much of the internet and computing technologies that we use today and Private Internet Access will continue to support FOSS projects like WireGuard because that is part of the PIA ethos. Without the support of the entire community, projects like WireGuard would not be able to exist to advance our internet. View a full list of organizations and projects sponsored by Private Internet Access on this page.

WireGuard is a registered trademark of Jason A. Donenfeld.


IOTech: Bridging the OT-IT Divide – EE Journal

I've just been talking to the folks from a jolly interesting company called IOTech. One of the things they told me that really struck a chord was that their technology bridges the OT-IT divide. But what is the OT-IT divide, and why does it need to be bridged? I hear you cry. Well, I'm glad you asked, because I feel the urge to expound, explicate, and elucidate (don't worry; I'm a professional).

As an aside, IOTech started in Newcastle upon Tyne, which is a university city on the River Tyne in northeast England (IOTech is now headquartered in Edinburgh, with sales and marketing throughout Europe and America). With its twin city, Gateshead, Newcastle used to be a major shipbuilding and manufacturing hub during the Industrial Revolution; it's now transmogrified itself into a center of business, arts, and sciences. The reason I mention this here is that I almost went to university in Newcastle: I went up there for the interviews and thought it was a fantastic institution in a gorgeous city, but I ended up getting a better offer from Sheffield Hallam University (that is, Sheffield said they'd accept me. LOL).

As another aside, the term Geordie is both a nickname for a person from the Tyneside area of North East England and the dialect used by its inhabitants. All the Geordies I've ever met have had a wonderful sense of humor, and the Geordie dialect has a sing-song quality that's very pleasing to the ear, but I fear we are in danger of wandering off into the weeds...

Now, this next part isn't an aside (I know you're surprised), but rather it is laying a foundation for what is to come. Have you noticed how everybody seems to be talking about the edge at the moment? It seems you can barely start to read an IoT-centric article without hearing tell of things like edge analytics, edge computing, and edge security. Another area of confusion is when people talk about the Internet of Things (IoT) and the Industrial IoT (IIoT). What exactly do we mean by these terms, and how do they relate to the edge?

The problem is that everyone has different understandings as to what the term edge actually means. In fact, we discussed a lot of this in my column What the FAQ is the Edge vs. the Far Edge? As we noted in that column, part of this depends on who you are and your function in the world. If you work for a cloud service provider, for example, then the people who connect to and use your services may live thousands of miles away, so you might regard almost anything outside of your facility, including internet service providers (ISPs), as being the edge.

By comparison, if you are an ISP, then you will regard your customers in the form of homes, stores, factories, etc. as being the edge. Even when you get to a factory, for example, different people will have different views as to what constitutes the edge.

The term edge means different things to different people (Image source: Max Maxfield)

At some stage, we may want to talk about the IoT and IIoT devices that are located at the very edge of the internet, the ones containing the sensors and actuators that interface to the real world. In this case, some people use terms like the far edge and the extreme edge as an aid to visualizing the relative location of these devices in the scheme of things.

But wait, there's more (I can barely believe it myself). The term information technology (IT) refers to the use of computers and networks to store, retrieve, transmit, and manipulate data or information. In the case of an industrial environment like a factory, all of this is overseen by the company's IT department. I know this is unfair, but I can't help but visualize these folks as relaxing in a luxurious air-conditioned common room sipping cups of designer coffee while occasionally deigning to field questions from their users by superciliously saying things like, Are you sure you've turned it on? You have? Well, in that case turn it off, then turn it on again, and then call back if it's still not working! (No, of course I'm not bitter; why do you ask?)

But what about the heroes working in the trenches, those charged with the monitoring and control of physical devices, such as motors, generators, valves, and pumps? These brave guys and gals are the cream of the operational technology (OT) department.

All of which leads us to the terms thin (OT) edge and thick (IT) edge. The idea here is that the thin edge refers to the domain of the OT group working with automation-centric data, while the thick edge refers to the realm of the IT group, who want to take that data and run with it, but who cannot access or use it in its raw form. Thus, one of the primary roles of the OT group is to convert the raw data into a form that can be consumed by the IT department. (I'm reminded of the phrase the thin blue line, which refers figuratively to the position of police in society as the force that holds back chaos. Similarly, we can think of the OT folks at the thin edge as holding back the chaos of the real world while transforming it into something the rest of us can use.)

The problem is that, assuming they deign to talk to each other at all, the OT and IT teams speak different languages. What is required is some way to bridge the divide between these groups, which brings us back to the folks at IOTech, whose technology does just that (it's like déjà vu all over again; hang on, didn't somebody just say that?).

Let's start with EdgeX Foundry, which is a vendor-neutral open-source platform hosted by the Linux Foundation that provides a common framework for IIoT edge computing. The goal of the EdgeX Foundry is the simplification and standardization of edge computing architectures applicable in IIoT scenarios.

In this context, the term South Side refers to a heterogeneous set of devices, sensors, actuators, and other IoT objects that produce the raw data. Contrariwise, the term North Side refers to the fog and/or the cloud where the data will eventually be aggregated, stored, and analyzed. The role of the EdgeX Foundry is to take the raw data from the South Side (the domain of the OT group) and treat and process it into a form suitable to be handed over to the North Side (the realm of the IT department).

The term microservices refers to a software development technique, a variant of the service-oriented architecture structural style, that arranges an application as a collection of loosely coupled services. In a microservices architecture, services are fine-grained, and the protocols are lightweight. The reason I mention this here is that the EdgeX Foundry platform is structured in different layers, each one composed of multiple microservices. This modular architecture allows users to easily scale, update, and distribute the logic into different systems, while also improving maintainability.

Having said all this, companies typically don't directly deploy open source software for business-critical, safety-critical, or mission-critical applications. Instead, they go to someone that will offer enterprise-level services and support. Consider the relationship between the Linux operating system (OS) and Red Hat, for example. Linux is open source, so companies could theoretically simply load it onto all of their workstations and servers themselves for free. Instead, they prefer to go to a company like Red Hat, which offers enterprise-level service and support. (Founded in 1993, Red Hat was acquired by IBM in 2019.)

The way I think of this is that the Arduino is open source, and the folks at Arduino provide all of the hardware and software files I need to build my own boards, but even so I still find it quicker, easier, and more reliable to buy fully functional Arduinos from trusted vendors.

Based on this, we might think of IOTech as being the Red Hat of the EdgeX Foundry world. In addition to providing commercial support for trusted deployment of the open source baseline EdgeX Foundry platform, IOTech also offers numerous value-add features that are not available in the baseline version, such as the following:

Let's start with IOTech's edgeXpert. On the South Side, edgeXpert provides a wealth of connectors and profiles that can ingest raw data from sensors in any way, shape, or form: anything from the twisted-wire electrical specifications and protocols of the 1960s to the packet-based interfaces of today's most sophisticated sensor devices (the connectors handle the protocols, while the profiles specify what to expect coming in and what to send out). On the North Side, there are the enterprise-level connectors and profiles required to export the processed data into the fog and the cloud.

The IOTech edge platform solution (Image source: IOTech)

In between, the IOTech edge platform solution includes core data services, such as the ability to normalize edge data, aggregate edge data from multiple sensors, and store edge data for subsequent use. There are also services to analyze and process the data, along with services for security and management.

One key point is that edgeXpert boasts a distributable architecture: its various microservices can run on a single host or be distributed across a cluster based on resource availability. In some cases, portions of edgeXpert might run on the edge devices themselves, such as smart surveillance cameras, for example. The containerized deployment of microservices supports portability, while distribution of microservices provides scalability and failover support, where failover is a method of protecting computer systems from failure, in which standby equipment automatically takes over when the main system fails.

edgeXpert boasts a distributable architecture (Image source: IOTech)

In the same way that you don't want the airbag in your car deploying based on a decision that's made in the cloud, there are some situations in factories when a decision needs to be made and acted on quickly. Thus, edgeXpert can also be used to provide rules-based control functions along the lines of If this temperature exceeds this value, then turn the machine off (more sophisticated control systems from other vendors can be integrated into the framework).
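Purely as an illustration of what such a rule looks like, here is a generic Python sketch; the names and values are hypothetical, and this is not edgeXpert's actual rules engine or API:

# Hypothetical names throughout; a generic threshold rule, not IOTech's API.
SHUTDOWN_THRESHOLD_C = 85.0  # assumed temperature limit for this example

def apply_temperature_rule(reading_c, machine_id):
    """Decide what to do with a single temperature reading at the edge."""
    if reading_c > SHUTDOWN_THRESHOLD_C:
        # In a real deployment this would issue an actuator command locally,
        # without waiting for a round trip to the cloud.
        return f"turn_off:{machine_id}"
    return "no_action"

print(apply_temperature_rule(92.3, "press-07"))  # -> turn_off:press-07
print(apply_temperature_rule(71.0, "press-07"))  # -> no_action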

The good news is that edgeXpert is ideal for situations that can be addressed with near real-time responsiveness, which equates to around 80% of use cases. The better news is that, for those time-critical systems that require ultra-low latency response times and support for real-time deterministic data processing, the edgeXrt extension to edgeXpert can be brought into play.

I get to talk to a lot of companies that are working on super cool technologies like artificial intelligence (AI) and machine learning (ML). Very often, the folks from these companies drop terms like the AIoT into the conversation. As I noted in my column What the FAQ are the IoT, IIoT, IoHT, and AIoT?:

According to the IoT Agenda, The Artificial Intelligence of Things (AIoT) is the combination of artificial intelligence (AI) technologies with the Internet of Things (IoT) infrastructure to achieve more efficient IoT operations, improve human-machine interactions, and enhance data management and analytics [] the AIoT is transformational and mutually beneficial for both types of technology as AI adds value to IoT through machine learning capabilities and IoT adds value to AI through connectivity, signaling, and data exchange.

I couldn't have said it better myself. I remember many occasions where I've been lured, beguiled, and tempted by promises of delight with regard to all the wonderful tasks these AI systems could perform in an industrial setting. For example, I've heard tales of how an AI system running on the edge (whatever that is) or in the fog or in the cloud can monitor the vibration of a machine using a 3-axis accelerometer, or listen to it using a MEMS microphone, and detect subtle anomalies and patterns and use these to predict timelines for future problems and to automatically schedule preemptive maintenance.

The one thing I don't recall any of these companies talking about is how they take the noisy (from electrical interference) and dirty (with missing or false values) raw data from myriad diverse sensors and wrangle it into a form that's suitable for their systems to peruse and ponder. Of course, I've now discovered that IOTech provides this oh-so-important piece of the jigsaw, thereby bridging the OT-IT divide.



Machine Learning Tutorial for Beginners

What is Machine Learning?

Machine Learning is a system that can learn from examples through self-improvement, without being explicitly coded by a programmer. The breakthrough comes with the idea that a machine can learn from data (i.e., examples) on its own to produce accurate results.

Machine learning combines data with statistical tools to predict an output. This output is then used by businesses to generate actionable insights. Machine learning is closely related to data mining and Bayesian predictive modeling. The machine receives data as input and uses an algorithm to formulate answers.

A typical machine learning task is to provide a recommendation. For those who have a Netflix account, all recommendations of movies or series are based on the user's historical data. Tech companies are using unsupervised learning to improve the user experience with personalized recommendations.

Machine learning is also used for a variety of tasks such as fraud detection, predictive maintenance, portfolio optimization, task automation, and so on.


Traditional programming differs significantly from machine learning. In traditional programming, a programmer codes all the rules in consultation with an expert in the industry for which the software is being developed. Each rule is based on a logical foundation; the machine will execute an output following the logical statement. When the system grows complex, more rules need to be written, and it can quickly become unsustainable to maintain.

Machine learning is supposed to overcome this issue. The machine learns how the input and output data are correlated and it writes a rule. The programmers do not need to write new rules each time there is new data. The algorithms adapt in response to new data and experiences to improve efficacy over time.

Machine learning is the brain where all the learning takes place. The way the machine learns is similar to the human being. Humans learn from experience: the more we know, the more easily we can predict. By analogy, when we face an unknown situation, the likelihood of success is lower than in a known situation. Machines are trained the same way. To make an accurate prediction, the machine sees examples. When we give the machine a similar example, it can figure out the outcome. However, like a human, if it is fed a previously unseen example, the machine has difficulty predicting the outcome.

The core objective of machine learning is learning and inference. First of all, the machine learns through the discovery of patterns. This discovery is made thanks to the data. One crucial part of the data scientist's job is to choose carefully which data to provide to the machine. The list of attributes used to solve a problem is called a feature vector. You can think of a feature vector as a subset of the data that is used to tackle a problem.

The machine uses some fancy algorithms to simplify the reality and transform this discovery into a model. Therefore, the learning stage is used to describe the data and summarize it into a model.

For instance, suppose the machine is trying to understand the relationship between the wage of an individual and the likelihood of going to a fancy restaurant. It turns out the machine finds a positive relationship between wage and going to a high-end restaurant: this is the model.

When the model is built, it is possible to test how powerful it is on never-seen-before data. The new data are transformed into a feature vector, passed through the model, and turned into a prediction. This is the beautiful part of machine learning: there is no need to update the rules or retrain the model. You can use the previously trained model to make inferences on new data.
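As a concrete sketch of that learn-then-infer cycle, here is a minimal example using scikit-learn; the wage figures and labels below are invented purely for illustration, not real data:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Feature vector: a single attribute here, the individual's monthly wage.
wages = np.array([[1500], [1800], [2500], [4000], [5200], [7500], [9000]])
# Label: 1 = goes to high-end restaurants, 0 = does not.
goes_out = np.array([0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(wages, goes_out)  # learning stage: the data are summarized into a model

# Inference stage: never-seen-before data go through the same model.
new_customers = np.array([[2000], [6500]])
print(model.predict(new_customers))  # e.g. [0 1]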

The life of a machine learning program is straightforward: collect and prepare the data, train the algorithm on it, evaluate the resulting model, and then use that model to make predictions on new data.

Once the algorithm gets good at drawing the right conclusions, it applies that knowledge to new sets of data.

Machine learning can be grouped into two broad learning tasks: supervised and unsupervised. There are many algorithms within each group.

In supervised learning, an algorithm uses training data and feedback from humans to learn the relationship of given inputs to a given output. For instance, a practitioner can use marketing expense and weather forecasts as input data to predict the sales of cans.

You can use supervised learning when the output data are known; the algorithm will then predict outputs for new data.

There are two categories of supervised learning: classification and regression.

Classification: imagine you want to predict the gender of a customer for a commercial. You would start by gathering data on the height, weight, job, salary, purchasing basket, etc. from your customer database. You know the gender of each of your customers; it can only be male or female. The objective of the classifier is to assign a probability of being male or female (i.e., the label) based on the information (i.e., the features you have collected). Once the model has learned to recognize male or female, you can use new data to make a prediction. For instance, you have just received new information from an unknown customer, and you want to know whether the customer is male or female. If the classifier predicts male = 70%, it means the algorithm is 70% sure that this customer is male, and 30% sure that the customer is female.

The label can have two or more classes. The example above has only two classes, but if a classifier needs to predict objects, it can have dozens of classes (e.g., glass, table, shoes, etc.; each object represents a class).
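A hedged sketch of such a classifier with scikit-learn follows; the heights, weights, and salaries are fabricated records used only to show the mechanics, not a recommendation of which features to use:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Invented customer records: height (cm), weight (kg), salary (dollars).
X = np.array([
    [160, 55, 3000],
    [165, 60, 3500],
    [158, 52, 2800],
    [175, 80, 4000],
    [182, 90, 4500],
    [178, 85, 5000],
])
y = np.array([0, 0, 0, 1, 1, 1])  # known labels from the database: 0 = female, 1 = male

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

# A new, unknown customer: the classifier assigns a probability to each label.
unknown = np.array([[172, 75, 3800]])
proba = clf.predict_proba(unknown)[0]
print(f"female: {proba[0]:.0%}, male: {proba[1]:.0%}")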

Regression: when the output is a continuous value, the task is a regression. For instance, a financial analyst may need to forecast the value of a stock based on a range of features such as equity, previous stock performance, and macroeconomic indices. The system will be trained to estimate the price of the stocks with the lowest possible error.
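For comparison, a regression sketch under the same caveat: the feature values and prices below are fabricated, and a real forecasting model would use far more data.

import numpy as np
from sklearn.linear_model import LinearRegression

# Invented features per observation: equity, previous performance, macro index.
X = np.array([
    [10.0,  0.02, 101.5],
    [12.5,  0.05, 102.1],
    [ 9.8, -0.01, 100.9],
    [15.0,  0.07, 103.4],
    [11.2,  0.03, 101.9],
])
y = np.array([25.0, 31.0, 22.5, 38.0, 27.5])  # observed stock prices

reg = LinearRegression().fit(X, y)
print(reg.predict([[13.0, 0.04, 102.5]]))  # estimated price for a new observation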

In unsupervised learning, an algorithm explores input data without being given an explicit output variable (e.g., it explores customer demographic data to identify patterns).

You can use it when you do not know how to classify the data and you want the algorithm to find patterns and classify the data for you.
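For example, here is a minimal sketch of k-means clustering (one of the algorithms listed below) with scikit-learn; the income and spending figures are fabricated for illustration:

import numpy as np
from sklearn.cluster import KMeans

# Fabricated customers: annual income (thousands) and a spending score.
X = np.array([
    [15, 39], [16, 81], [17, 6], [18, 77],
    [60, 42], [62, 55], [70, 29], [71, 95],
])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # the group each customer was assigned to
print(kmeans.cluster_centers_)  # the centre of each discovered group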

The most common unsupervised algorithms can be summarized as follows (name, what it does, and type):

K-means clustering: puts data into a number of groups (k), each containing data with similar characteristics (as determined by the model, not in advance by humans). Type: clustering.

Gaussian mixture model: a generalization of k-means clustering that provides more flexibility in the size and shape of the groups (clusters). Type: clustering.

Hierarchical clustering: splits clusters along a hierarchical tree to form a classification system; can be used, for example, to cluster loyalty-card customers. Type: clustering.

Recommender system: helps to define the relevant data for making a recommendation. Type: clustering.

PCA/t-SNE: mostly used to decrease the dimensionality of the data; these algorithms reduce the number of features to the three or four vectors with the highest variances. Type: dimension reduction.
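As an illustration of the dimension-reduction entry above, here is a small PCA sketch on random data; the numbers are purely illustrative, and real use would apply it to genuine features:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))        # 100 samples with 10 original features

pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X)      # the same samples, now described by 3 components
print(X_reduced.shape)                # (100, 3)
print(pca.explained_variance_ratio_)  # share of the variance kept by each component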

There are plenty of machine learning algorithms. The choice of the algorithm is based on the objective.

In the example below, the task is to predict the type of flower among three varieties. The predictions are based on the length and the width of the petal. The picture depicts the results of ten different algorithms. The picture on the top left is the dataset. The data are classified into three categories: red, light blue and dark blue. There are some groupings. For instance, in the second image, everything in the upper left belongs to the red category; in the middle part, there is a mixture of uncertainty and light blue, while the bottom corresponds to the dark blue category. The other images show the different algorithms and how they try to classify the data.
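Along the same lines, here is a small sketch that uses the classic iris dataset (three varieties, petal length and width) as a stand-in for the example above and compares a few classifiers, rather than the ten shown in the pictures:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

iris = load_iris()
X = iris.data[:, 2:4]  # petal length and petal width only
y = iris.target        # three flower varieties

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=200),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "k-nearest neighbours": KNeighborsClassifier(n_neighbors=5),
    "SVM (RBF kernel)": SVC(kernel="rbf"),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.2f}")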

The primary challenge of machine learning is a lack of data or a lack of diversity in the dataset. A machine cannot learn if there is no data available. Besides, a dataset with a lack of diversity gives the machine a hard time: a machine needs heterogeneity to learn meaningful insights, and it is rare that an algorithm can extract information when there are no or few variations. It is recommended to have at least 20 observations per group to help the machine learn. A lack of data also leads to poor evaluation and prediction.

Typical applications of machine learning fall into two broad buckets, augmentation (machine learning assisting humans in their tasks) and automation (machines performing tasks end to end), and they span sectors such as the finance industry, government organizations, the healthcare industry, and marketing.

Example of application of Machine Learning in Supply Chain

Machine learning gives terrific results for visual pattern recognition, opening up many potential applications in physical inspection and maintenance across the entire supply chain network.

Unsupervised learning can quickly search for comparable patterns in a diverse dataset. In turn, the machine can perform quality inspection throughout the logistics hub and flag shipments with damage and wear.

For instance, IBM's Watson platform can determine shipping container damage. Watson combines visual and systems-based data to track, report and make recommendations in real-time.

In past years, stock managers relied extensively on basic methods to evaluate and forecast inventory. By combining big data and machine learning, better forecasting techniques have been implemented (an improvement of 20 to 30% over traditional forecasting tools). In terms of sales, this means an increase of 2 to 3% due to the potential reduction in inventory costs.

Example of Machine Learning: The Google Car

For example, everybody knows the Google car. The car is full of lasers on the roof which are telling it where it is regarding the surrounding area. It has radar in the front, which is informing the car of the speed and motion of all the cars around it. It uses all of that data to figure out not only how to drive the car but also to figure out and predict what potential drivers around the car are going to do. What's impressive is that the car is processing almost a gigabyte a second of data.

Machine learning is the best tool so far to analyze, understand and identify patterns in data. One of the main ideas behind machine learning is that the computer can be trained to automate tasks that would be exhaustive or impossible for a human being. The clear break from traditional analysis is that machine learning can make decisions with minimal human intervention.

Take the following example: a real estate agent can estimate the price of a house based on his own experience and his knowledge of the market.

A machine can be trained to translate the knowledge of an expert into features. The features are all the characteristics of a house, the neighborhood, the economic environment, etc. that make the price difference. For the expert, it probably took some years to master the art of estimating the price of a house, and his expertise gets better and better after each sale.

For the machine, it takes millions of data points (i.e., examples) to master this art. At the very beginning of its learning, the machine makes mistakes, somewhat like a junior salesman. Once the machine has seen all the examples, it has enough knowledge to make its estimations, and with incredible accuracy. The machine is also able to adjust for its mistakes accordingly.

Most big companies have understood the value of machine learning and of holding data. McKinsey has estimated that the value of analytics ranges from $9.5 trillion to $15.4 trillion, of which $5 to $7 trillion can be attributed to the most advanced AI techniques.


Machine Learning on AWS

Amazon SageMaker enables developers and data scientists to quickly and easily build, train, and deploy machine learning models at any scale. It removes the complexity that gets in the way of successfully implementing machine learning across use cases and industries, from running models for real-time fraud detection, to virtually analyzing the biological impacts of potential drugs, to predicting stolen-base success in baseball.

Amazon SageMaker Studio: Experience the first fully integrated development environment (IDE) for machine learning with Amazon SageMaker Studio, where you can perform all ML development steps. You can quickly upload data, create and share new notebooks, train and tune ML models, move back and forth between steps to adjust experiments, debug and compare results, and deploy and monitor ML models all in a single visual interface, making you much more productive.

Amazon SageMaker Autopilot: Automatically build, train, and tune models with full visibility and control, using Amazon SageMaker Autopilot. It is the industry's first automated machine learning capability that gives you complete control and visibility into how your models were created and what logic was used in creating them.


How To Verify The Memory Loss Of A Machine Learning Model – Analytics India Magazine

It is a known fact that deep learning models get better with diversity in the data they are fed. For instance, in a healthcare use case, data will be taken from several providers (patient data, medical history, the workflows of professionals, insurance providers, etc.) to ensure such data diversity.

These data points, collected through various interactions of people, are fed into a machine learning model, which sits remotely in a data haven, churning out predictions without tiring.

However, consider a scenario where one of the providers ceases to offer data to the healthcare project and later requests to delete the provided information. In such a case, does the model remember or forget its learnings from this data?

To explore this, a team from the University of Edinburgh and the Alan Turing Institute asked how one can verify whether a model has indeed forgotten specific data. In the process, they investigated the challenges involved and also offered solutions.

The authors of this work wrote that this initiative is the first of its kind, and that the only work that comes close is the Membership Inference Attack (MIA), which also inspired this work.

To verify whether a model has forgotten specific data, the authors propose a Kolmogorov-Smirnov (K-S) distance-based method, which is used to infer whether a model was trained with the query dataset.

Based on this algorithm, the researchers used benchmark datasets such as MNIST, SVHN and CIFAR-10 to verify the effectiveness of the new method. Later, the method was also tested on the ACDC dataset using the pathology detection component of that challenge.

The MNIST dataset contains 60,000 images of 10 digits, with image size 28 x 28. Similar to MNIST, the SVHN dataset has over 600,000 digit images obtained from house numbers in Google Street View images; the image size of SVHN is 32 x 32. Since both datasets are for the task of digit recognition/classification, they were considered to belong to the same domain. CIFAR-10 is used as a dataset to validate the method; it has 60,000 images (size 32 x 32) of 10 object classes, including aeroplane, bird, etc. To train models with the same design, the images of all three datasets are preprocessed to grey-scale and rescaled to size 28 x 28.
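As a rough sketch (not the authors' actual pipeline), that kind of preprocessing could be expressed with torchvision along these lines:

from torchvision import datasets, transforms

to_common_format = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),  # drop the colour channels
    transforms.Resize((28, 28)),                  # rescale SVHN/CIFAR-10 to the MNIST size
    transforms.ToTensor(),
])

cifar10 = datasets.CIFAR10(root="./data", train=True, download=True,
                           transform=to_common_format)
image, label = cifar10[0]
print(image.shape)  # torch.Size([1, 28, 28])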

Using the K-S distance, the authors said, statistics about the output distribution of a target model can be obtained without knowing the weights of the model. Since the model's training data are unknown, a few new models, called shadow models, were trained with the query dataset and with another calibration dataset.

Then, by comparing the K-S values, one can conclude whether or not the training data contained information from the query dataset.
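A rough sketch of that comparison step is shown below; the confidence scores are simulated here, and in practice they would come from the target model and a shadow model evaluated on the query dataset (ks_2samp is SciPy's two-sample Kolmogorov-Smirnov test):

import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
# Simulated confidence scores standing in for real model outputs.
target_scores = rng.beta(a=8, b=2, size=1000)
shadow_scores = rng.beta(a=5, b=2, size=1000)

statistic, p_value = ks_2samp(target_scores, shadow_scores)
print(f"K-S distance: {statistic:.3f} (p-value: {p_value:.3g})")
# A small distance to a shadow model trained on the query data would suggest the
# target model still carries information from that data; a large distance would
# suggest it does not.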

Experiments have been done before to check the ownership one has over data in the world of the internet. One such attempt was made by researchers at Stanford, who investigated the algorithmic principles behind efficient data deletion in machine learning.

They found that for many standard ML models, the only way to completely remove an individual's data is to retrain the whole model from scratch on the remaining data, which is often not computationally practical. In the trade-off between efficiency and privacy, a challenge arises because algorithms that support efficient deletion need not be private, and algorithms that are private do not have to support efficient deletion.

The aforementioned experiments are an attempt to probe and raise new questions in the never-ending debate about AI and privacy. The objective of these works is to investigate how much authority an individual has over specific data, while also helping to expose the vulnerabilities within a model if certain data are removed.

Check more about this work here.



Tecton.ai Launches with New Data Platform to Make Machine Learning Accessible to Every Company – insideBIGDATA

Tecton.ai emerged from stealth and formally launched with its data platform for machine learning. Tecton enables data scientists to turn raw data into production-ready features, the predictive signals that feed machine learning models. Tecton is in private beta with paying customers, including a Fortune 50 company.

Tecton.ai also announced $25 million in seed and Series A funding co-led by Andreessen Horowitz and Sequoia. Both Martin Casado, general partner at Andreessen Horowitz, and Matt Miller, partner at Sequoia, have joined the board.

Tecton.ai founders Mike Del Balso (CEO), Kevin Stumpf (CTO) and Jeremy Hermann (VP of Engineering) worked together at Uber when the company was struggling to build and deploy new machine learning models, so they created Uber's Michelangelo machine learning platform. Michelangelo was instrumental in scaling Uber's operations to thousands of production models serving millions of transactions per second in just a few years, and today it supports a myriad of use cases ranging from generating marketplace forecasts to calculating ETAs and automating fraud detection.

Del Balso, Stumpf and Hermann went on to found Tecton.ai to solve the data challenges that are the biggest impediment to deploying machine learning in the enterprise today. Enterprises are already generating vast amounts of data, but the problem is how to harness and refine this data into predictive signals that power machine learning models. Engineering teams end up spending the majority of their time building bespoke data pipelines for each new project. These custom pipelines are complex, brittle, expensive and often redundant. The end result is that 78% of new projects never get deployed, and 96% of projects encounter challenges with data quality and quantity(1).

Data problems all too often cause last-mile delivery issues for machine learning projects, said Mike Del Balso, Tecton.ai co-founder and CEO. With Tecton, there is no last mile. We created Tecton to empower data science teams to take control of their data and focus on building models, not pipelines. With Tecton, organizations can deliver impact with machine learning quickly, reliably and at scale.

Tecton.ai has assembled a world-class engineering team that has deep experience building machine learning infrastructure for industry leaders such as Google, Facebook, Airbnb and Uber. Tecton is the industry's first data platform designed specifically to support the requirements of operational machine learning. It empowers data scientists to build great features, serve them to production quickly and reliably, and do it at scale.

Tecton makes the delivery of machine learning data predictable for every company.

The ability to manage data and extract insights from it is catalyzing the next wave of business transformation, said Martin Casado, general partner at Andreessen Horowitz. The Tecton team has been on the forefront of this change, with a long history of machine learning/AI and data at Google, Facebook and Airbnb, and building the machine learning platform at Uber. We're very excited to be partnering with Mike, Kevin, Jeremy and the Tecton team to bring this expertise to the rest of the industry.

The founders of Tecton built a platform within Uber that took machine learning from a bespoke research effort to the core of how the company operated day-to-day, said Matt Miller, partner at Sequoia. They started Tecton to democratize machine learning across the enterprise. We believe their platform for machine learning will drive a Cambrian explosion within their customers, empowering them to drive their business operations with this powerful technology paradigm, unlocking countless opportunities. We were thrilled to partner with Tecton along with a16z at the seed and now again at the Series A. We believe Tecton has the potential to be one of the most transformational enterprise companies of this decade.



This 17-year-old boy created a machine learning model to suggest potential drugs for Covid-19 – India Today

In keeping with its tradition of high excellence and achievements, Greenwood High International School's student Tarun Paparaju of Class 12 has achieved the 'Grand Master' level in kernels, the highest accreditation in Kaggle, holding a rank of 13 out of 118,024 Kagglers worldwide. Kaggle is the world's largest online community for Data Science and Artificial Intelligence.

There are only 20 Kernel Grandmasters out of the three million users on Kaggle worldwide, and Tarun, aged 17 years, is honored to be one of the 20 Kernel Grandmasters now. Users of Kaggle are placed at different levels based on the quality and accuracy of their solutions to real-world artificial intelligence problems. The five levels in ascending order are Novice, Contributor, Expert, Master, and Grandmaster.

Kaggle hosts several data science competitions and contestants are challenged to find solutions to these real-world challenges. Kernels are a medium through which Kagglers share their code and insights on how to solve the problem.

These kernels include in-depth data analysis, visualisation, and machine learning, usually written in the Python or R programming languages. Other Kagglers can upvote a kernel if they believe it provides useful insights or solves the problem. The 'Kernels Grandmaster' title at Kaggle requires 15 kernels to have earned gold medals.

Tarun's passion for Calculus, Mathematical modeling, and Data science from a very young age got him interested in participating and contributing to the Kaggle community.

He loves solving real-world data science problems, especially in areas based on deep learning, such as natural language processing and signal processing. Tarun is an open-source contributor to Keras, a deep learning framework.

He has proposed and added Capsule NN layer support to Keras framework. He writes blogs about his adventures and learnings in data science.

Now, he closely works with the Kaggle community and aspires to be a scholar in the area of Natural language processing. Additionally, he loves playing cricket and football. Sports is a large part of his life outside Data science and academics.

