Mission Cloud Services Wins TechTarget Award for its Innovative AWS Machine Learning Work with JibJab – GlobeNewswire

LOS ANGELES, April 12, 2022 (GLOBE NEWSWIRE) -- Mission, a managed cloud services provider and Amazon Web Services (AWS) Premier Services Partner, today announced the company has won a 2021 Top Projects Award from TechTarget's SearchITChannel. The annual award honors three IT services partners and their customers for exceptional technological initiatives that demonstrate compelling innovation, creative partnering, and business-wide benefits.

JibJab sought support from an AWS partner to achieve its goals around image quality and customer experience as it prepared to launch its user-designed Starring You Books. For the iconic digital entertainment studio known for enabling users to send personalized e-cards, the books would mark the company's first expansion into a physical product line. During the project's initial planning process, JibJab identified an opportunity to use a machine learning computer vision algorithm to detect faces within user-uploaded photos. The algorithm would need to automatically crop faces and hair from photos and perform post-processing to prepare print-quality images. Without the in-house ML expertise to build this algorithm, and wanting to avoid the cost-prohibitive licensing fees of an existing ML algorithm, JibJab partnered with Mission to develop and complete the project.

Mission leveraged its AWS machine learning expertise to build and train the algorithm, implementing a process that included data labeling and augmentation with a training set of 17,000 images. Experts from Mission's Data, Analytics & Machine Learning practice created JibJab's solution using several services, including Amazon SageMaker, Amazon Rekognition, and Facebook's Detectron2. This work has resulted in a seamless self-service experience for JibJab customers, who can upload their photos and have final, book-ready images prepared by the ML algorithm in just five seconds. Customers then simply place the final images within their personalized Starring You Books products using a GUI, and approve their work for printing.

Quotes

"We talked to a few external companies and Mission was our clear preference," said Matt Cielecki, VP of Engineering at JibJab. "It became evident from day one that Mission wasn't just going to throw something over the fence for us to use; the team was going to ensure that we understood the rationale behind the processes and technologies put into action."

"Mission's work with JibJab showcases the tremendous potential AWS and ML can enable for developing innovative new products and unprecedented customer experiences," said Ryan Ries, Practice Lead, Data Science & Engineering at Mission. "We jumped at the opportunity to work with JibJab on this project and are proud of its success, and to have the work recognized with TechTarget SearchITChannel's 2021 Top Projects Award."

About Mission Cloud Services

Mission accelerates enterprise cloud transformation by delivering a differentiated suite of agile cloud services and consulting. As an AWS Premier Services Partner, Mission's always-on services enable businesses to scale and outpace competitors by leveraging the most transformative technology platform and enterprise software ecosystem in history.

Contact: Kyle Peterson, kyle@clementpeterson.com

A photo accompanying this announcement is available at https://www.globenewswire.com/NewsRoom/AttachmentNg/d7325672-6f04-42ed-8959-9d365045ea72

Machine learning in higher education – McKinsey

Many higher-education institutions are now using data and analytics as an integral part of their processes. Whether the goal is to identify and better support pain points in the student journey, more efficiently allocate resources, or improve student and faculty experience, institutions are seeing the benefits of data-backed solutions.

Those at the forefront of this trend are focusing on harnessing analytics to increase program personalization and flexibility, as well as to improve retention by identifying students at risk of dropping out and reaching out proactively with tailored interventions. Indeed, data science and machine learning may unlock significant value for universities by ensuring resources are targeted toward the highest-impact opportunities to improve access for more students, as well as student engagement and satisfaction.

For example, Western Governors University in Utah is using predictive modeling to improve retention by identifying at-risk students and developing early-intervention programs. Initial efforts raised the graduation rate for the university's four-year undergraduate program by five percentage points between 2018 and 2020.

Yet higher education is still in the early stages of data capability building. With universities facing many challenges (such as financial pressures, the demographic cliff, and an uptick in student mental-health issues) and a variety of opportunities (including reaching adult learners and scaling online learning), expanding use of advanced analytics and machine learning may prove beneficial.

Below, we share some of the most promising use cases for advanced analytics in higher education to show how universities are capitalizing on those opportunities to overcome current challenges, both enabling access for many more students and improving the student experience.

Data science and machine learning may unlock significant value for universities by ensuring resources are targeted toward the highest-impact opportunities to improve access for more students, as well as student engagement and satisfaction.

Advanced-analytics techniques may help institutions unlock significantly deeper insights into their student populations and identify more nuanced risks than they could achieve through descriptive and diagnostic analytics, which rely on linear, rule-based approaches (Exhibit 1).

Exhibit 1

Advanced analytics, which uses the power of algorithms such as gradient boosting and random forest, may also help institutions address inadvertent biases in their existing methods of identifying at-risk students and proactively design tailored interventions to mitigate the majority of identified risks.

For instance, institutions using linear, rule-based approaches look at indicators such as low grades and poor attendance to identify students at risk of dropping out; institutions then reach out to these students and launch initiatives to better support them. While such initiatives may be of use, they often are implemented too late and only target a subset of the at-risk population. This approach could be a good makeshift solution for two problems facing student success leaders at universities. First, there are too many variables that could be analyzed to indicate risk of attrition (such as academic, financial, and mental health factors, and sense of belonging on campus). Second, while it's easy to identify notable variance on any one or two variables, it is challenging to identify nominal variance on multiple variables. Linear, rule-based approaches therefore may fail to identify students who, for instance, may have decent grades and above-average attendance but who have been struggling to submit their assignments on time or have consistently had difficulty paying their bills (Exhibit 2).

Exhibit 2

A machine-learning model could address both of the challenges described above. Such a model looks at ten years of data to identify factors that could help a university make an early determination of a student's risk of attrition. For example, did the student change payment methods on the university portal? How close to the due date does the student submit assignments? Once the institution has identified students at risk, it can proactively deploy interventions to retain them.
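To make the idea concrete, here is a minimal sketch of such an early-warning model, assuming a gradient-boosted classifier and a handful of behavioural features of the kind described above. The file name, column names and risk threshold are illustrative assumptions, not details of any institution's actual system.

```python
# Hypothetical sketch: flag students at risk of attrition with gradient boosting.
# File name, feature names and the 0.6 risk threshold are illustrative assumptions.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("student_history.csv")  # e.g. ten years of historical records
features = [
    "gpa", "attendance_rate", "payment_method_changes",
    "avg_days_before_deadline", "advising_visits",
]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["dropped_out"], test_size=0.2, random_state=42
)

model = GradientBoostingClassifier(n_estimators=300, learning_rate=0.05, max_depth=3)
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))

# Score current students and flag those above an illustrative risk threshold,
# so advisers can reach out before grades or attendance visibly slip.
risk = model.predict_proba(df[features])[:, 1]
at_risk = df.loc[risk > 0.6, ["student_id"]].assign(risk_score=risk[risk > 0.6])
print(at_risk.sort_values("risk_score", ascending=False).head())
```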

Though many institutions recognize the promise of analytics for personalizing communications with students, increasing retention rates, and improving student experience and engagement, institutions could be using these approaches for the full range of use cases across the student journey, for prospective, current, and former students alike.

For instance, advanced analytics can help institutions identify which high schools, zip codes, and counties they should focus on to reach prospective students who are most likely to be great fits for the institution. Machine learning could also help identify interventions and support that should be made available to different archetypes of enrolled students to help measure and increase student satisfaction. These use cases could then be extended to providing students support with developing their skills beyond graduation, enabling institutions to provide continual learning opportunities and to better engage alumni. As an institution expands its application and coverage of advanced-analytics tools across the student life cycle, the model gets better at identifying patterns, and the institution can take increasingly granular interventions and actions.

Institutions will likely want to adopt a multistep model to harness machine learning to better serve students. For example, for efforts aimed at improving student completion and graduation rates, the following five-step technique could generate immense value:

Institutions could deploy this model at a regular cadence to identify students who would most benefit from additional support.

Institutions could also create similar models to address other strategic goals or challenges, including lead generation and enrollment. For example, institutions could, as a first step, analyze 100 or more attributes from years of historical data to understand the characteristics of applicants who are most likely to enroll.

Institutions will likely want to adopt a multistep model to harness machine learning to better serve students.

The experiences of two higher education institutions that leaned on advanced analytics to improve enrollment and retention reveal the impact such efforts can have.

One private nonprofit university had recently enrolled its largest freshman class in history and was looking to increase its enrollment again. The institution wanted to both reach more prospective first-year undergraduate students who would be a great fit for the institution and improve conversion in the enrollment journey in a way that was manageable for the enrollment team without significantly increasing investment and resources. The university took three important actions:

For this institution, advanced-analytics modeling had immediate implications and impact. The initiative also suggested future opportunities for the university to serve more freshmen with greater marketing efficiency. When initially tested against leads for the subsequent fall (prior to the application deadline), the model accurately predicted 85 percent of candidates who submitted an application, and it predicted the 35 percent of applicants at that point in the cycle who were most likely to enroll, assuming no changes to admissions criteria (Exhibit 3). The enrollment management team is now able to better prioritize its resources and time on high-potential leads and applicants to yield a sizable class. These new capabilities will give the institution the flexibility to make strategic choices; rather than focus primarily on the size of the incoming class, it may ensure the desired class size while prioritizing other objectives, such as class mix, financial-aid allocation, or budget savings.

Exhibit 3

Similar to many higher-education institutions during the pandemic, one online university was facing a significant downward trend in student retention. The university explored multiple options and deployed initiatives spearheaded by both academic and administrative departments, including focus groups and nudge campaigns, but the results fell short of expectations.

The institution wanted to set a high bar for student success and achieve marked and sustainable improvements to retention. It turned to an advanced-analytics approach to pursue its bold aspirations.

To build a machine-learning model that would allow the university to identify students at risk of attrition early, it first analyzed ten years of historical data to understand key characteristics that differentiate students who were most likely to continue, and thus graduate, compared with those who unenrolled. After validating that the initial model was multiple times more effective at predicting retention than the baseline, the institution refined the model and applied it to the current student population. This attrition model yielded five at-risk student archetypes, three of which were counterintuitive to conventional wisdom about what typical at-risk student profiles look like (Exhibit 4).

Exhibit 4

Together, these three counterintuitive archetypes of at-risk students, which would have been omitted using a linear analytics approach, account for about 70 percent of the students most likely to discontinue enrollment. The largest group of at-risk individuals (accounting for about 40 percent of the at-risk students identified) were distinctive academic achievers with an excellent overall track record. This means the model identified at least twice as many students at risk of attrition as models based on linear rules. The model outputs have allowed the university to identify students at risk of attrition more effectively and strategically invest in short- and medium-term initiatives most likely to drive retention improvement.

With the model and data on at-risk student profiles in hand, the online university launched a set of targeted interventions focused on providing tailored support to students in each archetype to increase retention. Actions included scheduling more touchpoints with academic and career advisers, expanding faculty mentorship, and creating alternative pathways for students to satisfy their knowledge gaps.

Advanced analytics is a powerful tool that may help higher-education institutions overcome the challenges facing them today, spur growth, and better support students. However, machine learning is complex, with considerable associated risks. While the risks vary based on the institution and the data included in the model, higher-education institutions may wish to take the following steps when using these tools:

While many higher-education institutions have started down the path to harnessing data and analytics, there is still a long way to go to realize the full potential of these capabilities in terms of the student experience. The influx of students and institutions that have been engaged in online learning and using technology tools over the past two years means there is significantly more data to work with than ever before; higher-education institutions may want to start using it to serve students better in the years to come.

When It Comes to AI, Can We Ditch the Datasets? Using Synthetic Data for Training Machine-Learning Models – SciTechDaily

A machine-learning model for image classification that's trained using synthetic data can rival one trained on the real thing, a study shows.

Huge amounts of data are needed to train machine-learning models to perform image classification tasks, such as identifying damage in satellite photos following a natural disaster. However, these data are not always easy to come by. Datasets may cost millions of dollars to generate, if usable data exist in the first place, and even the best datasets often contain biases that negatively impact a model's performance.

To circumvent some of the problems presented by datasets, MIT researchers developed a method for training a machine learning model that, rather than using a dataset, uses a special type of machine-learning model to generate extremely realistic synthetic data that can train another model for downstream vision tasks.

Their results show that a contrastive representation learning model trained using only these synthetic data is able to learn visual representations that rival or even outperform those learned from real data.

MIT researchers have demonstrated the use of a generative machine-learning model to create synthetic data, based on real data, that can be used to train another model for image classification. This image shows examples of the generative model's transformation methods. Credit: Courtesy of the researchers

This special machine-learning model, known as a generative model, requires far less memory to store or share than a dataset. Using synthetic data also has the potential to sidestep some concerns around privacy and usage rights that limit how some real data can be distributed. A generative model could also be edited to remove certain attributes, like race or gender, which could address some biases that exist in traditional datasets.

"We knew that this method should eventually work; we just needed to wait for these generative models to get better and better. But we were especially pleased when we showed that this method sometimes does even better than the real thing," says Ali Jahanian, a research scientist in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and lead author of the paper.

Jahanian wrote the paper with CSAIL grad students Xavier Puig and Yonglong Tian, and senior author Phillip Isola, an assistant professor in the Department of Electrical Engineering and Computer Science. The research will be presented at the International Conference on Learning Representations.

Once a generative model has been trained on real data, it can generate synthetic data that are so realistic they are nearly indistinguishable from the real thing. The training process involves showing the generative model millions of images that contain objects in a particular class (like cars or cats), and then it learns what a car or cat looks like so it can generate similar objects.

Essentially by flipping a switch, researchers can use a pretrained generative model to output a steady stream of unique, realistic images that are based on those in the models training dataset, Jahanian says.

But generative models are even more useful because they learn how to transform the underlying data on which they are trained, he says. If the model is trained on images of cars, it can imagine how a car would look in different situations, including situations it did not see during training, and then output images that show the car in unique poses, colors, or sizes.

Having multiple views of the same image is important for a technique called contrastive learning, where a machine-learning model is shown many unlabeled images to learn which pairs are similar or different.

The researchers connected a pretrained generative model to a contrastive learning model in a way that allowed the two models to work together automatically. The contrastive learner could tell the generative model to produce different views of an object, and then learn to identify that object from multiple angles, Jahanian explains.

"This was like connecting two building blocks. Because the generative model can give us different views of the same thing, it can help the contrastive method to learn better representations," he says.
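As a loose sketch of how such a pairing can work in practice (not the authors' actual code), the snippet below draws two nearby latent codes, asks a pretrained generator for the two resulting views, and trains an encoder with a standard InfoNCE contrastive loss; `generator` and `encoder` are hypothetical placeholder models.

```python
# Sketch of generator-driven contrastive learning: two perturbed latent codes yield
# two realistic views of the same object, and the encoder learns to match them.
# `generator` and `encoder` are placeholders, not the models from the MIT paper.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE loss: matching views are positives, every other pair is a negative."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature        # batch x batch similarity matrix
    labels = torch.arange(z1.size(0))         # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

def training_step(generator, encoder, optimizer, batch_size=64, latent_dim=128, sigma=0.2):
    z = torch.randn(batch_size, latent_dim)
    view1 = generator(z)                                  # one synthetic image per code
    view2 = generator(z + sigma * torch.randn_like(z))    # a nearby view of the same object
    loss = info_nce(encoder(view1), encoder(view2))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```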

The researchers compared their method to several other image classification models that were trained using real data and found that their method performed as well, and sometimes better, than the other models.

One advantage of using a generative model is that it can, in theory, create an infinite number of samples. So, the researchers also studied how the number of samples influenced the models performance. They found that, in some instances, generating larger numbers of unique samples led to additional improvements.

"The cool thing about these generative models is that someone else trained them for you. You can find them in online repositories, so everyone can use them. And you don't need to intervene in the model to get good representations," Jahanian says.

But he cautions that there are some limitations to using generative models. In some cases, these models can reveal source data, which can pose privacy risks, and they could amplify biases in the datasets they are trained on if they aren't properly audited.

He and his collaborators plan to address those limitations in future work. Another area they want to explore is using this technique to generate corner cases that could improve machine learning models. Corner cases often can't be learned from real data. For instance, if researchers are training a computer vision model for a self-driving car, real data wouldn't contain examples of a dog and his owner running down a highway, so the model would never learn what to do in this situation. Generating that corner case data synthetically could improve the performance of machine learning models in some high-stakes situations.

The researchers also want to continue improving generative models so they can compose images that are even more sophisticated, he says.

Reference: "Generative Models as a Data Source for Multiview Representation Learning" by Ali Jahanian, Xavier Puig, Yonglong Tian and Phillip Isola (PDF).

This research was supported, in part, by the MIT-IBM Watson AI Lab, the United States Air Force Research Laboratory, and the United States Air Force Artificial Intelligence Accelerator.

Ensuring compliance with data governance regulations in the Healthcare Machine learning (ML) space – BSA bureau

"Establishing decentralized Machine learning (ML) framework optimises and accelerates clinical decision-making for evidence-based medicine" says Krishna Prasad Shastry, Chief Technologist (AI Strategy and Solutions) at Hewlett-Packard Enterprise

The healthcare industry is becoming increasingly information-driven. Smart machines are creating a positive impact to enhance capabilities in healthcare and R&D. Promising technologies are aiding healthcare staff in areas with limited resources, helping to achieve a more efficient healthcare system. Yet, with all its benefits, using data to deliver more value-based care is not without risks. Krishna Prasad Shastry, Chief Technologist (AI Strategy and Solutions) at Hewlett-Packard Enterprise, Singapore shares further details on the establishment of a decentralized machine learning framework while ensuring compliance with data governance regulations.

Technology will be indispensable in the future of healthcare, with advancements in various technologies such as artificial intelligence (AI), robotics, and nanotechnology. Machine learning (ML), a subset of AI, now plays a key role in many health-related realms, such as disease diagnosis. For example, ML models can assist radiologists in diagnosing diseases like Leukaemia or Tuberculosis more accurately and more rapidly. By using ML algorithms to analyse medical imaging such as chest X-rays, MRIs, or CT scans, radiologists can better prioritise which potential positive cases to investigate. Similarly, ML models can be developed to recommend personalised patient care by observing various vital parameters, sensors, or electronic health records (EHRs). The efficiency gains that ML offers stand to take the pressure off the healthcare system, which is especially valuable when resources are stretched and access to hospitals and clinics is disrupted.

Data underpins these digital healthcare advancements. Healthcare organisations globally are embracing digital transformation and using data to enhance operations. Yet, with all its benefits, using data to deliver more value-based care is not without risks. For example, using ML for diagnostic purposes requires a diverse set of data in order to avoid bias. But access to diverse data sets is often limited by privacy regulations in the health sector. Healthcare leaders face the challenge of how to use data to fuel innovation in a secure and compliant manner.

For instance, HPE's Swarm Learning, a decentralized machine learning framework, allows insights generated from data to be shared without having to share the raw data itself. The insights generated by each owner in a group are shared, allowing all participants to still benefit from the collaborative insights of the network. In the case of a hospital that's building an ML model for diagnostics, Swarm Learning enables decentralized model training that benefits from access to insights of a larger data set, while respecting privacy regulations.
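Swarm Learning itself is HPE's proprietary, blockchain-coordinated framework, but the underlying principle can be illustrated with a much simpler, hypothetical sketch: each hospital trains on its own records, and only model parameters, never patient data, are exchanged and merged. The snippet below is a federated-averaging-style analogue, not HPE's implementation.

```python
# Simplified illustration of decentralized learning: raw records stay on site,
# only model coefficients are shared and averaged. Not HPE's Swarm Learning protocol.
import numpy as np
from sklearn.linear_model import LogisticRegression

def local_update(X, y):
    """Train locally; only the fitted coefficients ever leave the hospital."""
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    return clf.coef_, clf.intercept_

def merge(updates):
    """Average the parameters contributed by all participants."""
    coefs, intercepts = zip(*updates)
    return np.mean(coefs, axis=0), np.mean(intercepts, axis=0)

# Three hypothetical hospitals with local (synthetic) datasets that are never pooled.
rng = np.random.default_rng(0)
hospitals = [(rng.normal(size=(200, 10)), rng.integers(0, 2, 200)) for _ in range(3)]

global_coef, global_intercept = merge([local_update(X, y) for X, y in hospitals])
print(global_coef.shape, global_intercept)
```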

Partnering with stakeholders across the public and private sectors will enable us to better provide patients access to new digital healthcare solutions that can reform the management of challenging diseases such as cancer. Our recent partnership with AstraZeneca, under their A. Catalyst Network, aims to drive healthcare improvement across Singapore's healthcare ecosystem. Further, Swarm Learning can reduce the risk of breaching data governance regulations and can accelerate medical research.

The future of healthcare lies in working in tandem with technology; innovations in the AI and ML space are already being implemented across the treatment chain in the healthcare industry, with successful case studies that we can learn from. From diagnosis to patient management, AI and ML can be used to perform tasks such as predicting diseases, identifying high-risk patients, and automating hospital operations. As ML models are increasingly used in the diagnosis of diseases, there is an increasing need for data sets covering a diverse set of patients. This is a challenging demand to fulfill due to privacy and regulatory restrictions. Gaining insights from a diverse set of data without compromising on privacy might help, as in Swarm Learning.

AI models are used in precision medicine to improve diagnostic outcomes through integration and by modeling multiple data points, including genetic, biochemical, and clinical data. They are also used to optimise and accelerate clinical decision-making for evidence-based medicine. In the sphere of life sciences, AI models are used in areas such as drug discovery, drug toxicity prediction, clinical trials, and adverse event management. For all these cases, Swarm Learning can help build better models by collaborating across siloed data sets.

As we progress towards a technology-driven future, the question of how humans and technology can work hand in hand for the greater good will remain a question to be answered. But I believe that we will be able to maximise the benefits of digital healthcare, as long as we continue to facilitate collaboration between healthcare and IT professionals to bridge the existing gaps in the industry.

OVH Groupe : A journey into the wondrous land of Machine Learning, or Cleaning data is funnier than cleaning my flat! (Part 3) – Marketscreener.com

What am I doing here? The story so far

As you might know if you have read our blog for more than a year, a few years ago, I bought a flat in Paris. If you don't know, the real estate market in Paris is expensive but despite that, it is so tight that a good flat at a correct price can be for sale for less than a day.

Obviously, you have to take a decision quite fast, and considering the prices, you have to trust your decision. Of course, to trust your decision, you have to take your time, study the market, make some visits, etc. This process can be quite long (in my case it took a year between the time I decided that I wanted to buy a flat and the time I actually committed to buying my current flat), and even spending a lot of time will never allow you to have a perfect understanding of the market. What if there was a way to do that very quickly and with a better accuracy than with the standard process?

As you might also know if you are one of our regular readers, I tried to solve this problem with Machine Learning, using an end-to-end software called Dataiku. In a first blog post, we learned how to make a basic use of Dataiku, and discovered that just knowing how to click on a few buttons wasn't quite enough: you had to bring some sense in your data and in the training algorithm, or you would find absurd results.

In a second entry, we studied the data a bit more, tweaked a few parameters and values in Dataiku's algorithms and trained a new model. This yielded a much better result, and this new model was - if not accurate - at least relevant: the same flat had a higher predicted price when it was bigger or supposedly in a better neighbourhood. However, it was far from perfect and really lacked accuracy for several reasons, some of them out of our control.

However, all of this was done on one instance of Dataiku - a licensed software - on a single VM. There are multiple reasons that could push me to do things differently:

What we did very intuitively (and somewhat naively) with Dataiku was actually a quite complex pipeline that is often called ELT, for Extract, Load and Transform.

And obviously, after this ELT process, we added a step to train a model on the transformed data.

So what are we going to do to redo all of that without Dataiku's help?

When ELT becomes ELTT

Now that we know what we are going to do, let us proceed!

Before beginning, we have to properly set up our environment to be able to launch the different tools and products. Throughout this tutorial, we will show you how to do everything with CLIs. However, all these manipulations can also be done on OVHcloud's manager (GUI), in which case you won't have to configure these tools.

For all the manipulations described in the next phase of this article, we will use a Virtual Machine deployed in OVHcloud's Public Cloud that will serve as the extraction agent to download the raw data from the web and push it to S3, as well as a CLI machine to launch data processing and notebook jobs. It is a d2-4 flavor with 4 GB of RAM, 2 vCores and 50 GB of local storage running Debian 10, deployed in the Gravelines datacenter. During this tutorial, I run a few UNIX commands but you should easily be able to adapt them to whatever OS you use if needed. All the CLI tools specific to OVHcloud's products are available on multiple OSs.

You will also need an OVHcloud NIC (user account) as well as a Public Cloud Project created for this account with a quota high enough to deploy a GPU (if that is not the case, you will be able to deploy a notebook on CPU rather than GPU, the training phase will just take more time). To create a Public Cloud project, you can follow these steps.

Here is a list of the CLI tools and other utilities that we will use during this tutorial and why:

Additionally you will find commented code samples for the processing and training steps in this Github repository.

In this tutorial, we will use several object storage buckets. Since we will use the S3 API, we will call them S3 buckets, but as mentioned above, if you use OVHcloud standard Public Cloud Storage, you could also use the Swift API. However, you are restricted to only the S3 API if you use our new high-performance object storage offer, currently in Beta.

For this tutorial, we are going to create and use the following S3 buckets:

To create these buckets, use the following commands after having configured your aws CLI as explained above:
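The exact commands did not survive the extraction of this post, but a plausible reconstruction with the aws CLI looks like the following; the endpoint URL and the name of the raw-data bucket are assumptions (the clean and model bucket names appear later in the tutorial).

```bash
# Hedged reconstruction: bucket names and the OVHcloud S3 endpoint are assumptions,
# adjust them to your own Public Cloud project and region.
ENDPOINT=https://s3.gra.perf.cloud.ovh.net
aws s3 mb s3://transactions-ecoex-raw   --endpoint-url "$ENDPOINT"
aws s3 mb s3://transactions-ecoex-clean --endpoint-url "$ENDPOINT"
aws s3 mb s3://transactions-ecoex-model --endpoint-url "$ENDPOINT"
```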

Now that you have your environment set up and your S3 buckets ready, we can begin the tutorial!

First, let us download the data files directly from Etalab's website and unzip them:

You should now have the following files in your directory, each one corresponding to the French real estate transaction of a specific year:

Now, use the S3 CLI to push these files in the relevant S3 bucket:
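Here again the original commands were lost in extraction; a hedged sketch of the upload step, assuming the DVF file naming convention and the bucket/endpoint assumed above, would be:

```bash
# Hedged sketch: file names follow Etalab's DVF convention; bucket name and endpoint
# are the same assumptions as above.
ENDPOINT=https://s3.gra.perf.cloud.ovh.net
for year in 2016 2017 2018 2019 2020; do
  aws s3 cp "valeursfoncieres-${year}.txt" s3://transactions-ecoex-raw/ \
      --endpoint-url "$ENDPOINT"
done
```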

You should now have those 5 files in your S3 bucket:

What we just did with a small VM was ingesting data into a S3 bucket. In real-life usecases with more data, we would probably use dedicated tools to ingest the data. However, in our example with just a few GB of data coming from a public website, this does the trick.

Now that you have your raw data in place to be processed, you just have to upload the code necessary to run your data processing job. Our data processing product allows you to run Spark code written either in Java, Scala or Python. In our case, we used PySpark on Python. Your code should consist of 3 files:

Once you have your code files, go to the folder containing them and push them on the appropriate S3 bucket:

Your bucket should now look like that:

You are now ready to launch your data processing job. The following command will allow you to launch this job on 10 executors, each with 4 vCores and 15 GB of RAM.

Note that the data processing product uses the Swift API to retrieve the code files. This is totally transparent to the user, and the fact that we used the S3 CLI to create the bucket has absolutely no impact. When the job is over, you should see the following in your transactions-ecoex-clean bucket:

Before going further, let us look at the size of the data before and after cleaning:

As you can see, with ~2.5 GB of raw data, we extracted only ~10 MB of actually useful data (only 0.4%)! What is noteworthy here is that you can easily imagine use cases where you need a large-scale infrastructure to ingest and process the raw data but where one or a few VMs are enough to work on the clean data. Obviously, this is more often the case when working with text/structured data than with raw sound/image/videos.
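The full processing code lives in the GitHub repository linked above; as a rough sketch of the kind of transformation that turns 2.5 GB of raw DVF files into a few megabytes of useful rows, something like the following PySpark job would do (column names and filters are assumptions based on the public DVF schema, not the author's actual code):

```python
# Hedged PySpark sketch of the cleaning step: read the pipe-separated DVF files,
# keep Paris apartment sales with usable price and surface values, and write a
# compact clean dataset back to object storage. Columns/filters are assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("transactions-ecoex-clean").getOrCreate()

raw = spark.read.csv(
    "s3a://transactions-ecoex-raw/valeursfoncieres-*.txt",
    sep="|", header=True,
)

clean = (
    raw.filter(F.col("Code departement") == "75")            # Paris only
       .filter(F.col("Type local") == "Appartement")          # flats only
       .withColumn("prix", F.regexp_replace("Valeur fonciere", ",", ".").cast("double"))
       .withColumn("surface", F.col("Surface reelle bati").cast("double"))
       .filter((F.col("prix") > 0) & (F.col("surface") > 0))
       .select("prix", "surface", "Nombre pieces principales", "Code postal")
)

clean.write.mode("overwrite").parquet("s3a://transactions-ecoex-clean/transactions/")
```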

Before we start training a model, take a look at these two screenshots from OVHcloud's data processing UI to erase any doubt you have about the power of distributed computing:

In the first picture, you see the time taken for this job when launching only 1 executor: 8:35 minutes. This duration is reduced to only 2:56 minutes when launching the same job (same code etc.) on 4 executors: almost 3 times faster. And since you pay as you go, this will only cost you ~33% more in that case for the same operation done 3 times faster, without any modification to your code, only one argument in the CLI call. Let us now use this data to train a model.

To train the model, you are going to use OVHcloud AI notebook to deploy a notebook! With the following command, you will:

In our case, we launch a notebook with only 1 GPU because the code samples we provide would not leverage several GPUs for a single job. I could adapt my code to parallelize the training phase on multiple GPUs, in which case I could launch a job with up to 4 parallel GPUs.

Once you're done, just get the URL of your notebook with the following command and connect to it with your browser:

You can now import the real-estate-training.ipynb file to the notebook with just a few clicks. If you don't want to import it from the computer you use to access the notebook (for example if like me you use a VM to work and have cloned the git repo on this VM and not on your computer), you can push the .ipynb file to your transactions-ecoex-clean or transactions-ecoex-model bucket and re-synchronize the bucket to your notebook while it runs by using the ovhai notebook pull-data command. You will then find the notebook file in the corresponding directory.

Once you have imported the notebook file to your notebook instance, just open it and follow the directives. If you are interested in the result but don't want to do it yourself, let's sum up what the notebook does:

Use the models built in this tutorial at your own risk

So, what can we conclude from all of this? First, even if the second model is obviously better than the first, it is still very noisy: while not far from correct on average, there is still a huge variance. Where does this variance come from?

Well, it is not easy to say. To paraphrase the finishing part of my last article:

In this article, I tried to give you a glimpse at the tools that Data Scientists commonly use to manipulate data and train models at scale, in the Cloud or on their own infrastructure:

Hopefully, you now have a better understanding of how Machine Learning algorithms work, what their limitations are, and how Data Scientists work on data to create models.

As explained earlier, all the code used to obtain these results can be found here. Please don't hesitate to replicate what I did or adapt it to other usecases!

Solutions Architect at OVHcloud

Research Analyst / Associate / Fellow in Machine Learning and Artificial Intelligence job with NATIONAL UNIVERSITY OF SINGAPORE | 289568 – Times…

The Role

The Sustainable and Green Finance Institute (SGFIN) is a new university-level research institute in the National University of Singapore (NUS), jointly supported by the Monetary Authority of Singapore (MAS) and NUS. SGFIN aspires to develop deep research capabilities in sustainable and green finance, provide thought leadership in the sustainability space, and shape sustainability outcomes across the financial sector and the economy at large.

This role is ideally suited for those wishing to work in academic or industry research in quantitative analysis, particularly in the area of machine learning and artificial intelligence. The responsibilities of the role will include designing and developing various analytical frameworks to analyze structured, unstructured and non-traditional data related to corporate financial, environmental, and social indicators.

There are no teaching obligations for this position, and the candidate will have the opportunity to develop their research portfolio.

Duties and Responsibilities

The successful candidate will be expected to assume the following responsibilities:

Qualifications

Covid-19 Message

At NUS, the health and safety of our staff and students is one of our utmost priorities, and COVID-vaccination supports our commitment to ensure the safety of our community and to make NUS as safe and welcoming as possible. Many of our roles require a significant amount of physical interaction with students/staff/members of the public. Even for job roles that may be performed remotely, there will be instances where on-campus presence is required.

In accordance with Singapore's legal requirements, unvaccinated workers will not be able to work on the NUS premises with effect from 15 January 2022. As such, job applicants will need to be fully COVID-19 vaccinated to secure successful employment with NUS.

What Is Machine Learning, and How Can It Help With Content Marketing? – Entrepreneur

Opinions expressed by Entrepreneur contributors are their own.

The term machine learning was first introduced by Arthur Samuel in 1959. Machine learning is a type of artificial intelligence that gives computers the ability to learn without being explicitly programmed. It provides a set of algorithms and techniques for creating computer programs that can automatically improve their performance on specific tasks.

Related:How Machine Learning Is Changing the World -- and Your Everyday Life

Machine learning is playing a significant role in content marketing because it helps marketers understand what consumers want to read and what they don't. It also helps marketers create content that will be more likely to generate conversions and increase their return on investment.

The future of machine learning in content marketing is limitless as we can expect AI to take over more and more responsibilities from marketers.

Related:5 Reasons Machine Learning Is the Future of Marketing

Machine learning is a type of artificial intelligence that can learn from data and make predictions. Machine learning algorithms are used in many industries, such as finance, healthcare and so on. Content marketing is one of the most popular industries where machine learning can be applied.

There are many ways that content marketers use machine learning to create better content and optimize their marketing campaigns. One way they do this is by using sentiment analysis to understand what kind of moods people might be in while reading their content. This helps them write more engaging copy for their audience.
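As a concrete, if simplified, illustration of that first use case, an off-the-shelf sentiment model can score draft copy before it goes out; the snippet below uses the Hugging Face transformers pipeline, which is one possible tool rather than something the article prescribes.

```python
# Minimal sketch: score a few draft headlines with a pretrained sentiment model so an
# editor can see which framing reads most positively. Requires `pip install transformers`.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")

drafts = [
    "Struggling to grow your audience? Here's what the data says.",
    "Five proven ways to double your newsletter signups this quarter.",
]

for text, result in zip(drafts, classifier(drafts)):
    print(f"{result['label']:>8} ({result['score']:.2f})  {text}")
```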

Another way for marketers to use machine learning is by utilizing predictive analytics to predict what people will want to read based on the time of day or day of the week. This helps them make sure they have relevant content available at all times for their audience.

Predictive analytics is a process of extracting information from data sources to forecast the future. It is an approach that allows companies to use past data and trends to predict future outcomes.

Predictive analytics can be used for both customer engagement and content generation. For example, it can be used for customer service by predicting customer behavior and needs. This way, businesses are able to prepare for the needs of their customers before they even contact them. Predictive analytics can also help with content generation by predicting what content will resonate with customers and what topics people are interested in.

Predictive analytics is an important part of any company's marketing strategy. It helps them know their customers better and provides a more personalized experience for them.

Related:How Predictive Analytics Can Help Your Business See the Future

Machine learning is a subset of artificial intelligence that helps with predictive analytics. It supports your business decisions by providing insights into what will happen in the future. Machine learning has been used for years to help make predictions about the stock market. It is now being used to help make predictions about content as well.

Machine learning can be used to predict what kind of content will be popular, what topics people are interested in, and even how long content should be before it gets boring. This type of AI saves both time and money by optimizing your content strategy for you!

Machine learning is the way of the future. It will help you create content that is relevant to your audience and that will resonate with them. You should start using it now to supercharge your content creation efforts.

The Guardian view on bridging human and machine learning: it's all in the game – The Guardian

Last week an artificial intelligence called NooK beat eight world champion players at bridge. That algorithms can outwit humans might not seem newsworthy. IBM's Deep Blue beat world chess champion Garry Kasparov in 1997. In 2016, Google's AlphaGo defeated a Go grandmaster. A year later the AI Libratus saw off four poker stars. Yet the real-world applications of such technologies have been limited. Stephen Muggleton, a computer scientist, suggests this is because they are black boxes that can learn better than people but cannot express, and communicate, that learning.

NooK, from French startup NukkAI, is different. It won by formulating rules, not just brute-force calculation. Bridge is not the same as chess or Go, which are two-player games based on an entirely known set of facts. Bridge is a game for four players split into two teams, involving collaboration and competition with incomplete information. Each player sees only their cards and needs to gather information about the other players' hands. Unlike poker, which also involves hidden information and bluffing, in bridge a player must disclose to their opponents the information they are passing to their partner.

This feature of bridge meant NooK could explain how its playing decisions were made, and why it represents a leap forward for AI. When confronted with a new game, humans tend to learn the rules and then learn to improve by, for example, reading books. By contrast, black box AIs train themselves by deep learning: playing a game billions of times until the algorithm has worked out how to win. It is a mystery how this software comes to its conclusions or how it will fail.

NooK nods to the work of British AI pioneer Donald Michie, who reasoned that AI's highest state would be to develop new insights and teach these to humans, whose performance would be consequently increased to a level beyond that of a human studying by themselves. Michie considered weak machine learning to be just improving AI performance by increasing the amount of data ingested.

His insight has been vindicated as deep learning's limits have been exposed. Self-driving cars remain a distant dream. Radiologists were not replaced by AI last year, as had been predicted. Humans, unlike computers, often make short work of complicated, high-stakes tasks. Thankfully, human society is not under constant diagnostic surveillance. But this often means not enough data for AI is available, and frequently it contains hidden, socially unacceptable biases. The environmental impact is also a growing concern, with computing projected to account for 20% of global electricity demand by 2030.

Technologies build trust if they are understandable. There's always a danger that black box AI solves a problem in the wrong way. And the more powerful a deep-learning system becomes, the more opaque it can become. The House of Lords justice committee this week said such technologies have serious implications for human rights and warned against convictions and imprisonment on the basis of AI that could not be understood or challenged. NooK will be a world-changing technology if it lives up to the promise of solving complex problems and explaining how it does so.

Predictive analytics and Machine Learning crucial to energy – Energy Digital

Predictive analytics and Machine Learning (ML) have a critical role to play in energy decarbonisation, according to a new report from data specialists CKDelta.

The report, Pioneering cross-sector change and collaboration, calls for greater collaboration in the utilities sector to fulfil its climate ambitions.

Managing Director Geoff McGrath said the potential to integrate this data across the value chain means we can re-conceptualise how we think about, and deploy, systems with both embedded and adaptive intelligence to optimise system performance, without compromising the net zero goal.

The utilities sector is at a watershed moment," he said. "The eyes of consumers and regulators are firmly fixed on electricity, water, and gas providers across the UK. Cost, environmental impacts, and consumer satisfaction are changing the way the sector delivers for customers."

Combining insight from water services provider Northumbrian Water Group, the report examines how the utilities sector can address four key challenges facing the industry today, including leak reduction, shifting patterns of usage, and the emergence of a new energy economy, by deploying open-source, data-driven models.

It comes at a time when electricity, gas, and water companies are coming under increasing regulatory and consumer scrutiny and the sector is driving forwards with ambitious environmental targets. The water sector has committed to delivering net zero emissions by 2030, while the government has committed to decarbonising the electricity grid by 2035.

Highlighting shifting patterns of energy and water usage as a core challenge to achieving these targets, the report states that "we need integrated solutions that can accurately accommodate and predict both emerging and static trends."

It identifies predictive data models developed from Machine Learning with high-frequency data as one such solution, noting that these models could also play a key role in optimising existing systems and networks.

The report goes on to suggest that companies and their investors should rethink their approach to effectively address the challenges posed by delivering a low carbon future, adopting whole systems models to gain visibility of competing aims across networks. These models empower organisations to holistically assess alternative energy and investment needs against other commercial targets, such as cost reduction.

CKDelta conclude their report with four recommendations, which are designed to foster an environment of collaboration and change, transparency and openness, and deliver on the sector's net zero ambitions.

These recommendations include putting the consumer at the heart of organisational decision making, using integrated data sources at all stages of the value chain, and keeping whole systems models at the forefront when deploying new infrastructure.

Nigel Watson, Chief Innovation Officer at Northumbrian Water Group, said, "As we near the halfway mark on AMP7 (Asset Management Period), we are now starting to shape and share what our plan will be for AMP8."

"We have already set our own ambitious target to reach net zero by 2027," he said. "What is becoming clear is the need to collaborate on how this is achieved and how we understand and utilise the tools that will deliver on our bold environmental ambitions. The insights offered from open data are ultimately what will help us to drive the systemic responses to these challenges and help enable the transition to net zero in our industry.

Deus in the Machina: Machine-Learning Corrections for Improved Position Accuracy – Inside GNSS

A novel method for improving the positioning accuracy of GNSS receivers exploits a machine learning (ML) algorithm. The ML model uses the post-fit residuals, which are readily available after the position computation from the position, velocity and timing (PVT) engine, adoptable by existing receivers without requiring any modification. The performance of this method, demonstrated using data collected with mass-market receivers as well as a Google public dataset collected with Android smartphones, shows the practicality of the concept.

GIANLUCA CAPARRA, PAOLO ZOCCARATO, FLOOR MELMAN

EUROPEAN SPACE AGENCY

GNSS receivers are prone to multipath errors. Increased receiver complexity, cost and power consumption constitute the main drawbacks of mitigation approaches relying upon designs of transmitted signals that allow a better multipath rejection and dedicated signal-processing techniques at baseband level. A mass-market receiver generally includes limited multipath rejection at antenna and baseband processing and applies some sort of filtering in the positioning engine. For instance, it is rather common to adopt a Kalman filter or pseudorange smoothing with phase measurements, as they produce a smoother and more accurate trajectory.

The errors related to atmospheric effects are instead usually compensated using atmospheric models or differencing with measurements from close-by reference receivers.

Recently, the use of machine learning with 3D mapping has been proposed to increase the accuracy of GNSS receivers. The performances achieved by this method are promising. The main disadvantage lies in the fact that it requires high-quality 3D maps to work. Moreover, these maps must be updated continuously to avoid introducing undesired biases.

Here, an approach that only needs information directly available in the GNSS receivers, and hence requires no information about the surroundings, attacks the multipath problem. The approach aims to create a ML model to be used after the positioning engine. This model estimates the positioning error exploiting the post-fit residuals and applies a correction to the PVT to compensate this error. The receiver still needs to receive the corrections from the trained model, but this model can be built based on GNSS results only. The ML regression acts as a sort of adaptive filter, able to cope with different environmental conditions automatically, properly adjusting the estimated position with a 3D error compensation. The ML feedback aims to map the directional pseudorange residuals to a 3D correction of the computed PVT.

The advantage of this approach is that it can be integrated in the current generation of receivers at software level, without requiring any hardware revision. It can even be deployed as a third-party service for the receivers that provide some basic additional information on top of the PVT.

The GNSS receiver estimates its position by deriving the distance from at least four satellites with known ephemerides. The signals contain information on the satellite positions and the transmission time, which is referenced to the system time. The receiver records the reception time, according to its local reference clock, and then estimates the distance by computing the propagation time. This distance estimation is usually referred to as pseudorange, as it includes the geometric range plus several errors sources, e.g. synchronization error between the local reference clock and the system time, atmospheric effects and multipath. The pseudorange is the most used measurement adopted in the position estimation process.

The receiver position can be estimated by solving the navigation equations either using a weighted least square (WLS) solution or a Kalman filter. Due to the noise embedded in the input measurements, the best-fit of the state vector, which includes the position parameters, will have some differences with regard to the measured pseudoranges. The differences between the observed and modeled (from the estimated solution) measurements are usually referred to as residuals. These can be either the innovation residuals, if Kalman filter is used, or the pseudorange residuals, if LS/WLS is used.

Our concept uses the residuals to train an ML model capable of predicting the 3D positioning errors, i.e. the differences between the estimated positions and the reference trajectory. The residuals are already present in the GNSS receivers: therefore, it does not require additional sensors for gathering external information. Figure 1 and Figure 2 present the detailed system architecture for the training phase and the usage phase.

The residuals, together with azimuth and elevation, can be projected into the navigation reference frame. The residuals are projected for all the signals used in the PVT computation.

These projected pseudorange residuals are the main input to the ML algorithm, which can be complemented by additional information, such as the carrier to noise ratio (C/N0) or other quality/reliability indicators. The ML model uses this information to estimate a positioning error. Depending on the application, the positioning error can be 3D or limited to the horizontal plane. It is possible to use any convenient reference frame. The approach described here is for a 3D positioning error, for instance in an East North Up (ENU) reference frame.
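For readers who want the geometry spelled out: the line-of-sight unit vector of each satellite in the local East-North-Up frame follows directly from its azimuth and elevation, and each post-fit residual can be spread over those three axes. The helper below is a generic sketch of that projection, not the authors' implementation.

```python
# Generic sketch of the residual projection: build the ENU line-of-sight unit vector
# from azimuth/elevation and project each post-fit pseudorange residual onto it.
import numpy as np

def enu_line_of_sight(azimuth_deg, elevation_deg):
    az, el = np.radians(azimuth_deg), np.radians(elevation_deg)
    return np.array([
        np.cos(el) * np.sin(az),   # East
        np.cos(el) * np.cos(az),   # North
        np.sin(el),                # Up
    ])

def project_residuals(residuals_m, azimuths_deg, elevations_deg):
    """Return an (n_satellites x 3) array of residuals projected on the ENU axes."""
    los = np.stack([enu_line_of_sight(a, e)
                    for a, e in zip(azimuths_deg, elevations_deg)])
    return los * np.asarray(residuals_m)[:, None]

# Example: three satellites with residuals in metres.
features = project_residuals([1.8, -0.7, 2.4], [45.0, 120.0, 310.0], [60.0, 25.0, 40.0])
print(features.shape)  # (3, 3); flattened/padded before being fed to the ML model
```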

The training phase consists of finding a model that relates the residuals with the difference of estimated positions with respect to the reference trajectory, i.e. the position errors. The positioning error estimated by the ML model can then be subtracted from the position provided by the GNSS receiver, increasing the positioning accuracy.

When available, additional information can be included in the machine-learning model. This is left to future works. For instance, adding a label derived from the position indicating if the current area is rural or urban might help the ML algorithm to achieve better performances. Eventually, if the position itself (or a quantized version) is used in the ML model, the model can act as a sort of raytracing, because the model will learn how statistically the rays reflect in a certain environment, as a function of the azimuth and elevation. However, this would require an enormous amount of data. For this reason, this aspect was not taken into account for the time being. In this scenario, the algorithm would not be independent anymore from external information about the environment in which the receiver operates.

The concept has been tested with data collected from mass-market receivers. Three data analyses were performed:

Single-frequency (SF) multi-constellation PVT solution from a mass-market receiver, used as a black box;

Dual-frequency (DF) multi-constellation PVT solution from a mass-market receiver, used as a black box;

Smartphone measurements (Google Smartphone Decimeter Challenge) using a single frequency PVT engine.

In the first two experiments, data was collected with mass-market receivers during test campaigns and field trials carried out in 2020-2021 in the Netherlands, targeting two main environments: a rural/open sky scenario and a deep-urban scenario in Rotterdam (with parts of the data acquisition on highways). The datasets consist of several runs, for a total of about 117k, 175k and 95k epochs, and are represented in Figure 4. A reference trajectory for ground truth, obtained with a high-end GNSS/IMU, is also available for each trajectory.

In the experiments, each epoch was considered independently. Therefore, temporal correlation among epochs has not been exploited. The temporal correlation can be achieved by using a recurrent neural network (RNN), such as long short-term memory (LSTM), but this is left to future work.

The ML algorithm has been implemented as a neural network using TensorFlow. The neural network architecture is reported in Figure 3, showing an example with four layers, where the last one is fully connected, with three neurons (one for each direction of the coordinate system) and without activation, which provides the error estimation per component. Using a single ML model for all three components simultaneously (multi-target regression) allows capturing the correlations among the components. The algorithm can be tuned to work in different configurations. For instance, in the experiment with smartphone data, only the horizontal errors have been considered, leading to a final layer with only 2 neurons.

For the proof of concept, a neural network with six layers of respectively [2048, 2048, 512, 256, 32, 3] units was used. Each layer is followed by a rectified linear unit (ReLU) activation function. Limited optimizations of the neural network and of the training phase have been performed.
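The paper states the model was built with TensorFlow; a minimal Keras sketch consistent with the architecture described above (hidden layers of 2048, 2048, 512, 256 and 32 units with ReLU, and a linear 3-unit output for the ENU error) might look like this. The input width, optimizer and training settings are assumptions.

```python
# Hedged Keras sketch of the regression network described in the text: ReLU hidden layers
# of 2048/2048/512/256/32 units and a linear 3-unit output for the East/North/Up error.
# Input width and optimizer settings are assumptions.
import tensorflow as tf

N_FEATURES = 128  # e.g. projected residuals plus C/N0 per channel, padded to a fixed size

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(N_FEATURES,)),
    tf.keras.layers.Dense(2048, activation="relu"),
    tf.keras.layers.Dense(2048, activation="relu"),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(3),   # estimated ENU position error, no activation
])

model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss="mse")
# model.fit(train_features, train_errors_enu, validation_split=0.15, epochs=50)

# At run time the estimated error is simply subtracted from the receiver's solution:
# corrected_enu = receiver_enu - model.predict(features)
```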

Within each dataset, 70% was used for training, 15% for validation, and 15% for testing. Figure 5, Figure 6 and Figure 8 show the results of the tests (taken from a random 15% of the dataset never presented to the ML model during training and validation), comparing the cumulative distribution function (CDF) of the positioning error with regard to the reference trajectory before and after the ML algorithm application on the PVT solution. Notably, the ML algorithm effectively improves the positioning accuracy.

Figure 7 provides a visual representation of the results, taken from the Rotterdam city center area. The data are taken from the DF test. The positions from the reference trajectory are represented in blue, the positions computed by the mass-market receiver are depicted in red, and the ones after the ML corrections in green. The corrected positions generally lie closer to the reference trajectory than the positions computed by the mass-market receiver.

Table 1 and Table 2 report the summary of the results for horizontal and 3D errors, respectively. The positioning errors are reported for different percentiles, together with the improvement in accuracy both in absolute and relative terms. It is interesting to note that at the 95th percentile, the improvement ranges between 7.7% and 38% for the horizontal errors, and between 23.5% and 40.8% for 3D errors.

To assess the impact of the complexity of the neural network on the performance of the corrections, a smaller variant of the neural network composed of four layers of respectively [512, 256, 32, 3] units was tested. One iteration of the smaller model (NN1) requires around 50 s on the laptop CPU (Intel i7-8650U), while the storage of the trained model parameters requires 2 MB. This allows it to be easily deployed to receivers. The bigger model (NN2) requires around 600 s per iteration, and around 60 MB of storage. The results are reported in Figure 9. Note that NN2 achieves better results than NN1.

We have introduced the concept of machine learning corrections for improving the accuracy of GNSS receivers' positioning. The advantage of this method is that it does not require changes in the architecture of the GNSS receivers and can be deployed as a software service.

The concept has been demonstrated in three experiments using real-world data collected with mass-market receivers and smartphones, showing that it is possible to achieve a significant accuracy improvement, even with a neural network of modest size.

Future work will expand the dataset size, increasing also the variety of environments, to better assess the generalization of the ML model, and explore different ML architectures, e.g. investigating the benefit of LSTM for capturing temporal correlation among the epochs. Another interesting direction of research will be to explore the potential benefits for high accuracy positioning techniques, such as PPP (-AR) or RTK.

This article is based on material presented in a technical paper at ION GNSS+ 2021, available at ion.org/publications/order-publications.cfm.

(1) G. Fu, M. Khider, F. van Diggelen, "Android Raw GNSS Measurement Datasets for Precise Positioning," Proceedings of ION GNSS+ 2020, September 2020, pp. 1925-1937.

(2) Martín Abadi, et al., "TensorFlow: Large-scale machine learning on heterogeneous systems," 2015. Software available from tensorflow.org.

(3) F. van Diggelen, "End Game for Urban GNSS: Google's Use of 3D Building Models," Inside GNSS, March 2021.

(4) G. Caparra, "Correcting Output of Global Satellite Navigation Receiver," PCT/EP2021/052383.

Gianluca Caparra received a Ph.D. in information engineering from the Università degli Studi di Padova, Italy. He is currently a radio-navigation engineer with the European Space Agency. His research interests include positioning, navigation, and timing assurance, cybersecurity, signal processing, and machine learning, mainly in the context of global navigation satellite systems.

Paolo Zoccarato holds a Ph.D. in science, technology and measurements for space on precise orbit determination from the University of Padova, Italy. He worked at Curtin University as a postdoc on PPP-RTK and at Trimble TerraSat GmbH on VRS and RTx. He is a radio-navigation engineer consultant at ESA/ESTEC, contributing mainly to real-time, reliable, high-accuracy positioning for different GNSS receiver types, sensors, environments and systems.

Floor Melman received a master's degree in aerospace engineering from the Delft University of Technology (TU Delft), the Netherlands. He now works as a radio-navigation engineer at ESA/ESTEC. His main areas of work include PNT algorithms (in harsh environments) and GNSS signal processing.
