Want To Be AI-First? You Need To Be Data-First. – Forbes

Posted: January 31, 2020 at 9:46 am


Those who implement AI and machine learning projects learn quickly that machine learning projects are not application development projects. Much of the value of a machine learning project rests in the models, training data, and configuration information that guides how the model is applied to the specific machine learning problem. The application code is mostly a means to implement the machine learning algorithms and "operationalize" the machine learning model in a production environment. That's not to say that application code is unnecessary; after all, the computer needs some way to operationalize the machine learning model. But focusing a machine learning project on the application code misses the big picture. If you want to be AI-first for your project, you need to have a data-first perspective.
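To make that concrete, here is a minimal, hypothetical sketch (assuming Python with scikit-learn and joblib; the dataset, parameters, and file names are illustrative, not prescriptive) of what a machine learning project's real deliverables look like: a trained model, the data it was fit on, and the configuration that describes how it was produced. The surrounding application code does little more than produce and later load these artifacts.

```python
# Minimal sketch: the durable artifacts of an ML project are the model,
# the training data, and the configuration -- not the application code.
# Assumes scikit-learn and joblib; dataset and file names are illustrative.
import json
import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

config = {"n_estimators": 200, "max_depth": 5, "test_size": 0.2, "random_state": 42}

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=config["test_size"], random_state=config["random_state"]
)

model = RandomForestClassifier(
    n_estimators=config["n_estimators"],
    max_depth=config["max_depth"],
    random_state=config["random_state"],
).fit(X_train, y_train)

# The artifacts that carry the project's value:
joblib.dump(model, "model.joblib")                        # the trained model
joblib.dump((X_train, y_train), "training_data.joblib")   # the data it was fit on
with open("config.json", "w") as f:
    json.dump(config, f)                                   # how it was produced

print("held-out accuracy:", model.score(X_test, y_test))
```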

Use data-centric methodologies and data-centric technologies

It follows that if you're going to have a data-first perspective, you need to use a data-first methodology. There's certainly nothing wrong with Agile methodologies as a way of iterating toward success, but Agile on its own leaves much to be desired, as it's focused on functionality and the delivery of application logic. There are already data-centric methodologies out there that have been proven in many real-world scenarios. One of the most popular is the Cross Industry Standard Process for Data Mining (CRISP-DM), which focuses on the steps needed for successful data projects. In the modern age, it makes sense to merge the notably non-agile CRISP-DM with Agile methodologies to make it more relevant. While this is still a new area for most enterprises implementing AI projects, we see this sort of merged methodology approach proving more successful than trying to shoehorn all the aspects of an AI project into existing application-focused Agile methodologies.

It stands to reason that if you have a data-centric perspective on AI, then you need to pair your data-centric methodologies with data-centric technologies. This means that your choice of tooling to implement all those artifacts detailed above needs to be, first and foremost, data-focused. Don't use code-centric IDEs when you should be using data notebooks. Don't use enterprise integration middleware platforms when you should be using tools that focus on model development and maintenance. Don't use so-called machine learning platforms that are really just a pile of cloud-based technologies or overgrown big data management platforms. The tools you use should support the machine learning goals you need, which are in turn supported by the activities you need to do and the artifacts you need to create. Just because a GPU provider has a toolset doesn't mean that it's the right one to use. Just because a big enterprise vendor or a cloud vendor has a "stack" doesn't mean it's the right one. Start from the deliverables and the machine learning objectives and work your way backwards.

Another big consideration is where and how machine learning models will be deployed, or in AI-speak, "operationalized". AI models can be implemented in a remarkably wide range of places: on "edge" devices sitting disconnected from the internet; in mobile and desktop applications; on enterprise servers and cloud-based instances; and in all manner of autonomous vehicles and craft. Each of these locations is a place where AI models and implementations can and do exist. This degree of operationalization heterogeneity highlights even more how ludicrous the idea of a single machine learning platform is. How can one platform simultaneously provide AI capabilities in a drone, a mobile app, an enterprise implementation, and a cloud instance? Even if you source all this technology from a single vendor, it will be a collection of different tools that sit under a single marketing umbrella rather than a single, cohesive, interoperable platform that makes any sense.
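One practical way teams cope with that heterogeneity is to export a trained model into a runtime-neutral format that many different environments can consume. The following is a hedged sketch, assuming scikit-learn, skl2onnx, and onnxruntime are available (the model, input names, and file names are illustrative): the same trained model is serialized to ONNX and can then be loaded by an edge, mobile, server, or cloud runtime.

```python
# Illustrative sketch: exporting one trained model to a portable format (ONNX)
# so it can be operationalized on edge devices, mobile apps, servers, or cloud.
# Assumes scikit-learn, skl2onnx, and onnxruntime; names are illustrative.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
import onnxruntime as ort

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Convert to ONNX: a runtime-neutral description of the trained model.
onnx_model = convert_sklearn(
    model, initial_types=[("input", FloatTensorType([None, X.shape[1]]))]
)
with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

# Any ONNX-capable runtime (server, mobile, edge) can now run the same artifact.
session = ort.InferenceSession("model.onnx")
labels = session.run(None, {"input": X[:3].astype(np.float32)})[0]
print("predicted labels:", labels)
```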

Build data-centric talent

All this methodology and technology can't assemble itself. If you're going to be successful at AI projects, you're going to need to be successful at building an AI team. And if the data-centric perspective is the correct one for AI, then it makes sense that your team also needs to be data-centric. The talent needed to build apps or manage enterprise systems or data is not the same as the talent needed to build AI models, tune algorithms, work with training data sets, and operationalize ML models. The core of your AI team needs to be data scientists, data engineers, and the folks responsible for putting machine learning models into operation. While there's always a need for coding, development, and project management, finding and growing your data-centric talent is key to the long-term success of your AI initiatives.

The primary challenge with building data talent is that it's hard to find and grow, largely because data isn't code. You need folks who know how to wrangle lots of data sources, compile them into clean data sets, and then extract information needles from data haystacks. In addition, the language of AI is math, not programming logic. So a strong data team is also strong in the right kinds of math: it knows how to select and implement AI algorithms, properly tune hyperparameters, and properly interpret testing and validation results. Simply guessing at training data sets and hyperparameters and changing them at random is not a good way to create AI projects that deliver value. As such, data-centric talent grounded in a fundamental understanding of machine learning math and algorithms, combined with an understanding of how to deal with big data sets, is crucial to AI project success.
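For instance, instead of guessing at hyperparameters, a data-centric team typically runs a systematic, cross-validated search and interprets the results statistically. Below is a minimal sketch, assuming scikit-learn; the model, parameter grid, and dataset are illustrative choices, not a recommendation.

```python
# Minimal sketch of systematic hyperparameter selection with cross-validation,
# as opposed to changing values at random. Assumes scikit-learn; grid is illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.001]}

# 5-fold cross-validation over the grid gives a defensible estimate of
# generalization performance for each hyperparameter setting.
search = GridSearchCV(SVC(), param_grid, cv=5).fit(X_train, y_train)

print("best hyperparameters: ", search.best_params_)
print("cross-validated score:", search.best_score_)
print("held-out test score:  ", search.score(X_test, y_test))
```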

Prepare to continue to invest for the long haul

It should be pretty obvious at this point that the set of activities for AI is indeed very much data-centric, and the activities, artifacts, tools, and team need to follow from that data-centric perspective. The biggest challenge is that so much of that ecosystem is still being developed and is not fully available to most enterprises. AI-specific methodologies are still being tested in large-scale projects. AI-specific tools and technologies are still being developed and enhanced, with evolutionary changes released at a rapid pace. AI talent continues to be tight, and we're only just starting to see investment in growing this skill set.

As a result, organizations that need to be successful with AI, even with this data-centric perspective, need to be prepared to invest for the long haul. Find your peer groups to see what methodologies are working for them and continue to iterate until you find something that works for you. Find ways to continuously update your team's skills and methods. Realize that you're on the bleeding edge with AI technology and prepare to reinvest in new technology on a regular basis, or invent your own if need be. Even though the history of AI spans at least seven decades, we're still in the early stages of making AI work for large scale projects. This is like the early days of the Internet or mobile or big data. Those early pioneers had to learn the hard way, making many mistakes before realizing the "right" way to do things. But once those ways were discovered, organizations reaped big rewards. This is where we're at with AI. As long as you have a data-centric perspective and are prepared to continue to invest for the long haul, you will be successful with your AI, machine learning, and cognitive technology efforts.
