5 Ways Data Scientists Can Advance Their Careers – Spiceworks News and Insights

Data and machine learning people join companies with the promise of cutting-edge ML models and technology. But often, they spend 80% of their time cleaning data or dealing with data riddled with missing values and outliers, a frequently changing schema, and slow load times. The gap between expectation and reality can be massive.

Although data scientists might initially be excited to tackle insights and advanced models, that enthusiasm quickly deflates amidst daily schema changes, tables that stop updating, and other surprises that silently break models and dashboards.

While data science applies to a range of roles, from product analytics to putting statistical models in production, one thing is usually true: data scientists and ML engineers often sit at the tail end of the data pipeline. They're data consumers, pulling data from data warehouses, S3, or other centralized sources. They analyze data to help make business decisions or use it as training inputs for machine learning models.

In other words, they're affected by data quality issues but aren't often empowered to travel up the pipeline and fix them at the source. So they either write a ton of defensive data preprocessing into their work or move on to a new project.

If this scenario sounds familiar, you don't have to give up or complain that the data engineering upstream is forever broken. Make like a scientist and get experimental. You're the last step in the pipeline and the one putting models into production, which means you're responsible for the outcome. While this might sound terrifying or unfair, it's also a brilliant opportunity to shine and make a big difference to your team's business impact.

Here are five ways data scientists and ML engineers can get out of defense mode and ensure that, even if they didn't create data quality issues, those issues don't affect the teams that rely on the data.

Business executives hesitate to make decisions based on data alone. A KPMG report showed that 60% of companies don't feel very confident in their data, and 49% of leadership teams didn't fully support the internal data and analytics strategy.

Good data scientists and ML engineers can help by increasing data accuracy and then getting that data into the dashboards key decision-makers use. In doing so, they'll have a direct, positive impact. But manually checking data for quality issues is error-prone and a huge drag on your velocity.

Using data quality testing (e.g., with dbt tests) and data observability helps ensure you find out about quality issues before your stakeholders do, winning their trust in you (and the data) over time.
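To make "data quality testing" concrete, here is a minimal Python sketch of three checks similar in spirit to dbt's built-in `not_null`, `unique`, and `accepted_values` tests. It is not dbt's API and not a definitive implementation: the orders table, the column names (`order_id`, `status`), and the accepted status values are all hypothetical, and in practice such checks would run against the warehouse rather than an in-memory list.

```python
from typing import Any

Row = dict[str, Any]

def check_not_null(rows: list[Row], column: str) -> list[str]:
    """Flag rows where `column` is missing or None."""
    return [f"row {i}: {column} is null"
            for i, row in enumerate(rows) if row.get(column) is None]

def check_unique(rows: list[Row], column: str) -> list[str]:
    """Flag rows whose non-null `column` value already appeared in an earlier row."""
    seen: set[Any] = set()
    failures = []
    for i, row in enumerate(rows):
        value = row.get(column)
        if value is None:
            continue  # nulls are left to the not-null check
        if value in seen:
            failures.append(f"row {i}: duplicate {column}={value!r}")
        seen.add(value)
    return failures

def check_accepted_values(rows: list[Row], column: str, accepted: set[Any]) -> list[str]:
    """Flag rows whose `column` value falls outside the accepted set."""
    return [f"row {i}: {column}={row.get(column)!r} not accepted"
            for i, row in enumerate(rows) if row.get(column) not in accepted]

def run_checks(rows: list[Row]) -> list[str]:
    # Hypothetical expectations for a hypothetical orders table.
    return (check_not_null(rows, "order_id")
            + check_unique(rows, "order_id")
            + check_accepted_values(rows, "status", {"placed", "shipped", "returned"}))

orders = [
    {"order_id": 1, "status": "placed"},
    {"order_id": 2, "status": "shipped"},
    {"order_id": 2, "status": "shiped"},       # duplicate id and a misspelled status
    {"order_id": None, "status": "returned"},  # missing id
]

for failure in run_checks(orders):
    print(failure)
```

Running checks like these on a schedule, and alerting whenever the failure list is non-empty, is what lets you hear about problems before a stakeholder spots them on a dashboard.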

Data quality problems can easily lead to an annoying blame game between data science, data engineering, and software engineering. Who broke the data? And who knew? And who is going to fix it?

But when bad data goes out into the world, it's everyone's fault. Your stakeholders want the data to work so that the business can move forward with an accurate picture.

Good data scientists and ML engineers build accountability into every step of the data pipeline with Service Level Agreements (SLAs). SLAs define data quality in quantifiable terms and assign responders who should spring into action to fix problems. SLAs help avoid the blame game entirely.
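One way to picture "data quality in quantifiable terms" is an SLA written down as data and checked automatically. The Python sketch below assumes two illustrative terms, freshness and completeness, plus a named responder; the table name, thresholds, and the `data-eng-oncall` responder are hypothetical, and real SLAs will cover whatever terms your team agrees on.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass(frozen=True)
class DataSLA:
    table: str
    max_staleness: timedelta  # the newest load may be at most this old
    min_completeness: float   # required fraction of rows with non-null key fields
    responder: str            # who acts when the SLA is breached (hypothetical name)

@dataclass(frozen=True)
class TableStatus:
    last_loaded_at: datetime
    completeness: float

def evaluate(sla: DataSLA, status: TableStatus, now: datetime) -> list[str]:
    """Return one message per breached SLA term, naming the responsible responder."""
    breaches = []
    age = now - status.last_loaded_at
    if age > sla.max_staleness:
        breaches.append(f"{sla.table}: data is {age} old (limit {sla.max_staleness}); "
                        f"page {sla.responder}")
    if status.completeness < sla.min_completeness:
        breaches.append(f"{sla.table}: completeness {status.completeness:.1%} "
                        f"below {sla.min_completeness:.1%}; page {sla.responder}")
    return breaches

sla = DataSLA("analytics.orders", timedelta(hours=6), 0.99, "data-eng-oncall")
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
status = TableStatus(last_loaded_at=datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc),
                     completeness=0.97)

for breach in evaluate(sla, status, now):
    print(breach)
```

Because every breach message names a responder, the question "who is going to fix it?" is answered before the incident happens rather than argued about afterward.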

Trust is fragile, and it erodes quickly when your stakeholders catch mistakes and start assigning blame. But what about when they don't catch quality issues? Then the model performs poorly, or bad decisions get made. In either case, the business suffers.

For example, what if a single entity is logged as both "Dallas-Fort Worth" and "DFW" in a database? When you test a new feature, everyone in Dallas-Fort Worth is shown variation A and everyone in DFW is shown variation B. No one catches the discrepancy. Now you can't draw conclusions about users in the Dallas-Fort Worth area: your test has been thrown off, and the groups haven't been properly randomized.
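A cheap guard against this kind of split is to canonicalize entity labels and then check whether any canonical entity landed in more than one variant. The Python sketch below assumes a hand-maintained alias table (`MARKET_ALIASES`, hypothetical) and a list of (market label, variant) pairs; it only detects the problem and is not a randomization scheme.

```python
from collections import defaultdict
from typing import Iterable

# Hypothetical alias table: each known spelling maps to one canonical key.
MARKET_ALIASES = {
    "dallas fort worth": "DFW",
    "dfw": "DFW",
}

def canonical_market(raw: str) -> str:
    """Normalize case, hyphens, and whitespace, then resolve known aliases."""
    key = " ".join(raw.lower().replace("-", " ").split())
    return MARKET_ALIASES.get(key, raw.strip())

def markets_split_across_variants(
    assignments: Iterable[tuple[str, str]],
) -> dict[str, set[str]]:
    """Return canonical markets whose rows were assigned to more than one variant."""
    variants = defaultdict(set)
    for market, variant in assignments:
        variants[canonical_market(market)].add(variant)
    return {market: seen for market, seen in variants.items() if len(seen) > 1}

assignments = [("Dallas-Fort Worth", "A"), ("DFW", "B"), ("Austin", "A")]
print(markets_split_across_variants(assignments))
```

Running the same `canonical_market` normalization before assigning variants, rather than only afterward, keeps one entity from being counted as two in the first place.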

Clear the path for better experimentation and analysis through a foundation of higher-quality data. By using your expertise to boost quality, your data will become more reliable, and your business teams can run meaningful tests. The team can focus on what to test next instead of doubting the results of the tests.

Confidence in the data starts with you; if you don't have a handle on high-quality, reliable data, you'll carry that burden into your interactions with the product and your colleagues.

So stake your claim as the point-person for data quality and data ownership. You can have input into defining quality and delegating responsibility for fixing different issues. Remove friction between data science and engineering.

If you can lead the charge to define and boost data quality, you'll impact almost every other team within your organization. Your teammates will appreciate the work you do to reduce org-wide headaches.

Incomplete or unreliable data can lead to terabytes of wasted storage. That data lives in your warehouse and gets included in queries that incur compute costs. Low-quality data can be a major drag on your infrastructure bill, because it gets scanned and filtered out again and again.

Identifying low-quality data is one way to create immediate value for your organization, especially in pipelines that see heavy traffic for product analytics and machine learning. Recollect or reprocess that data, or impute and clean existing values, to reduce storage and compute costs.
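As a small illustration of "impute and clean existing values", the Python sketch below drops duplicate records by key and fills missing numeric values with the median of the observed ones. The column names are hypothetical, and median imputation is just one simple choice: when values are missing for systematic reasons, recollecting or reprocessing from the source is usually the better fix.

```python
from statistics import median
from typing import Any

def dedupe_and_impute(rows: list[dict[str, Any]], key: str, column: str) -> list[dict[str, Any]]:
    """Drop repeated records by `key`, then fill missing `column` values with the median."""
    seen = set()
    deduped = []
    for row in rows:
        if row[key] in seen:
            continue  # keep only the first record for each key
        seen.add(row[key])
        deduped.append(dict(row))  # copy so the caller's rows are not mutated

    observed = [row[column] for row in deduped if row[column] is not None]
    fill = median(observed) if observed else None  # nothing to impute from if all are missing
    for row in deduped:
        if row[column] is None:
            row[column] = fill
    return deduped

rows = [
    {"id": 1, "amount": 10.0},
    {"id": 1, "amount": 10.0},  # duplicate load
    {"id": 2, "amount": None},  # missing value
    {"id": 3, "amount": 30.0},
]
cleaned = dedupe_and_impute(rows, key="id", column="amount")
print(cleaned)
# [{'id': 1, 'amount': 10.0}, {'id': 2, 'amount': 20.0}, {'id': 3, 'amount': 30.0}]
```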

Keep track of the tables and data you clean up, along with the number of queries run on those tables. It's essential to tell your team how many queries are no longer running on junk data and how many gigabytes of storage have been freed up for better things.
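Keeping that tally can be as simple as logging each cleanup and summing it up. In the Python sketch below, the table names and byte and query counts are made-up illustrative inputs, not real measurements; in practice the sizes and query counts would come from your warehouse's storage metadata and query log. Gigabytes here are decimal (10^9 bytes).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cleanup:
    table: str
    bytes_before: int
    bytes_after: int
    monthly_queries: int  # queries hitting this table per month, e.g. from a query log

def summarize(cleanups: list[Cleanup]) -> dict[str, float]:
    """Total the storage freed (decimal GB) and the queries now served by cleaned tables."""
    freed = sum(c.bytes_before - c.bytes_after for c in cleanups)
    return {
        "tables_cleaned": len(cleanups),
        "gb_freed": freed / 1e9,
        "monthly_queries_on_clean_data": sum(c.monthly_queries for c in cleanups),
    }

# Illustrative, made-up cleanup log.
log = [
    Cleanup("events_raw", 500_000_000_000, 350_000_000_000, 1_200),
    Cleanup("user_profiles", 20_000_000_000, 5_000_000_000, 300),
]
print(summarize(log))
# {'tables_cleaned': 2, 'gb_freed': 165.0, 'monthly_queries_on_clean_data': 1500}
```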

All data professionals, from seasoned veterans to newcomers, should be indispensable parts of the organization. You add value by taking ownership of more reliable data. Although tools, algorithms, and analytics techniques are growing more sophisticated, the input data often is not: it's always unique and business-specific. Even the most sophisticated tools and models can't run well on erroneous data. Through the five steps above, data science can be a boon to your entire organization. Everyone wins when you improve the data your teams depend on.

Which techniques can help data scientists and ML engineers streamline the data management process? Tell us on Facebook, Twitter, and LinkedIn. We'd love to know!
