Is your data good enough for your machine learning/artificial intelligence program? – Zimo News


The development of artificial intelligence is a top priority for businesses and governments around the world. Yet a fundamental aspect of AI remains overlooked: poor data quality.

AI algorithms rely on reliable data to produce the best possible results, and the consequences can be devastating if the data is biased, incomplete, inadequate, or inaccurate.

AI systems that identify diseases in patients are a prime example of how poor data quality can lead to negative results. When data is insufficient, these systems produce false diagnoses and inaccurate predictions, leading to misdiagnosis and delayed treatment. For example, a University of Cambridge study of more than 400 tools for diagnosing Covid-19 found the AI-generated reports completely unusable because of flawed datasets.

In other words, if your data isn’t good enough, your AI initiatives will have devastating real-world consequences.

What does “good enough” data mean?

There is an ongoing debate about what “good enough” data means. Some say there is not enough good data. Others say the need for good data causes analysis paralysis, while HBR states outright that if your information is bad, your machine learning tools will be useless.

At WinPure, we define good enough data as complete, accurate, and valid data that can be confidently used in business processes with acceptable risk, the level of which depends on the company’s individual goals and circumstances.

Most companies struggle with data quality and governance more than they admit. Adding to the tension, they are overwhelmed and under enormous pressure to deploy AI programs to remain competitive. Unfortunately, this means that issues like dirty data are not even part of board discussions until they cause a project to fail.

How does poor data quality affect AI systems?

Data quality issues arise early in the process, when algorithms learn patterns from training data. For example, if an AI algorithm is fed unfiltered social media data, it picks up abuse, racist comments, and misogynistic remarks, as happened with Microsoft’s AI bot. More recently, AI’s inability to detect dark-skinned people has also been attributed in part to the data.

What’s the connection with data quality?

Lack of data governance, lack of awareness of data quality, and siloed views of data (where a disparity such as a gender imbalance might otherwise have been noticed) lead to poor results.

What should I do?

When companies realize they have data quality issues, they panic and start hiring. Consultants, engineers, and analysts are hired blindly to diagnose problems, cleanse data, and fix things as quickly as possible. Unfortunately, several months pass before any progress is made, and despite the millions of dollars spent on manpower, the problems do not seem to go away. Taking a knee-jerk approach to data quality issues is of little use.

Real change starts at the grassroots level.

Here are three key steps to take if you want your AI/ML project to go in the right direction.

Raise awareness and identify data quality issues

First, assess data quality by creating a culture of data literacy. Bill Schmarzo, an influential voice in the industry, suggests using design thinking to create a culture where everyone understands and contributes to the organization’s data goals and challenges.

In today’s business environment, data and data quality are no longer the sole responsibility of IT or the data team. Business users should be aware of issues such as dirty data, inconsistent data, and duplicate data.

So the first key thing to do is make data quality training an organizational task and enable teams to identify bad data attributes.

Here’s a checklist you can use to start a conversation about the quality of your data.

Data health checklist. Source: WinPure Corporation
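The checklist image itself is not reproduced here. As a rough illustration of what a first data health check could look like in practice, the sketch below measures three of the problems named above (missing, duplicate, and inconsistent values) on a handful of made-up customer records. The field names, sample rows, and the crude email check are assumptions made for this example, not part of the WinPure checklist.

```python
from collections import Counter

# Hypothetical customer records; the fields and values are invented for illustration.
records = [
    {"id": 1, "name": "Ann Lee", "email": "ann.lee@example.com", "country": "US"},
    {"id": 2, "name": "ann lee ", "email": "ANN.LEE@example.com", "country": "USA"},
    {"id": 3, "name": "Raj Patel", "email": "", "country": "IN"},
    {"id": 4, "name": "Mia Chen", "email": "mia.chen@example", "country": None},
]


def is_missing(value):
    """Treat None and blank strings as missing."""
    return value is None or str(value).strip() == ""


def missing_rate(rows, field):
    """Share of rows in which the field is missing."""
    return sum(is_missing(r.get(field)) for r in rows) / len(rows)


def duplicate_values(rows, field):
    """Values of the field that occur more than once after trimming and lowercasing."""
    counts = Counter(
        str(r[field]).strip().lower() for r in rows if not is_missing(r.get(field))
    )
    return sorted(value for value, n in counts.items() if n > 1)


def invalid_email_ids(rows):
    """IDs of rows whose non-empty email fails a deliberately crude shape check."""
    bad = []
    for r in rows:
        email = r.get("email")
        if is_missing(email):
            continue
        local, _, domain = email.partition("@")
        if not local or "." not in domain:
            bad.append(r["id"])
    return bad


report = {
    "missing_email_rate": missing_rate(records, "email"),
    "missing_country_rate": missing_rate(records, "country"),
    "duplicate_emails": duplicate_values(records, "email"),
    "invalid_email_ids": invalid_email_ids(records),
    # Several spellings of one country hint at inconsistent data entry.
    "country_spellings": sorted(
        {r["country"] for r in records if not is_missing(r["country"])}
    ),
}
print(report)
```

A report like this will not fix anything by itself, but it gives business users and the data team concrete numbers to discuss.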

Design a plan to achieve quality metrics

Companies often make the mistake of downplaying data quality concerns. Instead of focusing on planning and strategy work, they hire data analysts to perform day-to-day data cleaning tasks. Some companies use data management tools to clean, deduplicate, merge, and purge data without a plan. Unfortunately, tools and talent cannot solve problems in isolation. What is needed is a strategy that addresses each dimension of data quality.

The strategy should address data collection, labelling, processing, and determining whether the data is relevant to the AI/ML project. For example, if an AI recruiting program selects only male candidates for technical positions, the program’s training data is clearly biased, incomplete (because it did not collect enough candidate data), and inaccurate. Such data does not serve the true purpose of the AI project.
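One way to catch the kind of bias in the recruiting example before training is to look at how each group is represented in the data and how often it carries the positive label. The sketch below does this for a few made-up hiring records. The rows, the field names, and both thresholds are assumptions chosen for illustration; suitable cut-offs depend on the project and on applicable regulation.

```python
from collections import defaultdict

# Hypothetical historical hiring records that would be used as training data.
training_rows = [
    {"gender": "male", "hired": True},
    {"gender": "male", "hired": True},
    {"gender": "male", "hired": False},
    {"gender": "male", "hired": True},
    {"gender": "female", "hired": False},
    {"gender": "female", "hired": False},
]


def group_summary(rows, group_field, label_field):
    """Per group: its share of all rows and the share of its rows labeled positive."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in rows:
        totals[row[group_field]] += 1
        positives[row[group_field]] += int(bool(row[label_field]))
    return {
        group: {
            "share_of_rows": totals[group] / len(rows),
            "positive_rate": positives[group] / totals[group],
        }
        for group in totals
    }


def flag_groups(summary, min_share=0.4, min_rate_ratio=0.8):
    """Flag groups that are scarce in the data or rarely carry the positive label.

    Both thresholds are illustrative assumptions, not recommended values.
    """
    best_rate = max(stats["positive_rate"] for stats in summary.values())
    flags = {}
    for group, stats in summary.items():
        reasons = []
        if stats["share_of_rows"] < min_share:
            reasons.append("under-represented")
        if best_rate > 0 and stats["positive_rate"] / best_rate < min_rate_ratio:
            reasons.append("low positive rate")
        if reasons:
            flags[group] = reasons
    return flags


summary = group_summary(training_rows, "gender", "hired")
flags = flag_groups(summary)
print(summary)
print(flags)
```

A flagged group does not prove the data is unusable, but it is a signal to collect more candidate data or revisit the labels before a model learns the imbalance.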

Data quality goes beyond routine cleaning and correction tasks. It is best to develop data integrity and data governance standards before starting a project. This will prevent the project from failing later!

Ask the right questions and define responsibilities

There is no universal standard for “good enough” data or for the required level of data quality. Instead, it all depends on your company’s information management system, data governance guidelines (or lack thereof), and the knowledge of your team and business goals, among many other factors.

Before starting a project, ask your team the following questions:

  • Where does our information come from and how is the data collected?
  • What issues affect the data collection process and threaten positive outcomes?
  • What information does the data provide? Does it meet data quality standards (that is, is the information accurate, complete, reliable, and consistent)?
  • Are the designated people aware of the importance of data quality and the consequences of poor quality?
  • Are roles and responsibilities defined? For example, who needs to maintain a regular data cleaning schedule? Who is responsible for creating the master record?
  • Is the data fit for purpose?

Ask the right questions, assign the right roles, set data quality standards, and help your team solve challenges before they become problems!
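Answers to these questions are easier to act on once they are written down in a form that can be checked. The sketch below records each quality standard together with the role that owns it, then compares measured values against the standards. The metric names, thresholds, and role titles are hypothetical; every organization would define its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityStandard:
    metric: str         # name of the measured quantity
    max_allowed: float  # highest acceptable value for this project
    owner: str          # role responsible for acting on a breach


# Hypothetical standards agreed on before the project starts.
STANDARDS = [
    QualityStandard("missing_rate", 0.05, "data steward"),
    QualityStandard("duplicate_rate", 0.01, "master data owner"),
    QualityStandard("label_error_rate", 0.02, "labeling lead"),
]


def review(measured, standards):
    """Return (metric, owner, reason) for each standard that is breached or unmeasured."""
    findings = []
    for standard in standards:
        value = measured.get(standard.metric)
        if value is None:
            findings.append((standard.metric, standard.owner, "not measured"))
        elif value > standard.max_allowed:
            reason = f"{value:.2%} exceeds {standard.max_allowed:.2%}"
            findings.append((standard.metric, standard.owner, reason))
    return findings


# Hypothetical measurements from a data profiling run.
measured = {"missing_rate": 0.12, "duplicate_rate": 0.004}

findings = review(measured, STANDARDS)
for metric, owner, reason in findings:
    print(f"{metric}: {reason} -> notify {owner}")

# In this sketch the data counts as fit for purpose only when nothing is flagged.
fit_for_purpose = not findings
```

Treating an unmeasured metric as a finding is a deliberate choice here: a standard nobody measures usually means the responsibility was never assigned.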


Data quality is not just about correcting typos or errors. It is about ensuring that AI systems are not discriminatory, misleading, or inaccurate. Before starting an AI project, it is necessary to fix flaws in the data and address data quality challenges. Additionally, launch an organization-wide data literacy program to connect each team to the bigger picture.

Frontline workers who prepare, process, and label data need data quality training to spot bias and errors in time.



Farah Kim is a people-focused marketing consultant specializing in problem-solving and reducing complex information into actionable insights for business leaders. Since 2011, she has been involved in technology, B2B and B2C.
