Data Engineering and DevOps: The unsung heroes of business AI

arjun dhar
3 min read · Nov 9, 2023

What businesses pursuing AI need to be aware of

There is so much hype surrounding AI and its applications for consumers and businesses. While consulting on some business-driven use cases, I felt it was my duty to educate people about the role of Data Engineering.

Data Engineering's contribution to business AI

TL;DR of how AI works in business

In the business context, AI usually means supervised learning: a machine that learns from data. For each input with a known expected result, the model reduces the error between what it thinks is the right answer and the known answer (the train and validate steps). It repeats this over several iterations (epochs) to produce a model that is close enough (low bias) yet is not an exact replay of its input (overfitting). In other words, the model does not memorize the mapping from input to output; it learns how the input results in the expected output.
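To make the train and validate loop concrete, here is a minimal sketch. It assumes a toy, hypothetical dataset (y = 3x + 2 plus a little noise) and a one-feature linear model trained with plain gradient descent. Each epoch nudges the model to reduce its error on the training data, and the error is then measured on a held-out validation set that the model never trains on. That validation error is what tells you whether the model has learned the relationship or has only memorized the training data.

```python
import random


def make_data(n, seed=0):
    """Hypothetical toy dataset: y = 3x + 2 plus a little Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 3 * x + 2 + rng.gauss(0, 0.1)) for x in xs]


def mse(w, b, data):
    """Mean squared error between the model's answers and the known answers."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(train_set, val_set, epochs=200, lr=0.1):
    """Gradient descent on a linear model y_hat = w * x + b."""
    w, b = 0.0, 0.0
    history = []  # (training error, validation error) after each epoch
    n = len(train_set)
    for _ in range(epochs):
        # Gradients of the training MSE with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in train_set) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in train_set) / n
        w -= lr * grad_w
        b -= lr * grad_b
        # Validate on data the model has never been trained on.
        history.append((mse(w, b, train_set), mse(w, b, val_set)))
    return w, b, history


data = make_data(200)
train_set, val_set = data[:160], data[160:]  # 80/20 train/validation split
w, b, history = train(train_set, val_set)
print(f"w={w:.2f}, b={b:.2f}, final validation MSE={history[-1][1]:.4f}")
```

A real business model would have many features and a far more expressive architecture. The shape of the loop (train, validate, repeat for several epochs) stays the same.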

This implies having data to train the model, and the quality of that data is also very important.

It's All About Data

For good AI, one needs good data… and in most cases, lots of it!

Data has to be sourced from multiple systems and databases. It must be cleaned, integrated, resolved (related across sources), verified, and from time to time even labelled, all at scale.
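The sketch below shows what "cleaned, integrated, resolved and verified" can mean at the smallest possible scale. It is not a production pipeline. The two source systems, their field names (`customer_id`, `cust`, `amount`) and the rejection rules are all hypothetical. IDs are normalized so that records from different systems can be related, duplicates are dropped, and rows that fail verification are set aside rather than silently loaded. Labelling is not shown.

```python
from dataclasses import dataclass

# Hypothetical extracts from two source systems (e.g. a CRM and a billing DB).
crm_rows = [
    {"customer_id": " C001 ", "email": "Alice@Example.com", "name": "Alice"},
    {"customer_id": "C002", "email": "bob@example.com ", "name": "Bob"},
    {"customer_id": "C002", "email": "bob@example.com", "name": "Bob"},  # duplicate
    {"customer_id": "C003", "email": "", "name": "Carol"},  # missing email
]
billing_rows = [
    {"cust": "c001", "amount": "120.50"},
    {"cust": "C002", "amount": "80"},
    {"cust": "C004", "amount": "15.00"},  # no matching CRM record
]


@dataclass
class Customer:
    customer_id: str
    email: str
    name: str
    total_billed: float = 0.0


def clean_id(raw):
    """Normalize IDs so records from different systems can be related."""
    return raw.strip().upper()


def integrate(crm, billing):
    customers = {}
    rejected = []  # (reason, original row) pairs that failed verification
    # Clean and de-duplicate CRM records; the first occurrence of an ID wins.
    for row in crm:
        cid = clean_id(row["customer_id"])
        email = row["email"].strip().lower()
        if not email:
            rejected.append(("missing email", row))
            continue
        customers.setdefault(cid, Customer(cid, email, row["name"]))
    # Resolve billing rows against the cleaned customer records.
    for row in billing:
        cid = clean_id(row["cust"])
        if cid not in customers:
            rejected.append(("unknown customer", row))
            continue
        customers[cid].total_billed += float(row["amount"])
    return customers, rejected


customers, rejected = integrate(crm_rows, billing_rows)
for c in customers.values():
    print(c)
print(f"{len(rejected)} rows rejected")
```

At real volumes, the same logic runs on distributed processing and storage infrastructure rather than in-memory dictionaries. That infrastructure is exactly where Data Engineering and DevOps come in.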

This requires a specialized focus on Data Engineering, among other prerequisites. Before one can think of applying AI, one must also perform some preparatory steps:

  1. Data Engineering & DevOps
  2. Exploratory data analysis (EDA): Applying domain knowledge to the data, since correlation does not imply causation.
  3. Data processing (part of Data Engineering, relevant to the AI model)
  4. Feature Engineering: Applying domain knowledge to transform raw data into features that machine learning algorithms can use. Some texts consider this part of EDA. In my opinion, it requires a different intent and skillset, so it is best treated as a separate step.
  5. Actual AI modeling
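As a rough illustration of how steps 2 and 4 differ in intent, here is a small sketch. The transaction records and the derived feature names (`log_amount`, `is_weekend`, `hour`) are hypothetical. The EDA helper only measures a correlation, and a domain expert still has to judge whether that relationship is meaningful. The feature-engineering function turns raw fields into model-ready inputs.

```python
import math
from datetime import datetime

# Hypothetical records as they might look after data processing (step 3).
raw = [
    {"ts": "2023-11-06T09:15:00", "amount": 20.0, "returned": 0},   # Monday
    {"ts": "2023-11-07T13:40:00", "amount": 250.0, "returned": 1},  # Tuesday
    {"ts": "2023-11-11T18:05:00", "amount": 35.0, "returned": 0},   # Saturday
    {"ts": "2023-11-12T11:30:00", "amount": 410.0, "returned": 1},  # Sunday
]


def pearson(xs, ys):
    """Pearson correlation coefficient: an EDA check, not proof of causation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def engineer_features(record):
    """Feature engineering: derive model-ready inputs from raw fields."""
    ts = datetime.fromisoformat(record["ts"])
    return {
        "log_amount": math.log1p(record["amount"]),  # compress skewed amounts
        "is_weekend": int(ts.weekday() >= 5),        # Saturday=5, Sunday=6
        "hour": ts.hour,
    }


# Step 2 (EDA): is the amount related to returns at all?
r = pearson([rec["amount"] for rec in raw], [rec["returned"] for rec in raw])
print(f"correlation(amount, returned) = {r:.2f}")

# Step 4 (feature engineering): build the inputs for step 5 (modeling).
features = [engineer_features(rec) for rec in raw]
print(features)
```

Whether a weekend flag or a log-scaled amount is a sensible feature is a domain question rather than an engineering one. That is why these two steps sit with the domain experts.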

As we can see from the above, before you get to AI, the 1st and 3rd steps are engineering tasks, while the other two require deep domain knowledge. Each step is layered over the previous one. It is fair to assume that any organization has domain knowledge, and that if you embark on an AI journey, you will have data science expertise in house or hired.

What about the Data Engineering tasks?

Sadly, most new-age Data Scientists are aware of Data Engineering but are not qualified to perform it themselves. Most organizations obsessed with AI are not even aware of Data Engineering and rely on their developers and data scientists to get it done. What most are unaware of is that:

Software Engineering and Data Science skills do not imply Data Engineering capabilities.

The following diagram illustrates why, along with the skill sets needed to get this done.

Data scientists can model; however, when it comes to delivering that AI model in production at scale…

Data scientists often have to rely on the expertise of MLOps engineers and Data Engineers. Data Engineers, in turn, must also rely on DevOps.

It's not just about capability, either. MLOps (a subset of DevOps relevant to AI) and Data Engineering are full-time jobs concerned with data infrastructure, while the Data Scientist is more focused on the business applicability of the AI model.

Data Engineering is a specialization of Software Engineering that deals with Big Data, tooling, infrastructure and event-driven architecture. It focuses on consuming data at scale rather than on generating application data or building application features.
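To show what "consuming data at scale" in an event-driven style looks like, here is a deliberately tiny sketch built on Python's standard-library `queue` and `threading`. It is an assumption-laden stand-in. In production, a message broker such as Apache Kafka and a stream-processing framework would play these roles, and the `producer`/`consumer` functions and the `batches` sink are hypothetical. The consumer reacts to events as they arrive and writes them downstream in micro-batches. The bounded queue applies back-pressure if the consumer falls behind.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream in this sketch


def producer(q, n_events):
    """Stand-in for an upstream system emitting events."""
    for i in range(n_events):
        q.put({"event_id": i, "value": i * 10})  # blocks if the queue is full
    q.put(SENTINEL)


def consumer(q, batch_size, sink):
    """React to events as they arrive and write them downstream in batches."""
    batch = []
    while True:
        event = q.get()  # blocks until an event is available
        if event is SENTINEL:
            break
        batch.append(event)
        if len(batch) >= batch_size:
            sink.append(list(batch))  # e.g. a bulk write to a data store
            batch.clear()
    if batch:
        sink.append(list(batch))  # flush the final partial batch


events = queue.Queue(maxsize=100)  # bounded queue provides back-pressure
batches = []
t_prod = threading.Thread(target=producer, args=(events, 25))
t_cons = threading.Thread(target=consumer, args=(events, 10, batches))
t_prod.start()
t_cons.start()
t_prod.join()
t_cons.join()
print([len(b) for b in batches])  # [10, 10, 5]
```

Batching trades a little latency for far fewer downstream writes. Choosing and tuning that kind of trade-off, along with the infrastructure that carries it, is everyday Data Engineering and DevOps work rather than application development.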

Conclusion

DevOps and Data Engineering are the bedrock of any sustainable, production-grade business AI. When planning, do not ignore these two.


arjun dhar

Software development enthusiast since I was 8 yrs old. Love communicating on anything regarding innovation, community development … ∞