Building block 2
AI engineering excellence.
Coders are in short supply and overworked, suffering from burnout and reduced productivity. New tools are helping them cope and adding value to the organizations that deploy them.
Microsoft is an AI-first business investing heavily in this area, having recently committed a reported $10 billion to OpenAI, solidifying its position as a leader in the AI industry.
Tools such as OpenAI’s Codex, GitHub’s Copilot, and OpenAI’s ChatGPT can be used to complete lines of code and to find bugs. They can also generate code in most programming languages from a natural language prompt. AI-based techniques assist in other stages of the software development life cycle, including requirements gathering, design, deployment, and maintenance.
These AI tools, based on large language models, can perform a variety of tasks but demand heavy compute for inferencing. Infosys addresses this issue by building narrow transformers: an appropriately sized smaller language model is used as the foundation model and then fine-tuned with domain- and task-specific data to perform a specialized task, such as code completion.
Full fine-tuning updates all parameters of the foundation model, demanding more compute; and when there are multiple downstream tasks, it produces a separate model for each task, demanding more storage.
Infosys adopts parameter-efficient fine-tuning (PEFT) methods to build narrow transformers, as these methods adapt foundation models to downstream tasks by fine-tuning only a small number of (extra) model parameters. This reduces computational and storage costs while achieving performance comparable to that of full fine-tuning. These narrow transformers are built and served at scale without external compute, ensuring data security.
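To illustrate the idea, the following is a minimal sketch of PEFT using LoRA adapters with the open-source Hugging Face transformers and peft libraries. The base model name and hyperparameters are illustrative assumptions, not Infosys's actual configuration.

```python
# Minimal PEFT sketch: LoRA injects small trainable low-rank matrices into a
# frozen base model, so only a tiny fraction of parameters is updated per task.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, TaskType

base_model_name = "gpt2"  # assumption: stand-in for any small code-capable base model
model = AutoModelForCausalLM.from_pretrained(base_model_name)

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                # rank of the low-rank update matrices
    lora_alpha=16,      # scaling factor for the adapter updates
    lora_dropout=0.05,
)
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()  # typically well under 1% of all parameters

# peft_model can now be fine-tuned on domain/task-specific data (e.g., code
# completion pairs) with a standard Trainer; only the small adapter weights
# need to be stored and served per downstream task.
```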
In the DevSecOps age, developers are not just responsible for writing code, but also for setting up the production environment and for assembling, integrating, customizing, and maintaining applications once in production. This burden is now carried by another 2023 software tech trend: platform engineering. It accelerates development velocity and reduces software engineering overload through self-service capabilities (standardized tools, components, and automated processes) offered via an internal developer platform (IDP).
Gartner expects that by 2026, 80% of organizations will establish these platform teams as internal providers of reusable services, components, and tools for application delivery. Beyond AI-driven coding, these platforms encourage consistency and efficiency in software development and provide relief from the management of delivery pipelines and low-level infrastructure.
As AI tools evolve, the software development sector is in for an operational and financial windfall. If firms can motivate their workforce to upskill in AI prompt engineering, the result will be an AI-first, continuously learning and evolving organization: an AI-first Live Enterprise.
In the first AI wave (H1), systems and methods were predominantly used to supervise and optimize operations. In the AI-first era, these methods will pave the way to further improve operations and increase efficiency. We will see the onset of newer frontline ML methods such as reinforcement learning, backed by meta-learning-driven dynamic control.
Our paper on a self-driving cloud for greener business discusses this trend, where companies analyze data using ML techniques to make informed decisions and take proactive measures. The dynamic control feature also means the system can learn and adapt to new scenarios on the fly, driving even more efficient operations.
AI-first operations improve customer and employee experience through generative AI. Using AI assistants, an AI-first organization can handle internal and external user queries at pace, improving productivity and innovation; in the process, these organizations also become companies that people want to work for.
Data-driven organizations that care about post-sales customer engagement improve employee retention, as the research for our Digital Radar 2023 report reveals. These AI systems use machine translation to interact with users, driving localized contextual conversations that add further data points and improve efficiencies and insights.
Many applications are possible, including voice assistants and recommendation systems. Developers and organizations can build applications and services that understand and interact with users in more natural and intuitive ways.
AI-first customer-centric operations systems deliver a range of capabilities, described below. This technology is also well suited to solving problems across industries such as telecommunications, utilities, and retail.
Google’s Dialogflow is used by Optus, one of the largest telcos in Australia, to power virtual agents in a support application. Because the technology comes with prebuilt agents, in-depth programming knowledge is not required, and the technology can be rolled out much more quickly.
For example, prebuilt agents answer requests such as “I need help paying my bills” or “I haven’t received my order, where is it?” without requiring custom programming.
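For readers who want a sense of what integration looks like, the following is a minimal sketch of sending a user utterance to a Dialogflow agent with the google-cloud-dialogflow Python client. The project ID, session ID, and language code are placeholders, and an existing (prebuilt or custom) agent is assumed.

```python
# Minimal sketch: send a user query to a Dialogflow agent and read back the
# matched intent and the agent's fulfillment text.
from google.cloud import dialogflow

def detect_intent(project_id: str, session_id: str, text: str, language_code: str = "en"):
    session_client = dialogflow.SessionsClient()
    session = session_client.session_path(project_id, session_id)

    text_input = dialogflow.TextInput(text=text, language_code=language_code)
    query_input = dialogflow.QueryInput(text=text_input)

    response = session_client.detect_intent(
        request={"session": session, "query_input": query_input}
    )
    return response.query_result.intent.display_name, response.query_result.fulfillment_text

# Example: routing a billing query to a prebuilt agent (placeholder project/session IDs).
# intent, reply = detect_intent("my-gcp-project", "user-123", "I need help paying my bills")
```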
These capabilities extend to other industries too. Healthcare providers can extend health bot instances to include novel scenarios and integrate them with other IT systems and data sources.
AI platforms enable operations with self-heal capabilities, anomaly detection, automated monitoring, and alerting. For example, predictive maintenance is a great use case in the utilities sector.
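As a simple illustration of the anomaly detection behind such predictive maintenance, the sketch below flags unusual sensor readings with an Isolation Forest. The telemetry, features, and thresholds are synthetic and purely illustrative; a production system would wrap this in automated monitoring and alerting.

```python
# Minimal anomaly-detection sketch for predictive maintenance.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Synthetic "healthy" telemetry: temperature (deg C) and vibration (mm/s).
normal = rng.normal(loc=[70.0, 2.0], scale=[2.0, 0.3], size=(1000, 2))

detector = IsolationForest(contamination=0.01, random_state=42)
detector.fit(normal)

# New readings from the field: the last one looks like an overheating, vibrating asset.
readings = np.array([[70.5, 2.1], [69.0, 1.8], [95.0, 6.5]])
flags = detector.predict(readings)  # -1 = anomaly, 1 = normal

for reading, flag in zip(readings, flags):
    if flag == -1:
        print(f"ALERT: readings {reading} look anomalous; schedule maintenance")
```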
The AI organizations of the future will need to respond in close to real time to queries across platforms, including mobile, web, chatbots, smart devices, interactive voice response systems, and messaging apps.
AI-first operations, using advances in NLP and transfer learning, are the future of time-limited, data-driven conversations, extending from internal support to customer contact centers.
In H2, data engineering was key: in our 2021 paper Scaling AI: Data over Models, we estimated that 25% to 60% of machine learning project costs at that time were spent on manual data labeling and validation.
Firms had to manage data lineage and build systems with active learning, in which a classifier examines unlabeled data and selects part of this data for further human labeling. For the process to operate effectively, machine learning systems needed to be efficient, scalable, and reliable. This landscape also required a central model repository and trustworthy AI practices.
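The active learning loop described above can be sketched in a few lines: a classifier scores the unlabeled pool and selects the most uncertain examples for human labeling. The dataset, batch size, and number of rounds below are synthetic and illustrative.

```python
# Minimal active learning sketch using uncertainty sampling.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
labeled = np.arange(50)               # small seed set with human labels
unlabeled = np.arange(50, len(X))     # pool still awaiting labels

clf = LogisticRegression(max_iter=1000)
for _ in range(3):
    clf.fit(X[labeled], y[labeled])
    # Uncertainty sampling: pick pool examples whose predicted probability is closest to 0.5.
    probs = clf.predict_proba(X[unlabeled])[:, 1]
    uncertainty = np.abs(probs - 0.5)
    query = unlabeled[np.argsort(uncertainty)[:25]]  # these go to human annotators
    labeled = np.concatenate([labeled, query])       # y[query] stands in for the new labels
    unlabeled = np.setdiff1d(unlabeled, query)
```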
Many firms are still working in H2, and technologies such as Azure ML, AWS SageMaker, and MLflow, along with products such as DataRobot and Iguazio, are emerging as go-to options for model management, deployment, and training data management. Meanwhile, our clients require online and offline feature stores for machine learning data management and monitoring.
We are now in the H3 era, and all eyes are on generative AI and the models that underpin these tools. Building these models is a complex process. It can take large firms several hundred days and thousands of CPUs and GPUs to deliver a new large language model.
Creators of these models now rely on MLOps to support scaling techniques, including data parallelism, pipeline parallelism, and tensor model parallelism.
With data parallelism, training runs in parallel: the data is divided into partitions, and replicas of the same model run on separate subsets of the data, increasing training speed.
Model parallelism, as the name suggests, divides a massive, complex model either vertically (pipeline parallelism) or horizontally (tensor model parallelism), with different parts of the model processing the same data. In this way, Data + MLOps techniques increase operational efficiency for H3 technology providers.
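The following is a minimal sketch of the data parallelism pattern using PyTorch DistributedDataParallel: each process holds a full replica of the model, trains on its own shard of the data, and gradients are synchronized across replicas. The toy model and random batches are placeholders; a launch such as `torchrun --nproc_per_node=N train.py` is assumed.

```python
# Minimal data parallelism sketch with PyTorch DistributedDataParallel (DDP).
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl" if torch.cuda.is_available() else "gloo")
    rank = dist.get_rank()
    device = torch.device(f"cuda:{rank}" if torch.cuda.is_available() else "cpu")

    model = DDP(torch.nn.Linear(128, 1).to(device))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # Each rank trains on its own partition of the data (here, different random batches).
    for _ in range(10):
        x = torch.randn(32, 128, device=device)
        y = torch.randn(32, 1, device=device)
        loss = torch.nn.functional.mse_loss(model(x), y)
        optimizer.zero_grad()
        loss.backward()   # gradients are all-reduced across replicas here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```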
It is now believed that companies like OpenAI and Google are harnessing generative AI methods to make their MLOps pipeline even more sophisticated, creating a meta-robot that can build even better robots.
For instance, ChatGPT's effectiveness comes from chaining together several distinct models: starting with a regular large language model, then training a reward model from human feedback, and finally applying reinforcement learning from human feedback (RLHF). This reduces the operational burden on MLOps teams, as many of the manual data engineering tasks used in H2 are no longer needed.
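To make the reward-modeling step concrete, the sketch below shows a generic pairwise loss used in RLHF pipelines: given two candidate responses to a prompt, the reward model is trained so the human-preferred response scores higher. This is a textbook Bradley-Terry style objective, not OpenAI's actual implementation, and the scores are dummy values standing in for a hypothetical reward model's outputs.

```python
# Minimal reward-modeling sketch for RLHF (pairwise preference loss).
import torch
import torch.nn.functional as F

def reward_pairwise_loss(score_chosen: torch.Tensor, score_rejected: torch.Tensor) -> torch.Tensor:
    # Maximize sigmoid(score_chosen - score_rejected): preferred responses should score higher.
    return -F.logsigmoid(score_chosen - score_rejected).mean()

# Dummy scores from a hypothetical reward model:
score_chosen = torch.tensor([1.3, 0.7])    # responses the annotators preferred
score_rejected = torch.tensor([0.2, 0.9])  # responses the annotators rejected
loss = reward_pairwise_loss(score_chosen, score_rejected)
# The trained reward model then supplies the reward signal for the RL (e.g., PPO) stage.
```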
In H3, the big question is whether MLOps will become obsolete for firms using ML technology. Will it fade into obscurity, or will it evolve to suit the needs of users of LLMs and generative models?
We believe that, even in organizations that buy out-of-the-box generative AI solutions, H3 models will be tailored to specific use cases, and will require MLOps to bring all components together to reduce operational complexity and increase velocity of AI products.
Systems will be created that integrate several generative AI models, forging a fusion of models that's greater than the sum of its parts.
With this in mind, firms should implement maximum automation across the entire gamut of data engineering and model life cycle management, ranging from training and inferencing to API abstraction and toolkit engineering.
This is vital given the emerging LLM landscape, which spans multimodal content such as text, images, and audio, and multiple models that operate on disparate data.
To make this data mesh operational, we require more advanced AI factory operations that leverage MLOps. AI engineering life cycle management, part of our PolyAI suite of services under Infosys Topaz (Figure 6), is the Infosys approach to MLOps. It enables data scientists to use the ML development tools of their choice and to train and deploy their models at enterprise scale without having to deal with engineering complexity.
Figure 6. AI engineering life cycle management, part of the PolyAI suite. Source: Infosys
The approach also supports multiple versions of leading AI frameworks such as TensorFlow and PyTorch, and maintains traceability of model artifacts while enabling versioning and sharing of artifacts among development teams.
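As a generic illustration of this kind of artifact traceability and versioning (not the internals of the PolyAI suite), the sketch below logs parameters, metrics, and a model artifact with MLflow, one of the MLOps tools mentioned earlier. The experiment name and toy model are placeholders.

```python
# Minimal artifact-tracking sketch with MLflow: every run records its
# parameters, metrics, and model artifact, so teams can trace, version, and
# share what was trained and how.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

mlflow.set_experiment("narrow-transformer-demo")  # placeholder experiment name

X, y = load_iris(return_X_y=True)
with mlflow.start_run():
    model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
    mlflow.log_param("n_estimators", 50)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    # Logging the model stores a versioned artifact tied to this run; a model
    # registry can additionally assign named versions for sharing across teams.
    mlflow.sklearn.log_model(model, artifact_path="model")
```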
AI systems should also be built to evolve, or improve, with time. In Data + AI Radar, we introduced the SURE taxonomy, in which an AI system moves from Sensing, to Understanding, to Responding, and finally to Evolving. Evolving is therefore the most advanced stage, with models that are self-supervised and incorporate RLHF.
At present, only 15% of firms (Figure 7), including the cloud giants and others such as Apple, Meta, OpenAI, and Netflix, are able to achieve these top-level evolutionary design capabilities.
Companies need advanced AI if they are to achieve the loftiest ambitions of AI and stand out from competitors.
For enterprises to leverage the foundation models used in generative AI, they need to do three things: first, they must acquire up-to-date knowledge; second, they need to perform advanced reasoning; and third, they must use actuation to make the models more useful, such as by automating business workflows.
If the system doesn’t have enough knowledge, that knowledge has to be continually updated from the outside in, whether enterprise domain knowledge or data gleaned from searching the internet. GPT-4’s training data has a cutoff of September 2021, so ChatGPT, which is based on that large language model, has no knowledge of events after that date. To hedge against poor outcomes, OpenAI uses evolutionary design principles: ChatGPT has a modular plugin architecture so that other applications can plug into it and provide additional services, including up-to-date knowledge or insights.
If the question requires advanced reasoning capabilities, then the evolutionary architecture can use chain-of-thought (CoT) prompting, a series of intermediate reasoning steps that increases the ability of LLMs to perform complex reasoning.
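A minimal sketch of a CoT prompt is shown below: the prompt includes a worked example with intermediate reasoning steps, nudging the model to reason step by step on the new question. The model name and the client call assume the 2023-era openai Python SDK; the prompt text itself is the point.

```python
# Minimal chain-of-thought prompting sketch.
import openai

cot_prompt = """Q: A warehouse has 128 routers. It ships 3 boxes of 16 routers each. How many remain?
A: Each box holds 16 routers, so 3 boxes hold 3 x 16 = 48 routers.
128 - 48 = 80 routers remain. The answer is 80.

Q: A data center has 96 servers. 4 racks of 12 servers each are decommissioned. How many remain?
A:"""

response = openai.ChatCompletion.create(   # assumes the 2023-era openai SDK
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": cot_prompt}],
)
print(response.choices[0].message.content)  # the model is prompted to show its reasoning steps
```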
These systems must be trustworthy. For this, we need external control. An external control system, acting as a master control plane (LangChain is one example), feeds the question or prompt to the foundation model, uses APIs to orchestrate behind-the-scenes plugins, gets the answer, and feeds the response back to the user.
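The orchestration pattern can be sketched with the 2023-era LangChain agent API: the controller passes the user's question to the LLM, which decides when to call external tools (here a web search and a calculator), and the final answer is returned to the user. The specific tools, model, API keys, and example question are assumptions.

```python
# Minimal orchestration sketch with a LangChain agent (2023-era API).
from langchain.llms import OpenAI
from langchain.agents import load_tools, initialize_agent

llm = OpenAI(temperature=0)  # requires OPENAI_API_KEY in the environment
tools = load_tools(["serpapi", "llm-math"], llm=llm)  # search tool needs SERPAPI_API_KEY

agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)

# The agent orchestrates the tools behind the scenes and feeds the answer back to the user.
agent.run("What is the current share price of Microsoft, squared?")
```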
Further, evolutionary systems such as OpenAI's ChatGPT also use human preference data (for example, thumbs up/thumbs down feedback on responses) to continuously revise themselves and offer answers that get better over time. This is an example of RLHF driving improvement in the model.
This is one kind of evolutionary design, using a modular architecture to iron out deficiencies. Another approach, used by Microsoft's Bing and DeepMind's Sparrow, is to continuously update knowledge through retrieval: Bing performs searches daily, retrieving fresh information and then sending it to the LLM for training or fine-tuning.