

Mitigating AI Bias with Responsible AI Design

By Dr. Agatha H. Liu
Fall 2023
California Lawyers Association New Matter


Now that artificial intelligence (AI) is employed widely with unprecedented consequences, there is quite a scramble to implement mitigating measures. For example, the United States Patent and Trademark Office (USPTO) is soliciting public comments on what steps the USPTO should take to mitigate harms and risks from AI-enabled invention. Many of the proposed guardrails are applicable to the deployment of AI technology, to conform original output of the AI technology to desired principles, policies, guidelines, etc. However, it is no less valuable to improve the design of the AI technology, especially when various computational techniques can be readily applied.

One fundamental issue with the AI technology is producing inaccurate output, with random, sporadic errors or, more damagingly, systemic deviations leading to bias. This article presents a systematic review of how computational techniques can be utilized to help mitigate such bias. The fundamental issue is not new, and at least some of the relevant computational techniques have respectable histories and wide-ranging applications. It is a good time to evaluate these computational techniques cohesively in the context of reducing the bias coded into or produced by the AI technology. Such evaluation can shed more light on how technology can always be improved by additional technology, and how creators of the AI technology can be more responsible and be incentivized to be so.

AI today mostly focuses on machine learning (ML), even though AI also encompasses simulating other aspects of natural intelligence. To briefly summarize ML, an ML model (algorithm) is built from a training dataset and is typically a classification model or a generation model. Each sample in the training dataset is represented by a set of features. An ML classification model essentially learns from the training dataset what the features are and how they correlate with the known classes. Once trained, the ML classification model can be executed on a new sample to make predictions, typically by assigning a new sample to one of those classes. For example, each sample could correspond to a person, the features can include the age and gender of a person, and the classes may correspond to different shopping patterns. On the other hand, an ML generation model essentially learns from the training dataset what the features are and how they characterize the relationships among the samples.

Once trained, the ML generation model can be executed on a new sample to predict new samples that satisfy those relationships. For example, the samples could be words in an article, and the predicted new sample could be the next word in a given article.
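To make the next-word example concrete, the following is a minimal sketch of a generation model that only counts which word follows which in a training text. The function names and the training sentence are hypothetical illustrations, and real generation models are far more elaborate.

```python
from collections import Counter, defaultdict

def train_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        followers[current_word][next_word] += 1
    return followers

def predict_next_word(model, word):
    """Return the most frequent follower of `word`, or None if the word was never seen."""
    counts = model.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Hypothetical training text; each word is one sample.
training_text = "the court ruled that the court may review the patent"
model = train_bigram_model(training_text)
prediction = predict_next_word(model, "the")  # "court" follows "the" most often
```

Even in this toy setting, the prediction simply mirrors the training text: a word that is over-represented after "the" in the training data will be over-predicted.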

Inaccurate output of the AI technology in the form of biased predictions could be attributed to inadequacies of the training dataset or the algorithm. The technological solutions to eliminating or reducing such biased predictions can come in different forms. They include improving the distribution of features within each sample, the distribution of samples in the training dataset, and the structure and configuration of the algorithm. They also include making different aspects of the AI technology more understandable, which can in turn facilitate improving the training dataset or the algorithm. We discuss some of these technological solutions below.

Improving Data Representativeness

When we consider each sample that has various features in a training dataset, the issue of a skewed feature set can arise, where the features are not sufficiently independent. In a simple example, two correlated features, such as age and physical strength, essentially lead to double counting. One consequence of a skewed feature set is that a feature may be weighted inappropriately, either inflating its own importance or masking the importance of another feature for the purpose of making predictions. Computational techniques exist to identify and remove the redundancy among heavily correlated features. One example is principal component analysis, which re-expresses the features as uncorrelated components and can thereby rectify the skewed nature of the feature set. With a more balanced feature set, the various features are more likely to be represented in proportion to their prediction significance.
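As a rough illustration of spotting double counting, the sketch below measures the Pearson correlation between features and keeps a feature only if it is not strongly correlated with one already kept. This is a simpler correlation filter, not principal component analysis itself, and the feature names, values, and the 0.9 cutoff are assumptions chosen for the example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def drop_correlated(features, threshold=0.9):
    """Keep a feature only if it is not strongly correlated with a feature already kept."""
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical feature columns, one value per sample.
features = {
    "age": [20, 30, 40, 50, 60],
    "grip_strength": [50, 45, 40, 35, 30],  # falls in lockstep with age in this toy data
    "visits_per_month": [3, 1, 4, 1, 5],
}
kept = drop_correlated(features)  # grip_strength is dropped as redundant with age
```

Which of two correlated features to keep is itself a design decision; this sketch simply keeps whichever appears first.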

When we then look at all the samples in a training dataset, we can run into the issue of an imbalanced training dataset, where the number of original samples in specific classes or the number of original samples having specific types of feature values is too small. One consequence is that the ML model would not quite know how to recognize samples that should fall into the specific class or might falsely classify samples into the specific class. Another consequence is that the ML model incorrectly overlooks the relationships between samples that should fall into the specific class and other samples.

An imbalanced training dataset can have many causes. Maybe the number of samples was small in the first place. For example, when the goal is to learn about tourist behavior in a particular country, maybe simply not enough people have visited the country. Maybe the scope of sample collection was limited. For example, the samples collected from a city may deviate substantially from the distribution of samples in the entire country. Maybe the manner of collection had inherent restrictions. For example, web scraping can collect only data that people have been willing to share about their travel experiences on their websites, and certain people tend to share more information online than others. Understanding the source of the imbalance can help identify an appropriate approach to tackle the imbalance, such as determining whether the underlying distribution of samples is known with respect to given classes.

The underlying distribution of samples may be known or estimable from historical data. When the underlying distribution of the samples is available, there exist under- or over-sampling techniques and data augmentation methods, such as the Synthetic Minority Oversampling Technique (SMOTE), to improve prediction results.

These computational techniques allow disparate levels of sampling or augmentation to conform the training dataset to a known sample distribution.
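The core idea behind SMOTE-style oversampling can be sketched in a few lines: a synthetic minority sample is placed at a random point on the line segment between an existing minority sample and one of its nearest minority neighbors. The sketch below is a simplified illustration under that assumption, not the reference SMOTE implementation, and the function name, sample values, and parameters are hypothetical.

```python
import random

def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic samples by interpolating between minority-class neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared distance to the base sample.
        others = [s for s in minority if s is not base]
        others.sort(key=lambda s: sum((a - b) ** 2 for a, b in zip(base, s)))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbor)))
    return synthetic

# Hypothetical minority-class samples with two numeric features each.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_samples = smote_like_oversample(minority, n_new=6)
```

The number of synthetic samples to create per class would be chosen so that the class proportions in the training dataset match the known sample distribution.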

There is the tougher case when the underlying distribution of samples with respect to the given classes is unavailable. For example, in the medical field, how images of diseased organs (samples) correlate with different disease stages (classes) is often unknown at least because the same organs of different individuals could look rather different and the same disease can manifest in unique ways through different disease stages. There exist techniques for synthesizing training data that reflect the underlying distribution that is not already available. For example, flow-based generative models can estimate that underlying distribution with uncertainty information by transforming it to a simpler, target distribution, which enables generation of "realistic" samples by sampling from the estimated underlying distribution. These computational techniques therefore allow estimating a sample distribution and creating new samples to fit the training dataset to the estimated sample distribution. 

One fundamental approach to improving the training dataset is to update it as new or better samples become available. These samples can be generated by the real world or be synthesized, as noted above, and they can be evaluated by humans for training purposes. However, there is the question of when to update the training dataset to reduce training effort while increasing overall prediction quality. Fortunately, reinforcement learning, a machine learning paradigm based on a statistical model, is readily applicable here. It is essentially controlled retraining: the ML model is retrained (on new data not available during initial training) when a trigger condition is satisfied, and the ML model is expected to converge as the policy that determines next steps in the learning environment is refined through the iterations. Such techniques allow the ML model to be retrained automatically and efficiently.

Improving Algorithm Neutrality

There is a concept of imagined objectivity, under which a computer does not discriminate. In reality, objectivity may exist to the extent that an ML model is designed based on a generic cognitive process with universal applicability. Existing machine learning algorithms are often inspired by biological processes that are universal across humans, such as how neurons transmit and route neural signals or how cells receive and carry genetic materials. However, the ML model can have various settings that control how different parts of the ML model function. These settings, sometimes called hyperparameters, govern how the parameters of the ML model are learned from the training dataset. Typically, these hyperparameters are not automatically obtained from the learning process but are manually tuned when the ML model is initially developed. In a simple scenario, an ML classification model may have a threshold hyperparameter such that a new sample that has a score above the threshold would go to a first class and a new sample that has a score not above the threshold would go to a second class. To manually tune the threshold, a human may decide what the threshold should be based on what the human believes the classification results of new samples should look like.

To reduce potential bias resulting from the human decision, the tuning process can also be automated through hyperparameter optimization, which generally utilizes cross-validation techniques based on a holdout validation or test dataset. The samples in the holdout dataset, like those in the training dataset, would have labels, and the holdout dataset, like the training dataset, should also be subjected to the representativeness requirements. In addition, learnable hyperparameters can be considered parameters of an enhanced ML model and directly incorporated into the learning process, at the potential expense of aggressively increasing the capacity of the ML model.
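For the threshold scenario described above, automated tuning can be sketched as a search over candidate thresholds, keeping the one that scores best on a labeled holdout dataset. This is a minimal sketch that uses a single holdout split and accuracy as the selection criterion; the scores, labels, and candidate values are hypothetical.

```python
def accuracy(scores, labels, threshold):
    """Fraction of holdout samples classified correctly at the given threshold."""
    # A score above the threshold goes to the first class (1), otherwise the second (0).
    predictions = [1 if s > threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

def tune_threshold(scores, labels, candidates):
    """Return the candidate threshold with the highest holdout accuracy."""
    return max(candidates, key=lambda t: accuracy(scores, labels, t))

# Hypothetical model scores and known labels for the holdout samples.
holdout_scores = [0.10, 0.35, 0.40, 0.55, 0.70, 0.90]
holdout_labels = [0, 0, 0, 1, 1, 1]
candidates = [0.2, 0.3, 0.5, 0.6, 0.8]

best_threshold = tune_threshold(holdout_scores, holdout_labels, candidates)
```

The search removes the human guess about the threshold, but the outcome still depends on the holdout dataset being representative and on the choice of criterion, which here is plain accuracy.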

Improving Technology Clarity

Attempts to improve the AI data or the algorithm can significantly benefit from an understanding of how and why a machine produced a biased result. Such an understanding may be enabled by a reveal of the training dataset, the hyperparameters of the ML model, and much more during the deployment phase, and the understanding can then be fed back to the design phase. Sometimes, the deficiencies of the training dataset or the ML model become more apparent as new samples flow through the ML model. Therefore, enlightening users as to how an ML model executes on a new sample would further contribute to improving the design of the AI technology. As the AI technology becomes more advanced and as the definition of bias also evolves, however, understanding how the AI technology may have generated bias is often no longer a simple task.

Some of the relatively recent discussion has made a distinction between the so-called AI explainability and interpretability. Explainability addresses the low level, the components of an AI algorithm and data (including the training dataset and the prediction output), essentially informing the what and some of the how.

Interpretability addresses the high level, the structure of an AI algorithm and its interaction with the data, essentially informing more of the how and the why. Our purpose is to assess how AI could be clarified and understood, and we will sometimes use explainability and interpretability interchangeably.

Some conventional ML models (e.g., regressions and decision trees) provide inherent interpretability, for example in terms of which features are given more weight in the ML model. They tend to have a relatively simple structure that directly shows how the ML models operate. However, the more advanced techniques, which are considered deep learning techniques, involve complex ML models with many layers, connections, and parameters that are difficult to visualize or otherwise make sense of. For these deep learning techniques, there currently exist some interpretable AI frameworks that help explain how different feature values affect the output of an ML model or how they contribute to the prediction outcomes of the ML model. Examples of such interpretable AI frameworks include Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP), which are model-agnostic, and Layer-wise Relevance Propagation (LRP) and eXplainable Neural Network (XNN), which are specific to artificial neural networks. Specifically, LRP computes contribution scores and back-propagates the scores across layers of a layered network from output variables to input variables, while XNN extracts features from a fully connected neural network and builds additive index models to explain the relationship between input variables and the target variable.
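The Shapley-value idea underlying SHAP can be illustrated with an exact computation on a tiny model: each feature's contribution is its average marginal effect on the output over all orders in which the features could be switched from a baseline value to the sample's value. The sketch below is not the SHAP library's API. The toy model, sample, and baseline are hypothetical, and exact enumeration is feasible only for a handful of features.

```python
from itertools import permutations

def shapley_values(predict, sample, baseline):
    """Exact Shapley attribution of predict(sample) - predict(baseline) to each feature."""
    n = len(sample)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous_output = predict(current)
        for i in order:
            current[i] = sample[i]  # switch feature i from its baseline value to its real value
            output = predict(current)
            totals[i] += output - previous_output
            previous_output = output
    return [t / len(orderings) for t in totals]

def toy_model(x):
    # Hypothetical scoring model: two linear terms plus an interaction between x[0] and x[2].
    return 2.0 * x[0] + 1.0 * x[1] + x[0] * x[2]

values = shapley_values(toy_model, sample=[1.0, 3.0, 2.0], baseline=[0.0, 0.0, 0.0])
```

The attributions add up to the difference between the model output for the sample and for the baseline, and the interaction term is split evenly between the two features that jointly produce it.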

Otherwise, deep learning techniques can be interpreted or at least clarified in relation to bias using external, quantitative metrics, such as disparate impact or equality of opportunity. These metrics also tend to be model-agnostic measures based on some "standardized" datasets and some "bias" definition. They do not examine the inner workings of an ML model but instead evaluate how the output data of the ML model changes depending on the input data.
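Both metrics can be computed from model outputs alone, as the following sketch suggests. Disparate impact is taken here as the ratio of favorable-outcome rates between two groups, and equality of opportunity is measured as the difference in true positive rates. The group data is hypothetical, and other formulations of these metrics exist.

```python
def selection_rate(predictions):
    """Fraction of samples that received the favorable prediction (1)."""
    return sum(predictions) / len(predictions)

def disparate_impact(preds_group_a, preds_group_b):
    """Ratio of favorable-outcome rates; 1.0 means both groups are selected at the same rate."""
    return selection_rate(preds_group_a) / selection_rate(preds_group_b)

def true_positive_rate(predictions, labels):
    """Among samples whose true label is 1, the fraction predicted as 1."""
    hits = [p for p, y in zip(predictions, labels) if y == 1]
    return sum(hits) / len(hits)

def equal_opportunity_difference(preds_a, labels_a, preds_b, labels_b):
    """Difference in true positive rates; 0.0 means equal opportunity on this data."""
    return true_positive_rate(preds_a, labels_a) - true_positive_rate(preds_b, labels_b)

# Hypothetical predictions and true labels for two demographic groups.
group_a_preds, group_a_labels = [1, 0, 0, 1, 0], [1, 1, 0, 1, 0]
group_b_preds, group_b_labels = [1, 1, 0, 1, 1], [1, 1, 0, 1, 0]

di = disparate_impact(group_a_preds, group_b_preds)
eod = equal_opportunity_difference(group_a_preds, group_a_labels,
                                   group_b_preds, group_b_labels)
```

In this toy data, group A is selected half as often as group B, and qualified members of group A are recognized less often than qualified members of group B.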

Such standardized datasets may not have enough coverage, though. Since bias could exist in different forms and be coded into different aspects of the AI technology, rich datasets may be required to sufficiently expose the weaknesses of the AI technology. Therefore, just as crowd-based penetration tests could be applied to computer systems to identify bugs or threats that can be elusive, similar approaches could be applied to deep learning techniques to detect bias and other harmful effects. The main difference here is that unlike a cyberthreat that is clearly defined, the definition of bias itself or what should be measured by the bias metrics can be elusive or confusing.

It is also possible to simply examine the inner workings of ML models, including those involved in deep learning techniques, to improve the explainability of the AI technology. The computer user interface of a deep learning technique can be designed to inform users of how the ML model works by showing details at runtime regarding which hyperparameters are used, which input data is used, which values are given to parameters at different times, and so on. In this case, it would be especially helpful to map the entities within the ML model to concepts that can be understood by users and to show how data is being transformed by the ML model into final predictions. For example, it is relatively easy to enable users to follow the application of an ML model to image data because the features of the samples in the input data are pixels, which make sense visually and can be directly visualized.

Explainable AI is also not straightforward because of additional restrictions in terms of privacy, IP protection, and so on. For example, there is certain interaction between AI explainability and data privacy. When privacy regulations make it impermissible to use personal information, explainable AI can help prove that an ML model does not violate the privacy regulations. Whether or not use of personal information is permitted, however, there remains the question of to what extent or in what form the training data that originally includes personal information should be made publicly accessible for the sake of helping people understand how the AI technique works. To strike a balance, there exist privacy enhancing techniques (PETs) to determine how to redact or otherwise anonymize given data to generate a training dataset that is sufficiently expressive but hides sensitive information for the purposes of human consumption. Examples of such techniques include differential privacy and data masking via nulling, scrambling, aging, and so on.
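Two of the techniques named above can be sketched briefly. The first function masks sensitive fields by nulling them, and the second releases a count with Laplace noise, a standard mechanism for differential privacy. The record fields, the privacy parameter epsilon, and the function names are assumptions made for illustration, and a production system would need careful privacy accounting beyond this sketch.

```python
import random

def mask_record(record, sensitive_fields):
    """Data masking via nulling: replace sensitive field values with None."""
    return {key: (None if key in sensitive_fields else value)
            for key, value in record.items()}

def private_count(records, predicate, epsilon, rng):
    """Release a count with Laplace noise; smaller epsilon means more noise and more privacy."""
    true_count = sum(1 for r in records if predicate(r))
    scale = 1.0 / epsilon  # adding or removing one record changes a count by at most 1
    # The difference of two independent exponential draws follows a Laplace distribution.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise

# Hypothetical training records containing personal information.
records = [
    {"name": "A. Example", "age": 34, "visited": True},
    {"name": "B. Example", "age": 51, "visited": False},
    {"name": "C. Example", "age": 29, "visited": True},
]

masked = [mask_record(r, sensitive_fields={"name"}) for r in records]
rng = random.Random(0)
noisy_visitors = private_count(records, lambda r: r["visited"], epsilon=1.0, rng=rng)
```

The masked records remain expressive enough to show which features the model saw, while the noisy count lets aggregate statistics be shared without revealing whether any one person is in the dataset.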


As discussed above, many existing computational techniques can be applied to help reduce bias or other harmful effects produced by the AI technology. Additional techniques in the same fields or novel fields will surely be developed with the rapid adoption of the AI technology and the resulting rise in concern over undesirable consequences.

By tracking the design of the AI technology, relevant issues are expected to be detected and addressed in a more expedient and comprehensive manner. The creators of the AI technology, having accumulated the pertinent knowledge and expertise, are in a unique position to study and enhance the design of the AI technology.

Further, as discussed above, the USPTO recognizes the potential harms and risks of AI technology. Therefore, in the near future there will likely be a shift in the way the USPTO approaches the AI technology, with regard to not only assessing patentability, inventorship, or other aspects of the AI technology itself but also addressing the harms and risks of the AI technology. ML-related inventions, specifically inventive ML models, have been proliferating, and it is the computational techniques that regulate the behavior of ML models and ultimately diminish the negative impact of the AI technology that would be advantageous. Over the years, the USPTO has encouraged innovation in the areas of green technology, climate control, and COVID-19 protection through lower costs, shorter examination time, and other incentives. A prime candidate for incentivization is now AI bias mitigation. The identification of technical fields and sample computational techniques in this article can serve as a starting point for the intellectual property community to implement mechanisms that reward contributions to appropriately harnessing the potential of AI technology.

Reprinted with permission from New Matter (Fall 2023, Volume 48, Edition 3), published by the California Lawyers Association.