This is the fourth blog post in a series on the challenges I see ahead for enterprises as they adopt ML. Feel free to read my previous blog or start from the beginning.
Complete vs Inclusive vs Selective Training Data
We touched on this briefly before: to get a well-performing ML outcome, you need accurate and complete data, in the sense that it represents all possible scenarios of the reality the ML model will be making decisions in. To get a representative data set, you also have to cover outliers. To give an example: in the case of autonomous driving, you also need to train the model on driving data collected in poorly lit situations, on country roads, with the possibility of wild animals crossing the road, and in various weather conditions; driving on icy roads, for example, is very different from regular driving. Synthetic data has grown over the last decade as a possible way to expand sparse data sets toward better coverage, but in my opinion it has mostly been successful in very narrow domains where the context is limited.
Another flavor of completeness would be to make the data much more inclusive. To continue with the autonomous driving use case (although it should be stated that this absolutely applies to other human-related industries, such as healthcare, education, financial services, etc.), complete data should include pedestrian data across all heights, ability levels, and skin colors, to name a few dimensions. If you are not careful in data selection, your model may come out with a narrow view. As an example, a study by the Georgia Institute of Technology showed some years ago that the sensors and cameras collecting data for self-driving cars were more likely to detect a light-skinned person, meaning that a self-driving vehicle would be less likely to stop before crashing into a person of color. This held true for all the systems they tested, and accuracy decreased by 5% for people with darker skin.
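The kind of gap that study found can only be seen if you measure accuracy per group rather than in aggregate. Below is a minimal sketch of such a per-group check; the group labels and toy data are hypothetical placeholders, not values from the study.

```python
# Hypothetical sketch: measure a detector's per-group recall to surface
# the kind of accuracy gap the Georgia Tech study reported.
# Group names and data below are illustrative only.

def per_group_recall(records):
    """records: list of (group, detected, pedestrian_present) tuples."""
    stats = {}
    for group, detected, present in records:
        if present:  # only count cases where a pedestrian was actually there
            hits, total = stats.get(group, (0, 0))
            stats[group] = (hits + int(detected), total + 1)
    return {g: hits / total for g, (hits, total) in stats.items()}

# Toy data: (skin-tone group, detector fired?, pedestrian present?)
records = [
    ("lighter", True, True), ("lighter", True, True),
    ("lighter", True, True), ("lighter", False, True),
    ("darker", True, True), ("darker", False, True),
    ("darker", True, True), ("darker", False, True),
]

gaps = per_group_recall(records)
print(gaps)  # {'lighter': 0.75, 'darker': 0.5}
```

An aggregate recall of 62.5% would hide this entirely; only the per-group breakdown reveals that one group is detected far less reliably.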
In a different context, completeness may be a bad thing. Do we really want an autonomous driving ML model to drive like the average driver? Without having the facts, I have often wondered how safe autonomous driving models will be, as the majority of the drivers collecting training data seem to be men in their early twenties earning extra hourly income by driving around collecting data on behalf of autonomous vehicle companies. At least, that is how it looks if you peek into the vehicles as they pass you on Silicon Valley roads. But perhaps they have other kinds of drivers elsewhere in the country? As I said, I don't have the facts here. Anyhow, the point being: perhaps the right selection of training data would lead to better driving behavior. Shouldn't we strive for better-than-human decision making in life-and-death scenarios, and hence not train on "average" human behavior? In the example above, that would translate into more mature drivers and female drivers, since those segments have statistically been shown to be far better drivers (fewer accidents) than young male drivers. So being selective about which representative data you use could be critical.
Removing Bias
We know that data can be skewed toward what you want people to believe. The same is true for ML model training: you only get the behavior you feed it. As an example, a tech company decided to apply ML to its engineering resume screening process. The model was trained on historical engineering recruiting data, and it ended up recommending young adult males with certain college backgrounds. The reason, of course, was that the company had hired similar candidates in the past. Studies have shown many times that diverse and inclusive teams produce better outcomes, design better products, and create a healthier work environment, as well as higher profitability. The best value for an enterprise would be recruiting ML models that make better decisions than biased humans do, and can hence hold a more diverse outlook on candidates.
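One simple, widely used way to sanity-check a screening model for this kind of skew is to compare selection rates between groups, for example against the "four-fifths rule" used as a rough disparate-impact threshold in US employment practice. The sketch below is illustrative only; the group names and numbers are invented for the example.

```python
# Hypothetical sketch of a disparate-impact check for a resume-screening
# model, using the four-fifths rule as a rough threshold.
# Group names and decision data are illustrative, not from the source.

def selection_rates(decisions):
    """decisions: list of (group, selected) pairs -> selection rate per group."""
    counts = {}
    for group, selected in decisions:
        sel, total = counts.get(group, (0, 0))
        counts[group] = (sel + int(selected), total + 1)
    return {g: sel / total for g, (sel, total) in counts.items()}

def disparate_impact_ratio(decisions, protected, reference):
    """Ratio of the protected group's selection rate to the reference group's."""
    rates = selection_rates(decisions)
    return rates[protected] / rates[reference]

# Toy screening outcomes: group_a selected 6/10, group_b selected 3/10.
decisions = ([("group_a", True)] * 6 + [("group_a", False)] * 4 +
             [("group_b", True)] * 3 + [("group_b", False)] * 7)

ratio = disparate_impact_ratio(decisions, "group_b", "group_a")
print(round(ratio, 2))  # 0.5 -> below the 0.8 threshold, flag for review
```

A ratio below 0.8 does not prove discrimination on its own, but it is exactly the kind of cheap, automatic signal an enterprise could wire into its model-testing pipeline before a screening model ever touches real candidates.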
Data Compliance and Ownership
So who gets to decide what data to use for training? How does an enterprise ensure that those selections are inclusive, unbiased, and aligned with current and upcoming compliance regulations? Even with the best intentions of better-than-human results, this can go very wrong. I see a future where enterprises rely more and more on ML decision making and automation, but where there is an increasing risk of built-in biases and design flaws that fail to consider the profiles of a much more diverse population and range of skill levels. By now many of us have seen the movie 'A Man Called Otto', in which the elderly main character yells in frustration at an unforgiving automated phone bot. It is a good illustration of why we need a world where empathy and inclusion are built into ML models. We need easy ways for engineers to bias-test their data. We need test frameworks for model-discrimination testing, and future regulations may well arise beyond the EU, where discussions have already begun.
In the long run, maybe we even need an engineering or ML oath, just as medical doctors have, pledging not to do harm with data or ML? That may be taking it one step too far, but I am trying to make a very important point here.
In conclusion
A model is only as good as the data you feed it. A model is only as responsible or considerate as you teach it to be. We need reliable data. We need inclusive thinking in model design. We need ways to quickly assess bias or fakeness in data. We need to innovate to make it scalable for enterprises. We all need to strive toward ML being the better version of humankind.
Damn it, Eva. Why did you stop writing? Love this writing! You can write a guest post for me anytime.
I've been saying this for several years now, but perhaps I've never put it as well as Eva Nahari's conclusion did. I couldn't agree more!