
What is AutoML, and how can it be applied in Life Sciences?

First published on customer THINK by Benedict Timmerman

Automated machine learning (AutoML) is the practice of automating how machine learning is applied to real-world problems: it takes over the time-consuming, iterative tasks of model development and shortens the time to knowledge discovery. The primary goal of AutoML is a degree of automation that allows anyone to apply machine learning models and techniques, even without expertise in machine learning or coding. That does not mean it can’t be used by experienced data scientists; AutoML can make their work easier and faster, enabling them to scale their efforts while focusing on what matters: knowledge discovery. This last point is especially important for life scientists. For example, an oncologist working with gene data and cancer samples needs to know which biomarkers collectively lead to optimal predictions, and to understand their role in the analysis and the predicted outcome.

There is no doubt that today’s life sciences environment has an overabundance of data. However, while most fields and sectors are very adept at collecting data, they have very few options for using it effectively. Doing so usually requires trained bioinformaticians, data scientists, or other machine learning experts.

AutoML allows researchers, oncologists, agronomists, and other life scientists without ML or coding expertise to build machine learning models accurately and efficiently, without sacrificing model quality. With AutoML, not only is learning faster, but the time it takes to become productive is also shorter. Where artificial intelligence, and machine learning specifically, have been tools for those in the know, AutoML lets novices use machine learning techniques and models without coding expertise.

AutoML and Life Scientists

Healthcare and biomedical companies collect an enormous amount of data, which continues to grow at an exponential rate. Numerous scientists are conducting a variety of investigations and collecting raw data. Unfortunately, much of that data has never been analyzed or tested; no one knows what insights these datasets hold, what they reflect, or what they mean, and yet the data keeps being collected. Some of it may be used for publishing research in scientific journals, but much of it remains disjointed and without a conclusion. This is where AutoML comes in, increasing productivity and cutting time and costs. A model that might have taken six tedious months to produce can now take weeks. That means life scientists can create a predictive model themselves, validate it, and focus on what they’re good at: unlocking valuable insights from data. Whether those insights lead to a new drug, a new treatment, or simply a better understanding of a disease, they are the key objective of all this work.

Until recently, machine learning was largely confined to the academic environment and benefited only a handful of researchers with the required coding knowledge. This has changed in recent years for several reasons:

  • The overall cost of data storage, access, and analysis has dropped, which has permitted wider use
  • Modern technologies like cloud computing and artificial intelligence applications have permitted wider access across geographical borders
  • Many industries are embracing smart solutions to assess data in order to be successful. As more industries embrace the technology, the cost of machine learning has started to drop

In summary, it is the combination of economic factors and technological development that has shifted AutoML into mainstream science and broader usage.

How does AutoML work?

Each AutoML tool works differently, but the basic steps outlined below are common to most. The purpose is to automate the laborious work of finding the best possible pipeline and algorithms and of tuning the necessary parameters:

1. The user starts by preparing their study data. That could be data that has been processed and normalized by a bioinformatician, or a public dataset available in one of the known data repositories. The user then uploads the curated dataset to the AutoML app as a .csv or other delimited file.
2. The next step is to select the desired predictive method and outcome. The user may select Classification if the outcome to predict is binary (e.g. case vs. control) or categorical (e.g. different cancer types), or Regression if the outcome is continuous (e.g. a viral load). There’s also the option of Survival Analysis if the outcome is the time to an event such as death or relapse. The performance metric to optimize could be AUC, accuracy, F1 score, R², or concordance index, to name a few.
3. That’s about it: the user can now watch the tool perform the analysis. Most analyses complete in a couple of hours, and the model comes with a wide array of visuals and reports that help the user interpret it.
4. The last step is using the model, whether to validate it against labeled samples, predict outcomes for unlabeled samples, or explore what-if scenarios (manually entered predictor values). In most cases, the user can simply download the model to make predictions offline.
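
To make the steps above more concrete, here is a minimal sketch of the kind of search an AutoML tool automates. It is a toy illustration only, not how any particular product works: the dataset, the column names (marker_a, marker_b, noise, label), and every function name are hypothetical, and the brute-force search over feature subsets and a single nearest-neighbor classifier stands in for the far more sophisticated strategies real tools use. It covers the Classification case with accuracy as the metric.

```python
import csv
import io
import itertools
import random


def make_toy_csv(n_rows=60, seed=0):
    """Build a hypothetical CSV: two informative markers, one noise column, a binary label."""
    rng = random.Random(seed)
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["marker_a", "marker_b", "noise", "label"])
    for i in range(n_rows):
        label = i % 2  # alternate control (0) and case (1)
        writer.writerow([
            round(rng.gauss(2.0 * label, 0.5), 3),
            round(rng.gauss(-1.5 * label, 0.5), 3),
            round(rng.gauss(0.0, 1.0), 3),
            label,
        ])
    return buf.getvalue()


def load_csv(text, target="label"):
    """Step 1: read the delimited file into feature names, rows, and outcomes."""
    rows = list(csv.DictReader(io.StringIO(text)))
    names = [name for name in rows[0] if name != target]
    X = [[float(row[name]) for name in names] for row in rows]
    y = [int(row[target]) for row in rows]
    return names, X, y


def knn_predict(train_X, train_y, x, columns, k):
    """Classify x by majority vote of its k nearest training rows, using only `columns`."""
    def dist(row):
        return sum((row[c] - x[c]) ** 2 for c in columns)

    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i]))[:k]
    votes = [train_y[i] for i in nearest]
    return max(sorted(set(votes)), key=votes.count)


def cv_accuracy(X, y, columns, k, folds=5):
    """Accuracy pooled over simple interleaved cross-validation folds."""
    correct = 0
    for fold in range(folds):
        train = [i for i in range(len(X)) if i % folds != fold]
        test = [i for i in range(len(X)) if i % folds == fold]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct += sum(
            knn_predict(train_X, train_y, X[i], columns, k) == y[i] for i in test
        )
    return correct / len(X)


def auto_search(names, X, y, k_values=(1, 3, 5)):
    """Steps 2-3: try every feature subset and k, keep the best cross-validated accuracy."""
    best = None
    for size in range(1, len(names) + 1):
        for columns in itertools.combinations(range(len(names)), size):
            for k in k_values:
                score = cv_accuracy(X, y, columns, k)
                # Strictly-greater comparison keeps the earliest (smallest) setup on ties.
                if best is None or score > best["accuracy"]:
                    best = {"accuracy": score, "columns": columns, "k": k,
                            "features": [names[c] for c in columns]}
    return best


names, X, y = load_csv(make_toy_csv())
best = auto_search(names, X, y)
print(best["features"], best["k"], round(best["accuracy"], 3))

# Step 4: predict the outcome for a new, unlabeled sample (hypothetical values).
new_sample = [2.1, -1.4, 0.3]
print(knn_predict(X, y, new_sample, best["columns"], best["k"]))
```

The point of the sketch is the shape of the loop: every candidate configuration is scored on held-out data, and the user only sees the winner and the features it relies on. Exhaustive search is feasible here only because the toy dataset has three columns; with thousands of biomarkers a real tool has to search far more selectively.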


Will AutoML replace Data Researchers?

The answer is no. Just as computers have not supplanted mathematicians, it is unlikely that AutoML will replace data scientists. Automation in science has not led to the wholesale replacement of researchers or scientists; in fact, there is so much more data today that more scientists are needed to create additional ML models. AutoML frees the researcher to focus on exploring other data, solving analytical problems, and interpreting results.

Conclusion

AutoML is another method for analyzing data and solving real-world problems. Its end goal is a reproducible model, built on robust algorithms and feature selection, that predicts with greater accuracy. Even though AutoML is relatively new, it has shown tremendous promise in the scientific arena, providing solutions for and explanations of complex data with numerous unrelated parameters. For life scientists, it allows each and every one of them to work with machine learning and gather insights from an exponentially growing amount of data, something that could lead humanity faster towards new treatments and discoveries.

AutoML as a Service for Life Scientists

JADBio: Specific to biomedical or multi-omics data. They also have an API that can analyze medical images/scans. They offer a free Basic plan for unlimited use.

Imagia: Works with scans and medical images. Their AI platform provides insights on hospital-wide data sets.

A simple search will yield several more solutions, most of them for generic machine learning applications such as business insights, and many of them demanding some coding skills. It’s up to users, and their deeper knowledge of ML and/or coding, to adapt any of these platforms to their needs. Some available solutions are open source but need more work, and definitely don’t fit the “Auto” bill in AutoML. Novices are better off keeping it simple and sticking with a completely automated solution.


Benedict Timmerman is a Senior IT Experience Analyst supporting Digital Giraffe’s clients operating within the AI industry. Benedict covers data and machine learning solutions, providing quantitative and qualitative analysis of the available practices, people, and markets. Benedict also spearheads the company’s lead generation process for its clients, designing outreach campaigns.