Machine learning (ML) is a technique of growing popularity, owing to its roots in technology and its promise of addressing complex problems. This growth is apparent in the publication record of the past few decades. For example, PubMed lists four publications on the topic in 1985. Over the last two years, the number of publications addressing ML has risen to more than eighteen thousand (see Figure 1).
Figure 1
Introduction to AI and Machine Learning
Choi and colleagues (2020) trace one of the earliest propositions of machine learning to 1956. At that time, computer scientists thought that machines would, at some point, be able to mimic intellectual tasks previously performed only by humans. Artificial intelligence (AI) is a term commonly used when describing machine learning, but at its most basic it is limited to simple problems with pre-defined factors (Choi et al., 2020). Machine learning falls within a deeper level of AI, using high-level pattern recognition for prediction and identification. ML focuses on developing and applying algorithms learned from a data set, and its methods fall into four categories: unsupervised, supervised, semi-supervised, and reinforcement learning.
Supervised learning uses pre-identified factors to identify patterns within a training data set. Much as a child learns by being asked the color of various objects, a supervised learning algorithm takes in large amounts of labeled data to predict an outcome (Choi et al., 2020). As the number of inputs increases, the algorithm ideally becomes more accurate. The algorithm is then tuned with a validation data set as it learns the relationship between a feature (e.g., red) and a target (e.g., apple). After the model has learned the association, it is evaluated with a test data set to determine how well it predicts the target from the feature.
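The training, validation, and test steps described above can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from Choi et al. (2020): the fruit measurements are made up, the helper names (make_fruit, predict, accuracy) are hypothetical, and a simple nearest-neighbor vote stands in for whatever algorithm a real project would use. The validation set is used here in one common way, to choose a setting (the number of neighbors, k) before the final check on the test set.

```python
import random
from collections import Counter

def make_fruit(rng, n_per_class):
    """Generate made-up (redness, length_cm) features, each paired with a fruit label."""
    rows = []
    for _ in range(n_per_class):
        rows.append(((rng.uniform(0.7, 0.9), rng.uniform(6.0, 8.0)), "apple"))
        rows.append(((rng.uniform(0.0, 0.2), rng.uniform(16.0, 20.0)), "banana"))
    return rows

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict(train, features, k):
    """Label a new example by majority vote among its k nearest training examples."""
    nearest = sorted(train, key=lambda row: squared_distance(row[0], features))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, examples, k):
    """Fraction of examples whose predicted label matches the true label."""
    hits = sum(predict(train, features, k) == label for features, label in examples)
    return hits / len(examples)

rng = random.Random(0)            # fixed seed so the sketch is repeatable
data = make_fruit(rng, 50)        # 100 labeled examples in total
rng.shuffle(data)

# Split once: learn from the training set, tune on validation, report on test.
train, validation, test = data[:60], data[60:80], data[80:]

# Use the validation set to pick k, then evaluate a single time on the test set.
best_k = max((1, 3, 5), key=lambda k: accuracy(train, validation, k))
test_accuracy = accuracy(train, test, best_k)
print(best_k, test_accuracy)
```

Because the made-up apples and bananas are easy to tell apart, the model scores well here; real clinical data are rarely this tidy, which is why the held-out test set matters.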
Unsupervised learning identifies patterns within a data set without any assistance from humans. An example is clustering, which sorts instances into groups based on their features (Choi et al., 2020). Semi-supervised learning combines supervised and unsupervised learning and is most helpful in circumstances where imaging is involved (Choi et al., 2020).
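To make the clustering idea concrete, here is a small sketch of k-means, one widely used clustering algorithm (chosen here for illustration; the source does not name a specific one). The numbers are made up, the function name k_means_1d is hypothetical, and the starting centroids are simply spread between the smallest and largest value, which is an assumption made to keep the example repeatable. Note that no labels are supplied; the groups emerge from the values alone.

```python
def k_means_1d(values, k=2, iterations=10):
    """Group numbers into k clusters (k >= 2) without using any labels."""
    lo, hi = min(values), max(values)
    # Assumption: start the centroids evenly spaced between the extremes.
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iterations):
        # Step 1: assign each value to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Step 2: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Made-up measurements that happen to form two groups.
values = [1.0, 1.2, 0.8, 1.1, 5.0, 5.3, 4.9, 5.1]
centroids, clusters = k_means_1d(values)
print(centroids)
print(clusters)
```

The algorithm only reports that two groups exist; deciding what the groups mean is still left to the researcher.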
The final category of machine learning is reinforcement learning, in which the model is allowed to learn and "play" to reach an outcome. Its application in healthcare and research is limited, but it appears closest to how the human mind learns and has the potential to greatly influence the future (Choi et al., 2020).
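The idea of learning through "play" can be illustrated with a toy example. The sketch below uses Q-learning, one standard reinforcement learning algorithm (picked for illustration; the source does not specify one). Everything about the setting is hypothetical: an agent walks along a five-position corridor and is rewarded only when it reaches the far end. For simplicity the agent explores by acting completely at random, whereas practical systems usually balance exploring with using what they have already learned.

```python
import random

N_STATES = 5            # positions 0..4 in a made-up corridor; position 4 is the goal
ACTIONS = (-1, +1)      # index 0 = step left, index 1 = step right
ALPHA, GAMMA = 0.5, 0.9 # learning rate and discount factor (illustrative values)

def step(state, action):
    """Move one position (walls at both ends); reward 1 only on reaching the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

def train(episodes=200, seed=0):
    rng = random.Random(seed)
    # q[state][action index] estimates how good each action is in each position.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            a = rng.randrange(2)                      # "play": try a random action
            nxt, reward = step(state, ACTIONS[a])
            # Nudge the estimate toward the reward plus the best value seen next.
            q[state][a] += ALPHA * (reward + GAMMA * max(q[nxt]) - q[state][a])
            state = nxt
    return q

q = train()
# After training, read off the preferred action in each non-goal position.
policy = ["left" if row[0] > row[1] else "right" for row in q[:-1]]
print(policy)
```

No one tells the agent that moving right is correct; it discovers this from the rewards it receives, which is what distinguishes reinforcement learning from the supervised approach.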
Machine learning is tackling large problems…
One of the advantages of machine learning is the sheer number of data points that computer scientists and bioinformaticists can evaluate quickly. The human mind, by contrast, is not only limited in storage capacity but can be influenced by bias as well. For example, two brothers can remember a childhood argument very differently years later. As humans, we constantly evaluate incoming data and color it with our values.
Many of the issues human society faces stem from the large number of factors at play and the limitations of the human mind. As we (humans) have made more discoveries, more questions have arisen. We discovered DNA and mapped the human genome, but that is only the start. As the problems become more complicated, the limitations of the human mind become more apparent. This does not mean that the human mind is obsolete and that AI is the answer; it just means we need to supplement our knowledge with this new tool.
For example, since scientists first mapped the human genome, laboratories worldwide have painstakingly worked to identify the proteins coded by the genome and their structures. Determining how a protein is structured (or folded) can assist in identifying new diseases, treating rare conditions, creating enzymes to break down plastics, and generating new hypotheses. Many methods and many hours of work by many people have yielded more than 100,000 protein structures (Jumper et al., 2021). Despite these efforts, billions of known protein sequences still have no known structure. AlphaFold (https://www.deepmind.com/research/highlighted-research/alphafold) is an AI system created by DeepMind that was trained on the structures and sequences of over 100,000 proteins. It uses a neural network-based model (an approach loosely inspired by the human brain) to predict the structure of proteins with atomic accuracy, greatly outperforming many of the commonly used methods (Jumper et al., 2021). The potential of this one AI system stretches far beyond what we can even begin to conceive.
Limitations
The basis of ML and AI is the use of large data sets. This basis becomes a limitation for rare diseases, where it will take some time to generate a data set large enough for ML application (Choi et al., 2020). Additionally, ML is not immune to bias. Because humans supply the pre-defined factors (in supervised learning) and evaluate the output, bias is possible. The data itself may also be of poor quality or contain errors.
Overall, the use of ML and AI within research and medicine will continue to grow. Machine learning has the potential to change how we approach research, form hypotheses, and even interact with patients. This offers much promise and excitement but calls for caution as well.
Jace Johnny Ph.D. Student
University of Utah College of Nursing
References
Choi, R. Y., Coyner, A. S., Kalpathy-Cramer, J., Chiang, M. F., & Campbell, J. P. (2020). Introduction to machine learning, neural networks, and deep learning. Translational Vision Science & Technology, 9(2), 14. https://doi.org/10.1167/tvst.9.2.14
Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., Tunyasuvunakool, K., Bates, R., Žídek, A., Potapenko, A., Bridgland, A., Meyer, C., Kohl, S. A. A., Ballard, A. J., Cowie, A., Romera-Paredes, B., Nikolov, S., Jain, R., Adler, J., … Hassabis, D. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583–589. https://doi.org/10.1038/s41586-021-03819-2
Tunyasuvunakool, K., Adler, J., Wu, Z., Green, T., Zielinski, M., Žídek, A., Bridgland, A., Cowie, A., Meyer, C., Laydon, A., Velankar, S., Kleywegt, G. J., Bateman, A., Evans, R., Pritzel, A., Figurnov, M., Ronneberger, O., Bates, R., Kohl, S. A. A., … Hassabis, D. (2021). Highly accurate protein structure prediction for the human proteome. Nature, 596(7873), 590–596. https://doi.org/10.1038/s41586-021-03828-1