References

Anthropic. (2024). Introducing the Model Context Protocol. https://www.anthropic.com/news/model-context-protocol
Anthropic. (2025). Claude’s Extended Thinking. https://www.anthropic.com/news/visible-extended-thinking
Bean et al. (2026). Real-world performance of large language models in patient self-assessment of symptoms. Nature Medicine.
Belkin, M., Hsu, D., Ma, S., & Mandal, S. (2019). Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proceedings of the National Academy of Sciences, 116(32), 15849–15854. https://doi.org/10.1073/pnas.1903070116
Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT), 610–623. https://doi.org/10.1145/3442188.3445922
Bishop, C. M., & Bishop, H. (2024). Deep learning: Foundations and concepts. Springer.
Bolton, E., Venigalla, A., Yasunaga, M., Hall, D., Xiong, B., Lee, T., Daneshjou, R., Frankle, J., Liang, P., Carbin, M., & Manning, C. D. (2024). BioMedLM: A 2.7B Parameter Language Model Trained on Biomedical Text. arXiv Preprint. https://arxiv.org/abs/2403.18421
Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., et al. (2021). On the opportunities and risks of foundation models. arXiv Preprint. https://arxiv.org/abs/2108.07258
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324
Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Wadsworth.
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al. (2020). Language Models are Few-Shot Learners. Advances in Neural Information Processing Systems, 33. https://arxiv.org/abs/2005.14165
Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y. T., Li, Y., Lundberg, S., et al. (2023). Sparks of Artificial General Intelligence: Early experiments with GPT-4. arXiv Preprint. https://arxiv.org/abs/2303.12712
Buuren, S. van, & Groothuis-Oudshoorn, K. (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1–67. https://doi.org/10.18637/jss.v045.i03
Choi, E., Biswal, S., Malin, B., Duke, J., Stewart, W. F., & Sun, J. (2017). Generating Multi-label Discrete Patient Records using Generative Adversarial Networks. Proceedings of the 2nd Machine Learning for Healthcare Conference (MLHC). https://arxiv.org/abs/1703.06490
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297. https://doi.org/10.1007/BF00994018
Cruz Rivera, S., Liu, X., Chan, A.-W., Denniston, A. K., Calvert, M. J., & SPIRIT-AI and CONSORT-AI Working Group. (2020). Guidelines for Clinical Trial Protocols for Interventions Involving Artificial Intelligence: The SPIRIT-AI Extension. Nature Medicine, 26, 1351–1363. https://doi.org/10.1038/s41591-020-1037-7
DeepSeek-AI, Guo, D., Yang, D., Zhang, H., Song, J., Zhang, R., Xu, R., et al. (2025). DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv Preprint. https://arxiv.org/abs/2501.12948
Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023). QLoRA: Efficient Finetuning of Quantized LLMs. Advances in Neural Information Processing Systems, 36. https://arxiv.org/abs/2305.14314
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of NAACL-HLT 2019, 4171–4186. https://doi.org/10.18653/v1/N19-1423
Dwork, C., & Roth, A. (2014). The Algorithmic Foundations of Differential Privacy. Foundations and Trends in Theoretical Computer Science, 9(3–4), 211–487. https://doi.org/10.1561/0400000042
European Parliament and Council. (2024). Regulation (EU) 2024/1689 of the European Parliament and of the Council Laying Down Harmonised Rules on Artificial Intelligence (Artificial Intelligence Act). https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=OJ:L_202401689
Evans, B. (2026). AI eats the world. Strategic presentation. https://www.ben-evans.com/presentations
Finlayson, S. G., Subbaswamy, A., Singh, K., Bowers, J., Kupke, A., Zittrain, J., Kohane, I. S., & Saria, S. (2021). The Clinician and Dataset Shift in Artificial Intelligence. New England Journal of Medicine, 385, 283–286. https://doi.org/10.1056/NEJMc2104626
Fleming, S. L., Lozano, A., Haberkorn, W. J., Jindal, J. A., Reis, E. P., Thapa, R., Blankemeier, L., Genkins, J. Z., Steinberg, E., Nayak, A., Patel, B. S., Chiang, C.-C., Callahan, A., Huo, Z., Gatidis, S., Adams, S. J., Fayanju, O., Shah, S. J., Savage, T., … Shah, N. H. (2024). MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records. Proceedings of the AAAI Conference on Artificial Intelligence. https://arxiv.org/abs/2308.14089
Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29(5), 1189–1232. https://doi.org/10.1214/aos/1013203451
Gallifant, J., Afshar, M., Ameen, S., Aphinyanaphongs, Y., Chen, S., Cacciamani, G., Demner-Fushman, D., Dligach, D., Daneshjou, R., Fernandes, C., … Bitterman, D. S. (2025). The TRIPOD-LLM Reporting Guideline for Studies Using Large Language Models. Nature Medicine, 31, 60–69. https://doi.org/10.1038/s41591-024-03425-5
Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., & Crawford, K. (2021). Datasheets for Datasets. Communications of the ACM, 64(12), 86–92. https://doi.org/10.1145/3458723
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press.
Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., & Fritz, M. (2023). Not What You’ve Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection. Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security (AISec), 79–90. https://doi.org/10.1145/3605764.3623985
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction (2nd ed.). Springer.
Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D., & Steinhardt, J. (2021). Measuring Massive Multitask Language Understanding. International Conference on Learning Representations (ICLR). https://arxiv.org/abs/2009.03300
Hoffmann, J., Borgeaud, S., Mensch, A., Buchatskaya, E., Cai, T., Rutherford, E., et al. (2022). Training compute-optimal large language models. Advances in Neural Information Processing Systems, 35.
Houlsby, N., Giurgiu, A., Jastrzebski, S., Morrone, B., Laroussilhe, Q. de, Gesmundo, A., Attariyan, M., & Gelly, S. (2019). Parameter-Efficient Transfer Learning for NLP. Proceedings of the 36th International Conference on Machine Learning (ICML). https://arxiv.org/abs/1902.00751
Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2022). LoRA: Low-Rank Adaptation of Large Language Models. International Conference on Learning Representations (ICLR). https://arxiv.org/abs/2106.09685
Jin, D., Pan, E., Oufattole, N., Weng, W.-H., Fang, H., & Szolovits, P. (2021). What Disease Does This Patient Have? A Large-Scale Open Domain Question Answering Dataset from Medical Exams. Applied Sciences, 11(14), 6421. https://doi.org/10.3390/app11146421
Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., & Amodei, D. (2020). Scaling laws for neural language models. arXiv Preprint. https://arxiv.org/abs/2001.08361
Karpathy, A. (2023). nanoGPT: A minimal GPT implementation. GitHub repository. https://github.com/karpathy/nanoGPT
Karpukhin, V., Oğuz, B., Min, S., Lewis, P., Wu, L., Edunov, S., Chen, D., & Yih, W. (2020). Dense Passage Retrieval for Open-Domain Question Answering. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 6769–6781. https://doi.org/10.18653/v1/2020.emnlp-main.550
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 1097–1105.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444. https://doi.org/10.1038/nature14539
Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W., Rocktäschel, T., Riedel, S., & Kiela, D. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Advances in Neural Information Processing Systems, 33. https://arxiv.org/abs/2005.11401
Liu, X., Cruz Rivera, S., Moher, D., Calvert, M. J., Denniston, A. K., & SPIRIT-AI and CONSORT-AI Working Group. (2020). Reporting Guidelines for Clinical Trial Reports for Interventions Involving Artificial Intelligence: The CONSORT-AI Extension. Nature Medicine, 26, 1364–1374. https://doi.org/10.1038/s41591-020-1034-x
Luo et al. (2026). Clinical environment simulator: Beyond static benchmarks for AI in healthcare. Nature Medicine.
Ma, J., He, Y., Li, F., Han, L., You, C., & Wang, B. (2024). Segment Anything in Medical Images. Nature Communications, 15, 654. https://doi.org/10.1038/s41467-024-44824-z
Marshall, I. J., & Wallace, B. C. (2019). Toward Systematic Review Automation: A Practical Guide to Using Machine Learning Tools in Research Synthesis. Systematic Reviews, 8, 163. https://doi.org/10.1186/s13643-019-0974-z
McCarthy, J. (1960). Recursive functions of symbolic expressions and their computation by machine, part I. Communications of the ACM, 3(4), 184–195. https://doi.org/10.1145/367177.367199
McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4), 115–133. https://doi.org/10.1007/BF02478259
Minsky, M., & Papert, S. (1969). Perceptrons: An introduction to computational geometry. MIT Press.
Mitchell, M. (2019). Artificial intelligence: A guide for thinking humans. Farrar, Straus and Giroux.
Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., Spitzer, E., Raji, I. D., & Gebru, T. (2019). Model Cards for Model Reporting. Proceedings of the Conference on Fairness, Accountability, and Transparency (FAT*), 220–229. https://doi.org/10.1145/3287560.3287596
Model Context Protocol Working Group. (2025). Model Context Protocol Specification. https://modelcontextprotocol.io/specification/2025-11-25
Mollick, E. (2024a). 15 Times to use AI, and 5 Not to. One Useful Thing (Substack). https://www.oneusefulthing.org/p/15-times-to-use-ai-and-5-not-to
Mollick, E. (2024b). Four Singularities for Research. One Useful Thing (Substack). https://www.oneusefulthing.org/p/four-singularities-for-research
Mollick, E. (2024c). Getting started with AI: Good enough prompting. One Useful Thing (Substack). https://www.oneusefulthing.org/p/getting-started-with-ai-good-enough
Mollick, E. (2024d). Latent Expertise: Everyone is in R&D. One Useful Thing (Substack). https://www.oneusefulthing.org/p/latent-expertise-everyone-is-in-r
Mollick, E. (2024e). Thinking Like an AI. One Useful Thing (Substack). https://www.oneusefulthing.org/p/thinking-like-an-ai
Mollick, E. (2025a). An Opinionated Guide to Using AI Right Now. One Useful Thing (Substack). https://www.oneusefulthing.org/p/an-opinionated-guide-to-using-ai
Mollick, E. (2025b). Giving your AI a Job Interview. One Useful Thing (Substack). https://www.oneusefulthing.org/p/giving-your-ai-a-job-interview
Mollick, E. (2025c). On Jagged AGI: o3, Gemini 2.5, and everything after. One Useful Thing (Substack). https://www.oneusefulthing.org/p/on-jagged-agi-o3-gemini-25-and-everything
Mollick, E. (2025d). On Working with Wizards. One Useful Thing (Substack). https://www.oneusefulthing.org/p/on-working-with-wizards
Mollick, E. (2025e). Personality and Persuasion. One Useful Thing (Substack). https://www.oneusefulthing.org/p/personality-and-persuasion
Mollick, E. (2025f). Real AI Agents and Real Work. One Useful Thing (Substack). https://www.oneusefulthing.org/p/real-ai-agents-and-real-work
Mollick, E. (2025g). The Cybernetic Teammate. One Useful Thing (Substack). https://www.oneusefulthing.org/p/the-cybernetic-teammate
Mollick, E. (2025h). The End of Search, The Beginning of Research. One Useful Thing (Substack). https://www.oneusefulthing.org/p/the-end-of-search-the-beginning-of
Mollick, E. (2026a). Claude Dispatch and the Power of Interfaces. One Useful Thing (Substack). https://www.oneusefulthing.org/p/claude-dispatch-and-the-power-of
Mollick, E. (2026b). Management as AI Superpower. One Useful Thing (Substack). https://www.oneusefulthing.org/p/management-as-ai-superpower
Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). Dissecting Racial Bias in an Algorithm Used to Manage the Health of Populations. Science, 366(6464), 447–453. https://doi.org/10.1126/science.aax2342
OpenAI. (2024). OpenAI o1 System Card. https://cdn.openai.com/o1-system-card-20241205.pdf; https://arxiv.org/abs/2412.16720
OpenAI. (2026). Why SWE-bench Verified no longer measures frontier coding capabilities. Technical post.
Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., et al. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35.
Qian, Z., Davis, R., & Schaar, M. van der. (2023). Synthcity: A Benchmark Framework for Diverse Use Cases of Tabular Synthetic Data. Advances in Neural Information Processing Systems Datasets and Benchmarks Track, 36. https://arxiv.org/abs/2301.07573
Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training. OpenAI technical report.
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI technical report.
Raschka, S. (2026a). A dream of spring for open-weight LLMs: 10 architectures from Jan-Feb 2026. Sebastian Raschka Magazine. https://magazine.sebastianraschka.com/p/a-dream-of-spring-for-open-weight
Raschka, S. (2026b). A visual guide to attention variants in modern LLMs. Sebastian Raschka Magazine. https://magazine.sebastianraschka.com/p/visual-attention-variants
Raschka, S. (2026c). Components of a coding agent. Sebastian Raschka Magazine. https://magazine.sebastianraschka.com/p/components-of-a-coding-agent
Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), 386–408. https://doi.org/10.1037/h0042519
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533–536. https://doi.org/10.1038/323533a0
Russell, S., & Norvig, P. (2021). Artificial intelligence: A modern approach (4th ed.). Pearson.
Salvi, F., Ribeiro, M. H., Gallotti, R., & West, R. (2025). On the conversational persuasiveness of large language models: A randomized controlled trial. Nature Human Behaviour, 9(8), 1645–1653. https://doi.org/10.1038/s41562-025-02194-6
Schaeffer, R., Miranda, B., & Koyejo, S. (2023). Are emergent abilities of large language models a mirage? Advances in Neural Information Processing Systems, 36.
Schick, T., Dwivedi-Yu, J., Dessì, R., Raileanu, R., Lomeli, M., Zettlemoyer, L., Cancedda, N., & Scialom, T. (2023). Toolformer: Language Models Can Teach Themselves to Use Tools. Advances in Neural Information Processing Systems, 36. https://arxiv.org/abs/2302.04761
Shahul, E., James, J., Espinosa-Anke, L., & Schockaert, S. (2024). RAGAS: Automated Evaluation of Retrieval Augmented Generation. Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations (EACL). https://aclanthology.org/2024.eacl-demo.16/
Singhal, K., Azizi, S., Tu, T., Mahdavi, S. S., Wei, J., Chung, H. W., Scales, N., Tanwani, A., Cole-Lewis, H., Pfohl, S., Payne, P., Seneviratne, M., Gamble, P., Kelly, C., Babiker, A., Schärli, N., Chowdhery, A., Mansfield, P., Demner-Fushman, D., … Natarajan, V. (2023). Large Language Models Encode Clinical Knowledge. Nature, 620, 172–180. https://doi.org/10.1038/s41586-023-06291-2
Singhal, K., Tu, T., Gottweis, J., Sayres, R., Wulczyn, E., Hou, L., Clark, K., Pfohl, S., Cole-Lewis, H., Neal, D., Schaekermann, M., Wang, A., Amin, M., Lachgar, S., Mansfield, P., Prakash, S., Green, B., Dominowska, E., Agüera y Arcas, B., … Natarajan, V. (2025). Toward Expert-Level Medical Question Answering with Large Language Models. Nature Medicine, 31, 943–950. https://doi.org/10.1038/s41591-024-03423-7
Thirunavukarasu, A. J., Ting, D. S. J., Elangovan, K., Gutierrez, L., Tan, T. F., & Ting, D. S. W. (2023). Large language models in medicine. Nature Medicine, 29, 1930–1940. https://doi.org/10.1038/s41591-023-02448-8
Tierney, A. A., Gayre, G., Hoberman, B., Mattern, B., Ballesca, M., Kipnis, P., Liu, V., & Lee, K. (2024). Ambient Artificial Intelligence Scribes to Alleviate the Burden of Clinical Documentation. NEJM Catalyst Innovations in Care Delivery, 5(3). https://doi.org/10.1056/CAT.23.0404
Tu, T., Palepu, A., Schaekermann, M., Saab, K., Freyberg, J., Tanno, R., Wang, A., Li, B., Amin, M., Tomasev, N., Azizi, S., Singhal, K., et al. (2025). Towards Conversational Diagnostic Artificial Intelligence. Nature, 642, 442–450. https://doi.org/10.1038/s41586-025-08866-7
U.S. Food and Drug Administration. (2024). Marketing Submission Recommendations for a Predetermined Change Control Plan for Artificial Intelligence-Enabled Device Software Functions. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/marketing-submission-recommendations-predetermined-change-control-plan-artificial-intelligence
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems, 30. https://arxiv.org/abs/1706.03762
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E. H., Le, Q. V., & Zhou, D. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Advances in Neural Information Processing Systems, 35. https://arxiv.org/abs/2201.11903
Xiong, G., Jin, Q., Lu, Z., & Zhang, A. (2024). Benchmarking Retrieval-Augmented Generation for Medicine. Findings of the Association for Computational Linguistics: ACL 2024. https://aclanthology.org/2024.findings-acl.372/
Xu, L., Skoularidou, M., Cuesta-Infante, A., & Veeramachaneni, K. (2019). Modeling Tabular Data using Conditional GAN. Advances in Neural Information Processing Systems, 32. https://arxiv.org/abs/1907.00503
Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2023). ReAct: Synergizing Reasoning and Acting in Language Models. International Conference on Learning Representations (ICLR). https://arxiv.org/abs/2210.03629
Zakka, C., Shad, R., Chaurasia, A., Dalal, A. R., Kim, J. L., Moor, M., Fong, R., Phillips, C., Alexander, K., Ashley, E., Boyd, J., Boyd, K., et al. (2024). Almanac — Retrieval-Augmented Language Models for Clinical Medicine. NEJM AI, 1(2). https://doi.org/10.1056/AIoa2300068
Zhang, S., Xu, Y., Usuyama, N., Xu, H., Bagga, J., Tinn, R., Preston, S., Rao, R., Wei, M., Valluri, N., Wong, C., Tupini, A., Wang, Y., Mazzola, M., Shukla, S., Liden, L., Gao, J., Lungren, M. P., Naumann, T., … Poon, H. (2025). BiomedCLIP: A Multimodal Biomedical Foundation Model Pretrained from Fifteen Million Scientific Image-Text Pairs. NEJM AI. https://arxiv.org/abs/2303.00915
Zheng, L., Chiang, W.-L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., Lin, Z., Li, Z., Li, D., Xing, E. P., Zhang, H., Gonzalez, J. E., & Stoica, I. (2023). Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena. Advances in Neural Information Processing Systems Datasets and Benchmarks Track, 36. https://arxiv.org/abs/2306.05685
Zhou, Y., Chia, M. A., Wagner, S. K., Ayhan, M. S., Williamson, D. J., Struyven, R. R., Liu, T., Xu, M., Lozano, M. G., Woodward-Court, P., et al. (2023). A Foundation Model for Generalizable Disease Detection from Retinal Images. Nature, 622, 156–163. https://doi.org/10.1038/s41586-023-06555-x