References

Anthropic. (2024). Introducing the Model Context Protocol. https://www.anthropic.com/news/model-context-protocol
Anthropic. (2025). Claude’s Extended Thinking. https://www.anthropic.com/news/visible-extended-thinking
Bean et al. (2026). Real-world performance of large language models in patient self-assessment of symptoms. Nature Medicine.
Belkin, M., Hsu, D., Ma, S., & Mandal, S. (2019). Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proceedings of the National Academy of Sciences, 116(32), 15849–15854. https://doi.org/10.1073/pnas.1903070116
Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT), 610–623. https://doi.org/10.1145/3442188.3445922
Bishop, C. M., & Bishop, H. (2024). Deep learning: Foundations and concepts. Springer.
Bolton, E., Venigalla, A., Yasunaga, M., Hall, D., Xiong, B., Lee, T., Daneshjou, R., Frankle, J., Liang, P., Carbin, M., & Manning, C. D. (2024). BioMedLM: A 2.7B Parameter Language Model Trained on Biomedical Text. arXiv Preprint. https://arxiv.org/abs/2403.18421
Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., et al. (2021). On the opportunities and risks of foundation models. arXiv Preprint. https://arxiv.org/abs/2108.07258
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324
Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Wadsworth.
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al. (2020). Language Models are Few-Shot Learners. Advances in Neural Information Processing Systems, 33. https://arxiv.org/abs/2005.14165
Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y. T., Li, Y., Lundberg, S., et al. (2023). Sparks of Artificial General Intelligence: Early experiments with GPT-4. arXiv Preprint. https://arxiv.org/abs/2303.12712
Buuren, S. van, & Groothuis-Oudshoorn, K. (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1–67. https://doi.org/10.18637/jss.v045.i03
Choi, E., Biswal, S., Malin, B., Duke, J., Stewart, W. F., & Sun, J. (2017). Generating Multi-label Discrete Patient Records using Generative Adversarial Networks. Proceedings of the 2nd Machine Learning for Healthcare Conference (MLHC). https://arxiv.org/abs/1703.06490
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297. https://doi.org/10.1007/BF00994018
Cruz Rivera, S., Liu, X., Chan, A.-W., Denniston, A. K., Calvert, M. J., & SPIRIT-AI and CONSORT-AI Working Group. (2020). Guidelines for Clinical Trial Protocols for Interventions Involving Artificial Intelligence: The SPIRIT-AI Extension. Nature Medicine, 26, 1351–1363. https://doi.org/10.1038/s41591-020-1037-7
DeepSeek-AI, Guo, D., Yang, D., Zhang, H., Song, J., Zhang, R., Xu, R., et al. (2025). DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv Preprint. https://arxiv.org/abs/2501.12948
Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023). QLoRA: Efficient Finetuning of Quantized LLMs. Advances in Neural Information Processing Systems, 36. https://arxiv.org/abs/2305.14314
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of NAACL-HLT 2019, 4171–4186. https://doi.org/10.18653/v1/N19-1423
Dwork, C., & Roth, A. (2014). The Algorithmic Foundations of Differential Privacy. Foundations and Trends in Theoretical Computer Science, 9(3–4), 211–487. https://doi.org/10.1561/0400000042
European Parliament and Council. (2024). Regulation (EU) 2024/1689 of the European Parliament and of the Council Laying Down Harmonised Rules on Artificial Intelligence (Artificial Intelligence Act). https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=OJ:L_202401689
Evans, B. (2026). AI eats the world. Strategic presentation. https://www.ben-evans.com/presentations
Finlayson, S. G., Subbaswamy, A., Singh, K., Bowers, J., Kupke, A., Zittrain, J., Kohane, I. S., & Saria, S. (2021). The Clinician and Dataset Shift in Artificial Intelligence. New England Journal of Medicine, 385, 283–286. https://doi.org/10.1056/NEJMc2104626
Fleming, S. L., Lozano, A., Haberkorn, W. J., Jindal, J. A., Reis, E. P., Thapa, R., Blankemeier, L., Genkins, J. Z., Steinberg, E., Nayak, A., Patel, B. S., Chiang, C.-C., Callahan, A., Huo, Z., Gatidis, S., Adams, S. J., Fayanju, O., Shah, S. J., Savage, T., … Shah, N. H. (2024). MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records. Proceedings of the AAAI Conference on Artificial Intelligence. https://arxiv.org/abs/2308.14089
Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29(5), 1189–1232. https://doi.org/10.1214/aos/1013203451
Gallifant, J., Afshar, M., Ameen, S., Aphinyanaphongs, Y., Chen, S., Cacciamani, G., Demner-Fushman, D., Dligach, D., Daneshjou, R., Fernandes, C., … Bitterman, D. S. (2025). The TRIPOD-LLM Reporting Guideline for Studies Using Large Language Models. Nature Medicine, 31, 60–69. https://doi.org/10.1038/s41591-024-03425-5
Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., & Crawford, K. (2021). Datasheets for Datasets. Communications of the ACM, 64(12), 86–92. https://doi.org/10.1145/3458723
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press.
Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., & Fritz, M. (2023). Not What You’ve Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection. Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security (AISec), 79–90. https://doi.org/10.1145/3605764.3623985
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction (2nd ed.). Springer.
Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D., & Steinhardt, J. (2021). Measuring Massive Multitask Language Understanding. International Conference on Learning Representations (ICLR). https://arxiv.org/abs/2009.03300
Hoffmann, J., Borgeaud, S., Mensch, A., Buchatskaya, E., Cai, T., Rutherford, E., et al. (2022). Training compute-optimal large language models. Advances in Neural Information Processing Systems, 35.
Houlsby, N., Giurgiu, A., Jastrzebski, S., Morrone, B., Laroussilhe, Q. de, Gesmundo, A., Attariyan, M., & Gelly, S. (2019). Parameter-Efficient Transfer Learning for NLP. Proceedings of the 36th International Conference on Machine Learning (ICML). https://arxiv.org/abs/1902.00751
Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2022). LoRA: Low-Rank Adaptation of Large Language Models. International Conference on Learning Representations (ICLR). https://arxiv.org/abs/2106.09685
Jin, D., Pan, E., Oufattole, N., Weng, W.-H., Fang, H., & Szolovits, P. (2021). What Disease Does This Patient Have? A Large-Scale Open Domain Question Answering Dataset from Medical Exams. Applied Sciences, 11(14), 6421. https://doi.org/10.3390/app11146421
Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., & Amodei, D. (2020). Scaling laws for neural language models. arXiv Preprint. https://arxiv.org/abs/2001.08361
Karpathy, A. (2023). nanoGPT: A minimal GPT implementation. GitHub repository. https://github.com/karpathy/nanoGPT
Karpukhin, V., Oğuz, B., Min, S., Lewis, P., Wu, L., Edunov, S., Chen, D., & Yih, W. (2020). Dense Passage Retrieval for Open-Domain Question Answering. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 6769–6781. https://doi.org/10.18653/v1/2020.emnlp-main.550
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 1097–1105.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444. https://doi.org/10.1038/nature14539
Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W., Rocktäschel, T., Riedel, S., & Kiela, D. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Advances in Neural Information Processing Systems, 33. https://arxiv.org/abs/2005.11401
Liu, X., Cruz Rivera, S., Moher, D., Calvert, M. J., Denniston, A. K., & SPIRIT-AI and CONSORT-AI Working Group. (2020). Reporting Guidelines for Clinical Trial Reports for Interventions Involving Artificial Intelligence: The CONSORT-AI Extension. Nature Medicine, 26, 1364–1374. https://doi.org/10.1038/s41591-020-1034-x
Luo et al. (2026). Clinical environment simulator: Beyond static benchmarks for AI in healthcare. Nature Medicine.
Ma, J., He, Y., Li, F., Han, L., You, C., & Wang, B. (2024). Segment Anything in Medical Images. Nature Communications, 15, 654. https://doi.org/10.1038/s41467-024-44824-z
Marshall, I. J., & Wallace, B. C. (2019). Toward Systematic Review Automation: A Practical Guide to Using Machine Learning Tools in Research Synthesis. Systematic Reviews, 8, 163. https://doi.org/10.1186/s13643-019-0974-z
McCarthy, J. (1960). Recursive functions of symbolic expressions and their computation by machine, part I. Communications of the ACM, 3(4), 184–195. https://doi.org/10.1145/367177.367199
McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4), 115–133. https://doi.org/10.1007/BF02478259
Minsky, M., & Papert, S. (1969). Perceptrons: An introduction to computational geometry. MIT Press.
Mitchell, M. (2019). Artificial intelligence: A guide for thinking humans. Farrar, Straus and Giroux.
Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., Spitzer, E., Raji, I. D., & Gebru, T. (2019). Model Cards for Model Reporting. Proceedings of the Conference on Fairness, Accountability, and Transparency (FAT*), 220–229. https://doi.org/10.1145/3287560.3287596
Model Context Protocol Working Group. (2025). Model Context Protocol Specification. https://modelcontextprotocol.io/specification/2025-11-25
Mollick, E. (2024a). 15 Times to use AI, and 5 Not to. One Useful Thing (Substack). https://www.oneusefulthing.org/p/15-times-to-use-ai-and-5-not-to
Mollick, E. (2024b). Four Singularities for Research. One Useful Thing (Substack). https://www.oneusefulthing.org/p/four-singularities-for-research
Mollick, E. (2024c). Getting started with AI: Good enough prompting. One Useful Thing (Substack). https://www.oneusefulthing.org/p/getting-started-with-ai-good-enough
Mollick, E. (2024d). Latent Expertise: Everyone is in R&D. One Useful Thing (Substack). https://www.oneusefulthing.org/p/latent-expertise-everyone-is-in-r
Mollick, E. (2024e). Thinking Like an AI. One Useful Thing (Substack). https://www.oneusefulthing.org/p/thinking-like-an-ai
Mollick, E. (2025a). An Opinionated Guide to Using AI Right Now. One Useful Thing (Substack). https://www.oneusefulthing.org/p/an-opinionated-guide-to-using-ai
Mollick, E. (2025b). Giving your AI a Job Interview. One Useful Thing (Substack). https://www.oneusefulthing.org/p/giving-your-ai-a-job-interview
Mollick, E. (2025c). On Jagged AGI: o3, Gemini 2.5, and everything after. One Useful Thing (Substack). https://www.oneusefulthing.org/p/on-jagged-agi-o3-gemini-25-and-everything
Mollick, E. (2025d). On Working with Wizards. One Useful Thing (Substack). https://www.oneusefulthing.org/p/on-working-with-wizards
Mollick, E. (2025e). Personality and Persuasion. One Useful Thing (Substack). https://www.oneusefulthing.org/p/personality-and-persuasion
Mollick, E. (2025f). Real AI Agents and Real Work. One Useful Thing (Substack). https://www.oneusefulthing.org/p/real-ai-agents-and-real-work
Mollick, E. (2025g). The Cybernetic Teammate. One Useful Thing (Substack). https://www.oneusefulthing.org/p/the-cybernetic-teammate
Mollick, E. (2025h). The End of Search, The Beginning of Research. One Useful Thing (Substack). https://www.oneusefulthing.org/p/the-end-of-search-the-beginning-of
Mollick, E. (2026a). Claude Dispatch and the Power of Interfaces. One Useful Thing (Substack). https://www.oneusefulthing.org/p/claude-dispatch-and-the-power-of
Mollick, E. (2026b). Management as AI Superpower. One Useful Thing (Substack). https://www.oneusefulthing.org/p/management-as-ai-superpower
Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). Dissecting Racial Bias in an Algorithm Used to Manage the Health of Populations. Science, 366(6464), 447–453. https://doi.org/10.1126/science.aax2342
OpenAI. (2024). OpenAI o1 System Card. https://cdn.openai.com/o1-system-card-20241205.pdf; https://arxiv.org/abs/2412.16720
OpenAI. (2026). Why SWE-bench Verified no longer measures frontier coding capabilities. Technical post.
Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., et al. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35.
Qian, Z., Davis, R., & Schaar, M. van der. (2023). Synthcity: A Benchmark Framework for Diverse Use Cases of Tabular Synthetic Data. Advances in Neural Information Processing Systems Datasets and Benchmarks Track, 36. https://arxiv.org/abs/2301.07573
Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training. OpenAI technical report.
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI technical report.
Raschka, S. (2026a). A dream of spring for open-weight LLMs: 10 architectures from Jan-Feb 2026. Sebastian Raschka Magazine. https://magazine.sebastianraschka.com/p/a-dream-of-spring-for-open-weight
Raschka, S. (2026b). A visual guide to attention variants in modern LLMs. Sebastian Raschka Magazine. https://magazine.sebastianraschka.com/p/visual-attention-variants
Raschka, S. (2026c). Components of a coding agent. Sebastian Raschka Magazine. https://magazine.sebastianraschka.com/p/components-of-a-coding-agent
Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), 386–408. https://doi.org/10.1037/h0042519
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533–536. https://doi.org/10.1038/323533a0
Russell, S., & Norvig, P. (2021). Artificial intelligence: A modern approach (4th ed.). Pearson.
Salvi, F., Ribeiro, M. H., Gallotti, R., & West, R. (2025). On the conversational persuasiveness of large language models: A randomized controlled trial. Nature Human Behaviour, 9(8), 1645–1653. https://doi.org/10.1038/s41562-025-02194-6
Schaeffer, R., Miranda, B., & Koyejo, S. (2023). Are emergent abilities of large language models a mirage? Advances in Neural Information Processing Systems, 36.
Schick, T., Dwivedi-Yu, J., Dessì, R., Raileanu, R., Lomeli, M., Zettlemoyer, L., Cancedda, N., & Scialom, T. (2023). Toolformer: Language Models Can Teach Themselves to Use Tools. Advances in Neural Information Processing Systems, 36. https://arxiv.org/abs/2302.04761
Shahul, E., James, J., Espinosa-Anke, L., & Schockaert, S. (2024). RAGAS: Automated Evaluation of Retrieval Augmented Generation. Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations (EACL). https://aclanthology.org/2024.eacl-demo.16/
Singhal, K., Azizi, S., Tu, T., Mahdavi, S. S., Wei, J., Chung, H. W., Scales, N., Tanwani, A., Cole-Lewis, H., Pfohl, S., Payne, P., Seneviratne, M., Gamble, P., Kelly, C., Babiker, A., Schärli, N., Chowdhery, A., Mansfield, P., Demner-Fushman, D., … Natarajan, V. (2023). Large Language Models Encode Clinical Knowledge. Nature, 620, 172–180. https://doi.org/10.1038/s41586-023-06291-2
Singhal, K., Tu, T., Gottweis, J., Sayres, R., Wulczyn, E., Hou, L., Clark, K., Pfohl, S., Cole-Lewis, H., Neal, D., Schaekermann, M., Wang, A., Amin, M., Lachgar, S., Mansfield, P., Prakash, S., Green, B., Dominowska, E., Agüera y Arcas, B., … Natarajan, V. (2025). Toward Expert-Level Medical Question Answering with Large Language Models. Nature Medicine, 31, 943–950. https://doi.org/10.1038/s41591-024-03423-7
Thirunavukarasu, A. J., Ting, D. S. J., Elangovan, K., Gutierrez, L., Tan, T. F., & Ting, D. S. W. (2023). Large language models in medicine. Nature Medicine, 29, 1930–1940. https://doi.org/10.1038/s41591-023-02448-8
Tierney, A. A., Gayre, G., Hoberman, B., Mattern, B., Ballesca, M., Kipnis, P., Liu, V., & Lee, K. (2024). Ambient Artificial Intelligence Scribes to Alleviate the Burden of Clinical Documentation. NEJM Catalyst Innovations in Care Delivery, 5(3). https://doi.org/10.1056/CAT.23.0404
Tu, T., Palepu, A., Schaekermann, M., Saab, K., Freyberg, J., Tanno, R., Wang, A., Li, B., Amin, M., Tomasev, N., Azizi, S., Singhal, K., et al. (2025). Towards Conversational Diagnostic Artificial Intelligence. Nature, 642, 442–450. https://doi.org/10.1038/s41586-025-08866-7
U.S. Food and Drug Administration. (2024). Marketing Submission Recommendations for a Predetermined Change Control Plan for Artificial Intelligence-Enabled Device Software Functions. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/marketing-submission-recommendations-predetermined-change-control-plan-artificial-intelligence
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems, 30. https://arxiv.org/abs/1706.03762
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E. H., Le, Q. V., & Zhou, D. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Advances in Neural Information Processing Systems, 35. https://arxiv.org/abs/2201.11903
Xiong, G., Jin, Q., Lu, Z., & Zhang, A. (2024). Benchmarking Retrieval-Augmented Generation for Medicine. Findings of the Association for Computational Linguistics: ACL 2024. https://aclanthology.org/2024.findings-acl.372/
Xu, L., Skoularidou, M., Cuesta-Infante, A., & Veeramachaneni, K. (2019). Modeling Tabular Data using Conditional GAN. Advances in Neural Information Processing Systems, 32. https://arxiv.org/abs/1907.00503
Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2023). ReAct: Synergizing Reasoning and Acting in Language Models. International Conference on Learning Representations (ICLR). https://arxiv.org/abs/2210.03629
Zakka, C., Shad, R., Chaurasia, A., Dalal, A. R., Kim, J. L., Moor, M., Fong, R., Phillips, C., Alexander, K., Ashley, E., Boyd, J., Boyd, K., et al. (2024). Almanac — Retrieval-Augmented Language Models for Clinical Medicine. NEJM AI, 1(2). https://doi.org/10.1056/AIoa2300068
Zhang, S., Xu, Y., Usuyama, N., Xu, H., Bagga, J., Tinn, R., Preston, S., Rao, R., Wei, M., Valluri, N., Wong, C., Tupini, A., Wang, Y., Mazzola, M., Shukla, S., Liden, L., Gao, J., Lungren, M. P., Naumann, T., … Poon, H. (2025). BiomedCLIP: A Multimodal Biomedical Foundation Model Pretrained from Fifteen Million Scientific Image-Text Pairs. NEJM AI. https://arxiv.org/abs/2303.00915
Zheng, L., Chiang, W.-L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., Lin, Z., Li, Z., Li, D., Xing, E. P., Zhang, H., Gonzalez, J. E., & Stoica, I. (2023). Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena. Advances in Neural Information Processing Systems Datasets and Benchmarks Track, 36. https://arxiv.org/abs/2306.05685
Zhou, Y., Chia, M. A., Wagner, S. K., Ayhan, M. S., Williamson, D. J., Struyven, R. R., Liu, T., Xu, M., Lozano, M. G., Woodward-Court, P., et al. (2023). A Foundation Model for Generalizable Disease Detection from Retinal Images. Nature, 622, 156–163. https://doi.org/10.1038/s41586-023-06555-x