References
Anthropic. (2024). Introducing the Model
Context Protocol. https://www.anthropic.com/news/model-context-protocol
Anthropic. (2025). Claude’s Extended
Thinking. https://www.anthropic.com/news/visible-extended-thinking
Bean et al. (2026). Real-world performance
of large language models in patient self-assessment of symptoms.
Nature Medicine.
Belkin, M., Hsu, D., Ma, S., & Mandal, S. (2019). Reconciling modern
machine-learning practice and the classical bias–variance trade-off.
Proceedings of the National Academy of Sciences,
116(32), 15849–15854. https://doi.org/10.1073/pnas.1903070116
Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S.
(2021). On the Dangers of Stochastic
Parrots: Can Language Models Be Too Big?
Proceedings of the 2021 ACM Conference on Fairness, Accountability,
and Transparency (FAccT), 610–623. https://doi.org/10.1145/3442188.3445922
Bishop, C. M., & Bishop, H. (2024). Deep learning: Foundations
and concepts. Springer.
Bolton, E., Venigalla, A., Yasunaga, M., Hall, D., Xiong, B., Lee, T.,
Daneshjou, R., Frankle, J., Liang, P., Carbin, M., & Manning, C. D.
(2024). BioMedLM: A 2.7B Parameter
Language Model Trained on Biomedical Text. arXiv
Preprint. https://arxiv.org/abs/2403.18421
Bommasani, R., Hudson, D. A., Adeli, E., Altman,
R., Arora, S., et al. (2021). On the opportunities and risks
of foundation models. arXiv Preprint. https://arxiv.org/abs/2108.07258
Breiman, L. (2001). Random forests. Machine Learning,
45(1), 5–32. https://doi.org/10.1023/A:1010933404324
Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984).
Classification and regression trees. Wadsworth.
Brown, T. B., Mann, B., Ryder, N., Subbiah, M.,
Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G.,
Askell, A., et al. (2020). Language
Models are Few-Shot Learners. Advances in Neural Information
Processing Systems, 33. https://arxiv.org/abs/2005.14165
Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke,
J., Horvitz, E., Kamar, E., Lee, P., Lee, Y. T., Li, Y., Lundberg, S.,
et al. (2023). Sparks of Artificial General
Intelligence: Early experiments with
GPT-4. arXiv Preprint. https://arxiv.org/abs/2303.12712
Buuren, S. van, & Groothuis-Oudshoorn, K. (2011). mice: Multivariate Imputation
by Chained Equations in R. Journal of Statistical
Software, 45(3), 1–67. https://doi.org/10.18637/jss.v045.i03
Choi, E., Biswal, S., Malin, B., Duke, J., Stewart, W. F., & Sun, J.
(2017). Generating Multi-label Discrete Patient
Records using Generative Adversarial Networks. Proceedings of
the 2nd Machine Learning for Healthcare Conference (MLHC). https://arxiv.org/abs/1703.06490
Cortes, C., & Vapnik, V. (1995). Support-vector networks.
Machine Learning, 20(3), 273–297. https://doi.org/10.1007/BF00994018
Cruz Rivera, S., Liu, X., Chan, A.-W., Denniston, A. K., Calvert, M. J.,
& SPIRIT-AI and CONSORT-AI Working Group. (2020). Guidelines for Clinical Trial Protocols for Interventions
Involving Artificial Intelligence: The SPIRIT-AI
Extension. Nature Medicine, 26, 1351–1363. https://doi.org/10.1038/s41591-020-1037-7
DeepSeek-AI, Guo, D., Yang, D., Zhang, H., Song,
J., Zhang, R., Xu, R., et al. (2025). DeepSeek-R1:
Incentivizing Reasoning Capability in LLMs via
Reinforcement Learning. arXiv Preprint. https://arxiv.org/abs/2501.12948
Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023).
QLoRA: Efficient Finetuning of
Quantized LLMs. Advances in Neural Information Processing
Systems, 36. https://arxiv.org/abs/2305.14314
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019).
BERT: Pre-training of Deep
Bidirectional Transformers for Language Understanding.
Proceedings of NAACL-HLT 2019, 4171–4186. https://doi.org/10.18653/v1/N19-1423
Dwork, C., & Roth, A. (2014). The Algorithmic
Foundations of Differential Privacy. Foundations and Trends
in Theoretical Computer Science, 9(3–4), 211–487. https://doi.org/10.1561/0400000042
European Parliament and Council. (2024). Regulation (EU) 2024/1689 of the European Parliament and
of the Council Laying Down Harmonised Rules on Artificial Intelligence
(Artificial Intelligence Act). https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=OJ:L_202401689
Evans, B. (2026). AI eats the world. Strategic
presentation. https://www.ben-evans.com/presentations
Finlayson, S. G., Subbaswamy, A., Singh, K., Bowers, J., Kupke, A.,
Zittrain, J., Kohane, I. S., & Saria, S. (2021). The Clinician and Dataset Shift in Artificial
Intelligence. New England Journal of Medicine,
385, 283–286. https://doi.org/10.1056/NEJMc2104626
Fleming, S. L., Lozano, A., Haberkorn, W. J., Jindal, J. A., Reis, E.
P., Thapa, R., Blankemeier, L., Genkins, J. Z., Steinberg, E., Nayak,
A., Patel, B. S., Chiang, C.-C., Callahan, A., Huo, Z., Gatidis, S.,
Adams, S. J., Fayanju, O., Shah, S. J., Savage, T., … Shah, N. H.
(2024). MedAlign: A
Clinician-Generated Dataset for Instruction Following with Electronic
Medical Records. Proceedings of the AAAI Conference on
Artificial Intelligence. https://arxiv.org/abs/2308.14089
Friedman, J. H. (2001). Greedy function approximation: A gradient
boosting machine. The Annals of Statistics, 29(5),
1189–1232. https://doi.org/10.1214/aos/1013203451
Gallifant, J., Afshar, M., Ameen, S.,
Aphinyanaphongs, Y., Chen, S., Cacciamani, G., Demner-Fushman, D.,
Dligach, D., Daneshjou, R., Fernandes, C., … Bitterman, D.
S. (2025). The TRIPOD-LLM Reporting
Guideline for Studies Using Large Language Models. Nature
Medicine, 31, 60–69. https://doi.org/10.1038/s41591-024-03425-5
Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H.,
Daumé III, H., & Crawford, K. (2021). Datasheets for Datasets. Communications of the
ACM, 64(12), 86–92. https://doi.org/10.1145/3458723
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep
learning. MIT Press.
Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., &
Fritz, M. (2023). Not What You’ve Signed Up
For: Compromising Real-World LLM-Integrated
Applications with Indirect Prompt Injection. Proceedings of
the 16th ACM Workshop on Artificial Intelligence and Security
(AISec), 79–90. https://doi.org/10.1145/3605764.3623985
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements
of statistical learning: Data mining, inference, and prediction
(2nd ed.). Springer.
Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D.,
& Steinhardt, J. (2021). Measuring Massive Multitask Language
Understanding. International Conference on Learning
Representations (ICLR). https://arxiv.org/abs/2009.03300
Hoffmann, J., Borgeaud, S., Mensch, A.,
Buchatskaya, E., Cai, T., Rutherford, E., et al. (2022). Training
compute-optimal large language models. Advances in Neural
Information Processing Systems, 35.
Houlsby, N., Giurgiu, A., Jastrzebski, S., Morrone, B., Laroussilhe, Q.
de, Gesmundo, A., Attariyan, M., & Gelly, S. (2019). Parameter-Efficient Transfer Learning for NLP.
Proceedings of the 36th International Conference on Machine Learning
(ICML). https://arxiv.org/abs/1902.00751
Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang,
L., & Chen, W. (2022). LoRA: Low-Rank Adaptation of Large Language Models.
International Conference on Learning Representations (ICLR). https://arxiv.org/abs/2106.09685
Jin, D., Pan, E., Oufattole, N., Weng, W.-H., Fang, H., & Szolovits,
P. (2021). What Disease Does This Patient Have? A
Large-Scale Open Domain Question Answering Dataset from Medical
Exams. Applied Sciences, 11(14), 6421. https://doi.org/10.3390/app11146421
Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B.,
Child, R., Gray, S., Radford, A., Wu, J., & Amodei, D. (2020).
Scaling laws for neural language models. arXiv Preprint. https://arxiv.org/abs/2001.08361
Karpathy, A. (2023). nanoGPT: A minimal
GPT implementation. GitHub repository. https://github.com/karpathy/nanoGPT
Karpukhin, V., Oğuz, B., Min, S., Lewis, P., Wu, L., Edunov, S., Chen,
D., & Yih, W. (2020). Dense Passage Retrieval
for Open-Domain Question Answering. Proceedings of the 2020
Conference on Empirical Methods in Natural Language Processing
(EMNLP), 6769–6781. https://doi.org/10.18653/v1/2020.emnlp-main.550
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012).
ImageNet classification with deep convolutional neural
networks. Advances in Neural Information Processing Systems, 25,
1097–1105.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning.
Nature, 521(7553), 436–444. https://doi.org/10.1038/nature14539
Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N.,
Küttler, H., Lewis, M., Yih, W., Rocktäschel, T., Riedel, S., &
Kiela, D. (2020). Retrieval-Augmented Generation
for Knowledge-Intensive NLP Tasks. Advances in Neural
Information Processing Systems, 33. https://arxiv.org/abs/2005.11401
Liu, X., Cruz Rivera, S., Moher, D., Calvert, M. J., Denniston, A. K.,
& SPIRIT-AI and CONSORT-AI Working Group. (2020). Reporting Guidelines for Clinical Trial Reports for
Interventions Involving Artificial Intelligence: The
CONSORT-AI Extension. Nature Medicine, 26,
1364–1374. https://doi.org/10.1038/s41591-020-1034-x
Luo et al. (2026). Clinical environment
simulator: Beyond static benchmarks for AI in healthcare.
Nature Medicine.
Ma, J., He, Y., Li, F., Han, L., You, C., & Wang, B. (2024). Segment Anything in Medical Images. Nature
Communications, 15, 654. https://doi.org/10.1038/s41467-024-44824-z
Marshall, I. J., & Wallace, B. C. (2019). Toward Systematic
Review Automation: A Practical Guide to
Using Machine Learning Tools in Research Synthesis.
Systematic Reviews, 8, 163. https://doi.org/10.1186/s13643-019-0974-z
McCarthy, J. (1960). Recursive functions of symbolic expressions and
their computation by machine, part I. Communications of
the ACM, 3(4), 184–195. https://doi.org/10.1145/367177.367199
McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the
ideas immanent in nervous activity. The Bulletin of Mathematical
Biophysics, 5(4), 115–133. https://doi.org/10.1007/BF02478259
Minsky, M., & Papert, S. (1969). Perceptrons: An introduction to
computational geometry. MIT Press.
Mitchell, M. (2019). Artificial intelligence: A guide for thinking
humans. Farrar, Straus and Giroux.
Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L.,
Hutchinson, B., Spitzer, E., Raji, I. D., & Gebru, T. (2019). Model Cards for Model Reporting. Proceedings
of the Conference on Fairness, Accountability, and Transparency
(FAT*), 220–229. https://doi.org/10.1145/3287560.3287596
Model Context Protocol Working Group. (2025). Model Context
Protocol Specification. https://modelcontextprotocol.io/specification/2025-11-25
Mollick, E. (2024a). 15 Times to use AI, and 5
Not to. One Useful Thing (Substack). https://www.oneusefulthing.org/p/15-times-to-use-ai-and-5-not-to
Mollick, E. (2024b). Four Singularities for
Research. One Useful Thing (Substack). https://www.oneusefulthing.org/p/four-singularities-for-research
Mollick, E. (2024c). Getting started with AI:
Good enough prompting. One Useful Thing (Substack). https://www.oneusefulthing.org/p/getting-started-with-ai-good-enough
Mollick, E. (2024d). Latent Expertise: Everyone
is in R&D. One Useful Thing (Substack). https://www.oneusefulthing.org/p/latent-expertise-everyone-is-in-r
Mollick, E. (2024e). Thinking Like an
AI. One Useful Thing (Substack). https://www.oneusefulthing.org/p/thinking-like-an-ai
Mollick, E. (2025a). An Opinionated Guide to
Using AI Right Now. One Useful Thing (Substack). https://www.oneusefulthing.org/p/an-opinionated-guide-to-using-ai
Mollick, E. (2025b). Giving your AI a Job
Interview. One Useful Thing (Substack). https://www.oneusefulthing.org/p/giving-your-ai-a-job-interview
Mollick, E. (2025c). On Jagged AGI: o3, Gemini
2.5, and everything after. One Useful Thing (Substack). https://www.oneusefulthing.org/p/on-jagged-agi-o3-gemini-25-and-everything
Mollick, E. (2025d). On Working with
Wizards. One Useful Thing (Substack). https://www.oneusefulthing.org/p/on-working-with-wizards
Mollick, E. (2025e). Personality and
Persuasion. One Useful Thing (Substack). https://www.oneusefulthing.org/p/personality-and-persuasion
Mollick, E. (2025f). Real AI Agents and Real
Work. One Useful Thing (Substack). https://www.oneusefulthing.org/p/real-ai-agents-and-real-work
Mollick, E. (2025g). The Cybernetic Teammate. One
Useful Thing (Substack). https://www.oneusefulthing.org/p/the-cybernetic-teammate
Mollick, E. (2025h). The End of Search, The
Beginning of Research. One Useful Thing (Substack). https://www.oneusefulthing.org/p/the-end-of-search-the-beginning-of
Mollick, E. (2026a). Claude Dispatch and the
Power of Interfaces. One Useful Thing (Substack). https://www.oneusefulthing.org/p/claude-dispatch-and-the-power-of
Mollick, E. (2026b). Management as AI
Superpower. One Useful Thing (Substack). https://www.oneusefulthing.org/p/management-as-ai-superpower
Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019).
Dissecting Racial Bias in an Algorithm Used to
Manage the Health of Populations. Science,
366(6464), 447–453. https://doi.org/10.1126/science.aax2342
OpenAI. (2024). OpenAI o1 System
Card. https://cdn.openai.com/o1-system-card-20241205.pdf (also at https://arxiv.org/abs/2412.16720)
OpenAI. (2026). Why SWE-bench Verified
no longer measures frontier coding capabilities. Technical post.
Ouyang, L., Wu, J., Jiang, X., Almeida, D.,
Wainwright, C. L., Mishkin, P., et al. (2022). Training language
models to follow instructions with human feedback. Advances in
Neural Information Processing Systems, 35.
Qian, Z., Davis, R., & Schaar, M. van der. (2023).
Synthcity: A Benchmark Framework for
Diverse Use Cases of Tabular Synthetic Data. Advances in
Neural Information Processing Systems Datasets and Benchmarks
Track, 36. https://arxiv.org/abs/2301.07573
Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018).
Improving language understanding by generative pre-training.
OpenAI technical report.
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever,
I. (2019). Language models are unsupervised multitask learners.
OpenAI technical report.
Raschka, S. (2026a). A dream of spring for open-weight
LLMs: 10 architectures from Jan-Feb 2026.
Sebastian Raschka Magazine. https://magazine.sebastianraschka.com/p/a-dream-of-spring-for-open-weight
Raschka, S. (2026b). A visual guide to attention variants in modern
LLMs. Sebastian Raschka Magazine. https://magazine.sebastianraschka.com/p/visual-attention-variants
Raschka, S. (2026c). Components of a coding agent. Sebastian
Raschka Magazine. https://magazine.sebastianraschka.com/p/components-of-a-coding-agent
Rosenblatt, F. (1958). The perceptron: A probabilistic model for
information storage and organization in the brain. Psychological
Review, 65(6), 386–408. https://doi.org/10.1037/h0042519
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning
representations by back-propagating errors. Nature,
323(6088), 533–536. https://doi.org/10.1038/323533a0
Russell, S., & Norvig, P. (2021). Artificial intelligence: A
modern approach (4th ed.). Pearson.
Salvi, F., Ribeiro, M. H., Gallotti, R., & West, R. (2025). On the
conversational persuasiveness of large language models: A randomized
controlled trial. Nature Human Behaviour, 9(8),
1645–1653. https://doi.org/10.1038/s41562-025-02194-6
Schaeffer, R., Miranda, B., & Koyejo, S. (2023). Are emergent
abilities of large language models a mirage? Advances in Neural
Information Processing Systems, 36.
Schick, T., Dwivedi-Yu, J., Dessì, R., Raileanu, R., Lomeli, M.,
Zettlemoyer, L., Cancedda, N., & Scialom, T. (2023).
Toolformer: Language Models Can Teach
Themselves to Use Tools. Advances in Neural Information
Processing Systems, 36. https://arxiv.org/abs/2302.04761
Shahul, E., James, J., Espinosa-Anke, L., & Schockaert, S. (2024).
RAGAS: Automated Evaluation of
Retrieval Augmented Generation. Proceedings of the 18th
Conference of the European Chapter of the Association for Computational
Linguistics: System Demonstrations (EACL). https://aclanthology.org/2024.eacl-demo.16/
Singhal, K., Azizi, S., Tu, T., Mahdavi, S. S., Wei, J., Chung, H. W.,
Scales, N., Tanwani, A., Cole-Lewis, H., Pfohl, S., Payne, P.,
Seneviratne, M., Gamble, P., Kelly, C., Babiker, A., Schärli, N.,
Chowdhery, A., Mansfield, P., Demner-Fushman, D., … Natarajan, V.
(2023). Large Language Models Encode Clinical Knowledge.
Nature, 620, 172–180. https://doi.org/10.1038/s41586-023-06291-2
Singhal, K., Tu, T., Gottweis, J., Sayres, R., Wulczyn, E., Hou, L.,
Clark, K., Pfohl, S., Cole-Lewis, H., Neal, D., Schaekermann, M., Wang,
A., Amin, M., Lachgar, S., Mansfield, P., Prakash, S., Green, B.,
Dominowska, E., Agüera y Arcas, B., … Natarajan, V. (2025). Toward Expert-Level Medical Question Answering with Large
Language Models. Nature Medicine, 31, 943–950.
https://doi.org/10.1038/s41591-024-03423-7
Tierney, A. A., Gayre, G., Hoberman, B., Mattern, B., Ballesca, M.,
Kipnis, P., Liu, V., & Lee, K. (2024). Ambient
Artificial Intelligence Scribes to Alleviate the Burden of Clinical
Documentation. NEJM Catalyst Innovations in Care
Delivery, 5(3). https://doi.org/10.1056/CAT.23.0404
Tu, T., Palepu, A., Schaekermann, M., Saab, K.,
Freyberg, J., Tanno, R., Wang, A., Li, B., Amin, M., Tomasev, N., Azizi,
S., Singhal, K., et al. (2025). Towards Conversational
Diagnostic Artificial Intelligence. Nature,
642, 442–450. https://doi.org/10.1038/s41586-025-08866-7
U.S. Food and Drug Administration. (2024). Marketing Submission Recommendations for a Predetermined
Change Control Plan for Artificial Intelligence-Enabled Device Software
Functions. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/marketing-submission-recommendations-predetermined-change-control-plan-artificial-intelligence
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez,
A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention Is
All You Need. Advances in Neural Information Processing
Systems, 30. https://arxiv.org/abs/1706.03762
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi,
E. H., Le, Q. V., & Zhou, D. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large
Language Models. Advances in Neural Information Processing
Systems, 35. https://arxiv.org/abs/2201.11903
Xiong, G., Jin, Q., Lu, Z., & Zhang, A. (2024). Benchmarking Retrieval-Augmented Generation for
Medicine. Findings of the Association for Computational
Linguistics: ACL 2024. https://aclanthology.org/2024.findings-acl.372/
Xu, L., Skoularidou, M., Cuesta-Infante, A., & Veeramachaneni, K.
(2019). Modeling Tabular Data using Conditional
GAN. Advances in Neural Information Processing Systems,
32. https://arxiv.org/abs/1907.00503
Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., &
Cao, Y. (2023). ReAct: Synergizing
Reasoning and Acting in Language Models. International
Conference on Learning Representations (ICLR). https://arxiv.org/abs/2210.03629
Zakka, C., Shad, R., Chaurasia, A., Dalal, A. R.,
Kim, J. L., Moor, M., Fong, R., Phillips, C., Alexander, K., Ashley, E.,
Boyd, J., Boyd, K., et al. (2024). Almanac —
Retrieval-Augmented Language Models for Clinical Medicine.
NEJM AI, 1(2). https://doi.org/10.1056/AIoa2300068
Zhang, S., Xu, Y., Usuyama, N., Xu, H., Bagga, J., Tinn, R., Preston,
S., Rao, R., Wei, M., Valluri, N., Wong, C., Tupini, A., Wang, Y.,
Mazzola, M., Shukla, S., Liden, L., Gao, J., Lungren, M. P., Naumann,
T., … Poon, H. (2025). BiomedCLIP: A
Multimodal Biomedical Foundation Model Pretrained from Fifteen Million
Scientific Image-Text Pairs. NEJM AI. https://arxiv.org/abs/2303.00915
Zheng, L., Chiang, W.-L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y.,
Lin, Z., Li, Z., Li, D., Xing, E. P., Zhang, H., Gonzalez, J. E., &
Stoica, I. (2023). Judging LLM-as-a-Judge with
MT-Bench and Chatbot Arena. Advances in Neural Information
Processing Systems Datasets and Benchmarks Track, 36. https://arxiv.org/abs/2306.05685
Zhou, Y., Chia, M. A., Wagner, S. K., Ayhan, M. S.,
Williamson, D. J., Struyven, R. R., Liu, T., Xu, M., Lozano, M. G.,
Woodward-Court, P., et al. (2023). A
Foundation Model for Generalizable Disease Detection from Retinal
Images. Nature, 622, 156–163. https://doi.org/10.1038/s41586-023-06555-x