Bibliography

1. Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, and Bryan Catanzaro. Megatron-LM: training multi-billion parameter language models using model parallelism. CoRR, 2019. arXiv:1909.08053.

2. Deepak Narayanan, Mohammad Shoeybi, Jared Casper, Patrick LeGresley, Mostofa Patwary, Vijay Korthikanti, Dmitri Vainbrand, Prethvi Kashinkunti, Julie Bernauer, Bryan Catanzaro, Amar Phanishayee, and Matei Zaharia. Efficient large-scale language model training on GPU clusters. CoRR, 2021. arXiv:2104.04473.

3. Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models. CoRR, 2020. arXiv:2001.08361.

4. Christian Dallago, Jody Mou, Kadina E. Johnston, Bruce J. Wittmann, Nicholas Bhattacharya, Samuel Goldman, Ali Madani, and Kevin K. Yang. FLIP: benchmark tasks in fitness landscape inference for proteins. bioRxiv, 2022. doi:10.1101/2021.11.09.467890.

5. Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl. Acad. Sci. U. S. A., April 2021. doi:10.1073/pnas.2016239118.

6. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: pre-training of deep bidirectional transformers for language understanding. CoRR, 2018. arXiv:1810.04805.

7. UniProt Consortium. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res., 49(D1):D480–D489, January 2021. doi:10.1093/nar/gkaa1100.

8. Ahmed Elnaggar, Michael Heinzinger, Christian Dallago, Ghalia Rehawi, Yu Wang, Llion Jones, Tom Gibbs, Tamas Feher, Christoph Angerer, Martin Steinegger, Debsindhu Bhowmik, and Burkhard Rost. ProtTrans: towards cracking the language of life's code through self-supervised deep learning and high performance computing. CoRR, 2020. arXiv:2007.06225.

9. Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. CoRR, 2019. arXiv:1910.10683.

10. Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer. BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. CoRR, 2019. arXiv:1910.13461.

11. Ross Irwin, Spyridon Dimitriadis, Jiazhen He, and Esben Jannik Bjerrum. Chemformer: a pre-trained transformer for computational chemistry. Mach. Learn.: Sci. Technol., 3(1):015022, January 2022. doi:10.1088/2632-2153/ac3ffb.

12. Teague Sterling and John J. Irwin. ZINC 15 – ligand discovery for everyone. J. Chem. Inf. Model., 55(11):2324–2337, November 2015. doi:10.1021/acs.jcim.5b00559.