Title: Evidence for ZAP-independent CpG reduction in SARS-CoV-2 genome, and pangolin coronavirus origin of 5’UTR
Abstract: SARS-CoV-2, the causative agent of COVID-19, has an RNA genome that is, overall, closely related to the bat coronavirus sequence RaTG13. However, the ACE2-binding domain of this virus is more similar to that of a coronavirus isolated from pangolins. In addition to this unique feature, the genome of SARS-CoV-2 (and of closely related coronaviruses) has a low CpG content. This has been postulated to be a signature of evolutionary pressure exerted by the host antiviral protein ZAP. Here, we analyzed the sequences of a wide range of viruses using both alignment-based and alignment-free approaches to investigate the origin of the SARS-CoV-2 genome. Our analyses revealed a high level of similarity between the 5’UTR of SARS-CoV-2 and that of a Guangdong pangolin coronavirus. These data suggest that not only the ACE2-binding domain, but also the 5’UTR of SARS-CoV-2 likely has a pangolin coronavirus origin. Additionally, we performed a detailed analysis of viral genome compositions, as well as expression and RNA-binding data for ZAP, to show that the low CpG abundance in SARS-CoV-2 is not related to evolutionary pressure from ZAP.
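As an illustration of what "low CpG content" can mean quantitatively, the sketch below computes the commonly used observed/expected CpG ratio, i.e. the CG dinucleotide frequency divided by the product of the C and G mononucleotide frequencies. This is an assumed metric chosen for illustration; the abstract does not state which measure the study used, and the function name `cpg_obs_exp` is hypothetical.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: f(CG) / (f(C) * f(G)).

    Assumption: this is a generic composition metric, not necessarily
    the one used in the study. RNA input (U) is treated as DNA (T).
    """
    s = seq.upper().replace("U", "T")
    n = len(s)
    c = s.count("C")
    g = s.count("G")
    # Ratio is undefined without both C and G; return 0.0 in that case.
    if n < 2 or c == 0 or g == 0:
        return 0.0
    cpg = s.count("CG")              # "CG" cannot overlap itself
    f_cpg = cpg / (n - 1)            # dinucleotide frequency
    return f_cpg / ((c / n) * (g / n))


if __name__ == "__main__":
    # Toy sequences only; real use would read a genome from a FASTA file.
    for name, seq in [("toy_high", "ACGCGCGT"), ("toy_low", "ACAGTCTGAC")]:
        print(name, round(cpg_obs_exp(seq), 3))
```

A ratio well below 1 indicates that CpG dinucleotides occur less often than the C and G base frequencies alone would predict, which is the sense in which a genome is described as CpG-depleted.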