
SARS-CoV-2 – Part I

evaluating variant data for better patient care

May 3rd 2020

The COVID-19 pandemic is most probably one of the most singular events that anyone has witnessed at this global scale. It’s the one time we can truly say nature has won over man simply by exerting its presence. The very tools of globalization, modern medicine, information technology, and policy have not prepared the world to take on a pandemic of this size. This might be the new normal for many years to come.

Over the last few weeks I took the opportunity to review as much literature on SARS-CoV-2 as I could find, along with doing some preliminary analysis of the sequences submitted. I have included references for everything used to form my opinions, so things can be cross-checked. By reviewing data from published literature and analyzing sequences, this document aims to work out what can be developed to help us tackle this unknown pandemic enemy. I give a general overview of the SARS-CoV-2 mode of transmission, infection, and replication before diving into the analysis. The idea of the analysis is not only to show correlation with the biological aspects reported in various studies but also to create data points against which sequences submitted after April 17th, 2020 can be compared to evaluate any further changes. The initial process was more of a discovery, but going forward all new sequences can be evaluated against the criteria drawn up here in a few hours. Furthermore, linking clinical data to variants in future should help frontline healthcare to triage and manage resources based on the variant matrix that evolves. A future endeavor would be to link management with the variant type using machine learning algorithms.

General Information – the foundation:

Mode of transmission:

Transmission is usually by droplet infection or contact with fomites. Once the virus is ingested or enters through the nose or mouth, it tends to stay in the upper respiratory tract. The virus has a basic reproduction number of R0 = 2.5[11]. This is comparatively higher than that of many other circulating viruses, and in the absence of herd immunity or vaccines some studies estimate that it transmits at even higher rates.

How does the coronavirus choose the human cell it wants to harm?

The virus initiates a process called the infectious cycle. A human cell may be susceptible, permissive, or both to the virus. The general steps are:

Attachment --> Penetration --> Uncoating --> Biosynthesis (Replicates) --> Maturation --> Release

As we know, the virus enters the cell through its spikes, which are made of the S glycoprotein. The gene responsible is the S gene (bp position 21563..25384). Its structure is similar to that of the human SARS virus from 2002. Once the virus has found a susceptible cell (a cell that is receptive to infection, in this case one carrying the ACE-2 receptor), it enters the host’s cell and replicates itself using the host (human) cell machinery.

What happens once the virus enters the human cell?

Once the virus enters the human cell, it releases its RNA into the cytoplasm. The viral RNA, which is positive-sense, serves as mRNA and initiates viral protein translation. The viral mRNA has enough genetic information to encode many proteins. These are the precursor viral proteins created to initiate the process of replication. The RdRP (RNA-dependent RNA polymerase) is an important protein which generates RNA from an RNA template; this especially holds true for the coronaviruses. The ORF1ab area, which runs from base 266 all the way to base 21,555, encompasses the RdRP (join position 13442..13468, 13468..16236) within the frame. The open reading frame also encompasses various polyproteins that are used in the replication process.

How does the virus replicate?

Once in the cytoplasm the viral RNA gives rise to mRNA, which is used to synthesize various protein complexes by multiple mechanisms, namely IRES-mediated and cap-dependent translation[9][7]. I will not cover the process itself but will instead talk about how the process is initiated. At the 5’ end, translation is initiated by a group of proteins called eIFs[8] (eukaryotic initiation factors). The multiple eukaryotic initiation factors, along with enzymes and other protein complexes, form the complexes which initiate replication of the virus using the host cell. The most important thing to understand is that coronaviruses, because of the size of their genome (about 30,000 bases on average), have a built-in proof-reading mechanism that reduces replication error rates and in turn slows mutation[15]. Another method by which coronaviruses are known to propagate is recombination. Through recombination they can also cross species, which might explain how these viruses infect various hosts.

A positive takeaway is that 1. the mutation rate of the coronavirus is slower than that of other viruses, and 2. indirectly this is good, as it gives vaccine development a good chance.

How do we know so much about the virus and why don’t we have any drugs or vaccines?

A lot of the information on SARS-CoV-2 came from representative information on SARS, which was a global issue between 2002 and 2003. To put it in context, SARS (and most probably SARS-CoV-2) mutates at a slower rate than HIV. Influenza’s mutation rate[10] is much higher than that of both coronavirus and HIV. We have the flu vaccine, but why don’t we have a vaccine[4] for the common-cold-causing coronavirus strains such as OC43 and HKU1?

If it is not possible to develop vaccines, then maybe it is possible to develop a drug. Let’s set aside repurposed drugs; there are over 400 clinical trials under way for those at the moment. If we know the mechanism of SARS and of coronaviruses in general, surely we should be able to arrive at a usable drug. Targeting replication, especially the eIF4 initiation complex[13], would theoretically prevent multiplication. Following this approach, researchers have already found a few molecules, and even registered patents for them, to treat coronavirus. One particular patent from 2016 (https://patents.google.com/patent/EP3305290A1/en) specifically names silvestrol[6] / episilvestrol as a molecule to treat coronaviruses that are structurally very similar to SARS-CoV-2. Beyond silvestrol as a possible silver bullet, other molecules such as hippuristanol[19], pateamine A, or amidino-rocaglates[12] could be looked at as alternatives based on studies. A recent study evaluates blocking the spike protein[2] using the fusion inhibitor EK1C4, which could allow it to be used across all human betacoronaviruses. Yes, clinical trials would have to be done, but considering 3 major outbreaks from this group of viruses (SARS, MERS, and now SARS-CoV-2), would it not be in the interest of mankind to do this?

Analysis:

Tools used:

The reference NC_045512.2

Accessions uploaded to NCBI up to the 17th of April 2020. All the accessions uploaded are SARS-CoV-2.

Bioinformatic tools:

    Blast – NCBI – sequence extraction and database formation
    Samtools – indexing reference, sequences and alignments
    Minimap2 – create the aligned bam against the reference genome
    Mpileup (bcftools) – to draw up genotypes (considering the consensus)
    Call (bcftools) – to call the variants
    IGV – to view the variants

The analysis was done against the reference virus from NCBI, published under accession number NC_045512. As my objective is neither epidemiological nor to evaluate whether the virus was man-made, I focused on SARS-CoV-2 accessions only, and of those only the accessions representing complete genomes were analyzed. To date the evaluation has been carried out on 824 accessions; the list, along with the variant type found, is available on the virfast.com/covid19 page.
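The tool chain listed above can be sketched as a sequence of commands. This is a minimal sketch rather than the exact commands used for this analysis: the file names, the `asm5` preset for minimap2, and the option choices are assumptions, and the helper `build_pipeline` is hypothetical. It only assembles and prints the commands; nothing is executed.

```python
import shlex
from typing import List


def build_pipeline(reference: str, genomes: str, prefix: str) -> List[List[str]]:
    """Assemble one possible align-and-call command sequence (assumed options)."""
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    bcf = f"{prefix}.bcf"
    vcf = f"{prefix}.vcf.gz"
    return [
        # Index the reference FASTA.
        ["samtools", "faidx", reference],
        # Align the complete genomes to the reference, writing SAM output.
        ["minimap2", "-a", "-x", "asm5", "-o", sam, reference, genomes],
        # Sort and index the alignments.
        ["samtools", "sort", "-o", bam, sam],
        ["samtools", "index", bam],
        # Pile up the aligned bases against the reference.
        ["bcftools", "mpileup", "-f", reference, "-Ou", "-o", bcf, bam],
        # Call variant sites only, using the multiallelic caller.
        ["bcftools", "call", "-mv", "-Oz", "-o", vcf, bcf],
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for cmd in build_pipeline("NC_045512.2.fasta", "genomes.fasta", "sarscov2"):
        print(shlex.join(cmd))
```

Keeping the commands as argument lists makes it easy to hand each one to `subprocess.run` once the file names and options have been checked against the installed tool versions.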

Criteria applied:

    11 variant positions were noted, and mutations with a frequency above 20% were considered.
    Looking at the codons and amino acids, the mutation found at position 241 in the 5’ UTR was dropped from the list as its impact was outside any major coding region.
    Only the genes referenced as per their CDS (nucleotide) positions were considered.
    All the considered mutations following the consensus were SNPs (single nucleotide polymorphisms), and those with a frequency in the consensus of more than 20% were reviewed.
    Some sequences had deletions and insertions alongside the reported variants. These have been documented here; however, they represented only 1.3% of the samples. Any observation based on the patterns seen needs to be linked with clinical data.
    Other samples in the consensus which could not be categorized as synonymous or non-synonymous were removed from the analysis. This was a group of 5 samples, comprising 0.6% of the data set.

Consistent mutations were found in 3 open reading frame areas and also in the ‘S’ gene area. The areas are listed below:

    ORF1ab
    ORF3a
    ORF8
    ‘S’ gene
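The 20% frequency criterion can be illustrated with a small sketch. It assumes that "frequency" means the share of analyzed genomes carrying the alternate base at a position; the function name, the input layout, and the sample names are hypothetical.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Snp = Tuple[int, str, str]  # (position, reference base, alternate base)


def frequent_snps(sample_snps: Dict[str, Set[Snp]],
                  threshold: float = 0.20) -> Dict[Snp, float]:
    """Return the SNPs whose share of samples is strictly above the threshold."""
    n_samples = len(sample_snps)
    counts = Counter(snp for snps in sample_snps.values() for snp in snps)
    return {snp: count / n_samples
            for snp, count in counts.items()
            if count / n_samples > threshold}


if __name__ == "__main__":
    # Five made-up samples; the sample names are placeholders.
    toy = {
        "sample_a": {(23403, "A", "G"), (3037, "C", "T")},
        "sample_b": {(23403, "A", "G"), (3037, "C", "T")},
        "sample_c": {(23403, "A", "G")},
        "sample_d": {(100, "C", "T")},
        "sample_e": set(),
    }
    for snp, freq in sorted(frequent_snps(toy).items()):
        print(snp, f"{freq:.0%}")
```

In the toy data the change at position 100 occurs in exactly 20% of the samples and is therefore dropped, since the criterion is "above 20%".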
The mutation in the ORF1ab region where the RdRP protein complex is encoded was considered separately. Other changes from the reference seen in ORF1ab were also considered. As per the paper[20], the SARS-CoV-2 virus has been categorized into 2 types, namely the L (leucine) type and the S (serine) type. Based on this L type and S type classification, the variants were characterized further by position and type. Note that the open reading frame ORF1ab covers the variants in the RdRP protein area. Below are the variants and related mutations:

ORF1ab Mutations:

Number  Mutation  Position  Mutation Type
1       C -> T    241       Not considered
2       C -> T    1,059     Non-Synonymous
3       C -> T    3,037     Synonymous
4       C -> T    8,782     Synonymous
5       C -> T    14,408    Non-Synonymous
6       C -> T    17,747    Non-Synonymous
7       A -> G    17,858    Non-Synonymous
8       C -> T    18,060    Non-Synonymous

The RdRP protein site is represented by the nucleotides [13442..13468,13468..16236] in ORF1ab. Multiple nsp (non-structural protein) complexes are encoded throughout the ORF1ab gene region, and these contribute to the translation complex.
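Mapping a variant position to the region it falls in can be sketched as a simple interval lookup. The ORF1ab, S, and RdRP coordinates are the ones quoted in this article; the ORF3a and ORF8 coordinates are not stated here and are assumed from the NC_045512.2 GenBank record. The function name `annotate` is hypothetical.

```python
from typing import Dict, List, Tuple

# 1-based, inclusive nucleotide ranges on the reference genome.
REGIONS: Dict[str, Tuple[int, int]] = {
    "ORF1ab": (266, 21555),
    "RdRP": (13442, 16236),   # lies within ORF1ab
    "S": (21563, 25384),
    "ORF3a": (25393, 26220),  # assumed from the GenBank record
    "ORF8": (27894, 28259),   # assumed from the GenBank record
}


def annotate(position: int) -> List[str]:
    """Return every region whose range contains the given position."""
    return [name for name, (start, end) in REGIONS.items()
            if start <= position <= end]


if __name__ == "__main__":
    for pos in (241, 1059, 14408, 23403, 25563, 28144):
        print(pos, annotate(pos) or ["outside the listed regions"])
```

Position 241 falls outside all listed regions, which matches its exclusion from the analysis, while 14408 is reported under both ORF1ab and RdRP.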

RdRP mutation position: 14,408

'S' Gene mutation:

Number Mutation Position Mutation Type
1 A -> G 23,403 Non-Synonymous
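How a SNP is labelled synonymous or non-synonymous can be sketched with the standard genetic code. The example below assumes the reference codon at the S gene position is GAT (the widely reported D614G change) and uses the S gene start of 21563; the function names are hypothetical, and this is an illustration rather than the procedure used to build the tables here.

```python
from typing import Tuple

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_location(position: int, cds_start: int) -> Tuple[int, int]:
    """Return (1-based codon number, 0-based index within the codon)."""
    offset = position - cds_start
    return offset // 3 + 1, offset % 3


def classify_snp(ref_codon: str, index_in_codon: int, alt_base: str) -> Tuple[str, str, str]:
    """Return (mutation type, reference amino acid, altered amino acid)."""
    alt_codon = ref_codon[:index_in_codon] + alt_base + ref_codon[index_in_codon + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return kind, ref_aa, alt_aa


if __name__ == "__main__":
    codon_number, index = codon_location(23403, 21563)
    print(codon_number, index)
    print(classify_snp("GAT", index, "G"))  # GAT is an assumed reference codon
```

The simple offset arithmetic only holds for a CDS without frameshifts. ORF1ab is annotated as a join at position 13468, so positions beyond that point in ORF1ab would need the join taken into account.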


The ‘S’ gene is responsible for production of the S ‘spike’ glycoprotein. As per studies[5], the S glycoprotein has two subunits, S1 and S2, with a furin cleavage site at the boundary between them; the protein attaches to ACE2 receptors on cells in the respiratory tract and so gains access to the host system. A similar mechanism is used by other betacoronaviruses, including SARS-CoV, which caused the original outbreak in 2002.


[Figure: S gene mutation at position 23,403]

ORF3a mutation:


Number  Mutation  Position  Mutation Type
1       G -> T    25,563    Non-Synonymous


The ORF3a protein has been found to play a contributory role in the UPR (unfolded protein response) through the ER (endoplasmic reticulum), hence also contributing to the protein complexes for translation[17]. This region is also involved in activating the inflammasome[18], which creates a pro-inflammatory reaction through an interleukin response. Could this possibly explain the cytokine storm that has been seen in multiple clinical cases[3]? Furthermore, since clinical meta-analysis studies report that IL-1 and TNF are activated, causing insulin resistance, a link could exist between a mutation here and the co-morbidity seen with diabetes.



[Figure: ORF3a mutation at position 25,563]


ORF8 mutation:

Number  Mutation  Position  Mutation Type
1       T -> C    28,144    Non-Synonymous


The significance of this mutation can be linked to the previously mentioned paper[20], which defines the two types by the amino acid encoded at this position: leucine (T at 28,144, as in the reference) for the L type and serine (C at 28,144) for the S type. From an observational standpoint, the L type has been linked with a different set of clinical features which seem to be more aggressive. As per the paper, the L type is more widespread than the S type, at approximately 70% as opposed to 30%. In this study set the L type represented 63% of the samples analyzed.


No.  Variant    S / L type
1    Variant 1  L type
2    Variant 2  L type
3    Variant 3  L type
4    Variant 4  L type
5    Variant 5  S type
6    Variant 6  S type
7    Variant 7  S type
8    Variant 8  S type
9    Variant 9  L type
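The L / S typing can be sketched as a lookup on the base observed at position 28,144: T (the reference base, encoding leucine) gives the L type and C (encoding serine) gives the S type. The function names and the input format are hypothetical.

```python
from typing import Iterable, Optional


def ls_type(base_at_28144: str) -> Optional[str]:
    """Return 'L' for T, 'S' for C, and None for anything else (e.g. N)."""
    return {"T": "L", "C": "S"}.get(base_at_28144.upper())


def l_type_share(bases: Iterable[str]) -> float:
    """Share of typed samples that are L type; untyped samples are ignored."""
    types = [t for t in map(ls_type, bases) if t is not None]
    return types.count("L") / len(types)


if __name__ == "__main__":
    # Made-up bases at position 28,144 for five samples.
    observed = ["T", "T", "C", "T", "N"]
    print([ls_type(b) for b in observed])
    print(f"{l_type_share(observed):.0%} L type among typed samples")
```

Samples with an ambiguous base at this position are left untyped rather than forced into either group.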


Exclusions:


The following samples were excluded from being grouped into variants because of non-specific nucleic acid changes. These exclusions represent 0.6% of the data set.

    MT326065
    MT258381
    MN988713
    MT325585
    MT344944


Mutations:


Variants based on > 20% frequency noted for 824 samples:


Variant    1059  3037  8782  17747  17858  18060  14408  23403  28144  25563
Variant 1  C/C   C/C   C/C   C/C    A/A    C/C    C/C    A/A    T/T    G/G
Variant 2  C/C   C/T   C/C   C/C    A/A    C/C    C/T    A/G    T/T    G/G
Variant 3  C/C   C/T   C/C   C/C    A/A    C/C    C/T    A/G    T/T    G/T
Variant 4  C/T   C/T   C/C   C/C    A/A    C/C    C/T    A/G    T/T    G/T
Variant 5  C/C   C/C   C/T   C/C    A/A    C/C    C/C    A/A    T/C    G/G
Variant 6  C/C   C/C   C/T   C/T    A/G    C/T    C/C    A/A    T/C    G/G
Variant 7  C/C   C/C   C/T   C/C    A/A    C/T    C/C    A/A    T/C    G/G
Variant 8  C/C   C/C   C/T   C/T    A/G    C/C    C/C    A/A    T/C    G/G
Variant 9  C/C   C/C   C/C   C/C    A/A    C/C    C/T    A/G    T/T    G/T
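Assigning a new sequence to one of the nine variants can be sketched as a lookup on the set of tracked positions where it differs from the reference. The signatures below are read from the variant matrix above, taking the RdRP position as 14408; the function name `assign_variant` is hypothetical and the sketch ignores differences at positions outside the matrix.

```python
from typing import FrozenSet, Iterable, Optional

TRACKED: FrozenSet[int] = frozenset(
    {1059, 3037, 8782, 17747, 17858, 18060, 14408, 23403, 28144, 25563}
)

# Positions carrying the alternate base for each variant.
SIGNATURES = {
    frozenset(): "Variant 1",
    frozenset({3037, 14408, 23403}): "Variant 2",
    frozenset({3037, 14408, 23403, 25563}): "Variant 3",
    frozenset({1059, 3037, 14408, 23403, 25563}): "Variant 4",
    frozenset({8782, 28144}): "Variant 5",
    frozenset({8782, 17747, 17858, 18060, 28144}): "Variant 6",
    frozenset({8782, 18060, 28144}): "Variant 7",
    frozenset({8782, 17747, 17858, 28144}): "Variant 8",
    frozenset({14408, 23403, 25563}): "Variant 9",
}


def assign_variant(alt_positions: Iterable[int]) -> Optional[str]:
    """Return the matching variant name, or None if no signature matches."""
    key = frozenset(alt_positions) & TRACKED
    return SIGNATURES.get(key)


if __name__ == "__main__":
    print(assign_variant({241, 3037, 14408, 23403}))
    print(assign_variant({8782, 28144}))
    print(assign_variant({1059}))
```

A sequence whose combination of changes matches none of the signatures returns `None`, which flags it for manual review rather than forcing it into the nearest group.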





Apart from the mutations seen above, there were some noticeable and consistent deletions, and a couple of insertions, in the consensus sequence. Though these are listed below, it remains inconclusive whether these insertions and deletions correlate with pathogenicity.


List of Samples with Deletions and Insertions:


Number  Sample    Insertion                     Deletion
1       MT044258  -                             508-523, 686-695
2       MT345855  -                             686-695
3       MT326123  -                             686-695
4       MT326124  -                             510-519
5       MT334539  -                             669-672
6       MT188341  21,386-21,388, 29,835-29,837  29,845-29,852, 29,859-29,863, 29,867-29,871
7       MT263469  -                             29,867-29,869
8       MT263425  -                             29,867-29,869
9       MT263426  -                             29,867-29,869
10      MT246485  -                             29,867-29,869
11      MT163717  -                             29,867-29,869



Insertions and deletions were seen in roughly 1.3% of the 824 samples. Reviewing this data for consistency, in terms of read quality for some sequences, will be important. Furthermore, the specific impact of these deletions and insertions would have to be correlated with clinical data.
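The two percentages quoted in this analysis follow directly from the sample counts: 11 samples with insertions or deletions and 5 excluded samples, out of 824. A minimal check (the function name is hypothetical):

```python
TOTAL_SAMPLES = 824


def percent_of_samples(count: int, total: int = TOTAL_SAMPLES) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


if __name__ == "__main__":
    print(percent_of_samples(11))  # samples with insertions or deletions
    print(percent_of_samples(5))   # samples excluded from variant grouping
```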





Country Variant representation:


[Figures: variant representation for the USA, China, and other countries]


Clinical Impact:


As per this meta-analysis study[16], the common presenting symptoms, by prevalence, are:

    1. Fever – 91%
    2. Cough – 68%
    3. Fatigue – 51%
    4. Dyspnea (shortness of breath) – 30%


Further to the above symptomatology, clinicians have noticed a cytokine storm, which tends to cause the body to fight itself through an immunological response[3]. A recent notice from the PICS (Paediatric Intensive Care Society) describes a small number of children reporting symptoms which mimic toxic shock syndrome and atypical Kawasaki disease. In relation to the immune response and the cytokine storm, a study in China[1] shows that inflammatory cytokines were raised in both severe and non-severe forms of the disease. The meta-analysis study[16] looks at 1,576 patients in total and draws on the impact of infection along with co-morbidities. Common co-morbidities seen in the study are hypertension, diabetes, respiratory diseases (such as COPD), and other cardiovascular diseases. Another study[21] expands on the laboratory investigations, with CT scans for all patients showing patchy opacity or a ground-glass appearance. Other tests[14] that showed abnormal readings across the study population were procalcitonin, lactate dehydrogenase, interleukin-6, raised D-dimers, CRP (C-reactive protein), ESR (erythrocyte sedimentation rate), APTT (activated partial thromboplastin time), and a raised prothrombin time. Compared with non-ICU patients, ICU patients showed a comparative increase in white blood cell count, neutrophil count, D-dimers, and creatine kinase.




Conclusion:



Using the variant data and based on the literature, it’s clear that variations in the sequence of SARS-CoV-2 do have a direct effect on pathology in the host. Linking clinical features and ancillary investigations with species and variant identification would allow for efficient resource allocation in patient management and effective triage. The above information can also be used to develop treatment algorithms that are linked to the various variants. With more data, an efficient machine learning algorithm will add immense value to the information that has been generated to date.



Next update – analysis of SARS-CoV-2 sequences over the last 2 weeks quantifying observations in this article.


References:

    1. Qin, Chuan, et al. “Dysregulation of Immune Response in Patients with COVID-19 in Wuhan, China.” SSRN Electronic Journal, 2020, doi:10.2139/ssrn.3541136.
    2. Xia, Shuai, et al. “Inhibition of SARS-CoV-2 (Previously 2019-NCoV) Infection by a Highly Potent Pan-Coronavirus Fusion Inhibitor Targeting Its Spike Protein That Harbors a High Capacity to Mediate Membrane Fusion.” Cell Research, vol. 30, no. 4, 2020, pp. 343–355., doi:10.1038/s41422-020-0305-x.
    3. Goodman, Brenda. “Cytokine Storms May Be Fueling Some COVID Deaths.” WebMD, WebMD, 17 Apr. 2020, www.webmd.com/lung/news/20200417/cytokine-storms-may-be-fueling-some-covid-deaths.
    4. Graham, Rachel L., et al. “Evaluation of a Recombination-Resistant Coronavirus as a Broadly Applicable, Rapidly Implementable Vaccine Platform.” Communications Biology, vol. 1, no. 1, 2018, doi:10.1038/s42003-018-0175-7.
    5. Walls, Alexandra C., et al. “Structure, Function and Antigenicity of the SARS-CoV-2 Spike Glycoprotein.” Cell, no. 180, 16 Apr. 2020, pp. 281–292., doi:10.1101/2020.02.19.956581.
    6. Müller, Christin, et al. “Broad-Spectrum Antiviral Activity of the eIF4A Inhibitor Silvestrol against Corona- and Picornaviruses.” Antiviral Research, vol. 150, 2018, pp. 123–129., doi:10.1016/j.antiviral.2017.12.010.
    7. Nakagawa, K., et al. “Viral and Cellular MRNA Translation in Coronavirus-Infected Cells.” Coronaviruses Advances in Virus Research, 2016, pp. 165–192., doi:10.1016/bs.aivir.2016.08.001.
    8. Merrick, William C. “Cap-Dependent and Cap-Independent Translation in Eukaryotic Systems.” Gene, vol. 332, 2004, pp. 1–11., doi:10.1016/j.gene.2004.02.051.
    9. Richter, Joel D., and Nahum Sonenberg. “Regulation of Cap-Dependent Translation by eIF4E Inhibitory Proteins.” Nature, vol. 433, no. 7025, 2005, pp. 477–480., doi:10.1038/nature03205.
    10. Zhao, Zhongming, et al. “Moderate Mutation Rate in the SARS Coronavirus Genome and Its Implications.” BMC Evolutionary Biology, vol. 4, article 21, 28 June 2004, doi:10.1186/1471-2148-4-21.
    11. Lauer, Stephen A., et al. “The Incubation Period of Coronavirus Disease 2019 (COVID-19) From Publicly Reported Confirmed Cases: Estimation and Application.” Annals of Internal Medicine, 2020, doi:10.7326/m20-0504.
    12. Chu, Jennifer, et al. “Amidino-Rocaglates: A Potent Class of eIF4A Inhibitors.” Cell Chemical Biology, vol. 26, no. 11, 2019, doi:10.1016/j.chembiol.2019.08.008.
    13. Cencic, R., et al. “Blocking eIF4E-eIF4G Interaction as a Strategy To Impair Coronavirus Replication.” Journal of Virology, vol. 85, no. 13, 2011, pp. 6381–6389., doi:10.1128/jvi.00078-11.
    14. Wu, Chaomin, et al. “Risk Factors Associated With Acute Respiratory Distress Syndrome and Death in Patients With Coronavirus Disease 2019 Pneumonia in Wuhan, China.” JAMA Internal Medicine, 2020, doi:10.1001/jamainternmed.2020.0994.
    15. Peck, Kayla M., and Adam S. Lauring. “Complexities of Viral Mutation Rates.” Journal of Virology, vol. 92, no. 14, 2018, doi:10.1128/jvi.01031-17.
    16. Yang, Jing, et al. “Prevalence of Comorbidities and Its Effects in Coronavirus Disease 2019 Patients: A Systematic Review and Meta-Analysis.” International Journal of Infectious Diseases, vol. 94, 2020, pp. 91–95., doi:10.1016/j.ijid.2020.03.017.
    17. Minakshi, Rinki, et al. “The SARS Coronavirus 3a Protein Causes Endoplasmic Reticulum Stress and Induces Ligand-Independent Downregulation of the Type 1 Interferon Receptor.” PLoS ONE, vol. 4, no. 12, 2009, doi:10.1371/journal.pone.0008342.
    18. Siu, Kam‐Leung, et al. “Severe Acute Respiratory Syndrome Coronavirus ORF3a Protein Activates the NLRP3 Inflammasome by Promoting TRAF3‐Dependent Ubiquitination of ASC.” The FASEB Journal, vol. 33, no. 8, 2019, pp. 8865–8877., doi:10.1096/fj.201802418r.
    19. Cencic, Regina, and Jerry Pelletier. “Hippuristanol - A Potent Steroid Inhibitor of Eukaryotic Initiation Factor 4A.” Translation, vol. 4, no. 1, 2016, doi:10.1080/21690731.2015.1137381.
    20. Tang, Xiaolu, et al. “On the Origin and Continuing Evolution of SARS-CoV-2.” National Science Review, 2020, doi:10.1093/nsr/nwaa036.
    21. Huang, Chaolin, et al. “Clinical Features of Patients Infected with 2019 Novel Coronavirus in Wuhan, China.” The Lancet, vol. 395, no. 10223, 2020, pp. 497–506., doi:10.1016/s0140-6736(20)30183-5.