4.5.2. Get sample data

To proceed, you will need to download the protein query sequence and database used in this exercise.

Download query sequence

The protein query sequence used in this exercise is the spike glycoprotein from severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). It is available from UniProtKB, the protein knowledge base.

The database identifier for this protein is P0DTC2. You can download the sequence in FASTA format from the entry page or using this direct link:
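
For a scripted download, the following is a minimal sketch in Python using only the standard library. It assumes the UniProt REST endpoint `https://rest.uniprot.org/uniprotkb/<accession>.fasta`, which is not stated in this section; the function names `fasta_url` and `download_fasta` are hypothetical helpers introduced for illustration.

```python
from pathlib import Path
from urllib.request import urlopen

# Assumption: UniProt serves single entries in FASTA format from this REST base URL.
UNIPROT_REST = "https://rest.uniprot.org/uniprotkb"


def fasta_url(accession: str) -> str:
    """Build the (assumed) FASTA download URL for a UniProtKB accession."""
    return f"{UNIPROT_REST}/{accession}.fasta"


def download_fasta(accession: str, dest_dir: str = ".") -> Path:
    """Download one entry and save it as <accession>.fasta in dest_dir."""
    dest = Path(dest_dir) / f"{accession}.fasta"
    with urlopen(fasta_url(accession), timeout=30) as response:
        data = response.read()
    # A FASTA record starts with a '>' header line; anything else is unexpected.
    if not data.startswith(b">"):
        raise ValueError(f"Response for {accession} does not look like FASTA")
    dest.write_bytes(data)
    return dest
```

Calling `download_fasta("P0DTC2")` would save the query sequence as `P0DTC2.fasta` in the current directory, provided the endpoint is reachable.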


Download protein database

The database used in this exercise is UniProtKB/Swiss-Prot, a manually annotated database of protein sequences with added functional information.

You can download the entire database as a compressed FASTA file from the downloads page on the UniProt website.
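
The database download can also be scripted and then sanity-checked by counting the sequences in the compressed file. The sketch below is one possible approach, not the method prescribed by this exercise. It assumes the file is published at the UniProt FTP location shown in `SPROT_URL`; that URL and the helper names `download_file` and `count_fasta_records` are assumptions introduced here.

```python
import gzip
import shutil
from urllib.request import urlopen

# Assumption: current Swiss-Prot release as gzip-compressed FASTA.
SPROT_URL = (
    "https://ftp.uniprot.org/pub/databases/uniprot/current_release/"
    "knowledgebase/complete/uniprot_sprot.fasta.gz"
)


def download_file(url: str, dest: str) -> str:
    """Stream a remote file to disk without loading it all into memory."""
    with urlopen(url, timeout=60) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


def count_fasta_records(gz_path: str) -> int:
    """Count sequences in a gzip-compressed FASTA file (one '>' header each)."""
    with gzip.open(gz_path, "rt") as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

For example, `download_file(SPROT_URL, "uniprot_sprot.fasta.gz")` followed by `count_fasta_records("uniprot_sprot.fasta.gz")` would report how many sequences were downloaded. The file is read in its compressed form, so it does not need to be unpacked for this check.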