4.5.2. Get sample data
To proceed, download the protein query sequence and the protein database used in this exercise.
Download query sequence
The protein query sequence used in this exercise is the spike glycoprotein from severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). It is available from UniProtKB, the UniProt protein knowledgebase.
The database identifier for this protein is P0DTC2. You can download the sequence in FASTA format from the entry page or using this direct link:
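For a scripted download, one option is a short Python sketch like the one below. It assumes that the UniProt REST service serves FASTA records at `https://rest.uniprot.org/uniprotkb/<accession>.fasta`; that URL pattern, the function names, and the output file name are assumptions for illustration and are not taken from this exercise.

```python
from urllib.request import urlopen

# Assumed URL pattern for the UniProt REST service (not given in this exercise).
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def fasta_url(accession):
    """Build the assumed download URL for one UniProtKB accession."""
    return UNIPROT_FASTA_URL.format(accession=accession)


def download_fasta(accession, out_path):
    """Fetch one FASTA record and save it to out_path (needs network access)."""
    with urlopen(fasta_url(accession), timeout=30) as response:
        data = response.read()
    with open(out_path, "wb") as handle:
        handle.write(data)
    return out_path


# Example call (hypothetical output file name):
# download_fasta("P0DTC2", "P0DTC2.fasta")
```

If the request fails, downloading the FASTA file manually from the entry page gives the same result.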
Download protein database
The database used in this exercise is UniProtKB/Swiss-Prot, a manually annotated and reviewed database of protein sequences with functional information.
You can download the entire database as a compressed FASTA file from the downloads page on the UniProt website.
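After downloading, it can be useful to check that the compressed file is intact before using it. The sketch below reads a gzip-compressed FASTA file without unpacking it and counts the records. It assumes the download is gzip-compressed and saved as `uniprot_sprot.fasta.gz`; both the file name and the helper names are assumptions for illustration.

```python
import gzip


def iter_fasta(handle):
    """Yield (header, sequence) pairs from an open text-mode FASTA handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def count_records(gz_path):
    """Count FASTA records in a gzip-compressed file, reading it as a stream."""
    with gzip.open(gz_path, "rt") as handle:
        return sum(1 for _ in iter_fasta(handle))


# Example call (assumed file name):
# print(count_records("uniprot_sprot.fasta.gz"))
```

Streaming the file keeps memory use low, since only one record is held at a time.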