2 A

2.1 Alignments

From Sharber, W. Introduction to Sequence Alignments with Biopython. Towards Data Science. Medium. https://towardsdatascience.com/introduction-to-sequence-alignments-with-biopython-f3b6375095db

“When working with biological sequence data, either DNA, RNA, or protein, biologists often want to be able to compare one sequence to another in order to make some inferences about the function or evolution of the sequences. Just like you wouldn’t want to use data from data tables where data was in the wrong column for analyses, in order to make robust inferences from sequence data, we need to make sure our sequence data is well organized or “aligned.” Unfortunately, sequence data does not come with nice labels, like a date, miles per gallon, or horsepower. Instead, all we have is the position number in the sequence, and that is relative to that sequence only. Luckily, many sequences are highly conserved or similar between related organisms (and all organisms are related to some degree!). If we’re fairly certain that we’ve obtained data from the same sequence from multiple organisms, we can put that data into a matrix that we call an alignment. If you’re only comparing two sequences, it’s called a pairwise alignment. If you’re comparing three or more sequences, it’s called a multiple sequence alignment (MSA).
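The quoted article is about Biopython, which ships its own aligners (for example `Bio.Align.PairwiseAligner`). To show what a pairwise aligner does internally, here is a minimal plain-Python sketch of classic Needleman-Wunsch global alignment. It fills a score matrix and traces back through it to place gaps. The scoring scheme (match +1, mismatch -1, gap -1) is an illustrative assumption, not something taken from the source.

```python
# Sketch of Needleman-Wunsch global alignment in plain Python.
# ASSUMPTION: the scores (match=+1, mismatch=-1, gap=-1) are illustrative only.


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Return (aligned_a, aligned_b, score) for one optimal global alignment."""
    n, m = len(a), len(b)

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned against nothing but gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # match or mismatch
                score[i - 1][j] + gap,  # gap placed in b
                score[i][j - 1] + gap,  # gap placed in a
            )

    # Trace back from the bottom-right cell to rebuild one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), score[n][m]


if __name__ == "__main__":
    aligned_a, aligned_b, best = global_align("GATTACA", "GATACA")
    print(aligned_a)
    print(aligned_b)
    print("score:", best)
```

For the toy pair above, the sketch prints `GATTACA` over `GA-TACA` with a score of 5: six matching columns and one gap that accounts for the missing T. When several placements of the gap score equally, the traceback order (diagonal first, then up, then left) decides which one is reported.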

“Using the positions and the identity of each molecule in the sequence, we can infer the relative placement of each molecule in the matrix. Sometimes there will be differences in the sequence, for example, in a position where most sequences are C, we find a sequence with a G. This is referred to as a single nucleotide polymorphism (SNP). In other times, we find that a sequence is missing a molecule that is present in the rest, or a sequence has an extra molecule. The former is a deletion, while the latter is an insertion, together referred to as “indels.” When aligning sequences with indels, we must account for these extra or missing molecules by adding gaps to the remaining sequences. These small differences are usually the interesting parts of sequence data because the variation is how we can make inferences on the function or evolution of the sequence…”
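The differences described above can be read directly off a multiple sequence alignment once it is stored as equal-length rows with `-` for gaps. The sketch below compares every sequence to the most common character in each column. This majority rule is a simplifying assumption: deciding whether a gap really reflects an insertion or a deletion normally requires a reference or ancestral sequence. The toy alignment is hypothetical.

```python
from collections import Counter

# ASSUMPTION: each column's majority character stands in for the "expected"
# state; real analyses usually compare against a reference sequence instead.


def classify_columns(msa: list[str]) -> list[tuple[int, int, str]]:
    """Return (column, sequence_index, kind) for every departure from the column majority."""
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")

    events = []
    for col in range(length):
        column = [seq[col] for seq in msa]
        majority, _ = Counter(column).most_common(1)[0]
        for idx, ch in enumerate(column):
            if ch == majority:
                continue
            if majority == "-":
                kind = "insertion"     # this sequence has a base most others lack
            elif ch == "-":
                kind = "deletion"      # this sequence lacks a base most others have
            else:
                kind = "substitution"  # e.g. a G where most sequences have a C (a SNP)
            events.append((col, idx, kind))
    return events


if __name__ == "__main__":
    toy_msa = [
        "ATC-GA",
        "ATC-GA",
        "ATG-GA",
        "A-CTGA",
    ]
    for event in classify_columns(toy_msa):
        print(event)
```

On the toy alignment this reports a deletion in sequence 3 at column 1, a C-to-G substitution in sequence 2 at column 2, and an insertion in sequence 3 at column 3. Column 3 is the case described in the quote: the gaps were added to the other three sequences to make room for the extra T.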

2.2 Talking Glossary: Allele (0.5 min)

Note: This glossary definition focuses on the traditional, gene-centered meaning of an allele. More broadly, an allele is any genetic variant at a locus, whether it lies in coding or non-coding DNA, whether it affects the organism's phenotype or is neutral, and whether it spans a single base or many. The last sentence of the abstract alludes to this broader usage.

Abstract: “An allele is one of two or more versions of a gene [or, more generally, a locus]. An individual inherits two alleles for each gene [locus], one from each parent. If the two alleles are the same, the individual is homozygous for that gene [locus]. If the alleles are different, the individual is heterozygous. Though the term allele was originally used to describe variation among genes, it now also refers to variation among non-coding DNA sequences [emphasis added].”
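The inheritance and zygosity rules in the abstract are simple enough to state as code. The sketch below assumes a single autosomal locus and simple Mendelian transmission, where each parent passes on either of its two alleles with equal chance. The allele labels `A` and `a` are hypothetical. It lists every combination of one allele from each parent (a Punnett-square style enumeration) and labels each genotype homozygous or heterozygous.

```python
from itertools import product

# ASSUMPTION: one autosomal locus, each parent passes on either of its two
# alleles with equal probability; allele labels "A" and "a" are hypothetical.

Genotype = tuple[str, str]


def zygosity(genotype: Genotype) -> str:
    """Same two alleles -> homozygous; different alleles -> heterozygous."""
    first, second = genotype
    return "homozygous" if first == second else "heterozygous"


def offspring_genotypes(parent1: Genotype, parent2: Genotype) -> list[Genotype]:
    """Every combination of one allele from parent1 with one allele from parent2."""
    return list(product(parent1, parent2))


if __name__ == "__main__":
    for child in offspring_genotypes(("A", "a"), ("A", "a")):
        print(child, zygosity(child))
```

For two heterozygous parents, the four equally likely combinations are `("A", "A")`, `("A", "a")`, `("a", "A")`, and `("a", "a")`. Two are homozygous and two are heterozygous. The pair is ordered as (allele from parent1, allele from parent2), so the two heterozygous entries differ only in which parent contributed which allele.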

Audio: https://www.genome.gov/sites/default/files/tg/en/narration/allele.mp3

Note: Any time they say “gene,” think “locus.”

Transcript: “‘Allele’ is the word that we use to describe the alternative form or versions of a gene. People inherit one allele for each autosomal gene [locus] from each parent, and we tend to lump the alleles into categories. Typically, we call them either normal or wild-type alleles, or abnormal, or mutant alleles.”

Leslie G. Biesecker, M.D.

Image source: https://rarediseases.info.nih.gov/files/glossary/english/allele_sm.jpg