Disease carrier frequencies – how to estimate them?

When a carrier screening panel is designed, an important factor in deciding which diseases to include is their carrier frequency. Scientific societies recommend screening, in patients as well as in donors, only for those diseases that are more prevalent in the population and therefore carry a higher risk of affected offspring if no genetic analysis is performed. But how are these frequencies estimated?

To begin with, the carrier frequency of a disease is defined as the proportion of individuals in the general population who carry pathogenic or likely pathogenic variants in a gene or genes that can cause the disease. Two approaches can be used to calculate it, which yield the “estimated carrier frequency” and the “observed carrier frequency”, respectively.

To obtain the estimated carrier frequency in the general population, a calculation is made from the number of affected individuals in the population, based on the Hardy-Weinberg law (H-W law). This law uses p and q to denote the frequency of the normal allele and of the mutated allele, respectively, so that p + q = 1. In the case of recessive diseases, the frequency of affected individuals is q², the frequency of healthy carriers is 2pq, and the frequency of healthy individuals with two normal alleles is p². The H-W law also states that these three terms always sum to 1: in other words, p² + 2pq + q² = 1. Therefore, if the frequency of affected individuals with a recessive disease in the general population is known, q can be obtained as its square root, p as 1 - q, and the frequency of healthy carriers in the general population calculated as 2pq. However, it should be noted that this law has limitations, since it assumes, among other things, no natural selection, no de novo mutations and no migration. In addition to these implicit limitations of the H-W law, many factors can influence our calculation of the estimated carrier frequency, such as:

  • More than one gene causing a disease. Some diseases can result from alterations in different genes that produce a very similar phenotype. Patients may then go without a genetic diagnosis, and the count of affected individuals in the general population becomes inaccurate because cases are not assigned to the correct gene.
  • Diseases with a mild form. If a disease can vary in severity, some affected individuals may never be diagnosed. The total number of affected individuals in the population is then underestimated and, as a result, the estimated carrier frequency is also lower than the real one.
  • Diseases with a very severe course. Some of those affected die prematurely, for example through miscarriage in the early stages of pregnancy, and are therefore never counted as affected, which again lowers the estimated carrier frequency.
  • Lack of general databases that allow us to make an exact count of affected people. In many cases, patients are correctly diagnosed by their physician, but in the absence of a global registry it is not possible to obtain an exact number of affected people.
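
The Hardy-Weinberg calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name, the incidence of 1 in 2,500 and the assumption that half of the affected individuals go undiagnosed are hypothetical values chosen to show the arithmetic, not figures for any real disease.

```python
import math


def estimated_carrier_frequency(incidence):
    """Hardy-Weinberg estimate of the carrier frequency (2pq) of a recessive disease.

    `incidence` is the frequency of affected individuals in the population (q^2).
    """
    if not 0.0 <= incidence <= 1.0:
        raise ValueError("incidence must be between 0 and 1")
    q = math.sqrt(incidence)  # frequency of the mutated allele
    p = 1.0 - q               # frequency of the normal allele
    return 2.0 * p * q        # frequency of healthy carriers


# Hypothetical disease: 1 affected individual in every 2,500.
true_incidence = 1 / 2500
full_count = estimated_carrier_frequency(true_incidence)

# Hypothetical under-diagnosis: only half of the affected individuals are counted.
under_count = estimated_carrier_frequency(true_incidence * 0.5)

print(f"all affected counted:  {full_count:.4f}")
print(f"half of them counted:  {under_count:.4f}")
```

With these made-up numbers the estimate drops from roughly 0.039 to roughly 0.028, which shows how an incomplete count of affected individuals pulls the estimated carrier frequency below the real one.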

Another way of assessing carrier frequencies, which requires neither a reliable count of affected individuals in the general population nor the assumptions of the H-W law, is to perform genetic screening studies in a large number of individuals from the general healthy population. Each identified variant then has to be curated manually: using tools such as literature searches, functional studies and predictors, it must be decided which of the variants found are pathogenic or likely pathogenic (and can therefore lead to a disease). The proportion of individuals carrying such variants gives the observed carrier frequency. This method also has its limitations, such as:

  • Variants of uncertain significance. Most people carry DNA changes for which, with the information currently available, it is not possible to determine whether they can lead to disease (pathogenic) or not.
  • Group of individuals analyzed. If individuals are selected randomly, some populations may be underrepresented. If a disease is particularly prevalent in a population and few individuals from that population are included in the group analyzed, the observed frequency will be lower than the real frequency. Likewise, if a population with a high carrier frequency makes up a larger share of the group than it does of the general population, the observed carrier frequency will be higher than the real one.
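
The effect of the group composition can be illustrated with a small Python sketch. The two populations, their shares of the general population and the carrier counts below are entirely hypothetical, and the population-weighted estimate is shown only as one possible way to make the bias visible; it is not a method described in this article.

```python
def observed_carrier_frequency(carriers, screened):
    """Proportion of screened individuals found to be carriers."""
    if screened <= 0 or not 0 <= carriers <= screened:
        raise ValueError("need 0 <= carriers <= screened and screened > 0")
    return carriers / screened


def weighted_carrier_frequency(strata):
    """Carrier frequency weighted by each population's share of the general population.

    `strata` is a list of (population_share, carriers, screened) tuples;
    the shares are expected to sum to 1.
    """
    return sum(
        share * observed_carrier_frequency(carriers, screened)
        for share, carriers, screened in strata
    )


# Hypothetical cohort: population A is 20% of the general population but only
# 10% of the individuals screened; population B makes up the rest.
cohort = [
    (0.20, 5, 100),  # population A: 5 carriers among 100 screened (1 in 20)
    (0.80, 9, 900),  # population B: 9 carriers among 900 screened (1 in 100)
]

total_carriers = sum(carriers for _, carriers, _ in cohort)
total_screened = sum(screened for _, _, screened in cohort)

pooled = observed_carrier_frequency(total_carriers, total_screened)
weighted = weighted_carrier_frequency(cohort)

print(f"pooled observed frequency:   {pooled:.3f}")
print(f"population-weighted estimate: {weighted:.3f}")
```

Because population A, where the disease is more prevalent in this example, is underrepresented in the cohort, the pooled observed frequency (0.014) comes out lower than the weighted figure (0.018).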

In conclusion, each method has its advantages and disadvantages, so neither is better than the other. However, it is expected that as science and diagnostic methods improve, the limitations of the analysis of observed frequencies can be overcome, so that real carrier frequencies in the general population and in specific populations can be estimated more accurately.