Although all clinical MS subtypes were included, most had a relapsing-remitting (RR) onset. All participating centers used identical inclusion and diagnostic criteria. Control subjects were matched with cases by age and gender. The Committee on Human Research at each of the participating centers approved the protocol, and informed consent was obtained from each study participant.
The genotyping and quality control methods utilized for the analysis of this cohort have been previously described in detail. This analysis resulted in genotype information about , SNPs in cases and controls. Although several of the MS-related genetic loci were screened preliminarily (including an intergenic region), two loci were selected for detailed analysis, both of which had been previously associated with MS in a large GWAS.
DRB1 encodes a protein (a Class II molecule of the major histocompatibility complex), which binds foreign peptides derived from extracellular proteins for presentation to thymic derived lymphocytes (T-cells). It is expressed on the surface of antigen presenting cells such as dendritic cells, bone marrow derived lymphocytes (B-cells), and macrophages. As it might potentially relate to MS risk, however, the function of this protein is not well defined. To simplify program development, only eleven SNPs were used from each genomic region.
The choice of eleven SNPs was arbitrary. Preliminary exploration demonstrated that the method could define haplotypes using anywhere between 3 and 24 SNPs although, in theory, there is no upper limit to the method as long as there are a sufficient number of homozygotes and single-site heterozygotes in the population.
Nevertheless, for the purpose of this study, the eleven SNPs were chosen because they flanked both the most significantly associated SNP and the putative gene of interest (Figure 1). Each SNP was labeled sequentially from n1 to n11 based on its chromosomal location (Table 1; Figure 1).
The eleven SNPs analyzed at the DRB1 locus did not include the four tagging SNPs (rs, rs, rs, and rs) identified previously because these particular SNPs were not available in this dataset. The DRB1 cluster spans a length of For each SNP-cluster, the SNP information for each individual was converted into an ordered set (the subject-vector) of eleven ternary numbers (0, 1, or 2) based on whether they had zero, one, or two copies of a particular SNP-variant for each sequential SNP in the cluster. These SNP-variants were designated according to the number of copies of the minor allele (in the control population) at each location.
For example, the 5th subject in the database had the DRB1 subject-vector of , which indicated that he possessed 2 copies of n1, 0 copies of n2, 0 copies of n3, and so forth. For the purpose of the present analysis, SNP-strings were defined as those specific ordered sets of eleven binary numbers (0 or 1), representing the two haplotypes over the entire cluster span, which combined (added) to produce each observed subject-vector.
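The relationship between SNP-strings and subject-vectors described above can be sketched in a few lines of Python. This is only an illustration: the function name `combine` and the two example SNP-strings are hypothetical and are not taken from the study data.

```python
from typing import Tuple

Haplotype = Tuple[int, ...]      # binary SNP-string, one entry per SNP
SubjectVector = Tuple[int, ...]  # ternary genotype vector, one entry per SNP


def combine(h1: Haplotype, h2: Haplotype) -> SubjectVector:
    """Add two binary SNP-strings site by site to give a subject-vector."""
    if len(h1) != len(h2):
        raise ValueError("SNP-strings must have the same length")
    if any(x not in (0, 1) for x in h1 + h2):
        raise ValueError("SNP-strings must contain only 0s and 1s")
    return tuple(a + b for a, b in zip(h1, h2))


# Hypothetical 11-SNP strings, invented for illustration only
hap_a = (1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0)
hap_b = (1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0)
vector = combine(hap_a, hap_b)
```

Here `vector` has a 2 wherever both strings carry the minor allele, a 1 wherever exactly one does, and a 0 elsewhere.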
The first was to identify all subject-vectors that consisted entirely of zeros (0s) and twos (2s). These individuals must be homozygous for the same SNP-string. For example, the 6th subject in the database had a DRB1 subject-vector of , which indicated that she possessed two copies of the SNP-string. The second method was to identify all individuals who were single SNP-heterozygotes.
These individuals must have identical SNP-strings except for the single location where one SNP-string had a 0 and the other had a 1. For example, the th person in the database had a DRB1 subject-vector of , which could only arise from the combination of the SNP-strings and In this manner, a list of the most common SNP-strings in the population was compiled (Figure 2). Moreover, the relative frequency of the homozygous representation of each SNP-string in the cases and controls provides an estimate of the underlying SNP-string frequency in each population (Figure 3).
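The two identification rules (all-0/2 vectors and single-site heterozygotes) could be implemented roughly as follows. This is a minimal sketch, not the authors' program; the helper names `unambiguous_strings` and `count_strings` are assumptions made for illustration.

```python
from collections import Counter


def unambiguous_strings(vector):
    """Return the forced pair of SNP-strings, or None if the phase is ambiguous."""
    het_sites = [i for i, g in enumerate(vector) if g == 1]
    if len(het_sites) > 1:
        return None  # two or more heterozygous sites: several pairs are possible
    # Sites coded 2 carry the minor allele on both strings; 0 and 1 map to 0 here
    base = tuple(g // 2 for g in vector)
    if not het_sites:
        return base, base  # homozygous for a single SNP-string
    other = list(base)
    other[het_sites[0]] = 1  # the two strings differ only at the one heterozygous site
    return base, tuple(other)


def count_strings(vectors):
    """Tally SNP-strings over all subjects whose phase is unambiguous."""
    counts = Counter()
    for v in vectors:
        pair = unambiguous_strings(v)
        if pair is not None:
            counts.update(pair)  # a homozygote contributes two copies
    return counts
```

Subjects with two or more heterozygous sites are skipped at this stage and are handled later by decomposition against the compiled list.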
Subject-vectors (strings of 0s, 1s, and 2s) are searched for homozygous and single-site heterozygous individuals (A). Following this, the entire subject-vector list is decomposed into the possible combination categories (B). Allelic frequencies are recalculated from the unique decompositions and unambiguous combinations, and these frequencies are used to resolve the conflicted decompositions. Those SNP-strings that never selected i. The a3 bar has been cut off at in order to better illustrate the remainder of the distribution.
For each person, one of three outcomes was possible. Generally, however, this process was complete after 2 or 3 cycles. For the present analysis, whenever the SNP-string identification was conflicted, these conflicts were resolved based on the relative probabilities of the different allelic combinations in the population. These probabilities, in turn, were estimated from the SNP-string frequencies for all non-conflicted identifications (Figure 2; Table 2). For example, the 7th person in the database had a DRB1 subject-vector of This could have arisen from the combination of either the a7–a8 or the a13–a20 SNP-strings.
The probability of the latter combination (determined from the non-conflicted identifications in the whole population), however, involves two very rare SNP-strings compared to the former (Table 2) and, in this case, there is fold difference in likelihood. Consequently, this particular conflict was resolved in favor of the a7–a8 combination. In rare cases, there was little difference in likelihood between possible haplotype pairs although, more commonly, the likelihoods differed by an order of magnitude or more between pairs.
Therefore, for the purposes of the present analysis, all conflicts were resolved in favor of the most likely SNP-string combination. We then decomposed the subject-vectors using the entire combined list. Again, conflicts were resolved using the product of the estimated frequencies derived from the non-conflicted SNP-string identifications. If a particular SNP-string was never observed in a non-conflicted combination, its estimated frequency was taken as half the smallest possible frequency. However, for every SNP-string that was observed in a non-conflicted combination, its estimated frequency was taken as that which had actually been observed.
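The conflict-resolution rule can be sketched as below. This is an illustrative reading of the text rather than the authors' code: the function names, the toy three-SNP strings, and the example frequencies are all hypothetical. In particular, the sketch assumes that "the smallest possible frequency" means one copy among the 2N chromosomes of N subjects, so that the floor for never-observed strings is 1/(4N).

```python
from itertools import combinations_with_replacement


def decompositions(vector, known):
    """All unordered pairs of known SNP-strings that add up to the subject-vector."""
    return [
        (h1, h2)
        for h1, h2 in combinations_with_replacement(known, 2)
        if all(a + b == g for a, b, g in zip(h1, h2, vector))
    ]


def resolve(vector, known, freqs, floor):
    """Choose the decomposition with the largest product of estimated frequencies.

    freqs holds frequencies from non-conflicted identifications; strings that
    were never observed in a non-conflicted combination receive `floor`.
    """
    candidates = decompositions(vector, known)
    if not candidates:
        return None  # the known list cannot explain this subject-vector

    def likelihood(pair):
        return freqs.get(pair[0], floor) * freqs.get(pair[1], floor)

    return max(candidates, key=likelihood)


# Toy 3-SNP example with invented values
a, b, c, d = (1, 0, 0), (0, 1, 0), (1, 1, 0), (0, 0, 0)
known = [a, b, c, d]
freqs = {a: 0.4, b: 0.3, c: 0.01}  # d was never seen unconflicted
n_subjects = 100
floor = 1 / (4 * n_subjects)       # assumption: half of 1/(2N)
best = resolve((1, 1, 0), known, freqs, floor)
```

Because every candidate pair for a conflicted subject-vector is heterozygous, the common factor of 2 in the genotype probability cancels, and comparing simple products is sufficient.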
Following this penultimate decomposition, we reassessed those individuals who carried haplotypes that were found in fewer than 9 individuals. The number nine was chosen because, for haplotypes having this expected number of observations (or more), there is greater than a The details of this method are described elsewhere.
The SHAPEIT-2 method has been shown to be superior to several other methods based on its performance using several large-sample, whole-chromosome data sets from a range of SNP genotyping chips. For the purposes of our analysis, we compared the haplotype predictions using the two phasing methods. The actual haplotype frequencies in the population were estimated in three ways. The first (Figure 3) was to determine the most frequent haplotypes based on the number of homozygotes in the sample population. Because both the case and control populations are at HWE, at least with respect to the DRB1 locus 9, the different haplotype frequencies can be estimated as the square root of the homozygote frequencies.
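Under HWE a haplotype with frequency p is expected to be homozygous in a fraction p² of the population, which is why the square root of the homozygote frequency recovers p. A minimal sketch follows; the function name and the counts in the example are hypothetical.

```python
from math import sqrt


def haplotype_freq_from_homozygotes(n_homozygotes: int, n_subjects: int) -> float:
    """Estimate a haplotype frequency as sqrt(homozygote frequency), assuming HWE."""
    if n_subjects <= 0:
        raise ValueError("n_subjects must be positive")
    if not 0 <= n_homozygotes <= n_subjects:
        raise ValueError("n_homozygotes must lie between 0 and n_subjects")
    return sqrt(n_homozygotes / n_subjects)


# Invented counts: 36 homozygotes among 400 subjects gives roughly 0.3
estimate = haplotype_freq_from_homozygotes(36, 400)
```

The estimate is noisy for rare haplotypes, since only a handful of homozygotes (or none) will be observed.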
This method is independent of which phasing method is used. The second method was to use the haplotype frequencies estimated from all non-conflicted haplotypes found using the phasing method presented in this study (Tables 2 and 3). These frequencies were estimated jointly, combining cases with controls, although they are presented separately in Table 2. All three of these methods provided substantially similar estimates for the relative frequencies of the most common haplotypes (Figure 3; Tables 2 and 3).
We analyzed discrepancies between SHAPEIT-2 and our phasing algorithm by using these predicted allele frequencies to estimate the likelihood of individual phasing predictions (defined as the product of the frequencies of the two phased haplotypes identified by each algorithm). Subjects were then grouped into homozygous carriers, heterozygous carriers, and non-carriers of a particular SNP-string and tested for differences in the SNP-string distribution between patients and controls.
ORs were calculated separately from homozygous and heterozygous frequencies, with non-carrier frequencies as reference, and the significance of any distribution shift was assessed by a chi-square test with 1 degree of freedom. Because the European and American data were acquired independently and in different geographic regions but were otherwise similar, these two data sets were used separately to assess the replicability of any findings. SNP-strings a9 and a13 were homozygous in 1 individual each. Of the participants in this study, 54 had missing data in the DRB1 region and, therefore, their subject-vectors could not be constructed.
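The OR and 1-degree-of-freedom chi-square calculation for one carrier class against non-carriers can be sketched with the standard library alone. The sketch assumes a Pearson chi-square without continuity correction (the text does not say which variant was used), and the counts in the example are invented.

```python
from math import erfc, sqrt


def odds_ratio(case_carriers, case_ref, ctrl_carriers, ctrl_ref):
    """Odds ratio for carriers versus the non-carrier reference group."""
    if case_ref == 0 or ctrl_carriers == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (case_carriers * ctrl_ref) / (case_ref * ctrl_carriers)


def chi_square_1df(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]]; returns (statistic, p)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(stat / 2))
    return stat, erfc(sqrt(stat / 2))


# Invented counts: 30 carriers / 70 non-carriers in cases, 15 / 85 in controls
or_het = odds_ratio(30, 70, 15, 85)
stat, p_value = chi_square_1df(30, 70, 15, 85)
```

The same two functions would be applied once to the heterozygous carriers and once to the homozygous carriers, each time with the non-carriers as the reference row.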
Of the conflicted identifications, however, only selected SNP-strings were involved in the conflict. Thus, for example, the a2 SNP-string was involved in only of the conflicts. In addition, this set was adequate to explain completely all but 20 The OR of disease for having one copy of this allele was 3. This is in keeping with the previously reported observation that both the control populations and the MS populations are in HWE with respect to the allele.
Thus, the weighting scheme for homozygous non-carriers, heterozygous carriers, and homozygous carriers of this allele, at least for the US population, is geometric (1, w, w²), as it must be for the cases to be in HWE.
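The reason a geometric weighting preserves HWE in the cases can be seen in a short derivation. The symbols p, q, and p' are introduced here for illustration and are not taken from the source: p is the allele frequency in a source population assumed to be in HWE, q = 1 - p, and p' is the resulting allele frequency in cases.

```latex
\begin{align*}
\text{Source population (HWE):}\quad
  & \Pr(0\ \text{copies}) = q^2, \quad
    \Pr(1\ \text{copy}) = 2pq, \quad
    \Pr(2\ \text{copies}) = p^2 \\
\text{Weighted by } (1, w, w^2):\quad
  & q^2, \quad 2pqw, \quad p^2 w^2 \\
\text{Normalizing constant:}\quad
  & q^2 + 2pqw + p^2 w^2 = (q + pw)^2 \\
\text{Case genotype frequencies:}\quad
  & (1 - p')^2, \quad 2p'(1 - p'), \quad p'^2,
    \qquad p' = \frac{pw}{q + pw}
\end{align*}
```

The case genotype frequencies are therefore again of Hardy-Weinberg form, with the allele enriched from p to p' whenever w > 1. Any non-geometric weighting would leave a normalizing constant that is not a perfect square, and the cases would depart from HWE.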
This type of weighting has also been referred to as co-dominant or allelic. All other correlations were negative (Table 6). The other positive correlations ranged from 0. Nevertheless, this is an illusion. When the a2 SNP-string is removed from the analysis, the apparent protective effect vanishes. Thus, the protective effect of these SNP-strings lies in the fact that carriers are less likely to also carry the a2 SNP-string. In addition, one individual (an a2–a21 heterozygote) was homozygous for the allele, which implies that the a21 SNP-string can also carry this allele.
It is, therefore, hard to rationalize how this linkage would be possible, short of invoking some double crossover event or exon exchange. SNP-strings a6, a13, and a15 were homozygous in 1 individual each. Again, only selected SNP-strings were involved in each conflict. For example, the a4 SNP-string was involved in only of the conflicts. In addition, this set was adequate to explain completely all but 34 As was the case for the DRB1 locus, the MMEL1 locus also seemed to consist of related families of SNP-strings although, in this region, the families are more interconnected and, thus, the distinction of one family from another was less clear-cut (Table 5).
Family 4 only had two members, which differed from each other only at SNP n Moreover, this was replicated in both independent subpopulations with the OR in Europe being 2. By contrast, the OR for possessing 1 copy of this allele was 0. Presumably, therefore, this susceptibility allele acts in an autosomal recessive manner. When the two methods predicted different haplotype combinations to explain a particular subject-vector, in general, the combinations chosen by SHAPEIT-2 had substantially lower likelihoods compared to those predicted by the SNP-string method.
By contrast, when the length of the SNP data included in the SHAPEIT-2 analysis was increased tenfold to a span of 2, kb (surrounding the SNP sequences used in the above analyses), the results of the two methods were almost identical. Thus, the two methods were concordant in In most of these cases, there is no way to determine which method is more accurate. In the DRB1 region, for two such individuals, however, there was additional information. Consequently, provided these individuals were typed correctly, and assuming that DRB1 status is a good surrogate for a2 status, each method was correct only once.
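Concordance between two phasing methods can be computed as the fraction of subjects for whom both methods predict the same unordered pair of haplotypes. The sketch below is one plausible way to do this, not the authors' procedure; the function name and the example calls are hypothetical, with SNP-string labels used purely as placeholders.

```python
def concordance(calls_a, calls_b):
    """Fraction of subjects with identical unordered haplotype pairs in both call sets."""
    if len(calls_a) != len(calls_b):
        raise ValueError("both methods must provide one call per subject")
    if not calls_a:
        raise ValueError("no subjects to compare")
    # Sorting each pair makes (x, y) and (y, x) compare as equal
    same = sum(sorted(x) == sorted(y) for x, y in zip(calls_a, calls_b))
    return same / len(calls_a)


# Invented calls for three subjects
method_1 = [("a1", "a2"), ("a2", "a2"), ("a7", "a8")]
method_2 = [("a2", "a1"), ("a2", "a2"), ("a13", "a20")]
agreement = concordance(method_1, method_2)
```

Treating the pair as unordered matters because neither method assigns a meaningful order to the two haplotypes of a subject.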
For the remaining discrepancies in the DRB1 region, the two methods chose combinations from the same subset of SNP-strings so that probabilistic comparisons were possible. In these circumstances, the ratios of the two probabilities ranged from 1. Nevertheless, such high agreement between the two methods was not uniform throughout the 2, kb span of DNA but, rather, was characterized by peaks and valleys of agreement alternating throughout the span. By contrast, when the SNP sequence was begun at SNP-position rs the agreement between the two methods fell to Because the SNP-string method is based upon only the local tuple subject-vectors, the outcome of this method depends only upon the known identity of the subject-vectors in the population.
By contrast, because SHAPEIT-2 seemed to perform poorly when the input was limited to the tuple subject-vectors of interest, it also seemed possible that SHAPEIT-2 might perform less well depending upon where in the genome the haplotype analysis was begun. As can be seen from the figure, the concordance between the two SHAPEIT-2 analyses is very similar to the concordance between the two haplotype methods, being characterized by peaks and valleys of agreement alternating throughout the span of DNA (Figure 4).
The tuple subject-vectors for the principal analysis undertaken here began at SNP-position rs , as indicated by the vertical red line. Currently, many groups throughout the world conduct GWAS to identify genomic regions that are associated with complex human diseases.
If a SNP (or several SNPs) in a particular region is associated with the disease, then it is presumed that some allele of a nearby gene is responsible for the observed association; thus, GWAS are designed to identify genomic regions of association. The difficulty with this approach, however, is that the associations are often weak and require thousands of patients to uncover. Moreover, at least for the DRB1 locus, each associated SNP has a much greater allelic frequency compared to the underlying susceptibility allele, which is an example of synthetic association.
Also, unless the SNP itself produces the associated genetic abnormality, it may be difficult to determine which allele is responsible for the association. Rather, these methods search the data for clusters of SNPs that are jointly associated with the disease and that, therefore, presumably belong to a particular disease-associated haplotype.
Presumably, many of these potential difficulties could be mitigated if the two haplotypes at a given genetic locus could be identified confidently for each individual. Thus, the SNP-string method mitigates many of the potential problems discussed earlier. Indeed, as anticipated because both populations were largely of northern European origin, the frequency distribution of the different SNP-strings was almost identical in the two groups (Table 3).
Consequently, there is little doubt that the SNP-strings so identified represent the actual haplotypes, which cover the entire kb segment and, consequently, this method should facilitate comparisons regarding the genetic make-up of different human populations in specific genetic regions. Although the analysis presented here represents only two loci, the same pattern pertained to every locus screened preliminarily, including MS-associated intergenic genomic regions.
Some of these alleles are either very rare or only present in non-European ethnic groups and might not be present in our sample. Others, however, likely share the same tuple SNP-string. However, this will also probably decrease the number of unambiguous SNP-string identifications. Each of these changes could be either good or bad, depending on which effects predominate. Therefore, each of these variables will need to be studied systematically in order to determine the optimum number of SNPs and the optimum length of DNA to be included in the analysis.
In addition, each of the SNP-string lists here developed i.
Therefore, these lists and the estimated haplotype frequencies are independent of the order of data entry. Elahe Elahi for critical reading of this manuscript.
Until recently a major drawback of this method was that it was labor-intensive and without high-throughput instrumentation.
It is robust, rapid, inexpensive, and allows accurate measurement of allele frequencies in pools of DNA, facilitating large-scale gene mapping. The allelic specificity of the PCR amplification is conferred by placing the 3' end of one of the primers directly over and matching one or the other of the variant nucleotides (see Fig.). Ideally, only completely matched primers are extended and only the matching allele is amplified. To delay this amplification as much as possible, we have used the Stoffel fragment of Taq DNA polymerase (5,8), which has been shown to enhance discrimination of 3' primer-template mismatches (9).
Recently we have further derived a new variant, CEA2 (11), of the Stoffel fragment polymerase. This has helped us develop hundreds of genotyping
Fig. Allele-specific PCR. (B) A sample to be genotyped is divided among two PCRs. One PCR contains one of the two allele-specific primers and the other contains the other allele-specific primer. Both contain the common primer. DNA amplification between an allele-specific primer and the common primer will not occur or be greatly delayed if the primer is mismatched to the template.
Allele-specific PCR monitored in real-time.