This sequencing data provides the. This rearrangement process is specific to lymphocytes and involves breaking and subsequently rejoining portions of DNA [7]. The process introduces diversity into the gene segment, allowing new antigens to be recognized and a diverse set of antibodies to be produced throughout the body [7].

The V(D)J recombination process specifically introduces diversity into the complementarity-determining region 3 (CDR3) loop [8]. Therefore, the more diversity recombination introduces in this region, the greater the variety of antigens that antibodies can recognize [8]. The advent of high-throughput DNA sequencing methods has produced large amounts of data that provide the opportunity to understand the clonal expansion that generates unique antibodies [4].

The large number of parameters available for classifying clonal lineages leaves room for many possible parameter combinations. Many experimental methodologies have been investigated to analyze the heavy and light chains of an antibody, including hybridoma panels, antibody phage display, single cell cloning, and bulk sequencing of heavy and light chains [3]. We have tested an easy-to-follow sequence of steps that identifies clonal lineages in DNA sequencing data from a population.

This stepwise methodology involves identifying sequences that share germline V and J genes and have CDR3 regions of similar length and sequence. Understanding the clonal evolution of B cells is vitally important to understanding how autoantibodies develop in autoimmune disorders [4,9].
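As a rough illustration of the first step, the sketch below partitions sequence records by V gene, J gene, and CDR3 length. The field names (`v_call`, `j_call`, `cdr3`) and the toy records are assumptions made for this example; they are not the column names of any particular tool's output.

```python
from collections import defaultdict


def group_by_vj_length(records):
    """Partition records by (V gene, J gene, CDR3 length)."""
    groups = defaultdict(list)
    for rec in records:
        key = (rec["v_call"], rec["j_call"], len(rec["cdr3"]))
        groups[key].append(rec)
    return dict(groups)


# Toy records; gene names and sequences are illustrative only.
records = [
    {"id": "s1", "v_call": "IGHV1-2", "j_call": "IGHJ4", "cdr3": "TGTGCGAGAGAT"},
    {"id": "s2", "v_call": "IGHV1-2", "j_call": "IGHJ4", "cdr3": "TGTGCGAGAGAC"},
    {"id": "s3", "v_call": "IGHV3-23", "j_call": "IGHJ6", "cdr3": "TGTGCGAAAGAT"},
    {"id": "s4", "v_call": "IGHV1-2", "j_call": "IGHJ4", "cdr3": "TGTGCGAGA"},
]

groups = group_by_vj_length(records)
for key, members in sorted(groups.items()):
    print(key, [m["id"] for m in members])
```

Only sequences that fall in the same group are candidates for the same clone, so any later sequence comparison can be restricted to pairs of equal CDR3 length.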

Methods for analyzing immunoglobulin genes and their common ancestor have been of interest for many years, and the majority of analysis methods involve fitting a lineage tree to the observed distances in the data. A paper by Felsenstein describes two interpretations of the branch lengths. The first is the path length interpretation, which requires calculating the observed distance from ancestors to the tips of a lineage tree. A branch length represents the evolutionary distance from ancestor to descendant and can be determined if a set of imaginary ancestors is fit to the observed distances, which allows for lineage interpretations. The second interpretation uses a statistical framework, in which the distances are assumed to be drawn independently from a distribution. Here, the branch length is a parameter in a statistical model, estimated using the observed distances. This interpretation then focuses on choosing branch lengths to minimize the sum of squares, as shown in Equation. Given the development of newer techniques that make it possible to analyze antibodies from individual B cells, statistical methods have been refined over time to study clonal lineages.
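For orientation, the least-squares criterion commonly associated with this statistical interpretation of branch lengths can be written as follows. The symbols are our own notation and are an assumption; they may differ from those used in the original equation.

```latex
Q = \sum_{i} \sum_{j \neq i} w_{ij} \left( D_{ij} - d_{ij} \right)^{2}
```

Here $D_{ij}$ is the observed distance between sequences $i$ and $j$, $d_{ij}$ is the sum of the branch lengths along the path connecting $i$ and $j$ in the tree, and $w_{ij}$ is a weight (for example, $w_{ij} = 1$ for unweighted least squares). Branch lengths are chosen to make $Q$ as small as possible.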

Kepler presents a method for understanding the uncertainties present in reconstructing lineages. Kepler analyzed heavy and light chain data from individual B cells to present a model with two features: (1) using all the information available in a set of clonally related immunoglobulin genes and (2) providing systematic uncertainty estimates on the non-mutated ancestors. Next, a likelihood function is used to describe the probability that the query set arose from a given ancestor by somatic hypermutation. Additionally, understanding where the uncertainty in the assumptions comes from allows the non-mutated ancestor to be found with greater precision. Various distance metrics and linkage methods that can be used to group clonally related sequences were compared in a recent study. The study focused on parameters used in.

Hierarchical clustering uses these parameters to define clusters and produce a tree relationship between the input sequences. This study found that single-linkage hierarchical clustering and length-normalized nucleotide Hamming distance are an optimal combination of parameters for grouping immunoglobulin genes by clonal relatedness. Additionally, this study automated the choice of the threshold for clonal distance. The analysis framework we tested used a series of steps similar to those outlined in that paper, starting by looking for the same V gene, J gene, and junction length.
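The combination of single linkage and length-normalized Hamming distance can be sketched in a few lines. This is a minimal illustration, not the implementation used by the cited study or by any package: the function names are hypothetical, the junction sequences are invented, and the threshold of 0.1 is an arbitrary placeholder. Cutting a single-linkage tree at a threshold is equivalent to merging every pair of sequences whose distance is at or below that threshold, which is what the union-find loop does.

```python
def normalized_hamming(a, b):
    """Hamming distance between equal-length sequences, divided by length."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def single_linkage_clusters(seqs, threshold):
    """Group sequences joined by a chain of distances <= threshold."""
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent links to the cluster representative (path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if normalized_hamming(seqs[i], seqs[j]) <= threshold:
                parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(seqs[i])
    return sorted(clusters.values())


# Invented junctions that already share V gene, J gene, and length.
junctions = ["TGTGCGAGAGAT", "TGTGCGAGAGAC", "TGTGCGAGGGAC", "TGTAAATTTCCC"]
clones = single_linkage_clusters(junctions, threshold=0.1)
for clone in clones:
    print(clone)
```

Note the chaining behavior that single linkage implies: the first and third junctions differ at two of twelve positions, which exceeds the threshold, yet they end up in the same clone because each is within the threshold of the second junction.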

Detailed step-by-step instructions for going from the sequencing data to a file containing the sequences clustered by clone, with the information needed to create isotype lineages and identify mutations, are presented in Appendix A. A brief overview of each package, its usage, and its purpose is presented here, and a snapshot of each corresponding output file can be found in Appendix B.

This tool compares the input sequence with known germline V and J genes for the species of interest and outputs a summary of the matching V, D, and J genes. A snapshot of the output file is shown in Figure 4, which contains, for each input sequence, the best-matched V, D, and J germline genes as well as summary details on the quality of the match for regions of the sequence based on various factors i.

An overall percent identity for each portion of the sequence is also reported. The Hamming model was used here, which is the absolute number of differences between two sequences [9]. Then, distances were normalized by length using the distance normalization parameter (-norm). Finally, a default distance of 0. A sample of this output file is shown in Figure 7, with the new columns appended to the end of the file.

The information in this file can then be used in an R program, found in Appendix C, to first match the sample sequences with a list of known sequences and then create lineage trees. These known sequences correspond to B cells identified using certain techniques; these B cells are well documented because their light chain sequences are known.

Similarity here means that the BCR sequences would likely be in the same cluster. We then conducted lineage analysis on the identified sequences to generate lineage trees. The lineage trees help in understanding the timeline of differentiation and can aid in identifying mutations and isotype switching that occur during development. The R program identifies the number of sequences contained within each clone, and a lineage tree was created for each clone.
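The per-clone counting step can be illustrated as follows. This is a Python sketch of the idea only, not the R program from Appendix C; the field names (`clone_id`, `isotype`) and the rows are assumptions invented for the example.

```python
from collections import Counter, defaultdict


def clone_sizes(rows):
    """Number of sequences assigned to each clone."""
    return Counter(r["clone_id"] for r in rows)


def isotypes_per_clone(rows):
    """For each clone, count how many sequences carry each isotype."""
    counts = defaultdict(Counter)
    for r in rows:
        counts[r["clone_id"]][r["isotype"]] += 1
    return {clone: dict(c) for clone, c in counts.items()}


# Invented rows standing in for a clone-annotated sequence table.
rows = [
    {"sequence_id": "s1", "clone_id": 1, "isotype": "IgM"},
    {"sequence_id": "s2", "clone_id": 1, "isotype": "IgD"},
    {"sequence_id": "s3", "clone_id": 1, "isotype": "IgM"},
    {"sequence_id": "s4", "clone_id": 2, "isotype": "IgG1"},
]

sizes = clone_sizes(rows)
isotypes = isotypes_per_clone(rows)
print(dict(sizes))
print(isotypes)
```

Clones that contain more than one isotype are natural candidates for a closer look at their lineage trees.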

Figures 2 and 3 show lineage trees for a clone within JLDO, matching designated sequences 2 and 3, respectively. The lineage tree in Figure 2 contains two isotypes, IgD and IgM, which are present at multiple levels of differentiation.

The multiple isotypes present at a single level can be. The presence of the IgG1 isotype, even though the precursor cells display only the IgD and IgM isotypes, is evidence of isotype switching. In further differentiation steps of this lineage tree, only IgM is present.

The figure shows that only a few different IgG1 clones differentiated further. At each step in Figure 3, the IgG1 isotype is present, and this particular tree does not show evidence of isotype switching. An important difference to note between these two lineage trees is. This highlights that, although these lineage trees are from B cells of the same individual, cells that are clonally related have a similar timeline within the same group.

While only one lineage tree per designated sequence is created here, multiple trees can be created per designated sequence, depending on the number of BCR sequences from the study sample that match the designated sequence. If such analyses are conducted for multiple individuals and the lineage trees and results are aggregated, patterns in the order of formation of these isotypes can be understood. Mutations that arose during isotype formation can also be discerned.

Finally, this methodology points to an important characteristic of B cell immunology called isotype switching, the process by which a B cell switches from producing one isotype to another [9]. This analysis technique allows the lineage trees to be analyzed further to extract patterns, to understand what events result in isotype switching, and to find similarities across different types of responses.
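One simple way to extract such a pattern from a lineage tree is to list every parent-to-child edge on which the isotype changes. The sketch below assumes a tree stored as a child-to-parent mapping with an isotype label on every node; both the representation and the toy tree are assumptions for illustration, and real trees may contain inferred nodes without an isotype.

```python
def isotype_switches(parent_of, isotype_of):
    """Return (parent, child, parent_isotype, child_isotype) for each
    tree edge on which the isotype label changes."""
    switches = []
    for child, parent in parent_of.items():
        if parent is None:  # the root has no incoming edge
            continue
        if isotype_of[parent] != isotype_of[child]:
            switches.append(
                (parent, child, isotype_of[parent], isotype_of[child])
            )
    return sorted(switches)


# Toy lineage tree: root -> a -> {b, c}, c -> d (node names are invented).
parent_of = {"root": None, "a": "root", "b": "a", "c": "a", "d": "c"}
isotype_of = {"root": "IgM", "a": "IgM", "b": "IgM", "c": "IgG1", "d": "IgG1"}

switches = isotype_switches(parent_of, isotype_of)
print(switches)
```

Aggregating these edge lists over many trees would give counts of each isotype transition, which is the kind of summary the further analysis described above could start from.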

Other methods for analyzing the heavy and light chains of an antibody that are of particular interest include hybridoma panels, antibody phage display, and single cell cloning [3]. Hybridoma panels are a technique involving white blood cells fused to myeloma cells and then gene. The antibody phage display technique uses cell and molecular biology techniques to insert V genes from antibody heavy and light chains into phages, which are then propagated to discover patterns of specific antibody and antigen binding [3].

Lastly, the single cell cloning method involves extracting and sequencing RNA from single cells to identify V gene sequences from heavy and light chains, and expressing these antibody genes to test their specificity [3]. All of these methods are generally lower throughput and more time intensive than the method we investigated, bulk. Interestingly, the single cell cloning method has the potential to be combined with high-throughput sequencing methods, which would be a new area of interest for further developing analysis methodologies [3].


Introduction to T and B lymphocytes. Autoimmunity: From Bench to Bedside [Internet]. Chapter 5.
Molecular Biology of the Cell. New York: Garland Science; B Cells and Antibodies. Available from:
The analysis of clonal expansions in normal and autoimmune B cell repertoires.
Clustering-based identification of clonally-related immunoglobulin gene sequence sets. Immunome Research.
Stavnezer J, Schrader CE. The Journal of Immunology.
From hematopoietic progenitors to B cells: mechanisms of lineage restriction and commitment. Current Opinion in Immunology.
CDR3 length in antigen-specific immune receptors. The Journal of Experimental Medicine.
Human B-cell isotype switching origins of IgE. Journal of Allergy and Clinical Immunology.
Felsenstein J.
Kepler TB. Reconstructing a B-cell clonal lineage.


