Collins-Leeds Method (CLM)

Overview

The Collins-Leeds Method (CLM) tool uses a grid (often called a matrix) to show your matches arranged in groups (clusters). Membership in a cluster is based on matches sharing other matches in common (ICW) with your selected kit. Cluster members share a common line of descent within a few generations, and often the same ancestors. You may optionally show ICW matches that do not meet the criteria you have set for inclusion in clusters.

A*DNA, FTDNA, 23andMe, MyHeritage, and GEDmatch data are supported. CLM can operate on the database or on CSV files output by the DNAGedcom Client.

CLM Settings

Setting: Purpose

Kit and Match Selection
  • Kit Filter: Prefilter list of kits in the database
  • DNA Kit: Selects the target kit for clustering
  • Match/ICW files: Sets the kit data source to previously output CSV files
  • cM Range: Selects matches based on total cM shared (default: 50–400)
  • Inclusion Threshold: Selects matches based on criteria for cluster membership
  • Surname List: Filters matches for matching surnames

Sort Order
  • Matches: Determines how matches appear within clusters
  • Clusters: Determines the order clusters are arrayed

Show
  • Unclustered matches: Allows other relationships for cluster members to be shown
  • Painted Midline: Creates a diagonal midline to make match pairs easier to locate
  • Include Chromosome: Shows chromosome segment data in cluster output
  • Include Ancestors: Shows ancestor information in cluster output
  • Open HTML: Automatically opens the saved HTML output on completion

CLM Setup Details

Kit Filter

Typing a name or part of a name in this field will limit the kits listed in the DNA Kit dropdown to only those kits that have a name containing that input. For example, to find all kits for John Collins, you may type the full name or just “John”. After you select a kit, there are two methods for clustering: DNA Kit vs Match File/ICW File.
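The Kit Filter behaves like a substring search over kit names. The following Python sketch illustrates the idea only. It assumes a case-insensitive comparison, which this page does not confirm, and the kit names and the `filter_kits` function are hypothetical.

```python
def filter_kits(kit_names, text):
    """Return the kit names that contain `text` (case-insensitive, assumed)."""
    needle = text.strip().lower()
    if not needle:
        return list(kit_names)  # an empty filter leaves every kit listed
    return [name for name in kit_names if needle in name.lower()]


# Hypothetical kit names, for illustration only.
kits = ["(FTDNA) John Collins", "(23andMe) Mary Leeds", "(GEDmatch) John Doe"]
print(filter_kits(kits, "John"))
# → ['(FTDNA) John Collins', '(GEDmatch) John Doe']
```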

DNA Kit versus Output Files Selection

The DNA Kit dropdown lists all the kits in your database previously gathered with the DNAGedcom Client. You can cluster any A*DNA, FTDNA, 23andMe, MyHeritage, or GEDmatch kit in your database. Alternatively, you can cluster a set of Match/ICW CSV output files.

Kits in the database can be distinguished from files:

  • Database Kit: (FTDNA) John Doe
  • Files: FTDNA match/ICW files

A file selection form will appear automatically when you select an available Match/ICW File option.

Using Match and ICW files instead of the database has advantages:

  • You can modify files to eliminate unwanted matches. You only need to remove a match from one file or the other. For example, if you have tagged all maternal matches, you could include or exclude them.
  • You can save files before a gather as a “snapshot” for future comparison.

Because the Client overwrites output files in its default folder, if you want to cluster different versions of Match/ICW CSV files for the same kit, you will need to store them in a separate directory.
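One way to keep versions apart is to copy the current output files into a dated folder before each gather. The sketch below uses only the Python standard library. The function name, the folder layout, and the file names in the usage comment are hypothetical and are not part of the Client.

```python
import shutil
from datetime import date
from pathlib import Path


def snapshot_files(files, snapshot_root, label=None):
    """Copy each file into snapshot_root/<label>/ and return the new paths.

    `label` defaults to today's date (YYYY-MM-DD), so each day's snapshot
    lands in its own folder and is not overwritten by the next gather.
    """
    label = label or date.today().isoformat()
    target = Path(snapshot_root) / label
    target.mkdir(parents=True, exist_ok=True)
    return [Path(shutil.copy2(f, target)) for f in files]


# Example call with hypothetical file names and folders:
# snapshot_files(["matches.csv", "icw.csv"], "clm_snapshots")
```

You would then point the file selection form at the copies in the snapshot folder rather than at the files in the Client's default folder.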

cM Range

Sets the lower and upper limits for total shared cM. Default: 50 to 400 cM. Try moving the range up or down to target matches in particular relationship ranges. The wider the range, the longer clustering takes. Different ranges yield different results, so running CLM at several ranges may offer more insight. Output file names include the range values.
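As a rough illustration of what the range does, the following Python sketch keeps only matches whose total shared cM falls inside the limits. It assumes both limits are inclusive, which this page does not state. The record layout and the `filter_by_cm` function are hypothetical.

```python
def filter_by_cm(matches, low=50.0, high=400.0):
    """Keep matches whose total shared cM lies in [low, high] (inclusive, assumed)."""
    if low > high:
        raise ValueError("low must not exceed high")
    return [m for m in matches if low <= m["total_cm"] <= high]


# Hypothetical match records, for illustration only.
matches = [
    {"name": "Match A", "total_cm": 35.0},
    {"name": "Match B", "total_cm": 50.0},
    {"name": "Match C", "total_cm": 212.5},
    {"name": "Match D", "total_cm": 1740.0},
]
print([m["name"] for m in filter_by_cm(matches)])
# → ['Match B', 'Match C']
```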

Surname List

Filter matches for matching surname(s) by inserting whole or partial surname strings separated by commas.
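The sketch below shows one plausible reading of this filter in Python: the input is split on commas, and a match passes if any of its surnames contains any of the entries. Case-insensitive comparison and the choice of which surnames are searched are assumptions, and both function names are hypothetical.

```python
def parse_surname_list(text):
    """Split a comma-separated filter string into lowercase, trimmed entries."""
    return [part.strip().lower() for part in text.split(",") if part.strip()]


def surname_filter_passes(match_surnames, filters):
    """True if any surname contains any filter entry (whole or partial)."""
    if not filters:
        return True  # an empty filter excludes nothing
    return any(f in s.lower() for s in match_surnames for f in filters)


filters = parse_surname_list("Collins, lee")
print(filters)                                              # → ['collins', 'lee']
print(surname_filter_passes(["Leeds", "Smith"], filters))   # → True
print(surname_filter_passes(["Smith"], filters))            # → False
```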

Inclusion Threshold

Determines what fraction of the people within a set of matches a person must match to be included in a cluster.

Threshold (Ratio): Description
  • Easy (1/3): Match stays with 1/3 ICW. Larger clusters.
  • Standard (1/2): Must match at least half. Balanced default.
  • Strict (2/3): Must match 2/3. Smaller, tighter clusters.

Default is 1/2. Try 2/3 for tighter clusters.
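The ratios can be read as a membership test: count how many members of a group a candidate shares as ICW matches, then compare that fraction with the threshold. The Python sketch below illustrates that reading only and is not the tool's actual clustering code. It assumes "at least" applies to all three settings, and `meets_threshold` is a hypothetical name.

```python
from fractions import Fraction

# Ratios taken from the list above.
THRESHOLDS = {
    "Easy": Fraction(1, 3),
    "Standard": Fraction(1, 2),
    "Strict": Fraction(2, 3),
}


def meets_threshold(candidate_icw, cluster_members, threshold="Standard"):
    """True if the candidate is ICW with at least the required share of members."""
    members = set(cluster_members)
    if not members:
        return False
    shared = len(members & set(candidate_icw))
    return Fraction(shared, len(members)) >= THRESHOLDS[threshold]


cluster = ["A", "B", "C", "D", "E", "F"]
candidate_icw = ["A", "B", "C", "X"]  # ICW with 3 of the 6 members
print(meets_threshold(candidate_icw, cluster, "Easy"))      # → True
print(meets_threshold(candidate_icw, cluster, "Standard"))  # → True
print(meets_threshold(candidate_icw, cluster, "Strict"))    # → False
```

Exact fractions are used so that a candidate sitting exactly on a ratio (3 of 6 for Standard) is not lost to floating-point rounding.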

Match Sort

Order of matches within each cluster:

  • By Inclusion (default) — those matching the most kits in the cluster are in the upper-left corner
  • By cM — those matching the primary kit by highest cM are in the upper-left corner
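The two orderings can be sketched in Python as follows. The data structures are hypothetical: `icw` maps each kit to the set of kits it matches in common, and `cm` maps each kit to its shared cM with the primary kit. Breaking ties by cM in the By Inclusion ordering is an assumption.

```python
def sort_matches(members, icw, cm, by="inclusion"):
    """Order cluster members so the first entry lands in the upper-left corner."""
    members = list(members)
    if by == "cm":
        return sorted(members, key=lambda m: cm[m], reverse=True)
    group = set(members)

    def in_cluster_count(m):
        # Number of other cluster members this kit matches in common.
        return len(icw.get(m, set()) & group)

    # Ties on the count are broken by cM (an assumption, not documented).
    return sorted(members, key=lambda m: (in_cluster_count(m), cm[m]), reverse=True)


# Hypothetical cluster, for illustration only.
members = ["A", "B", "C"]
icw = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}
cm = {"A": 100.0, "B": 200.0, "C": 150.0}
print(sort_matches(members, icw, cm, by="inclusion"))  # → ['A', 'B', 'C']
print(sort_matches(members, icw, cm, by="cm"))         # → ['B', 'C', 'A']
```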

Cluster Sort

Two types of clusters: full clusters (solid color square) and superclusters (a set of full clusters sharing cross-cluster matches).

Options:

  • By Size — largest to smallest
  • By max cM — ordered by maximum cM match
  • By Size, Superclustered (default) — matches arrayed by kit count, related clusters placed near each other within supercluster boundaries
  • By max cM, Superclustered — matches arrayed by cM, related clusters grouped
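The two non-superclustered orderings can be sketched in Python as below. Clusters are represented as plain lists of kit names and `cm` as a hypothetical lookup of shared cM with the primary kit. Descending order for max cM is an assumption, and the supercluster grouping is not modeled here.

```python
def sort_clusters(clusters, cm, by="size"):
    """Order non-empty clusters by member count or by their highest-cM member."""
    if by == "size":
        key = len
    elif by == "max_cm":
        key = lambda cluster: max(cm[m] for m in cluster)
    else:
        raise ValueError(f"unknown sort option: {by!r}")
    return sorted(clusters, key=key, reverse=True)


# Hypothetical clusters, for illustration only.
clusters = [["A", "B"], ["C", "D", "E"], ["F"]]
cm = {"A": 300.0, "B": 60.0, "C": 120.0, "D": 90.0, "E": 80.0, "F": 350.0}
print(sort_clusters(clusters, cm, by="size"))
# → [['C', 'D', 'E'], ['A', 'B'], ['F']]
print(sort_clusters(clusters, cm, by="max_cm"))
# → [['F'], ['A', 'B'], ['C', 'D', 'E']]
```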

Include Unclustered Matches

Default: On. Shows matches that don’t fit any cluster. Helpful because cross-cluster matches are still drawn. Density of unclustered matches can indicate endogamy or pedigree collapse.

Paint Midline

Default: On. Self-matches painted black. Helpful when reordering clusters in Excel.

Include Chromosome

Default: On. Shows chromosome segment data. Click a cluster in the HTML report to see Chromosome Browser details (excluding A*DNA).

Include Ancestors

Default: On. Shows ancestor information. Small green leaf icons appear where matches share common ancestors.

Open HTML When Done

Default: On. Auto-opens HTML in default browser.

CLM Output

CLM generates two output files: HTML (primary, auto-displayed in browser) and Excel, both saved in the database folder.

Reading the HTML Cluster Report

Viewing Your Clusters

HTML maps are drawn with filled-in squares, each representing a match between two people. Mousing over any colored square shows the kit names.

Three types of color-coded squares (four with diagonal):

  • Solid color squares — match between two members of the same cluster
  • Pale squares with two colors — matches between members of two corresponding clusters (signal superclusters)
  • Pale squares showing grey plus one pale color — match between cluster member and another kit
  • Solid black squares (diagonal) — reference point for locating a kit
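The square types above follow from the cluster membership of the two kits involved. The Python sketch below is a reading aid under that interpretation and is not the tool's rendering code. It assumes the two kits do match each other, and `cluster_of` is a hypothetical lookup from kit name to cluster number in which unclustered kits are absent.

```python
def square_type(row_kit, col_kit, cluster_of):
    """Classify a filled square by the cluster membership of its two kits."""
    if row_kit == col_kit:
        return "diagonal"  # solid black reference square
    a = cluster_of.get(row_kit)
    b = cluster_of.get(col_kit)
    if a is not None and a == b:
        return "solid"  # both kits in the same cluster
    if a is not None and b is not None:
        return "pale two-color"  # kits in two different clusters
    if a is not None or b is not None:
        return "pale grey plus one color"  # only one kit is clustered
    return "not described"  # neither kit clustered; this case is not covered above


# Hypothetical membership, for illustration only.
cluster_of = {"A": 1, "B": 1, "C": 2}
print(square_type("A", "B", cluster_of))  # → solid
print(square_type("A", "C", cluster_of))  # → pale two-color
print(square_type("A", "Z", cluster_of))  # → pale grey plus one color
print(square_type("A", "A", cluster_of))  # → diagonal
```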

Squares with a small green leaf indicate ancestors in common.

Clusters are surrounded by a black line. Membership is based on ICW, indicating shared descent.

Superclusters are surrounded by a dark grey line. They show related clusters sharing the same or closely related lines of descent.

While CLM cluster maps provide valuable clues to shared descent, they are not proof per se of specific ancestors.

Click a cluster (an internet connection is required) to open a popup showing cluster members or Chromosome Browser details.

Match Details

Following a match’s row to the leftmost columns shows:

  • A link to a publicly shared tree (private trees have a red line through them)
  • Total shared cM
  • Match’s kit name (clicking opens Match page in new tab)
  • Color code for the match’s cluster

Reading the Excel Cluster Report

The Excel report has three worksheets (tabs): Chart, Data, and Ancestors.

The Chart Tab

Same clusters as HTML but in Excel. Dark black border around clusters, gray border around Superclusters. Non-cluster matches shown as gray boxes.

Important notes:

  • Colors are arbitrary and repeat after 10. The same color doesn’t mean the same cluster if the squares don’t touch.
  • Same line of descent may appear in several clusters.
  • Cross-cluster matches are sometimes as important as clusters themselves. Treat them as clues to cluster relationships.
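The color repetition noted above is what a cycling palette produces. The Python sketch below assumes a palette of exactly 10 colors and cluster numbers starting at 1. Both are assumptions made for illustration, and `color_index` is a hypothetical name.

```python
PALETTE_SIZE = 10  # assumed from "repeat after 10"


def color_index(cluster_number):
    """Map a 1-based cluster number to a palette slot in the range 0 to 9."""
    return (cluster_number - 1) % PALETTE_SIZE


# Clusters 1 and 11 land on the same palette slot but are different clusters.
print(color_index(1), color_index(11))  # → 0 0
print(color_index(10))                  # → 9
```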

The Data Tab

List of clustered matches: Cluster number, Name, cM, Match Page link, Tree page link.

The Ancestors Tab

List of specific ancestors appearing in trees of matches associated with clusters.