Tips and Tactics for Gathering DNA Data
Getting the most out of the DNAGedcom Client starts with a smart gathering strategy. These tips will help you save time, avoid frustration, and produce better analysis results.
Start with Larger Segments
When gathering for the first time, begin with a minimum cM threshold of 30 cM or higher. Matches at this level are almost always genuine relatives within a few generations. Gathering at higher thresholds is significantly faster because there are far fewer matches to process, and the data you collect will be the most reliable and actionable. Once you have analyzed your higher-cM matches, you can always lower the threshold and gather again to expand your dataset.
Match Your Gather to Your Goals
The right cM range depends on what you are trying to accomplish:
- Adoptee search or close family discovery: Focus on higher cM ranges (50+ cM). These matches represent closer relationships and are the most useful for identifying immediate family connections.
- Multi-generational genealogy research: Work your way down to 20 cM over time. Matches in the 20–50 cM range often correspond to 4th–6th cousin relationships, which can help you extend your family tree further back.
- Endogamy (populations with intermarriage): Be cautious with lower cM ranges. Endogamous populations produce many small-segment matches that may not indicate recent common ancestry. Clustering results can become noisy at lower thresholds in these cases.
Gather in Stages
Rather than trying to gather everything at once, work in stages:
- Start at 30+ cM. Gather matches, ICW, and trees at this level. Run your clustering tools and review the results.
- Lower to 20 cM. Once you understand your high-confidence clusters, expand your dataset. New matches will fill in gaps and may connect clusters together.
- Go lower if needed. For specific research questions, you can gather at even lower thresholds, but be aware that gather times increase substantially and the signal-to-noise ratio decreases.
This incremental approach lets you build understanding at each stage before expanding, making your analysis more manageable and your results easier to interpret.
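The staged approach can be illustrated with a small Python sketch. It does not call the DNAGedcom Client; it assumes you already have a list of matches with shared cM values (the Match class and the sample data are hypothetical) and shows which matches each lower threshold would add.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """Hypothetical record: a match name and its shared cM."""
    name: str
    shared_cm: float


# Thresholds taken from the stages above: 30+ cM first, then 20 cM.
STAGE_THRESHOLDS_CM = [30.0, 20.0]


def new_matches_per_stage(matches, thresholds=STAGE_THRESHOLDS_CM):
    """For each threshold, return the matches that first appear at that stage."""
    stages = {}
    upper = float("inf")
    for threshold in sorted(thresholds, reverse=True):
        # Keep only matches not already covered by a higher threshold.
        stages[threshold] = [m for m in matches if threshold <= m.shared_cm < upper]
        upper = threshold
    return stages


# Made-up sample data for illustration only.
sample = [
    Match("A", 212.0),
    Match("B", 34.5),
    Match("C", 27.1),
    Match("D", 20.0),
    Match("E", 12.3),
]

stages = new_matches_per_stage(sample)
for threshold, found in stages.items():
    print(f"{threshold:g}+ cM adds {len(found)} matches: {[m.name for m in found]}")
```

Adding a lower value to the threshold list models the optional "go lower if needed" stage.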
Get Matches Into the Database First
Matches are the foundation for every other data type — ICW relationships, family trees, and chromosome segments all attach to specific matches, so a populated match list has to exist before those data types can be processed. You have two equally valid ways to get there:
- One combined gather. Turn on matches plus whatever else you want (ICW, trees, chromosomes) at the same time and click Gather once. The Client gathers matches first internally, then moves on to the linked data types. This approach has the shortest total wall-clock time.
- Matches-only first, then a follow-up gather. Run a matches-only gather so your full match list lands in the database within minutes, then start a second gather with the other data types turned on. This is useful when you want to begin analyzing matches while the longer phases finish.
The one thing to avoid is running an ICW-only, trees-only, or chromosome-only gather as the very first gather of a new kit. There are no matches in the database yet, so there is nothing for those linked records to attach to. Once you have any match data in place, you can re-run individual data types in any order.
The internal order the Client uses, when multiple data types are turned on, is:
- Matches
- In-Common-With (ICW)
- Trees
- Chromosome segments
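The ordering rule and the "matches first" constraint can be expressed as a short Python sketch. This is only a model of the behavior described above, not the Client's code; the data-type labels and the plan_gather function are hypothetical names chosen for illustration.

```python
# Internal order described above. The labels are illustrative, not Client settings.
GATHER_ORDER = ["matches", "icw", "trees", "chromosomes"]


def plan_gather(selected, database_has_matches):
    """Return the selected data types in processing order.

    Raises ValueError if nothing selected could attach to a match, which is
    the case for a first gather of a new kit that leaves matches turned off.
    """
    unknown = set(selected) - set(GATHER_ORDER)
    if unknown:
        raise ValueError(f"unknown data types: {sorted(unknown)}")
    if "matches" not in selected and not database_has_matches:
        raise ValueError("the first gather of a new kit must include matches")
    return [data_type for data_type in GATHER_ORDER if data_type in selected]


# A combined first gather is processed matches-first, whatever order it was selected in.
print(plan_gather({"trees", "matches", "icw"}, database_has_matches=False))

# Once matches exist, a single data type can be re-run on its own.
print(plan_gather({"chromosomes"}, database_has_matches=True))
```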
Understanding Gather Time
Gather time varies significantly depending on the cM range and DNA service:
- Above 30 cM: Most kits complete in minutes. A typical gather at this level might take 5–15 minutes depending on the number of matches.
- 20–30 cM: Expect 30 minutes to a few hours, depending on the size of the match list and which data types you are gathering.
- Below 20 cM: Large kits can have tens of thousands of matches. Gathering 40,000+ matches with ICW data at low cM thresholds can take several hours.
- MyHeritage is typically the slowest service due to rate limiting. Be patient and allow extra time when gathering from MyHeritage.
You can leave the application running in the background while it gathers. Progress indicators will show you how far along the process is.
Preserve Your Data
DNA testing companies can and do change their data, remove features, or alter how matches are calculated. By gathering your data regularly with DNAGedcom, you create a local backup that you fully control. This is especially important because:
- Companies may remove matches below certain thresholds without notice.
- Algorithm changes can alter match lists and shared cM values.
- Service shutdowns or account changes could result in permanent data loss.
- Having historical data lets you track changes over time and compare against previous results.
We recommend gathering at least once a month for your primary kits, and backing up your database folder regularly.
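A regular backup of the database folder can be scripted. The Python sketch below copies a folder to a timestamped destination using only the standard library. The function name and both paths are assumptions: substitute the database folder location your own installation uses, and run the copy while the Client is closed so that no files are being written.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_database_folder(db_folder, backup_root, now=None):
    """Copy db_folder to backup_root/<folder name>-<timestamp> and return the new path.

    Keep backup_root outside db_folder so the backup is not copied into itself.
    """
    db_folder = Path(db_folder)
    if not db_folder.is_dir():
        raise FileNotFoundError(f"database folder not found: {db_folder}")
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"{db_folder.name}-{stamp}"
    # copytree creates missing parent folders and fails if target already exists,
    # so an earlier backup is never overwritten.
    shutil.copytree(db_folder, target)
    return target
```

For example, backup_database_folder("path/to/your/database/folder", "path/to/backups") creates one new dated copy each time it runs, which also gives you the historical snapshots needed to compare results over time.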
Working with Multiple Kits
If you manage DNA kits for multiple family members, each kit's data is stored separately within the database. This keeps the data clean and lets you run analysis tools on each kit independently.
To get the most from multiple kits:
- Gather all kits at the same cM threshold so your data is comparable across kits.
- Use Kits in the People section to find matches that appear across multiple kits. A match shared by two or more family members can help narrow down which branch of the family they connect to.
- Run clustering on each kit separately to see how clusters compare. A cluster that appears in both a parent's kit and a child's kit points to that parent's side of the child's family.
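The idea behind cross-kit comparison can be sketched in Python. This is not how the Client implements its Kits view; it assumes you have a set of match identifiers per kit (the kit names and IDs below are made up) and reports which matches appear in two or more kits.

```python
from collections import defaultdict


def matches_shared_across_kits(kit_matches, min_kits=2):
    """Map each match ID to the sorted kits it appears in, keeping IDs seen in at least min_kits kits."""
    kits_by_match = defaultdict(set)
    for kit, match_ids in kit_matches.items():
        for match_id in match_ids:
            kits_by_match[match_id].add(kit)
    return {
        match_id: sorted(kits)
        for match_id, kits in kits_by_match.items()
        if len(kits) >= min_kits
    }


# Hypothetical kits and match IDs for illustration only.
kits = {
    "me": {"m1", "m2", "m3"},
    "mother": {"m2", "m3", "m4"},
    "paternal_aunt": {"m1", "m5"},
}

shared = matches_shared_across_kits(kits)
for match_id in sorted(shared):
    print(match_id, "appears in", shared[match_id])
```

In this made-up data, a match shared with the mother's kit suggests a maternal connection, while one shared with the paternal aunt's kit suggests a paternal one.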
One Database File or Several?
For the great majority of users, one combined database is the right default. Cross-kit People searches, Common Ancestors, and surname/ancestor lookups only span data that lives in the same database, so splitting kits across files forfeits much of the analytical value of running the Client across a family.
A separate database file is worth considering when:
- Unrelated research projects: For example, keep your own family lines in one file and an unrelated client or unknown-parentage case in another. This keeps one project's results from contaminating searches in the other.
- A* filtered subsets: Use a separate file if you want to gather a specific A* filter (one tag, one group, etc.) and analyze that subset on its own. The Client doesn't currently track which A* filter was active when each match was gathered, so isolating the filtered set in its own database is the cleanest way today.
- Sandboxes and experiments: Try out new gather settings or test data without polluting your main file.
- Performance: Very large multi-kit databases (tens of millions of rows) can slow some operations. Splitting can help in extreme cases.
If you do run more than one database, remember the launch behavior: the Client reopens whichever database was active in the session you closed most recently. To avoid surprises, either set the database explicitly at each launch or get into the habit of closing your “main” session last. See the FAQ entries on running multiple Client sessions and one-DB-vs-many trade-offs for the full discussion.