After testing a few desktop genealogy applications for how well they handled GEDCOM files, it occurred to me that I should write an article about my testing philosophy and methodology, as well as some of the problems with complying with the GEDCOM standard. Take a moment to ponder that: this series was motivated by the problem of genealogy software like Family Tree Maker (FTM) not complying with the GEDCOM standard. But there may be unintended consequences, both for users and software publishers, of following the standard.
Testing Philosophy
First, my philosophy (and a little background). I was a personnel officer in the US Air Force for 20 years. The military has regulations for just about everything, from how to wear a uniform to how to service a Pratt & Whitney F135 engine. Along with the regulations come checklists to help ensure the regulations are complied with. Checklists are used at every level from the workshop to the Inspector General. The military cultivates a culture of compliance (or at least attempts to), so when I decided to test genealogy apps, I looked for a checklist. As a user and usability/compliance tester, I have to test all the apps the same way, or it wouldn’t be fair to the apps or useful to other users.
The checklist I chose was the GEDCOM standard, and the 5.5.1 version in particular. But 5.5.1 is just a draft, you say, and 5.5 is the most current standard (dating to 1995); why 5.5.1? Well, because it’s actually the current standard, whether app developers admit it or not. Tamura Jones has made this point many times*. If an app supports the UTF-8 character set* or includes fields for email, fax, or web address, then it at least partially supports 5.5.1. And how many apps do you know that export media as embedded binary objects, which was one of two options in 5.5? There are other changes in 5.5.1 as well, like the addition of latitude and longitude coordinates, so if apps include them, they support 5.5.1, even if their GEDCOMs are labeled as 5.5.
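To make that point concrete, here is a minimal Python sketch of how one might spot 5.5.1 features in a file that declares itself 5.5. It is my own illustration, not a real validator: the function name `features_551` and the tag list are assumptions limited to the features mentioned above (the UTF-8 character set, the EMAIL, FAX, and WWW fields, and MAP coordinates with LATI and LONG), and it looks only at tags, not at how media are exported.

```python
import re

# Hypothetical list of tags the 5.5.1 draft added and 5.5 lacks,
# limited to the features discussed in this article.
FEATURES_551 = {
    "EMAIL": "email address",
    "FAX": "fax number",
    "WWW": "web address",
    "MAP": "map coordinates",
    "LATI": "latitude",
    "LONG": "longitude",
}

# A GEDCOM line: level, optional @XREF@, tag, optional value.
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$")


def features_551(lines):
    """Return (declared GEDCOM version, sorted 5.5.1-only features found)."""
    declared = None
    found = set()
    path = []  # tags from the level-0 record down to the current line
    for raw in lines:
        m = LINE_RE.match(raw)
        if not m:
            continue  # skip blank or malformed lines
        level, tag = int(m.group(1)), m.group(3)
        value = (m.group(4) or "").strip()
        del path[level:]  # drop siblings and deeper levels
        path.append(tag)
        if path == ["HEAD", "GEDC", "VERS"]:
            declared = value
        elif path == ["HEAD", "CHAR"] and value.upper() == "UTF-8":
            found.add("UTF-8 character set")
        elif tag in FEATURES_551:
            found.add(FEATURES_551[tag])
    return declared, sorted(found)


sample = """0 HEAD
1 GEDC
2 VERS 5.5
1 CHAR UTF-8
0 @I1@ INDI
1 RESI
2 EMAIL someone@example.com
0 TRLR""".splitlines()

print(features_551(sample))
# ('5.5', ['UTF-8 character set', 'email address'])
```

A file like this sample says "5.5" in its header but relies on two things 5.5 never defined, which is exactly the situation described above.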
My purpose when testing apps, therefore, is to determine how well they comply with the GEDCOM 5.5.1 standard, and I merely report how well they comply. I can’t say they comply with something if they don’t, simply because they’re using a deviant structure that a majority of apps use. But I acknowledge that complying with the standard is fraught with peril, both for app users and developers. It’s perilous because so many apps don’t follow the standard in various ways. For users who try to clean up their GEDCOM file, as I advised in Parts 1 and 2 of my “Replacing Family Tree Maker Series,” it’s perilous because other apps might ignore or mangle some of their data, even when they are correctly structured. For app developers it’s perilous because so many other apps are doing certain things wrong that if they start doing it right, then other apps or websites might not be able to read their users’ data.
Let’s use the address field as an example. The standard (both 5.5 and 5.5.1) states, “The ADDR and CONT lines are required for any address,” (5.5.1, p. 31). (As a side note, the standard contradicts itself by stating that CONT is required but specifying that there may be 0 to 3 instances of it.) Furthermore, the standard states that if city, state, postal code, or country fields are used, they must be subordinate to the address line, and email, fax number, phone number, and web address, if used, must also be part of the address structure. Finally, the standard states that the address structure, when part of an individual record (not the header, a repository, etc.), must be part of an event detail (either individual or family). As I’ve found in my tests so far, not all apps follow these specs. Many apps, including Ancestral Quest, Legacy Family Tree, The Master Genealogist, Personal Ancestral File (PAF), and RootsMagic (RM), attach the address structure directly to the individual rather than to an event (such as residence or census). One developer, Bruce Buzbee of RM, explained that they do it this way “for the common good”:
“Some of the ‘illegal’ GEDCOM you might see in RM (and other programs) is due to the fact that ‘every other program does it that way’. While we have tried to follow the GEDCOM spec as closely as possible, sometimes it is necessary to tweak the import and export a bit in order to support better transfer between programs. We have an enormous amount of conditional code in our import to support all the different programs (including many that don’t even exist anymore like UFT and Generations)” (RootsMagic Support Ticket #55958: GEDCOM Import Issues).
While this may be true, I doubt that most apps even use the same address structure. For example, FTM uses the address line plus place structure, which isn’t allowed, instead of the separate address element fields. At least some apps use the structure in the spec, including Brother’s Keeper 7, Family Historian 6, and GEDitCOM II.
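The placement rules described above can be checked mechanically. Here is a rough Python sketch; it is my own simplification, not any real validator’s logic. The function name `address_problems` is hypothetical, it looks only at individual (INDI) records, and it checks just two rules: that ADDR hangs off an event rather than the individual, and that the address element fields (ADR1, ADR2, CITY, STAE, POST, CTRY) sit under ADDR.

```python
import re

# A GEDCOM line: level, optional @XREF@, tag, optional value.
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$")
# Address element fields that the standard places under ADDR.
ADDR_PARTS = {"ADR1", "ADR2", "CITY", "STAE", "POST", "CTRY"}


def address_problems(lines):
    """Flag address lines in INDI records that break the 5.5.1 layout."""
    problems = []
    path = []  # tags from the level-0 record down to the current line
    for number, raw in enumerate(lines, start=1):
        m = LINE_RE.match(raw)
        if not m:
            continue
        level, tag = int(m.group(1)), m.group(3)
        del path[level:]
        path.append(tag)
        if path[0] != "INDI":
            continue  # only individual records are checked here
        parent = path[-2] if len(path) > 1 else None
        if tag == "ADDR" and parent == "INDI":
            problems.append((number, "ADDR attached to the individual, not an event"))
        elif tag in ADDR_PARTS and parent != "ADDR":
            problems.append((number, f"{tag} not subordinate to ADDR"))
    return problems


record = """0 @I1@ INDI
1 NAME John /Doe/
1 ADDR 123 Main St
2 CITY Springfield
1 RESI
2 ADDR 456 Oak Ave
2 CITY Shelbyville""".splitlines()

for number, message in address_problems(record):
    print(number, message)
# 3 ADDR attached to the individual, not an event
# 7 CITY not subordinate to ADDR
```

Line 3 is the “attach it to the individual” pattern used by the apps listed above; lines 5 and 6 show the compliant form, with the address under a residence event.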
So app developers have a predicament: do they follow the standard and thus risk losing data when imported by apps that don’t follow the standard, or do they try to follow what a majority of other developers do and thus risk losing data when imported by apps that do follow the standard? To me, the answer seems simple: follow the standard, but then compliance is second nature to me. If everyone followed the spec, we wouldn’t have the problem of developers doing things differently, or creating custom code to accommodate different species of GEDCOM. If there’s a consensus in the genealogy community that something needs to change, then change the standard. But unfortunately, now we have the problem of GEDCOM being frozen in time, while projects like BetterGEDCOM and FHISO go nowhere. In the meantime, users are stuck with the problem of exchanging data between systems that aren’t fully compatible, resulting in loss of data.
At least one app developer, RootsMagic (RM), has decided to try to import data coming from many different apps (even legacy ones). While I think this is commendable, since it prevents the loss of users’ data, it almost helps perpetuate the problem. When it comes to custom fields or even bad GEDCOM grammar from other apps, I think there are four approaches an app could take, listed here from least to most preferable:
- Ignore the custom or bad data without informing the user. This approach should not be taken, but some apps do.
- Notify users in an import log which fields are unrecognized by the app and won’t be imported. This is the bare minimum.
- Import the data as they are with their custom tags (like _MDCL) intact and leave it up to the user to modify the field names and descriptions. This may not be possible with some ungrammatical structures, but I would think it is if the only problem is a missing underscore on a custom tag. This is the next best option for users, but it can perpetuate the problem of not following the GEDCOM standard.
- Import the data and translate the custom tags into their clear text field names, if known. This is the ideal for users, but it can also perpetuate the problem. RootsMagic has tried to take this course, and we can see this in the recent updates to their app to accommodate FTM GEDCOMs. (Update 28 Apr 2016: Now that RootsMagic can read FTM files directly, it can at least import all FTM data, but the problem with other GEDCOMs still stands.)
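To illustrate how approaches 2 through 4 can coexist in one importer, here is a minimal Python sketch of an import log. The tag lists are deliberately tiny and partly assumed: `STANDARD_TAGS` is a small subset of the real standard, the “Medical Condition” label for _MDCL is my assumption about what that custom tag means, and `_XYZ` and `MILT` are made-up examples. The function name `import_log` is hypothetical.

```python
import re

# A GEDCOM line: level, optional @XREF@, tag, optional value.
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$")

# Illustrative subset only; a real importer would load every 5.5.1 tag.
STANDARD_TAGS = {"HEAD", "TRLR", "INDI", "FAM", "NAME", "BIRT", "DEAT",
                 "DATE", "PLAC", "NOTE", "EVEN", "TYPE", "SOUR"}
# Assumed clear-text name for a custom tag this importer knows about.
KNOWN_CUSTOM = {"_MDCL": "Medical Condition"}


def import_log(lines):
    """Return one log message for every line not imported as a standard tag."""
    log = []
    for number, raw in enumerate(lines, start=1):
        m = LINE_RE.match(raw)
        if not m:
            log.append(f"line {number}: could not be parsed")
            continue
        tag = m.group(3)
        if tag in STANDARD_TAGS:
            continue  # imported normally, nothing to report
        if tag in KNOWN_CUSTOM:  # approach 4: translate to a clear-text name
            log.append(f"line {number}: {tag} imported as '{KNOWN_CUSTOM[tag]}'")
        elif tag.startswith("_"):  # approach 3: keep the custom tag intact
            log.append(f"line {number}: custom tag {tag} kept as-is")
        else:  # ungrammatical: a custom tag missing its leading underscore
            log.append(f"line {number}: nonstandard tag {tag} kept as-is")
    return log


sample = ["0 @I1@ INDI", "1 NAME Jane /Doe/", "1 _MDCL Asthma",
          "1 _XYZ something", "1 MILT Army"]
for entry in import_log(sample):
    print(entry)
# line 3: _MDCL imported as 'Medical Condition'
# line 4: custom tag _XYZ kept as-is
# line 5: nonstandard tag MILT kept as-is
```

Nothing is silently dropped (approach 1 is never taken): every line is either imported normally or reported to the user.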
However, it seems to me that if apps can import bad grammar, they ought to be able to import good grammar, too. As for how they export data like addresses, they’re back in the predicament I described above. But it also seems to me that if all developers used good grammar for structures that are defined, they wouldn’t have to make that decision. And then they wouldn’t have to include so much custom code. There should be very little need for custom code, at least for facts and events; the EVEN and FACT structures are elegant solutions for user-defined fields. I don’t even care if no one uses FACT; every app ought to be able to read EVEN.TYPE. I’m pleased that RM translates FTM’s custom tags using that structure—that’s one way of breaking out of the vicious circle caused by bad grammar.
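The EVEN.TYPE approach is simple enough to sketch. The helper below is hypothetical, not RootsMagic’s actual code: it rewrites a custom-tag line as an EVEN line carrying the same value and adds a TYPE line one level down, leaving any substructure such as DATE or PLAC where it was. It assumes the 5.5.1 form in which EVEN may carry an event descriptor, and it assumes _MDCL means a medical condition.

```python
import re

# A GEDCOM line: level, optional @XREF@, tag, optional value.
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$")
# Assumed mapping from custom tags to user-defined event types.
CUSTOM_TO_TYPE = {"_MDCL": "Medical Condition"}


def to_even_type(lines):
    """Rewrite known custom-tag facts as EVEN + TYPE, keeping their substructure."""
    out = []
    for raw in lines:
        m = LINE_RE.match(raw)
        if m and m.group(3) in CUSTOM_TO_TYPE:
            level = int(m.group(1))
            value = (m.group(4) or "").strip()
            out.append(f"{level} EVEN {value}" if value else f"{level} EVEN")
            out.append(f"{level + 1} TYPE {CUSTOM_TO_TYPE[m.group(3)]}")
        else:
            out.append(raw)  # standard or unknown lines pass through unchanged
    return out


before = ["0 @I1@ INDI", "1 _MDCL Asthma", "2 DATE 1990", "1 BIRT Y"]
print("\n".join(to_even_type(before)))
# 0 @I1@ INDI
# 1 EVEN Asthma
# 2 TYPE Medical Condition
# 2 DATE 1990
# 1 BIRT Y
```

Any app that reads EVEN.TYPE gets the data, and no reader needs to know what _MDCL means.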
I’m willing to make one exception for bad grammar in my testing. I’m not LDS, so I didn’t know that FamilySearch expects GEDCOMs to use the WAC tag for the LDS Initiatory field. It amazes me that the LDS Church, which owns FamilySearch, PAF, and GEDCOM, would include WAC in PAF but not GEDCOM, or that FTM used _INIT instead.
In summary, my philosophy is that GEDCOM files should comply with the GEDCOM 5.5.1 standard. This series used to be called “Replacing Family Tree Maker,” back before Software MacKiev bought and resuscitated it, but I changed it to “Family Tree Software Alternatives” because it applies to any app a user might want to switch from. The first two articles in the series were about how to make an FTM GEDCOM more standards compliant so that a greater portion of the data would be imported by other apps. But users need to be aware that a standards-compliant file has risks of its own when imported into non-standards-compliant apps or websites. For at least 16 major apps, however, users can see which parts of their GEDCOM might be at risk using my GEDCOM crosswalk table.
Testing Methodology
- I created a 99.9% standards-compliant GEDCOM file using Family Tree Maker 3 (for Mac), TextEdit, and GedPad Pro. I followed the steps in Part 1 and Part 2 of “Family Tree Software Alternatives.”
- Test GEDCOM file: contains 7 people, 3 marriages (including 1 to a non-existent spouse), 1 adoption, 2 media files, and 2 sources (including one using an FTM template) with two citations each. It contains every available field in FTM, including at least 1 of every kind of note field. I deliberately included two ungrammatical elements to test how other apps handled them: an ALIA tag used for the Also Known As field, and an illegal description on a birth field that also included a date and place (only the letter Y is permitted in this case). The test GEDCOM file is available for inspection upon request (use the comment section below).
- I tested my GEDCOM file using the GigaTrees VGedX GEDCOM Validator, which my independent tests have shown to correctly parse GEDCOM files. (Update 28 Apr 2016: VGedX has been discontinued, so I am now using GED-Inline and the Chronoplex validator.) Here are the results of the test:
GEDCOM File: Test GEDCOM Import Master.ged
GEDCOM Version: 5.5.1
GEDCOM Encoding: Non-ANSI
Product Vendor: Ancestry.com
Product ID: FTM
Product Name: Family Tree Maker for Mac OS X
Product Version: 22.2.5
ID reference substitution (line 143): INDI.ALIA
Maximum data length exceeded (line 250): 1 BIRT Probably in Szepes County
Maximum data length exceeded (line 470): 1 FILE ~/Documents/Family Tree Maker/Test GEDCOM E...
Maximum data length exceeded (line 475): 1 FILE ~/Documents/Family Tree Maker/Test GEDCOM E...
- I imported the test GEDCOM file into the app I was testing. I used the full version of the app when possible, but failing that used the least-disabled version I could obtain. The version tested is listed before the “Pros” section of the report. In cases where I did not use the full version, app developers may provide me with a license for the full version if they wish.
- Test machines: 2008 MacBook Pro with 6 GB RAM and 1 TB SSD running OS X 10.10 (Yosemite) and 2012 MacBook Pro with 16 GB RAM and 1 TB SSD running OS X 10.11 (El Capitan). I ran Windows apps using the edition bundled with CrossOver, if one existed; otherwise I bundled them with the Wine engine myself using WineBottler, or, failing that, ran them in Windows 8.1 using Parallels 9 for Mac.
- After importing the test GEDCOM, I checked for an import log to see what errors the app identified. These were usually custom fields or bad grammar, but the log sometimes contained false positives, i.e., errors that were not really errors.
- Before making any changes to the file within the app, other than to add submitter information, I immediately exported the file to a new GEDCOM and compared it to the original GEDCOM using the app DiffMerge (thanks to Tim Forsythe of GigaTrees for this suggestion). This helped me identify errors the app’s import log missed and zero in on problems with the app’s handling of GEDCOM.
- I went through every piece of data in the app’s family tree file, using my original GEDCOM as a checklist, to determine how the data were imported and if any were missing.
- I updated the GEDCOM crosswalk with the names of the fields used by the app.
- I used the app’s GEDCOM file, if available, to update the GEDCOM crosswalk with the tags it used.
- I noted my findings as I went and later expanded them into the test report. I also noted a few other pros and cons having to do with usability, etc., but these reports focus on GEDCOM compliance. I will test and report for usability after the GEDCOM testing is complete.
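DiffMerge compares text line by line, which can get noisy when an app renumbers record IDs or reorders records. As a complement to that step, here is a minimal Python sketch of a structural comparison that counts dotted tag paths such as INDI.BIRT.DATE in the original and re-exported files. The function names are hypothetical, and because the sketch ignores values, it would not catch a changed birth description; it only shows which kinds of data were dropped or added on the round trip.

```python
import re
from collections import Counter

# A GEDCOM line: level, optional @XREF@, tag, optional value.
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$")


def tag_paths(lines):
    """Count dotted tag paths such as INDI.BIRT.DATE, ignoring IDs and values."""
    counts = Counter()
    path = []
    for raw in lines:
        m = LINE_RE.match(raw)
        if not m:
            continue
        level, tag = int(m.group(1)), m.group(3)
        del path[level:]
        path.append(tag)
        counts[".".join(path)] += 1
    return counts


def round_trip_diff(original, exported):
    """Return (lost, added): tag paths that disappeared or appeared on export."""
    before, after = tag_paths(original), tag_paths(exported)
    # Counter subtraction keeps only positive counts.
    return dict(before - after), dict(after - before)


original = ["0 @I1@ INDI", "1 NAME Jane /Doe/", "1 BIRT Y",
            "2 DATE 1 JAN 1900", "1 _MDCL Asthma", "0 TRLR"]
exported = ["0 @P1@ INDI", "1 NAME Jane /Doe/", "1 BIRT",
            "2 DATE 1 JAN 1900", "0 TRLR"]

lost, added = round_trip_diff(original, exported)
print(lost)   # {'INDI._MDCL': 1}
print(added)  # {}
```

The renumbered ID (@I1@ to @P1@) doesn’t register as a difference, while the dropped custom field does.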
Disclosures
I have no conflicts of interest to report. I receive no compensation from any app developer or website, including GenealogyTools.com and Mint Yogi LLC, other than a complimentary membership to GenealogyTools.com.
As usual, comments or questions are welcome in the comments. Although I may not respond to all of them, since I’m trying to finish the initial tests, I read every one.
*Updates:
Thanks to Louis Kessler for reminding me about UTF-8.
2 Jan 2016: Added a credit to Tamura Jones. Apologies to Tamura for the oversight of not including it in the first place. Tamura has also pointed out the problems with the address structure. Also added a quotation by Bruce Buzbee and a disclosure statement.
28 Apr 2016: Made changes based on Family Tree Maker’s continued existence and RootsMagic’s ability to directly open FTM files. Also added a note about the discontinuation of VGedX.
Arlene Miles says
My ancestry.com tree data in GEDCOM transferred fine; however, none of the photos attached to individuals transferred.
Keith says
That’s correct; the only way to get the photos in a tree from Ancestry.com is to download it using Family Tree Maker. This is a known limitation of Ancestry GEDCOMs; I’ve complained to them about it.
Teresa says
I am finding the discussion about GEDCOM very interesting and educational – thanks Keith! I’m a Reunion user (although I did think about FTM for a while….glad I didn’t go there now!), so don’t have to worry at the moment about the GEDCOM compatibility issue, but looking at the crosswalk table got me wondering…..
I don’t deeply understand the way importing GEDCOM works, but I was wondering what happens when you create an event type in an App (in my case Reunion), and give it a GEDCOM code? For example, in the crosswalk table it says Reunion is missing the Probate event. As I needed this event, I created it in Reunion, and gave it the GEDCOM tag PROB. Does this mean that if I create a GEDCOM file for export into another App, it will be imported correctly, seeing as the event has the “correct” GEDCOM tag (assuming of course that the receiving App can accept it)?
Thanks for any clarity you can provide, Keith.
Keith says
Teresa, if you ever try to export your Reunion file to GEDCOM, you will have issues, unless you correct them ahead of time. Yes, PROB is the standard GEDCOM tag for Probate, so assuming Reunion exports it correctly, you should be fine. For user-defined events or facts that do not have standard tags, the best course of action is to use Reunion’s Misc. Event and then specify your own type. These should also be exported correctly, and all other apps and websites should be able to import them.
Teresa says
Thanks Keith – makes sense! I’ve created a couple of other event types that Reunion did not have, and each time have consulted the GEDCOM standard and used the “correct” tag. So hopefully if I ever need to export, I’ll be fine :-).
Really appreciate and enjoy your articles!
lkessler says
Test comment
Keith says
Have any other readers had trouble leaving a comment on this blog? If so, please describe the problem in detail.
Mike says
Just came across your blog and glad for it. I appreciate the detail you’ve gone into. I’ll be back!
Matt Petersen says
Having been frustrated/confused by multiple programs, Windows and Mac, by the inconsistencies in their use of GEDCOMs, I am grateful for what you’re doing in this series. A bit amazed, too; it is a daunting undertaking.
One of my primary goals in the use of genealogical software is to be able to provide the data to other family members interested in working further with it. This means being able to give it to them in a format that can be loaded into whatever program they want to use, not in form of reports or output for web browsers to read. I was conned into believing for a long time that the reliable way to do this was through GEDCOM files. But each program seems to have its own inconsistencies in reading and/or writing those files, and most programs internally keep track of data that cannot easily, if at all, be included in a GEDCOM.
For the past few years, I have abandoned the “whatever program they want to use” part of my goal by going with GRAMPS because it is: 1) free; 2) works on Windows, Mac and Linux; 3) extremely powerful in its ability to label, filter, privatize and otherwise mince and dice data, 4) open source and 5) has a very active development community that openly shows what issues are being addressed and allows any user to participate to one degree or another in that process. I do use other programs on occasion for specific purposes, but my main data repository is in GRAMPS. That doesn’t mean that GRAMPS handles GEDCOM-related things better than the other programs. But in my opinion, GEDCOM is a “standard” in name only and has become a “lowest common denominator” instead.
Anyway, thanks so much for your series and especially for explaining exactly how you’re going about your assessments. I’m really looking forward to reading about how FTM GEDCOM output can/should be handled by GRAMPS users.
Keith says
I look forward to trying out Gramps again, too!
Jim Stuckey says
Thoughtful, comprehensive, excellent, the more I understand the harder it gets. Thanks so much for your dedication.
Gal inAZ says
Another comment on standards, I worked on them for 20+ years in the financial industry. You have to have an enforcement body that can impose meaningful sanctions and a fast change process that responds to market needs. Without those, the most successful implementers were the companies that made a business of doing conversions from one flavor of the standard to the next.
Mike Tate says
The comments about DRAFT GEDCOM 5.5.1 inexactitudes are not surprising as it was under review and never formally published. Its first page says: “This document may be copied for purposes of review only. It must not be used for programming of genealogical software while in draft.” The software developers who have ignored that only have themselves to blame, and that includes PAF. The DRAFT has even more loopholes than GEDCOM 5.5 so is impossible to adhere to consistently. That is a pity because it does propose some useful enhancements that different products implement to varying degrees.
GEDCOM Pros: I believe the general record and field structures have influenced most genealogy products without which data interchange would be exceedingly more difficult.
GEDCOM Cons: No mention is made of NEW_TAG user-defined tags with a leading underscore, that every genealogy product employs, but makes its GEDCOM dialect unlikely to be imported satisfactorily by any other product without loss of data.
Keith says
What’s the subject of your sentence, “No mention is made…”? It’s passive voice, so I can’t tell if the subject is the GEDCOM standard or me. The GEDCOM 5.5.1 standard discusses user-defined tags on pp. 17, 56, 74 and 83.
Paul R Culley says
I’ve been trying to convert from FTM2014 (Win) to Gramps; I’ve been warned by your blogs (Thanks) about many possible pitfalls and am trying to deal with them specifically in the Gramps import. I’ve even been contemplating “tweaking” the gramps gedcom import code to make this easier so editing of gedcoms is not necessary. If I do change the Gramps code, I expect to submit the changes for potential inclusion into Gramps.
But my database almost certainly doesn’t contain all of the possible problems. Would you be willing to send me your test files, preferably the FTM files before gedcom export and edit (as well as your final gedcoms)?
Keith says
Paul, I’ll email you my test files. Please let us know your experience with Gramps in the comments at Part 8: Importing Your FTM Tree into Gramps 4.
dianemahoney says
Hi, Keith! I have been resolving “places” in FTM, and expect to be mostly done in a few weeks. I am anxious to upload the free MacKiev upgrade for FTM (left a message on another article blog) so that I do not have to worry about trying to rush this process to transfer to Legacy. I am a little concerned that the GEDCOM on 2014 FTM may mean a loss of data. Do you know if the free update has a better GEDCOM than current 2014 FTM?
Keith says
I have not seen the FTM 2014 upgrade yet; I doubt that anyone will see it until they push it out around March 1st. Developers usually do beta testing only for major changes to a program. But I doubt that there will be any improvements to GEDCOM, even though I provided feedback to Ancestry back in Dec.
Have you seen Parts 1 and 2 of my series on Replacing Family Tree Maker about scrubbing your data and exporting your FTM file to GEDCOM? Those should help you maximize the amount of data that’s exported cleanly. Also see my review of how Legacy handles GEDCOM.
I really don’t think there is any rush to get away from FTM and transfer to a different application. My advice is to take your time and clean up your FTM tree first. Also try out several different applications to make sure you like one before you buy it.
Tim Forsythe says
Keith, I brought Gigatrees back (sort of). It has been converted from a web service to a downloadable app. It still supports VGedX. Thanks for recommending it BTW.
Keith Riggle says
Tim, that’s great news!
Matt Harrah says
As someone who is actively writing and maintaining a program for parsing and writing GEDCOM data you are spot on here. Compliance with the spec is really spotty out there and trying to write something that is forgiving of input but strict on output is a lot of work. Thanks for the article.