After testing a few desktop genealogy applications for how well they handled GEDCOM files, it occurred to me that I should write an article about my testing philosophy and methodology, as well as some of the problems with complying with the GEDCOM standard. Take a moment to ponder that: this series was motivated by the problem of genealogy software like Family Tree Maker (FTM) not complying with the GEDCOM standard. But there may be unintended consequences, both for users and software publishers, of following the standard.
First, my philosophy (and a little background). I was a personnel officer in the US Air Force for 20 years. The military has regulations for just about everything, from how to wear a uniform to how to service a Pratt & Whitney F135 engine. Along with the regulations come checklists to help ensure the regulations were complied with. Checklists are used at every level from the workshop to the Inspector General. The military cultivates a culture of compliance (or at least attempts to), so when I decided to test genealogy apps, I looked for a checklist. As a user and usability/compliance tester, I have to test all the apps the same way, or it wouldn’t be fair to the apps or useful to other users.
The checklist I chose was the GEDCOM standard, and the 5.5.1 version in particular. But 5.5.1 is just a draft, you say, and 5.5 is the most current standard (dating to 1995); why 5.5.1? Well, because it’s actually the current standard, whether app developers admit it or not. Tamura Jones has made this point many times*. If an app supports the UTF-8 character set* or includes fields for email, fax, or web address, then it at least partially supports 5.5.1. And how many apps do you know that export media as embedded binary objects, which was one of two options in 5.5? There are other changes included in 5.5.1 as well, like the addition of latitude and longitude coordinates, so if apps include them, they support 5.5.1, even if their GEDCOMs are labeled as 5.5.
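To make that point concrete, here is a minimal Python sketch that scans a GEDCOM file for the 5.5.1 features named above and compares them with the version declared in the header. The function name, the tag set, and the sample record are my own illustrations, and the tag set covers only the features mentioned in this article, not every change in 5.5.1.

```python
# Tags for the 5.5.1 additions named in the article: email, fax, web address,
# and latitude/longitude. Illustrative only, not the full list of changes.
TAGS_5_5_1 = {"EMAIL", "FAX", "WWW", "LATI", "LONG"}


def features_of_5_5_1(gedcom_text):
    """Return (declared_version, sorted list of 5.5.1 features found)."""
    declared = None
    found = set()
    path = []  # tags of the enclosing lines, indexed by level
    for raw in gedcom_text.splitlines():
        fields = raw.strip().split(" ", 2)
        if len(fields) < 2 or not fields[0].isdigit():
            continue  # skip blank or malformed lines
        level = int(fields[0])
        tag = fields[1]
        value = fields[2] if len(fields) > 2 else ""
        if tag.startswith("@") and value:
            # Record lines such as "0 @I1@ INDI" put the id before the tag.
            tag, _, value = value.partition(" ")
        del path[level:]
        path.append(tag)
        if path == ["HEAD", "GEDC", "VERS"]:
            declared = value
        elif path == ["HEAD", "CHAR"] and value.upper() == "UTF-8":
            found.add("CHAR UTF-8")
        elif tag in TAGS_5_5_1:
            found.add(tag)
    return declared, sorted(found)


# Hypothetical sample: labeled 5.5 but using UTF-8 and an EMAIL line.
SAMPLE = """\
0 HEAD
1 GEDC
2 VERS 5.5
2 FORM LINEAGE-LINKED
1 CHAR UTF-8
0 @I1@ INDI
1 NAME Jan /Novak/
1 RESI
2 ADDR 12 Main Street
2 EMAIL jan@example.com
0 TRLR
"""

declared, features = features_of_5_5_1(SAMPLE)
print(f"Declared version: {declared}; 5.5.1 features used: {features}")
```

A file that declares 5.5 but returns a non-empty feature list is, in the sense described above, already a partial 5.5.1 file.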
My purpose when testing apps, therefore, is to determine how well they comply with the GEDCOM 5.5.1 standard, and I merely report how well they comply. I can’t say they comply with something if they don’t, simply because they’re using a deviant structure that a majority of apps use. But I acknowledge that complying with the standard is fraught with peril, both for app users and developers. It’s perilous because so many apps don’t follow the standard in various ways. For users who try to clean up their GEDCOM file, as I advised in Parts 1 and 2 of my “Replacing Family Tree Maker Series,” it’s perilous because other apps might ignore or mangle some of their data, even when the data are correctly structured. For app developers, it’s perilous because so many other apps are doing certain things wrong that if they start doing them right, other apps or websites might not be able to read their users’ data.
Let’s use the address field as an example. The standard (both 5.5 and 5.5.1) states, “The ADDR and CONT lines are required for any address,” (5.5.1, p. 31). (As a side note, the standard contradicts itself by stating that CONT is required but specifying that there may be 0 to 3 instances of it.) Furthermore, the standard states that if city, state, postal code, or country fields are used, they must be subordinate to the address line, and email, fax number, phone number, and web address, if used, must also be part of the address structure. Finally, the standard states that the address structure, when part of an individual record (not the header, a repository, etc.), must be part of an event detail (either individual or family). As I’ve found in my tests so far, not all apps follow these specs. Many apps, including Ancestral Quest, Legacy Family Tree, The Master Genealogist, Personal Ancestral File (PAF), and RootsMagic (RM), attach the address structure directly to the individual rather than to an event (such as residence or census). One developer, Bruce Buzbee of RM, explained they do it this way “for the common good:”
“Some of the ‘illegal’ GEDCOM you might see in RM (and other programs) is due to the fact that ‘every other program does it that way’. While we have tried to follow the GEDCOM spec as closely as possible, sometimes it is necessary to tweak the import and export a bit in order to support better transfer between programs. We have an enormous amount of conditional code in our import to support all the different programs (including many that don’t even exist anymore like UFT and Generations)” (RootsMagic Support Ticket #55958: GEDCOM Import Issues).
While this may be true, I doubt that most apps even use the same address structure. For example, FTM uses the address line plus a place structure, which isn’t allowed, instead of the separate address element fields. At least some apps use the structure in the spec, including Brother’s Keeper 7, Family Historian 6, and GEDitCOM II.
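The difference between attaching an address to an event and attaching it directly to the individual is easier to see in an actual file. The Python sketch below reports, for each ADDR line inside an individual record, which tag it hangs from. The function name and both sample records are hypothetical; the first record follows the event-based placement described above, and the second imitates the direct attachment many apps use.

```python
def address_placements(gedcom_text):
    """Return (individual_id, parent_tag) for every ADDR line in an INDI record."""
    results = []
    path = []       # tags of the enclosing lines, indexed by level
    current = None  # id of the INDI record being read, if any
    for raw in gedcom_text.splitlines():
        fields = raw.strip().split(" ", 2)
        if len(fields) < 2 or not fields[0].isdigit():
            continue  # skip blank or malformed lines
        level = int(fields[0])
        if fields[1].startswith("@") and len(fields) == 3:
            # Record lines such as "0 @I1@ INDI" put the id before the tag.
            xref, tag = fields[1], fields[2].split(" ", 1)[0]
        else:
            xref, tag = None, fields[1]
        if level == 0:
            current = xref if tag == "INDI" else None
        del path[level:]
        path.append(tag)
        well_nested = len(path) == level + 1
        if tag == "ADDR" and current and level >= 1 and well_nested:
            results.append((current, path[level - 1]))
    return results


# Hypothetical sample data for illustration only.
SAMPLE = """\
0 @I1@ INDI
1 NAME Anna /Kovacs/
1 RESI
2 DATE 1920
2 ADDR 12 Main Street
3 CITY Springfield
0 @I2@ INDI
1 NAME Josef /Kovacs/
1 ADDR 34 Oak Avenue
2 CITY Springfield
0 TRLR
"""

for person, parent in address_placements(SAMPLE):
    where = "directly on the individual" if parent == "INDI" else f"under {parent}"
    print(f"{person}: address attached {where}")
```

An importer that expects only one of these placements will silently miss addresses written the other way, which is the predicament described next.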
So app developers have a predicament: do they follow the standard and thus risk losing data when imported by apps that don’t follow the standard, or do they try to follow what a majority of other developers do and thus risk losing data when imported by apps that do follow the standard? To me, the answer seems simple: follow the standard, but then compliance is second nature to me. If everyone followed the spec, we wouldn’t have the problem of developers doing things different ways, or creating custom code to accommodate different species of GEDCOM. If there’s a consensus in the genealogy community that something needs to change, then change the standard. But unfortunately, now we have the problem of GEDCOM being frozen in time, while projects like BetterGEDCOM and FHISO go nowhere. In the meantime, users are stuck with the problem of exchanging data between systems that aren’t fully compatible, resulting in loss of data.
At least one app developer, RootsMagic, has decided to try to import data coming from many different apps (even legacy ones). While I think that’s commendable because it prevents loss of users’ data, it arguably helps perpetuate the problem. When it comes to custom fields or even bad GEDCOM grammar from other apps, I think there are four approaches an app could take, listed from least to most preferable:
- Ignore the custom or bad data without informing the user. This approach should not be taken, but some apps do.
- Notify users in an import log which fields are unrecognized by the app and won’t be imported. This is the bare minimum.
- Import the data as they are with their custom tags (like _MDCL) intact and leave it up to the user to modify the field names and descriptions. This may not be possible with some ungrammatical structures, but I would think it is if the only problem is a missing underscore on a custom tag. This is the next best option for users, but it can perpetuate the problem of not following the GEDCOM standard.
- Import the data and translate the custom tags into their clear text field names, if known. This is the ideal, for users, but it can also perpetuate the problem. RootsMagic has tried to take this course, and we can see this in the recent updates to their app to accommodate FTM GEDCOMs. (Update 28 Apr 2016: Now that RootsMagic can read FTM files directly, it can at least import all FTM data, but the problem with other GEDCOMs still stands.)
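The second approach, the bare-minimum import log, might look something like the following Python sketch. The function name, the set of "known" tags, and the sample data are assumptions made for illustration; a real importer would recognize far more tags and would probably report more detail.

```python
from collections import Counter


def import_log(gedcom_text, known_tags):
    """List tags the importer does not recognize, with counts and first line."""
    counts = Counter()
    first_seen = {}
    for number, raw in enumerate(gedcom_text.splitlines(), start=1):
        fields = raw.strip().split(" ", 2)
        if len(fields) < 2 or not fields[0].isdigit():
            continue  # skip blank or malformed lines
        tag = fields[1]
        if tag.startswith("@") and len(fields) == 3:
            # Record lines such as "0 @I1@ INDI" put the id before the tag.
            tag = fields[2].split(" ", 1)[0]
        if tag not in known_tags:
            counts[tag] += 1
            first_seen.setdefault(tag, number)
    return [
        f"line {first_seen[tag]}: tag {tag} not recognized; "
        f"{counts[tag]} occurrence(s) will not be imported"
        for tag in sorted(counts)
    ]


# Hypothetical importer vocabulary and sample file.
KNOWN = {"HEAD", "INDI", "NAME", "TRLR"}
SAMPLE = """\
0 HEAD
0 @I1@ INDI
1 NAME Anna /Kovacs/
1 _MDCL Asthma
0 @I2@ INDI
1 NAME Josef /Kovacs/
1 _MDCL Diabetes
0 TRLR
"""

for entry in import_log(SAMPLE, KNOWN):
    print(entry)
```

Even a log this simple tells users what they are about to lose, which the first approach never does.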
However, it seems to me that if apps can import bad grammar, they ought to be able to import good grammar, too. As for how they export data like addresses, they’re back in the predicament I described above. But it also seems to me that if all developers used good grammar for structures that are defined, they wouldn’t have to make that decision. And then they wouldn’t have to include so much custom code. There should be very little need for custom code, at least for facts and events; the EVEN and FACT structures are elegant solutions for user-defined fields. I don’t even care if no one uses FACT; every app ought to be able to read EVEN.TYPE. I’m pleased that RM translates FTM’s custom tags using that structure—that’s one way of breaking out of the vicious circle caused by bad grammar.
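As a rough illustration of translating a custom tag into the EVEN.TYPE structure, here is a small Python sketch. It is not RootsMagic's code. The mapping of _MDCL to the label "Medical" is my assumption, as is the choice to keep the original value on the EVEN line, and the sketch handles only level-1 tags.

```python
# Hypothetical mapping from custom tags to plain-text labels. "_MDCL" appears
# in the article; the label "Medical" is an assumption made for this example.
CUSTOM_LABELS = {"_MDCL": "Medical"}


def translate_custom_tags(gedcom_text, labels=CUSTOM_LABELS):
    """Rewrite known level-1 custom tags as EVEN lines with a subordinate TYPE."""
    out = []
    for raw in gedcom_text.splitlines():
        fields = raw.strip().split(" ", 2)
        if len(fields) >= 2 and fields[0] == "1" and fields[1] in labels:
            value = fields[2] if len(fields) > 2 else ""
            # Keeping the value on the EVEN line is a choice made for this sketch.
            out.append(f"1 EVEN {value}".rstrip())
            out.append(f"2 TYPE {labels[fields[1]]}")
        else:
            # Everything else, including subordinate lines, passes through as-is.
            out.append(raw)
    return "\n".join(out)


# Hypothetical sample data for illustration only.
SAMPLE = """\
0 @I1@ INDI
1 NAME Anna /Kovacs/
1 _MDCL Asthma
2 DATE 1931
0 TRLR
"""

print(translate_custom_tags(SAMPLE))
```

Because the subordinate DATE line is already at level 2, it stays attached to the new EVEN line without any renumbering.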
- I created a 99.9% standards-compliant GEDCOM file using Family Tree Maker 3 (for Mac), TextEdit, and GedPad Pro. I followed the steps in Part 1 and Part 2 of “Family Tree Software Alternatives.”
- Test GEDCOM file: contains 7 people, 3 marriages (including 1 to a non-existent spouse), 1 adoption, 2 media files, and 2 sources (including one using an FTM template) with two citations each. It contains every available field in FTM, including at least 1 of every kind of note field. I deliberately included two ungrammatical elements to test how other apps handled them: an ALIA tag used for the Also Known As field, and an illegal description on a birth field that also included a date and place (only the letter Y is permitted in this case). The test GEDCOM file is available for inspection upon request (use the comment section below).
- I tested my GEDCOM file using the GigaTrees VGedX GEDCOM Validator, which my independent tests have shown to correctly parse GEDCOM files. (Update 28 Apr 2016: VGedX has been discontinued, so I am now using GED-Inline and the Chronoplex validator.) Here are the results of the test:
  GEDCOM File: Test GEDCOM Import Master.ged
  GEDCOM Version: 5.5.1
  GEDCOM Encoding: Non-ANSI
  Product Vendor: Ancestry.com
  Product ID: FTM
  Product Name: Family Tree Maker for Mac OS X
  Product Version: 22.2.5
  Line 143: ID reference substitution (INDI.ALIA)
  Line 250: Maximum data length exceeded (1 BIRT Probably in Szepes County)
  Line 470: Maximum data length exceeded (1 FILE ~/Documents/Family Tree Maker/Test GEDCOM E...)
  Line 475: Maximum data length exceeded (1 FILE ~/Documents/Family Tree Maker/Test GEDCOM E...)
- I imported the test GEDCOM file into the app I was testing. I used the full version of the app when possible, but failing that used the least-disabled version I could obtain. The version tested is listed before the “Pros” section of the report. In cases where I did not use the full version, app developers may provide me with a license for the full version if they wish.
- Test machines: 2008 MacBook Pro with 6 GB RAM and 1 TB SSD running OS X 10.10 (Yosemite) and 2012 MacBook Pro with 16 GB RAM and 1 TB SSD running OS X 10.11 (El Capitan). I ran Windows apps in one of three ways: using the edition bundled with CrossOver, bundling them with the Wine engine myself using WineBottler, or, failing that, running them in Windows 8.1 using Parallels 9 for Mac.
- After importing the test GEDCOM, I checked for an import log to see what errors the app identified. These were usually either custom fields or bad grammar, but the log sometimes contained false positives, i.e., errors that were not really errors.
- Before making any changes to the file within the app, other than to add submitter information, I immediately exported the file to a new GEDCOM and compared it to the original GEDCOM using the app DiffMerge (thanks to Tim Forsythe of GigaTrees for this suggestion). This helped me identify errors the app’s import log missed and zero in on problems with the app’s handling of GEDCOM.
- I went through every piece of data in the app’s family tree file, using my original GEDCOM as a checklist, to determine how the data were imported and if any were missing.
- I updated the GEDCOM crosswalk with the names of the fields used by the app.
- I used the app’s GEDCOM file, if available, to update the GEDCOM crosswalk with the tags it used.
- I noted my findings as I went and later expanded them into the test report. I also noted a few other pros and cons having to do with usability, etc., but these reports focus on GEDCOM compliance. I will test and report for usability after the GEDCOM testing is complete.
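For readers who want to try the export-and-compare step without DiffMerge, the Python sketch below does a similar line-by-line comparison using the standard library's difflib module. It is a stand-in for the tool I actually used, not a description of it, and the function name, file labels, and sample data are hypothetical.

```python
import difflib


def round_trip_diff(original_text, exported_text):
    """Return the lines that differ between the original and re-exported GEDCOM."""
    diff = difflib.unified_diff(
        original_text.splitlines(),
        exported_text.splitlines(),
        fromfile="original.ged",   # labels only; no files are read
        tofile="exported.ged",
        lineterm="",
        n=0,                       # no context lines, only the changes
    )
    # Keep added ("+") and removed ("-") lines; drop file headers and hunk markers.
    return [
        line
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


# Hypothetical sample: the exporting app silently dropped a custom tag.
ORIGINAL = """\
0 @I1@ INDI
1 NAME Anna /Kovacs/
1 _MDCL Asthma
0 TRLR
"""
EXPORTED = """\
0 @I1@ INDI
1 NAME Anna /Kovacs/
0 TRLR
"""

for change in round_trip_diff(ORIGINAL, EXPORTED):
    print(change)
```

Lines prefixed with "-" exist only in the original and lines prefixed with "+" exist only in the export, so anything the import log failed to mention shows up here.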
I have no conflicts of interest to report. I receive no compensation from any app developer or website, including GenealogyTools.com and Mint Yogi LLC, other than a complimentary membership to GenealogyTools.com.
As usual, comments or questions are welcome in the comments. Although I may not respond to all of them, since I’m trying to finish the initial tests, I read every one.
Thanks to Louis Kessler for reminding me about UTF-8.
2 Jan 2016: Added a credit to Tamura Jones. Apologies to Tamura for the oversight of not including it in the first place. Tamura has also pointed out the problems with the address structure. Also added a quotation by Bruce Buzbee and a disclosure statement.
28 Apr 2016: Made changes based on Family Tree Maker’s continued existence and RootsMagic’s ability to directly open FTM files. Also added a note about the discontinuation of VGedX.