Spitting for Science: The Truth about DNA Testing and Privacy

It was late October, 1997, and the controversial film Boogie Nights opened in theaters nationwide. Mark Wahlberg’s portrayal of dishwasher-turned-pornstar, Dirk Diggler, had everyone and your mom lining up to see some award-winning acting. Meanwhile, Gattaca, a dystopian-future sci-fi about the social implications of unlocking the human genome, flopped at the box office.

Gattaca tells the story of Vincent Freeman (Ethan Hawke), a product of old-fashioned backseat conception, who struggles to hide his imperfections from his genetically-engineered peers. In Vincent’s world, the DNA of every citizen is stored in a giant database. A social structure emerges, relegating the genetically flawed to subordinate positions, while promoting the genetically perfect to stations of authority.

It’s twenty years later, and Boogie Nights is just a hazy roller-skating memory. Thanks to the growing use of at-home DNA saliva tests, however, Gattaca has a newfound relevance. Recent articles cite the film as a prophetic vision and criticize companies like 23andme for creating DNA databases that may be used for nefarious purposes.

While the threat of losing our most personal information, our genetic code, is quite real, private DNA companies are not the enemy. Contrary to popular opinion, the best way to regain ownership of your DNA is to acquire your data from a private testing service, because like it or not, the government already has it.

As predicted, it is now possible to test embryos for fatal genetic disorders. With a few drops of saliva, a DNA sequencer can figure out the color of your eyes, hair, and skin. You can also find out if you hate cilantro without even tasting it, or if eating asparagus will make your urine smell bad.

With the help of industry leader 23andMe and other independent research groups, we may soon be able to predict the onset of diseases and identify which medications will be most effective at treating them. Right now, 23andMe is beginning a study of 25,000 research participants in what could be the most intensive study ever of bipolar disorder.

This technology doesn’t only apply to medicine. With the help of DNA and ancestral databases, adoptees can find birth parents and biological relatives much more easily than they could a decade ago. Genealogists can use the data to verify branches in a family tree. Anthropologists can draw more precise conclusions about ancient societies. However, Luddites and technophiles alike are warning against the use of this technology.

Scientific American calls 23andMe “terrifying,” citing potential privacy issues. A New York Times opinion piece by the executive director of the Center for Genetics and Society claims that DNA data should not be in the hands of a privately held company, but instead handled solely by the government and associated nonprofit organizations (like the Center for Genetics and Society, perhaps?). Critics like these suggest that by voluntarily sharing your DNA via saliva sample, aka a “spit kit,” you are signing away the rights to your most personal information.

Consumers are understandably concerned, because this data in the wrong hands can open the door to a host of negative consequences. What if insurance companies use this information to refuse coverage or raise premiums? What if employers refuse to hire people who don’t meet specific genetic criteria? What if Anne Wojcicki, co-founder and chief executive officer of 23andMe, is secretly building a Neanderthal army? These are mostly valid questions, but they ignore a very important fact. The “bad corporations” and the government already have your DNA.

What’s that? You haven’t taken a DNA test? It doesn’t matter. If you have had children in the past few years, the government has your DNA. Per the ACLU:

The DNA of virtually every newborn in the United States is collected and tested soon after birth…Today it is increasingly common for states to hold onto these samples for years, even permanently. Some states also use the samples for unrelated purposes, such as in scientific research, and give access to the samples to others.

Without getting too far into the boring details of genetic recombination, it can be simply stated that a baby gets 50% of its DNA from each parent. Though the exact amount inherited from grandparents can vary, it’s about 25%. Similarly, a baby will share 25% of its genetic makeup with half-siblings, aunts, and uncles. If a person shares half or 1/4 of your DNA and they’re in a federal database, you might as well be on there, too.
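The percentages above follow a simple halving pattern, which can be sketched in a few lines of Python. This is only an illustration built on textbook averages (actual sharing varies because of recombination); the table and the two helper functions are hypothetical names, and the full-sibling and first-cousin rows are added for context rather than taken from the text.

```python
# Expected share of autosomal DNA between relatives (textbook averages;
# real values vary from person to person because of recombination).
EXPECTED_SHARED = {
    "parent": 0.50,
    "full sibling": 0.50,
    "grandparent": 0.25,
    "half-sibling": 0.25,
    "aunt/uncle": 0.25,
    "first cousin": 0.125,
}


def expected_shared(relationship):
    """Return the average fraction of DNA shared with the given relative."""
    return EXPECTED_SHARED[relationship]


def relatives_sharing_at_least(threshold):
    """List relationships whose expected sharing meets the threshold."""
    return sorted(r for r, f in EXPECTED_SHARED.items() if f >= threshold)


if __name__ == "__main__":
    for rel, frac in EXPECTED_SHARED.items():
        print(f"{rel}: about {frac:.1%}")
```

Calling `relatives_sharing_at_least(0.25)` lists the relatives whose presence in a database exposes a quarter or more of your own genome.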

What’s that? You don’t have children? Sorry, but your data is still out there. Just ask the son of Los Angeles’s ‘Grim Sleeper’ or the brother of Sacramento’s ‘Roaming Rapist’.

Police became frustrated, unable to identify him even though they had his DNA from the crimes. Desperate for a break, they checked a database of convicted felons, but came up empty-handed.

Finally, they searched for a partial match to see whether he had a relative in the database. They got lucky — the man had a brother in custody, which led authorities to the assailant.

The “Roaming Rapist” is one of a handful of cases that California authorities have quietly solved in recent years using a controversial technique that scours an offender DNA database for a father, son or brother of an elusive crime suspect.

Still not buying it? You aren’t a felon with a CODIS (Combined DNA Index System) profile? Your genetic data is still out there. The Roaming Rapist and his brother share about 50% of their DNA, but they also share a very special kind of DNA that is carried only by men. Called Y-DNA, this code is passed nearly unchanged from father to son, generation after generation.

Any direct male descendant of the Roaming Rapist or his brother, from now until the end of human history, will carry this identifying code. Although Y-DNA is primarily used to study migration patterns of ethnic groups and to follow paternal surnames back several centuries in tandem with genealogy studies, it is still a form of DNA that is widely collected and shared.

And sure, the type of DNA data used by CODIS (STRs, or short tandem repeats) isn’t the same as the data used by other types of DNA tests (SNPs, or single nucleotide polymorphisms), but for the purpose of this article, I’ll be lumping them all together as personal genetic data. This article explains the difference between the two.

(Note: It is not conjecture when I say that a person can be identified by proxy using a family member’s DNA. I have no formal education in the field of genetics, but with the help of some basic online tutorials, I have learned how to triangulate chromosome segments to determine common ancestors. Using just a laptop, my own raw data, and my vague memory of high school biology class, I can identify people who have never even had a DNA test. Google “segmentology” or “chromosome triangulation” to learn how to do this.)

But it isn’t only babies and felons whose DNA the government is seeking. Back in January 2015, Barack Obama unveiled the “Precision Medicine Initiative,” a plan to collect and study large amounts of DNA obtained from volunteers. Initially earmarked as a $215 million expense, the initiative aims to collect genetic samples and corresponding medical history data from at least one million Americans. At present (August 2017), the program is in its beta testing phase with a large-scale roll-out scheduled for early 2018.

A million people sounds like a lot, but the aforementioned Combined DNA Index System (CODIS) already contains data from over 12.5 million convicted felons. The government has been collecting DNA from newborns since about 2000 (varies by state).

Your family tree is about as private as a front-yard brawl. Image source: Idiocracy (2006), Twentieth Century Fox

At about 4 million births per year in the US, 17 years of collection would total a staggering 68 million infant DNA samples. Add the CODIS profiles and we get about 80 million samples taken. That’s roughly 1 in 4 people. The government possibly owns, stores, or uses a DNA sample from 25% of its citizens. Most of these samples are obtained without consent.
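For anyone who wants to check the math, here is the back-of-the-envelope estimate as a short Python sketch. The birth and CODIS figures come from the text; the US population figure is my own rough assumption for 2017, and the estimate ignores destroyed samples and people counted twice.

```python
# Back-of-the-envelope estimate using the figures cited in the article.
BIRTHS_PER_YEAR = 4_000_000    # approximate US births per year
YEARS_OF_SCREENING = 17        # roughly 2000 through 2017
CODIS_PROFILES = 12_500_000    # offender profiles cited for CODIS
US_POPULATION = 325_000_000    # assumption: rough 2017 US population

newborn_samples = BIRTHS_PER_YEAR * YEARS_OF_SCREENING
total_samples = newborn_samples + CODIS_PROFILES
share_of_population = total_samples / US_POPULATION

print(f"Newborn samples: {newborn_samples:,}")
print(f"Total samples:   {total_samples:,}")
print(f"Share of population: {share_of_population:.0%}")
```

With these inputs the newborn samples come to 68 million and the combined total to 80.5 million, which is roughly a quarter of the assumed population.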

These newborn screening DNA databases make a complete mockery of informed consent. What people also don’t know…is that this is the one test that is not done by the hospital or a third party on behalf of the hospital.… It is done by the state department of public health.

– Jeremy Gruber, president of the Council for Responsible Genetics

Several years ago, CNN published an illuminating article entitled The Government Has Your Baby’s DNA. It shares the story of a woman named Annie Brown, whose daughter Isabel tested positive for a gene that might cause cystic fibrosis. Though it could be a page right from the Gattaca screenplay, the story below is real.

Brown says she first lost trust when she learned that Isabel had received genetic testing in the first place without consent from her or her husband. “I don’t have a problem with the testing, but I wish they’d asked us first,” she says.

Since health insurance paid for Isabel’s genetic screening, her positive test for a cystic fibrosis gene is now on the record with her insurance company, and the Browns are concerned this could hurt her in the future.

“It’s really a black mark against her, and there’s nothing we can do to get it off there,” Brown says. “And let’s say in the future they can test for a gene for schizophrenia or manic-depression and your baby tests positive — that would be on there, too.”

Brown says if the hospital had first asked her permission to test Isabel, now 10 months old, she might have chosen to pay for it out of pocket so the results wouldn’t be known to the insurance company.

Although most states claim to destroy samples after a certain time, usually 2-5 years, others choose to hold them indefinitely. Even if your state says it destroys the samples after 5 years, are you allowed to show up on DNA dump day and watch to make sure it has been completed? I doubt it.

If the government already has over 80 million samples, why do they need the Precision Medicine Initiative? To answer that question, we must first follow the history of a little company called 23andMe. Founded by Anne Wojcicki, Linda Avey, and Paul Cusenza in 2006, 23andMe launched its Personal Genome Service® in 2007. Over the past decade, the company has collected more than 2 million samples, with over 85% of participants answering health-related questions.

23andme has a business model and history that many may find unsavory. They offer consumers affordable home DNA saliva collection “spit” kits for as little as $99 and provide ancestry info and/or a list of potential genetic markers for disease. At the same time, they are selling this collected data to other companies and using it for research purposes.

23andMe CEO Anne Wojcicki is the ex-wife of Google co-founder Sergey Brin. It’s not unreasonable to see the ties to Google, a company routinely criticized for harvesting user data, as a red flag. Also, 23andMe has repeatedly defied the FDA and basically blew them off for six months while their product was being reviewed for federal approval.

Wait. What? They sell the data? Yes, they do sell the data, but not as individual specimens. Per their official privacy statement, “We will not sell, lease, or rent your individual-level information (i.e., information about a single individual’s genotypes, diseases or other traits/characteristics) to any third-party or to a third-party for research purposes without your explicit consent.” They also provide customers with a raw DNA file, which contains all the data stored on each of the 23 chromosomes (300,000 lines when viewed in Excel). The same data can be obtained from a company that doesn’t sell aggregate data, like Sure Genomics, for $2,500.

As a customer of 23andme (my overuse of memes should dispel any suspicion that my comments are in any way endorsed or compensated by the company) I have not personally been asked for individual-level consent, but I imagine it’s something that happens when a very rare gene mutation is found. I picture a man with a vestigial tail and webbed toes. He submits a sample and some sort of alarm goes off at 23andme HQ. Calls are made, and the customer is contacted regarding a top secret lizard-man study. He happens to be an adoptee and agrees to participate, hoping to find his birth family.

Though my reptile scenario is ridiculous, it is the sale of genetic data, and possibly the bigger bucks paid for access to rare samples of genetic mutations, that makes the company profitable. A DNA sequencing machine is crazy expensive, and employing technicians who know how to calibrate and fix those things isn’t cheap.

Who buys the data? Presumably, pharmaceutical companies and other entities I generally dislike. But this collaboration isn’t necessarily a bad thing. After all, testing costs are offset by the willingness of participants to share their data, so an individual’s data that would be otherwise prohibitively expensive and difficult to obtain is now affordable and as easy as spitting in a tube.

But what about that Google connection? Yep. It sounds bad. It’s easy to paint a picture of Ms. Wojcicki as just the head of another evil corporation, aided by the data-mining skills of her ex-husband’s company, scheming to profit from either the gullibility or hubris of their target demographic.

Many articles have been written warning about the so-called dangers of using 23andMe. In reality, Sergey Brin’s expertise has probably helped 23andMe obtain such a large database of DNA with accompanying health history profiles. Strong ties to Google aren’t necessarily a bad thing.

And the FDA? What kind of person defies the federal agency that regulates all health-related products and services? Maybe a few words from the scofflaw in question will put things into perspective.

One of the big drivers for me is that health care is a very elitist system. As much as we try to make it free and democratic for all, the reality is that it’s expensive and not all therapies are accessible to all people. So I have been very focused on making sure that we democratize genetic information so it’s available to everyone. – Anne Wojcicki

What. A. Monster.

It might help to take a few steps back and explain exactly why Wojcicki and the FDA have been at odds. The short story is that 23andme wanted to encourage people to participate in both the “spit kit” and the questionnaire portion of the data collection process. To entice participants, they promised their reports would tell users if they were at high risk for a number of nasty diseases. For just $199, the average person could learn if she carried specific gene mutations that might put her at risk of developing breast cancer, Parkinson’s disease, or a number of other conditions.

The most aggressive-looking photo of Wojcicki on the internet.

This report was generated by comparing participant genetic information to thousands of genetic studies and suggesting the customer consult a genetic counselor if the risks appeared to indicate a likelihood of developing a serious illness. The FDA stepped in and told Wojcicki and her colleagues that selling this information directly to consumers (even with a slew of disclaimers and warnings) was akin to practicing medicine without a license.

23andme, always preaching the importance of the individual’s right to his own personal data, would not submit to what they saw as an unnecessary barrier to self-ownership. The company continued to ignore the FDA, and in turn the FDA sent Wojcicki a letter ordering her to stop selling DNA kits.

Remember those potential 80+ million DNA samples the government may be holding? Well, those samples aren’t particularly useful in the field of medical research. Raw genetic data can be compared to other samples to test for common ancestry and find living relatives, but the code alone tells very little about genetic predisposition for disease. This is why the government needs the Precision Medicine Initiative if it hopes to own more usable data than a non-government entity.

With the exception of conditions like Down syndrome (the presence of an extra chromosome) and Klinefelter syndrome (a male with two X chromosomes instead of one), very little can actually be learned by simply looking at a person’s DNA. It would be useful to keep a sample of a baby’s DNA (with identifying information) in the event that the child developed a disease. The file could be noted and compared to other children, looking for chromosomal similarities between children suffering from the same conditions.

This type of study, though possibly effective, could take decades. And of course, the basis of ethical research is informed consent. If it is the intention of the government to use infant DNA for long-term observation, that should be publicly stated. Even an apathetic populace would likely be upset to learn about such a practice.

With the CODIS database of convicts, at best, one can use the information to trace ancestry or find family members. A DNA sample alone is missing the medical history needed to paint the full picture for quantifiable research. This is where 23andMe excels and where the Precision Medicine Initiative would presumably attempt to follow suit.

For example, if 100 people all share the same gene mutation and 50 of those people have a family history of a certain disease, the combined data can help point researchers in the right direction. A long list of studies made possible by this type of combined research can be found right on the 23andme website.

But what about the data held by other DNA testing companies? There are dozens of other reputable and affordable DNA testing companies like Ancestry DNA, Family Tree DNA, and National Geographic’s Geno 2.0, but none of these services has compiled health data on its members. The FDA isn’t particularly concerned with their data because it doesn’t have the valuable personal information that 23andMe collects.

It should also be noted that the data collected by these ancestry-only companies cannot be verifiably linked to any particular person when users submit false names and use throwaway email addresses. Although it’s hard to determine exactly how many people have used a DNA testing service, Ancestry.com alone claims to have over 3 million samples and Family Tree DNA has close to a million. Odds are, at least one of your aunts, uncles, or cousins has already mapped you in their genetic family tree.

By contrast, 23andMe has the massive data bank that the US government is trying to obtain. Currently, the power to use this data lies solely with 23andMe and its partners, while the government is hoping to obtain a similar database within the next two years. It’s no wonder the FDA and 23andMe have been at odds.

Genetic research has the potential to change the way medications are prescribed, the way diseases are treated, and the way people eat asparagus. Fortunately, participating in this research is much safer than most people realize. Way down in section 4.b. of 23andme’s privacy policy, it mentions something called a Certificate of Confidentiality.

If you are participating in 23andMe Research, 23andMe will withhold disclosure of your personal information involved in such research in response to judicial or other government subpoenas, warrants or orders in accordance with any applicable Certificate of Confidentiality that 23andMe has obtained from the National Institutes of Health (NIH)

When you become a 23andme customer (and answer a few questions), you are technically joining a medical research study. As a study participant, you are covered by the same protections given to someone who takes part in a traditional in-person study. As of this writing, 23andme is the only home DNA test service that meets the criteria required to offer the protection provided by the NIH-issued Certificate of Confidentiality.

Per the National Institutes of Health, “A Certificate of Confidentiality helps researchers protect the privacy of human research participants enrolled in biomedical, behavioral, clinical and other forms of sensitive health-related research. Certificates protect against compulsory legal demands, such as court orders and subpoenas, for identifying information or identifying characteristics of a research participant.” A detailed list of court cases where the Certificates of Confidentiality were upheld can be found here.

Again, using other companies like Ancestry DNA and Family Tree DNA is also an option. They do not offer the extra layer of security provided by the Certificates of Confidentiality. Of course, use of their services doesn’t entail sharing personal details about your family and medical history. If you are interested in tracing your ancient ancestors or locating biological family members, one of these companies should meet your needs. As long as the company allows you to download your raw data file, it’s worth the $99.

You may be wondering what is in one of these raw data files. What makes this information so precious? Here’s a sample. Download the file and open it in Excel and you’ll see 300,000 lines that look like this. There’s a column for RSID (Reference SNP cluster ID), chromosome number, and position of data. Kind of disappointing, huh?
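If you would rather poke at the file with a script than a spreadsheet, a minimal Python sketch might look like the one below. It assumes a tab-separated text file with `#` comment lines and a fourth genotype column, which is my understanding of the usual layout rather than something shown here; the rsIDs, positions, and genotypes in the sample are made up, and `parse_raw_data` is a hypothetical helper name.

```python
import csv
import io

# Made-up excerpt in the assumed tab-separated layout.
# These rsIDs, positions, and genotypes are for illustration only.
SAMPLE = """\
# rsid\tchromosome\tposition\tgenotype
rs0000001\t1\t10000\tAA
rs0000002\t1\t20000\tAG
rs0000003\t2\t15000\tCC
"""


def parse_raw_data(text):
    """Parse raw-data text into a list of dicts, skipping '#' comment lines."""
    fields = ["rsid", "chromosome", "position", "genotype"]
    data_lines = (
        line for line in io.StringIO(text)
        if line.strip() and not line.startswith("#")
    )
    reader = csv.DictReader(data_lines, fieldnames=fields, delimiter="\t")
    rows = []
    for row in reader:
        row["position"] = int(row["position"])  # positions are base-pair offsets
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in parse_raw_data(SAMPLE):
        print(row["rsid"], row["chromosome"], row["position"], row["genotype"])
```

To use it on a real download, read the file’s contents into a string and pass that to `parse_raw_data` instead of `SAMPLE`.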

As I mentioned before, this file alone isn’t all that useful, but when compared to the files of other people, software can locate similar segments across multiple users’ data sets and from there suggest possible relatives. Standard percentages of common DNA can accurately identify parent/child, sibling, aunt/uncle, grandparent/grandchild, and first, second, and sometimes up to fourth cousin matches. The match accuracy decreases as the distance from a common ancestor increases.
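To give a feel for what “locating similar segments” means, here is a heavily simplified Python sketch. It is not how any particular company does it: it just scans the SNPs two people have in common and reports runs where they share at least one allele at every position. The function names, the `min_snps` parameter, and the toy genotypes are all hypothetical, and real tools work with thousands of SNPs per segment and genetic-distance thresholds.

```python
def shares_allele(g1, g2):
    """True if two genotypes such as 'AG' and 'GG' have an allele in common."""
    return bool(set(g1) & set(g2))


def matching_segments(person_a, person_b, min_snps=3):
    """Find runs of consecutive SNPs where both people share an allele.

    person_a and person_b map (chromosome, position) -> genotype.
    Returns (chromosome, start_position, end_position, snp_count) tuples.
    """
    common = sorted(set(person_a) & set(person_b))
    segments = []
    run = []  # keys in the current run, all on one chromosome

    def close_run():
        # Keep the run only if it is long enough, then start over.
        if len(run) >= min_snps:
            segments.append((run[0][0], run[0][1], run[-1][1], len(run)))
        run.clear()

    for key in common:
        if run and run[-1][0] != key[0]:
            close_run()  # a segment cannot span two chromosomes
        if shares_allele(person_a[key], person_b[key]):
            run.append(key)
        else:
            close_run()
    close_run()
    return segments


if __name__ == "__main__":
    a = {(1, 100): "AA", (1, 200): "AG", (1, 300): "CC",
         (1, 400): "TT", (2, 100): "GG", (2, 200): "AA"}
    b = {(1, 100): "AG", (1, 200): "GG", (1, 300): "CT",
         (1, 400): "CC", (2, 100): "GG", (2, 200): "AC"}
    print(matching_segments(a, b))
```

The more long segments two files share, and the longer those segments are, the closer the likely relationship.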

Similarly, a customer’s sample may be compared to region-specific samples to determine ethnicity. Data may be collected from a group of people indigenous to a region for thousands of years, or it may be compared to ancient DNA samples. As additional samples are collected and compared, the accuracy of these ethnicity composition reports improves.

Family Tree DNA has a massive collection of surname, lineage, and geographical projects that is continuously growing and discovering new information about ancestry and individuals’ connections to each other. Participation qualifications vary by project, but the one general requirement is a raw DNA file. One can be obtained through Family Tree DNA or imported from another testing service.

Additionally, there are plenty of sites where you can upload your data under an anonymous username and use free or nearly free tools to compare your info to thousands of published journal articles. Many independent research groups and genetic hobbyists are continually compiling information. The result: anyone with a raw data file can acquire a report that contains data 23andMe isn’t legally allowed to share. In other words, the company can’t legally sell it to you, but you can get the equivalent of a full genetic health profile for free with a simple online search.

Not everyone wants to know their genetic code, and I support an individual’s right to eschew testing. At the same time, knowing the government will likely have your data soon if they don’t already, I highly recommend owning a copy of your own raw data. It isn’t unreasonable to think that access to affordable private DNA testing may be obstructed by the FDA when Gattaca, er, the Precision Medicine Initiative moves out of its beta-testing phase. The government doesn’t like competition.

So go ahead, spit for science. Spit for your right to self-ownership. Or just go watch Boogie Nights. Called “a fireball in a time capsule” by Peter Travers of Rolling Stone, it will make you forget all about the government DNA database. Personally, I preferred Gattaca, but I do own a copy of my raw DNA file.
