
Scientists are using A.I. to create artificial human genetic code

Since at least 1950, when Alan Turing’s famous “Computing Machinery and Intelligence” paper was first published in the journal Mind, computer scientists interested in artificial intelligence have been fascinated by the notion of coding the mind. The mind, so the theory goes, is substrate independent, meaning that its processing ability does not, by necessity, have to be attached to the wetware of the brain. We could upload minds to computers or, conceivably, build entirely new ones wholly in the world of software.

This is all familiar stuff. While we have yet to build or re-create a mind in software (modern neural networks are, at best, extremely low-resolution abstractions of one), there is no shortage of computer scientists working on the effort right now.

What is altogether less familiar is the work being carried out by researchers at Estonia’s University of Tartu and France’s Paris-Saclay University.

Rather than just trying to re-create an approximation of the mind in software, they’ve turned to a different problem: Can you use an algorithm to generate genetic code for people who have never existed? Could you take the same generative adversarial network (GAN) technology that allows A.I. models like BigSleep to spit out compellingly realistic images and use it, instead, to create fake DNA that, in the vein of Turing’s work, is indistinguishable from that of a flesh-and-blood person?

Artificial genetic data

“Creating artificial genetic data that are realistic enough, without directly copying the sequences, is a very hard problem,” Flora Jay, a researcher specializing in machine learning and population genetics at Paris-Saclay University, told Digital Trends. “Genetic data is of high dimension, and you cannot just eyeball what’s important or not. We thus turned to cutting-edge techniques [being] applied to the computer vision, text, music, or protein world. These generative networks — GANs and [restricted Boltzmann machines] — are designed so that they can progressively and automatically learn how to create artificial genetic sequences.”

A GAN, a class of machine-learning framework devised by researcher (and current Apple employee) Ian Goodfellow and colleagues, uses a combative, tug-of-war approach to improve its generative output. It consists of two neural networks, a “generator” and a “discriminator,” which pass outputs between one another.

[Figure: GAN model (Yelmen et al. 2021)]

The generator’s job is to create something, be it an A.I. painting or a chunk of code representing an artificial genome in the form of ones and zeroes. The discriminator, like a bot version of J.K. Simmons’ perfectionist music instructor in the movie Whiplash, then critiques its efforts and sends this feedback back to the generator. The generator learns from the feedback, while the discriminator similarly gets ever better at telling what has been created by the generator from the genuine article. Eventually, the generator becomes so good at faking whatever it is imitating that the discriminator is fooled: it can no longer differentiate real from fake.
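
To make the tug-of-war concrete, here is a minimal Python sketch (assuming NumPy) of the two scores used in the standard GAN formulation. It is not the Tartu and Paris-Saclay team's code, and the function names are hypothetical; it assumes the discriminator outputs, for each sample, a probability that the sample is real. In actual training, the two networks take turns adjusting their parameters to lower their own loss.

```python
import numpy as np

EPS = 1e-7  # keeps log() away from log(0)

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy for the discriminator (the "critic").

    d_real: probabilities the discriminator assigns to *real* samples being real.
    d_fake: probabilities it assigns to *generated* samples being real.
    The loss is low when d_real is near 1 and d_fake is near 0.
    """
    d_real = np.clip(np.asarray(d_real, dtype=float), EPS, 1 - EPS)
    d_fake = np.clip(np.asarray(d_fake, dtype=float), EPS, 1 - EPS)
    return -np.mean(np.log(d_real)) - np.mean(np.log(1 - d_fake))

def generator_loss(d_fake):
    """Non-saturating generator loss: low when its fakes are judged real (d_fake near 1)."""
    d_fake = np.clip(np.asarray(d_fake, dtype=float), EPS, 1 - EPS)
    return -np.mean(np.log(d_fake))

if __name__ == "__main__":
    n = 4
    # A sharp critic: confident about real samples, not fooled by fakes.
    print(f"{discriminator_loss(np.full(n, 0.99), np.full(n, 0.01)):.3f}")  # 0.020
    print(f"{generator_loss(np.full(n, 0.01)):.3f}")                        # 4.605
    # A fooled critic: it can no longer tell real from fake and outputs 0.5.
    print(f"{discriminator_loss(np.full(n, 0.5), np.full(n, 0.5)):.3f}")    # 1.386
    print(f"{generator_loss(np.full(n, 0.5)):.3f}")                         # 0.693
```

When the discriminator is completely fooled and outputs 0.5 for everything, its loss settles at 2 ln 2 ≈ 1.386, which corresponds to the equilibrium described in Goodfellow's original GAN formulation.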

“One of the main problems here is assessing the quality of artificial genomes,” Burak Yelmen, a Ph.D. student at the University of Tartu’s Institute of Genomics, told Digital Trends. “You can look at an image and decide if it looks real, but this is not possible for genomes. [The] majority of the analyses we performed in our study was to see whether the artificial genome chunks we generated really looked like the real ones.”
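
One of the simplest realism checks of the kind Yelmen describes can be sketched as follows. It assumes each haplotype is encoded as a row of ones and zeroes (0 for the reference allele, 1 for the alternate allele at each SNP) and compares how often each variant appears in real versus artificial panels. This is a hypothetical Python/NumPy illustration with made-up function names, not the study's analysis pipeline.

```python
import numpy as np

def allele_frequencies(haplotypes):
    """Per-SNP alternate-allele frequency.

    haplotypes: 2-D array of 0/1 values, one row per haplotype and one
    column per SNP (0 = reference allele, 1 = alternate allele).
    """
    return np.asarray(haplotypes, dtype=float).mean(axis=0)

def compare_allele_frequencies(real, artificial):
    """Return (Pearson correlation, mean absolute difference) of per-SNP frequencies.

    Assumes both panels cover the same SNPs in the same column order and that
    the frequencies are not constant across SNPs (otherwise the correlation is undefined).
    """
    f_real = allele_frequencies(real)
    f_art = allele_frequencies(artificial)
    r = np.corrcoef(f_real, f_art)[0, 1]
    mad = np.mean(np.abs(f_real - f_art))
    return r, mad

if __name__ == "__main__":
    real = np.array([[0, 1, 1, 0], [1, 1, 0, 0], [0, 1, 0, 0]])
    # Same rows in a different order, so per-SNP frequencies match exactly.
    artificial = real[::-1]
    print(allele_frequencies(real))
    print(compare_allele_frequencies(real, artificial))
```

A single-site statistic like this cannot see correlations between neighbouring SNPs (linkage disequilibrium), so on its own it is only a first sanity check: a generator could match every frequency while still producing unrealistic combinations of variants.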

Don’t worry, though. Despite a growing mass of articles about highly dubious gene tampering designed to rewrite the human code, this work is not about trying to “write” new parentless humans who could be created with the aid of supercomputers.

[Figure: A chromosome emerges from random digital noise (credit: Burak Yelmen)]

“To be clear, the objective of our work is to better understand and encode the existing genetic diversity of thousands or millions of people around the world, not to create artificial cells,” Jay said. “The neural networks are trained on this existing diversity, so the generated genomic regions do not carry additional novel mutations that could easily disrupt the functionality of a sequence — and they include, untouched, the segments that are conserved across human populations.”

Jay noted that, at the whole-genome scale, it is “difficult to say” whether a specific combination of millions of generated nucleotides could indeed be “functional.” In other words, don’t expect to compile and run this code and see a fully formed person (or their blueprints) emerge at the other end. Instead, the purpose is something altogether less sinister and, potentially, more useful.

All about data privacy

“There is an immense amount of data in biobanks and it keeps increasing every day,” said Yelmen. “However, genomic data is sensitive data and accessing these biobanks can be difficult for researchers due to ethical concerns. The main goal of our work is to create high-quality surrogates of existing genome banks and provide a solution to this accessibility barrier within a safe ethical framework. It is important to note that our study was a first step: There is still work to do.”

Added Jay: “The idea behind our study is to start investigating whether releasing artificial genomes instead of the real ones could preserve the privacy of genome donors, while providing useful information to the population genetics community. [Possible] applications of artificial genomes could range from better understanding of our evolutionary past to providing insights in medical genetics, including a wider range of diversity.”
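
Whether released artificial genomes protect donors depends partly on the generator not simply reproducing the real sequences it was trained on (the "without directly copying the sequences" problem Jay mentions). A crude first check, sketched here in Python with NumPy and hypothetical function names, measures how many sites separate each artificial haplotype from its closest real one; a distance of zero means an exact copy. This is an illustrative assumption rather than the privacy analysis used in the paper, and passing it does not by itself guarantee privacy.

```python
import numpy as np

def nearest_real_distances(real, artificial):
    """For each artificial haplotype, the Hamming distance to its closest real haplotype.

    real, artificial: 2-D 0/1 arrays with the same SNP columns in the same order.
    """
    real = np.asarray(real, dtype=np.int8)
    artificial = np.asarray(artificial, dtype=np.int8)
    # Pairwise mismatch counts, shape (n_artificial, n_real).
    diffs = (artificial[:, None, :] != real[None, :, :]).sum(axis=2)
    return diffs.min(axis=1)

def fraction_copied(real, artificial):
    """Share of artificial haplotypes that are exact copies of some real haplotype."""
    return float(np.mean(nearest_real_distances(real, artificial) == 0))

if __name__ == "__main__":
    real = [[0, 1, 1, 0], [1, 1, 0, 0]]
    artificial = [[0, 1, 1, 0], [1, 0, 0, 1]]  # first row copies a real haplotype
    print(nearest_real_distances(real, artificial))  # [0 2]
    print(fraction_copied(real, artificial))         # 0.5
```

The pairwise comparison builds an n_artificial × n_real × n_SNPs array in memory, which is fine for a toy example but would need to be processed in chunks at biobank scale.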

In some ways, the work is reminiscent of the trend, seen a couple of years ago, in which GANs were used to create images of imaginary people, animals, and more, as epitomized by the website ThisPersonDoesNotExist.com. Only this time, of course, it involves actual genetic code rather than simple pictures.

A paper describing the project, titled “Creating artificial human genomes using generative neural networks,” was recently published in the journal PLOS Genetics.

Luke Dormehl
I'm a UK-based tech writer covering Cool Tech at Digital Trends. I've also written for Fast Company, Wired, the Guardian…