In 2013, scientists at MIT and at UC Berkeley optimized a way to use bacterial gene sequences to cut and change DNA at precise locations. The genome-editing system, called CRISPR, is cheaper and simpler than previous methods, and it has led to breakthroughs in diagnostics and the creation of more accurate disease models. And because it can permanently modify a living organism’s DNA, CRISPR technology may one day allow physicians to treat genetic diseases—anything from congenital deafness to cancer—by correcting the mutations that cause them.
But around September 2015, Dr. Alex Pérez, a student in the Tri-Institutional MD-PhD Program, realized that there was a major flaw in the method scientists were using to determine the specificity with which CRISPR would cut a given genomic sequence—a flaw that sometimes caused experiments to fail. To solve the problem, Dr. Pérez led the development of an open source software package and web interface that ensures far greater precision. He worked under the mentorship of two researchers at Memorial Sloan Kettering Cancer Center—Dr. Christina Leslie, professor of computational and systems biology, and Dr. Andrea Ventura, associate professor of cancer biology and genetics, who also hold appointments at the Weill Cornell Graduate School of Medical Sciences—and collaborated with postdoctoral scientists Dr. Yuri Pritykin and Dr. Joana Vidigal. The resulting tool, released in September 2016 and dubbed GuideScan, is already being used by some 3,000 researchers around the world—and it won Dr. Pérez a spot on the 2018 Forbes “30 Under 30” list for science.
The basic method for CRISPR genome editing involves targeting segments of a genome with a small piece of RNA, called a “guide RNA,” and a protein, known as an endonuclease, that scans DNA for matching sequences of about 20 nucleotides. If it finds a target, the endonuclease will cut the DNA there, allowing the insertion or deletion of genetic information. But given that a human genome, for example, consists of 3.3 billion base pairs, parts of which are repetitive, the chance that the same or a similar 20-base sequence appears at more than one place is rather high. “The question then becomes, for a given guide RNA, is it unique, or does it have any potential off-targets?” Dr. Pérez says, referring to these unintended matches.
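To make the off-target question concrete, here is a minimal Python sketch that scans a toy sequence for every site matching a guide exactly or within a small number of mismatches, on both strands. It is an illustration only: the function names, the toy genome, and the brute-force scan are assumptions made for this example, not GuideScan's method, and a real genome would need a far more efficient search.

```python
def hamming(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def reverse_complement(seq):
    """Return the reverse complement of a DNA string made of A, C, G, T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(genome, guide, max_mismatches=0):
    """Exhaustively list (position, strand, mismatches) for every window
    of the genome within max_mismatches of the guide (hypothetical helper)."""
    k = len(guide)
    hits = []
    for strand, target in (("+", guide), ("-", reverse_complement(guide))):
        for i in range(len(genome) - k + 1):
            d = hamming(genome[i:i + k], target)
            if d <= max_mismatches:
                hits.append((i, strand, d))
    return sorted(hits)


# Toy data: one exact copy of the guide and one copy with a single changed base.
guide = "GACGTTACCGGATTCAGCTA"
near_copy = "GACGTTACCGGATTCAGCTT"
genome = "TTTTT" + guide + "CCCCCCCC" + near_copy + "AAAAA"

exact_hits = find_sites(genome, guide, max_mismatches=0)
near_hits = find_sites(genome, guide, max_mismatches=1)
print("exact:", exact_hits)
print("within one mismatch:", near_hits)
```

With exact matching alone the guide looks unique in this toy sequence; allowing a single mismatch reveals a second candidate site, which is the kind of unintended match Dr. Pérez describes.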
To address this, researchers may use what’s known as a genome aligner—software that compares a piece of DNA with a reference genome based on genetic information from a representative sample of individuals. But genome aligners are optimized for speed, and a reference genome by definition does not match the individual genome a researcher is working with. It turns out that false negatives are common with genome aligners: the software might say that a given guide RNA would only identify one or two sites to cut in a genome, when in reality there were many more. “The previous tools said that the guides were unique and had no off-targets, but they did,” Dr. Pérez explains. “Researchers didn’t know that there were other sites that a given guide was hitting.” And if sites on two different chromosomes get cut accidentally, the cell may try to repair itself by binding half of one chromosome with half of the other—and this error, called translocation, can cause diseases or defects. “So when you cut at more than about 10 of these sites, you will kill the cell just by inducing massive DNA damage—and you won’t know why,” Dr. Pérez says. “It is an enormous problem.”
In spring 2017, Dr. Pérez, along with colleagues in Dr. Leslie’s and Dr. Ventura’s labs, published a paper in Nature Biotechnology announcing that they had developed GuideScan, software that could guarantee the precision of guide RNAs. It works by taking a sequenced genome from an individual organism and creating a guide RNA database that can be queried instantaneously, and it guarantees that a given guide RNA will cut in unique locations. “The reaction to our paper was kind of intense,” Dr. Pérez says with a laugh, marveling at how quickly the new technology was adopted. “People tell us it’s been applied to all types of science. Neuroscientists, geneticists and some biotech companies are using it. People trust our tool because of the methodology behind it.”
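The idea of a precomputed guide database can be sketched in a few lines of Python: index every fixed-length sequence in a genome once, then answer lookups without rescanning. This is a simplified illustration under stated assumptions, not a description of GuideScan's internals. The dictionary index, the function names, the short window length and the toy sequence are all hypothetical, and the sketch checks exact repeats on one strand only.

```python
from collections import defaultdict


def build_guide_index(genome, k=20):
    """Map every k-base sequence in the genome to the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return dict(index)


def unique_guides(index):
    """Keep only the sequences that occur at exactly one position."""
    return {seq: pos[0] for seq, pos in index.items() if len(pos) == 1}


# Toy example with 4-base windows so the repeats are easy to see by eye.
toy_genome = "ACGTACGTAC"
toy_index = build_guide_index(toy_genome, k=4)
toy_unique = unique_guides(toy_index)
print(toy_index)
print(toy_unique)
```

Building the index costs one pass over the genome, after which checking whether a candidate sequence is unique is a single dictionary lookup.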
Now out of the lab and completing his final year of medical school, Dr. Pérez is not where he imagined he would be when he was an undergraduate majoring in computational biology on Cornell’s Ithaca campus. His original plan was to get a PhD in computer science, but heartbreak at home turned his thoughts toward medicine. The husband of a close friend, a woman whom he considers a surrogate grandmother, was diagnosed with a devastating disease called multiple system atrophy. “It just shuts down your muscles, and they couldn’t do anything for him,” he recalls. “He died a slow, painful, scary death. It was terrifying for his family. Medicine gave a diagnosis, but there was nothing else they could do—and I realized I couldn’t let that stand. So at the last second, I changed my mind and went with an MD-PhD.”
Dr. Pérez—who ultimately plans to pursue a career as a physician-scientist, though he hasn’t yet chosen a specialty—has found that his programming skills are much in demand in the world of biomedical research. In addition to joining Dr. Leslie’s computational biology lab, he worked with Dr. Ventura, whose team uses CRISPR in experiments with mouse models of cancer. Dr. Ventura says he has been pleased by the feedback on GuideScan from colleagues around the world, and confesses that he was sorry to see Dr. Pérez return to medical school once the project was complete. Talented MD-PhD students with a strong interest in the computational side of research are a rare find, he says, and he’s hoping to recruit more. “These kinds of people are extremely valuable as physician-scientists,” Dr. Ventura says. “They can understand the hardcore computational aspects of medicine as well as the medicine itself.”
— Amy Crawford
This story first appeared in Weill Cornell Medicine, Fall 2018