
Molecular Biology in Search of Quantum Advantage

In computational biology, statistical methods and machine learning are key techniques, as the main objective is often to assimilate vast amounts of data. For instance, in genomics, gene annotation has made extensive use of hidden Markov models (HMMs). In drug discovery, a vast array of statistical models has been developed to estimate molecular properties or to predict whether a ligand will bind to a protein. In structural biology, deep neural networks have been used to predict contacts, secondary structure and, most recently, 3D protein structures. Training and developing such models is often computationally intensive. A major catalyst of the recent advances in machine learning was the realization that general-purpose graphics processing units (GPUs) could significantly accelerate training. Quantum computing may offer a comparable boost: for some machine learning tasks, quantum algorithms have been proposed that promise large, in certain cases exponential, speedups over their classical counterparts, although such advantages have yet to be demonstrated in practice.

As an example, the human genome contains 3 billion base pairs, which can be stored in 1.2 × 10^10 classical bits, or about 1.5 gigabytes. A register of N qubits involves 2^N amplitudes, each of which can represent a classical bit by setting the kth amplitude to 0 or 1 with an appropriate normalization factor. A human genome could therefore be stored in around 34 qubits. More strikingly, doubling the size of the register to 68 qubits leaves just about enough space to store the complete genome of every living human in the world.
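The arithmetic behind these figures can be checked with a short Python sketch. The genome size in bits is the figure quoted above; the world population of roughly 8 billion is an assumption added here for illustration, and the helper name `qubits_needed` is hypothetical.

```python
# Figure quoted in the text: one human genome stored in 1.2 x 10^10 classical bits.
GENOME_BITS = 12_000_000_000

# Assumption for illustration only: roughly 8 billion living humans.
WORLD_POPULATION = 8_000_000_000


def qubits_needed(n_bits: int) -> int:
    """Smallest N such that 2**N amplitudes can hold n_bits classical bits."""
    # (n - 1).bit_length() equals ceil(log2(n)) for any integer n >= 1.
    return (n_bits - 1).bit_length()


genome_qubits = qubits_needed(GENOME_BITS)

# Number of whole genomes that fit in the 2**68 amplitudes of a 68-qubit register.
genomes_in_68_qubits = 2**68 // GENOME_BITS

print("qubits for one genome:", genome_qubits)
print("genomes that fit in 68 qubits:", genomes_in_68_qubits)
print("enough for the assumed population:", genomes_in_68_qubits > WORLD_POPULATION)
```

This only counts amplitudes. It says nothing about how such a state would be prepared or read out, which is a separate practical question.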

Similarly, computing the full electronic wavefunction of an average drug molecule with conventional algorithms is expected to take longer than the age of the universe on any current supercomputer, whereas even a modest-sized quantum computer may be able to solve the problem on a timescale of days.

What is de novo protein structure prediction?

In computational biology, de novo protein structure prediction refers to an algorithmic process by which protein tertiary structure is predicted from its amino acid primary sequence. De novo methods tend to require vast computational resources, and have thus only been carried out for relatively small proteins.

What is protein folding?

Protein folding is the process by which a polypeptide chain folds into its native 3D structure to become a biologically active protein. The amino acids in the chain interact with each other until they form a well-defined, folded protein.

Protein misfolding is believed to be the primary cause of Alzheimer’s disease, Parkinson’s disease, Huntington’s disease, Creutzfeldt-Jakob disease, cystic fibrosis, Gaucher’s disease and many other degenerative and neurodegenerative disorders.

Many problems in computational biology can be formulated as finding the global minimum or maximum of a complicated, high-dimensional function. For example, it is believed that the native structure of a protein is the global minimum of its free energy hypersurface. In a different area, determining a community in a network of interacting proteins or biological entities is equivalent to finding an optimal subset of the nodes. Unfortunately, with the exception of a few simple systems, optimization problems are often very difficult, even NP-complete or NP-hard. Although there exist heuristics to find approximate solutions, these tend to provide only local minima, and in many cases, even the heuristics are intractable. The ability of quantum computers to accelerate such optimization problems or find better solutions has been explored in depth.
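To make the point about local minima concrete, here is a minimal Python sketch of a greedy descent heuristic on a toy one-dimensional energy landscape. The energy values and the function name `local_descent` are invented for illustration and are not a model of any real protein. The sketch only shows how the answer a heuristic returns can depend on where it starts.

```python
# Toy energy landscape: made-up values, not a real free energy surface.
ENERGY = [4.0, 2.5, 3.0, 3.5, 1.0, 2.0, 5.0, 0.2, 0.9]


def local_descent(energy, start):
    """Greedy heuristic: keep moving to the lowest neighbour until none is lower."""
    i = start
    while True:
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(energy)]
        best = min(neighbours, key=lambda j: energy[j])
        if energy[best] >= energy[i]:
            return i  # no neighbour improves: a local minimum
        i = best


def global_minimum(energy):
    """Exhaustive search, feasible only because the toy landscape is tiny."""
    return min(range(len(energy)), key=lambda j: energy[j])


for start in (0, 3, 8):
    end = local_descent(ENERGY, start)
    print(f"start={start} -> stops at index {end}, energy {ENERGY[end]}")

best = global_minimum(ENERGY)
print(f"global minimum at index {best}, energy {ENERGY[best]}")
```

In this toy case, exhaustive search is trivial. For realistic problems the number of candidate configurations grows far too quickly for that, which is why heuristics are used and why they can get stuck.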

Quantum machine learning could change how we process and analyze biological data. Unfortunately, the current practical challenges are sizable. Yet the power of quantum algorithms may prove useful as scientific and technological developments, such as the emergence of self-driving laboratories, provide more and more data for the exploration of uncharted regions of the protein universe.
