## Large Scale Parallel Computing Techniques

Processor design has entered a new stage in which performance gains come from multiple cores rather than from higher clock frequencies. Meanwhile, in scientific computing, the growing demand for high-precision calculations and realistic systems means that some molecular systems may need a supercluster with thousands of cores to complete a computational task. Parallelization is therefore an inevitable trend in scientific computing. When a heavy task is distributed over numerous cores, dispatching, gathering, and exchanging data among the subtasks in a reasonable and efficient way becomes a real problem, especially for very large-scale tasks. CPU frameworks, GPU frameworks, and their hybrids each parallelize differently, so coordinating the hardware and software architectures to increase the efficiency of the whole system has become increasingly important. In addition, parallelizing traditional serial code, or porting existing CPU-based software to GPU platforms to obtain a dramatic acceleration, has become one of the most active areas in modern scientific computing. As the development history of many well-known software packages shows, GPU acceleration has become an economical way to improve performance because of its higher cost efficiency and energy efficiency. Following this trend, we have developed several scientific software packages for GPU platforms, which we have found to greatly accelerate the research process.

## Accelerated Sampling Techniques

For a given molecular system, we can calculate its thermodynamic properties via sampling techniques based on statistical mechanics. In the early years, some small and simple molecular systems came to be well understood using this approach. As research has deepened, attention has shifted to more realistic and complex systems, such as various macromolecular systems, DNA, and proteins. These systems usually have very complex phase transition behavior, a large number of degrees of freedom, innumerable metastable states, and a rugged free energy surface, leading to very complex thermodynamic behavior that renders traditional molecular sampling techniques such as the Metropolis algorithm powerless. To overcome this problem, researchers have increasingly focused on how to accelerate the traversal of conformation space so that the investigation of complex systems becomes feasible with currently available computational resources. Through years of effort, a series of methods to enhance sampling efficiency have been developed, such as the replica exchange method, multicanonical ensemble sampling, metadynamics, the biasing force method, integral-over-temperature sampling, and Wang-Landau sampling. The main idea of these accelerated sampling methods (also called expanded ensemble methods) is to reform the sampling ensemble and increase the sampling probability of low-probability conformations, so that enough conformations are collected for thermodynamic statistics. Among these techniques, we focus on the widely used Wang-Landau (WL) sampling algorithm, which provides an efficient way to directly calculate the relative density of states as a function of energy, which can in turn be used to estimate the partition function of the system. The WL algorithm has attracted a great deal of attention because of its simple formulation and easy implementation. Although the WL algorithm has been widely validated and applied, most of these applications involve very small systems.
The reason is that the WL algorithm is inherently computationally demanding. Therefore, larger systems, such as polymers and bio-macromolecules, are still difficult to study using the WL method. In addition, the error saturation problem in the original WL algorithm limits the precision of the resulting density of states. Current improvements to the WL algorithm go mainly in two directions: higher precision and faster speed. In our recent work on the thermodynamic transitions of various polymer architectures, we developed a modified 1/T parallel WL algorithm, which substantially improved both the precision and the sampling speed. Additionally, using GPUs to accelerate the sampling process is on our research agenda, which should greatly expand the application of the WL algorithm to a broader area.
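
The basic WL idea can be illustrated with a minimal sketch. It implements the original WL scheme, not the modified 1/T parallel variant mentioned above, and it assumes a toy system chosen only for illustration: `n_spins` non-interacting two-state spins whose energy is the number of "up" spins, so the exact density of states is a binomial coefficient. The function names, batch size, flatness threshold, and stopping value are all illustrative assumptions.

```python
import math
import random


def wang_landau(n_spins=8, ln_f_final=1e-4, flatness=0.8, seed=0):
    """Estimate ln g(E) for a toy model (an assumption of this sketch):
    n_spins independent two-state spins, E = number of 'up' spins."""
    rng = random.Random(seed)
    spins = [0] * n_spins
    energy = 0
    n_levels = n_spins + 1
    ln_g = [0.0] * n_levels   # running estimate of ln g(E)
    hist = [0] * n_levels     # visit histogram for the flatness check
    ln_f = 1.0                # modification factor, halved after each flat stage

    while ln_f > ln_f_final:
        for _ in range(10000):
            i = rng.randrange(n_spins)
            new_energy = energy + (1 - 2 * spins[i])
            # Accept the flip with probability min(1, g(E_old) / g(E_new)).
            delta = ln_g[energy] - ln_g[new_energy]
            if rng.random() < math.exp(min(0.0, delta)):
                spins[i] ^= 1
                energy = new_energy
            # Penalize the current energy level and record the visit.
            ln_g[energy] += ln_f
            hist[energy] += 1
        # Once the histogram is flat enough, refine the modification factor.
        if min(hist) > flatness * sum(hist) / n_levels:
            hist = [0] * n_levels
            ln_f /= 2.0

    # WL yields only relative values; fix the scale so that ln g(0) = 0.
    return [value - ln_g[0] for value in ln_g]


def partition_function(ln_g, beta):
    """Z(beta) = sum_E g(E) exp(-beta E), with energy levels E = 0, 1, 2, ...
    and g(0) = 1 as in the toy model above."""
    return sum(math.exp(s - beta * e) for e, s in enumerate(ln_g))
```

For this toy model the estimate can be compared with the exact values ln C(n, E) and Z(beta) = (1 + exp(-beta))^n. The acceptance rule drives the walker toward rarely visited energies, which is how low-probability conformations get sampled.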

## Multiscale Molecular Simulation Techniques

Large-scale molecular simulations are usually limited by the huge amount of calculation required, which is caused by complex molecular conformations and numerous degrees of freedom. In practice, one has to strike a balance between the actual scale of the computation and the hardware resources available. Coarse-grained (CG) molecular simulation is believed to be able to resolve the conflict between computational demand and limited hardware capability. The main idea of coarse-graining is to reduce the number of degrees of freedom of a molecular system and thereby obtain a higher computing speed without significant loss of precision. The design of the interactions between CG beads has become an active research area in multiscale molecular simulation. There are two major classes of coarse-graining methods: top-down and bottom-up. In the top-down scheme, the form of the interaction function is predefined, and its parameters are tuned to fit experimental or theoretical results. Because the form of the interaction function is flexible, the choices are varied and empirical. This scheme is therefore well suited to simulating large-scale systems without stringent precision requirements, and to serving as a more general model. The bottom-up scheme, by contrast, focuses on the detailed structure of the finer model or the rigorous force balance of the atomistic model. A rigorous mathematical proof allows these methods to give consistent results across different coarse-graining levels, and their accurate reproduction of structural, thermodynamic, and kinetic properties makes these CG algorithms promising for studying systems with high precision requirements.
In our current work, we focus on the reversibility and transferability of coarse-grained interactions, and we have developed an automatic coarse-graining engine that extracts bottom-up CG force fields using an existing method called Iterative Boltzmann Inversion. We have also applied predefined force fields (the top-down scheme) to study biophysical systems and reveal the competition between weak interactions.
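
For readers unfamiliar with Iterative Boltzmann Inversion, its standard update rule can be sketched as follows. This is not the coarse-graining engine described above, only a minimal illustration of the textbook formula V_{i+1}(r) = V_i(r) + kT ln(g_i(r) / g_target(r)), where g is the radial distribution function on a grid of distances. The function names, the `damping` factor, and the `g_min` cutoff are assumptions introduced here.

```python
import math


def boltzmann_inversion(g_target, kT=1.0, g_min=1e-8):
    """Initial guess V_0(r) = -kT ln g_target(r) (potential of mean force).
    Bins with g below g_min are clamped to avoid log(0)."""
    return [-kT * math.log(max(g, g_min)) for g in g_target]


def ibi_update(v_current, g_current, g_target, kT=1.0, damping=1.0, g_min=1e-8):
    """One IBI step on a tabulated pair potential.

    g_current is the distribution measured in a CG simulation that used
    v_current; g_target comes from the finer (e.g. atomistic) model.
    """
    v_next = []
    for v, g_i, g_t in zip(v_current, g_current, g_target):
        if g_i < g_min or g_t < g_min:
            # Unsampled bins (e.g. inside the repulsive core) are left unchanged.
            v_next.append(v)
        else:
            # Too many pairs at this distance -> raise the potential there,
            # too few -> lower it.
            v_next.append(v + damping * kT * math.log(g_i / g_t))
    return v_next
```

In practice each call to `ibi_update` is preceded by a new CG simulation that measures `g_current`, and the loop stops when `g_current` matches `g_target` within a chosen tolerance. A `damping` value below 1 is a common way to stabilize the iteration.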