Research detail

•  QSAR/QSPR from three-dimensional descriptors
3D descriptors, such as orthogonal projections and generalized torsion angles, were developed. 3D molecular structures and properties were generated with SYBYL. Similarity methods, multiple linear regression, and BFGS quasi-Newton neural networks were used to build the QSAR/QSPR models.

•  QSAR/QSPR from chiral descriptors extended from topological descriptors
Extended topological indices were developed to overcome a shortcoming of conventional topological indices, which describe only the connectivity among atoms and therefore cannot distinguish stereoisomers. The new chiral descriptors were used to study the activities of chiral molecules.

•  Similarity comparison of DNA sequences by graphical invariants
Graphical representations of DNA primary sequences were generated and then converted into mathematical invariants to compare the similarity of the β-globin gene across different species.
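The comparison described above can be illustrated with a minimal sketch. The source does not say which graphical representation or which invariants were used, so the 2D walk mapping, the three invariants, and all function names below are assumptions chosen only for illustration.

```python
import math

# Assumed base-to-step mapping for a 2D walk; other conventions exist.
STEPS = {"A": (-1, 0), "G": (1, 0), "C": (0, 1), "T": (0, -1)}


def walk(seq):
    """Return the list of 2D points visited while reading the sequence."""
    x = y = 0
    points = []
    for base in seq.upper():
        dx, dy = STEPS[base]
        x += dx
        y += dy
        points.append((x, y))
    return points


def invariants(seq):
    """Condense the walk into a small numeric vector (illustrative choice):
    mean x, mean y, and the distance of that mean point from the origin."""
    pts = walk(seq)
    n = len(pts)
    mean_x = sum(p[0] for p in pts) / n
    mean_y = sum(p[1] for p in pts) / n
    return (mean_x, mean_y, math.hypot(mean_x, mean_y))


def distance(seq_a, seq_b):
    """Euclidean distance between invariant vectors; smaller means more similar."""
    return math.dist(invariants(seq_a), invariants(seq_b))
```

Calling `distance` on two gene fragments gives a single number that can be tabulated for every pair of species; identical sequences give a distance of zero.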

•  Automatic assignment of absolute configuration from 1D NMR data by using chirality codes derived from atomic radial distribution function
Molecular- and atomic-level chirality codes were developed and applied. They describe chirality using atomic properties and can distinguish between enantiomers. Cartesian coordinates were calculated with CORINA, and physicochemical atomic properties were calculated with PETRA. A counterpropagation neural network (CPG), with variables selected by a genetic algorithm (GA), established a correlation between these chiral descriptors and the NMR chemical shift differences between enantiomers. This correlation can be used to assign the absolute configuration of an enantiomer. The chirality codes were also applied to QSAR/QSPR studies.
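How a radial-distribution-style code can carry a chirality sign is sketched below. This is a simplified illustration and not the published definition of the chirality codes: the energy-like term, the use of the signed tetrahedron volume as the chirality sign, the smoothing parameter, and all names are assumptions. In practice the coordinates and atomic properties would come from tools such as CORINA and PETRA; here they are plain Python lists.

```python
import math
from itertools import combinations


def signed_volume(p0, p1, p2, p3):
    """Triple product (p1-p0) . ((p2-p0) x (p3-p0)); its sign flips on reflection."""
    a = [p1[i] - p0[i] for i in range(3)]
    b = [p2[i] - p0[i] for i in range(3)]
    c = [p3[i] - p0[i] for i in range(3)]
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


def chirality_code(coords, props, u_values, smoothing=10.0):
    """Sum signed Gaussians over all atom quadruples, sampled at u_values.

    coords: list of (x, y, z); props: one atomic property per atom.
    """
    code = [0.0] * len(u_values)
    for quad in combinations(range(len(coords)), 4):
        vol = signed_volume(*(coords[i] for i in quad))
        if abs(vol) < 1e-9:  # planar quadruple carries no chirality
            continue
        sign = 1.0 if vol > 0 else -1.0
        # Energy-like term: property products weighted by inverse distance.
        e = sum(props[i] * props[j] / math.dist(coords[i], coords[j])
                for i, j in combinations(quad, 2))
        for k, u in enumerate(u_values):
            code[k] += sign * math.exp(-smoothing * (u - e) ** 2)
    return code
```

With the same atom ordering, reflecting the coordinates (for example negating every x) leaves all distances unchanged but flips every sign, so the code of the mirror image is the negative of the original. That is the property that lets a fixed-length vector distinguish enantiomers.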

•  Enantioselectivity prediction in reaction by explainable physicochemical atomic stereodescriptors (PAS) code
To overcome the fundamental weakness of the Cahn-Ingold-Prelog (CIP) rules, which were not designed to carry any intrinsic chemical meaning, PAS descriptors were developed that represent the chirality of an atomic chiral center on the basis of the physicochemical properties of its ligands. PAS descriptors were successfully applied to qualitative structure-enantioselectivity relationships in organic reactions and biocatalysis, using neural networks, decision trees, or Random Forests. Applying the PAS code gives clues to the atomic properties, bond properties, or steric factors that can account for stereoselectivity in chemical reactions.

•  Classification of chemical reactions without assignment of reaction centers
Classifications of chemical reactions were explored with Random Forests or neural networks in three different situations, using different levels of information: a) pairs of reactants were classified according to the type of reaction they produce, b) products were classified according to the type of reaction from which they can be synthesized, and c) reactions were classified from the difference between the descriptors of the product and the descriptors of the reactants. In all cases MOLecular Maps of Atom-level Properties (MOLMAPs) were used as descriptors. They are generated by a self-organizing map and encode physicochemical properties of the bonds available in a molecule.
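Situation c) can be sketched as follows. The sketch assumes an already trained self-organizing map given as a flat list of neuron weight vectors, and it counts only the winning neuron for each bond. The toy weights, the two-value bond descriptors, and the function names are hypothetical and are not taken from the original work.

```python
import math


def winning_neuron(weights, bond):
    """Index of the neuron whose weight vector is closest to the bond descriptor."""
    return min(range(len(weights)), key=lambda n: math.dist(weights[n], bond))


def molmap(weights, bonds):
    """Fixed-length code: how many bonds of the molecule fall on each neuron."""
    code = [0] * len(weights)
    for bond in bonds:
        code[winning_neuron(weights, bond)] += 1
    return code


def reaction_descriptor(weights, reactant_bonds, product_bonds):
    """Product code minus reactant code; unchanged bonds cancel out."""
    reactant_code = molmap(weights, reactant_bonds)
    product_code = molmap(weights, product_bonds)
    return [p - r for p, r in zip(product_code, reactant_code)]


# Toy 2x2 map flattened to four neurons (hypothetical weights).
toy_weights = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
```

Because bonds that are untouched by the reaction land on the same neuron before and after, only broken and formed bonds leave a nonzero entry in the difference vector. This is why no explicit assignment of the reaction center is needed.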

•  Prediction of mutagenicity with bond-level explanation
Mutagenicity (Ames test) was predicted with Random Forests, decision trees, or Support Vector Machines, using MOLMAP and general global descriptors as input. MOLMAP descriptors represent the properties of the bonds in a molecular structure as a fixed-length code, which allows molecules to be compared without prior alignment or bond-to-bond mapping. They were shown to encode information relevant to mutagenicity prediction, and could reveal structural features linked to mutagenicity without explicitly encoding structural fragments. The predictions could be explained in terms of a similarity measure between the query compounds and the compounds in the training set.
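The similarity-based explanation can be illustrated with a short sketch that lists the training compounds closest to a query. The source does not state which similarity measure was used, so Euclidean distance between fixed-length descriptor vectors is an assumption here, and the function name and data layout are hypothetical.

```python
import math


def nearest_training_compounds(query, training, k=3):
    """Return the k training compounds most similar to the query.

    query: descriptor vector of the query compound.
    training: list of (name, descriptor_vector, label) tuples.
    Each result is (name, label, distance), closest first.
    """
    ranked = sorted(training, key=lambda item: math.dist(query, item[1]))
    return [(name, label, math.dist(query, vec)) for name, vec, label in ranked[:k]]
```

Reporting the labels of the closest training compounds next to a prediction shows which known structures the model's answer resembles.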

•  Ongoing research
Classification of enzymes from protein structure is the ongoing project. PDB (Protein Data Bank) files and the corresponding information were obtained from web services such as RCSB PDB and SwissProt. Reactions associated with EC numbers were extracted from the KEGG LIGAND database.