%0 Journal Article
%A Jia, Jian’an
%A Liang, Xiaotao
%A Chen, Shipeng
%A Wang, Hui
%A Li, Huiming
%A Fang, Meng
%A Bai, Xin
%A Wang, Ziyi
%A Wang, Mengmeng
%A Zhu, Shanfeng
%A Sun, Fengzhu
%A Gao, Chunfang
%T Next-generation sequencing revealed divergence in deletions of the preS region in the HBV genome between different HBV-related liver diseases
%D 2017
%J Journal of General Virology
%V 98
%N 11
%P 2748-2758
%@ 1465-2099
%R https://doi.org/10.1099/jgv.0.000942
%K next generation sequencing
%K Hepatitis B virus
%K support vector machine
%K PreS
%K variation
%K Hepatocellular carcinoma
%I Microbiology Society
%X In order to investigate if deletion patterns of the preS region can predict liver disease advancement, the preS region of the hepatitis B virus (HBV) genome in 45 chronic hepatitis B (CHB) and 94 HBV-related hepatocellular carcinoma (HCC) patients was sequenced by next-generation sequencing (NGS) and the percentages of nucleotide deletion in the preS region were analysed. Hierarchical clustering and heatmaps based on deletion percentages of preS revealed different deletion patterns between CHB and HCC patients. Intergenotype comparison also indicated divergence in preS deletions between HBV genotype B and C. No significant difference was found in preS deletion patterns between sera and matched adjacent non-tumour tissues. Based on hierarchical clustering, HCC patients were classed into two groups with different preS deletion patterns and different clinical features. Finally, the support vector machine (SVM) model was trained on preS nucleotide deletion percentages and used to predict HCC versus CHB patients. The prediction performance was assessed with fivefold cross-validation and independent cohort validation. The median area under the curve (AUC) was 0.729 after repeating SVM 500 times with fivefold cross-validations. After parameter optimization, the SVM model was used to predict an independent cohort with 51 CHB patients and 72 HCC patients and the AUC was 0.727. In conclusion, the use of the NGS method revealed a prominent divergence in preS deletion patterns between disease groups and virus genotypes, but not between different tissue types. Quantitative NGS data combined with a machine learning method could be a powerful approach for prediction of the status of different diseases.
%U https://www.microbiologyresearch.org/content/journal/jgv/10.1099/jgv.0.000942