About KORSAIR


Estimating population-specific variant allele frequencies is one of the primary steps in studying human genetics. For this purpose, researchers have typically relied on internationally contributed human genome variant databases. However, because these databases are Caucasian-centric, studies of genetic variants in non-Caucasian populations, including the Korean population, inevitably suffer from reduced power.

We developed the KOrean Reference SNP And Indel Repository (KORSAIR), which consists of genetic variants from 944 normal Korean human genome samples. To construct a genetic variant resource representing the normal Korean population, we gathered normal samples from two large Korean genome projects: the autism spectrum disorder (ASD) study contributed by IBS, KAIST, and SNUBH, and the Korean reference genome study of the Korea Disease Control and Prevention Agency (KDCA).

Specifically, we collected the genomic data of normal parents (802 individuals) from the samples used in the ASD study paper recently registered on bioRxiv (link). We also included 398 samples from the Korean Reference Genome (KRG) project (link).

We carefully checked the quality of the genome sequence data from these two studies with VerifyBamID and Qualimap, and filtered out samples that did not satisfy our quality criteria (VerifyBamID FREEMIX ≤ 0.03, genome depth of coverage ≥ 24). All 802 samples from the ASD study passed these requirements. Of the 398 KRG samples, we selected the 142 that satisfied the same strict criteria.
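The two thresholds above can be expressed as a simple per-sample filter. The following is a minimal sketch, not the pipeline actually used: the `SampleQC` record, its field names, and the example values are hypothetical, and it assumes the FREEMIX estimate and mean genome coverage have already been extracted from the VerifyBamID and Qualimap reports.

```python
from dataclasses import dataclass

# Thresholds as stated in the text.
FREEMIX_MAX = 0.03      # VerifyBamID FREEMIX must be <= 0.03
MIN_MEAN_DEPTH = 24.0   # genome depth of coverage must be >= 24


@dataclass(frozen=True)
class SampleQC:
    """Hypothetical per-sample QC record (field names are assumptions)."""
    sample_id: str
    freemix: float      # contamination estimate taken from VerifyBamID
    mean_depth: float   # mean genome coverage taken from Qualimap


def passes_qc(sample: SampleQC) -> bool:
    """Return True if the sample meets both quality criteria."""
    return sample.freemix <= FREEMIX_MAX and sample.mean_depth >= MIN_MEAN_DEPTH


def filter_samples(samples):
    """Keep only the samples that satisfy the criteria."""
    return [s for s in samples if passes_qc(s)]


# Made-up example values, for illustration only.
example = [
    SampleQC("S1", freemix=0.010, mean_depth=31.2),
    SampleQC("S2", freemix=0.045, mean_depth=30.0),  # fails: contamination
    SampleQC("S3", freemix=0.005, mean_depth=18.7),  # fails: coverage
]
kept_ids = [s.sample_id for s in filter_samples(example)]
```

In this sketch a sample is kept only when both conditions hold, so a well-covered but contaminated sample is still excluded.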

We called genetic variants on the final 944 samples that passed our filtering criteria. We analyzed the genomic data with the Genome Analysis Toolkit (GATK) germline variant calling pipeline, which performs joint genotyping with HaplotypeCaller and GenotypeGVCFs.
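As a rough illustration of the joint-genotyping workflow named above, the sketch below only assembles GATK4 command lines and does not run them. Several details are assumptions that the text does not state: the use of `CombineGVCFs` to merge the per-sample gVCFs (large cohorts often use `GenomicsDBImport` instead), all file names and paths, and the helper function names.

```python
from pathlib import Path


def haplotypecaller_cmd(reference, bam, gvcf):
    """Per-sample calling in GVCF mode (-ERC GVCF)."""
    return ["gatk", "HaplotypeCaller",
            "-R", reference, "-I", bam, "-O", gvcf, "-ERC", "GVCF"]


def combine_gvcfs_cmd(reference, gvcfs, combined):
    """Merge per-sample gVCFs (assumed step; not stated in the text)."""
    cmd = ["gatk", "CombineGVCFs", "-R", reference]
    for gvcf in gvcfs:
        cmd += ["-V", gvcf]
    return cmd + ["-O", combined]


def genotype_gvcfs_cmd(reference, combined, out_vcf):
    """Joint genotyping across all samples."""
    return ["gatk", "GenotypeGVCFs",
            "-R", reference, "-V", combined, "-O", out_vcf]


def build_joint_calling_commands(reference, bams, out_dir):
    """Return the ordered list of commands for a cohort of BAM files."""
    gvcfs = [f"{out_dir}/{Path(bam).stem}.g.vcf.gz" for bam in bams]
    commands = [haplotypecaller_cmd(reference, bam, gvcf)
                for bam, gvcf in zip(bams, gvcfs)]
    combined = f"{out_dir}/cohort.g.vcf.gz"
    commands.append(combine_gvcfs_cmd(reference, gvcfs, combined))
    commands.append(genotype_gvcfs_cmd(reference, combined,
                                       f"{out_dir}/cohort.vcf.gz"))
    return commands


# Hypothetical inputs, for illustration only.
commands = build_joint_calling_commands(
    "ref.fasta", ["bams/S1.bam", "bams/S2.bam"], "out")
```

Each returned command is a list of arguments that could be passed to `subprocess.run(cmd, check=True)` on a machine where GATK4 is installed.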

Currently, KORSAIR provides 32,793,282 SNVs and 7,192,742 indels identified through the variant calling pipeline. KORSAIR also provides supplemental variant information annotated with VEP, including global population allele frequencies from gnomAD and deleteriousness scores from SIFT, PolyPhen, and CADD. In the future, we plan to add WGS data collected by the National pilot Project of Bio Big Data Construction.

KORSAIR provides these carefully genotyped Korean genome variants through our web interface (link). Researchers can query variants by gene symbol, variant location, dbSNP ID, genomic region, or Ensembl identifier, either through the web interface or programmatically through RESTful APIs. Queried variants can be browsed on the KORSAIR web portal and downloaded as a CSV file. For researchers who need to annotate their variants on local machines, KORSAIR variants are downloadable in VCF format without restriction.
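For local annotation, one simple approach is to index the downloaded VCF by (chromosome, position, reference allele, alternate allele) and look up each variant of interest. The sketch below uses only the Python standard library and reads plain-text VCF lines. It assumes the file carries the standard `AF` INFO key, which may not match the actual KORSAIR download; the function names and the two example records are made up for illustration.

```python
import io


def parse_info(info):
    """Parse a VCF INFO column into a dict; flag keys map to True."""
    fields = {}
    if info in ("", "."):
        return fields
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def to_float(text):
    """Convert an AF entry to float; missing values ('.') become None."""
    try:
        return float(text)
    except ValueError:
        return None


def load_vcf_index(handle):
    """Map (chrom, pos, ref, alt) -> allele frequency (or None)."""
    index = {}
    for line in handle:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # not a complete VCF record
        chrom, pos, _id, ref, alts, _qual, _filter, info = cols[:8]
        af_field = parse_info(info).get("AF")
        af_values = af_field.split(",") if isinstance(af_field, str) else []
        # Multi-allelic records list one AF value per ALT allele.
        for i, alt in enumerate(alts.split(",")):
            af = to_float(af_values[i]) if i < len(af_values) else None
            index[(chrom, int(pos), ref, alt)] = af
    return index


# Made-up records, for illustration only.
example_vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t1000\t.\tA\tG\t.\tPASS\tAF=0.25\n"
    "2\t2000\t.\tC\tT,CA\t.\tPASS\tAF=0.1,0.02\n"
)
index = load_vcf_index(example_vcf)
query_af = index.get(("2", 2000, "C", "CA"))
```

For a compressed download, the same function could be given a handle from `gzip.open(path, "rt")`. A dictionary keeps lookups simple, but an index over tens of millions of variants may need a more memory-efficient structure.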


Contributing Organizations
KISTI
KAIST
KNCC
IBS
SNUBH
Contributing Projects
Korean Autism trio genome project
Eunjun Kim
Hee Jeong Yoo
Jeong Ho Lee
Jung Kyoon Choi
KCDC Korean genome project
Seong Beom Cho
KORSAIR development
Variant analysis
Jung Woo Park
Yongseong Cho
Junehawk Lee
Web implementation
Yongseong Cho
Junehawk Lee
Chanseok Jeong
Jihyub Moon
Genome data collection
Junehawk Lee
Hyojeong Paik
Jihyum Moon
Hyojin Kang
Funding