The cost of DNA sequencing has decreased due to advancements
in next-generation sequencing. Because the Illumina platform
yields a large number of sequences, its use
can reduce costs more than the 454 pyrosequencer.
However, the Illumina platform has other challenges,
including bioinformatics analysis of large numbers
of sequences and the need to reduce erroneous nucleotides
generated at the 3′ ends of the sequences. These erroneous
sequences can lead to errors in analysis of microbial communities.
Therefore, correction of these erroneous sequences
is necessary for accurate taxonomic identification. Several
studies that used the Illumina platform to perform metagenomic
analyses have proposed curation pipelines to increase
accuracy. In this study, we evaluated the likelihood of obtaining
an erroneous microbial composition using the MiSeq
250 bp paired-end sequencing platform and improved the pipeline
to reduce erroneous identifications. We compared different
sequencing conditions by varying the percentage of control
phiX added, the concentration of the sequencing library, and
the 16S rRNA gene target region using a mock community
sample composed of known sequences. Our recommended method
corrected erroneous nucleotides and improved identification
accuracy. Overall, 99.5% of the total reads shared
95% similarity with the corresponding template sequences
and 93.6% of the total reads shared over 97% similarity. This
indicated that the MiSeq platform can be used to analyze microbial
communities at the genus level with high accuracy.
The improved analysis method recommended in this study
can be applied to amplicon studies in various environments
using high-throughput reads generated on the MiSeq platform.
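
As an illustration of what removing erroneous nucleotides from the 3′ ends of reads can look like, the following is a minimal sketch of quality-based 3′ trimming. It is not the pipeline evaluated in this study. The function names (decode_phred33, trim_3prime), the Phred+33 quality encoding, and the quality cutoff of 20 are assumptions chosen for the example.

```python
def decode_phred33(qual_string):
    """Convert a FASTQ quality string (assumed Phred+33) to integer scores."""
    return [ord(ch) - 33 for ch in qual_string]


def trim_3prime(seq, quals, min_q=20):
    """Remove bases from the 3' end while their quality is below min_q.

    Hypothetical helper: min_q=20 is an assumed cutoff, not a value
    taken from the study.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    end = len(seq)
    # Walk back from the 3' end until a base meets the cutoff.
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


# Example read: the last two bases have low quality ('#' is Q2, 'I' is Q40).
read_seq = "ACGTACGTAC"
read_qual = decode_phred33("IIIIIIII##")
trimmed_seq, trimmed_qual = trim_3prime(read_seq, read_qual)
print(trimmed_seq)
```

In this sketch only the trailing low-quality bases are removed; a low-quality base in the middle of a read is left untouched, so additional steps such as paired-read overlap correction would be needed in practice.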