Introduction
We published the original dbCAN-seq database in 2018 to provide pre-computed CAZyme (carbohydrate-active enzyme) and CGC (CAZyme gene cluster) sequence and annotation data for 5,349 bacterial isolate genomes.
In the past five years, numerous microbiomes have been sequenced, and hundreds of thousands of metagenome-assembled genomes (MAGs) from various ecological environments are now available in public databases such as the MGnify database of the European Bioinformatics Institute and the IMG/M database of the Joint Genome Institute.
Currently, no database collects CAZymes and CGCs from microbiome MAGs and provides them on the web.
In the meantime, the CAZyme bioinformatics field continues to develop.
It is now possible to infer carbohydrate substrates for CAZymes and CGCs,
which is of great interest to applied microbiome research.
To provide a comprehensive database of CAZymes and CGCs for the community, we collected MAGs from four ecological environments to update the dbCAN-seq database.
In this update, we have made the following major advances:
(i) ~498,000 CAZymes and ~169,000 CAZyme gene clusters (CGCs) from 9,421 MAGs of four ecological environments (human gut, human oral, cow rumen, and marine);
(ii) Glycan substrates for 41,447 (24.54%) CGCs inferred by two novel approaches (dbCAN-PUL homology search and eCAMI subfamily majority voting);
(iii) A redesigned CGC page that includes a graphical display of CGC gene composition, an alignment of the query CGC with the subject PUL (polysaccharide utilization locus) from dbCAN-PUL to illustrate the substrate inference, and an eCAMI subfamily table to support substrates predicted from eCAMI subfamilies;
(iv) A statistics page to organize all the data for easy CGC access according to substrates and taxonomic phyla; and
(v) A batch download page.
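The majority-voting idea named in item (ii) can be illustrated with a minimal Python sketch. It is not the dbCAN-seq implementation: the function name `vote_substrate`, the `min_votes` threshold, and the rule that ties yield no prediction are all assumptions made here for illustration. The sketch assumes each CAZyme gene in a CGC has already been assigned at most one substrate label from its eCAMI subfamily.

```python
from collections import Counter
from typing import Optional


def vote_substrate(gene_substrates: list[Optional[str]],
                   min_votes: int = 2) -> Optional[str]:
    """Pick a CGC-level substrate by majority vote over per-gene labels.

    gene_substrates holds one entry per CAZyme gene in the cluster:
    a substrate label, or None when the gene has no labeled subfamily.
    min_votes and the tie rule are hypothetical, not dbCAN-seq settings.
    """
    # Count only genes that actually carry a substrate label.
    votes = Counter(s for s in gene_substrates if s is not None)
    if not votes:
        return None

    ranked = votes.most_common(2)
    top_label, top_count = ranked[0]

    # Require a minimum level of support before making a call.
    if top_count < min_votes:
        return None
    # Treat a tie between the two best labels as "no prediction".
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None
    return top_label


if __name__ == "__main__":
    print(vote_substrate(["xylan", "xylan", None, "pectin"]))
```

Returning no prediction on ties or weak support is a conservative choice for this sketch, in keeping with the note that predicted substrates still need experimental validation.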
It should be noted that all predicted substrate assignments for CGCs in dbCAN-seq need experimental validation. It is our hope that these predicted CGCs and substrates in the microbiomes of four ecological environments could facilitate the experimental characterization of new polysaccharide utilization loci (PULs) by the carbohydrate community.