📊 Overview Summary

6,031 Species Representative Genomes
13,886 Total Genomes
414,379 CAZymes
121,883 CAZyme Gene Clusters (CGCs)

🌍 Genomes Distribution by Continent

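Geographic Distribution Analysis: This distribution shows the global representation of genomes in our database. Asia and Africa have the highest representation, together accounting for 63.2% of representative genomes and 71.2% of total genomes. The NA column counts genomes with no continent recorded.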
| | Asia | Africa | Europe | North America | Oceania | South America | NA | Total |
|---|---|---|---|---|---|---|---|---|
| Representative Genomes | 2,105 (34.9%) | 1,706 (28.3%) | 1,095 (18.2%) | 371 (6.2%) | 105 (1.7%) | 16 (0.3%) | 633 (10.5%) | 6,031 |
| Total Genomes | 5,829 (42.0%) | 4,055 (29.2%) | 1,650 (11.9%) | 554 (4.0%) | 184 (1.3%) | 36 (0.3%) | 1,578 (11.4%) | 13,886 |
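
The percentages in the continent table can be reproduced from the raw counts. The following is a minimal sketch, assuming each share is the count divided by the row total and rounded to one decimal place; the dictionary and function names are illustrative and not part of dbCAN.

```python
# Counts copied from the continent table; "NA" is the table's own column label.
representative = {
    "Asia": 2105, "Africa": 1706, "Europe": 1095, "North America": 371,
    "Oceania": 105, "South America": 16, "NA": 633,
}
total = {
    "Asia": 5829, "Africa": 4055, "Europe": 1650, "North America": 554,
    "Oceania": 184, "South America": 36, "NA": 1578,
}


def shares(counts):
    """Return each category's percentage of the row total, rounded to 0.1."""
    row_total = sum(counts.values())
    return {name: round(100 * n / row_total, 1) for name, n in counts.items()}


# The per-continent counts should add up to the totals reported in the table.
assert sum(representative.values()) == 6031
assert sum(total.values()) == 13886

rep_shares = shares(representative)
total_shares = shares(total)
for name in representative:
    print(f"{name:14s} {rep_shares[name]:5.1f}% {total_shares[name]:5.1f}%")
```

Under that assumption, every computed share agrees with the value printed in the table.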

🧬 CAZymes & CGCs Analysis

414,379 Total CAZymes
121,883 CGCs
210,038 CAZymes in CGCs
50.7% CAZymes in Clusters
68.7 Avg CAZymes/Genome
20.2 Avg CGCs/Genome
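
The three derived figures above follow from the raw counts on this page. The sketch below recomputes them. The page does not say which genome count the averages use. The published values match division by the 6,031 species representative genomes rather than the 13,886 total genomes, so that denominator is an inference from the arithmetic. Variable names are illustrative.

```python
# Raw counts as reported on this page.
total_cazymes = 414_379
cazymes_in_cgcs = 210_038
total_cgcs = 121_883
representative_genomes = 6_031   # assumed denominator for the averages
all_genomes = 13_886             # shown for comparison only

# Share of CAZymes that fall inside a CAZyme gene cluster.
pct_in_clusters = 100 * cazymes_in_cgcs / total_cazymes

# Averages per species representative genome.
cazymes_per_rep = total_cazymes / representative_genomes
cgcs_per_rep = total_cgcs / representative_genomes

# The same averages over all genomes, which do not match the page.
cazymes_per_any = total_cazymes / all_genomes
cgcs_per_any = total_cgcs / all_genomes

print(f"{pct_in_clusters:.1f}% CAZymes in clusters")
print(f"{cazymes_per_rep:.1f} CAZymes and {cgcs_per_rep:.1f} CGCs per representative genome")
print(f"{cazymes_per_any:.1f} CAZymes and {cgcs_per_any:.1f} CGCs per genome overall")
```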

🔬 Gene Clusters & CGC Families

| Gene Clusters | Count |
|---|---|
| PULs | 602 |
| iPULs | 756 |
| CGCs | 121,883 |
| Total | 123,241 |

| CGC Families | Count |
|---|---|
| Single-substrate | 238 |
| Multi-substrate | 21 |
| Families without Substrate | 11,055 |
| Total | 11,314 |

58,579 CGCs clustered into families
4,640 CGCs assigned to a substrate
7.9% Assignment Rate (4,640 of the 58,579 clustered CGCs)

🥗 Gut Metagenome Samples

Diet (De Filippis et al., 2019)

| Diet Type | # of Samples |
|---|---|
| Vegan | 10 |
| Omnivore | 10 |
| Vegetarian | 10 |

Diet (Huang et al., 2024)

| Diet Type | # of Samples |
|---|---|
| Vegan | 10 |
| Omnivore | 10 |
| Flexitarian | 10 |

IBD (Lloyd-Price et al., 2019)

| Condition | # of Samples |
|---|---|
| CD | 10 |
| UC | 10 |
| non-IBD | 10 |

🧪 Structure Similarity (Foldseek alignment TM-score) of Unknown Proteins (total 42,002)

| Database with best hit (alntmscore) | ≥0.5 | ≥0.6 | ≥0.7 | ≥0.8 | ≥0.9 |
|---|---|---|---|---|---|
| AFDB | 9868 (23.5%) | 9752 (23.2%) | 9009 (21.4%) | 6075 (14.5%) | 1937 (4.6%) |
| CAZyme3D-Whole | 8825 (21.0%) | 8590 (20.5%) | 7805 (18.6%) | 5324 (12.7%) | 1805 (4.3%) |
| CAZyme-ID50 | 9074 (21.6%) | 8825 (21.0%) | 8013 (19.1%) | 5417 (12.9%) | 1546 (3.7%) |
| SwissProt | 9289 (22.1%) | 9173 (21.8%) | 8488 (20.2%) | 5692 (13.6%) | 1738 (4.1%) |
| PDB | 8272 (19.7%) | 8177 (19.5%) | 7461 (17.8%) | 5226 (12.4%) | 943 (2.2%) |
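
The threshold columns are cumulative: a protein whose best hit scores 0.85 is counted under ≥0.5, ≥0.6, ≥0.7, and ≥0.8. The percentages are consistent with division by all 42,002 unknown proteins, including those with no qualifying hit. The sketch below shows how one such row could be tallied from a list of best-hit TM-scores. The function name and the toy scores are hypothetical and are not dbCAN data.

```python
THRESHOLDS = (0.5, 0.6, 0.7, 0.8, 0.9)


def tally_row(best_scores, n_queries, thresholds=THRESHOLDS):
    """Count best-hit scores at or above each threshold.

    best_scores holds one score per query protein that had a hit.
    n_queries is the number of all query proteins, with or without a hit,
    and is the denominator for the percentages.
    """
    row = {}
    for t in thresholds:
        count = sum(1 for score in best_scores if score >= t)
        row[t] = (count, round(100 * count / n_queries, 1))
    return row


# Hypothetical example: 10 query proteins, 8 of which had a hit.
toy_scores = [0.95, 0.82, 0.74, 0.66, 0.58, 0.52, 0.41, 0.33]
toy_row = tally_row(toy_scores, n_queries=10)
for t, (count, pct) in toy_row.items():
    print(f">={t}: {count} ({pct}%)")
```

Because each column includes every protein counted in the columns to its right, the counts in a row can only stay level or fall as the threshold rises, as they do in the table.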