##gff-version 3
##sequence-region NZ_DS499729 1 8289
# conversion-by bp_genbank2gff3.pl
# organism Anaerostipes caccae L1-92
# Note Anaerostipes caccae L1-92 strain DSM 14662 Scfld_03_10, whole genome shotgun sequence.
# date 25-APR-2021
NZ_DS499729	GenBank	region	1	8289	.	+	1	ID=NZ_DS499729;Dbxref=BioProject:PRJNA224116,taxon:411490;Name=NZ_DS499729;Note=Anaerostipes caccae L1-92 strain DSM 14662 Scfld_03_10%2C whole genome shotgun sequence.;comment1=REFSEQ INFORMATION: The reference sequence is identical to DS499729.1.  Anaerostipes caccae (GenBank Accession Number: AJ270487) is a  member of the division Firmicutes. It is an acetate-converting  butyrate-producing colon bacteria that is involved in metabolic  cross-feeding with Bifidobacterium species (Falony et. al. (2006)%2C  Belenguer et. al. (2006)). 
The sequenced strain was obtained from  Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH (DSMZ)  (DSM 14662).    We have collected 9.8X coverage in plasmid end reads (pOTW13 and  pJAZZ vectors) and 454 reads. We have performed one round of  automated sequence improvement(pre-finishing)%2C along with manual  improvement that includes breaking apart any mis-assembly%2C and  making manual joins where possible. Manual edits also are made  where the consensus appears to be incorrect. All low quality data  on the ends of contigs is removed. Contigs are ordered and oriented  where possible.    Sequencing/Assembly: The genomic DNA was purified from liquid  culture derived from a single bacterial colony. A hybrid sequencing  strategy that utilized reads from both 454 GS-20 and ABI 3730xl  sequencers was devised and implemented to generate the draft genome  sequences. 454 reads were assembled using Newbler (454 Life  Sciences) into 454 de novo contigs. These de novo contigs were  converted in silico to 800 base paired reads ('superreads') with  400 base overlaps with neighboring superreads. Finally%2C PCAP  (Huang%2C et al%2C Genome Research%2C 13:2164%2C (2003)) was used to  assemble the super-reads and the conventional 3730xl capillary  reads.    This sequenced strain is part of a comprehensive%2C sequence-based  survey of members of the normal human gut microbiota. A joint  effort of the WU-GSC and the Center for Genome Sciences at  Washington University School of Medicine%2C the purpose of this  survey is to provide the general scientific community with a broad  view of the gene content of 100 representatives of the major  divisions represented in the intestine's microbial community. This  information should provide a frame of reference for analyzing  metagenomic studies of the human gut microbiome. 
Further details of  this effort are described in a white paper entitled 'Extending Our  View of Self: the Human Gut Microbiome Initiative (HGMI)'  (http://www.genome.gov/Pages/Research/Sequencing/SeqProposals/HGMIS  eq.pdf). These studies are supported by National Human Genome  Research Institute.    For answers to your questions regarding this assembly or project%2C  or any other GSC genome project%2C please visit our Genome Groups web  page (http://genome.wustl.edu/genome_group_index.cgi) and email the  designated contact person.  Anaerostipes caccae (GenBank Accession Number: AJ270487) is a  member of the division Firmicutes. It is an acetate-converting  butyrate-producing colon bacteria that is involved in metabolic  cross-feeding with Bifidobacterium species (Falony et. al. (2006)%2C  Belenguer et. al. (2006)). The sequenced strain was obtained from  Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH (DSMZ)  (DSM 14662).    We have collected 9.8X coverage in plasmid end reads (pOTW13 and  pJAZZ vectors) and 454 reads. We have performed one round of  automated sequence improvement(pre-finishing)%2C along with manual  improvement that includes breaking apart any mis-assembly%2C and  making manual joins where possible. Manual edits also are made  where the consensus appears to be incorrect. All low quality data  on the ends of contigs is removed. Contigs are ordered and oriented  where possible.    Sequencing/Assembly: The genomic DNA was purified from liquid  culture derived from a single bacterial colony. A hybrid sequencing  strategy that utilized reads from both 454 GS-20 and ABI 3730xl  sequencers was devised and implemented to generate the draft genome  sequences. 454 reads were assembled using Newbler (454 Life  Sciences) into 454 de novo contigs. These de novo contigs were  converted in silico to 800 base paired reads ('superreads') with  400 base overlaps with neighboring superreads. 
Finally%2C PCAP  (Huang%2C et al%2C Genome Research%2C 13:2164%2C (2003)) was used to  assemble the super-reads and the conventional 3730xl capillary  reads.    This sequenced strain is part of a comprehensive%2C sequence-based  survey of members of the normal human gut microbiota. A joint  effort of the WU-GSC and the Center for Genome Sciences at  Washington University School of Medicine%2C the purpose of this  survey is to provide the general scientific community with a broad  view of the gene content of 100 representatives of the major  divisions represented in the intestine's microbial community.    Coding sequences were predicted using GeneMark v3.3 and Glimmer2  v2.13. Intergenic regions not spanned by GeneMark and Glimmer2 were  blasted against NCBI's non-redundant (NR) database and predictions  generated based on protein alignments. tRNA genes were determined  using tRNAscan-SE 1.23 and non-coding RNA genes by RNAmmer-1.2 and  Rfam v8.0. Gene names are generated at the contig level and may not  necessarily reflect any known order or orientation between contigs.  This information should provide a frame of reference for analyzing  metagenomic studies of the human gut microbiome. Further details of  this effort are described in a white paper entitled 'Extending Our  View of Self: the Human Gut Microbiome Initiative (HGMI)'  (http://www.genome.gov/Pages/Research/Sequencing/SeqProposals/HGMIS  eq.pdf). These studies are supported by National Human Genome  Research Institute.    For answers to your questions regarding this assembly or project%2C  or any other GSC genome project%2C please visit our Genome Groups web  page (http://genome.wustl.edu/genome_group_index.cgi) and email the  designated contact person.    Annotation was added to the contigs in February 2008.    This is a reference genome for the Human Microbiome Project. This  project is co-owned with the Human Microbiome Project DACC.  Product names were updated in August 2012.  
The annotation was added by the NCBI Prokaryotic Genome Annotation  Pipeline (PGAP). Information about PGAP can be found here:  https://www.ncbi.nlm.nih.gov/genome/annotation_prok/    \n##Genome-Annotation-Data-START##\nAnnotation Provider :: NCBI RefSeq\nAnnotation Date :: 04/25/2021 09:13:49\nAnnotation Pipeline :: NCBI Prokaryotic Genome\nAnnotation Pipeline (PGAP)\nAnnotation Method :: Best-placed reference protein\nset%3B GeneMarkS-2+\nAnnotation Software revision :: 5.1\nFeatures Annotated :: Gene%3B CDS%3B rRNA%3B tRNA%3B ncRNA%3B\nrepeat_region\nGenes (total) :: 3%2C533\nCDSs (total) :: 3%2C441\nGenes (coding) :: 3%2C342\nCDSs (with protein) :: 3%2C342\nGenes (RNA) :: 92\nrRNAs :: 5%2C 6%2C 7 (5S%2C 16S%2C 23S)\ncomplete rRNAs :: 5%2C 4%2C 4 (5S%2C 16S%2C 23S)\npartial rRNAs :: 2%2C 3 (16S%2C 23S)\ntRNAs :: 70\nncRNAs :: 4\nPseudo Genes (total) :: 99\nCDSs (without protein) :: 99\nPseudo Genes (ambiguous residues) :: 0 of 99\nPseudo Genes (frameshifted) :: 72 of 99\nPseudo Genes (incomplete) :: 23 of 99\nPseudo Genes (internal stop) :: 12 of 99\nPseudo Genes (multiple problems) :: 8 of 99\nCRISPR Arrays :: 7\n##Genome-Annotation-Data-END##;date=25-APR-2021;host=Homo sapiens;isolation_source=biological product [ENVO:02000043];mol_type=genomic DNA;organism=Anaerostipes caccae L1-92;strain=DSM 14662;type_material=type strain of Anaerostipes caccae
NZ_DS499729	GenBank	gene	1	8	.	-	1	ID=ANACAC_RS03030;Name=ANACAC_RS03030;old_locus_tag=ANACAC_00492
NZ_DS499729	GenBank	mRNA	1	8	.	-	1	ID=ANACAC_RS03030;Parent=ANACAC_RS03030
NZ_DS499729	GenBank	CDS	1	8	.	-	1	ID=ANACAC_RS03030;Parent=ANACAC_RS03030;Name=ANACAC_RS03030;Note=Derived by automated computational analysis using gene prediction method: Protein Homology.;codon_start=1;inference=COORDINATES: similar to AA sequence:RefSeq:WP_006566000.1;old_locus_tag=ANACAC_00492;product=DUF1700 domain-containing protein;protein_id=WP_006566000.1;transl_table=11;translation=length.203
NZ_DS499729	GenBank	exon	1	8	.	-	1	Parent=ANACAC_RS03030
NZ_DS499729	GenBank	gene	1	318	.	-	1	ID=ANACAC_RS03035;Name=ANACAC_RS03035;old_locus_tag=ANACAC_00493
NZ_DS499729	GenBank	mRNA	1	318	.	-	1	ID=ANACAC_RS03035;Parent=ANACAC_RS03035
NZ_DS499729	GenBank	CDS	1	318	.	-	1	ID=ANACAC_RS03035;Parent=ANACAC_RS03035;Name=ANACAC_RS03035;Note=Derived by automated computational analysis using gene prediction method: Protein Homology.;codon_start=1;inference=COORDINATES: similar to AA sequence:RefSeq:WP_006566001.1;old_locus_tag=ANACAC_00493;product=PadR family transcriptional regulator;protein_id=WP_006566001.1;transl_table=11;translation=length.105
NZ_DS499729	GenBank	exon	1	318	.	-	1	Parent=ANACAC_RS03035
NZ_DS499729	GenBank	gene	477	1448	.	-	1	ID=ANACAC_RS03040;Name=ANACAC_RS03040;old_locus_tag=ANACAC_00494
NZ_DS499729	GenBank	mRNA	477	1448	.	-	1	ID=ANACAC_RS03040;Parent=ANACAC_RS03040
NZ_DS499729	GenBank	CDS	477	1448	.	-	1	ID=ANACAC_RS03040;Parent=ANACAC_RS03040;Name=ANACAC_RS03040;Note=Derived by automated computational analysis using gene prediction method: Protein Homology.;codon_start=1;inference=COORDINATES: similar to AA sequence:RefSeq:WP_008394375.1;old_locus_tag=ANACAC_00494;product=carbohydrate kinase;protein_id=WP_006566002.1;transl_table=11;translation=length.323
NZ_DS499729	GenBank	exon	477	1448	.	-	1	Parent=ANACAC_RS03040
NZ_DS499729	GenBank	gene	1445	2908	.	-	1	ID=ANACAC_RS03045;Name=ANACAC_RS03045;old_locus_tag=ANACAC_00495
NZ_DS499729	GenBank	mRNA	1445	2908	.	-	1	ID=ANACAC_RS03045;Parent=ANACAC_RS03045
NZ_DS499729	GenBank	CDS	1445	2908	.	-	1	ID=ANACAC_RS03045;Parent=ANACAC_RS03045;Name=ANACAC_RS03045;Note=Derived by automated computational analysis using gene prediction method: Protein Homology.;codon_start=1;inference=COORDINATES: similar to AA sequence:RefSeq:WP_006566003.1;old_locus_tag=ANACAC_00495;product=glycoside hydrolase family 32 protein;protein_id=WP_006566003.1;transl_table=11;translation=length.487
NZ_DS499729	GenBank	exon	1445	2908	.	-	1	Parent=ANACAC_RS03045
NZ_DS499729	GenBank	gene	2919	4832	.	-	1	ID=ANACAC_RS03050;Name=ANACAC_RS03050;old_locus_tag=ANACAC_00496
NZ_DS499729	GenBank	mRNA	2919	4832	.	-	1	ID=ANACAC_RS03050;Parent=ANACAC_RS03050
NZ_DS499729	GenBank	CDS	2919	4832	.	-	1	ID=ANACAC_RS03050;Parent=ANACAC_RS03050;eC_number=2.7.1.211;Name=ANACAC_RS03050;Note=Derived by automated computational analysis using gene prediction method: Protein Homology.;codon_start=1;inference=COORDINATES: similar to AA sequence:RefSeq:WP_009288910.1;old_locus_tag=ANACAC_00496;product=sucrose-specific PTS transporter subunit IIBC;protein_id=WP_006566004.1;transl_table=11;translation=length.637
NZ_DS499729	GenBank	exon	2919	4832	.	-	1	Parent=ANACAC_RS03050
NZ_DS499729	GenBank	gene	5003	6037	.	-	1	ID=ANACAC_RS03055;Name=ANACAC_RS03055;old_locus_tag=ANACAC_00497
NZ_DS499729	GenBank	mRNA	5003	6037	.	-	1	ID=ANACAC_RS03055;Parent=ANACAC_RS03055
NZ_DS499729	GenBank	CDS	5003	6037	.	-	1	ID=ANACAC_RS03055;Parent=ANACAC_RS03055;Name=ANACAC_RS03055;Note=Derived by automated computational analysis using gene prediction method: Protein Homology.;codon_start=1;inference=COORDINATES: similar to AA sequence:RefSeq:WP_006566005.1;old_locus_tag=ANACAC_00497;product=hypothetical protein;protein_id=WP_006566005.1;transl_table=11;translation=length.344
NZ_DS499729	GenBank	exon	5003	6037	.	-	1	Parent=ANACAC_RS03055
NZ_DS499729	GenBank	gene	6034	6588	.	-	1	ID=ANACAC_RS03060;Name=ANACAC_RS03060;old_locus_tag=ANACAC_00498
NZ_DS499729	GenBank	mRNA	6034	6588	.	-	1	ID=ANACAC_RS03060;Parent=ANACAC_RS03060
NZ_DS499729	GenBank	CDS	6034	6588	.	-	1	ID=ANACAC_RS03060;Parent=ANACAC_RS03060;Name=ANACAC_RS03060;Note=Derived by automated computational analysis using gene prediction method: Protein Homology.;codon_start=1;inference=COORDINATES: similar to AA sequence:RefSeq:WP_009288911.1;old_locus_tag=ANACAC_00498;product=sigma-70 family RNA polymerase sigma factor;protein_id=WP_039946252.1;transl_table=11;translation=length.184
NZ_DS499729	GenBank	exon	6034	6588	.	-	1	Parent=ANACAC_RS03060
NZ_DS499729	GenBank	gene	6716	7705	.	-	1	ID=ANACAC_RS03065;Name=ANACAC_RS03065;old_locus_tag=ANACAC_00499
NZ_DS499729	GenBank	mRNA	6716	7705	.	-	1	ID=ANACAC_RS03065;Parent=ANACAC_RS03065
NZ_DS499729	GenBank	CDS	6716	7705	.	-	1	ID=ANACAC_RS03065;Parent=ANACAC_RS03065;Name=ANACAC_RS03065;Note=Derived by automated computational analysis using gene prediction method: Protein Homology.;codon_start=1;inference=COORDINATES: similar to AA sequence:RefSeq:WP_007713560.1;old_locus_tag=ANACAC_00499;product=LacI family DNA-binding transcriptional regulator;protein_id=WP_182483054.1;transl_table=11;translation=length.329
NZ_DS499729	GenBank	exon	6716	7705	.	-	1	Parent=ANACAC_RS03065
NZ_DS499729	GenBank	gene	7846	8289	.	+	1	ID=ANACAC_RS03070;Name=ANACAC_RS03070;old_locus_tag=ANACAC_00500
NZ_DS499729	GenBank	mRNA	7846	8289	.	+	1	ID=ANACAC_RS03070;Parent=ANACAC_RS03070
NZ_DS499729	GenBank	CDS	7846	8289	.	+	1	ID=ANACAC_RS03070;Parent=ANACAC_RS03070;Name=ANACAC_RS03070;Note=Derived by automated computational analysis using gene prediction method: Protein Homology.;codon_start=1;inference=COORDINATES: similar to AA sequence:RefSeq:WP_009288913.1;old_locus_tag=ANACAC_00500;product=MarR family transcriptional regulator;protein_id=WP_006566008.1;transl_table=11;translation=length.147
NZ_DS499729	GenBank	exon	7846	8289	.	+	1	Parent=ANACAC_RS03070
NZ_DS499729	GenBank	gene	8286	8289	.	+	1	ID=ANACAC_RS03075;Name=ANACAC_RS03075;old_locus_tag=ANACAC_00501
NZ_DS499729	GenBank	mRNA	8286	8289	.	+	1	ID=ANACAC_RS03075;Parent=ANACAC_RS03075
NZ_DS499729	GenBank	CDS	8286	8289	.	+	1	ID=ANACAC_RS03075;Parent=ANACAC_RS03075;Name=ANACAC_RS03075;Note=Derived by automated computational analysis using gene prediction method: Protein Homology.;codon_start=1;inference=COORDINATES: similar to AA sequence:RefSeq:WP_009288914.1;old_locus_tag=ANACAC_00501;product=flavodoxin;protein_id=WP_006566009.1;transl_table=11;translation=length.182
NZ_DS499729	GenBank	exon	8286	8289	.	+	1	Parent=ANACAC_RS03075