Help

Citation

Please cite:
Robb SM, Gotting K, Ross E, Sánchez Alvarado A. SmedGD 2.0: The Schmidtea mediterranea genome database. Genesis. 2015 Jul [Epub ahead of print].

Version

This is SmedGD 2.0

Smed Unigene Help

Definition

  • Smed Unigenes are amino acid sequences that represent a cluster of similar transcript sequences. This amino acid sequence enables a series of protein motif predictions to be calculated and displayed in the Smed Unigene GBrowse instance and in the Smed Unigene Gene Page. Tools have been generated to download the amino acid sequence and the clustered transcript nucleotide sequences.
  • Smed Unigenes have identifiers that begin with ‘SMU’.
  • Every Smed Unigene has a gene page. To reach it, perform a Smed Unigene protein domain or GO term search, or select ‘SmedGD Unigene’ from the pop-up balloon on any of the protein homology tracks in the Smed Unigene Browser.

Putative Function

  • The putative function of each Smed Unigene is determined bioinformatically. Each of the associated transcripts has been used to search the SwissProt and Ensembl databases. All of the hit descriptions (for each transcript associated with the Smed Unigene) are examined for phrases of more than one word that are shared across transcripts. The longest and most common phrase that appears in the hit descriptions of more than one transcript is used as the putative function. “Cannot be determined” indicates that no common phrase could be found in more than one transcript.
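The phrase selection described above can be sketched in a few lines of Python. This is only an illustration, not the SmedGD code: the function names, the input layout (a dictionary from transcript ID to its hit descriptions), the lowercasing and whitespace tokenization, and the tie-break (most supporting transcripts first, then most words) are all assumptions, since the text does not say how "longest" and "most common" are weighed against each other.

```python
from collections import Counter


def word_phrases(description, min_words=2):
    """Return every contiguous phrase of at least min_words words."""
    words = description.lower().split()
    phrases = set()
    for start in range(len(words)):
        for end in range(start + min_words, len(words) + 1):
            phrases.add(" ".join(words[start:end]))
    return phrases


def putative_function(hits_by_transcript):
    """Pick a shared multi-word phrase from per-transcript hit descriptions.

    hits_by_transcript (hypothetical layout): transcript ID -> list of
    hit description strings for that transcript.
    """
    support = Counter()
    for descriptions in hits_by_transcript.values():
        # Count a phrase once per transcript, however many hits contain it.
        seen = set()
        for description in descriptions:
            seen |= word_phrases(description)
        support.update(seen)

    # Keep only phrases found in more than one transcript.
    shared = {phrase: n for phrase, n in support.items() if n > 1}
    if not shared:
        return "Cannot be determined"

    # Assumed ranking: most transcripts, then most words, then alphabetical.
    return min(shared, key=lambda p: (-shared[p], -len(p.split()), p))


example_hits = {
    "transcript_1": ["Serine/threonine-protein kinase PLK1", "Polo-like kinase 1"],
    "transcript_2": ["Serine/threonine-protein kinase PLK4"],
    "transcript_3": ["Uncharacterized protein"],
}
print(putative_function(example_hits))  # serine/threonine-protein kinase
```

Under a different reading of the rule (for example, preferring the longest shared phrase regardless of how many transcripts support it), only the ranking key in the last line of `putative_function` would change.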

Multiple Assemblies

  • Multiple genome assemblies, as well as a Smed Unigene browser, are available for viewing and searching with GBrowse. Links can be found in the top menu under the “GBrowse” tab. Links to each browser can also be found within any of the browsers, in the dropdown menu entitled “Data Source”.

    Additionally each Browser can be accessed from these links:

GBrowse Help

General Help

  • See the GBrowse help page.

User Accounts

  • Create an account to save information you would like to return to later, such as Snapshots.

Bookmark this

  • Find the ‘Bookmark this’ function in the GBrowse ‘File’ menu.
  • This function generates a URL that you can bookmark in your web browser.
  • The URL contains information about the reference sequence and tracks in your current view.

Snapshots

  • To save the current browser view for future review, including the specific reference sequence and track selection, select ‘Save Snapshot’.
  • To save the current browser view for others, first select ‘Bookmark this’ in the ‘File’ menu, then select ‘Save Snapshot’.
  • Snapshots can be retrieved and sent to others from the ‘Snapshot’ tab.

Custom Tracks

  • Upload your own track from a file or URL.
  • The track can be in a variety of formats, such as GFF3, BED, or WIG.
  • See the custom track help page for information about file formats.
  • Save your custom track as ‘Casual’ and share the link with others.
  • Save your track as ‘Private’ and only you can view it.
  • Save your track as ‘Public’ and it will appear in the ‘Community Tracks’ tab for everyone to see.
  • Read this manuscript by Lincoln Stein, which explains how to share next-generation sequencing data: “Using GBrowse 2.0 to visualize and share next-generation sequence data”.
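As an illustration of preparing a file for upload, the sketch below writes a minimal six-column BED file in Python. The reference sequence name, feature names, coordinates, and output file name are all hypothetical; the reference sequence names in a real file would need to match those of the assembly selected in the browser. BED start coordinates are 0-based and end coordinates are exclusive.

```python
# Hypothetical features: (reference sequence, start, end, name, score, strand).
features = [
    ("scaffold_1", 4200, 5100, "my_feature_2", 0, "-"),
    ("scaffold_1", 999, 2500, "my_feature_1", 0, "+"),
]


def bed_lines(rows):
    """Format feature tuples as tab-separated BED lines, sorted by position."""
    lines = []
    for seq_id, start, end, name, score, strand in sorted(
        rows, key=lambda r: (r[0], r[1])
    ):
        # BED intervals are 0-based, half-open: start must be before end.
        if not 0 <= start < end:
            raise ValueError(f"invalid interval for {name}: {start}-{end}")
        lines.append(
            "\t".join([seq_id, str(start), str(end), name, str(score), strand])
        )
    return lines


def write_bed(path, rows):
    """Write the formatted BED lines to path, one feature per line."""
    with open(path, "w") as handle:
        for line in bed_lines(rows):
            handle.write(line + "\n")


if __name__ == "__main__":
    write_bed("my_custom_track.bed", features)
```

The resulting file can then be uploaded as a custom track; consult the custom track help page for the options each format supports.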

4 comments

  1. Fonteneau says:

    Hey!
    Is it possible to use the search function in batch mode with multiple SMU codes?

    • Sofia Robb says:

      Hi Eric,

      We are working on a search function to retrieve a list of links to the SMUnigene gene pages. In the meantime, we have made a tab-delimited file available on our downloads page. Please let me know if this has all the SMUnigene annotation information you are looking for.

      Sofia

  2. Damian Kao says:

    Hi, thanks for the great resource. How was similarity determined when generating your Smed Unigene dataset? And which transcript dataset (Asxl or Sxl MAKER?) did you use to generate the unigene dataset?

    • Sofia Robb says:

      Hi Damian,
      We are very happy that our tool is useful to you! Information on how the unigenes were created can be found in our SmedGD 2.0 publication and on the help page.

      This is from SmedGD 2.0:

      Smed Unigenes
      To produce a consistent gene set that can be used in multiple genome assemblies, we created a set of non-redundant sequences we call Smed Unigenes (with identifiers that begin with ‘SMU’). We began with four Trinity (Grabherr, 2011) assembled transcriptomes from S. mediterranea: Intact Whole Animal Asxl worms (GCZZ00000000); Intact Whole Animal Sxl worms (GDAG00000000); Pooled Samples from an Asxl regeneration time course (unpublished data); and pooled samples from a Sxl regeneration time-course (Xiang et al., 2014). We translated each transcriptome assembly into putative coding sequences with Transdecoder, provided by the Trinity suite, and combined the sequences into a single redundant set. To remove the redundancy of the set we used CD-HIT (Fu et al., 2012) to cluster all sequences with an identity of greater than or equal to 95%. The resultant sequence set was then filtered for contaminate with Seqclean (seqclean, 2011). To maximize the utility of the Smed Unigenes, sequences were annotated with the best BLASTx (Camacho et al., 2009) hit to SwissProt (UniProt, 2015) and NCBI’s NR database, using an e-value cutoff of 0.001. We also identified PFAM (Finn et al., 2014) domains present in Smed Unigene sequences using hmmscan, version 3.1b1, from the HMMER package (Finn et al., 2011) with an e-value cutoff of 0.01. We also annotated these transcripts with tmhmm, version 2.0c, (Krogh et al., 2001) signalP, version 4.1, (Petersen et al., 2011) and ncoils (Lupas, 1996). Any sequence below 300 base pairs in length that did not have any annotations attached were discarded. The final Smed Unigene set consists of 32,615 sequences with an average nucleotide length of 1,061. Additionally, tools have been generated in SmedGD 2.0 to download the amino acid sequence and the constituent transcript nucleotide sequences.
      Besides having been aligned to all available genome assemblies, the Smed Unigene sets have searchable protein domain feature along with homology information and a comprehensive gene page.

      Sofia
