User Manual
Zhou Du ([email protected])
Version 1.0

1. What is agriGO?

agriGO is designed to automate the work of experimental biologists in identifying enriched Gene Ontology (GO) terms in a list of microarray probe sets or gene identifiers (with or without expression information). It is also a GO-related database. agriGO focuses especially on agricultural species.

2. Why use agriGO?

agriGO provides extensive support for agricultural species. It is not limited to SEA analysis: GSEA, implemented using the PAGE method, is also available. In addition, the BLAST4ID tool is provided for ID transfer or annotation, and search and download functions are accessible. agriGO produces rich outputs such as graphical results, bar charts and hierarchical trees, which together give a comprehensive understanding of the biological meaning of the user's input data.

3. What data does agriGO contain?

We currently support 35 species, comprising 280 datatypes. Please check the data statistics page for detailed information. We will continue adding more species and datatypes.

4. How does agriGO prepare its data?

Raw GO annotation data is generated by agriGO using BLAST, Pfam and InterProScan, or obtained from the B2G-FAR center or from Gene Ontology.
Arabidopsis genome data is from TAIR.
Rice TIGR genome data is from the Rice Genome Annotation Project.
Rice KOME data is from the KOME database.
Rice Gramene data is from the Gramene center.
Populus genome data is collected from JGI.
Soybean and Sorghum genome data is compiled from Phytozome.
Grape genome data is compiled from Genoscope.
Medicago genome data is from Medicago truncatula sequencing resources.
Maize genome data is from MaizeSequence.org.
Castor bean genome data is from the Castor Bean Genome Database.
Brachypodium distachyon genome data is from Ensembl.
Bovine genome data is from the Bovine Genome Database.
Silkworm genome data is from SilkDB.
M. grisea genome data is from the Magnaporthe grisea Database.
Affymetrix CSV files and array sequences are from NetAffx.

5. How to use tools provided by agriGO?

Quick introduction to analysis procedure

1. Choose tool and set parameters

You should choose one tool to go forward. On the right side, several frames containing annotation text are interactive. Their content will change depending on the exact parameters you choose. You can show or hide the help frames by using the HELP buttons at the top-right of the page.

2. Submit your job and perform analysis

After you submit your job, agriGO will pre-check the validity of your uploaded data. If your job is submitted successfully, a job ID will be given. Since the analysis process could take a while, you may close the waiting page and use the job ID to check the work later. Please note that results of your jobs will be stored on our server for THREE DAYS. After 3 days all information about the job will be deleted. If you need the results kept longer, contact me.

3. Explore results

agriGO provides different ways to browse the results of different tools. Some of them are flexible, but you may need some specific settings to make them cater to your own demands. The detailed introduction to these tools in the following part of the manual will help you to achieve this.

A. How to use Singular Enrichment Analysis (SEA)?

SEA is a traditional and widely used method. It is simple to use and simple to understand. The user only needs to prepare a list of gene/probe names, and enriched GO terms will be found by a statistical test against a pre-calculated background or a customized one.

STEP 1:

To use SEA analysis, you should first select the type of your query list, either single names or names with GO accession. If you choose to use species supported in agriGO, you only need to provide a list of sequence identifiers.
It should be noted that you had better select the species and check all allowed ID types of the corresponding species before you submit your IDs. Only allowed IDs are suitable for analysis in this mode. You can mix IDs of different types, as long as they are all allowed IDs.

If you choose customized mode, you are no longer limited to the species held by agriGO. You can use any IDs you have, but note that the IDs must be accompanied by GO accessions!

OK, theoretically you can just click "Submit" to perform the analysis now and simply skip the following steps. Nevertheless, if you want to set more advanced parameters, keep on reading this step-by-step manual.

STEP 2:

Now you can set the background or reference. There are three types: suggested backgrounds, customized reference and customized annotated reference.

The default parameter is to use suggested backgrounds. For each species, agriGO will give all possible background types. For those species without a relatively complete profile, backgrounds from neighboring organisms are suggested. Users can select one based on their practical needs, or otherwise use a customized reference.

If you do not want any of the suggested pre-computed backgrounds, you can use a customized reference instead. NOTE: IDs in the reference list should come from the same species as the one selected above for the query list.

You can also use any IDs if you choose customized annotated reference mode; however, the price of this freedom is that GO accessions must be attached. ☺

You can paste your data directly or upload a file. For the latter, please make sure the file is no bigger than 4MB.

STEP 3:

The advanced options are optional but quite important. These options are hidden by default, and one click is needed to make them visible. In SEA analysis, there are three statistical test methods: hypergeometric, chi-square and Fisher test.
When the input/query list is compared with the previously computed background, or is a subset of the reference list, choose hypergeometric or Fisher. When both your query list and your reference list are quite small, you had better choose the Fisher test. When the input/query list has few or no intersections with the reference list, the chi-square test is more appropriate.

Next you can choose the method for multi-test adjustment. Seven adjustment methods are available here: Yekutieli (FDR under dependency), Bonferroni, Hochberg, Hochberg (FDR), Hommel, Holm and False Discovery Rate. Though I would suggest performing an adjustment, you can turn it off and use no adjustment. If you choose no adjustment, you may then want to reset the significance level option below. Terms under the cutoff of the significance level will be highlighted and emphasized in the analysis results, so this setting affects your test output.

Minimum number of mapping entries means that GO annotations that do not appear in at least the selected number of entries will not be shown. In other words, the higher you set the number, the more entries are needed to make a GO term appear in the analysis result.

Gene ontology type: Plant GO slim is a cut-down version of the GO ontologies containing a subset of the terms in the whole GO for plants.

Last, if you provide an e-mail address, a notification with the link to the results will be sent when the analysis is completed. Providing an e-mail address is optional for SEA analysis, because it is very fast.

Congratulations! You can now click submit to perform the analysis. You can always get interactive help from the help frames on the right and a detailed tutorial in this manual. If you still have any questions, contact me directly.

In the following we will discuss the outputs of the SEA analysis.
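To make the choice of test and adjustment in STEP 3 more concrete, here is a minimal Python sketch of a hypergeometric enrichment test followed by Bonferroni and Holm adjustment. It is an illustration only and not agriGO's own code: the function names and the example counts are invented for this sketch, and whether agriGO computes its p-values in exactly this way is an assumption.

```python
from math import comb


def hypergeom_enrichment_p(query_hits, query_size, bg_hits, bg_size):
    """Upper-tail hypergeometric p-value for one GO term.

    query_hits: entries in the query list annotated to the term
    query_size: total annotated entries in the query list
    bg_hits:    entries in the background annotated to the term
    bg_size:    total annotated entries in the background
    Returns P(X >= query_hits) when query_size entries are drawn
    from the background without replacement.
    """
    upper = min(query_size, bg_hits)
    total = comb(bg_size, query_size)
    tail = sum(
        comb(bg_hits, i) * comb(bg_size - bg_hits, query_size - i)
        for i in range(query_hits, upper + 1)
    )
    return tail / total


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values):
    """Holm step-down adjustment, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity of the adjusted values.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    # Invented counts for three hypothetical GO terms:
    # (query_hits, query_size, bg_hits, bg_size)
    terms = [(12, 100, 200, 20000), (3, 100, 500, 20000), (1, 100, 50, 20000)]
    raw = [hypergeom_enrichment_p(*t) for t in terms]
    for p, b, h in zip(raw, bonferroni(raw), holm(raw)):
        print(f"raw={p:.3g}  bonferroni={b:.3g}  holm={h:.3g}")
```

The adjusted values play the role of the q-values that are compared against the significance level; the other adjustment methods listed above are not covered by this sketch.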
Singular Enrichment Analysis (SEA) Results

Part 1:

A brief summary of your job will be given. The job ID is usable within 3 days. A file containing all entities in the query list that can be annotated by GO, together with their descriptions, is available for download.

Part 2:

In this part, you can browse the hierarchical graph result. Note that the graphical result is generated as separate graphs for each of the three GO categories, namely biological process, molecular function and cellular component. After selecting the category, users can specify their preferred output format, graph rank direction and font size. The result format is the output format you prefer. The rank direction defines the direction of your output; for instance, the direction in the example image is 'top to bottom'. The font size is self-explanatory: users can set a smaller size if there are many nodes in their result.

Click the 'generate image' button after you have set all parameters. The graphical result will be presented according to your own settings. The graphical result is a GO hierarchical image containing all statistically significant terms.

The nodes in the image are classified into ten levels, each associated with a specific color. The smaller the term's q-value, the more statistically significant it is, and the darker and redder the node's color (Note: q-value here means the multiple-test adjusted p-value). Inside the box of a significant term, the information includes: GO term, q-value, GO description, the number of items mapped to the GO term in the query list and in the background, and the total numbers of items in the query list and in the background. For terms whose q-value is higher than the cutoff set by the user, only GO information is given in the box.
To better understand the graphical result, it is suggested to study the annotation diagram. If the user chooses PNG, JPG or GIF as the result format, a link to each term's details is available by clicking its block.

Part 3:

The terms selected here are child terms of the root (also called secondary level terms) or significant terms belonging to secondary level terms. Thus, the bar chart gives the user a brief portrait, since these GO terms are relatively general descriptions. Similar to the procedure for the graphical result, the user should specify the parameters before creating the GO abundance chart. Users can try these settings to obtain their preferred view of the bar chart. Note that the settings you use will be recorded in your cookie and will be the defaults in your future jobs. In other words, you may try several times and keep your last attempt as your own preferences.

Here the bar chart uses the glass bar style, default colors, GO annotation as X legend, 14px font and 300 for X legend rotation. Here are some tips:
1. Have a glance at all four bar styles and select the one you like.
2. Use HEX format to define colors; we have already suggested a website for this.
3. If you prefer GO annotation as X legend content, you may use a smaller font when there are too many words.
4. 270 to 315 is suggested for X legend rotation, in which 270 means vertical and 315 means a 45 degree slope. You can try other numbers which may satisfy your taste but seem somehow strange to me ☺.

The bar chart is created based on scripts from Open Flash Chart. It is powerful. You can drag the borders to resize the image and adjust its size and ratio. Each bar links to the term's detailed information.
A 'Save as Image' button exists but is only useful when you are using the Firefox browser. You can also use the 'Print Screen' key on your keyboard or other tools to save this image.

Part 4:

In this part, detailed information is given. All significant GO terms are presented in the following table. You can browse the GO terms using tree traversing mode (we will discuss it later), browse all GO terms in a table of the same type, or view just the data.

Users can select terms to draw a graphical result or create a bar chart. Please note that the parameters used in graph or chart generation are fetched from your cookie, and your cookie is set or changed when you generate graphical results or the GO abundance chart as described in Part 2 and Part 3. This means that, if you would like to adjust the images created here, you will have the slight trouble of redoing the Part 2 or Part 3 work once more to change the settings.

Clicking the checkbox to the left of 'GO term' selects all GO terms at one time. You can click a GO name to collapse or expand ontology terms in tree traversing mode. A button that selects or deselects all significant terms is available, and the selected terms can be used to draw graphical results or to create a bar chart. Please note that at least one significant term should be included in graph generation; otherwise the graphical result will be more or less blank and meaningless. Clicking on the number will lead you to the term's detailed information.

The term's detail page is as follows. Besides a brief summary, agriGO gives all entries that can be annotated to the term. For each entry the annotation includes: GO terms, GO source and description.

B. How to use Parametric Analysis of Gene Set Enrichment (PAGE)?

The PAGE method was proposed by Kim and Volsky [BMC Bioinformatics 2005, 6:144].
Based on the Central Limit Theorem in statistics, this method is simple and efficient. Unlike SEA, it takes expression level into account and can deal with a long list of genes/probesets.

STEP 1:

First, you should choose the species for your query data. Please make sure that the identifiers in your input belong to one of the datatypes listed in the information table on the right. If your identifiers are not stored in agriGO, there are two other ways: one is to provide your own GO annotation file, the other is to use our BLAST4ID service.

STEP 2:

In PAGE analysis, the user should pay more attention to the input data. As presented in the following image, at least two rows must be provided. The first row contains sequence identifiers, and the following ones contain numerical values. The numerical value is the fold change (FC) or log2-transformed FC value (the latter is preferred) of the identifiers' expression under different conditions. If you do not have expression data, then SEA may be the alternative choice.

In agriGO's example, there are 3 rows. The first row is the ATH1 probeset name, the second row is the expression fold change (FC) value of cold treatment relative to CK (cold/CK) after half an hour, and the third row is the expression FC of cold/CK after 24 hours of cold treatment. Only 600 probesets are in the quick example, so that the HTML page loads fast. To obtain a full view of the PAGE method, you can download the full example file and explore the following analysis procedure.

STEP 3:

Next you can choose the method for multi-test adjustment. Seven adjustment methods are available here: Yekutieli (FDR under dependency), Bonferroni, Hochberg, Hochberg (FDR), Hommel, Holm and False Discovery Rate. Though I would suggest performing an adjustment, you can turn it off and use no adjustment.
If you choose no adjustment, you may then want to reset the significance level option below. Terms under the cutoff of the significance level will be highlighted and emphasized in the analysis results, so this setting affects your test output.

Minimum number of mapping entries means that GO annotations that do not appear in at least the selected number of entries will not be shown. In other words, the higher you set the number, the more entries are needed to make a GO term appear in the analysis result.

Gene ontology type: Plant GO slim is a cut-down version of the GO ontologies containing a subset of the terms in the whole GO for plants.

You can also upload your own customized GO annotation file if your identifiers are not accepted directly by agriGO. The file size is limited to 4MB.

OK, you can now click submit to start the analysis. The output of the analysis is explained in the following part of the manual.

Parametric Analysis of Gene Set Enrichment (PAGE) Result

Results generated by PAGE analysis have many points in common with SEA analysis, so it is suggested to read the SEA result introduction first. Only the features unique to PAGE results are explained here.

Since the PAGE tool can analyze several rows at one time and terms are calculated for each row, each row has its own significant GO terms. The number of significant GO terms for each row is listed in the brief summary part.

The number of terms shown is determined by the row you selected, which is colored red. A simple colorful model, named CM for short, is available. The colors used in the CM are the same as those used in the graphical result, in which the red color system means up-regulated and blue means down-regulated. Each block presents the term's Z-score for the row. You can select row(s) and term(s) to generate further images.
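For readers who want to see where the Z-score in each block comes from, the following is a minimal Python sketch of the PAGE calculation as described by Kim and Volsky (2005), combined with the two-tailed p-value rule given in the FAQ of this manual. It is an illustration only and not agriGO's own code: the function names and sample values are invented, the use of the population standard deviation is an assumption, and x in the FAQ rule is assumed to be the standard normal cumulative probability at the Z-score.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev


def page_z_score(all_values, set_values):
    """Z-score of one GO term for one row of expression values.

    all_values: (log2) fold changes of every entry in the row
    set_values: fold changes of the entries annotated to the GO term
    Z = (mean of set - mean of all) * sqrt(set size) / std of all
    """
    sigma = pstdev(all_values)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("all values are identical; Z-score is undefined")
    return (mean(set_values) - mean(all_values)) * sqrt(len(set_values)) / sigma


def two_tailed_p(z):
    """Two-tailed p-value following the rule quoted in the FAQ."""
    x = NormalDist().cdf(z)  # assumption: x is the standard normal CDF at z
    return 2 * (1 - x) if z >= 0 else 2 * x


if __name__ == "__main__":
    # Invented log2 fold changes for one row and one hypothetical GO term.
    row = [-2.0, -1.5, -0.5, 0.0, 0.2, 0.4, 1.0, 1.8, 2.5, 3.0]
    term = [1.8, 2.5, 3.0]
    z = page_z_score(row, term)
    print(f"Z-score = {z:.3f}, p-value = {two_tailed_p(z):.3g}")
```

A positive Z-score corresponds to the red (up-regulated) color system and a negative one to the blue (down-regulated) system described above.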
The term's detailed information is generated if you click the number. This page may be a bit simple, because it is quite possible that there are too many entries mapping to the GO term.

In the graphical result part, the user can choose one or two rows to draw the image. If two rows are selected, a third color system (purple colors) is used to show those terms that have different regulation directions in the two rows.

The following example presents two rows in one graph. You can check the annotation diagram below the result. There are three color systems: red means up-regulated terms, blue means down-regulated terms, and purple means the term is regulated in different directions in the two rows. If a term has the same regulation direction in both rows, it has double borders. In the box, 'r1=1e-10' means the q-value of the term in row 1 is 1e-10, and 'zs' stands for Z-score.

In the bar chart generation part, Z-score is the statistical value in the PAGE calculation, and mean value is the mean of the values of all entries in the row. Mean change is the mean minus the standard deviation, which presents the change of expression compared with the whole row background. The user can set two color values, one for up-regulated terms and one for down-regulated terms.

As mentioned before, Z-scores bigger than 0 and smaller than 0 are presented using the different colors set by the user. But if you choose mean value, they are all in the same color, since all means are bigger than 0.

C. How to use BLAST4ID tool?

The BLAST4ID tool is not an analysis tool, but an associated one used mainly for two purposes:
1. Transfer your IDs which are not available in agriGO to available ones.
2. Use a BLAST search to annotate your sequences with GO.

To use BLAST4ID, the user should first set the target database, and then the E-value cutoff.
The program should be correctly selected based on the sequence types of the user's input and the target database. Generally speaking, all array sequences are nucleotide and other genome sequences are protein. The process may take a long while, so an e-mail address should be given for the e-mail notification.

The result interface is a bit simple ☺, but sufficient for use. A text version of the result can be downloaded.

D. How to use search tool?

Search tools in agriGO are easy to understand and use. Here are some tips:
1. Unless you contact me and ask for an extension, the job ID is available within 3 days.
2. There is a shortcut at the top-right corner for job searching.
3. In advanced search, you have to define the species first.
4. You can search either a single sequence identifier or a list (no more than 100) of them.
5. Input IDs are case insensitive, but will be shown in agriGO's format in the output.

6. FAQ

What is agriGO?
agriGO is designed to automate the work of experimental biologists in identifying enriched Gene Ontology (GO) terms in a list of microarray probe sets or gene identifiers (with or without expression information). It is also a GO-related database. agriGO focuses especially on agricultural species.

What is GO?
"The Gene Ontology (GO) project provides a controlled vocabulary to describe gene and gene product attributes in any organism. The GO project is a collaborative effort to address the need for consistent descriptions of gene products in different databases. The GO collaborators are developing three structured, controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components and molecular functions in a species-independent manner.
There are three separate aspects to this effort: first, we write and maintain the ontologies themselves; second, we make cross-links between the ontologies and the genes and gene products in the collaborating databases, and third, we develop tools that facilitate the creation, maintenance and use of ontologies."
Definition from http://www.geneontology.org/

What is updated in agriGO compared with EasyGO?
agriGO is the successor of EasyGO, and it goes further.
1. We created a new website interface. The database structure and scripts of agriGO have been redesigned. Both page loading speed and analysis speed are improved because of this change.
2. The agriGO service focuses especially on agricultural species. The number of supported species has been extended to 35, comprising as many as 280 datatypes.
3. We added new analysis tools to agriGO, such as PAGE analysis and the BLAST4ID tool.
4. The result output and information are richer compared with EasyGO.
5. agriGO can also work as a GO database with search and download services.

What are the unique features of agriGO compared with other GO webservers/databases?
agriGO provides extensive support for agricultural species. It is not limited to SEA analysis: GSEA, implemented using the PAGE method, is also available. In addition, the BLAST4ID tool is provided for ID transfer or annotation, and search and download functions are accessible. agriGO produces rich outputs such as graphical results, bar charts and hierarchical trees, which together give a comprehensive understanding of the biological meaning of the user's input data.

What is SEA analysis?
SEA stands for Singular Enrichment Analysis, a traditional and widely used method. SEA is designed to identify enriched Gene Ontology (GO) terms in a list of microarray probe sets or gene identifiers.
Finding enriched GO terms corresponds to finding enriched biological facts, and the enrichment level of a term is judged by comparing the query list to a background population from which the query list is derived.

Which statistics method should I choose in SEA tool?
When the input list is compared with the previously computed background, or is a subset of the reference list, choose hypergeometric or Fisher; choose the latter only when your query number is quite small. When the input list has few or no intersections with the reference list, the chi-square test is more appropriate.

What is PAGE analysis?
PAGE is Parametric Analysis of Gene Set Enrichment [Kim and Volsky, 2005, BMC Bioinformatics]. Based on the Central Limit Theorem in statistics, this method is simple and efficient. Unlike SEA, it takes expression level into account and can deal with a long list of genes/probesets.
PAGE uses a two-tailed test on the Z score, and the p-value is calculated as:
if Z score >= 0: p-value is 2 * (1 - x)
if Z score < 0: p-value is 2 * x

What is BLAST4ID?
The BLAST4ID tool is not an analysis tool, but an associated one used mainly for two purposes:
1. Transfer your IDs which are not available in agriGO to available ones.
2. Use a BLAST search to annotate your sequences with GO.

Which tool should I choose?
It depends on what data you have. If you only have a list of identifiers, or are only interested in the identifiers themselves, SEA will be your choice. If you would like to take expression data into account and compare several datasets, you may try PAGE. BLAST4ID is only an associated tool; use it if you really need it.

Why does the graphical/chart image not display on my PC?
The bar chart result needs Flash Player to display correctly. You may also need different tools to display graphical results in different formats, for example Adobe Reader or an SVG browser. Contact me if you have installed the related tool but still cannot see the results.

How many datatypes are supported by agriGO?
We currently support 35 species, comprising 280 datatypes. Please check the data statistics page for detailed information. We will continue adding more species and datatypes.

How does agriGO obtain its data source?
Raw GO annotation data is generated by agriGO using BLAST, Pfam and InterProScan, or obtained from the B2G-FAR center or from Gene Ontology.
Arabidopsis genome data is from TAIR.
Rice TIGR genome data is from the Rice Genome Annotation Project.
Rice KOME data is from the KOME database.
Rice Gramene data is from the Gramene center.
Populus genome data is collected from JGI.
Soybean and Sorghum genome data is compiled from Phytozome.
Grape genome data is compiled from Genoscope.
Medicago genome data is from Medicago truncatula sequencing resources.
Maize genome data is from MaizeSequence.org.
Castor bean genome data is from the Castor Bean Genome Database.
Brachypodium distachyon genome data is from Ensembl.
Bovine genome data is from the Bovine Genome Database.
Silkworm genome data is from SilkDB.
M. grisea genome data is from the Magnaporthe grisea Database.
Affymetrix CSV files and array sequences are from NetAffx.

How often does agriGO update?
Normally we update our database every 3 months, but we will also update agriGO when an important data source becomes newly available. Improvements and updates to agriGO tools are made irregularly.

Can I check results from the old version in the new agriGO?
Sorry, but no. Because we reconstructed the database and redesigned the website organization, analysis results from EasyGO are not supported in agriGO.

How to make agriGO add a new customized datatype?
Users can contact the agriGO administrator by email ([email protected]) to discuss the details.