Georeferenced database of genetic diversity
(GD)²
User Manual
Properties
Version: 5
Date: September 2012
Project: Evoltree
Title: (GD)² User Manual
Author: François Ehrenmann
Validation: Antoine Kremer
Table of contents
Objectives
Organisation and technical information
  Technical information
  Organisation
    Login page
    Home page and main parts of (GD)²
Functionalities
  Viewer
    Populations and sample on a map
    Genetic data on map
    Markers
  Exportation
  Importation
    Import populations
    Import individuals
    Import markers
    Import genetic data (measures)
Administration
  Add a new Project
  Add a new dataset
  Add or delete rights of access on data
    Add rights
    Delete rights
I. Objectives
This database contains genetic and georeferenced passport data of different genetic units that are traditionally analyzed by geneticists and ecologists.
Genetic data include marker data (microsatellites, single nucleotide polymorphisms, PCR DNA fragments, etc.) coming from either diploid or haploid tissue, for which genetic parameters have been calculated (allelic or nucleotide frequencies, diversity statistics).
Passport data include all non-genetic data, ranging from geographic coordinates to ecological or physiographic data about the location of the genetic units.
Genetic units are either single individuals or populations on which the genetic and passport data are recorded.
The specific objectives of the application are:
– 1: Insert genetic and georeferenced data of natural populations
– 2: View the geographic distribution of data
– 3: Export data to different formats (CSV, GIS format, population genetic software format) for data analysis
II. Organisation and technical information
1. Technical information
(GD)² is implemented in Ruby on Rails 2 and is optimized for Mozilla Firefox. The database is managed by PostgreSQL (version 8.4) and PostGIS. Maps are provided by Google Maps™.
2. Organisation
1) Login page
Authentication is required to access (GD)². The login and password are provided by the administrator and are defined as follows:
– the first letter of the first name in upper case, followed by the last name (with its first letter in upper case too)
– for example, for Mr John Doe, it is JDoe.
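The naming rule above can be sketched in a few lines of Python. This is only an illustration and not part of (GD)²: the function name `make_login` is hypothetical, and the sketch assumes that the letters after the first letter of the last name are kept as typed.

```python
def make_login(first_name: str, last_name: str) -> str:
    """Build a login from the first-name initial and the last name."""
    first = first_name.strip()
    last = last_name.strip()
    if not first or not last:
        raise ValueError("both first name and last name are required")
    # Upper-case initial of the first name, then the last name with its
    # first letter in upper case; the remaining letters are left as typed.
    return first[0].upper() + last[0].upper() + last[1:]


print(make_login("John", "Doe"))  # JDoe
```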
2) Home page and main parts of (GD)²
The home page gives access to the two main parts of the software. The menu at the top provides access to the viewer, the importation functionalities, data administration and saved files.
(a) “Population” part
This part enables access to all functions concerning populations stored in (GD)².
(b) “Genetic Data” part
The genetic part comprises the functionalities for genetic data: single-tree genotypes or aggregated diversity statistics (allelic frequencies, diversity parameters) stored for populations, dataset summaries and markers.
These two categories are also available from the “Viewer” menu.
II. Functionalities
1. Viewer
The Viewer is the main function of (GD)²: it displays different kinds of information on a geographical map. It is accessible from the menu, or by clicking on the “Population on map” or “Genetic data on map” button on the home page.
1) Populations and sample on a map
(a) See all populations
All available populations can be displayed on a map by clicking on the “See all populations” button. Under the map, a list of these populations, sorted by country, is displayed with their main characteristics. Clicking on a population's name in the list centres the map on that population.
Figure: Map obtained with the “See all populations” function
(b) Advanced search
This function searches for a particular population. To do so, select the following items:
– the category (fungi, insects or trees)
– the genus (one or more)
– the species (one or more)
– whether you want a population or a sample (check boxes “population” or “sample”)
Then click on the “Select species” button to display the results.
This first selection produces two lists, which make it possible:
– to select some populations to display on a map
– to select countries, so that all populations located in those countries are displayed.
Under the map, as with the “See all populations” function, a list of the populations is available, sorted by country. Two links are available: “View individuals”, which displays the individuals contained in each population, and “View individual data”, which displays the available genotyping data for a given population.
(c) See your populations
From this part of the application you can also view the populations that you have inserted. To do so, select the following items:
– the category (fungi, insects or trees)
– whether you want a population or a sample (check boxes “population” or “sample”)
NB: The display of populations can change with the zoom level. When the number of populations to display is large, one icon may represent several populations. The information window that opens when you click on the icon contains a list of the first eight populations and the number of remaining populations. All the populations are displayed if you zoom in further on the icon.
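The content of such an information window can be sketched as follows. This is an illustrative guess at the behaviour described in the note, not the actual (GD)² code; the function name `info_window_content` and the parameter `shown` are hypothetical.

```python
def info_window_content(population_names, shown=8):
    """Return the names listed in the window and the count of those left out."""
    listed = list(population_names[:shown])
    remaining = max(0, len(population_names) - shown)
    return listed, remaining


# Eleven populations grouped under one icon: eight are listed, three remain.
names = [f"Pop{i}" for i in range(1, 12)]
listed, remaining = info_window_content(names)
print(listed, remaining)
```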
2) Genetic data on map
This function displays allelic, haplotypic or genotypic frequencies, or the different diversity indices, on a geographical map.
(a) Plot Allelic/Haplotypic frequencies
Allelic (or haplotypic) frequencies are displayed for a given species and a given marker. First select:
– a category
– one or more genera
– one or more species
From this selection, then select:
– one or more datasets: datasets are organized into two sections, corresponding to the chloroplastic and nuclear compartments. Datasets in these two sections can be linked or unlinked (two datasets are linked when the study uses the same names and descriptions of markers)
– the marker
Then select the alleles (or haplotypes) to display: click on each allele number, or click on the "Select all alleles" button, to select them and display them on the map.
To change a color, click on the colored square; to remove a selected allele, click on its number. You can also display all populations, including those that do not contain the selected alleles.
On the map, frequencies are represented by pie charts. The list of populations is available as usual, with the possibility to download the pie chart for each population. Click on a population name in the list to locate it on the map (see below).
NB: if not all alleles (or haplotypes) are selected in the form, the black part of each pie chart represents the proportion of non-selected alleles.
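The rule in the note above can be illustrated with a short Python sketch. It is an assumption about how the slices could be computed, not the (GD)² implementation; the function name `pie_slices`, the label "not selected" and the allele names are hypothetical.

```python
def pie_slices(frequencies, selected, other_label="not selected"):
    """Return the share of each selected allele plus one slice for the rest.

    `frequencies` maps allele names to frequencies (or counts) for one
    population; the extra slice corresponds to the black part of the chart.
    """
    total = sum(frequencies.values())
    if total <= 0:
        raise ValueError("frequencies must sum to a positive value")
    slices = {a: frequencies[a] / total for a in selected if a in frequencies}
    remainder = 1.0 - sum(slices.values())
    if remainder > 1e-9:  # no extra slice when every allele is selected
        slices[other_label] = remainder
    return slices


freqs = {"101": 0.5, "103": 0.25, "105": 0.25}
print(pie_slices(freqs, ["101", "103"]))
```

With two of the three alleles selected, a quarter of the chart is left for the non-selected allele.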
(b) Plot genotypic frequencies
Genotypic frequencies are displayed for a given marker and one or more genotypes. First select:
– a category
– one or more genera
– one or more species
From this selection, then select:
– one dataset
– a marker
Then select the genotype(s) to display. On the map, frequencies are represented by pie charts. The list of populations is available as usual, with the possibility to download the pie chart for each population. Click on a population name in the list to locate it on the map (see below).
(c) Plot genetic diversity measures
Genetic diversity measures are displayed for a given species, a given marker and a given diversity index.
The diversity indices available in (GD)² are:
– Ho (observed heterozygosity)
– He (expected heterozygosity)
– P (polymorphism rate)
– A (allelic diversity)
– Fis, Fit (inbreeding coefficients)
Thus, select:
– the category
– the genus
– the species
– one option for the diversity indices:
➢ “Diversity indices for all markers in dataset” for indices calculated on a set of markers
➢ “Diversity indices for one marker in dataset” for indices calculated for one marker
Depending on the checked option, you can choose a type of marker (first option) or a marker (second option) for a given dataset, and the index to display (check boxes).
The result is a map with circles representing the populations. Their diameter is proportional to the value of the index.
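As an illustration of proportional scaling, the sketch below maps index values to circle diameters. It is not taken from (GD)²: the function name `circle_diameters`, the maximum diameter of 40 pixels and the population names are assumptions, and the sketch only handles non-negative values (an index such as Fis can be negative, and how the application draws that case is not described here).

```python
def circle_diameters(values, max_diameter_px=40.0):
    """Scale each value so that the largest one gets `max_diameter_px`."""
    if any(v < 0 for v in values.values()):
        raise ValueError("this sketch only handles non-negative values")
    largest = max(values.values())
    if largest == 0:
        return {pop: 0.0 for pop in values}
    return {pop: max_diameter_px * (v / largest) for pop, v in values.items()}


print(circle_diameters({"Pop_A": 0.5, "Pop_B": 0.25}))
```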
The list of populations is available as usual, sorted by country, with the value of the selected diversity index. Click on a population name in the list to locate it on the map (see below).
3) Markers
To view the markers available in (GD)², click on the “markers” button on the home page or in the “Viewer” menu. It leads to a menu where it is possible to:
– display all available markers
– do an advanced search
(a) All markers
All available markers are listed, sorted by marker type.
(b) Advanced search
You can search for a marker by selecting:
– the compartment (nuclear, cytoplasmic, mitochondrial, chloroplastic)
– the marker type (cpSSR, EST_SSR, RFLP, Isozyme)
The result is displayed in a list; on each line, a “View data” link locates all populations that have data for the marker.
2. Exportation
Exportation is currently available for individual data and for allelic and/or haplotypic frequencies. It is accessible from the “Viewer”, when the geographical map is displayed with the frequency distribution. The “Export data” button is located under the “continue” button.
Clicking on it offers two possibilities:
– export all data of the dataset
– export the previously displayed data
The export format is CSV.
Clicking on the chosen option launches the export process, which is confirmed by a message at the top of the window.
The file is then available in the “My files” section of the web application, where it can be downloaded or deleted.
3. Importation
The (GD)² database (Georeferenced Database of Genetic Diversity) comprises georeferenced passport data and genetic data. Genetic data can be inserted at the individual level or at the population level. For example, if you already have allelic frequencies and diversity statistics available, they can be inserted and will refer to the population level. If you have single-tree genotypic data, they can be inserted at the individual level.
The importation function is accessible from the menu by clicking on “Import data”.
The importation concerns the following items:
– populations/samples
– individuals
– markers
– genetic data
Some basic rules before inserting new data:
• insert populations before their individuals
• insert markers before the concerned data
• insert populations and their individuals before their data
All items upstream of the arrows must be inserted before the items downstream of the arrows.
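The ordering rules above amount to a small dependency graph. The sketch below derives a valid import order with Python's standard `graphlib` module (Python 3.9 or later); it only illustrates the rules and is not part of (GD)². The names `PREREQUISITES` and `import_order` are hypothetical.

```python
from graphlib import TopologicalSorter

# Each item maps to the items that must be inserted before it.
PREREQUISITES = {
    "populations": set(),
    "markers": set(),
    "individuals": {"populations"},
    "genetic data": {"populations", "individuals", "markers"},
}


def import_order(prerequisites=PREREQUISITES):
    """Return the items in an order that respects every prerequisite."""
    return list(TopologicalSorter(prerequisites).static_order())


print(import_order())
```

Several orders are valid (markers can come before or after populations); genetic data always come last.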
1) Import populations/samples
Clicking on “Add Populations/Samples” opens a simple form: select the CSV file and click on “insert” to upload the populations.
The CSV file is presented below:
Respect the column titles (case and spelling). Latitudes and longitudes are in decimal degrees. Give your populations names that are as distinct as possible.
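Before uploading, a populations file can be checked against these recommendations with a short script. The sketch below is only an example: the column titles `name`, `latitude` and `longitude` are placeholders (use the titles required by the actual (GD)² template), and the function name `check_populations` is hypothetical.

```python
import csv
import io


def check_populations(csv_text, name_col="name",
                      lat_col="latitude", lon_col="longitude"):
    """Return a list of problems found in a populations CSV."""
    problems = []
    seen = set()
    reader = csv.DictReader(io.StringIO(csv_text))
    # Data rows start on line 2, after the header line.
    for line_no, row in enumerate(reader, start=2):
        name = (row.get(name_col) or "").strip()
        if not name:
            problems.append(f"line {line_no}: missing population name")
        elif name in seen:
            problems.append(f"line {line_no}: duplicate population name {name!r}")
        seen.add(name)
        try:
            lat = float(row[lat_col])
            lon = float(row[lon_col])
        except (KeyError, TypeError, ValueError):
            problems.append(f"line {line_no}: coordinates must be decimal numbers")
            continue
        # Decimal degrees: latitude in [-90, 90], longitude in [-180, 180].
        if not -90 <= lat <= 90:
            problems.append(f"line {line_no}: latitude {lat} out of range")
        if not -180 <= lon <= 180:
            problems.append(f"line {line_no}: longitude {lon} out of range")
    return problems


sample = "name,latitude,longitude\nForet_A,44.74,-0.78\nForet_A,95.0,2.35\n"
for problem in check_populations(sample):
    print(problem)
```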
2) Import individuals
Clicking on “Add individuals” opens a simple form: select the CSV file and click on “insert” to upload the individuals.
Individuals are inserted in the database AFTER their population. The CSV file format is described below:
The population names in this file must be the same as in the previous file: in other words, check that the file for populations and the file for individuals contain the same population names.
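This consistency check can be done before uploading. The sketch below assumes the population names and the (individual, population) pairs have already been read from the two files; the function name and the sample names are hypothetical.

```python
def populations_missing_from_file(population_names, individuals):
    """Group individuals by any population name absent from the populations file.

    `individuals` is an iterable of (individual_name, population_name) pairs.
    The comparison is exact, so a difference in case counts as a mismatch.
    """
    known = {name.strip() for name in population_names}
    missing = {}
    for individual, population in individuals:
        if population.strip() not in known:
            missing.setdefault(population, []).append(individual)
    return missing


populations = ["Foret_A", "Foret_B"]
individuals = [("ind1", "Foret_A"), ("ind2", "Foret_b"), ("ind3", "Foret_C")]
print(populations_missing_from_file(populations, individuals))
```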
3) Import markers
Clicking on “Add markers” opens a simple form: select the CSV file and click on “insert” to upload the markers.
The marker types available in (GD)² are the following:
– EST_SSR (nuclear, chloroplastic, mitochondrial)
– gSSR (nuclear, chloroplastic)
– cpSSR (chloroplastic)
– RFLP (nuclear, chloroplastic, mitochondrial)
– SNP (nuclear)
– Isozyme (cytoplasmic)
– RAPD (nuclear)
The allele type can be “dominant” or “codominant”. For the genetic measures, there are two possible options: “allelic” or “haplotypic”. A column titled “reference” can be added for isozymes. A column titled “biblio” can be added to provide a PDF file of the publication in which the data were first analysed.
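The combinations listed above can be written down as a lookup table to check a markers file before uploading it. The sketch below only restates the list; the function name `check_marker` and the way values are passed in are assumptions, not the (GD)² import code.

```python
# Marker types and the compartments listed for each of them above.
ALLOWED_COMPARTMENTS = {
    "EST_SSR": {"nuclear", "chloroplastic", "mitochondrial"},
    "gSSR": {"nuclear", "chloroplastic"},
    "cpSSR": {"chloroplastic"},
    "RFLP": {"nuclear", "chloroplastic", "mitochondrial"},
    "SNP": {"nuclear"},
    "Isozyme": {"cytoplasmic"},
    "RAPD": {"nuclear"},
}
ALLELE_TYPES = {"dominant", "codominant"}
MEASURE_TYPES = {"allelic", "haplotypic"}


def check_marker(marker_type, compartment, allele_type, measure_type):
    """Return a list of problems for one marker description."""
    problems = []
    if marker_type not in ALLOWED_COMPARTMENTS:
        problems.append(f"unknown marker type: {marker_type}")
    elif compartment not in ALLOWED_COMPARTMENTS[marker_type]:
        problems.append(f"{marker_type} is not listed for compartment {compartment}")
    if allele_type not in ALLELE_TYPES:
        problems.append(f"unknown allele type: {allele_type}")
    if measure_type not in MEASURE_TYPES:
        problems.append(f"unknown measure type: {measure_type}")
    return problems


print(check_marker("SNP", "chloroplastic", "codominant", "allelic"))
```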
4) Import genetic data (measures)
To import genetic data, click on “Add genetic data”. You can import measures by project or import diversity measures into the (GD)² database.
(a) Genotyping data
The importation is made from a CSV file. Click on “Insert genotyping data”, then choose a project and a dataset in which to insert the data.
NB: The data inserted in a dataset belonging to the Evoltree project will be visible from the eLab Wizard (Evoltree portal).
The CSV file format is described below:
For haplotypes, the column “legend_color” can define the color used in the pie charts that display haplotypic frequencies.
(b) Diversity data
This concerns data collected at the population level, comprising frequencies and diversity statistics (diversity indices, allelic and haplotypic frequencies, etc.).
● Insert Allelic/Genotypic frequencies
To access it: Add Genetic Data > Insert Diversity data > Insert Allelic/Genotypic Frequencies. Then select a project and a dataset. The CSV file format is explained below. The markers and populations mentioned in the file must exist in the database, so please check their spelling.
The marker names in the second line and the population names in the second column must be the same as those already entered in the previous steps, when the population and marker data were imported.
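Spelling differences can be spotted before uploading with Python's standard `difflib` module. The sketch below is an illustration only: it assumes the names already stored in the database have been obtained as a plain list, and the function name `check_names`, the similarity cutoff of 0.8 and the sample names are hypothetical.

```python
from difflib import get_close_matches


def check_names(names_in_file, names_in_database):
    """Map each unknown name to its closest known name (or None)."""
    known = list(names_in_database)
    report = {}
    for name in names_in_file:
        if name in known:
            continue  # exact match, nothing to report
        matches = get_close_matches(name, known, n=1, cutoff=0.8)
        report[name] = matches[0] if matches else None
    return report


print(check_names(["Bordeaux", "Bordeuax", "ZZZ"], ["Bordeaux", "Nancy"]))
```

A name with a suggestion is probably a typo; a name mapped to `None` has most likely not been imported yet.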
● Insert values for diversity index
Diversity indices refer either to a given marker or to a set of markers used in a study (all markers used in a dataset). To access this insertion: Add Genetic Data > Insert Diversity Data. You can insert data for one population or for many populations in the (GD)² database.
- Insert data for one population
This menu imports diversity measures for one population into the (GD)² database. Click on your project, choose a dataset, a population and a marker, and enter the values for the indices (He, Ho, P, A, Fis, Fit).
- Insert data for many populations
This menu imports diversity measures for many populations into the (GD)² database. Click on your project, choose a dataset and select the CSV file containing your data.
The indices available in (GD)² are the following:
– Ho: observed heterozygosity
– He: expected heterozygosity
– P: polymorphism rate
– A: allelic diversity
– Fit, Fis: inbreeding coefficients
– Ne: effective number of alleles
The CSV file format is explained below.
III. Administration
The administration part of the web application is reachable from the menu.
Data in the (GD)² database are organized in projects and datasets. A dataset must contain data that are compatible, i.e. that can be studied together. It is recommended to create one dataset per study (for example, all data involved in one publication). A dataset belongs to a project.
Users can create a project and add datasets to it in order to insert data. When data are imported into the database, only the user's laboratory can access them by default. To share data with a partner, the user can grant rights to another laboratory.
1. Add a new Project
To create a new project, click on the “Add a new Project” button. A simple form lets you create a new project and view the existing projects in the database.
2. Add a new dataset
To create a new dataset, click on the “Add a new Dataset” button. A simple form lets you create a new dataset and view the existing datasets, by project, in the database.
NB: all datasets tied to the Evoltree project (and all the data they include) will be visible from the eLab Search Wizard of the official portal of the Evoltree Network of Excellence (http://www.evoltree.eu).
3. Add or delete rights of access on data
When a user imports data into the database, only that user's laboratory can access them. To share data with a partner, the user can grant rights to another laboratory. To access this function: Administration > Add or delete rights of access on Data.
1) Add rights
Clicking on “Add rights” opens a form, in which you can select:
– the dataset
– the data type (genotyping or diversity)
– the partner laboratory
This information is validated by clicking on “Save changes”. The chosen partner laboratory can then access the data from the viewer.
2) Delete rights
To delete rights on data, click on “Delete rights”. A list displays all the laboratories to which you have granted rights, by dataset and data type.
Check the rights to delete in this list, then confirm by clicking on “Delete selected rights”. The deletion takes effect immediately.
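The access model described in this section (owner laboratory by default, plus rights granted per dataset, data type and partner laboratory) can be pictured with the following sketch. It is a simplified illustration under stated assumptions, not the (GD)² data model: the class `AccessRights`, its methods and the sample names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AccessRights:
    # dataset name -> laboratory that imported it
    owner_lab: dict
    # granted rights as (dataset, data_type, laboratory) triples
    grants: set = field(default_factory=set)

    def add_right(self, dataset, data_type, laboratory):
        self.grants.add((dataset, data_type, laboratory))

    def delete_right(self, dataset, data_type, laboratory):
        # Takes effect immediately; deleting a missing right is a no-op.
        self.grants.discard((dataset, data_type, laboratory))

    def can_access(self, laboratory, dataset, data_type):
        if self.owner_lab.get(dataset) == laboratory:
            return True  # the importing laboratory always has access
        return (dataset, data_type, laboratory) in self.grants


rights = AccessRights(owner_lab={"oak_cpSSR": "Lab_A"})
rights.add_right("oak_cpSSR", "diversity", "Lab_B")
print(rights.can_access("Lab_B", "oak_cpSSR", "diversity"))   # True
print(rights.can_access("Lab_B", "oak_cpSSR", "genotyping"))  # False
```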
3) Change password
To change your password, click on “Change password”. All the information about your account is displayed (last name, first name, e-mail, etc.). Type and re-type your new password, then click on the “Save changes” button. The modification takes effect immediately. Log out, then log in with your new password.
IV. Acknowledgements
For remarks or suggestions, please contact Audrey Jacques-Gustave ([email protected]) or Frederic Raspail ([email protected]).