Title: GSAD: A genome size in the Asteraceae database
Abstract: The Asteraceae are one of the largest families of angiosperms, comprising 24,000 to 30,000 species in 1,600 to 2,000 genera (1 and references therein). The family has a worldwide distribution, with the exception of Antarctica, and includes many economically important species that are used, for example, as foods, medicines, and ornamentals. Asteraceae species are the target of many evolutionary studies and more recently they have also become the focus of new genome sequencing programs. New model species for evolutionary-developmental (evo-devo) research have been selected within the Asteraceae, such as Gerbera, Helianthus, and Senecio, whereas Tragopogon is the focus of intensive studies on polyploidization mechanisms (2). The first evo-devo studies in the Asteraceae have been very promising despite complications arising from the genetic and epigenetic changes associated with polyploidy, which is very frequent in the family.
The term "C-value" was coined by Swift (3) to define the gametic nuclear DNA content (genome size) expressed in picograms. Nowadays, genome size research covers a large and diverse range of biological fields and extends across all plant groups. For example, studies have addressed genome size nomenclature (4), methodological improvements (5), and possible explanations of how and why genome size changes occur in plants (6). Data on nuclear DNA amounts are interesting not only per se but are also of practical use. For instance, the success of techniques such as AFLPs and nuclear microsatellites is influenced by genome size, while the choice of a species for a possible genome sequencing or evo-devo project is also determined, in part, by genome size. Interest in genome size has increased over the years and this has led to the development of several related databases (e.g., for plants, 7-9).
Following on from our own research studies on genome size in the Asteraceae family and given that the family is one of the most intensely studied from many aspects, we have developed a genome size database focused specifically on the Asteraceae (which we have named the "Genome size in the Asteraceae database", GSAD). It is hoped that this will become a significant tool for comparative research and for future genome size studies. The GSAD aims to make all genome size data for Asteraceae easily and widely accessible, in a user-friendly internet database. At the specific and infraspecific level, the nuclear DNA content for a taxon is listed under the taxonomic and nomenclatural treatment used by the author(s) of the source reference, which is provided in all cases. For tribal and subtribal classification, we have followed the most recent revision of the family by Funk et al. (1). Currently, the number of source references is 86 covering the years from 1967 to 2010. The following fields are accessible in the database: (1) species (including subspecies if any); (2) specific taxonomic authority; (3) infraspecific taxonomic authority (if any); (4) chromosome number (if known); (5) n/2n (gametic/somatic chromosome number); (6) genome size, including (6.1) 1C-value expressed in pg; (6.2) 2C-value expressed in pg; (6.3) 1Cx-value expressed in pg; (6.4) 2C-value expressed in Mbp; (7) life cycle (if known); (8) ploidy level (including ploidy levels inferred from DNA contents); (9) internal standard used to estimate the nuclear DNA amount; (10) genome size estimation method; (11) source reference. Also, advanced search options allow selecting tribe, subtribe, and genus. The chromosome number and ploidy level given (if known) are taken from the source publication of the genome size assessment. A quick search can be made by entering the name of a genus or species in the "Search" box on the home page. 
Using the default settings, this will display the available C-values, the method used to estimate genome size, and the source references (Fig. S2, Supporting Information). Additional information (such as taxonomic authority, chromosome numbers, life cycle, and ploidy levels) can also be obtained by selecting the appropriate boxes listed below the "Search" box. Alternatively, an "Advanced" search can be made in which the user specifies particular search criteria (Fig. S3, Supporting Information). This option allows users to run very specific searches or to generate large volumes of data matching criteria based on taxonomic, karyological, or technical aspects. In addition to the search functions, the site lists links to other genome size and plant chromosome number databases, as well as to other plant genetic resources databases. Users can also submit their own data, including unpublished data if they wish, together with any associated comments (via the "Comments" or "Contact" tabs or directly to the email addresses listed there). Note, however, that unpublished data themselves will not be listed in the database; only the name of the newly assessed species and the person or people involved will be given. This will nevertheless indicate that a species is being worked on, giving researchers the option to contact the appropriate research group for further information if necessary and helping to avoid duplication of effort.
The GSAD currently contains 1,780 entries, representing genome size estimations for 110 genera and 820 species, including 185 infraspecific taxa. This corresponds to ∼6% of the family at the generic level and ∼3% at the specific level, whereas, at the suprageneric level, there are data for about 30% of the recognized tribes. The first genome size estimates in the Asteraceae were made in 1967 using biochemical approaches.
However, most estimates listed in the database have been made using flow cytometry (63.48%), with 27.59% obtained by Feulgen cytodensitometry-based approaches (including cytophotometry, microspectrophotometry, and scanning densitometry) and just 0.17% by chemical measurements. The methodology used for the remaining 8.76% of measurements is unknown. Information about the internal standards used in flow cytometric studies is given in Table 1. Although species of Asteraceae are not amongst the most commonly used calibration standards, 10 of the 28 species listed belong to the Asteraceae (although they were used in only 6.12% of the measurements).
The GSAD is the first genome size database focused on a single botanical family, the Asteraceae. Given that there is much research focused on many diverse aspects of Asteraceae biology, the database is likely to be a valuable tool. In 1977, Solbrig (10) remarked that studying the DNA content of plant nuclei (which at that time was described as "a new technique") could help solve problems of karyotype evolution, and he gave some examples in which this kind of data had already proved useful in the Asteraceae. The large number of papers and data published on this subject for the family since then has confirmed this assertion. Analysis of the number of genome size estimates for Asteraceae reported per year shows that in the last decade there has been a sharp increase in the rate at which data are being generated (Fig. 1A), illustrating the increasing importance of this type of data to research studies. Indeed, a cumulative frequency plot (Fig. 1B) shows that 40% of all published estimates were made in the last decade. Given these observations, there is clearly a need to facilitate access to this growing pool of data, and the GSAD has been designed with this in mind, enabling better exploitation of available Asteraceae genome size information for studies focused on all aspects of Asteraceae biology.
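The genome size fields listed in the database description (1C, 2C, and 1Cx in pg, and 2C in Mbp) are linked by simple arithmetic. The following is a minimal sketch of those relations, not the GSAD implementation: the class name `GenomeSizeEntry`, its field names, and the example values are hypothetical, and the conversion factor 1 pg = 978 Mbp is a commonly used value that is assumed here rather than stated in the article. The monoploid value 1Cx is taken as the 2C-value divided by the ploidy level.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: commonly used conversion factor, not given in the article.
MBP_PER_PG = 978.0


@dataclass
class GenomeSizeEntry:
    """Hypothetical record holding a few of the fields listed for the GSAD."""
    species: str
    two_c_pg: float                      # 2C-value in pg
    ploidy_level: Optional[int] = None   # e.g. 2 = diploid, 4 = tetraploid

    @property
    def one_c_pg(self) -> float:
        """1C-value in pg (half of the 2C-value)."""
        return self.two_c_pg / 2

    @property
    def two_c_mbp(self) -> float:
        """2C-value in Mbp, using the assumed conversion factor."""
        return self.two_c_pg * MBP_PER_PG

    @property
    def one_cx_pg(self) -> Optional[float]:
        """1Cx-value in pg (2C divided by ploidy), or None if ploidy is unknown."""
        if self.ploidy_level is None:
            return None
        return self.two_c_pg / self.ploidy_level


# Illustrative values only, not taken from the database.
entry = GenomeSizeEntry(species="Example species", two_c_pg=8.0, ploidy_level=4)
print(entry.one_c_pg, entry.one_cx_pg, entry.two_c_mbp)
```

Leaving the ploidy level optional mirrors the "(if known)" qualifier attached to several database fields; when it is missing, the sketch reports no 1Cx-value rather than guessing one.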
Fig. 1. Quantitative analysis of the rate at which genome size data for Asteraceae are being generated: (A) number of assessments per year in the period 1967–2009; (B) cumulative number of assessments over the years. Data taken from the GSAD.
The database therefore fills a gap in the tools available for analyzing Asteraceae, paralleling the situation for chromosome numbers, a field intimately linked to genome size. For chromosome numbers, the general Index to Plant Chromosome Numbers (http://mobot.mobot.org/W3T/Search/ipcn.html) has been complemented with a database specific to the Asteraceae, i.e., the "Index to Chromosome Numbers in Asteraceae" (http://www.lib.kobe-u.ac.jp/infolib/meta_pub/G0000003asteraceae) by Watanabe (11). In the same way, our database complements the plant genome size collections cited above. Given that Watanabe's database and ours deal with very closely related genetic aspects of the Asteraceae, we suggest the possibility of creating further links between them in the future. Some plant nuclear DNA content databases are general, although sometimes organized by major plant groups (http://data.kew.org/cvalues), whereas others are restricted to geographical areas, such as that recently published for the Balkan flora (9). The GSAD will directly tell researchers what has been done in the field of nuclear DNA content assessment in the Asteraceae taxa they are interested in, without the need to search through data scattered throughout the scientific literature. Since information on specific groups in which work is currently in progress will be provided, it will help to avoid duplication of effort. It will also be useful for detecting gaps as well as hotspots in the knowledge of genome size in the family. As currently compiled, the GSAD provides the first step toward a comprehensive database of genome size data in the Asteraceae.
Our aim is to update it frequently, with a new release every year, possibly increasing to twice a year if a sufficient amount of data is produced. We welcome any published, in press, or even unpublished data that authors may wish to have uploaded into the database. In parallel with the general increase in the information contained in the database, we aim to identify and target specific gaps to increase knowledge of genome size within the family still further.
The authors thank the researchers who contributed Asteraceae genome size assessments and studies. The authors also thank Samuel Pyke, who improved the English language, Francisco Gálvez, for implementing the database, and Alba Anadon, for data entry. I.S.-J. and D.V. benefited from FPU predoctoral grants, O.H. from a MICINN postdoctoral grant of the Spanish government, and S.G. from the JAE-Doc program of the CSIC.
Additional Supporting Information may be found in the online version of this article.
Teresa Garnatje*, Miguel Ángel Canela, Sònia Garcia, Oriane Hidalgo§, Jaume Pellicer¶, Ismael Sánchez-Jiménez**, Sonja Siljak-Yakovlev, Daniel Vitales, Joan Vallès
* Institut Botànic de Barcelona (IBB-CSIC-ICUB), Passeig del Migdia s.n., Parc de Montjuïc, 08038 Barcelona, Catalonia, Spain
Department of Managerial Decision Sciences, IESE Business School, Universitat de Navarra, Av. Pearson 21, 08032 Barcelona, Catalonia, Spain
Institut Botànic de Barcelona (IBB-CSIC-ICUB), Passeig del Migdia s.n., Parc de Montjuïc, 08038 Barcelona, Catalonia, Spain
§ Department of Environmental and Plant Biology, Ohio University, Athens, Ohio 45701
¶ Jodrell Laboratory, Royal Botanic Gardens, TW9 3AB Kew, Richmond, Surrey, United Kingdom
** Institut Botànic de Barcelona (IBB-CSIC-ICUB), Passeig del Migdia s.n., Parc de Montjuïc, 08038 Barcelona, Catalonia, Spain
Université Paris Sud, Laboratoire d'Evolution et Systématique, UMR8079 CNRS-UPS-AgroParis-Tech, Bât. 360, 91405 Orsay Cedex, France
Laboratori de Botànica, Facultat de Farmàcia, Universitat de Barcelona, Av. Joan XXIII s.n., 08028 Barcelona, Catalonia, Spain