
Chapter 46

ARB: A Software Environment for Sequence Data

Ralf Westram, Kai Bader, Elmar Prüsse, Yadhu Kumar, Harald Meier, Frank Oliver Glöckner, and Wolfgang Ludwig

46.1 INTRODUCTION

Comparative sequence analysis of evolutionarily conserved marker molecules is nowadays the standard procedure for assigning organisms to phylogenetic groups and/or taxonomic units. The current prokaryotic taxonomic framework is mainly based on rRNA-based phylogenetic conclusions [Ludwig and Klenk, 2001; Ludwig et al., 2009]. This approach provides the basis for identification or new description in pure culture investigations or culture-independent studies of complex environmental samples [Amann et al., 1995]. Furthermore, comparative analysis of appropriate markers allows contigs to be assigned to taxa in metagenomics studies.

Powerful interoperating bioinformatics tools are prerequisites for sound utilization of the data flood for identification and phylogenetic inference in the genomics era. Such tools were missing or only available as stand-alone programs when the ARB project was initiated about 16 years ago [Ludwig et al., 2004]. Given this situation, two major goals were formulated in the early days of the ARB project and are maintained to the present: (1) the maintenance of a structured integrative secondary database combining processed primary structures and any type of additional data assigned to the individual sequence entries and (2) a comprehensive selection of software tools that interact directly with one another as well as with the central database and are controlled via a common graphical interface. Initially, the ARB package was designed for handling and analyzing rRNA data. Later, it was extended by developing and/or including software tools for managing protein sequences as well as contigs and genomes.

Handbook of Molecular Microbial Ecology, Volume I: Metagenomics and Complementary Approaches, First Edition. Edited by Frans J. de Bruijn. © 2011 Wiley-Blackwell. Published 2011 by John Wiley & Sons, Inc.

Currently, the ARB project is maintained by members of the institutions with which the authors of this chapter are affiliated. The ARB package [Ludwig et al., 2004; Ludwig, 2005; Kumar et al., 2005; Kumar et al., 2006] as well as expert-curated rRNA databases [Pruesse et al., 2007] are freely available via http://www.arb-home.de and http://www.arb-silva.de.

46.2 THE ARB SOFTWARE PACKAGE

The ARB software package provides a set of cooperating tools for database maintenance and management as well as data handling and analysis. These tools directly interact with a central database of processed sequences and various types of sequence-associated metadata. A common graphical user interface allows data access, modification, and analysis. The database structure as well as the mode and parameters of interaction of the software tools are customizable by the user to a large extent.

46.2.1 The ARB Main Window

After database selection and ARB program start, the ARB main window provides the entry point for accessing the various software tools and facilities of the ARB package via the respective menus and buttons (Fig. 46.1). Furthermore, a user-selected tree is shown in radial format or in one of two dendrogram formats. Primary data and metadata can be visualized at the terminal nodes. The view can be compressed by depicting user-defined (phylogenetic) groups as triangles or rectangles in radial trees or dendrograms, respectively. Alternatively, these data can be shown as a simple list. Datasets for further analyses can be selected by “marking” the respective internal or terminal nodes with the mouse. A slave window can also be opened for tree comparisons. The respective trees can be exported to xfig, a simple open source graphics program (http://www.xfig.org), for further modification and/or transformation into various formats.

Figure 46.1 The ARB main window. Buttons in the top and left panels provide access to the various ARB tools. Phylogenetic groups are indicated by brackets, and condensed groups are represented by rectangles along with the numbers of hidden terminal nodes. NDS (node display setup)-controlled database field entries at terminal nodes indicate the names, accession numbers, and strain designations of the respective organisms (master entries), and the first authors of the respective bibliography.

46.2.2 The Central Database

The central component of the ARB package is a special hierarchical and highly compressed database. During operation, it is loaded into main memory, ensuring rapid access by the peripheral software tools. The sequences representing organisms, genes, or gene products are stored in individual database fields. Different sequences (genes, contigs, nucleic acid, and protein sequences) of the same organism can be stored in individual containers (alignments) assigned to the same master entry (organism). A unique identifier (short_name) is automatically generated and assigned to each master entry under the control of a “name server.”

Following the ARB concept of an integrative database, any type of additional data can be assigned to the individual master entry and stored within default or user-defined database fields. Besides a set of default database fields, additional ones can be created, deleted, and renamed by the user. The metadata can either be intrinsic parts of the database or linked to it via local networks or the internet. In the latter case, the path to the respective file or the URL of an external database, optionally including commands and search strings, has to be defined using the ARB WWW (world wide web) tool.

The default hierarchy of the database entries follows the phylogeny of the organisms derived from the respective sequence data. However, it can also be changed according to other criteria defined by database field entries. This hierarchy is used by special algorithms for highly effective data compression. Different protection levels (0–6) can be assigned to the individual database fields. Database as well as security management is facilitated by this feature.

Data import and export are possible in various common flat file formats. Default or user-defined parsing filters control the storage or extraction of data and features into and from defined ARB database fields, respectively. A versatile merge tool allows data to be merged and exchanged between different ARB databases. A similar tool can be used for exporting data subsets in the ARB format.

46.2.3 Data Access and Visualization

The ARB package offers multiple alternative ways of accessing, selecting, visualizing, modifying, and analyzing data. As mentioned above (see Section 46.2.1), the tree or list shown in the ARB main window can be used for browsing the data. Phylogenetic trees generated by intrinsic ARB tree reconstruction tools or imported from external sources are stored in the database and can be visualized in different formats within the ARB main window. Any database field entry, or any combination of entries, can be visualized at the terminal nodes of the tree currently shown (Fig. 46.1). The selection and order of data entries, and of the results of data analysis or extraction to be visualized, are defined by the NDS (node display setup) tool. Irrespective of the visualization mode used, the ARB SRT (search and replacement tool), ACI (ARB command interpreter), and RGE (regular expressions) tools can be used for extraction of combinations of (sub)strings as well as for analysis of database field entries, respectively.

A powerful search tool allows simple (strings and combinations of strings) and complex (default or user-defined algorithms) searches in up to three of the database fields. The matching master entries are shown in a hit list along with restricted information on the respective hits. Selecting from this list provides access to the information in all or user-defined selections of database fields.

The “info” window, displayed when starting the standard tool for data visualization, lists the database fields along with the respective stored information for one master entry. Database field selection and order in this list can be customized by the user. Furthermore, the field entries can be edited using this tool. Multiple windows can be opened, allowing simultaneous data access for different master entries. Besides this standard procedure, raw and processed data can be visualized via “user masks.” The layout of the visualization windows (i.e., selection, size, and positioning of database field entries) can be customized by the user. Furthermore, simple algorithms for modifying and analyzing database field entries (SRT, ACI, RGE) can be included when designing “user masks.”

46.2.4 Sequence Editors

A powerful editor provides versatile user access to primary structure (nucleotide or amino acid sequences) visualization, arrangement, and modification (Fig. 46.2). The set of sequences to be displayed can be interactively defined as well as stored in user-defined “configurations.” The arrangement of the primary structures depends on the tree displayed in the ARB main window or is taken from a “configuration” selected while starting the editor. The original data as well as virtually transformed data (e.g., purine-pyrimidine, in silico translated amino acid sequences, or simplified amino acid presentation) are displayed in user-defined color codes. Keyboard customization is possible for data entry and modification. Two different editing modes can be selected. The “Align” mode allows inserting/removing alignment gaps and moving sequence characters or stretches, while character changes are only possible after switching to the “Edit” mode. The rights to overcome protection of the individual sequence entries can be given for the two modes independently. This helps to prevent unwanted character changes when manually modifying the sequence data or the alignment. A set of hot keys in combination with (alignment, sequence, reference, or helix specific) cursor positioning facilities supports easy navigation. Block operations are available for modifying the respective primary structure or alignment regions. Sets of search strings can be defined and optionally stored. Perfect or partial matches can be visualized within the displayed sequences by user-defined background colors (Fig. 46.1). Virtual compression, that is, removal of alignment gaps common to all or a certain fraction of the displayed sequences, is possible. This makes data inspection and editing more convenient when large insertions occur in only part of the sequences. Groups of sequences can be interactively defined or are automatically shown if defined according to the tree selected while starting the editor.

Consensus sequences are determined for each defined group of sequences according to default or user-defined criteria and optionally visualized along with or instead of the individual sequences. This consensus can be edited, and changes made to it affect every sequence in the group. A special feature of the editor is the simultaneous secondary structure check if rRNA (gene) data are visualized. Symbols indicating the presence or absence as well as the character of base pairings are shown below the individual nucleotide symbols and immediately refreshed during sequence editing. A (three-domain) consensus secondary structure mask established according to commonly accepted secondary structure models [Cannone et al., 2002] functions as a guide for this tool.
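The virtual compression mentioned above can be illustrated with a minimal Python sketch. It is not ARB code: the function name compress_columns, the min_gap_fraction parameter, and the gap symbols are assumptions made for this illustration only. The sketch hides every alignment column whose gap fraction reaches a chosen threshold and reports which columns remain visible.

```python
def compress_columns(alignment, min_gap_fraction=1.0, gap_chars="-."):
    """Drop columns whose gap fraction is at least min_gap_fraction.

    Hypothetical helper, not part of ARB. Returns the compressed rows and
    the indices of the original columns that were kept.
    """
    if not alignment:
        return [], []
    n_rows = len(alignment)
    kept = []
    for col in range(len(alignment[0])):
        gaps = sum(1 for row in alignment if row[col] in gap_chars)
        if gaps / n_rows < min_gap_fraction:
            kept.append(col)
    rows = ["".join(row[c] for c in kept) for row in alignment]
    return rows, kept


rows, kept = compress_columns(["AC--GT", "AC-TGT", "A---GT"])
print(rows, kept)
```

With the default threshold of 1.0 only columns that are gaps in all displayed sequences are hidden; a lower value also hides columns in which most sequences carry a gap, which corresponds to the "certain fraction" case.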


Figure 46.2 The ARB primary structure editor. Buttons in the top and left panels provide access to the various editor-associated ARB tools. Subwindows in the upper part indicate cursor positioning, error messages, and search strings. SAI (sequence-associated information) lines show the E. coli reference sequence as well as the secondary structure mask and helix numbering. Condensed groups as shown in Figure 46.1 are represented by the respective consensus. The “Probe” search string is highlighted in the respective primary structures. Positional base pairing (∼, −, +, =) or consensus secondary structure violation (#) is indicated below the base symbols.

The ARB (nucleotide) secondary structure editor fits any sequence selected by cursor positioning in the primary structure editor into the common consensus model (Fig. 46.3). The layout of the structure (that is, color coding of base-paired, nonpaired, and loop positions as well as the arrangement, shape, and size of helices and loops) can be customized according to the user’s preferences. Any of the search strings or SAIs (“sequence associated information”; see above) activated in the primary structure editor can be visualized by background colors in the secondary structure model [Kumar et al., 2005]. The structure can be exported to xfig, a simple open source graphics program (http://www.xfig.org), for further modification and/or transformation into various formats.

Three-dimensional (3-D) presentation of the respective sequence, optionally with search string and SAI visualization, is also possible [Kumar et al., 2006]. Color coding can be customized as described for the secondary structure editor. The 3-D structure is based on x-ray structure data for the rRNA molecules of Escherichia coli [Ban et al., 2000; Tung et al., 2002].

The primary structure editor contains a “protein viewer” component allowing in silico translation and virtual presentation of database-inherited nucleic acid sequences in selected or all frames. Two- and three-letter as well as user-defined color code presentation is possible. This tool helps when performing primary structure quality checking and optimizing the respective alignment. For further analyses, the in silico translated amino acid sequences have to be stored in a separate protein sequence alignment (database field; see Section 46.2.1). The respective nucleic and amino acid alignments can be synchronized (see Section 46.2.8).
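As an illustration of in silico translation in several reading frames, the following Python sketch translates a nucleotide sequence in the three forward frames using the standard genetic code. It is a simplified stand-in rather than the ARB protein viewer: the names translate and translate_all_frames are hypothetical, alignment gaps are simply removed before translation, and codons containing ambiguous symbols are shown as X.

```python
# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(nucleotides, frame=0):
    """Translate one forward frame (0, 1, or 2); hypothetical helper."""
    seq = nucleotides.upper().replace("U", "T").replace("-", "").replace(".", "")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    # Unknown or ambiguous codons are rendered as 'X' (an assumption).
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def translate_all_frames(nucleotides):
    """Return the translations of the three forward frames."""
    return {frame: translate(nucleotides, frame) for frame in range(3)}


print(translate_all_frames("ATGGCCTAA"))
```

Stop codons are shown as an asterisk. Reverse-strand frames could be added by translating the reverse complement in the same way.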

Figure 46.3 The ARB secondary structure editor. Buttons in the top and left panels provide access to the editor-associated layout tools. The “Probe” search string is highlighted (see Fig. 46.1).

46.2.5 Profiles, Masks, and Filters

Conservation or base composition profiles, higher-order structure masks, and filters including or excluding particular alignment positions are important tools for sequence data analyses, especially for phylogenetic inference [Ludwig and Klenk, 2001; Peplies et al., 2008]. The ARB package provides tools for determining such profiles based upon the full database or user-defined subsets. These profiles, masks, and filters are stored in the central database as so-called SAIs (sequence associated information) and can be visualized and modified by the primary structure editor. The filter selection tool not only allows sets of particular filters to be chosen but also allows fine tuning with respect to the inclusion or exclusion of alignment positions in the case of multiple-character filters. Besides SAIs derived from the primary structures, any other information that can be assigned to sequence/alignment positions or regions can be stored and used as SAIs. Examples are rRNA–protein interaction sites or “in situ” accessibility maps for FISH (fluorescence in situ hybridization) probes [Amann et al., 1995; Kumar et al., 2005].
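How a conservation profile can be turned into a positional filter is sketched below in Python. The sketch rests on assumptions that are not taken from ARB: conservation is measured as the fraction of sequences sharing the most frequent residue in a column, the filter is encoded as a string of 1 (include) and 0 (exclude), and all function names are hypothetical.

```python
from collections import Counter


def conservation_profile(alignment, gap_chars="-."):
    """Fraction of rows sharing the most frequent residue, per column."""
    profile = []
    for column in zip(*alignment):
        residues = [c for c in column if c not in gap_chars]
        if not residues:
            profile.append(0.0)  # all-gap column: no conservation signal
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        profile.append(top_count / len(column))
    return profile


def make_filter(profile, min_conservation):
    """Encode included columns as '1' and excluded columns as '0'."""
    return "".join("1" if value >= min_conservation else "0" for value in profile)


def apply_filter(sequence, column_filter):
    """Keep only the alignment positions that the filter includes."""
    return "".join(c for c, keep in zip(sequence, column_filter) if keep == "1")


alignment = ["ACGT", "ACGA", "AC-C", "ATGG"]
column_filter = make_filter(conservation_profile(alignment), 0.75)
print(column_filter, [apply_filter(row, column_filter) for row in alignment])
```

Storing the filter as a per-column string mirrors the idea of keeping such information alongside the alignment, so that the same filter can be reused by different analysis steps.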

46.2.6 Phylogenetic Treeing

Software tools for nucleotide and amino acid sequence-based tree reconstruction according to the three most commonly used approaches (i.e., distance matrix, maximum likelihood, and maximum parsimony-based procedures) are incorporated in the package. They cooperate as intrinsic tools with the respective ARB components and database elements such as alignments and filters.

The central treeing tool of the package, ARB parsimony, was specially developed for handling many thousands of sequences (more than 400,000 in the current small subunit (SSU Ref) rRNA SILVA database [Pruesse et al., 2007]). New sequences are successively added to an existing tree according to the parsimony criterion. A special software component superimposes branch lengths on the parsimony-generated tree topology. These branch lengths reflect the significance of the individual “tetra-furcations” by expressing the difference between the most parsimonious and the two less parsimonious solutions when performing NNI (nearest-neighbor interchange of adjacent branches or subtrees). These relative distances are normalized according to a distance matrix deduced from primary structure comparison. Thus branch lengths in ARB-parsimony-generated trees in the first instance visualize the significance of topologies, while in the second instance they reflect a degree of estimated sequence divergence. A special feature of ARB parsimony allows adding sequences to an existing tree without permitting any changes in the initial tree. This enables the user to include partial, low-quality, or preliminarily aligned sequences without perturbing the topology of an optimized tree based upon optimally aligned full-length and high-quality data. Another peculiarity of this treeing software concerns tree optimization by performing cycles of NNI (nearest-neighbor interchange) and KL [Kernighan and Lin, 1970] topology modifications. These optimizations can be performed not only for the complete tree but can also be confined to user-selected subtrees. Thus tree optimization is possible by applying the appropriate filters for the respective phylogenetic levels and groups.
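The idea of comparing the three possible arrangements around an internal branch can be made concrete with a small Python sketch. It is a textbook-style illustration, not the ARB parsimony implementation: trees are nested tuples of sequence names, the parsimony score is computed with Fitch's algorithm, and the support value (second-best score minus best score) is only one assumed reading of the "difference" described above. All function names are hypothetical.

```python
def fitch_score(tree, seqs):
    """Minimum number of changes (Fitch) for a nested-tuple binary tree."""
    length = len(next(iter(seqs.values())))
    total = 0
    for col in range(length):
        def states(node):
            nonlocal total
            if isinstance(node, str):          # leaf: name of a sequence
                return {seqs[node][col]}
            left, right = states(node[0]), states(node[1])
            common = left & right
            if common:
                return common
            total += 1                          # one change needed here
            return left | right
        states(tree)
    return total


def nni_arrangements(a, b, c, d):
    """The three topologies joining four subtrees around one inner branch."""
    return [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]


def branch_support(a, b, c, d, seqs):
    """Assumed support measure: second-best minus best parsimony score."""
    scores = sorted(fitch_score(t, seqs) for t in nni_arrangements(a, b, c, d))
    return scores[1] - scores[0]


seqs = {"A": "AAGG", "B": "AAGG", "C": "CCTT", "D": "CCTT"}
print([fitch_score(t, seqs) for t in nni_arrangements("A", "B", "C", "D")])
print(branch_support("A", "B", "C", "D", seqs))
```

A support of zero would indicate that an alternative arrangement is equally parsimonious, so the corresponding inner branch is not resolved by the data.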

The ARB-neighbor tool for generating distance matrix trees is an accelerated and improved version of the respective component of Felsenstein’s PHYLIP package [Felsenstein, 1989]. Selected stand-alone tools of the PHYLIP package can be used in the ARB environment in combination with all respective ARB features.

The various facilities of RAxML [Stamatakis, 2006], currently the most powerful maximum likelihood program, can also be operated from the ARB user interface, applying parameters and filters generated by the respective ARB features. Besides RAxML, versions of TREE-PUZZLE [Schmidt et al., 2002] and PhyML [Guindon and Gascuel, 2003] can also be used for ARB-controlled tree reconstruction.

A “concatenation” tool allows merging alignments of different genes or gene products for multiple marker-based phylogenetic studies. The full spectrum of filter and parameter settings is available for analyzing or controlling the influences of the individual markers in the concatenated set.
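Concatenation of marker alignments can be sketched in a few lines of Python. The representation is an assumption made for this example: each alignment is a dictionary mapping an organism name to its aligned sequence, and organisms missing from one marker are padded with gaps. The function name concatenate_alignments is hypothetical and does not refer to an ARB interface.

```python
def concatenate_alignments(alignments, gap="-"):
    """Join per-gene alignments organism by organism.

    alignments: list of dicts {organism: aligned sequence}; all sequences
    within one dict are assumed to have the same length.
    """
    organisms = sorted(set().union(*alignments))
    widths = [len(next(iter(a.values()))) for a in alignments]
    return {
        org: "".join(a.get(org, gap * w) for a, w in zip(alignments, widths))
        for org in organisms
    }


gene_a = {"Eco": "ACG-", "Bsu": "ACGT"}
gene_b = {"Eco": "TT", "Pae": "TA"}
print(concatenate_alignments([gene_a, gene_b]))
```

Because the column ranges of the individual markers are known from their widths, a positional filter can later include or exclude a whole marker from the concatenated set.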

46.2.7 The Positional Tree Server

Once established, the ARB PT server (positional tree) allows rapid and exact searching for sequence identity or peculiarity. Thus, it represents the central tool for fast searching of closest relatives for automated sequence alignment or for defining diagnostic sequence stretches for primer and probe design. The basis for these procedures is a prefix tree of all oligonucleotide sequences (up to 100-mers) occurring in the underlying database, in which each oligonucleotide is assigned to the sequences or organisms containing it. PT-server-based analyses do not rely upon aligned sequences. The PT server is not provided with the ARB program package or ARB database. It has to be established locally for the respective database. The PT server is used for rapidly finding the most similar reference sequences, indicating the closest relative of the query organism. This also helps in finding appropriate templates for adding new sequences to existing alignments (see Section 46.2.8). The PT server is also used for finding (taxon- or group-specific) diagnostic sequence stretches for probe and primer design and evaluation (see Section 46.2.9).
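The principle of assigning oligonucleotides to the sequences containing them, and of using shared oligonucleotides to find close relatives without an alignment, can be shown with a deliberately simplified Python sketch. It swaps the prefix tree for a plain dictionary of fixed-length k-mers, so it is a stand-in for the concept and not a model of the PT server itself. The names build_index and closest_relatives and the choice of k are assumptions.

```python
from collections import Counter, defaultdict


def build_index(sequences, k):
    """Map every k-mer to the set of sequence names that contain it."""
    index = defaultdict(set)
    for name, seq in sequences.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def closest_relatives(query, index, k):
    """Rank indexed sequences by the number of distinct k-mers they share
    with the unaligned query."""
    hits = Counter()
    for kmer in {query[i:i + k] for i in range(len(query) - k + 1)}:
        for name in index.get(kmer, ()):
            hits[name] += 1
    return hits.most_common()


sequences = {"s1": "ACGTACGG", "s2": "TTTTACGG", "s3": "CCCCCCCC"}
index = build_index(sequences, k=4)
print(closest_relatives("ACGTACGA", index, k=4))
```

The index is built once per database and then queried many times, which matches the description of a server that has to be established locally before searches become fast.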

46.2.8 Sequence Alignment and Quality Checks

For generating nucleic or amino acid sequence alignments de novo, ClustalW [Thompson et al., 1994] was added to the peripheral tools of the ARB package. However, in the context of database maintenance, new sequence entries have to be integrated into an already existing database of aligned sequences. For this purpose the ARB fast aligner was developed. This aligner uses a (set of) selected aligned reference sequences as template(s) for rapid integration of a (set of) unaligned sequence(s). Individual entries (that is, sequences or consensus sequences defined by the user or automatically determined by a PT-server-based search for the most similar reference sequences) are used as templates.

In the case of protein-coding nucleic acid sequences, the alignment is usually optimized at the amino acid level (given that the phylogenetic information is stored there) [Peplies et al., 2008]. The underlying nucleic acid alignment can then be adapted to the amino acid alignment by a back-translation-based tool taking into consideration all known codon usages.
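Adapting a nucleic acid alignment to an amino acid alignment can be illustrated with a minimal Python sketch. It assumes the simplest possible situation, which is narrower than what the ARB tool handles: the ungapped coding sequence is in frame and contains exactly one codon per aligned residue, so no codon usage tables are consulted. The function name align_codons is hypothetical.

```python
def align_codons(protein_row, nucleotides, gap="-"):
    """Thread an ungapped coding sequence onto an aligned protein row.

    Each residue in protein_row consumes one codon; each protein gap
    becomes three nucleotide gaps. Simplifying assumption: the coding
    sequence is in frame and matches the residue count exactly.
    """
    codons = [nucleotides[i:i + 3] for i in range(0, len(nucleotides), 3)]
    n_residues = sum(1 for c in protein_row if c != gap)
    if len(nucleotides) % 3 or n_residues != len(codons):
        raise ValueError("coding sequence does not match the protein row")
    remaining = iter(codons)
    return "".join(gap * 3 if c == gap else next(remaining) for c in protein_row)


print(align_codons("M-A", "ATGGCC"))
```

Since every protein column maps to exactly three nucleotide columns, the two alignments stay synchronized position by position.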

Once a reasonable data set of high-quality and optimally aligned primary structures has been sensibly structured (grouped) according to the results of careful phylogenetic analyses, further sequence and alignment quality checking is possible using the respective ARB tools. A component of the primary structure editor takes into account SAIs (sequence associated information; see Section 46.2.4) expressing positional variability as well as phylogenetic tree topologies for estimating how plausible a certain monomer (nucleotide or amino acid) is at a certain alignment position. The degree of “(mis)fit” is optionally indicated by user-defined background colors in the editor window. Another tool determines a quality score for the individual sequences by estimating degrees of deviation from group-specific primary and secondary structure consensus, conservation profiles, sequence sizes, and completeness.

46.2.9 Probe Design and Evaluation

Taxon- or gene-specific probes or primers play a central role in many molecular biological research and analysis projects, for example, the identification and detection of organisms in complex environmental samples or expression studies within the scope of genome projects. The ARB “Probe Design” and “Probe Match” tools search the PT server to identify short (10–100 monomers) diagnostic sequence stretches that are evaluated against the background of all sequences in the database from which the PT server has been built. In principle, no alignment of the sequence data is needed for specific probe design. However, in the case of taxon-specific probes, alignment and phylogenetic analyses are necessary for defining groups of phylogenetically (taxonomically) related organisms as the targets of specific probes.

The design of taxon-specific oligonucleotide probes with ARB is performed in three steps. First, the (group of) target organism(s), gene(s), or sequence(s) has to be defined (“marked,” see Section 46.2.1). Second, potential target sites are searched by the “Probe Design” tool with the aid of a PT server. The results are shown in a ranked list of proposed targets, probes, and additional information. The ranking is according to in silico-predicted probe quality. Third, the proposed oligonucleotide probes are evaluated against the whole database by using the program “Probe Match.” Local alignments are determined between the probe target sequence(s) and the most similar reference sequences (optionally from 0 to 5 mismatches) in the respective database. Furthermore, these sequence strings can automatically be visualized in the primary and secondary structure editors (see Section 46.2.4).

A special advancement is the ARB multiprobe software component. It determines sets of up to five probes optimally identifying the target group. Color-coded visualization of target master entries (see Section 46.2.2) and matching probe combinations is possible in the ARB main window.
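The design and evaluation steps described above can be mimicked on a toy scale with the following Python sketch. It is not the ARB “Probe Design” or “Probe Match” implementation: candidate target sites are simply k-mers found in every marked sequence and in no unmarked one, matching counts plain mismatches at every position, and no probe quality ranking is attempted. The function names and parameters are assumptions.

```python
def kmers(seq, k):
    """All distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def design_probes(targets, non_targets, k):
    """k-mers present in every target and absent from all non-targets.

    Requires at least one target sequence.
    """
    shared = set.intersection(*(kmers(s, k) for s in targets))
    excluded = set().union(*(kmers(s, k) for s in non_targets))
    return sorted(shared - excluded)


def probe_match(probe, sequences, max_mismatches=0):
    """List (name, position, mismatches) for all sites within the limit."""
    hits = []
    for name, seq in sequences.items():
        for pos in range(len(seq) - len(probe) + 1):
            site = seq[pos:pos + len(probe)]
            mismatches = sum(a != b for a, b in zip(probe, site))
            if mismatches <= max_mismatches:
                hits.append((name, pos, mismatches))
    return hits


targets = ["AACCGGTT", "TACCGGTA"]
non_targets = ["AACCTTTT"]
print(design_probes(targets, non_targets, k=4))
database = {"t1": "AACCGGTT", "n1": "AACCTTTT"}
print(probe_match("CCGG", database, max_mismatches=2))
```

Raising max_mismatches shows which non-target sequences come close to matching, which is the information needed to judge how specific a candidate really is.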

46.2.10 Further Useful ARB Tools

A large fraction of sequences in the currently available rRNA sequence databases [Pruesse et al., 2007] comprises clusters of highly similar to identical primary structures, most often retrieved by culture-independent environmental studies. Commonly, such “sequence clouds” are represented as OTUs (operational taxonomic units) in further data analyses. Such OTUs are defined either manually or by applying respective software tools [Schloss et al., 2009]. Using the ARB package, OTUs can be defined and automatically grouped in the selected tree by a newly developed component. The OTU definition according to user-provided parameters is deduced from the topology of a selected tree and reassessed using distance methods. A (best) representative is proposed by the software.
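The distance-based part of an OTU definition can be illustrated with a small Python sketch. It covers only a greedy identity clustering of aligned sequences and ignores the tree topology that the ARB component starts from, so it should be read as a generic example rather than a description of that component. The function names, the identity measure, and the default threshold of 0.97 are assumptions.

```python
def identity(a, b, gap="-"):
    """Fraction of identical residues over positions ungapped in both."""
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def cluster_otus(aligned, threshold=0.97):
    """Greedy clustering: each sequence joins the first OTU whose
    representative it matches at or above the threshold, otherwise it
    founds a new OTU and becomes its representative."""
    otus = []  # list of (representative name, member names)
    for name, seq in aligned.items():
        for rep, members in otus:
            if identity(seq, aligned[rep]) >= threshold:
                members.append(name)
                break
        else:
            otus.append((name, [name]))
    return otus


aligned = {"a": "ACGTACGTAC", "b": "ACGTACGTAT", "c": "TTTTACGTAC"}
print(cluster_otus(aligned, threshold=0.9))
```

In this greedy scheme the first member of each cluster acts as its representative; choosing a "best" representative, as the text describes, would require an additional criterion such as sequence quality or centrality within the cluster.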

ARB can also function as a simple genome viewer allowing comparison of annotated contigs or genomes. Data access is possible via the “search” and “info” tools or, alternatively, via genome maps, similar to the procedures described in Section 46.2.2. Extraction of (sets of) genes into ARB gene databases can also be managed by this ARB facility.

46.2.11 Availability and Training

The ARB software has been designed for Linux operating systems. Tested versions for SuSE and Ubuntu Linux distributions are available at http://www.arb-home.de and http://www.arb-silva.de. The binaries, source code, and some documentation are provided in the download area of these web pages. The latter URL also provides access to the current release of the SILVA LSU and SSU rRNA databases. Furthermore, there is a mailing list of the worldwide ARB user community. Subscription is needed for those interested in joining ([email protected]). Basic and advanced ARB training courses are offered by the Ribocon company in Bremen (Germany, http://www.ribocon.com). Mac users interested in ARB should consult http://www.haloarchaea.com/resources/arb/.

46.3 CONCLUDING REMARKS

The ARB software package provides a powerful and comprehensive set of directly cooperating software tools for managing and analyzing integrative databases of sequences. It is in use worldwide. The ARB software and database maintenance teams try to keep it up to date and compatible with ongoing hardware developments. After more than 16 years of ARB development by different computer scientists and a large number of computer science students, the source code has become huge and heterogeneous and would have to be cleaned up and at least partially redesigned. However, it is difficult to obtain funding or sponsorship for software redesign.

INTERNET RESOURCES

ARB software (http://www.arb-home.de)

ARB databases (http://www.arb-silva.de)

Acknowledgments

ARB software and database maintenance is partially supported by the Deutsche Forschungsgemeinschaft and the Bayerische Forschungsstiftung.

REFERENCES

Amann R, Ludwig W, Schleifer KH. 1995. Phylogenetic identification and in situ detection of individual microbial cells without cultivation. Microbiol. Rev. 59:143–169.

Ban N, Nissen P, Hansen J, Moore PB, Steitz TA. 2000. The complete atomic structure of the large ribosomal subunit at 2.4 Å resolution. Science 289:905–920.


Cannone JJ, Subramanian S, Schnare MN, Collett JR, D’Souza LM, et al. 2002. The comparative RNA Web (CRW) site: An online database of comparative sequence and structure information for ribosomal, intron, and other RNAs. BMC Bioinform. 3:2.

Felsenstein J. 1989. PHYLIP: Phylogeny inference package (version 3.2). Cladistics 5:164–166.

Guindon S, Gascuel O. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52:696–704.

Kernighan BW, Lin S. 1970. An efficient heuristic procedure for partitioning graphs. Bell Syst. Tech. J. 49:291–307.

Kumar Y, Westram R, Behrens S, Fuchs B, Gloeckner FO, et al. 2005. Graphical representation of ribosomal RNA probe accessibility data using ARB software package. BMC Bioinform. 6:61.

Kumar Y, Westram R, Kipfer P, Meier H, Ludwig W. 2006. Evaluation of sequence alignments and oligonucleotide probes with respect to three-dimensional structure of ribosomal RNA using ARB software package. Bioinformatics 7:240–251.

Ludwig W, Klenk HP. 2001. Overview: A phylogenetic backbone and taxonomic framework for prokaryotic systematics. In Garrity GM, ed. Bergey’s Manual of Systematic Bacteriology, 2nd ed., Vol. 1. New York: Springer, pp. 49–65.

Ludwig W, Strunk O, Westram R, Richter L, Meier H, et al. 2004. ARB: A software environment for sequence data. Nucleic Acids Res. 32:1363–1371.

Ludwig W. 2005. Bioinformatics and web resources for the microbial ecologist. In Osborn AM, Smith CJ, eds. Molecular Microbial Ecology. Abingdon: Taylor and Francis, pp. 345–371.

Ludwig W, Schleifer KH, Whitman WB. 2009. Revised road map to the phylum Firmicutes. In Whitman WB, ed. Bergey’s Manual of Systematic Bacteriology, 2nd ed., Vol. 3. New York: Springer, pp. 1–13.

Peplies J, Kottmann R, Ludwig W, Glöckner FO. 2008. A standard operating procedure for phylogenetic inference (SOPPI) using (rRNA) marker genes. Syst. Appl. Microbiol. 31:251–257.

Pruesse E, Quast C, Knittel K, Fuchs BM, Ludwig W, et al. 2007. SILVA: A comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res. 35:7188–7196.

Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, et al. 2009. Introducing mothur: Open source, platform-independent, community-supported software for describing and comparing microbial communities. Appl. Environ. Microbiol. 75:7537–7541.

Schmidt HA, Strimmer K, Vingron M, von Haeseler A. 2002. TREE-PUZZLE: Maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18:502–504.

Stamatakis A. 2006. RAxML-VI-HPC: Maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688–2690.

Thompson JD, Higgins DG, Gibson DJ. 1994. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment. Comput. Appl. Biosci. 8:189–191.

Tung CS, Joseph S, Sanbonmatsu KY. 2002. All-atom homology model of the Escherichia coli 30S ribosomal subunit. Nat. Struct. Biol. 9:750–755.