joinmap manual

JoinMap® 3.0

Software for the calculation of genetic linkage maps

Completely revised edition by J.W. van Ooijen and R.E. Voorrips

Wageningen, October 2001

JoinMap is a registered trademark of Plant Research International B.V. in the BENELUX and in the U.S.A. Other brand and product names are registered trademarks of their respective holders.

Biometris is part of Wageningen University and Research Centre, and is positioned within the Expertise Group Plant Sciences. This expertise group incorporates the Department of Plant Sciences of Wageningen University, Plant Research International B.V. and Applied Plant Research.

Address:

E-mail: [email protected] (technical information & support), [email protected] (sales)

Web: www.joinmap.nl, www.biometris.nl

Mail: Biometris, attn. JoinMap, P.O. Box 100, 6700 AC Wageningen, The Netherlands

Copyright (C) 1995-2001 Plant Research International B.V. All rights reserved


Contents

Introduction
Installation
Overview
A genotype data population
A pairwise data population
A map
Map integration
Final remarks
How to cite JoinMap 3.0?
Acknowledgement
Using JoinMap
Controlling the program
The JoinMap project
The population node
LOD score
The pairwise data population node
The grouping node
The group node
The pairwise data population group node
Map integration
The mapping node and the mapping procedure
The map node
The pairwise data population map node
External maps
Tutorial
Data files
General
Data file characteristics
Locus genotype file
Pairwise data file
Map file
Translation file
Non-JoinMap locus genotype file
Default file name extensions
Tables, examples and references
List of figures
List of tables
List of examples
References
Index

Introduction

JoinMap® is a computer program for the calculation of genetic linkage maps in experimental populations of diploid species. The present version is a completely revised edition based on version 2.0 (Stam & Van Ooijen, 1995). It has most of the functionality of that version; some very infrequently used parts were not incorporated, while a lot of functionality was added by creating a new, powerful user interface based on MS-Windows®. Through this interface, many features are controlled in the normal MS-Windows way, thereby allowing an intuitive use of the software. A very important enhancement is the way the software takes care of the data: all the user has to do is provide the file with genotype data, while the software deals with all subsequent actions on the data, such as selecting subsets of loci or of individuals. This relieves the user of the many edit actions needed under version 2.0 and thereby invites the user to explore his or her data more thoroughly. Another significant improvement is the way the grouping of loci into linkage groups can be studied, allowing a much better visual insight into associations between loci; also, the grouping calculations were sped up very significantly. A final, but not the least, enhancement is the incorporation of a map charting component, with which the user can create high quality charts of the calculated maps.

Installation

JoinMap version 3.0 is a program for the MS-Windows platform. It was tested to run under Windows 95 and Windows NT 4.0, and is further expected to run flawlessly under Windows 98, ME and 2000. It comes with an InstallShield® installation program that does most of the work. Start the SETUP.EXE program from the set of installation files, e.g. by double-clicking on it from within Windows Explorer or My Computer. Choose the settings that you are prompted for and let SETUP.EXE finish. After this process the license file JOINMAP.LIC will be present in the program directory (typically C:\Program Files\JoinMap). This copy of the license file allows you to use the software under certain limitations, such as a maximum of two linkage groups and no printing or exporting. A purchased version of JoinMap comes with a separate personal license file, which is also named JOINMAP.LIC. By replacing the installed license file in the program directory with your personal license file you will gain full access to the software.

Apart from the length of names (maximum of 20 characters for population, locus and linkage group names) there are no limits built into the software; memory for storing data is allocated dynamically, only for the amount needed. So, your project size is limited only by the amount of RAM in your PC, for which a size of 64 MB is recommended for reasonably sized projects.

Overview

Start the program using the Windows Start menu. When the program runs you will see a window that is divided into several main parts: on the top the menu and the tool bar with buttons, on the left side the navigation panel, on the right side the contents-and-results panel, and on the bottom the status bar (Figure 1). Once data are loaded the navigation panel will contain a tree, like in Windows Explorer, in which each node will represent an item, such as a population, a linkage group or a map. The contents-and-results panel will contain a set of tabbed pages (tabsheets), in which contents and results of analyses will be displayed concerning the node selected in the navigation tree. When a node becomes selected its corresponding menu item is activated, e.g. for a population node the Population menu and for a group node the Group menu. The formats of data files used by JoinMap are described thoroughly in the Data Files chapter. Some example data files are present in the DemoData subdirectory of the program directory.

A genotype data population

In JoinMap 3.0 your work is organised into a project. You create a new project or open an existing project using the File menu. Once a project is opened, you load data into the project. This can be done with the Load Data option in the File menu. With this option you can load several types of data files into the project, among which the most important one is the locus genotype file (also called loc-file), which contains the genotype codes for the loci of a single segregating population like an F2. Such a population is sometimes referred to as a genotype data population, in contrast to a pairwise data population which will be addressed later. You can load more populations into a project, if you like. When the population is loaded successfully a population node will appear in the navigation tree and the contents-and-results panel will contain several tabsheets (Figure 2). The Info tabsheet will display a summary of the data loaded into the project. The Data tabsheet will show a neatly formatted version of the loaded data file. The Loci and Individuals tabsheets allow exclusion of loci and/or individuals from calculations and actions. The other tabsheets are initially empty; they will be filled with results of corresponding calculations using the genotypes of the currently selected (i.e. not excluded) set of loci and individuals. Clicking on the (Re-)calculate button on the tool bar will start the calculations, and after successful completion the tabsheet will be filled with its results. Values of parameters used in calculations can be modified in the Calculation Options from the Options menu.

Figure 1. User interface

Figure 1 labels: menu and tool bar; tabs; navigation panel with navigation tree; contents-and-results panel with tabsheets; status bar.

Figure 2. A population node (jm20demo) with its various tabsheets

The Locus genot. freq. tabsheet will display the genotype frequencies for each locus in order to study segregation distortion. The Individual genot. freq. tabsheet will show the genotype frequencies for each individual. The Similarity of loci and Similarity of individuals tabsheets will show the fractions of identical genotypes (the calculations include the missing genotypes). The LOD Groupings (text) and LOD Groupings (tree) tabsheets will show the grouping of loci. Both tabsheets are different views of the same analysis; the text view is more suitable for printing, while the tree view has a more attractive visual appearance allowing user interaction and must be used for creating group nodes in the navigation tree necessary for calculating linkage maps. The grouping is based upon the test for independence (translated into a LOD score) and is done at several significance thresholds. Loci determined to be significantly associated at the current LOD threshold with at least one member of a group will be in the same group. The tree view will show nodes representing linkage groups with names that consist of three fields: "LOD/nr(size)", in which LOD represents the LOD threshold under which the group was formed, nr represents the group number at that LOD threshold (the largest group gets the smallest number), and size is the number of loci in the group. When you select a certain node in the LOD Groupings tree (by clicking on it), the loci of that group are displayed in the list on the right-hand side of the tabsheet. Once you have decided which groups from the LOD Groupings tree you want to use for calculating the linkage map, you need to select their nodes by right-clicking. A node selected this way will become red (or magenta for the current node) (Figure 3). When you have selected all required groups, you subsequently use the Create Groups for Mapping option from the Population menu. If successful, this action will produce in the navigation tree a grouping node (as a child node of the population node) and for each group a group node (as child nodes of the grouping node) (Figure 4).

Figure 3. The creation of groups for mapping

Figure 4. A grouping node with five group nodes

The grouping node has a single tabsheet showing an overview of the division of loci over the groups. Each group node has several tabsheets. The Data tabsheet will first show a brief instruction on how to calculate the pairwise recombination frequencies. For the sake of brevity pairwise recombination frequencies are called linkages. After successful calculation of the linkages the Data tabsheet will show the original genotype data only for the loci in the group. The Loci tabsheet shows the loci in the group and allows exclusion of them from further processing. The Fixed orders tabsheet is the place where you can specify fixed orders for use in the map calculations of the group. The remaining tabsheets contain information on the linkages: one for the Weak linkages, another for the Strong linkages, a third for Suspect linkages and a final one for Maximum linkages.

From a group node a map can be calculated with the Calculate Map option from the Group menu. The map is calculated based on the selected set of loci. Upon successful completion a mapping node will be produced in the navigation tree (as a child node of the group node) and for each resulting map a map node (as child nodes of the mapping node). The mapping node has a single tabsheet containing the Session log of the calculations, allowing you to study the details of the procedure.

The mapping procedure is basically a process of building a map by adding loci one by one, starting from the most informative pair of loci. For each added locus the best position is searched and a goodness-of-fit measure is calculated. When the goodness-of-fit reduces too sharply (too large a jump), or when the locus gives rise to negative distances, the locus is removed again. This is continued until all loci have been handled once. This is the end of the so-called first round. Subsequently, an attempt is made to add all previously removed loci to the map a second time. This can be successful since the map will contain more loci than at the first attempt. But it may also be unsuccessful again through too large a jump or negative distances, so that a locus will be removed once more. This is the second round. After that, a final attempt is made to add all previously removed loci to the map, now ignoring the requirements of maximum allowed reduction in goodness-of-fit and no negative distances. This results in a final or third round. Of course, when all loci are fitted in the first or second round, there will not be subsequent rounds. The results at the end of each round are represented by a map node (Figure 5).

Figure 5. The navigation tree after the map calculations for Group 2
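The bookkeeping of the three rounds can be illustrated with a small Python sketch. It is not JoinMap's algorithm: the search for the best position and the choice of the most informative starting pair are left out, and the function fits() is a hypothetical stand-in for the real checks on the jump in goodness-of-fit and on negative distances. Only the order in which loci are tried, rejected and finally forced onto the map is shown.

```python
from typing import Callable, List, Sequence, Tuple


def build_map_in_rounds(
    loci: Sequence[str],
    fits: Callable[[List[str], str], bool],
) -> Tuple[List[str], List[List[str]]]:
    """Toy version of the round structure described above.

    fits(current_map, locus) is a hypothetical placeholder for the real
    acceptance tests; it returns True when the locus may stay on the map.
    Loci are simply appended, positions are not modelled.
    """
    current: List[str] = []
    remaining = list(loci)
    maps_per_round: List[List[str]] = []   # one map per completed round

    for round_no in (1, 2, 3):
        rejected: List[str] = []
        for locus in remaining:
            # The third round ignores the requirements and forces the locus in.
            if round_no == 3 or fits(current, locus):
                current.append(locus)
            else:
                rejected.append(locus)
        maps_per_round.append(list(current))
        remaining = rejected
        if not remaining:
            break   # all loci fitted, so there are no subsequent rounds
    return current, maps_per_round


# Illustrative rule: locus "C" only fits once the map holds three loci.
final_map, rounds = build_map_in_rounds(
    ["A", "B", "C", "D"],
    lambda current, locus: locus != "C" or len(current) >= 3,
)
```

In this example "C" is rejected in the first round and accepted in the second, so two maps are produced, which corresponds to two map nodes.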

A map node has several tabsheets. The first three are different representations of the map itself: (1) the Map chart, (2) the map as a list, and (3) the map in plain text, each serving, among other things, its own export purpose, e.g. for (1) MS-PowerPoint, (2) MS-Excel, (3) MapQTL and MapChart (JoinMap's companion software for QTL analysis and enhanced map charting, respectively). The Mean chisquare contribs. tabsheet shows for each locus the average contribution to the goodness-of-fit. The Genotype probabilities tabsheet will show (after clicking on the (Re-)calculate button) the genotypes with low probability conditional on the map and conditional on the genotypes of the neighbouring loci. The next two tabsheets present these probabilities averaged over individuals and over loci, respectively. The final tabsheet will show the Locus genotype frequencies similar to this tabsheet for the population node, but here the loci are ordered according to the map.


A pairwise data population

For the case in which the population type is not handled directly by JoinMap (see Table 1), or if you only have the recombination frequencies between pairs of loci with their LOD scores (e.g. from literature), you can organise the available pairwise recombination frequencies into a pairwise data file, which can be loaded into JoinMap and used for map calculations. Such a data set is referred to as a pairwise data population. When the data are loaded successfully it will be represented by a population node in the navigation tree (with its icon in different colours than that of a genotype data population node). Again, the Info tabsheet will show a summary of the data. The Pairs tabsheet presents all loaded pairwise data. The Loci tabsheet gives a list of the loci and allows for exclusion of loci from calculations and actions. The two LOD Groupings tabsheets are identical to those of a genotype data population. From the LOD Groupings tree you can create a grouping node with its group nodes in the same way as described above, i.e. by right-clicking on the nodes in the LOD Groupings tree and using the Create Groups for Mapping option from the Population menu.

The grouping node is identical to that of a genotype data population. The group nodes are somewhat different except for the Loci and Fixed orders tabsheets. The data of the group are now based on the pairwise data rather than original genotype data. Therefore, the first tabsheet is the Linkages tabsheet giving all the pairwise data for the loci in the group. It also allows exclusion of specific pairs from the further calculations. Because the pairwise data can come from multiple sources, you can do a heterogeneity test, for which the results will be presented in the Heterogeneity test (list) tabsheet and the significant results in detail in the Heterogeneity test (text) tabsheet.

From a group node a map can be calculated with the Calculate Map option from the Group menu. The map is calculated based on the selected set of loci and the selected set of pairs, and follows the same procedure as that for a genotype data population. Upon successful completion a mapping node will be produced in the navigation tree and for each resulting map a map node. The mapping node has the Session log of the calculations (see above). The map node has the three tabsheets representing the map itself and it has the Mean chisquare contribs. tabsheet (see above). Because the original genotypes are not available, there is no possibility to calculate the conditional genotype probabilities or the locus genotype frequencies, hence there are no tabsheets for this.


A map

You can load map files into a project, allowing you to compare an external map with a map calculated for a segregating population in the project. Loaded maps are represented as map nodes in the root of the navigation tree. Maps (i.e. maps loaded into the project as well as maps from a segregating population in the project) can be combined into a single map node, displaying multiple linkage groups side by side in the chart. To do this, you need to right-click on the map nodes and apply the Combine Maps function from the Join menu.

Map integration

When you have more than one segregating population of a species in which genotypes of some or all loci are determined in multiple populations, you can combine the data from the separate populations in order to calculate an integrated map. To do this, you must load each population into the same project. The navigation tree should have groupings and group nodes for each population. The groups that relate to the same linkage group with overlapping locus scores can be combined by right-clicking on the group nodes and applying the Combine Groups for Map Integration function from the Join menu. The pairwise recombination frequencies of the selected groups will be combined into a combined group node in the navigation tree. Such a combined group node is identical to a group node of a pairwise data population (see above), except that an Info tabsheet is added. Therefore, the description of the group node in the pairwise data population section above also applies to the tabsheets of, and actions with, the combined group node.

Final remarks

A genetic map is as good as the data that were used to construct it. With real data you will discover sooner or later that, depending on the quality of the raw data, maps produced by JoinMap may slightly, or even seriously, vary with the parameter settings and the selection of subsets of loci and individuals. No mapping program can ever produce the ultimate genetic map. Whenever data are being added to existing data, maps will slightly change, if not with respect to order, then most likely with respect to map distance. Essentially the calculation of a genetic linkage map is a statistical estimation procedure. As such the mapping algorithm of JoinMap reflects a balance between statistical rigour and computational speed, and thus it bears the advantages and disadvantages of a compromise. The new user interface of JoinMap is designed to allow the user a better exploration of his or her data in order to let him or her arrive at good quality maps.

How to cite JoinMap 3.0?

Van Ooijen, J.W. & R.E. Voorrips, 2001. JoinMap® 3.0, Software for the calculation of genetic linkage maps. Plant Research International, Wageningen, the Netherlands.

Acknowledgement

JoinMap 3.0 is based on its previous versions. Of course, we gratefully acknowledge the original work of Piet Stam on the core mapping algorithm (Stam, 1993). Several people have contributed to the development of the present version, especially concerning the user interface. We thank these people, of whom we would like to mention especially our colleagues Sjaak van Heusden, Chris Maliepaard and Eric van de Weg, all three of Plant Research International, and Rob Maijers of SERC, Utrecht.

Using JoinMap

The program can be started in the various ways of MS-Windows: by using the Start menu, by double-clicking on the JOINMAP.EXE file from within Windows Explorer or My Computer, or by double-clicking on a project file. The latter way is available only after the program has been run a first time. When the program runs you will see a window that is divided into several main parts: on the top the menu and the tool bar with buttons, on the left side the navigation panel, on the right side the contents-and-results panel, and on the bottom the status bar (Figure 1). Once data are loaded the navigation panel will contain a tree, like in Windows Explorer, in which each node will represent an item, such as a population, a linkage group or a map. The contents-and-results panel will contain a set of tabbed pages (tabsheets), in which contents and results of analyses will be displayed concerning the node selected in the navigation tree. When a node becomes selected its corresponding menu item is activated, e.g. for a population node the Population menu and for a group node the Group menu. The formats of data files used by JoinMap are described thoroughly in the Data Files chapter. Some example data files are present in the DemoData subdirectory of the program directory.

Controlling the program

Because JoinMap is an MS-Windows program, you can expect the many features to be controlled in the normal MS-Windows way with the mouse and the keyboard. Below is a summary of some normal and special keys and key combinations:

alt-key   (key being any underlined character shown in the program) as usual, go to the associated part of the window or perform the associated action
ctrl-N    go to the navigation tree
ctrl-R    go to the results panel
ctrl-T    go to the tabs of the results panel
Tab       rotate focus through all visual elements (usually: navigation tree, tabs, tabsheet)
Esc       close tabsheet-info window, or cancel calc./env. options windows, or cancel calculations of LOD groupings, linkages, or map
Break     cancel calculations of LOD groupings, linkages, or map
F1        show the help file
F2        edit the name of the selected node in the navigation tree
F9        (re-)calculate
alt-F4    exit program

In the navigation tree, map nodes or group nodes can be selected for combining by right-clicking on the nodes, after which they become red (or magenta for the current node). When two or more nodes are selected this way the Join menu becomes active, and its options Combine Maps or Combine Groups for Map Integration can be activated. A similar way of selection must be used on the nodes in the LOD Groupings (tree) tabsheet (see below), in order to create group nodes in the navigation tree, which are needed to calculate the maps. Both trees can also be controlled with the keyboard (after clicking in the tree window); the up and down arrows let you move up and down in the tree, the right and left arrows expand and collapse branches, and the space bar toggles the special selection of nodes for combining them.

The tabsheet on display can be exported to file, printed and copied to the clipboard using the corresponding File menu options or tool bar buttons. File export and copying to clipboard are useful for taking the data or charts to, for instance, MS-Excel®, MS-PowerPoint® and MapQTL®. In some instances there is some extra information available on a displayed tabsheet. In such cases the i-button in the tool bar is highlighted. Clicking this button or selecting the Info on Tab Sheet option from the File menu will show this information. The various lists shown can be sorted on the data in a certain column by clicking on the header of that column; clicking a second time on the header sorts in the opposite direction. The fonts that are used by the program can be modified using the Environment Options from the Options menu. The parameters of calculations can be set using the Calculation Options from the Options menu.

The JoinMap project

In JoinMap 3.0 your work is organised into a project. You create a new project or open an existing project using the File menu. The whole of a JoinMap project consists physically of (a) the project file with extension .jmp, and (b) the project data directory with the same name as the project file, but with the extension .jmd. The project data directory resides in the same directory as the project file; it will contain all (many) internal data files. When backing up a JoinMap project, always take the project file as well as the project data directory with all its files.
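As an illustration of the backup advice, the following Python sketch copies both parts of a project to a backup location. The function name backup_project and the example paths are hypothetical; the only facts taken from the text are that the project file has the extension .jmp and that the data directory carries the same name with the extension .jmd in the same directory.

```python
import shutil
from pathlib import Path


def backup_project(project_file: Path, backup_dir: Path) -> None:
    """Copy a project file (.jmp) together with its data directory (.jmd)."""
    # The data directory has the same name as the project file, extension .jmd.
    data_dir = project_file.with_suffix(".jmd")
    if not project_file.is_file():
        raise FileNotFoundError(f"project file not found: {project_file}")
    if not data_dir.is_dir():
        raise FileNotFoundError(f"project data directory not found: {data_dir}")

    backup_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(project_file, backup_dir / project_file.name)
    # dirs_exist_ok (Python 3.8+) allows refreshing an earlier backup.
    shutil.copytree(data_dir, backup_dir / data_dir.name, dirs_exist_ok=True)
```

A call such as backup_project(Path("demo.jmp"), Path("backup")) would then leave backup/demo.jmp and backup/demo.jmd side by side; the names are examples only.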

Once a project is opened, you load data into the project. This must be done with the Load Data function in the File menu (or with the corresponding tool bar button). With this function you can load three types of data files into the project, and you can load more than one data file. The most important one is the locus genotype file (also called loc-file), which contains the genotype codes for the loci of a single segregating population. These data may also be formatted according to the MAPMAKER raw data format. Such a data set is referred to as a genotype data population. For the case in which the population type is not handled directly by JoinMap, or if you only have the recombination frequencies between pairs of loci with their LOD scores (e.g. from literature; the data may be from multiple populations), you can organise the available pairwise recombination frequencies into a pairwise data file (also called pwd-file), which can be loaded into JoinMap and used for map calculations. Such a data set is referred to as a pairwise data population. When such population data sets are loaded successfully, they will be represented by a population node in the root of the navigation tree, with the icon of the pairwise data population in different colours than that of a genotype data population. The third type of data file that you can load into a project is a map file. This will allow you to compare an external map with a map calculated for a segregating population in the project. A map file can contain more than one linkage group. Loaded maps are represented as map nodes in the root of the navigation tree.

The population node

When a genotype data population is loaded successfully a population node will appear in the root of the navigation tree and the contents-and-results panel will contain several tabsheets (Figure 2). The Info tabsheet will display a summary of the data loaded into the project. The Data tabsheet will show a neatly formatted version of the loaded data file. The Loci and Individuals tabsheets allow exclusion of loci and/or individuals from calculations and actions. The Loci tabsheet shows the assigned numbers that will be used for the loci in all child nodes of the population node. The other tabsheets are initially empty; they will be filled with results of corresponding calculations. Clicking on the (Re-)calculate button on the tool bar, or pressing F9, will start the calculations, and after completion the tabsheet will be filled with the results.


The Locus genot. freq. tabsheet will display the genotype frequencies for each locus in order to study segregation distortion. The segregation is tested against the normal expectation ratios with a normal classification of genotypes using the chisquare test (Tables 7, 8). For some situations you can change the classification for which the test must be done; for instance, with dominance in an F2 you want to test against a 3:1 ratio rather than a 1:2:1 ratio. To do this you must first select the records in the list that you want to modify, and then apply the Set X2-test Classification for Selected Loci function from the Population menu and pick the appropriate choice from the dialog. (Tip: for easy selection you can sort the list on an appropriate column; for instance, sorting on the genotype c column in an F2 will pool the loci that have c scores.) The Individual genot. freq. tabsheet will show the genotype frequencies for each individual. It is normal that some individuals will resemble the one parent, some the other, while many will be intermediate, so there is no chisquare test here. But you may use it, for instance, to detect individuals that have many missing values. Based upon the chisquare values or the numbers of missing genotypes you can make a selection of records in these tabsheets, and by subsequently using the Population menu option Exclude Selected Items the corresponding loci or individuals will be checked as Excluded in the Loci or Individuals tabsheet, respectively, while the current tabsheet is recalculated (NB: the other tabsheets will not be recalculated automatically).
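The difference between testing against 1:2:1 and against 3:1 can be made concrete with a small Python sketch of the chisquare statistic. The counts are invented for illustration, the function name chi_square is hypothetical, and the sketch only computes the statistic; it does not reproduce JoinMap's classification dialog or its significance levels.

```python
from typing import Sequence


def chi_square(observed: Sequence[int], ratio: Sequence[float]) -> float:
    """Chisquare statistic of observed class counts against an expected ratio."""
    total = sum(observed)
    ratio_sum = sum(ratio)
    statistic = 0.0
    for count, part in zip(observed, ratio):
        expected = total * part / ratio_sum
        statistic += (count - expected) ** 2 / expected
    return statistic


# Invented counts for 100 individuals.
# Three genotype classes tested against the codominant F2 ratio 1:2:1:
codominant = chi_square([22, 55, 23], [1, 2, 1])   # about 1.02
# Two classes tested against the dominant F2 ratio, here ordered as 1:3:
dominant = chi_square([28, 72], [1, 3])            # about 0.48
```

The degrees of freedom are the number of classes minus one, so the first statistic is compared with a chisquare distribution with 2 df and the second with 1 df.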

The Similarity of loci and Similarity of individuals tabsheets will show the fraction of identical genotypes (the calculations include the missing genotypes) for fractions above 0.95 (default). The 0.95 threshold value can be modified with the Calculation Options in the Options menu. By using the Population menu function Exclude Identicals the second locus (column Locus2) or individual (column Individual2) in pairs with a similarity of exactly 1 will be checked as Excluded in the Loci or Individuals tabsheet, respectively. Doing this for loci will result in faster calculations, while you can be certain that identical loci will map at the identical position. For individuals this is not a normal action, though it is available. For individuals this tabsheet is intended to reveal identical individuals, which should be very rare with high density maps and thus indicate possible errors.
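A minimal Python sketch of the similarity calculation follows. It assumes that each locus is given as a string with one genotype code per individual and that a missing genotype (written here as "-") is compared like any other code; this is only one possible reading of "the calculations include the missing genotypes". The function names and the example data are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def similarity(first: str, second: str) -> float:
    """Fraction of individuals with identical genotype codes at two loci."""
    identical = sum(1 for a, b in zip(first, second) if a == b)
    return identical / len(first)


def similar_pairs(
    genotypes: Dict[str, str], threshold: float = 0.95
) -> List[Tuple[str, str, float]]:
    """All pairs (Locus1, Locus2, similarity) with similarity above the threshold."""
    pairs = []
    for (name1, codes1), (name2, codes2) in combinations(genotypes.items(), 2):
        fraction = similarity(codes1, codes2)
        if fraction > threshold:
            pairs.append((name1, name2, fraction))
    return pairs


def exclude_identicals(pairs: List[Tuple[str, str, float]]) -> Set[str]:
    """Second member of every pair with a similarity of exactly 1."""
    return {second for _first, second, fraction in pairs if fraction == 1.0}


# Invented data: 20 individuals, "-" marks a missing genotype.
loci = {
    "M1": "aabbhhaabbhhaabbhh-a",
    "M2": "aabbhhaabbhhaabbhh-a",   # identical to M1
    "M3": "aabbhhaabbhhaabbhhab",   # differs from M1 at two individuals
}
pairs = similar_pairs(loci)
excluded = exclude_identicals(pairs)
```

Here only the pair M1/M2 exceeds the threshold, and M2, being the second locus of an identical pair, is the one marked for exclusion.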

The LOD Groupings (text) and LOD Groupings (tree) tabsheets will show the grouping of loci using the genotypes of the currently selected (i.e. not excluded) set of loci and individuals. Both tabsheets are different views of the same analysis, but the text view is more suitable for printing, while the tree view is used for creating group nodes in the navigation tree necessary for calculating linkage maps. Each node in the tree represents a group of linked loci. The grouping is based upon the test for independence in a contingency table (translated into a LOD score) and will be done at several significance levels (thresholds) of LOD score as indicated by the LOD Groupings threshold parameters in the Calculation Options from the Options menu. Loci determined to be significantly associated at the current LOD threshold with at least one member of a group will be in the same group. The tree structure arises because at increasing LOD thresholds, groups of loci fall apart (branch) into unlinked subgroups. The tree view will show nodes representing linkage groups with names that consist of three fields: "LOD/nr(size)", in which LOD represents the LOD threshold under which the group was formed, nr represents the group number at that LOD threshold (the largest group gets the smallest number), and size is the number of loci in the group. When you select a certain node in the LOD Groupings tree (by clicking on it), the loci of that group are displayed in the list on the right-hand side of the tabsheet. Because the tree can become very large, the branches in the tree that do not branch any further below a certain node will automatically be shown collapsed at this node. Clicking on the boxed + symbol at the node expands the branch. Once you have decided which groups from the LOD Groupings tree you want to use for calculating the linkage map, you need to select their nodes by right-clicking. A node selected this way will become red (or magenta for the current node) (Figure 3). When you have selected all required groups, you subsequently use the Create Groups for Mapping option from the Population menu. If successful, this action will produce in the navigation tree a grouping node (as a child node of the population node) and for each group a group node (as child nodes of the grouping node) (Figure 4).
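The grouping rule, in which a locus joins a group as soon as it is linked to at least one member, can be sketched in Python as follows. The pairwise LOD scores are invented, and two details are assumptions rather than documented behaviour: that "significantly associated" means a LOD score strictly above the threshold, and the exact way the threshold is printed in the node name.

```python
from typing import Dict, List, Sequence, Tuple


def group_loci(
    loci: Sequence[str],
    lod: Dict[Tuple[str, str], float],
    threshold: float,
) -> List[List[str]]:
    """Group loci that are linked, directly or via other loci, above a threshold."""
    parent = {locus: locus for locus in loci}

    def find(locus: str) -> str:
        # Follow parent links to the representative of the group.
        while parent[locus] != locus:
            parent[locus] = parent[parent[locus]]
            locus = parent[locus]
        return locus

    for (first, second), score in lod.items():
        if score > threshold:   # assumption: strictly above the threshold
            parent[find(first)] = find(second)

    groups: Dict[str, List[str]] = {}
    for locus in loci:
        groups.setdefault(find(locus), []).append(locus)
    # The largest group gets the smallest number.
    return sorted(groups.values(), key=len, reverse=True)


def node_names(groups: List[List[str]], threshold: float) -> List[str]:
    """Names of the form LOD/nr(size); the number formatting is an assumption."""
    return [f"{threshold:g}/{nr}({len(group)})" for nr, group in enumerate(groups, 1)]


# Invented pairwise LOD scores.
scores = {("A", "B"): 6.0, ("B", "C"): 3.5, ("D", "E"): 8.0, ("C", "D"): 1.0}
names_at_3 = node_names(group_loci("ABCDE", scores, 3.0), 3.0)
names_at_5 = node_names(group_loci("ABCDE", scores, 5.0), 5.0)
```

At threshold 3 the loci A, B and C form one group and D and E another; at threshold 5 locus C falls out of the first group, which illustrates how the tree branches at increasing thresholds.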

LOD score

The LOD score calculated by JoinMap for the recombination frequency is basedon the G2 statistic for independence in a two-way contingency table:

G2 = 2 Σ o log(o/e)

with o the observed and e the expected number of individuals in a cell, log the natural logarithm, and Σ the sum over all cells. Under the null hypothesis the statistic has a chisquare distribution whose degrees of freedom (df) equal the number of rows minus one multiplied by the number of columns minus one. Unlike the LOD score normally employed in linkage analysis (i.e. the 10-log likelihood ratio comparing the estimated value of recombination frequency with 0.5), the test for independence is not affected by segregation distortion, thus leading to less incidence of spurious linkage. Because pairs can differ in numbers of cells in the contingency table the degrees of freedom will differ as well. Therefore the G2 statistic with more than one df is transformed into a G2 statistic with one df, using an approximation based on equality of P-values. Finally the value is multiplied by 0.217 (=0.5*log10(e)) to get to the normal LOD scale. When there is no segregation distortion in a backcross (and DH, DH1, HAP, HAP1) this LOD score is equal to the usual linkage analysis LOD score. This property is used in JoinMap to calculate from a recombination frequency and its LOD score the (virtual) numbers of recombinant and non-recombinant gametes.
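The relation between the G2 statistic and the usual LOD score can be checked numerically for the simplest case, a 2 x 2 table (one df) of a backcross without segregation distortion. The sketch below is an illustration only; the function names and the example counts are hypothetical, and the transformation from more than one df to one df is not shown.

```python
import math

def g2_statistic(table):
    """G2 = 2 * sum(o * ln(o / e)) for a two-way contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # an empty cell contributes nothing to the sum
                expected = row_totals[i] * col_totals[j] / n
                total += observed * math.log(observed / expected)
    return 2.0 * total

def lod_from_g2(g2_one_df):
    """Bring a one-df G2 value to the LOD scale (factor 0.5*log10(e), about 0.217)."""
    return g2_one_df * 0.5 * math.log10(math.e)

def classical_lod(recombinants, non_recombinants):
    """10-log likelihood ratio of the estimated recombination frequency versus 0.5."""
    n = recombinants + non_recombinants
    r = recombinants / n
    lod = 0.0
    if recombinants:
        lod += recombinants * math.log10(2.0 * r)
    if non_recombinants:
        lod += non_recombinants * math.log10(2.0 * (1.0 - r))
    return lod

# Hypothetical backcross: 90 non-recombinants, 10 recombinants, 1:1 margins.
table = [[45, 5],
         [5, 45]]
print(lod_from_g2(g2_statistic(table)), classical_lod(10, 90))
```

Both printed values agree, as stated above for populations without segregation distortion.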

The pairwise data population node

When a pairwise data population is loaded successfully it will be represented by a population node in the navigation tree, with its icon in different colours than that of a genotype data population, and it will have a different set of tabsheets: the Info, the Loci and the two LOD Groupings tabsheets are identical to those of a genotype data population, while the Pairs tabsheet presents all loaded pairwise data. The grouping is based directly on the LOD scores in the Pairs tabsheet.

The grouping node

The grouping node has a single tabsheet showing an overview of the division of loci over the groups. Group number 0 is used for all ungrouped loci, which are loci excluded on the Loci tabsheet and loci in groups not selected from the LOD Groupings tabsheet when creating the grouping. Thus, a grouping is fully consistent in such a way that any locus is present in one group only or is ungrouped. Loci can be moved from one group to another (also to and from group 0) by selecting them in the tabsheet and using the Move Selected Loci option from the Grouping menu. This action will adjust the Grouping tabsheet and the affected group nodes.

NB: The set of selected (not excluded) individuals at the time of creating the grouping is fixed for all actions on the grouping node and all its child nodes. If you want to change the set of individuals at a later stage, you must create a new grouping node.

The group node

The group node of a genotype data population has several tabsheets. The Data tabsheet will first show brief instructions on how to calculate the pairwise recombination frequencies. For the sake of brevity recombination frequencies are called linkages. After successful calculation of the linkages the Data tabsheet will show the original genotype data, but only for the loci in the group and for the individuals selected (not excluded) from the population at the time of creating the grouping (parent) node. When linkage phases are to be determined (for population types DH, HAP and CP), they will be given in the Data tabsheet. On the Loci tabsheet the loci in the group are shown and can be marked for exclusion. Once loci are excluded the linkages are automatically recalculated and all tabsheets, including the Data tabsheet, are adjusted accordingly; however, existing child nodes are not adjusted.

The Fixed orders tabsheet is the place where you can specify fixed orders for use in the map calculations of the group. Each fixed order should start with an "@" at the beginning of a line and can be followed by an unlimited series of locus names, separated by spaces and newlines; the succession in the series defines the order. Any unknown locus name will be skipped, so you need not adjust any fixed order when excluding a locus on the Loci tabsheet. Fixed orders are only effective, of course, when they consist of three or more loci. The session log of the map calculations (see below) will give an overview of the fixed orders that were used, so that you can verify the use of the Fixed orders tabsheet. Often, fixed orders will be derived from other mapping projects; therefore, the session log gives the resulting map also in the fixed order format, so that this can be copied from the Session log tabsheet and pasted into the Fixed orders tabsheet (and possibly modified). The Fixed orders tabsheet can be cleared using the Clear Fixed Orders function in the Group menu.
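To make the fixed order format concrete, the rules above can be mimicked with a small parser. This is a sketch under assumptions, not the program's own reader: the function name and the marker names are hypothetical, and ignoring any text before the first "@" is an assumption.

```python
def parse_fixed_orders(text, known_loci):
    """Read fixed orders: "@" starts an order, names are separated by whitespace."""
    orders = []
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("@"):
            current = []              # a new fixed order begins on this line
            orders.append(current)
            stripped = stripped[1:]
        if current is None:
            continue                  # assumption: text before the first "@" is ignored
        # Unknown (e.g. excluded) locus names are skipped.
        current.extend(name for name in stripped.split() if name in known_loci)
    # Only orders of three or more loci are effective.
    return [order for order in orders if len(order) >= 3]

# Hypothetical example: "mX" is unknown, and the second order is too short.
text = "@ m1 m2\nm3 mX\n@ m4 m5\n"
known = {"m1", "m2", "m3", "m4", "m5"}
print(parse_fixed_orders(text, known))
```

Here only the first order remains, with the unknown name dropped.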

The remaining tabsheets contain information on the linkages. The linkages are estimated with maximum likelihood, which sometimes comes down to using explicit formulas (population types BC1, DH, DH1, DH2, HAP, HAP1), sometimes to using iterative EM (F2, CP), and sometimes to using Brent's numerical method (RIx) (cf. Maliepaard et al, 1997; Press et al, 1988). Linkages are calculated on all pairs of loci; since there will be very many pairs (the number of loci over two, i.e. n(n-1)/2 pairs for n loci), it is usually not very interesting to have all linkages available. Therefore, two separate tabsheets are provided, one for the Weak and another for the Strong linkages. The thresholds for what weak and what strong should be are set with the Calculation Options in the Options menu. Linkages can be estimated as larger than 0.5; such values cannot be turned into map distances and are substituted with the value 0.499. The cause of such estimates often is random sampling; however, larger values (especially combined with larger LOD scores) indicate possible errors in the coding scheme of one of the loci in the pair, e.g. the a's were used instead of b's and vice versa. Therefore, the Suspect linkages tabsheet will show pairs that have a recombination frequency larger than 0.6 (or whatever is set for this threshold in the Calculation Options). Finally, the Maximum linkages tabsheet will show for each locus its two (or the number set in the Calculation Options) most closely linked loci, based on recombination frequency.

From a group node a map can be calculated with the Calculate Map function from the Group menu, or by pressing the corresponding tool bar button. The map is calculated based on the selected (not excluded) sets of loci and individuals. Upon successful completion a mapping node will be produced in the navigation tree (as a child node of the group node) and for each resulting map a map node (as child nodes of the mapping node) (Figure 5).

The pairwise data population group node

The group node of a pairwise data population is somewhat different except for the Loci and Fixed orders tabsheets. The data of the group are now based on the pairwise data rather than original genotype data. Therefore, the first tabsheet is the Linkages tabsheet giving all the pairwise data for the loci in the group. It also allows exclusion of specific pairs from the further calculations. In case the pairwise data come from multiple populations, you can do a test for heterogeneity of recombination rates between populations. The results will be presented in the Heterogeneity test (list) tabsheet and the significant results in detail in the Heterogeneity test (text) tabsheet (the significance threshold for this can be set in the Calculation Options). The map calculation is started in the same way as for the genotype data population group node. The map is calculated based on the selected set of loci and the selected set of pairs, and follows the same procedure as that for a genotype data population.

The heterogeneity test is done in the following way. For each pair of loci the (virtual) numbers of recombinant and non-recombinant gametes can be calculated from its recombination frequency and LOD score. For pairs for which recombination rates were estimated in multiple populations, the total number of recombinant and non-recombinant gametes over all populations can be calculated by totalling the numbers of the individual populations; from this you can obtain the mean recombination frequency. The heterogeneity is tested by comparing the (observed) numbers of recombinants and non-recombinants in the individual populations with the expected numbers based on the mean recombination frequency using a standard G2 statistic (which has a chisquare distribution under the null hypothesis, with the number of populations minus one as degrees of freedom).
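The steps of the heterogeneity test can be sketched as follows. This is an illustration under assumptions, not the program's code: the function names and the example estimates are hypothetical, and the virtual gamete numbers are derived by assuming the usual backcross relation LOD = N [r log10(2r) + (1-r) log10(2(1-r))].

```python
import math

def virtual_counts(r, lod):
    """Virtual (recombinant, non-recombinant) gamete numbers implied by r and its LOD."""
    per_gamete = 0.0
    if r > 0.0:
        per_gamete += r * math.log10(2.0 * r)
    if r < 1.0:
        per_gamete += (1.0 - r) * math.log10(2.0 * (1.0 - r))
    if per_gamete <= 0.0:
        raise ValueError("r = 0.5 carries no linkage information")
    n = lod / per_gamete
    return r * n, (1.0 - r) * n

def heterogeneity_g2(estimates):
    """G2 test of equal recombination frequency over populations.

    estimates: list of (r, lod), one entry per population.
    Returns (G2, degrees of freedom, mean recombination frequency).
    """
    counts = [virtual_counts(r, lod) for r, lod in estimates]
    total_rec = sum(rec for rec, _ in counts)
    total = sum(rec + non for rec, non in counts)
    mean_r = total_rec / total
    g2 = 0.0
    for rec, non in counts:
        n = rec + non
        for observed, expected in ((rec, mean_r * n), (non, (1.0 - mean_r) * n)):
            if observed > 0.0:
                g2 += observed * math.log(observed / expected)
    return 2.0 * g2, len(counts) - 1, mean_r

# Hypothetical pair estimated in two populations (each about 100 virtual gametes).
estimates = [(0.10, 15.985), (0.20, 8.371)]
g2, df, mean_r = heterogeneity_g2(estimates)
print(g2, df, mean_r)
```

The G2 value is then compared with a chisquare distribution with df degrees of freedom.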


Map integration

When you have more than one segregating population of a species in which genotypes of some or all loci are determined in multiple populations, you can combine the data from the separate populations in order to calculate an integrated map. To do this you must load each population into the same project. First you should calculate and study the individual maps for each population, of course, but we will skip that here. The navigation tree should have groupings and group nodes for each population. The groups that relate to the same linkage group with overlapping locus scores can be combined by right-clicking on the group nodes and applying the Combine Groups for Map Integration function from the Join menu. The pairwise recombination frequencies and LOD scores of the selected sets of loci (and selected sets of pairs in the case of pairwise data populations) will be combined into a combined group node in the navigation tree. Such a combined group node is identical to a group node of a pairwise data population (see above), except that an Info tabsheet is added. Therefore, we can refer to the pairwise data population group node section above for a description of tabsheets of and actions with the combined group node.

The map calculations are based on mean recombination frequencies and combined LOD scores. For each pair of loci the (virtual) numbers of recombinant and non-recombinant gametes in the individual populations are calculated from the estimated recombination frequencies and corresponding LOD scores. The total numbers of recombinant and non-recombinant gametes over all populations can be calculated by totalling the numbers of the individual populations. From this you can obtain the mean recombination frequency and the combined LOD score.
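As a worked sketch of this pooling (one reading of the text above, not a formula quoted from the program), assume that the LOD score of population i relates to its virtual number of gametes as the usual backcross LOD score does. The symbols N_i, R_i, \bar{r} and LOD_c are introduced here for illustration only.

```latex
\begin{aligned}
N_i &= \frac{\mathrm{LOD}_i}{r_i \log_{10}(2 r_i) + (1 - r_i)\log_{10}\bigl(2(1 - r_i)\bigr)},
\qquad R_i = r_i N_i, \\
\bar{r} &= \frac{\sum_i R_i}{\sum_i N_i}, \\
\mathrm{LOD}_c &= \Bigl(\sum_i N_i\Bigr)
\Bigl[\bar{r}\log_{10}(2\bar{r}) + (1 - \bar{r})\log_{10}\bigl(2(1 - \bar{r})\bigr)\Bigr].
\end{aligned}
```

For example, two populations with 10 recombinants out of 100 and 20 recombinants out of 100 virtual gametes give a mean recombination frequency of 30/200 = 0.15 and, under these assumptions, a combined LOD score of about 23.5.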

The mapping node and the mapping procedure

The mapping node has a single tabsheet containing the Session log of the map calculations, allowing you to study the details of the procedure. The mapping procedure is basically a process of building a map by adding loci one by one, starting from the most informative pair of loci. For each added locus the best position is searched by comparing the goodness-of-fit of the resulting map for each tested position. When at the best position the goodness-of-fit decreases too sharply (the normalised difference in the goodness-of-fit measure is called a jump, see below), or when the locus gives rise to negative distance estimates in the map, the locus is removed again. This is continued until all loci are handled once. This is the end of the so-called first round. Subsequently, a second attempt is made to add the loci previously removed to the map. This can be successful since the map will contain more loci than at the first attempt, so that more pairwise data are used. But it may also be unsuccessful again through too large a jump or negative distances, so that a locus will be removed once more. This is the end of the second round. After that, all loci previously removed are added to the map, however without the constraints of maximum allowed reduction in goodness-of-fit and no negative distances. This results in a final or third round. Of course, when all loci are fitted there will not be a next round. The results at the end of each round are represented by a map node.

In the procedure each map is calculated using the pairwise data of loci present in the map, but only those that have a recombination frequency smaller than the REC threshold (0.4 default) and a LOD value larger than the LOD threshold (1.0 default). Setting these thresholds to more stringent values (lower REC, higher LOD) results in ignoring more data from the map calculations. After adding a locus to the map, more information than previously available is used for the estimation of map distances for which this locus provides information. Thus, adding a locus may influence the optimal map order, and to prevent becoming trapped in a local optimum of the goodness-of-fit an action called ripple is performed each time after adding one (default) locus. In a ripple all permutations within a moving window of three adjacent markers are considered; for each order the map and the corresponding goodness-of-fit are calculated and the best order is chosen to go ahead with. The window moves from one end of the map to the other.
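The ripple can be sketched as a small routine that slides a window of three loci over the order and keeps the best permutation at each step. This is an illustration only: the function names are hypothetical, and the toy fit function (total map length from known positions) is merely a stand-in for the chisquare goodness-of-fit that the program calculates for each order.

```python
from itertools import permutations

def ripple(order, fit):
    """One ripple pass; fit(order) returns a measure where smaller means better."""
    best = list(order)
    best_fit = fit(best)
    for start in range(len(best) - 2):
        window_best, window_fit = best, best_fit
        # Try all 3! = 6 orders of the three loci in the current window.
        for perm in permutations(best[start:start + 3]):
            candidate = best[:start] + list(perm) + best[start + 3:]
            score = fit(candidate)
            if score < window_fit:
                window_best, window_fit = candidate, score
        best, best_fit = window_best, window_fit  # go ahead with the best order
    return best, best_fit

# Hypothetical stand-in for the goodness-of-fit: total length of the map.
positions = {"m1": 0, "m2": 10, "m3": 20, "m4": 30}

def total_length(order):
    return sum(abs(positions[a] - positions[b]) for a, b in zip(order, order[1:]))

print(ripple(["m1", "m3", "m2", "m4"], total_length))
```

In this toy example the swapped pair m3, m2 is put back in place by the first window.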

The method of calculating the map is a weighted least squares procedure as described by Stam (1993), with one modification: the squares of the LODs are used as weights, thereby putting relatively more weight on more informative data. For each pair of loci used to calculate the map you have the direct recombination frequency estimate (i.e. the pairwise data based on the original genotype data of the two loci involved) and the recombination frequency that you can derive from the map (with an inverse mapping function). The goodness-of-fit measure is a G2 likelihood ratio statistic that compares all direct recombination frequencies with the map derived recombination frequencies. The likelihood is based on the (virtual) numbers of recombinant and non-recombinant gametes which are calculated using the direct recombination frequencies and their LOD scores. The goodness-of-fit measure is expressed as a chisquare value, although it is only roughly distributed as chisquare; a poor goodness-of-fit corresponds with a large chisquare value. The associated number of degrees of freedom is the number of pairs (with a direct estimate) minus the number of map distances (which is the number of loci minus one). The normalised difference in goodness-of-fit chisquare before and after adding a locus is called the jump in goodness-of-fit. A large jump indicates a poor fit of the added marker. A threshold value for the jump is used to decide whether or not a locus should remain in the map during the first and second rounds in the process of building the map. Reasonable values for the jump threshold are in the range 3.0 to 5.0.
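For a given locus order, the weighted least squares step can be sketched as below. This is a simplified illustration and not the program's implementation: the function names and example data are hypothetical, Haldane's mapping function is chosen arbitrarily, and the sketch assumes that every interval between adjacent loci is covered by at least one pair (otherwise the equations cannot be solved).

```python
import math

def haldane_cm(r):
    """Haldane's mapping function: recombination frequency to centimorgans."""
    return -50.0 * math.log(1.0 - 2.0 * r)

def haldane_r(cm):
    """Inverse of Haldane's mapping function."""
    return 0.5 * (1.0 - math.exp(-cm / 50.0))

def solve(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x

def wls_map(order, pairs):
    """Distances (cM) between adjacent loci minimising sum of LOD^2 * (d_pair - d_map)^2.

    pairs: {(locus_a, locus_b): (r, lod)}.
    """
    index = {name: i for i, name in enumerate(order)}
    m = len(order) - 1                      # number of adjacent intervals
    ata = [[0.0] * m for _ in range(m)]     # normal equations, left-hand side
    atb = [0.0] * m                         # normal equations, right-hand side
    for (a, b), (r, lod) in pairs.items():
        lo, hi = sorted((index[a], index[b]))
        weight = lod ** 2                   # squares of the LODs as weights
        distance = haldane_cm(r)
        for i in range(lo, hi):             # intervals spanned by this pair
            atb[i] += weight * distance
            for j in range(lo, hi):
                ata[i][j] += weight
    return solve(ata, atb)

# Hypothetical, perfectly additive data: A-B 10 cM, B-C 20 cM, A-C 30 cM.
pairs = {("A", "B"): (haldane_r(10.0), 12.0),
         ("B", "C"): (haldane_r(20.0), 8.0),
         ("A", "C"): (haldane_r(30.0), 3.0)}
print(wls_map(["A", "B", "C"], pairs))
```

With consistent data as in this example the estimated distances reproduce the input; with real data the weights decide which pairs dominate the estimate.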

JoinMap allows the use of the two most generally used mapping functions, Haldane's and Kosambi's. The selected mapping function is used to translate recombination frequency into map distance prior to the weighted least squares map estimation; the inverse function is used in the goodness-of-fit calculation and in the calculation of genotype probabilities (see below).
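For reference, the two mapping functions and their inverses in their standard textbook form are given below as a short Python sketch (distances in centimorgans). The function names are chosen for this illustration and are not part of JoinMap.

```python
import math

def haldane_cm(r):
    """Haldane: d = -50 ln(1 - 2r) cM."""
    return -50.0 * math.log(1.0 - 2.0 * r)

def haldane_r(cm):
    """Inverse Haldane: r = 0.5 (1 - exp(-d/50))."""
    return 0.5 * (1.0 - math.exp(-cm / 50.0))

def kosambi_cm(r):
    """Kosambi: d = 25 ln((1 + 2r) / (1 - 2r)) cM."""
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))

def kosambi_r(cm):
    """Inverse Kosambi: r = 0.5 tanh(d/50)."""
    return 0.5 * math.tanh(cm / 50.0)

for cm in (20.0, 50.0):
    print(cm, round(haldane_r(cm), 3), round(kosambi_r(cm), 3))
```

Under Haldane's function, 20 cM and 50 cM correspond to recombination frequencies of about 0.16 and 0.32, respectively.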

The map node

A map node has several tabsheets. The first three are different representations of the map itself, (1) the map chart, (2) the map as a list or table, and (3) the map in plain text, each serving its own exporting goal, e.g. for (1) MS-PowerPoint, (2) MS-Excel, (3) MapQTL and MapChart. The map charts can be customised in many ways using the Map chart options dialog. The charts can extend over several pages (indicated on the Status bar), which can be browsed using the (Ctrl-) PgUp/PgDn keys or toolbar buttons. Further, the maps can be viewed under various magnifications with the + and - keys or zoom buttons. If the charts are magnified too far to fit in the window, they can be navigated using the (Ctrl-) Home, (Ctrl-) End and (Ctrl-) arrow keys. Further customisation of charts, and combining map charts with QTL data is possible with JoinMap's separate companion software MapChart.

The Mean chisquare contribs. tabsheet shows for each locus the average contribution to the goodness-of-fit.

The Genotype probabilities tabsheet will show (after clicking on the (Re-)calculate button) the genotypes with low probability (presented as minus the 10-base logarithm of the probability, -Log10(P)), for which a threshold can be set in the Calculation Options. These probabilities are calculated conditional on the map and conditional on the genotypes of the neighbouring loci. When the genotype of a flanking locus is unknown, the first locus with a known genotype beyond it on the map is used; when there is a known genotype available on one side only, the probability is calculated conditional on one neighbour only; when there is no known neighbour available on either side, or when the locus itself is unknown, the probability is not calculated. For partially unknown genotypes (e.g. dominance or some CP segregation types), all genotype possibilities are taken into account using if needed up to 5 loci further on the map. These probabilities may indicate possible (but not certain!) genotyping and data entry errors. The subsequent two tabsheets present these probabilities averaged over individuals and over loci, respectively.
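The idea of a genotype probability conditional on the flanking loci can be illustrated for the simplest situation: a population with only two genotypes per locus (such as a backcross or doubled haploid), fully known genotypes and independent recombination in adjacent segments. The sketch below rests on these assumptions; it is not the program's calculation, and the function names and numbers are hypothetical.

```python
import math

def transition(same, r):
    """Probability of going from one locus to the next with or without recombination."""
    return 1.0 - r if same else r

def genotype_probability(genotype, left, right, r_left, r_right):
    """P(genotype | flanking genotypes) for genotypes "a"/"b"; None means no neighbour."""
    if left is None and right is None:
        return None  # no known neighbour on either side: not calculated

    def joint(g):
        p = 1.0
        if left is not None:
            p *= transition(g == left, r_left)
        if right is not None:
            p *= transition(g == right, r_right)
        return p

    return joint(genotype) / (joint("a") + joint("b"))

# Hypothetical case: genotype b flanked by a and a, r = 0.05 on both sides,
# i.e. an apparent double recombination event.
p = genotype_probability("b", "a", "a", 0.05, 0.05)
print(p, -math.log10(p))
```

In this example -log10(P) exceeds 2, so such a genotype would be listed when the threshold is set at a probability of one out of hundred.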

The final tabsheet will show the Locus genotype frequencies similar to the tabsheet for the population node, but here the loci are ordered according to the map. It allows you to study segregation distortion, which if present should be more or less the same for loci in the same region on the map.

The pairwise data population map node

The map node of a pairwise data population has the three tabsheets representing the map itself and it has the Mean chisquare contribs. tabsheet. Because the original genotype data are not available, there is no possibility to calculate the conditional genotype probabilities or the locus genotype frequencies, so the associated tabsheets are not available.

External maps

You can load map files into a project, allowing you to compare an external map with a map calculated for a segregating population in the project. Map files can contain more than one linkage group. Loaded maps are represented as map nodes in the root of the navigation tree. For obvious reasons they will only have the first three tabsheets of maps from a genotype data or pairwise data population (see above). For the purpose of comparison maps (i.e. maps loaded into the project and maps from segregating populations in the project) can be combined into a new map node, displaying multiple linkage groups side by side in the chart. For this, you need to right-click on the map nodes in the navigation tree and apply the Combine Maps function from the Join menu. The order of selecting the map nodes determines the order of the linkage groups in the combined map.


Tutorial

In this tutorial we will take you through the most important steps of a mapping project using real life data from an Arabidopsis recombinant inbred line family and some simulated data.

The first thing to do after starting JoinMap is to create a new project:
• use the New Project function from the File menu
• you will get a dialog in which you are prompted for a name of the new project file
• go to the DemoData directory, which is a subdirectory of the program directory (typically C:\Program Files\JoinMap)
• enter tutorial in the File name field
• click on the Save button.

This will create your project file tutorial.jmp in the DemoData directory, and in addition the project directory tutorial.jmd, which will contain all internal files of JoinMap concerning this project. A new project is just a new workspace to store results. You will need to load data into the project before you can actually do anything useful. So now load the genotype data file jm20demo.loc into the project:
• use the Load Data function from the File menu
• in the dialog that follows, click on the jm20demo.loc file (which should be in the DemoData directory)
• click on the Open button.

Now you have the data from this file inside the project; the original file is not needed for the project anymore. Your JoinMap screen will now resemble Figure 2: notice a population node in the navigation tree and several tabsheets in the contents-and-results panel. The Info tabsheet will show a summary of the loaded data and the Data tabsheet has a nicely formatted copy of the original data. Have a look at the Individual genot.freq. tabsheet (by clicking on its tab). You will notice that it still is empty. Click on the (Re-)calculate button and the results of this analysis will be shown: for each individual the frequencies of the genotypes over loci are shown. Click on the header of the missing genotypes "-" column; this will sort the list based on the numbers in this column; click a second time and you will see that the list becomes sorted in the opposite direction. You will see that the top three individuals (7, 19, 51) have many missing genotypes. These will contribute very little information in the map calculations, in fact they might even cause problems. You decide you want to remove these individuals from the further analyses:
• select the three individuals in the list, e.g. while holding the control key click on the three records in the list; the records will become blue
• use the Exclude Selected Items function from the Population menu.

The Individual genot.freq. tabsheet is automatically recalculated, and in the Individuals tabsheet the three individuals will now be checked in the Exclude column. Verify this:
• in the Individual genot.freq. tabsheet, click on the "-" column and see if the individuals 7, 19 and 51 are not present anymore
• go to the Individuals tabsheet, click on the Exclude column header and see all excluded individuals together.

Go to the Locus genot.freq. tabsheet. Press the F9 function key to fill the list on the tabsheet. These results enable studying segregation distortion. However, segregation distortion is a normal phenomenon in wide crosses, so be careful in removing loci; it is better studied after calculating the map. You could, for instance, sort on the X2 column, then select some records above a certain X2 value and apply the Exclude Selected Items function from the Population menu. Another practical usage of this list is sorting on the "-" column and removing the locus gapB which has no genotypes for 38 individuals. Try to do this removal of gapB.
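A chisquare test for segregation distortion of a single locus can be sketched as below. This is the standard goodness-of-fit test against the expected segregation ratio; whether the X2 column is calculated in exactly this way is an assumption, as are the function name, the example counts and the exclusion of missing genotypes from the totals.

```python
def segregation_chi_square(counts, expected_ratio):
    """Chisquare statistic and df for observed genotype counts versus an expected ratio.

    counts: {genotype: number of individuals}, missing genotypes left out.
    expected_ratio: {genotype: relative share}, e.g. {"a": 1, "b": 1}.
    """
    n = sum(counts.get(g, 0) for g in expected_ratio)
    ratio_total = sum(expected_ratio.values())
    x2 = 0.0
    for genotype, share in expected_ratio.items():
        expected = n * share / ratio_total
        x2 += (counts.get(genotype, 0) - expected) ** 2 / expected
    return x2, len(expected_ratio) - 1

# Hypothetical locus in a recombinant inbred line family (expected a:b = 1:1).
print(segregation_chi_square({"a": 60, "b": 40}, {"a": 1, "b": 1}))
```

The example gives a chisquare value of 4.0 with one degree of freedom.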

Notice that the i-button in the tool bar is highlighted. Click on it and you will see a summary of the information that was used in the analysis for the currently shown tabsheet. Verify that 3 individuals were excluded here.

Go to the Similarity of loci tabsheet, click on the (Re-)calculate button and sort on the similarity column so that the largest values are on top. Notice that several pairs are perfectly identical, with similarity value 1.000. Identical loci will map at exactly the same position, however they add to the calculation efforts. Therefore, you could remove the identical loci from the further calculations. But before you do this, you should store the information on the identical loci and print (or export to file) the part of the list with these loci:
• select the records in the list
• use the Print function in the File menu
• make sure you pick the Print Selection radiobutton
• click on OK.


Now you are ready to remove the identical loci; this is simple:
• use the Exclude Identicals function from the Population menu.

It will remove the second locus in each pair with a similarity value 1.000. Verify the exclusion on the Loci tabsheet. The Similarity of individuals tabsheet has identical functionality. In dense map situations it is virtually impossible to obtain identical individuals, so this information allows you to discover possible cloned individuals that should be removed from the further analyses. Under low marker density many individuals can and will be identical.

Now you come to the LOD Groupings tabsheets; each is a different view of the same analysis. Determining the linkage groups is usually not a straightforward task. Ideally you would like to arrive at a number of linkage groups that is the same as the number of chromosome pairs of the species you are studying. In practice this is not easily accomplished because of spurious linkage: just by chance loci on different chromosomes appear to be linked. It used to be advised to take a LOD score of 3 as the threshold deciding whether or not loci were linked. Experience with modern data sets with many markers, especially those of species with large numbers of chromosomes, shows that even using a LOD of 6 may lead to false positive linkage. Therefore JoinMap allows you to study the grouping at increasing levels of LOD (default from LOD 2.0 to 10.0 with steps of 1.0), showing you how groups fall apart at higher LOD levels. It is advisable to start at a high LOD level with more groups than chromosome pairs, calculate the maps, and subsequently try groupings at lower LODs. If a group consists of loci from more than one chromosome this often leads to many suspect linkages and to a poor goodness-of-fit of the resulting map.

Press the F9 function key, and study the tree view. Click inside the tree panel, and experience navigating the tree with the arrow keys. When a node is highlighted its contents are shown in the list in the neighbouring panel. When you are ready, restore the original situation by pressing F9 again. Because the data set is of Arabidopsis you would like to end up with five linkage groups. Notice that there is only one node at LOD 2.0 (the node naming is described in the Using JoinMap chapter): at the LOD 2.0 threshold all loci are significantly linked. At LOD 3.0 there are 4 nodes. The lower three nodes are collapsed, which means that the loci in these nodes stay together even until LOD 10.0. The first node forks at LOD 4.0 into two branches, that each do not split further until LOD 9.0. From this we can be quite certain that the lower three nodes at LOD 3.0 ("3.0/2(30)", "3.0/3(30)", "3.0/4(24)") and the upper two nodes at LOD 4.0 ("4.0/1(44)", "4.0/2(37)") will represent the five chromosome pairs of Arabidopsis. Select these nodes to prepare them for map calculations:


• go to each of these nodes and press the space bar: the node becomes magenta, and red when you leave it
• apply the Create Groups for Mapping function in the Population menu.

Your JoinMap screen will now look like Figure 4. You will see that the navigation tree gets a grouping node and five group nodes as child and grandchild nodes of the population node. Select the grouping node. Notice that the tabsheet of the grouping node contains a list of loci indicating the group number and grouping node name in the LOD Grouping. At the bottom of the list are the loci that were removed prior to the creation of the grouping; they are given group number 0.

Select group node number 5. Most of the tabsheets are empty. The Data tabsheet contains some instructions. Press F9. Inspect the tabsheets now. If you want you can modify the thresholds that determine what is shown in the lists. For instance:
• set the weak linkages REC threshold to 0.0 (in the Calculation Options dialog)
• press F9
• go to the Weak linkages tabsheet
• click on the i-button, and verify that now all pairs are shown.

The Suspect linkages tabsheet is empty, so there is no reason to doubt the genotype coding in the original loc-file for this group. You are now ready to calculate the map:
• click on the Calculate Map button.

After the map is calculated, the group node in the navigation tree gets a mapping node and three map nodes as child and grandchild nodes. Inspect the Session log. Notice that mostly the loci are placed on the map close to the locus they have the largest LOD score with as a pair. Also notice that, in two out of three cases, the loci that are removed in the first and second rounds have their largest LOD with loci other than those where they appear to fit best on the map; apparently there are somewhat contradictory pairwise data involved. This is usually not easy to discern in the pairwise data, but in this case try to see the contradiction in recombination frequencies between loci 2 (er), 35 (g6842) and 112 (w238) (using the linkages tabsheets of group 5): 2 and 35 have a recombination frequency of 0.0254, whereas they have nearly equal recombination frequencies with 112 (0.0887 and 0.0842, respectively). Just from these data you will not be able to tell if (and then which) a single locus is the cause of this; maybe each of them even contains erroneous genotypes.

Look at the first map node, the results after the first round. The Loci tabsheet contains a few loci having group number 0: these are the loci that were removed during the first round; they do not appear in the Map chart nor in the Map text. Go to the Locus genot.freq. tabsheet and calculate the frequencies. Notice the three unmapped loci in the top of the list. Have a good look at the pattern of the realised segregation ratios while moving from one locus to the next over the map. Closely linked loci can't differ much in their segregation ratio, due to linkage of course. Notice that locus 2 (er) is a bit out of the range of its neighbours; its nine missing genotypes should all be an a genotype to get in the right range, which doesn't appear to be very random. Go to the Mean chisquare contribs. tabsheet, and notice that locus 2 also has the largest contribution to the chisquare goodness-of-fit measure of the map. This is a second signal that this locus doesn't fit very well at this map position.

Go to the Genotype probabilities tabsheet and press F9. The list gets filled with genotypes that have a probability of less than one out of hundred (-log10(P)>2). The results point at double recombination events, i.e. recombination took place twice in neighbouring segments. In this case of a recombinant inbred line family this means genotypes of three loci (in one individual) either being aba or bab. What is striking, is that locus 2 (er) is involved many times, and that also holds for individual 43. Clearly, this means that some original genotype scores should be verified.

The second map node is in this case the same as the first map node. In order to compare the maps of the first and third round, you can create a combined chart:
• right-click on node "Map 1" (in the navigation tree)
• right-click on node "Map 3"
• check that the Join menu is available now
• use the Combine Maps function from the Join menu.

You will see that a new map node is created containing both maps side by side in the chart. For the comparison of the segment lengths it is more practical to have their lengths shown in the chart. This can be done with one of the many map chart options:
• use the Map Chart Options function from the Options menu
• select the Loci tabsheet
• select in the Positions area the radiobutton of the Intervals
• click on OK and view the resulting chart.

You could do a fast check to see what happens if locus 2 is removed from the mapping data:
• go to the group node
• exclude locus 2 on the Loci tabsheet
• click on the Calculate Map button.


A new mapping node and map node appear; as it happens the data set without locus 2 doesn't need more than the first round. But if you check the genotype probabilities you will see there are still several improbable genotypes with this result.

As a final exercise you will calculate an integrated map. Before you continue, if you happen to have modified the calculation options you should reset all options to the preset defaults. Additionally, set the mapping function to Haldane's, because this function was used for the simulation. Load the two loc-files of simulated data of a backcross and an F2, with just two linkage groups and 22 loci:

• use the Load Data function from the File menu
• select the file demoBC1.loc (in the DemoData directory) and click Open
• use the Load Data function from the File menu
• select the file demoF2.loc and click Open
• verify that several loci in the F2 are scored in a dominant fashion (with c's and d's)
• on the LOD groupings (tree) tabsheet, press F9 and select the two nodes at LOD 2.0 for mapping (by right-clicking, etcetera)
• calculate the map for group 2
• repeat the previous two steps for the backcross.

Notice that the dominantly scored markers are all in group 2 of the F2. Some of these loci have been scored with a's and c's, others with b's and d's (usually this means that the band of the one type is in repulsion phase with the band of the other type in the F1). Just to illustrate the effects of estimating recombination frequencies between these types of markers, verify with the Maximum linkages tabsheet that for marker014 the two most closely linked loci (with estimated recombination frequency 0.0) are marker016 and marker019; the simulated recombination frequencies were 0.16 (=20 cM) and 0.32 (=50 cM!). Also notice that the F2 needed two rounds, and that the resulting map is not in the simulated order (look in the original loc-file for this). It is marker014, added in the second round, that causes the order to change; prior to this the order was the correct simulated order.

The markers in group 2 of the F2 are the same as those of the backcross. To calculate an integrated map you need to combine the data:
• right-click on the group nodes (in the navigation tree) of group 2 of the backcross and the F2
• check that the Join menu is available now
• use the Combine Groups for Map Integration function from the Join menu
• a dialog appears in which you are prompted for a name of the combined group; enter "2 combined"
• click on the OK button.

A new group node is created in the navigation tree. Go to the Heterogeneity (text) tabsheet and press F9. The tabsheet lists the significant differences in recombination frequency estimates between the two populations. For instance, on top is the combination number 8, between marker012 and marker020. You can look up combination number 8 as the serial number (S/n) 8 in the Heterogeneity (list) tabsheet, and the pair numbers 8 and 63 in the Linkages tabsheet. Apparently there are some significant differences in the recombination between the populations according to these tests; however, the data were simulated without such differences. This illustrates the problems with dominance. Let's just continue and calculate the map of this combined node:
• click on the Calculate Map button.

Notice that the mapping session just needs a single round and that the order is the same as for the backcross. Now, let's see what happens when you impose the combined group map order on the F2:
• go to the Session log tabsheet of the mapping node of the combined group
• copy the map in fixed order format at the end of the session log (the region to be copied starts with the "@" and ends three rows down with the last marker name)
• go to the Fixed orders tabsheet of the group 2 node of the F2
• paste the fixed order into the white region of the tabsheet
• click on the Calculate Map button.

The map calculations again need two rounds. Verify the used fixed order in the session log and check that the final map is according to this order. The chisquare goodness-of-fit value using the fixed order is only slightly larger than without the fixed order. As a last exercise calculate the maps of these simulated populations using Kosambi's mapping function and check that the chisquare goodness-of-fit is a bit poorer, which confirms that the data were generated according to Haldane's mapping function.


Data files

General

JoinMap uses plain text files to load the data that must be analysed. A plain text file can be made with any text editor program. JoinMap uses several types of data files, each containing different kinds of information. Besides the actual data the files contain instructions that guide the program through the information.

First, there is the locus genotype file (also called loc-file), which contains the genotype codes for the loci of a single segregating population. For the case in which the population type is not handled directly by JoinMap, or if you only have the recombination frequencies between pairs of loci with their LOD scores (e.g. from literature), you can organise the pairwise recombination frequencies into a pairwise data file (or pwd-file), which can be loaded into JoinMap and used for map calculations. If you want to load a map with the positions of loci, possibly calculated in another JoinMap project, the map file is the file type to use. A loaded map can be displayed as a chart and can be combined with other maps in the project, for instance for the purpose of comparison. A map file may also be used in combination with the Prepare Data function in the File menu to sort a loc-file according to the map. The loc-file, pwd-file and the map file have the same formats as are used for JoinMap version 2.0 (Stam & Van Ooijen, 1995), with the exception of the genotype codes for population type CP; the present version reads and interprets version 2.0 CP type files correctly, though.

Finally, the Prepare Data function in the File menu can translate a non-JoinMap locus genotype file (with the genotypes coded in their own system) into a JoinMap coded locus genotype file. For this purpose it needs a translation file, which defines the translation from the private code to the JoinMap code. Additionally, this Prepare Data function can transpose a non-JoinMap loc-file into a JoinMap loc-file (the population data can be regarded as a matrix with rows and columns, hence the term transpose; in JoinMap the data for a marker are in the same row, those for an individual are in the same column). In addition to all this, JoinMap also loads locus genotype data files that are made up according to the MAPMAKER raw data format.

Data file characteristics

Here we give some important general features with respect to the data files for JoinMap. The various data files themselves will be described in detail in subsequent sections.

For the sake of readability the data files may contain extra so-called whitespace wherever found appropriate; this is not allowed, however, within the various instructions, indicators, locus and file names, etc. Whitespace is a sequence of one or more of the following characters: space, tab, newline (linefeed), carriage-return, vertical-tab and formfeed. The software is indifferent to the use of lower- or uppercase, both in the instructions and in the actual information. It is possible, and good practice as well, to put relevant comment in a data file. To make a comment line place a semicolon ";" at the beginning of the line; to put comment somewhere in a line, place whitespace followed by a semicolon. Anything on the line behind the semicolon will be ignored by JoinMap.
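The comment and whitespace rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described here, not JoinMap's own reader; the names strip_comment and tokens are hypothetical, and lowercasing every token is an assumption made to mimic the case-indifference.

```python
import re

# A comment starts at a semicolon that is at the beginning of the line
# or that is preceded by whitespace; the rest of the line is ignored.
_COMMENT = re.compile(r"(^|\s);.*$")

def strip_comment(line):
    """Return the line without its comment part and trailing whitespace."""
    return _COMMENT.sub("", line).rstrip()

def tokens(text):
    """Split a data file into whitespace-separated, lowercased tokens."""
    out = []
    for line in text.splitlines():
        out.extend(strip_comment(line).lower().split())
    return out

print(tokens("popt = F2 ; these data are from an F2\n; a comment line\nnloc = 2"))
```

Python's `\s` class and `str.split()` both treat space, tab, newline, carriage-return, vertical-tab and formfeed as whitespace, which matches the list given above.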

The layout of the various files is either line-structured or sequential. The choice for a particular layout has to do with readability (by eye) and the amount of data that belongs together. Good readability is a proper measure for the prevention of errors. But occasionally some data groups may be so large that they don't fit on a single line. Line-structured means that data belonging together have to reside on the same single line. For instance in the map file, the locus name and its map position must be on a single line. Sequential means that the data are read from left to right, from top to bottom, and there is no requirement to group data on a single line. For instance in the locus genotype file, the genotype codes belonging to a single locus determined in a large population may not fit on a single line, and often have to be continued over several lines. Of course, it is a good measure to obtain proper readability by suitable spacing.

Some data files contain, at the top of the file, instructions regarding the contents of the data file, e.g. the number of individuals and the number of loci. This part of the file is called the header. The program is indifferent to the order in which the various instructions in the header are given. The header always has a sequential structure.

Some data elements are of fixed length, while others are of variable length. For instance, locus names may be up to 20 characters long, but they may also be shorter. In order to read variable-length data fields they must be separated from other data fields by whitespace. On the other hand, fixed-length data fields need not be separated by whitespace, although it is allowed (and often to be recommended). For instance, the genotype codes of individuals from one population are all the same size, two characters for cross pollinators (CP) and one for other population types, and may be given without spacing (though this will result in poor readability).

The names of loci, linkage groups and populations may be up to 20 characters long. Names cannot include spaces. The (full path) names of files may be up to 255 characters long. Lines may be up to 1000 characters wide (this only applies to line-structured data).

Locus genotype file

The locus genotype file (loc-file) contains the information of the loci for a single segregating population. It has a sequential structure. The header of the file contains four instructions on the contents of the data body. The data body contains the actual genotype information for each locus and for all individuals. The four instructions define the name of the population (which is for administrative use only), the type of the population, the number of loci, and the number of individuals. These instructions can be given in any order within the header. The syntax of the four instructions is:

name = NAME

popt = POPT

nloc = NLOC

nind = NIND

where NLOC and NIND are the numbers of loci and individuals, respectively, NAME is the name of the population (which cannot contain spaces), and POPT is the code for the population type, which must be one of the codes given in Table 1.

Table 1. Population type codes

Type   Description
F2     an F2 population: the result of selfing the F1 of a cross between two fully homozygous diploid parents
BC1    a first generation backcross population: the result of crossing the F1 of a cross between two fully homozygous diploid parents to one of the parents
RIx    a population of recombinant inbred lines in the x-th generation: the result of inbreeding an F2 by single seed descent; RI2 is equivalent to an F2
DH     a doubled haploid population: the result of doubling the gametes of one heterozygous diploid individual, linkage phases originally (possibly) unknown
DH1    a doubled haploid population produced from the gametes of the F1 of a cross between two homozygous diploid parents
DH2    a doubled haploid population: the result of doubling the gametes of an F2 population, one doubled gamete from one F2 plant
HAP    a haploid population: the gametes (or derived individuals) of one heterozygous diploid individual, linkage phases originally (possibly) unknown
HAP1   a haploid population derived from the F1 of a cross between two fully homozygous diploid parents
CP     a population resulting from a cross between two heterogeneously heterozygous and homozygous diploid parents, linkage phases originally (possibly) unknown

What happens if NIND or NLOC are incorrect? If NIND is incorrect, then JoinMap will try to interpret part of a locus name as a genotype code, which in general will lead to an error message. If NLOC is larger than the actual number of loci in the file, then JoinMap will try to read beyond the end of the file, which will also lead to an error message. If NLOC is smaller than the actual number, then it will issue a warning that there are more data in the file. You might want to exploit this feature to park loci that you do not want to be used, though this is more easily accomplished by excluding loci in a population node.

The data body contains the information for all loci and individuals, grouped per locus. The data group for a locus consists of the name of the locus, followed by the genotype codes of all individuals. In between the locus name and the genotypes there can optionally be up to three additional instructions, depending on the type of population. JoinMap is indifferent to the order of these instructions. The instructions are concerned with the type of segregation of the locus (SEG) (for population type CP), the linkage phases of the locus (PHASE) (for population types CP, DH and HAP), and the type of classification for the locus (CLAS). In short, the syntax of a data group for a locus is (optional is indicated with [ ]):

<locus name> [SEG] [PHASE] [CLAS] <NIND genotypes>

Table 2. Genotype codes for population types F2, BC1 and RIx

Code   Description
a      homozygote as the one parent
b      homozygote as the other parent
h      heterozygote (as the F1)
c      not genotype a (dominant b-allele)
d      not genotype b (dominant a-allele)
-      genotype unknown
.      genotype unknown
u      genotype unknown

Table 3. Genotype codes for population types DH1, DH2 and HAP1

Code   Description
a      homozygote or haploid as the one parent
b      homozygote or haploid as the other parent
-      genotype unknown
.      genotype unknown
u      genotype unknown

Table 4. Genotype codes for population types DH and HAP

Code   Description
a      the one genotype
b      the other genotype
-      genotype unknown
.      genotype unknown
u      genotype unknown

It is important to note that it is absolutely essential that the order of the individuals is identical over all loci in the file. The genotype codes for population types F2, BC1 and RIx are given in Table 2. Those for population types DH1 and HAP1 are identical to these, albeit that the heterozygous and dominant genotypes are excluded (Table 3). The genotype codes for a DH or HAP population are identical to those for DH1 and HAP1, but have a slightly different meaning, since the parentage of the alleles is not relevant (Table 4).

For population types DH or HAP JoinMap automatically determines the linkage phases of the loci in the process of the estimation of the pairwise recombination frequencies. The genotype coding scheme is based on the loci being in coupling in the parent, i.e. the a's come from the same one grandparent, the b's from the other grandparent. However, to allow for linkage phase differences a linkage phase indicator is used, a phase type. Such a phase type must be one of the following single-digit codes between curly brackets:

{0} or {1}.

For a locus with a phase type 1 the grandparental origin is switched, i.e. the a's originate from the other grandparent, the b's from the one grandparent. If you happen to know the linkage phases from other information, you can enter the appropriate phase types for all or part of the loci in the loc-file. Locus pairs with the same phase code are assumed to be in coupling in the parent, and in repulsion otherwise; subsequently the appropriate recombination estimator will be used. When phase indicators are given, it is still possible to obtain estimates larger than 0.5; these will be changed into 0.499.
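As an illustration of how the phase types enter the estimation, here is a hedged Python sketch for a DH or HAP locus pair. Only the coupling/repulsion rule and the replacement of estimates above 0.5 by 0.499 come from the text; the estimator itself (the simple proportion of recombinant individuals) is an assumption made for this sketch, and the function name estimate_rec is hypothetical.

```python
UNKNOWN = {"-", ".", "u"}  # the three codes for an unknown genotype

def estimate_rec(geno1, geno2, phase1, phase2):
    """Estimate the recombination frequency between two DH/HAP loci.

    geno1, geno2: strings of one-character genotype codes (a, b, unknown),
                  with the individuals in identical order.
    phase1, phase2: phase types (0 or 1) of the two loci.
    """
    pairs = [(x, y) for x, y in zip(geno1.lower(), geno2.lower())
             if x not in UNKNOWN and y not in UNKNOWN]
    if not pairs:
        raise ValueError("no individuals with both genotypes known")
    differing = sum(1 for x, y in pairs if x != y)
    # Same phase code: coupling, so differing genotypes are recombinants.
    # Different phase codes: repulsion, so equal genotypes are recombinants.
    coupling = (phase1 == phase2)
    recombinants = differing if coupling else len(pairs) - differing
    r = recombinants / len(pairs)
    return 0.499 if r > 0.5 else r

print(estimate_rec("aabba-", "abbbau", 0, 0))
print(estimate_rec("aabba-", "abbbau", 0, 1))
```

In the second call the forced repulsion phase makes most individuals recombinant, so the estimate exceeds 0.5 and is replaced by 0.499.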

For population type CP the type of segregation may vary across the loci. Up to four different alleles may be segregating. Therefore, a code indicating the segregation type must be given in between the locus name and the genotypes. The segregation type codes are shown in Table 5. The two characters left of the "x" in these codes represent the alleles of the first parent, the two on the right represent those of the second parent; each distinct allele is represented with a different character. The genotypes for a CP population must be coded with two characters, representing the two alleles, per individual. The coding depends on the segregation type, and is shown in Table 6. JoinMap is indifferent to the order of the alleles, so: ac is equivalent to ca. In all cases the ".", the "-", and the u are treated as equivalent, so: h. and hu are both equivalent to h-. Although not required, it is recommended as a good measure against errors to separate the genotype codes of individuals with a space. The two-character codes themselves may not be separated with whitespace. The CP coding scheme is enhanced from JoinMap version 2.0; however, version 2.0 type files are interpreted correctly by the present version.

Table 5. Segregation type codes for population type CP

Code      Description
<abxcd>   locus heterozygous in both parents, four alleles
<efxeg>   locus heterozygous in both parents, three alleles
<hkxhk>   locus heterozygous in both parents, two alleles
<lmxll>   locus heterozygous in one parent
<nnxnp>   locus heterozygous in other parent

Table 6. Genotype codes for a CP population, depending on the segregation type of the locus

Seg. type   Possible genotypes
<abxcd>     ac, ad, bc, bd, -- (no dominance allowed)
<efxeg>     ee, ef, eg, fg, -- (no dominance allowed)
<hkxhk>     hh, hk, kk, h-, k-, --
<lmxll>     ll, lm, --
<nnxnp>     nn, np, --

Remarks:
1. each character a to p represents a distinct allele; "-" means unknown allele
2. h- and k- are dominant genotypes: h- means hh or hk, and k- means kk or hk
3. "." and u are treated as equivalent to "-"

Analogous to the population types DH and HAP, JoinMap automatically determines the linkage phases of the loci for both parents during the estimation of the recombination frequencies. The genotype coding scheme is based on the alleles on the same position within the segregation type codes being in coupling in the parent, i.e. the a, e, h and l alleles from the first parent come from the same one grandparent, the b, f, k and m alleles from the first parent from the other grandparent. However, to allow for linkage phase differences a linkage phase indicator is used similar to DH and HAP, but here we need a two-digit phase type, of which the first relates to the one parent and the second to the other. The phase type must be one of the following two-character codes between curly brackets:

for the seg. type <lmxll>: {0-} or {1-},
for the seg. type <nnxnp>: {-0} or {-1},
for the other seg. types: {00}, {01}, {10} or {11}.
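The CP coding rules (the genotype sets of Table 6, indifference to allele order, and the equivalence of ".", "-" and u) lend themselves to a small validity check. The Python sketch below is an illustration under those stated rules only; the names ALLOWED, normalise and is_valid are hypothetical and do not describe JoinMap's internal checks.

```python
# Possible genotypes per segregation type, taken from Table 6.
ALLOWED = {
    "<abxcd>": {"ac", "ad", "bc", "bd", "--"},
    "<efxeg>": {"ee", "ef", "eg", "fg", "--"},
    "<hkxhk>": {"hh", "hk", "kk", "h-", "k-", "--"},
    "<lmxll>": {"ll", "lm", "--"},
    "<nnxnp>": {"nn", "np", "--"},
}

def normalise(code):
    """Bring a two-character CP genotype code into a canonical form."""
    code = code.lower().replace(".", "-").replace("u", "-")
    if len(code) != 2:
        raise ValueError("a CP genotype code must have two characters")
    # Allele order is irrelevant: sort alleles, unknown allele last.
    return "".join(sorted(code, key=lambda ch: (ch == "-", ch)))

def is_valid(seg_type, code):
    """True if the genotype code is possible for the segregation type."""
    canonical = {normalise(g) for g in ALLOWED[seg_type.lower()]}
    return normalise(code) in canonical

print(normalise("ca"), normalise("hu"), is_valid("<abxcd>", "a-"))
```

Replacing u by "-" is safe here because the allele characters only run from a to p.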

Example 1. A locus genotype file for an F2 population

; 12 March 1995
; this is a ridiculously small data file
; but it serves only as an example
name = some_demo!
popt = F2      ; these data are from an F2 population
nloc = 2       ; the file contains data on two loci
nind = 6       ; and six plants
RFLP05         ; this is a locus name
aahba b        ; these are the genotypes of the six plants
RFLP67 (a,c)   ; classify this locus into a and c
accac a

Locus pairs with the same digit in the first position of their phase types are assumed to be in coupling in the first parent, and in repulsion in the first parent otherwise; for the second position the relation is likewise about the second parent. For instance, if a locus L is of type <hkxhk> {00} and another locus M is <abxcd> {01}, this means that in the first parent the h-allele of L and the a-allele of M are in coupling (and thus also their k- and b-alleles), and that in the second parent the h-allele of L is in repulsion with the c-allele of M (and thus in coupling with the d-allele of M). If you happen to know the linkage phases from other information, you can enter the appropriate phase types for all or part of the loci in the loc-file in order to force those linkage phases. The phase type must be given in between the locus name and the genotypes.
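The rule for comparing two CP phase types can be written down compactly. The Python sketch below only restates the rule given above; the function name phase_relation is hypothetical, and returning None for a parent in which one of the loci does not segregate (a "-" in the phase type) is an assumption of this sketch.

```python
def phase_relation(phase_l, phase_m):
    """Compare two CP phase types such as "{00}" and "{01}".

    Returns a pair (first parent, second parent), each being "coupling",
    "repulsion", or None when a locus has "-" for that parent.
    """
    a, b = phase_l.strip("{}"), phase_m.strip("{}")
    if len(a) != 2 or len(b) != 2:
        raise ValueError("a CP phase type has two positions")
    result = []
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            result.append(None)
        else:
            result.append("coupling" if x == y else "repulsion")
    return tuple(result)

# Locus L <hkxhk> {00} and locus M <abxcd> {01} from the text:
print(phase_relation("{00}", "{01}"))
```

For the example of loci L and M this gives coupling in the first parent and repulsion in the second, as described.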

For the Locus genotype frequencies tabsheet the program classifies the genotypes according to the usual genotype classes. However, you may wish to classify in another way, e.g. when there is dominance. Although this is easily done from within the program, a classification type can optionally be given in the loc-file in between the locus name and the genotypes to force a certain classification. The classification type codes are given in Table 7. The classification type must only be given when a classification other than the default is desired. In fact, this is only necessary when there is dominance, or in the case of population type RIx. JoinMap does not allow classification types other than the default and optional types for the population and/or segregation type. If there is only the default classification type, then a classification type need not and cannot be given. The defaults and the options are shown in Table 8.
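To make the effect of a classification type concrete, the following Python sketch counts the genotypes of an F2 locus per class and compares the counts with the expected ratio from Table 7. The class definitions and ratios come from Table 7; the chi-square statistic is a standard calculation added here as an assumption, not a description of what the tabsheet computes, and the names F2_CLASSES and classify_f2 are hypothetical.

```python
# For each F2 classification type: genotype code -> class, and class -> ratio.
F2_CLASSES = {
    "(a,h,b)": ({"a": "a", "h": "h", "b": "b"}, {"a": 1, "h": 2, "b": 1}),
    "(a,c)": ({"a": "a", "h": "c", "b": "c", "c": "c"}, {"a": 1, "c": 3}),
    "(b,d)": ({"b": "b", "a": "d", "h": "d", "d": "d"}, {"b": 1, "d": 3}),
}

def classify_f2(genotypes, clas="(a,h,b)"):
    """Count genotypes per class and compute a chi-square against the ratio.

    Codes that cannot be assigned to a class (unknowns, whitespace, or
    dominant codes that do not fit the classification) are left out.
    """
    mapping, ratio = F2_CLASSES[clas]
    counts = {c: 0 for c in ratio}
    for g in genotypes.lower():
        if g in mapping:
            counts[mapping[g]] += 1
    n = sum(counts.values())
    total = sum(ratio.values())
    chi2 = 0.0
    for c in ratio:
        expected = n * ratio[c] / total
        if expected > 0:
            chi2 += (counts[c] - expected) ** 2 / expected
    return counts, chi2

# Locus RFLP67 of Example 1, classified into a and c.
print(classify_f2("accac a", "(a,c)"))
```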

Example 2. A locus genotype file for a CP type population

; 12 March 1995
; this is another ridiculously small data file
; again, just an example
name = what_a_demo!
popt = CP                     ; it is a CP type of population
nloc = 3                      ; it contains data on three loci
nind = 7                      ; and seven plants
RFLP21 <efxeg> {01}           ; marker RFLP21 segregates with
                              ; three alleles
ef ee eg fg fg ef eg          ; genotypes of the seven plants
RAPD17 <hkxhk> (h-,kk) {00}   ; classify into h- and kk
h- h- kk h- kk kk h-          ; the seven genotypes in
                              ; identical order as for RFLP21
RFLP34 <nnxnp> {-1}           ; the linkage phase at this seg.
                              ; type defines it only for the
                              ; second parent
nn np np np --                ; the autoradiogram was unclear
nn np                         ; for plantnr 5

Examples 1 and 2 are demonstrations of a locus genotype file.
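The sequential structure of a loc-file can be read with a short token-based routine. The Python sketch below is a simplified illustration, not JoinMap's reader: the function name parse_loc is hypothetical, it assumes that "=" only occurs in the header instructions, and it recognises the optional SEG, PHASE and CLAS items merely by their opening bracket.

```python
import re

def parse_loc(text):
    """Parse a JoinMap-style loc-file into (header, loci).

    loci maps each locus name to (optional instructions, genotype codes).
    """
    tokens = []
    for line in text.splitlines():
        line = re.sub(r"(^|\s);.*$", "", line)           # drop comments
        tokens.extend(line.replace("=", " = ").split())  # allow name=X
    header, i = {}, 0
    while len(header) < 4:                               # four instructions
        key, eq, value = tokens[i:i + 3]
        key = key.lower()
        if eq != "=" or key not in ("name", "popt", "nloc", "nind"):
            raise ValueError("unexpected header item: " + key)
        header[key] = value
        i += 3
    nloc, nind = int(header["nloc"]), int(header["nind"])
    width = 2 if header["popt"].upper() == "CP" else 1   # code length
    loci = {}
    for _ in range(nloc):
        name = tokens[i]
        i += 1
        extras = []
        while tokens[i][0] in "<{(":                     # SEG, PHASE, CLAS
            extras.append(tokens[i])
            i += 1
        chars = ""
        while len(chars) < nind * width:                 # may span lines
            chars += tokens[i].lower()
            i += 1
        if len(chars) != nind * width:
            raise ValueError("wrong number of genotypes for " + name)
        codes = [chars[k:k + width] for k in range(0, len(chars), width)]
        loci[name] = (extras, codes)
    return header, loci
```

Applied to the text of Example 1, this sketch yields six one-character genotypes for each of the two loci, with "(a,c)" stored as the optional instruction of RFLP67. If NIND or NLOC is too large, the sketch fails with an IndexError when it runs past the last token, which loosely mirrors the read-beyond-end-of-file behaviour described earlier.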

Pairwise data file

The pairwise data file (pwd-file) contains recombination frequencies of pairs of loci together with the LOD score. JoinMap can load such a file, which it treats as a population. The data can be from various sources and need not come from a single segregating population. It can use the data to determine linkage groups, and it can calculate linkage maps for the derived groups. The layout is line-structured. The header contains just one instruction, giving the name of the data set (again for administrative use only). The syntax of the header is:

name = NAME

in which NAME is the name of the data set (cannot have spaces). There is no need to instruct JoinMap on the number of pairs in the next part of the file, as these are counted automatically. Following the header, the recombination frequency is given for pairs of loci, each pair on a separate line. First, the names of the two loci are given, and subsequently the recombination frequency and the LOD score. The syntax for a pair of loci is:

<1st locus name> <2nd locus name> <recombination> <lod>

Table 7. Classification type codes. Ratio is the expected segregation ratio

Code            Ratio     Classification into genotype classes
(a,b)           1:1       a and b
(a,h)           1:1       a and h
(a,c)           1:3 *     a and c; h and b will be included in class c
(h,b)           1:1       h and b
(b,d)           1:3 *     b and d; a and h will be included in class d
(a,h,b)         1:2:1 *   a, h, and b
(ac,ad,bc,bd)   1:1:1:1   ac, ad, bc, and bd
(ee,ef,eg,fg)   1:1:1:1   ee, ef, eg, and fg
(hh,k-)         1:3       hh and k-; hk and kk will be included in class k-
(h-,kk)         3:1       h- and kk; hh and hk will be included in class h-
(hh,hk,kk)      1:2:1     hh, hk, and kk
(ll,lm)         1:1       ll and lm
(nn,np)         1:1       nn and np

* for RIx the ratios are adjusted according to the generation number x

Table 8. Default and optional classification types

Pop. type   Seg. type   Default            Optional

A classification type is NOT ALLOWED in the data file:
BC1                     (a,h) or (h,b) *   none
DH                      (a,b)              none
DH1                     (a,b)              none
HAP                     (a,b)              none
HAP1                    (a,b)              none
CP          <abxcd>     (ac,ad,bc,bd)      none
            <efxeg>     (ee,ef,eg,fg)      none
            <lmxll>     (ll,lm)            none
            <nnxnp>     (nn,np)            none

Classification types are ALLOWED in the data file:
F2                      (a,h,b)            (a,c) or (b,d)
RIx                     (a,b)              (a,h,b), (a,c) or (b,d)
CP          <hkxhk>     (hh,hk,kk)         (h-,kk) or (hh,k-)

* automatically determined

Example 3. A pairwise data file

; data file created on 14 March 1995
name = example
; the data body is line-structured!
; <1st locus> <2nd locus> <rec> <lod>
loc1 loc2 0.31 2.8
loc1 loc3 0.24 4.6
loc2 loc3 0.15 8.1
loc1 loc2 0.29 2.7
loc1 loc3 0.27 4.1

A small pairwise data file is demonstrated in Example 3. If you happen to have standard errors of the recombination frequencies instead of LOD scores, you can use the next formula and a spreadsheet to transform the standard error to a LOD (r: recombination frequency, s: standard error):

LOD = [r*(1-r)/(s*s)] * [log10(2) + r*log10(r) + (1-r)*log10(1-r)].
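Instead of a spreadsheet, the same transformation can be scripted. The Python sketch below implements the formula exactly as given; the function name lod_from_se and the input checks are additions for this illustration.

```python
import math

def lod_from_se(r, s):
    """LOD score from a recombination frequency r and its standard error s."""
    if not (0.0 < r < 1.0) or s <= 0.0:
        raise ValueError("need 0 < r < 1 and s > 0")
    # First factor of the formula: r*(1-r)/(s*s).
    n_eff = r * (1.0 - r) / (s * s)
    # Second factor: log10(2) + r*log10(r) + (1-r)*log10(1-r).
    per_unit = (math.log10(2.0)
                + r * math.log10(r)
                + (1.0 - r) * math.log10(1.0 - r))
    return n_eff * per_unit

print(round(lod_from_se(0.25, 0.0433), 2))
```

The first factor can be read as an effective number of informative observations, since r*(1-r)/N is the binomial variance of an estimated proportion; at r = 0.5 the second factor is zero, so the LOD is zero whatever the standard error.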

Map file

The map file contains the map positions of all loci. The map file is strictly line-structured and there is no header. Linkage groups must be started with the instruction group or chrom on a separate line. On the subsequent lines the loci with their map positions must be given in ascending order, one locus with its position per line. It is not required to start at map position 0.0. A following linkage group must start again with the group-instruction. Next to the group-instruction JoinMap attempts to read a group name of up to twenty characters (no spaces), which, if available, will be used in the output. A small map file is demonstrated in Example 4.

Example 4. A map file

; the file is completely line-structured
group a
;<locus> <map position>
rapd02 0.0
rapd86 11.1
rapd08 15.2
rapd22 17.3
group b
rapd54 0.0
rapd66 15.2
rapd18 22.3
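Because the map file is line-structured, it can be read line by line. The Python sketch below follows the rules described above (group or chrom instruction, optional group name, ascending positions); the function name parse_map is hypothetical, and numbering unnamed groups is an assumption of this sketch.

```python
import re

def parse_map(text):
    """Parse a map file into {group name: [(locus, position), ...]}."""
    groups = {}
    current = None
    for raw in text.splitlines():
        line = re.sub(r"(^|\s);.*$", "", raw).strip()   # drop comments
        if not line:
            continue
        fields = line.split()
        if fields[0].lower() in ("group", "chrom"):
            # Assumption: an unnamed group gets its serial number as name.
            name = fields[1] if len(fields) > 1 else str(len(groups) + 1)
            current = groups.setdefault(name, [])
            continue
        if current is None:
            raise ValueError("locus found before the first group instruction")
        if len(fields) < 2:
            raise ValueError("locus and position must be on one line: " + line)
        locus, position = fields[0], float(fields[1])
        if current and position < current[-1][1]:
            raise ValueError("positions not in ascending order at " + locus)
        current.append((locus, position))
    return groups
```

Applied to Example 4 this returns two groups, a and b, with four and three loci respectively.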

Translation file

The Prepare Data function in the File menu can translate locus genotype files that are coded in a non-JoinMap code, to files in the JoinMap code as described in the Locus genotype file section. A particular translation must be defined in a translation file. Such a file is line-structured and does not contain a header. The translation can be case-sensitive for the non-JoinMap code (the JoinMap code is not case-sensitive); for this purpose the translation file must contain a line with the instruction case-sensitive. The non-JoinMap code can be at most nine characters long. Further, the non-JoinMap code can be of variable length. This is detected automatically by JoinMap. Note that in this case the genotype codes in the file that is to be translated must be separated by whitespace. The translation file must contain for each code translation a separate line with an instruction consisting of a single non-JoinMap code and the corresponding JoinMap code. The syntax for the translation-instruction is:

<non-JoinMap code> -> <JoinMap code>

It is possible to have more than one non-JoinMap code translate into a single JoinMap code. When a non-JoinMap code appears in several translation-instructions, then only the first one will be used. A translation file is demonstrated in Example 5.

Example 5. A translation file

; this is a case-sensitive translation
; the file is completely line-structured
case-sensitive
; <non-JoinMap code> -> <JoinMap code>
; from -> to
MM -> a
Mm -> h
mM -> h
mm -> b
m? -> c
?m -> c
M? -> d
?M -> d
* -> u
; the last "from"-field is shorter than the others,
; so the translation has variable-length input fields
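The translation rules can be sketched as a small lookup table. The Python code below is an illustration of the rules described in this section (case-sensitive instruction, first translation wins, variable-length detection); the names parse_translation and translate are hypothetical, and the sketch assumes whitespace-separated genotype codes as input.

```python
import re

def parse_translation(text):
    """Return (table, case_sensitive, variable_length) for a translation file."""
    case_sensitive = False
    pairs = []
    for raw in text.splitlines():
        line = re.sub(r"(^|\s);.*$", "", raw).strip()   # drop comments
        if not line:
            continue
        if line.lower() == "case-sensitive":
            case_sensitive = True
            continue
        source, arrow, target = line.partition("->")
        if not arrow:
            raise ValueError("not a translation-instruction: " + line)
        pairs.append((source.strip(), target.strip().lower()))
    table = {}
    for source, target in pairs:
        key = source if case_sensitive else source.lower()
        table.setdefault(key, target)                   # first one is used
    variable_length = len({len(key) for key in table}) > 1
    return table, case_sensitive, variable_length

def translate(codes, table, case_sensitive):
    """Translate a list of non-JoinMap codes into JoinMap codes."""
    out = []
    for code in codes:
        key = code if case_sensitive else code.lower()
        if key not in table:
            raise KeyError("no translation for code " + code)
        out.append(table[key])
    return out
```

With the translation file of Example 5, the codes Mm MM mm M? * translate into h a b d u.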

Non-JoinMap locus genotype file

A non-JoinMap locus genotype file (loc-file) is a file with the genotype codes of a single segregating population, in which the genotypes may be coded according to a private (non-JoinMap) code, and/or in which the genotype codes may be grouped for individuals instead of grouped for loci. The Prepare Data function in the File menu is the only place in JoinMap that can handle a non-JoinMap loc-file and can turn it into a JoinMap loc-file.

The header of the file consists of five to seven instructions, depending on the situation, among which at least the four instructions of a JoinMap loc-file. The file structure is completely sequential, but for explanatory reasons it is best, for now, to look at the data body part of the file as a matrix. In a JoinMap loc-file the rows of the matrix represent the loci and the columns represent the individuals. For a non-JoinMap loc-file it is allowed (not compulsory) to have it the other way around, i.e. the rows represent the individuals and the columns the loci. When the data are organised this way, the header must contain the instruction transpose. Additionally, when the data are to be transposed, it is possible to instruct JoinMap to skip one or more of the leftmost data (or text) columns with the instruction skip (see below). These columns are allowed to be up to 99 characters wide, and must be separated by whitespace. When the data are coded in a non-JoinMap code, the header must contain the instruction translate = <translation file name>. This instruction must include the name of the translation file. The file name may include a directory path. When there is no directory path in the filename, JoinMap will try to find it in the directory of the non-JoinMap loc-file. In summary, the syntax of the header of a non-JoinMap loc-file is ([ ] indicate optional):

[ transpose [ skip = SKIP ] ]

[ translate = <translation file name> ]

name = NAME

popt = POPT

nloc = NLOC

nind = NIND

in which NAME, POPT, NLOC and NIND are the usual as indicated in the Locus genotype file section, and SKIP is the number of data columns to skip. When the data are organised in the transposed way, the top of the data body must consist first of all of the titles of the data columns. These titles are the locus names, but must be preceded with the titles of the columns that are to be skipped as instructed in the header. Subsequently and optionally, the segregation, phase, and/or classification types, if applicable, can be given. Note that when a certain type is given, these types must be given for all loci (of course, no types for the columns to be skipped). If for some loci the exact types are unknown or, for the classification types, are the default types as indicated in Table 8, you may use empty types with just the angled, curly or round brackets. Examples 6 and 7 are illustrations of non-JoinMap locus genotype files.

Example 6. A non-JoinMap locus genotype file (1)

; this file only needs to be translated
translate = xlate.f2     ; xlate.f2 is shown in Example 5
name = escul_x_peruv
popt = F2
nind = 5
nloc = 2
RFLP12 Mm MM mm M? *     ; variable-length fields: the data
RFLP33 MM MM mm Mm mM    ; must be separated by whitespace

Example 7. A non-JoinMap locus genotype file (2)

; this file needs to be transposed
transpose skip=1         ; skip the column with the
                         ; plant numbers
name=tuber_x_spegaz popt=CP nind=5 nloc=3
; next to the header information come the column titles
plantnr RFLP56  RFLP73  RFLP95
; following the column titles come the types, optionally only,
; but if given, then for all loci!
        <hkxhk> <efxeg> <abxcd> ; segregation types
        (hh,k-) ()      ()      ; classification types
; the file has a sequential structure, but is presented as a
; matrix, with on each line all data of one plant
93_1    hh ee ad
93_5    hh eg ac
93_14   k- eg --
93_42   hh ef bd
93_106  k- fg ad
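The transposition of a data body that is organised per individual can be sketched as follows. This Python code is an illustration only: the function name transpose_body is hypothetical, the input is assumed to be the whitespace-separated tokens of the data body (comments and header already removed), and type rows are recognised by their opening bracket.

```python
def transpose_body(tokens, nloc, nind, skip=0):
    """Turn a transposed data body into {locus: (types, genotypes)}.

    tokens: column titles, optional rows of types, then nind rows of
            skip + nloc fields each, all as one sequential token list.
    """
    pos = skip                                   # ignore skipped titles
    names = tokens[pos:pos + nloc]
    pos += nloc
    types = {name: [] for name in names}
    # Optional rows with segregation, phase and classification types.
    while pos < len(tokens) and tokens[pos][0] in "<{(":
        row = tokens[pos:pos + nloc]
        pos += nloc
        for name, item in zip(names, row):
            if len(item) > 2:                    # drop empty types like ()
                types[name].append(item)
    genotypes = {name: [] for name in names}
    for _ in range(nind):
        row = tokens[pos + skip:pos + skip + nloc]
        if len(row) != nloc:
            raise ValueError("incomplete row of genotypes")
        pos += skip + nloc
        for name, code in zip(names, row):
            genotypes[name].append(code)
    return {name: (types[name], genotypes[name]) for name in names}
```

For the data body of Example 7 (with nloc=3, nind=5 and skip=1), locus RFLP56 ends up with the types <hkxhk> and (hh,k-) and the genotypes hh hh k- hh k-.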

Default file name extensions

For ease of use we have introduced default file name extensions for the various files. The default extensions are given in Table 9.

Table 9. Default file name extensions

File                  Extension
locus genotype file   .loc
map file              .map
pairwise data file    .pwd

Tables, examples and references

List of figures
Figure 1. User interface
Figure 2. A population node (jm20demo) with its various tabsheets
Figure 3. The creation of groups for mapping
Figure 4. A grouping node with five group nodes
Figure 5. The navigation tree after the map calculations for Group 2

List of tables
Table 1. Population type codes
Table 2. Genotype codes for population types F2, BC1 and RIx
Table 3. Genotype codes for population types DH1, DH2 and HAP1
Table 4. Genotype codes for population types DH and HAP
Table 5. Segregation type codes for population type CP
Table 6. Genotype codes for a CP population, depending on the segregation type of the locus
Table 7. Classification type codes. Ratio is the expected segregation ratio
Table 8. Default and optional classification types
Table 9. Default file name extensions

List of examples
Example 1. A locus genotype file for an F2 population
Example 2. A locus genotype file for a CP type population
Example 3. A pairwise data file
Example 4. A map file
Example 5. A translation file
Example 6. A non-JoinMap locus genotype file (1)
Example 7. A non-JoinMap locus genotype file (2)

References

Maliepaard, C., J. Jansen & J.W. Van Ooijen, 1997. Linkage analysis in a full-sib family of an outbreeding plant species: overview and consequences for applications. Genetical Research 70: 237-250.

MapChart. http://www.joinmap.nl.

MapQTL. http://www.mapqtl.nl.

Press, W.H., B.P. Flannery, S.A. Teukolsky & W.T. Vetterling, 1988. Numerical recipes in C. Cambridge University Press, Cambridge.

Stam, P., 1993. Construction of integrated genetic linkage maps by means of a new computer package: JoinMap. Plant Journal 3: 739-744.

Stam, P. & J.W. Van Ooijen, 1995. JoinMap (tm) version 2.0: Software for the calculation of genetic linkage maps. CPRO-DLO, Wageningen.

Index 49

Index

Calculate Map 5, 7, 18, 26, 27, 29
Calculation Options 3, 12, 14, 18, 21, 26, 28
case-sensitive 42, 43
classification type 38, 40, 44, 45
classification type codes 38, 40
Clear Fixed Orders 17
Combine Groups 8, 12, 19, 28
Combine Maps 8, 12, 22
combined group 8, 19, 29
combined map 22
comment line 32
contents-and-results panel 2, 3, 11, 13, 23
Create Groups for Mapping 4, 7, 15, 26
Data File Characteristics 32
data files 2, 11, 13, 31, 32
default classification types 38
default file name extensions 46
degrees of freedom 15, 18, 20
df 15
diploid 1, 34
Environment Options 12
Exclude Identicals 14, 25
Exclude Selected Items 14, 24
export 2, 6, 21
file
  license 1
  loc- 2, 13, 26, 28, 31, 33, 36, 38, 43
  locus genotype 2, 13, 31, 32, 33, 38, 39, 42, 46
  map 8, 13, 22, 31, 32, 41, 42, 46
  non-JoinMap locus genotype 31, 43, 44, 45
  pairwise data 7, 13, 31, 39, 41, 46
  pwd- 13, 31, 39
  translation 31, 42, 43, 44

file name extensions 46
file structure 43
first round 6, 19, 26, 28
fixed order 5, 17, 29
Fixed orders tabsheet 5, 7, 17, 18, 29
fixed-length 33
genotype codes 2, 13, 31, 32, 33, 34, 35, 36, 37, 42, 43
genotype data population 2, 7, 13, 16, 18
genotype probabilities 7, 21, 22, 28
Genotype probabilities tabsheet 6, 21, 27
goodness-of-fit 5, 6, 19, 20, 21, 25, 27, 29
group name 2, 42
group node 2, 3, 5, 7, 8, 11, 12, 14, 16, 18, 19, 26, 27, 28, 29
grouping node 5, 7, 15, 16, 26
group-instruction 41
Haldane 21, 28, 29
header 12, 24, 32, 33, 39, 41, 42, 43, 44, 45
heterogeneity 7, 18
Heterogeneity test tabsheet 7, 18, 29
Individuals tabsheet 3, 13, 14, 24


Info tabsheet 3, 7, 8, 12, 13, 16, 19, 23
installation 1
Integration 8, 12, 19, 28
JOINMAP.LIC 1, 2
jump 5, 6, 19, 20
jump threshold 21
key combinations 11
Kosambi 21, 29
layout 32, 39
length of names 2
license file 1
line-structured 32, 33, 39, 41, 42, 43
linkage phase 17, 34, 36, 37, 38, 39
linkages 5, 7, 12, 17, 18, 26, 28, 29
Load Data 2, 13, 23, 28
loc-file 2, 13, 26, 28, 31, 33, 36, 38, 43
Loci tabsheet 5, 7, 13, 16, 17, 25, 26, 27
Locus genot. freq. tabsheet 3, 14
locus genotype file 2, 13, 31, 32, 33, 38, 39, 42, 46
locus genotype frequencies 6, 7, 22, 38
LOD Groupings tabsheet 3, 4, 7, 12, 14, 16, 25
LOD Groupings tree 4, 7, 15
LOD Score 15
-Log10(P) 21
map chart 1, 6, 21, 27
Map chart tabsheet 6, 21, 26
map file 8, 13, 22, 31, 32, 41, 42, 46
Map Integration 8, 12, 19, 28
map node 5, 6, 7, 8, 12, 13, 18, 20, 21, 22, 26, 27, 28
mapping function 20, 21, 28, 29
mapping node 5, 7, 18, 19, 26, 28, 29
Mapping Procedure 19
Maximum linkages tabsheet 5, 17, 28
Mean chisquare contribs. tabsheet 6, 7, 21, 22, 27
Move Selected Loci 16
name-instruction 33, 39, 44
navigation panel 2, 11
navigation tree 2, 3, 5, 6, 7, 8, 11, 12, 13, 14, 16, 18, 19, 22, 23, 26, 27, 28, 29
negative distance 5, 6, 19
nind-instruction 33, 34, 44
nloc-instruction 33, 44
node
  group 2, 3, 5, 7, 8, 11, 12, 14, 16, 18, 19, 26, 27, 28, 29
  grouping 5, 7, 15, 16, 26
  map 5, 6, 7, 8, 12, 13, 18, 20, 21, 22, 26, 27, 28
  mapping 5, 7, 18, 19, 26, 28, 29
  population 2, 3, 4, 5, 6, 7, 11, 13, 15, 16, 22, 23, 26, 34
non-JoinMap code 42, 43, 44
non-JoinMap locus genotype file 31, 43, 44, 45
optional classification types 40
Pairs tabsheet 7, 16
pairwise data file 7, 13, 31, 39, 41, 46
pairwise data population 2, 7, 8, 13, 16, 18, 19, 22
phase type 34, 36, 37, 38
plain text 6, 21, 31
popt-instruction 33, 44
population node 2, 3, 4, 5, 6, 7, 11, 13, 15, 16, 22, 23, 26, 34
population type 7, 13, 17, 31, 33, 34, 35, 36, 37, 38
population type codes 34
Prepare Data 31, 42, 43
print 2, 3, 14
project 2, 3, 8, 11, 12, 13, 19, 22, 23, 31
pwd-file 13, 31, 39
ripple 20
second round 6, 20, 21, 26, 28
segregation distortion 3, 14, 15, 22, 24
segregation type 21, 34, 36, 37, 38, 45
segregation type codes 36, 37
sequential 32, 33, 43, 45
session log 17, 29
Session log tabsheet 5, 7, 17, 19, 26, 29


Similarity of individuals tabsheet 3, 14, 25
Similarity of loci tabsheet 3, 14, 24
skip-instruction 44
special keys 11
Strong linkages tabsheet 5, 17
suspect linkages 25
Suspect linkages tabsheet 5, 17, 26
tabsheet
  Fixed orders 5, 7, 17, 18, 29
  Genotype probabilities 6, 21, 27
  Heterogeneity test 7, 18, 29
  Individuals 3, 13, 14, 24
  Info 3, 7, 8, 12, 13, 16, 19, 23
  Loci 5, 7, 13, 16, 17, 25, 26, 27
  Locus genot. freq. 3, 14
  LOD Groupings 3, 4, 7, 12, 14, 16, 25
  Map chart 6, 21, 26
  Maximum linkages 5, 17, 28
  Mean chisquare contribs. 6, 7, 21, 22, 27
  Pairs 7, 16
  Session log 5, 7, 17, 19, 26, 29
  Similarity of individuals 3, 14, 25
  Similarity of loci 3, 14, 24
  Strong linkages 5, 17
  Suspect linkages 5, 17, 26
  Weak linkages 5, 17, 26
third round 6, 20, 27
translate-instruction 44
translation file 31, 42, 43, 44
translation-instruction 42
transpose 31, 44, 45
variable-length 33, 42, 43, 45
weak linkages 26
Weak linkages tabsheet 5, 17, 26
whitespace 32, 33, 36, 42, 44, 45
