vg_recovery_proc

Upload: yucesoyb

Post on 08-Apr-2018


  • 8/7/2019 vg_recovery_proc


    People showed me two alternatives to recreate /etc/lvmtab:

    1) vgscan (many people suggested this one, but the manpage has so many warnings and "this is a last resort command" notes that I'll try to avoid it. Has anyone really used it?)

    2) vgexport and then vgimport

    The problem comes from an incorrect count of PVs between lvmtab and the kernel (the kernel thinks it has a disk it doesn't have).

    The other alternative is to run vgreduce -f vgname. This command searches for and forcibly removes disks it can't find. In my case, the problem was that I had left one mirror copy of a logical volume pointing to that disk. To remove it:

    lvdisplay -k /dev/vgname/lvname (you need -k to show the PV number instead of "/dev/dsk/c_t_d_", because all you have there is "?????")

    lvreduce -k -m 0 /dev/vgname/lvname pvnumber

    Now that the physical volume was truly empty, I ran vgreduce -f and that did the trick.

    The "standard procedure" to replace a disk, according to most people, is:

    - remove the bad drive
    - put in the new one
    - pvcreate ...
    - vgcfgrestore -n /dev/vgname /dev/rdsk/c_t_d_
    - vgchange -a y /dev/vgname
    - vgsync /dev/vgname

    Regards,

    Martin

    -----Original Message-----

    Today, I tried to lvremove a logical volume and got an error, the same one that I paste here from a vgcfgbackup:

    sid02_en_06#vgcfgbackup vg03

    vgcfgbackup: /etc/lvmtab is out of date with the running kernel:Kernel indicates 4 disks for "/dev/vg03"; /etc/lvmtab has 3 disks.

    Cannot proceed with backup.


    I think it is a consequence of a disk change I made last week. The disk was destroyed (yes, it literally stopped functioning; it was not even seen in ioscan). I lvreduced the LVs, vgreduced the VG and took out the disk, but now I have this.

    What can I do to recover? And for next time, what procedure do you follow to safely remove a destroyed disk? (Obviously, I have everything mirrored.)

    lvmtab is out of date with running kernel

    PROBLEM

    When running any LVM command against a particular volume group, it fails with:

    vgcfgbackup: /etc/lvmtab is out of date with the running kernel:Kernel indicates # disks for "/dev/vg_name"; /etc/lvmtab has # disks.

    Cannot proceed with backup.

    RESOLUTION

    The above error indicates a serious problem with the volume group. No changes should be made to the volume group configuration until the volume group has been repaired.

    This error indicates that the Cur PV and Act PV values reported by vgdisplay(1m) disagree; they should always agree for a volume group. It also indicates that the /etc/lvmtab file, which records which physical volumes belong to a volume group, is out of date with the LVM data structures in memory and on disk. vgcfgbackup(1m) cannot complete successfully whenever the number of current physical volumes disagrees with the number of active physical volumes. Modifying the volume group while it is in this state could leave the vgcfgbackup(1m) backup inconsistent with the volume group itself, making repair and recovery more difficult.

    Each physical volume in a volume group carries a counter of the number of physical volumes currently in the volume group. This information is stored in the disk's volume group reserved area (VGRA). The above error indicates that the VGRA shows a different number of physical volumes than the system currently sees attached to this volume group. At volume group activation time, the system uses the /etc/lvmtab file to determine which physical volumes belong to each volume group.
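A quick way to check the condition described above is to pull the two counters out of vgdisplay(1m) output. The helper below is a minimal sketch and not an HP-supplied tool. The name pv_counts_agree is hypothetical, and it assumes vgdisplay prints each counter on its own line in the form "Cur PV   4" / "Act PV   3". Feed it the output for one volume group at a time.

```sh
# pv_counts_agree (hypothetical helper): read `vgdisplay /dev/vg_name`
# output on stdin, print both counters, and return 0 only if they match.
# Assumes lines of the form "Cur PV   <n>" and "Act PV   <n>".
pv_counts_agree() {
    awk '
        $1 == "Cur" && $2 == "PV" { cur = $3 }
        $1 == "Act" && $2 == "PV" { act = $3 }
        END {
            printf "Cur PV=%s Act PV=%s\n", cur, act
            # Non-zero exit if the counters differ or were not found.
            if (cur != "" && cur == act) exit 0
            exit 1
        }
    '
}
```

For example, vgdisplay /dev/vg03 | pv_counts_agree prints both values. It returns non-zero while the volume group still needs repair, so it can gate further LVM changes in a script.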

    This document explains what to look at and how to repair this situation. Use the following steps to isolate and repair the problem:

    1. Try to locate missing disk device(s).

    Isolating what happened to the volume group to get it into this state can be very difficult. Here are some suggestions:


    a. Use strings /etc/lvmtab or vgdisplay -v /dev/vg_name to see which disk devices are currently attached to the volume group.
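The strings /etc/lvmtab listing from step 1a is easier to compare if it is turned into one "VG PV" pair per line. This is a sketch under stated assumptions. The name lvmtab_pvs is hypothetical. Volume group entries are assumed to appear as /dev/vg_name lines, each followed by its physical volume paths (/dev/dsk/... or /dev/disk/...). Any other strings pulled from the binary file are ignored.

```sh
# lvmtab_pvs (hypothetical helper): read `strings /etc/lvmtab` on stdin
# and print "VG PV" pairs, one per line.
lvmtab_pvs() {
    awk '
        {
            n = split($0, part, "/")
            if (part[1] != "" || part[2] != "dev") next   # not a /dev path
            if (n == 3) { vg = $0; next }                 # e.g. /dev/vg03
            if (n == 4 && (part[3] == "dsk" || part[3] == "disk") && vg != "")
                print vg, $0                              # e.g. /dev/dsk/c2t7d2
        }
    '
}
```

For example: strings /etc/lvmtab | lvmtab_pvs | grep '^/dev/vg03 '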

    b. Check the date of the last good vgcfgbackup and the physical volumes it contains.

    Use: ll /etc/lvmconf/VG_NAME.conf to see the date of the last good backup.

    NOTE: If the volume group has been modified since the last good vgcfgbackup, the backup file may be out of date with the LVM data structures on the disk(s) attached to this volume group. If so, vgcfgrestore(1m) may no longer work for this volume group.

    Use: vgcfgrestore -n /dev/VG_NAME -l to see the list of physical volumes contained in the last good backup.

    Compare the list from the above vgcfgrestore(1m) command with the list from step 1a to see if there are differences. If a physical volume appears in the backup listing but not in the /etc/lvmtab file, you may be able to vgcfgrestore(1m) to that physical volume. Make sure the physical volume is unused before overwriting it with vgcfgrestore. See step 2 for details.
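The comparison in step 1b can be scripted roughly as follows. Treat it as a sketch. The name pvs_missing_from_lvmtab is hypothetical. The vgcfgrestore -l listing is assumed to name each physical volume by its raw path (/dev/rdsk/... or /dev/rdisk/...), which the helper maps to the block path used in /etc/lvmtab. A physical volume counts as present only if the lvmtab lists it under the same volume group.

```sh
# pvs_missing_from_lvmtab VG BACKUP_LISTING LVMTAB_STRINGS (hypothetical)
#   VG              e.g. /dev/vg03
#   BACKUP_LISTING  saved output of: vgcfgrestore -n VG -l
#   LVMTAB_STRINGS  saved output of: strings /etc/lvmtab
# Prints the PVs named in the backup but not listed under VG in lvmtab.
pvs_missing_from_lvmtab() {
    awk -v vg="$1" '
        # Map a raw device path (/dev/rdsk, /dev/rdisk) to its block path.
        function norm(p) { sub(/^\/dev\/r/, "/dev/", p); return p }
        FILENAME == ARGV[1] {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^\/dev\/r?(dsk|disk)\//) want[norm($i)] = 1
            next
        }
        {
            # Track which VG block of the lvmtab strings we are in.
            if (split($0, part, "/") == 3 && part[1] == "" && part[2] == "dev") { cur = $0; next }
            if (cur == vg && $1 ~ /^\/dev\/(dsk|disk)\//) have[$1] = 1
        }
        END { for (p in want) if (!(p in have)) print p }
    ' "$2" "$3" | sort
}
```

Usage, with illustrative file names:

1. Save vgcfgrestore -n /dev/vg03 -l to /tmp/vg03.list.
2. Save strings /etc/lvmtab to /tmp/lvmtab.strings.
3. Run pvs_missing_from_lvmtab /dev/vg03 /tmp/vg03.list /tmp/lvmtab.strings.

A device it prints is only a candidate for step 2. Confirm it is unused first.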

    c. Check the system for old copies of /etc/lvmtab file.

    A common reason for the system to be in this state is that the lvmtab was recreated while the system was unable to communicate with one or more of the physical volumes belonging to the volume group. A physical volume that cannot be queried when the lvmtab is recreated is not added to the lvmtab file, which leaves the lvmtab mismatched with kernel memory.

    To check for old copies of /etc/lvmtab, use: ll /etc/lvmtab* or ll /tmp/lvmt*

    Use strings(1) on the backup lvmtab file to try to determine which disk device(s) in the volume group differ from the current lvmtab file.
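Comparing two strings dumps by eye is error-prone. One option is to pair each physical volume with the volume group it is listed under and print only the pairs that differ. The sketch below assumes a plain strings listing of lvmtab: /dev/vg_name lines followed by /dev/dsk/... or /dev/disk/... lines. The name lvmtab_diff is hypothetical, and both arguments are files holding saved strings output.

```sh
# lvmtab_diff OLD_STRINGS CUR_STRINGS (hypothetical helper)
# Each argument is saved `strings` output of an lvmtab file.
# Prints "- VG PV" for pairs only in the old copy and
# "+ VG PV" for pairs only in the current file.
lvmtab_diff() {
    awk '
        FNR == 1 { vg = "" }    # reset the VG context for each file
        {
            n = split($0, part, "/")
            if (part[1] != "" || part[2] != "dev") next
            if (n == 3) { vg = $0; next }
            if (n == 4 && (part[3] == "dsk" || part[3] == "disk") && vg != "") {
                key = vg " " $0
                if (FILENAME == ARGV[1]) old[key] = 1; else cur[key] = 1
            }
        }
        END {
            for (k in old) if (!(k in cur)) print "- " k
            for (k in cur) if (!(k in old)) print "+ " k
        }
    ' "$1" "$2" | LC_ALL=C sort
}
```

For example, suppose ll /etc/lvmtab* finds an old copy, here called /etc/lvmtab.old (a hypothetical name). Save strings /etc/lvmtab.old and strings /etc/lvmtab to two files, then run lvmtab_diff on them. Lines starting with "-" are devices the old copy listed that the current file lacks.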

    If the missing physical volume cannot be determined, or cannot be re-added to the problem volume group, skip to Step 3.

    The physical volume might not be re-addable because it has been added to another volume group and cannot be removed from it, or because it is no longer physically connected to this system.

    2. If possible, restore the missing physical volume into the volume group.

    a. Verify that the missing physical volume and its alternate path(s), if any, are not in use.

    Use strings /etc/lvmtab to verify that the physical volume and any of its alternate paths do not belong to any other volume group and are not mounted or in use by applications on the system. A common error that can lead to this type of problem is adding an alternate path to a different volume group than the primary path. Consult your disk device's manuals to determine whether it can use pvlinks (alternate paths). The system will not allow an alternate link to be added to a different volume group unless pvcreate -f is executed. pvcreate(1m) is not needed to add an alternate path to a volume group and should be run only on the primary path.
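The ownership check in step 2a can be sketched as a filter over strings /etc/lvmtab output that prints every volume group listing a given device path. The name pv_owner_vg is hypothetical. The parsing assumes volume groups appear as /dev/vg_name lines, each followed by its physical volume paths.

```sh
# pv_owner_vg DEVICE (hypothetical helper)
# Reads `strings /etc/lvmtab` on stdin and prints each VG that lists
# DEVICE. Returns 1 if no VG lists it.
pv_owner_vg() {
    awk -v dev="$1" '
        {
            n = split($0, part, "/")
            if (n == 3 && part[1] == "" && part[2] == "dev") { vg = $0; next }
            if ($0 == dev) { print vg; found = 1 }
        }
        END { if (!found) exit 1 }
    '
}
```

Run it once for the primary path and once for each alternate path, for example strings /etc/lvmtab | pv_owner_vg /dev/dsk/c2t7d2.

- A path reported under a different volume group, or under more than one, must be sorted out before step 2b.
- A non-zero return only means no lvmtab entry lists the path. Mounts and application use still need checking separately.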

    b. If the missing physical volume(s) and alternate path(s) are not in use, use vgcfgrestore(1m) to restore the physical volume(s) to the volume group.

    Example: vgcfgrestore -n /dev/vg03 /dev/rdsk/c2t7d2

    c. If needed, restore or recreate /etc/lvmtab.

    If /etc/lvmtab does not contain the physical volume(s) that were restored with vgcfgrestore, this file must be updated. If the lvmtab shows the correct physical volumes, skip this step.

    If /etc/lvmtab does not show the correct physical volumes and you found an old /etc/lvmtab file in the previous step, save the current version of /etc/lvmtab and copy the old lvmtab backup file into place. Before changing the lvmtab file, use the strings(1) command to ensure that all volume groups show the correct physical volumes.

    If /etc/lvmtab is not correct and there is no correct old copy, use vgexport(1m) and vgimport(1m) to correct the lvmtab.

    NOTE: For the root volume group, typically vg00, you must first boot into LVM maintenance mode. See the maintenance mode boot steps under step 3e below.

    Overview:

    1. vgchange -a n /dev/vg_name
    2. vgexport -m /tmp/mapfile /dev/vg_name
    3. mkdir /dev/vg_name
    4. mknod /dev/vg_name/group c 64 0x0X0000

    NOTE: The minor number (0x0X0000) must be unique for each volume group. Substitute for X a number not in use on the system. Use: ll /dev/*/group to see the existing group files on the system.

    Example: mknod /dev/vg01/group c 64 0x010000

    5. vgimport -m /tmp/mapfile /dev/vg_name /dev/dsk/pv_name

    NOTE: The above command requires every physical volume in the volume group to be specified at the end of the command. This allows the lvmtab to be rebuilt correctly with all physical volumes belonging to the volume group.
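Picking the group file minor number for step 4 by hand is easy to get wrong. The sketch below reads ll /dev/*/group output, collects the minor numbers already in use, and prints the lowest free one in the 0xNN0000 form. The name next_free_group_minor is hypothetical. It assumes ll shows the minor number as a field shaped like 0x010000. It does not check kernel limits on the number of volume groups (such as the maxvgs tunable on many HP-UX releases), so confirm the value is allowed on your system.

```sh
# next_free_group_minor (hypothetical helper)
# Reads `ll /dev/*/group` on stdin; prints the lowest unused 0xNN0000.
next_free_group_minor() {
    awk '
        {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^0x[0-9a-fA-F][0-9a-fA-F]0000$/)
                    used[tolower(substr($i, 3, 2))] = 1   # the "NN" part
        }
        END {
            for (n = 0; n < 256; n++) {
                nn = sprintf("%02x", n)
                if (!(nn in used)) { print "0x" nn "0000"; exit 0 }
            }
            exit 1    # all 256 values taken
        }
    '
}
```

For example: minor=$(ll /dev/*/group | next_free_group_minor) && mknod /dev/vg_name/group c 64 "$minor"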


    d. Activate the volume group.

    Use: vgchange -a y /dev/vg_name to activate the volume group after restoring any missing physical volumes. If everything was done correctly, vgdisplay /dev/vg_name should now show that Cur PV and Act PV agree.

    e. Get a backup of the volume group.

    Use vgcfgbackup /dev/vg_name to ensure there is a good volume group backup now that Cur PV and Act PV agree.

    3. If you cannot locate the missing disk device or cannot restore it into the volume group, use vgreduce(1m) to forcibly reduce out the missing physical volume.

    NOTE: vgreduce -f should be used as a last resort. If vgcfgrestore(1m) cannot be used to make Cur PV and Act PV agree, vgreduce -f may be required. Here are the steps to use vgreduce -f successfully:

    a. Get a list of logical volumes belonging to the volume group.

    Use: vgdisplay -v /dev/vg_name to get a list of logical volumes for the volume group.

    b. Find out which logical volume(s) reside on the disk device(s) to be forcibly reduced.

    Use lvdisplay -v /dev/vg_name/lv_name | more to see whether any of the logical volume's extents show ??? in the PV section. Page through every logical extent of each logical volume in the volume group. ??? indicates that the extent resides on a physical volume that the system is unable to query. Any logical volume with ??? will have to be removed using lvremove(1m) for vgreduce -f to complete successfully.
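Paging through every extent of every logical volume is slow on large volume groups. The filter below reads concatenated lvdisplay -v output and prints each logical volume whose section contains ???. The name lvs_with_missing_extents is hypothetical, and it assumes each logical volume's section begins with an "LV Name /dev/vg_name/lv_name" line. Review the flagged volumes manually before removing anything.

```sh
# lvs_with_missing_extents (hypothetical helper)
# Reads the output of one or more `lvdisplay -v` runs on stdin and
# prints, once each, the LV names whose section contains "???".
lvs_with_missing_extents() {
    awk '
        $1 == "LV" && $2 == "Name" { lv = $3; next }
        index($0, "???") > 0 && lv != "" && !(lv in seen) {
            seen[lv] = 1
            print lv
        }
    '
}
```

For example: for lv in $(vgdisplay -v /dev/vg03 | awk '$1 == "LV" && $2 == "Name" { print $3 }'); do lvdisplay -v "$lv"; done | lvs_with_missing_extents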

    c. Remove logical volumes with ??? in their lvdisplay(1m) output.

    Because logical volumes that show ??? have missing or unavailable data, they will have to be removed. For vgreduce -f to succeed, every logical volume with extents on the physical volume being reduced must first be removed. Once the volume group is in the correct state (Cur PV = Act PV), the logical volumes can be recreated and any lost data restored.

    Use: lvremove /dev/vg_name/lvol_name

    d. Forcibly reduce out the physical volume.

    Use: vgreduce -f /dev/vg_name

    NOTE: The above command does not take a physical volume argument. It must be run on an active volume group.


    e. If the vgreduce -f command does not work, or it reports no error but vgdisplay(1m) still shows that Cur PV and Act PV disagree, use the following steps to vgexport and vgimport the volume group before trying Step 3d again.

    This procedure can be used when vgreduce fails to reduce a physical volume that can no longer be queried by the system. If you execute the following procedure on the root volume group, usually vg00, you must first boot into LVM maintenance mode (** for the steps, see below).

    1. Get the /dev/vg_name/group minor number and the physical volumes belonging to the volume group.

    Use: ll /dev/vg_name/group to get the 0x###### minor number, and vgdisplay -v /dev/vg_name to get the physical volumes.

    2. vgchange -a n /dev/vg_name

    NOTE: Skip this step if you booted into maintenance mode for the root volume group.

    3. vgexport -m /mapfile /dev/vg_name

    4. mkdir /dev/vg_name

    5. mknod /dev/vg_name/group c 64 0x0#0000

    Re-use the minor number obtained in step 1.

    6. vgimport -m /mapfile /dev/vg_name pv_name [pv_name ...]

    NOTE: Specify all the physical volumes obtained in step 1. Do not include the physical volume that you are trying to remove or that couldn't be queried.

    ** Steps to boot into maintenance mode and activate:

    1. shutdown -hy now
    2. interrupt the boot sequence
    3. boot from the primary boot path and interact with ISL

    NOTE: The procedure used for steps 2 and 3 may vary slightly depending on the machine model.

    4. Enter the following at the ISL> prompt:

    ISL> hpux -lm (;0)/stand/vmunix
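Step 6 of the export/import procedure above is where it is easiest to list the wrong disks. The helper below reads vgdisplay -v output saved during step 1. It prints, without running, a vgimport command that lists every "PV Name" path except the one being removed and any path shown with "?". The name build_vgimport_cmd is hypothetical. Alternate-link paths are kept as vgdisplay lists them, so check the printed command before running it.

```sh
# build_vgimport_cmd VG MAPFILE SKIP_PV (hypothetical helper)
# Reads `vgdisplay -v VG` output (captured before vgexport) on stdin
# and prints a vgimport command line; it does not execute it.
build_vgimport_cmd() {
    awk -v vg="$1" -v map="$2" -v skip="$3" '
        # Keep every PV path except the one to drop and unreadable "???" entries.
        $1 == "PV" && $2 == "Name" && $3 != skip && index($3, "?") == 0 {
            pvs = pvs " " $3
        }
        END { print "vgimport -m " map " " vg pvs }
    '
}
```

Capture vgdisplay -v /dev/vg03 > /tmp/vg03.vgdisplay during step 1, because the volume group can no longer be displayed once it is exported. Later, run build_vgimport_cmd /dev/vg03 /mapfile /dev/dsk/c3t7d2 < /tmp/vg03.vgdisplay. The file and device names are illustrative.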

    f. Retry the vgreduce -f command specified in step 3d.

    This time vgreduce should succeed and give you a message similar to: "PV with key # sucessfully deleted from vg /dev/vg_name". It should also display:

    Repair done, please do the following steps.....:

    1. save /etc/lvmtab to another file
    2. remove /etc/lvmtab
    3. use vgscan(1m) -v to recreate /etc/lvmtab


    4. NOW use vgcfgbackup(1m) to save the LVM setup

    Follow the above 4 steps.

    vgdisplay /dev/vg_name should now show that Cur PV and Act PV agree.
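The repair message printed by vgreduce asks you to save /etc/lvmtab before removing it. A timestamped copy keeps the old contents available for later strings(1) comparisons. The sketch below only makes the copy and prints its path; removing the file and running vgscan -v stay deliberate manual steps. The name save_lvmtab is hypothetical.

```sh
# save_lvmtab [FILE] (hypothetical helper)
# Copies FILE (default /etc/lvmtab) to FILE.YYYYmmddHHMMSS, preserving
# mode and times, and prints the name of the copy. The body runs in a
# subshell so its variables do not leak into the caller.
save_lvmtab() (
    src=${1:-/etc/lvmtab}
    if [ ! -f "$src" ]; then
        echo "save_lvmtab: no such file: $src" >&2
        exit 1
    fi
    dest="$src.$(date +%Y%m%d%H%M%S)"
    cp -p "$src" "$dest" || exit 1
    echo "$dest"
)
```

For example, saved=$(save_lvmtab) && echo "lvmtab saved as $saved". Then continue with the remaining steps of the repair message.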