Discussion:
Broken zfs setup - log device lost. How to import the pool?
Igor Hjelmstrom Vinhas Ribeiro
2012-06-23 17:04:29 UTC
Permalink
Hi!

I lost a log device and a cache device of an (exported, offline) zpool
version 23.

I am unable to import it now:
root:~/ # zpool import
  pool: igorhvr-data
    id: 269256866566772131
 state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
        devices and try again.
   see: http://www.sun.com/msg/ZFS-8000-6X
config:

        igorhvr-data                           UNAVAIL  missing device
          mirror-0                             ONLINE
            disk/by-id/dm-name-igorhvr-data-0  ONLINE
            disk/by-id/dm-name-igorhvr-data-1  ONLINE

        Additional devices are known to be part of this pool, though their
        exact configuration cannot be determined.
root:~/ # zpool import -f igorhvr-data
cannot import 'igorhvr-data': one or more devices is currently unavailable
        Destroy and re-create the pool from
        a backup source.

zpool import -F fails with the same error message. Both data devices are
fine and were not touched (except for the failed import attempt above).
Based on what I have read, the pool should supposedly be recoverable,
since it is version 23.

Is this true? Assuming it is possible to recover it, what method would
be most advised? The options I am currently considering are:

- Retrying the import under a recent FreeBSD version (perhaps a
version-28 toolset would have a better chance of importing the pool);
- Some (rather painful) variation of the method described at the bottom
of http://forums.freebsd.org/showthread.php?t=18221 to build a (fake) log
device that can be used to import the pool;
- Remove the GUID SUM verification code (basically comment out this
section):
        /*
         * If the vdev guid sum doesn't match the uberblock, we have an
         * incomplete configuration.
         */
        if (mosconfig && type != SPA_IMPORT_ASSEMBLE &&
            rvd->vdev_guid_sum != ub->ub_guid_sum)
                return (spa_vdev_err(rvd, VDEV_AUX_BAD_GUID_SUM, ENXIO));

and re-attempt the import (and hope the pool will be loaded in a degraded
state, my guess).

Any advice on what road to take and/or other ideas I could try?

Best Regards,
--
igorhvr
--
To post to this group, send email to zfs-fuse-/***@public.gmane.org
To visit our Web site, click on http://zfs-fuse.net/
Igor Hjelmstrom Vinhas Ribeiro
2012-06-23 21:44:19 UTC
Permalink
All,

   Problem solved. I am writing this in case someone has the same
problem in the future trying to zpool import a pool without a log
device.

   After debugging zfs-fuse a bit (printf and zfs-fuse -n are my
friends, I guess) to understand better where the problem was, I ended
up doing the following change (in libzpool/vdev.c):

        /*
         * If this is a top-level vdev, initialize its metaslabs.
         */
        if (vd == vd->vdev_top && !vd->vdev_ishole &&
            (vd->vdev_ashift == 0 || vd->vdev_asize == 0 ||
-           vdev_metaslab_init(vd, 0) != 0))
-               vdev_set_state(vd, B_FALSE, VDEV_STATE_CANT_OPEN,
-                   VDEV_AUX_CORRUPT_DATA);
+            vdev_metaslab_init(vd, 0) != 0)) {
+         printf("\nFound corrupted top level vdev.");
+         //vdev_set_state(vd, B_FALSE, VDEV_STATE_CANT_OPEN,
+         //        VDEV_AUX_CORRUPT_DATA);
+       }

Basically I commented out the code that marked the state of the
log vdev as broken. With that done, the pool will zpool import -F
even without the log device, in a slightly odd state ("missing" is not
the name of the log device; it comes from VDEV_TYPE_MISSING):

root:~/ # zpool status igorhvr-data
pool: igorhvr-data
state: UNAVAIL
status: One or more devices has been taken offline by the administrator.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Online the device using 'zpool online' or replace the device with
'zpool replace'.
scrub: none requested
config:

        NAME                                   STATE     READ WRITE CKSUM
        igorhvr-data                           UNAVAIL      0     0     0  insufficient replicas
          mirror-0                             DEGRADED     0     0     0
            disk/by-id/dm-name-igorhvr-data-0  ONLINE       0     0     0
            disk/by-id/dm-name-igorhvr-data-1  OFFLINE      0     0     0
          missing-1                            ONLINE       0     0     0
        cache
          mapper/cache                         UNAVAIL      0     0     0  cannot open

errors: No known data errors
root:~/ # ls /igorhvr-data
aasylum dt fileList.txt floating-asylum ildata u wqasylum www.iasylum.net

Still, despite the unavailable state, everything seems to be
working fine. In particular zfs send works well, so I was able to zfs
send my data to a sane place...
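For the record, the rescue itself is just ordinary zfs send/receive. A sketch of the kind of commands involved (the snapshot name "rescue" and the destination pool "safe-pool" are hypothetical; only igorhvr-data comes from the listings above):

```shell
# Snapshot everything on the degraded pool, then stream it to a
# healthy pool. "rescue" and "safe-pool" are illustrative names.
zfs snapshot -r igorhvr-data@rescue
zfs send -R igorhvr-data@rescue | zfs receive -Fdu safe-pool
```

The send reads only the data vdevs, so the absent log device does not get in the way once the pool is imported.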

Regards,
--
igorhvr

sgheeren
2012-06-23 22:18:36 UTC
Permalink
Igor,

thanks a bunch for sharing this information. Perhaps it would be good to
share it on the zfsonlinux list
(zfs-discuss-VKpPRiiRko4/***@public.gmane.org) too, as it is probably not specific to
zfs-fuse. (I don't know whether you actually ended up trying the
OpenSolaris/BSD/zfsonlinux ports.)

This looks like a rather unintentional point of 'unrecoverable failure'
for ZFS, so it might need to be fixed. I'm not going to see about fixing
it (as far as I'm aware, using log/cache devices is rather uncommon with
fuse-based zfs). I have the impression that zfsonlinux is actively being
used at quite large hardware scales (in fact, LLNL appears to have
developed the linux port for precisely that reason), so this will be more
relevant to their port, in a way.

Looks like quite a nice feat of troubleshooting you have achieved there
anyways. Remember to scrub your data :)

Seth