Discussion:
Filesystem access freezes with many mounted snapshots
Kartweel
2012-03-27 15:59:17 UTC
Hi,

I'm testing a ZFS-FUSE install. Because snapshots are not directly
accessible as a mounted file system, I have a script which takes the
snapshot and then clones it. I can then access the clone.
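
The script boils down to roughly this (the dataset and snapshot names
below are just placeholders):

  SNAP=backup-$(date +%Y%m%d-%H%M%S)
  zfs snapshot zpool1/data@$SNAP
  zfs clone zpool1/data@$SNAP zpool1/clones/$SNAP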

But what is happening is that after about 1000 snapshots (and
subsequent clones), the zpool has ground to a halt and any attempt to
access it freezes the terminal. No processor use is reported by top,
and zpool iostat is idle. However, if I type "ls /" (my zpool is
called zpool1 and mounted in the root), it freezes. All the zfs
commands appear to work correctly (e.g. zfs list takes a few seconds
and then lists all the clones).

Any ideas what I can try or where to look? Could it be something with
my Linux setup and too many mounted file systems? Access gets
progressively slower, and around the 1000 mark it froze indefinitely.
Rebooting the system results in identical behaviour.

Running Slackware with zfs-fuse 0.7.0

Top reports zfs-fuse memory as Virt 9134m Res 78m Shr 1828
System has 2GB RAM, 1GB Free Memory
Kernel 3.1.8

Kartweel
2012-03-27 16:08:28 UTC
I just found in another recent post...

"this is practically untested (I think I'm the only one who has ever used
it; I remember here are issues with the number of clones mounted and it
being quite slow)"

So I guess it is a known issue with a lot of clones mounted?... Maybe I
should try and mount on demand?... oh but so inconvenient :)

Emmanuel Anne
2012-03-27 17:11:03 UTC
Did you try ZFS on Linux? I saw they now support the .zfs directory, which
might be convenient for you, but I don't know how well it will work with
that many snapshots... It's worth testing, I would say.
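
If I understand correctly, the snapshots then show up read-only under the
pool's mountpoint, something like this (mountpoint and snapshot name are
just examples):

  ls /zpool1/.zfs/snapshot/
  ls /zpool1/.zfs/snapshot/backup-20120327/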

--
my zfs-fuse git repository :
http://rainemu.swishparty.co.uk/cgi-bin/gitweb.cgi?p=zfs;a=summary

Milan Knížek
2012-03-27 17:46:22 UTC
On Tue, 27 Mar 2012 19:11:03 +0200,
Emmanuel Anne <emmanuel.anne-***@public.gmane.org> wrote:

> Did you try zfs on linux ? I saw they now support the .zfs directory,
> which might be convenient for you, but I don't know if it will work
> well with that many snapshots... It's worth testing I would say...
>

Even before that, it was possible to mount the snapshots manually (or via
an automounter script) in a separate directory, without cloning them.

https://github.com/zfsonlinux/zfs/issues/173#issuecomment-1110052
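
The manual approach looks roughly like this on ZFS on Linux (read-only;
the mount point and names are just examples):

  mkdir -p /mnt/snap
  mount -t zfs -o ro zpool1/data@backup-20120327 /mnt/snap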

ZFS on Linux is unsupported on 32-bit systems, though.

Milan
--
http://www.milan-knizek.net/
About Linux and photography (Czech only)

sgheeren
2012-03-27 18:12:27 UTC
On 03/27/2012 06:08 PM, Kartweel wrote:
> I just found in another recent post...
>
> "this is practically untested (I think I'm the only one who has ever
> used it; I remember there are issues with the number of clones mounted
> and it being quite slow)"
I was referring to the ctldir patch (branch); that is not in any release.

>
> So I guess it is a known issue with a lot of clones mounted?... Maybe
> I should try and mount on demand?... oh but so inconvenient :)

Yes, there are known problems with a lot of clones mounted (but that has
nothing to do with your previous quote). It has to do with the threading
model for the fuselistener (every FS gets a thread). The thread
allocation runs out. IME things just fail after exceeding the max.
number of available threads, but I can imagine other systems
experiencing other difficulties.
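
(You can see how many listener threads the daemon has spawned with
something along these lines, assuming a single zfs-fuse process:

  grep Threads /proc/$(pidof zfs-fuse)/status
)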

Seth


Ryan How
2012-03-28 00:24:13 UTC
Thanks,


On 28/03/2012 2:12 AM, sgheeren wrote:
> Yes, there are known problems with a lot of clones mounted (but that
> has nothing to do with your previous quote). It has to do with the
> threading model for the fuselistener (every FS gets a thread). The
> thread allocation runs out. IME things just fail after exceeding the
> max. number of available threads, but I can imagine other systems
> experiencing other difficulties. Seth

That seems to explain it. Thread count of zfs-fuse process is 1116,
prolly a bit too high eh :)

I've been using zfs-fuse for a while now and haven't had any issues
(apart from this), and ZFS on Linux looked quite new (although it looks
like it has made a lot of progress recently!), so I might give it a go
when it seems to have stabilized a bit.

For now I'll just drop the dream of keeping lots of snapshots :).
Previously I was hard-linking and copying changed files; with
de-duplication that didn't use extra space. I thought snapshots would be
a far more efficient method, but it seems they have some limits...
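
(The hard-link scheme was something along these lines; paths are just
placeholders:

  rsync -a --link-dest=/backups/previous /data/ /backups/current/

i.e. unchanged files are hard-linked against the previous run and only
changed files are copied.)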

Kartweel
2012-03-29 00:47:19 UTC
Just thinking, maybe put a limit on the number of mounted file systems
(if it is possible), so it doesn't run out of threads and completely
freeze up? Because they are mounted on startup, it makes it a bit
nastier. And it is very easy to run into if you have a few scripts
getting excited making snapshots and mounting them.

Emmanuel Anne
2012-03-29 20:36:15 UTC
I was curious to see this bug in action, so I tried to reproduce it.
Well, it took forever to create the snapshots and then to clone them. I
did it on a ramdisk to speed things up, but the clone operation creates a
new filesystem and it isn't immediate.
Oh well, I finally reached the end, wondering how you could create and
manage so many snapshots at the same time.
Well, it works for me.
I mean, when I reach 1000, the daemon displays "Warning: filesystem limit
(1000) reached, unmounting..", which is probably lost for you since you
run it from the init.d script.
Anyway, I created 1002 clones; only clones 1 through 999 are mounted,
after that they are cleanly unmounted and the directory remains empty.

Normal file operations / zfs / zpool commands still work.

By the way, this limit comes from a define (MAX_FILESYSTEMS), so it can
be changed to 10000 if you like; it should work up to 32767 because in
the end it doesn't create a new thread per filesystem, it just creates a
new socket (for fuse).
Of course this isn't from 0.7.0, it's from my git version, but normally
there shouldn't be any difference from 0.7.0 on this point.

Emmanuel Anne
2012-03-29 21:50:50 UTC
I just added 2 commits to my git repository for much faster mounts (this
one is safe, it just removes a very old sync() call and should be 100%
safe to use) and much faster unmounts (this one seems to work, which is
surprising because I remember we added these sync() calls to prevent
zpool export from failing with a tree of subvolumes. Well, I tried to
make it fail and it worked every time, and the sync calls are just
replaced by an empty loop when needed!). Anyway, it makes handling 1000
clones much, much faster and much more reasonable!

Ryan How
2012-03-30 01:20:46 UTC
The snapshots are created from a backup script. But I wrote a test just
to make sure my backup script would work and rotate the snapshots
properly. At first it only took half a second for each snapshot / clone,
but each one took longer and longer once it got over 200. I didn't notice
any output from the zfs-fuse daemon, but I didn't look :). I'll take a
look at your git version as soon as I get a chance and run my test again
to see if I get better results. I'll keep an eye on thread counts and
open file descriptors, etc.
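
The test boils down to a timed loop roughly like this (dataset names are
just placeholders):

  for i in $(seq 1 1000); do
      time ( zfs snapshot zpool1/data@test$i &&
             zfs clone zpool1/data@test$i zpool1/clones/test$i )
  done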

Thanks!

Emmanuel Anne
2012-03-30 06:39:30 UTC
If you really want to experiment with more than 1000 filesystems (which
will definitely ruin the output of df and mount!), change the
MAX_FILESYSTEMS define in zfs-fuse/fuse_listener.c and let me know if it
works.
You'll see my code is now incredibly faster at handling all these
filesystems; I'm just hoping it won't create any side effects!
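
Something along these lines (the rebuild step is just the usual scons
invocation):

  grep -n MAX_FILESYSTEMS zfs-fuse/fuse_listener.c
  # edit the value (e.g. 1000 -> 10000), then rebuild and restart the daemon
  scons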

sgheeren
2012-03-30 06:43:20 UTC
On 03/30/2012 08:39 AM, Emmanuel Anne wrote:
> just hoping it won't create any side effect
A.k.a. have backups :)

Emmanuel Anne
2012-03-30 10:09:38 UTC
Yeah, well, I don't expect anything that dangerous, maybe a failed export
or mount operation under some very specific conditions, but I couldn't
find any so far. If it happens, just do the operation manually and report
it (unlikely!).

OK, I finally committed a fix for the problem I had with gcc-4.6, because
it's still there with gcc-4.6.3. It's specific to gcc optimizations and
can easily be reproduced with scons debug=1 or debug=0. Just run
./zfs-fuse/zfs-fuse --pidfile /var/run/zfs-fuse.pid and it won't create
its pid file. Actually the fix in your branch didn't work for me; it's
not a stale pointer, the whole stack seems to be affected, it can't even
read "--pidfile" here.

The workaround is stupid and easy, but it's the usual way of dealing with
this kind of bug: if it detects gcc-4.6, it just creates an array locally
(so it will be on the stack) and copies data to it so that it won't be
discarded by the optimizer. End of problem, with this everything is back
to normal. If I am very motivated, I'll try to extract this into a very
small piece of code to send to gcc, but motivation is low currently!

sgheeren
2012-03-30 18:24:37 UTC
On 03/30/2012 12:09 PM, Emmanuel Anne wrote:
> Actually the fix in your branch
Does Ryan have a branch? Are you referring to
7b2127a04471412b435be8ecc542971ea2773183
<http://gitweb.zfs-fuse.net/?p=official;a=commit;h=7b2127a04471412b435be8ecc542971ea2773183>
?

Seth

Emmanuel Anne
2012-03-30 19:23:27 UTC
Yes, *your* branch, and this commit. It does absolutely nothing for me,
but if it works for you, that's nice.
I wonder if the change observed is due to a CPU difference or to a
difference in the zfsrc file? Anyway...

sgheeren
2012-03-30 21:43:25 UTC
On 03/30/2012 09:23 PM, Emmanuel Anne wrote:
> Yes /your/ branch, and this commit. Does absolutely nothing for me,
> but if it works for you that's nice.
That fixed undefined behaviour. So you absolutely want that fix. Period.

> I wonder if the change observed is because of a cpu difference or
> because of a difference in the zfsrc file ? Anyway...


I wouldn't be surprised if you hit _another_ UB bug there, but in that
case, perhaps run it under valgrind to see where the pain comes from.
Undefined behaviour is just that, and the fact that the behaviour
changes with -On only confirms the fact that it is _undefined_. When I
get the chance, I'll see whether I can track the other error like you
described.

Emmanuel Anne
2012-03-30 22:35:34 UTC
2012/3/30 sgheeren <sgheeren-***@public.gmane.org>

> **
> On 03/30/2012 09:23 PM, Emmanuel Anne wrote:
>
> Yes *your* branch, and this commit. Does absolutely nothing for me, but
> if it works for you that's nice.
>
> That fixed undefined behaviour. So you absolutely want that fix. Period.
>

cf_pidfile = optarg;
optarg points into argv[n], where n is found by getopt_long;
argv is never freed,
so tell me where the undefined behaviour is...?

>
> I wonder if the change observed is because of a cpu difference or because
> of a difference in the zfsrc file ? Anyway...
>
>
>
> I wouldn't be surprised if you hit _another_ UB bug there, but in that
> case, perhaps run it under valgrind to see where the pain comes from.
> Undefined behaviour is just that, and the fact that the behaviour changes
> with -On only confirms the fact that it is _undefined_. When I get the
> chance, I'll see whether I can track the other error like you decribed.
>

Well, I'll just wait for your findings then !

Emmanuel Anne
2012-03-30 22:37:38 UTC
PS: normally in a case like this you are supposed to take the assembler
output in gdb and find out why it behaves so badly with optimizations
enabled. But well, your idea seems more... interesting, so I'll just wait
for it!

sgheeren
2012-03-31 12:15:35 UTC
On 03/31/2012 12:37 AM, Emmanuel Anne wrote:
> ps : normally in a case lilke this you are supposed to take the
> assembler output in gdb and find out why it behaves so badly with
> optimizations enabled. But well, your idea seems more... interesting,
> so I'll just wait for it !
Translation: I'm not motivated to find out what's happening, so I'll
just assume it is an optimizer bug and pull a hack that hides the
symptoms _on my system_.

It's ok :)

It's been a long time since I devoted some time to zfs-fuse though, so
I'll have to find a bit of time somewhere out of nowhere, I'm afraid.

Emmanuel Anne
2012-03-31 12:42:58 UTC
Yeah, that's exactly it, I just found something to make it work and kept
it. But after noticing that you can't seem to reproduce it (and too bad
nobody else sent any mails about it), I investigated some more, and I
wonder if it isn't related to the AMD CPU bug found by this guy for
DragonFly BSD:
http://it.slashdot.org/story/12/03/06/0136243/amd-confirms-cpu-bug-found-by-dragonfly-bsds-matt-dillon
(yeah, I happen to be using an AMD X4 CPU and I am very happy with it
except on this very particular topic).

It would fit the picture perfectly, in that my workaround just moves the
stack pointer and that fixes it, and in that not everybody can reproduce
it. The only part that differs is that I can reproduce it at will, 100%
of the time here. I took a quick look at the asm output and it could also
fit the picture (callq is used), but I am not familiar with amd64 asm,
and I was not in the mood to dig any further.
For now I'd say the workaround is enough, but I sent a mail to this
person, just in case.

Ryan How
2012-03-30 07:00:17 UTC
df is already ruined as soon as you use zfs :). I just filter any zfs
entries out of the output and it is OK :), then use zfs list to complete
the picture (where it is easy to filter out the mounted clones).
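
Something like this, with the patterns depending on your pool and clone
names:

  df -h | grep -v '^zpool1'
  zfs list -t filesystem | grep -v clones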

zfs is exciting :).

Manuel Amador
2012-03-29 22:18:08 UTC
Sounds like your ZFS-FUSE process has run out of file descriptors.

man ulimit.
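
For example, something along these lines (assuming a single zfs-fuse
process):

  ls /proc/$(pidof zfs-fuse)/fd | wc -l      # descriptors currently open
  grep 'open files' /proc/$(pidof zfs-fuse)/limits
  ulimit -n                                  # limit for the current shell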

--
Manuel Amador (Rudd-O)
http://rudd-o.com/
