
32bit Integers

Justin Huff

Sep 27, 2008, 6:29:25 PM
Howdy!

Our Mogile install (some SVN snapshot from ages ago) hit a nasty wall
this afternoon. The DB schema uses unsigned ints for fids, so I figured
we were good until 4 billion or so. I also knew we were approaching our
2 billionth FID (we create and destroy lots of files).

Things broke badly when we exceeded the range of a 32-bit SIGNED int
(2,147,483,647). The auto_increment key was working fine. The problem
was in DevFID::uri_path:
my $nfid = sprintf '%010d', $fidid;

%d is for signed integers. This caused the subsequent URL construction
to use a wrapped (negative) fid.
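To reproduce the wrap without a 32-bit tracker, here is a minimal Perl sketch. It assumes that reinterpreting the fid's low 32 bits as a signed integer (pack 'L', then unpack 'l') matches what '%d' saw on a 32-bit Perl; as_signed32 is a hypothetical helper, not MogileFS code.

```perl
use strict;
use warnings;

# Hypothetical helper: emulate a 32-bit tracker by reinterpreting the low
# 32 bits of a fid as a signed integer ('L' packs unsigned, 'l' unpacks signed).
sub as_signed32 {
    my ($fid) = @_;
    return unpack('l', pack('L', $fid));
}

for my $fid (2_147_483_647, 2_147_483_648) {
    # Same formatting as DevFID::uri_path, applied to the emulated value.
    my $nfid = sprintf '%010d', as_signed32($fid);
    print "$fid -> $nfid\n";
}
# 2147483647 -> 2147483647
# 2147483648 -> -2147483648
```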

My fix was to change the %d to a %u. Our trackers are running on 32-bit
VMs, so Perl's (and therefore sprintf's) notion of an integer is 32 bits.
Thus, we would have hit this even if we were running a BIGINT schema.
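A quick way to tell which case a tracker is in is Perl's Config module: ivsize is the byte width of Perl's native integer (4 on a 32-bit build without 64-bit integer support, 8 otherwise). The sketch below prints it and then applies the %u fix to an emulated 32-bit value; as_unsigned32 is a hypothetical helper that keeps only the low 32 bits, standing in for a 32-bit Perl.

```perl
use strict;
use warnings;
use Config;

# ivsize 4 means fids above 2**32 - 1 cannot be held as native integers here.
print "ivsize: $Config{ivsize}\n";

# Hypothetical helper: keep only the low 32 bits, read back as unsigned.
sub as_unsigned32 {
    my ($fid) = @_;
    return unpack('L', pack('L', $fid));
}

# '%010u' prints the same 32 bits as unsigned, so 2**31 stays positive.
for my $fid (42, 2_147_483_648, 4_294_967_295) {
    printf "%s -> %010u\n", $fid, as_unsigned32($fid);
}
```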

Luckily, we shifted nearly all of our storage writes to S3, so the site
impact was minimal. I also upgraded to 2.20 before I discovered the
core issue.

--Justin
picnik.com

Dieter Plaetinck

Sep 30, 2008, 11:18:43 AM
> I also upgraded to 2.20 before I discovered the core issue.

What do you mean? Is this bug still present in 2.20?
It would be useful if we could compile a list of all the limits in MogileFS,
or does it suffice to always look at the column types in MySQL?
(I assume that if something other than the MySQL column type is the limit, this can be classified as a bug, especially on 64-bit systems?)

Dieter

dormando

Sep 30, 2008, 11:47:11 AM
Hey,

We use high 64-bit numbers like whoa, and haven't run into this bug... I
can't think off the top of my head why this would affect you, Justin.

Can you give more background into how you're experiencing the bug, and
what your setup looks like?

Thanks,
-Dormando

Paul Baker™

Sep 30, 2008, 11:50:14 AM
It probably has to do with the fact that he is running on 32-bit hardware, not 64-bit.

dormando

Sep 30, 2008, 11:53:08 AM
hah hah.

hah.

eeeeeh.

I'm having an off week folks, sorry!

promise I won't commit anything today!

-Odnamrod

Justin Huff

Sep 30, 2008, 1:55:51 PM
Yup :)
However, I still think this warrants a fix so that the behavior lines up
with column types in all cases.
--Justin

dormando

Dec 3, 2008, 3:09:57 AM
Hey,

Following up on this a bit...
Do you recall the exact symptoms you saw when this bit you?

With apologies to folks at gaiaonline.com, this bug was actually a lot
more stupid than I thought it was (I've been busy for a few months; sorry
:/). From 2 billion to 3 billion, MogileFS can actually store the fids with
the wrong paths, negative numbers and all. Once you get above 3 billion,
the fid gets completely broken and stops storing at all.
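The 3 billion cutoff falls out of the old uri_path code, assuming the fid has already wrapped to a negative 32-bit value. '%010d' pads to 10 characters including the minus sign, so once the wrapped value rises above -1,000,000,000 (fids above 3,294,967,296, i.e. 2^32 - 10^9) only nine digits remain and the m{(\d)(\d{3})(\d{3})(\d{3})} split no longer matches. A minimal Perl sketch; old_uri_path and as_signed32 are hypothetical names, and the 32-bit behavior is emulated.

```perl
use strict;
use warnings;

# Hypothetical helper: emulate a 32-bit signed integer.
sub as_signed32 { return unpack('l', pack('L', $_[0])) }

# Path logic of the original DevFID::uri_path, fed the emulated value.
sub old_uri_path {
    my ($devid, $fidid) = @_;
    my $nfid = sprintf '%010d', as_signed32($fidid);
    my ($d1, $mmm, $ttt, $hto) = ($nfid =~ m{(\d)(\d{3})(\d{3})(\d{3})});
    return undef unless defined $hto;    # fewer than 10 digits: no path at all
    return "/dev$devid/$d1/$mmm/$ttt/$nfid.fid";
}

for my $fid (3_000_000_000, 3_294_967_296, 3_294_967_297) {
    my $path = old_uri_path(1, $fid);
    print "$fid -> ", (defined $path ? $path : 'undef'), "\n";
}
# 3000000000 -> /dev1/1/294/967/-1294967296.fid
# 3294967296 -> /dev1/1/000/000/-1000000000.fid
# 3294967297 -> undef
```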

Swapping %d for %u is a short-term fix, but then makes all of the crap you
might've stored between 2b and 3b disappear. Apologies to gaia, but shame
on you for not upgrading to 64-bit a year and a half ago ;)

Anyway, I wasn't really happy with the 4 billion limit under 32-bit, so I
tried to make it go a bit further:

index 9b44d81..2bc1458 100644
--- a/trunk/server/lib/MogileFS/DevFID.pm
+++ b/trunk/server/lib/MogileFS/DevFID.pm
@@ -83,6 +83,13 @@ sub uri_path {
my $fidid = $self->{fidid};

+ my $nfid;
+ my $len = length $fidid;
+ if ($len < 10) {
+ $nfid = '0' x (10 - $len) . $fidid;
+ } else {
+ $nfid = $fidid;
+ }
my ( $b, $mmm, $ttt, $hto ) = ( $nfid =~ m{(\d)(\d{3})(\d{3})(\d{3})} );

return "/dev$devid/$b/$mmm/$ttt/$nfid.fid";

... patch is untested, but the code is from a small test script.
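Pulled out of the diff, the padding idea can be run on its own. This minimal Perl sketch wraps it in a hypothetical new_uri_path sub (not the committed code) and assumes the fid arrives as a string of digits, so it never passes through sprintf or a native integer.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the padding approach from the patch.
sub new_uri_path {
    my ($devid, $fidid) = @_;
    my $len  = length $fidid;
    # Left-pad with zeros to 10 characters; longer fids are used as-is.
    my $nfid = $len < 10 ? ('0' x (10 - $len)) . $fidid : $fidid;
    # Directory levels come from the first 10 digits.
    my ($d1, $mmm, $ttt, $hto) = ($nfid =~ m{(\d)(\d{3})(\d{3})(\d{3})});
    return "/dev$devid/$d1/$mmm/$ttt/$nfid.fid";
}

print new_uri_path(1, '42'), "\n";               # /dev1/0/000/000/0000000042.fid
print new_uri_path(1, '4294967296'), "\n";       # /dev1/4/294/967/4294967296.fid
print new_uri_path(1, '123456789012345'), "\n";  # /dev1/1/234/567/123456789012345.fid
```

Because the regex is unanchored, fids longer than 10 digits still get well-formed directory levels from their leading digits, while the file name keeps the full fid.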

Switching the sprintf from %d to %u (and ensuring the fid columns in your
DB are all BIGINT UNSIGNED!) will allow 32-bit trackers to support fids
up to 4,294,967,295 (2^32 - 1).

The new code seems to support fidids up to 999,999,999,999,999,
which is about a quadrillion. Under 64-bit it works as expected, with
18,446,744,073,709,551,615 (2^64 - 1) being the limit.
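One plausible reading of the 999,999,999,999,999 figure (an assumption, not stated in the thread): on a 32-bit Perl, integers that do not fit a native integer are held as doubles, and Perl typically stringifies doubles with 15 significant digits, the same as sprintf '%.15g'. A 15-digit fid still prints exactly, while a 16-digit one collapses into exponent notation. The sketch also prints ~0, the largest native unsigned integer (2^64 - 1 when Perl has 64-bit integers).

```perl
use strict;
use warnings;
use Config;

# Assumption: a fid that does not fit a native integer is held as a double and
# stringified with 15 significant digits, like sprintf '%.15g'.
my $fifteen = sprintf '%.15g', 999_999_999_999_999;
my $sixteen = sprintf '%.15g', 1_234_567_890_123_456;
print "$fifteen\n";    # 999999999999999
print "$sixteen\n";    # 1.23456789012346e+15

# ~0 is the largest native unsigned integer:
# 4294967295 with 4-byte IVs, 18446744073709551615 with 8-byte IVs.
printf "ivsize %d, max %s\n", $Config{ivsize}, sprintf('%u', ~0);
```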

Note that this isn't total fids stored, but total fids added. This means
that if you add 100 million files, then delete 100 million files, then add 100
million files again, your fidids are around 200 million.
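In other words, fids come from an auto_increment counter that deletes never rewind. A toy Perl model of that bookkeeping (hypothetical subs, nothing from MogileFS) reproduces the numbers above:

```perl
use strict;
use warnings;

# Toy model: the counter only moves forward; deletes just shrink the file count.
my $next_fid = 1;    # auto_increment starts at 1
my $stored   = 0;

sub add_files {
    my ($n) = @_;
    $next_fid += $n;
    $stored   += $n;
    return;
}

sub delete_files {
    my ($n) = @_;
    $stored -= $n;
    return;
}

add_files(100_000_000);
delete_files(100_000_000);
add_files(100_000_000);

# The last fid handed out is $next_fid - 1.
printf "stored %d, last fid %d\n", $stored, $next_fid - 1;
# stored 100000000, last fid 200000000
```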

If this looks good to folks, this seems to be the only place in the code I
can find that would wig out on 2^32. I'll get the patch in after my next
big set of patches :/ Or before, maybe. I should just rebase my branch.

Anyway. Lesson of the day is that you should probably be running 64-bit by
now. It's almost 2010, geez.
-Dormando

Justin Huff

Dec 3, 2008, 11:28:40 AM

> Do you recall the exact symptoms you saw when this bit you?
We saw exactly what you expected. Everything *looks* fine between 2b
and 3b..but breaks after that. After changing the sprintf, I ran a
script that moved files with negative fids to the right place in the dir
tree.

Oh yeah, upgrading our tracker VMs has been on my TODO list for a while:)

--Justin

dormando

Dec 4, 2008, 4:05:45 AM
>> Do you recall the exact symptoms you saw when this bit you?
> We saw exactly what you expected. Everything *looks* fine between 2b
> and 3b..but breaks after that. After changing the sprintf, I ran a
> script that moved files with negative fids to the right place in the dir
> tree.

Thanks - I might've freaked out a bit more if I'd had that detail. I was under
the impression it just stopped working, sorry all :/

> Oh yeah, upgrading our tracker VMs has been on my TODO list for a while:)

:)

-Dormando

Netlog

Dec 16, 2008, 8:22:07 AM
to mogile
Fsck printlog has problems displaying them: it only goes to 4
billion.
I changed the sprintf from %d to %Q

An unsigned quad value. (Quads are available only if your system
supports 64-bit integer values _and_ if Perl
has been compiled to support those. Causes a fatal error otherwise.)
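For reference, the text quoted above is perldoc's description of the 'Q' template for pack/unpack; Perl's sprintf has no %Q conversion. A minimal sketch of where 'Q' does apply, plus the sprintf format that prints an unsigned fid at the native integer width ('%u', which is 64-bit when ivsize is 8), assuming a Perl with 64-bit integers:

```perl
use strict;
use warnings;
use Config;

if ($Config{ivsize} == 8) {
    my $fid = 1_000_000_000_002;
    # pack 'Q' produces 8 bytes (native byte order); unpack 'Q' reverses it.
    my $bytes  = pack 'Q', $fid;
    my ($back) = unpack 'Q', $bytes;
    # For text output, '%u' is the sprintf conversion to use.
    printf "%d bytes, fid %u\n", length($bytes), $back;
}
else {
    print "no 64-bit integers: pack 'Q' would die here\n";
}
```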

I changed all the FID columns in the database to BIGINT, then inserted a row
into temp_file to bump up the FID value (since this is the place where
the FID value is set).
I started testing and ran an fsck, with the following result.

robbie@database219:~$ mogadm fsck clearlog
robbie@database219:~$ mogadm fsck reset
robbie@database219:~$ mogadm fsck status

Running: No
Status: 0 / 1000000000002 (0.00%)
Time: 0s (0 fids/s; 0s remain)
Check Type: Normal (check policy + files)

[stop_time]: 0

robbie@database219:~$ mogadm fsck start
robbie@database219:~$ mogadm fsck printlog
unixtime event fid devid
1229432177 MISS 4294967295 2
1229432177 REPL 4294967295 -

As you can see, the FID is stuck at the 4.2 billion limit (4294967295 is
2^32 - 1). I have not found where it gets that value from yet; I'm short on time.
I'll check it later.

grtz,

Robbie

dormando

Jan 2, 2009, 9:14:51 PM
to mogile
I pushed the fix for URI generation to trunk.

This one might need a little more work. I don't have a running 32-bit test
instance of MogileFS... but if someone wants to give me access to one to
fiddle with, I could try that... I'm not too enthused about fixing this bug
otherwise.

The issue:

- mogadm is already treating the fid as a string in that print, which is
what you need to do in order to print fids up to 40 bits or so on 32-bit
hardware.
- I did a unit test to confirm this on a 32-bit box of mine.
- so... the fid's likely getting truncated elsewhere?
- the fid gets passed through the API and Query.pm, and is pulled in via
DBI. It's treated as a string as far as I can tell, except when pulled in via
DBI.

... which makes me believe the fid's getting truncated by DBI? That
probably means none of these fixes will work. I'd be a little surprised,
since the MySQL protocol treats everything as a string already. There
might be an option to not squash numerics or similar...

Anyone want to test that out and report back?
-Dormando
