Diana the Valkyrie

Diana the Valkyrie's Newsletter - August 2017

August 2017

Lots of sun, lots of rain. In England, you know it's summer because the rain is warm.

New and updated Galleries

Galleries added this month.

The Library

Stories added this month.

The Movie Theatre

Movies added this month.

Newsthumbs

Running fine.

Down on the server farm

I've been doing some major work on the servers that store the older newsthumbs.

The way the older newsthumbs work is that it's a *vast* set of data. Hundreds of millions of files; in fact, about a billion files! When I set it up, I knew it was going to be big, so I designed it accordingly.

These billion files are divided into data sets, so, for example, January 2001 till June 2001 might be one data set. The earlier data sets are around 600gb, because back then I was using 8 80gb drives in the news gatherer. The current ones are around 950 gb, because I'm using a terabyte as the standard data set size. With modern 8tb drives, I could have 8tb data sets, but that would be much more clumsy to handle than the 1tb segments. Currently, there are just over 50 data sets.

When you access the Older Newsthumbs, all of this is invisible to you. I attach each of these data sets to the older newsthumbs server using nfs, and then there's a bit of magic so that, when you access a file, it works out which server it's on and goes and gets it, without you needing to be aware of which data set you're getting it from. To make all this work, about 15 years ago I wrote a database that could handle this many files. I couldn't use four-byte integers, because there wouldn't be enough of them, so the first thing I wrote was a thing that could handle five-byte integers; if you know anything about computers, then you know just how wacky that is. But it's worked well!
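
In case you're wondering what a five-byte integer even looks like, here's a little sketch in Python. It's not the original code, just the idea: pack a number into five bytes and unpack it again, which gives a range of 0 to 2**40 - 1, comfortably more than a billion.

    # A sketch (not the original code) of storing record numbers as
    # five-byte integers: range 0 .. 2**40 - 1, far more than a billion.

    def pack5(n):
        """Pack a non-negative integer into exactly five bytes (big-endian)."""
        if not 0 <= n < 2**40:
            raise ValueError("out of five-byte range")
        return n.to_bytes(5, "big")

    def unpack5(b):
        """Recover the integer from its five-byte form."""
        return int.from_bytes(b, "big")

    # Example: a record number just past what four bytes can hold
    i = 5_000_000_000
    assert unpack5(pack5(i)) == i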

Hard drives don't last for ever, so each 1-terabyte bunch of files is stored on three servers, because that's easier and cheaper than using tape for backup. Why three and not two? Because when the one you're using fails, you really don't want to find out that your only backup doesn't work either. From time to time, I check that the backups (and the customer-facing server) are OK. To do this, I use two methods: 1) if the system can't access some of the files on a server, there's a problem, and 2) the SMART system.

Modern drives check themselves, and if they find a sector that's difficult to read, they swap it for a spare sector that isn't normally used. So a modern hard drive always has all sectors reading perfectly (except when they don't). The SMART system counts the number of sectors that have been reallocated in this way, and I've found, from experience, that if this is under 100, it's probably OK, between 100 and 1000 is deteriorating, and over 1000 means that drive needs to be retired. So I monitor the "reallocated sectors".
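
If you want to do the same check yourself, here's roughly how it can be scripted. This is just a sketch using smartctl (from the smartmontools package), not my actual monitoring program, and the drive names are only examples.

    import subprocess

    # Sketch: read SMART attribute 5 (Reallocated_Sector_Ct) with smartctl.
    def reallocated_sectors(device):
        out = subprocess.run(["smartctl", "-A", device],
                             capture_output=True, text=True).stdout
        for line in out.splitlines():
            fields = line.split()
            if fields and fields[0] == "5":   # attribute 5 is Reallocated_Sector_Ct
                return int(fields[-1])        # RAW_VALUE is the last column
        return None

    for dev in ["/dev/sda", "/dev/sdb"]:      # example drives
        n = reallocated_sectors(dev)
        if n is None:
            print(dev, "no SMART data")
        elif n > 1000:
            print(dev, n, "retire this drive")
        elif n > 100:
            print(dev, n, "deteriorating, keep an eye on it")
        else:
            print(dev, n, "probably OK")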

My latest checkup revealed that several drives were failing: some in the "over 100" category, and a few in the "over 1000" category. So, I thought, time for a spring clean!

I started off with a server called "bigbe", so-called because it was the first in my experience of "big servers". It had six 1-tb drives when I set it up; since then it's been upgraded a few times. I needed to replace three of the drives that were looking rough, and at the same time I increased the number of drives in the case to ten, for a total of 25 tb. It's running Fedora version 17 (the current version is 26), but I decided not to try to upgrade it, because that's just hassle for no good reason.

I got bigbe up and running, and then I copied ten 1-tb data sets to it, using rsync (which means I can do the copying with one command, although it takes a few days to run). This is running on my gigabit network, so what used to take me ten days on my 100 mbit network is now a lot quicker. After the copies had finished, I repeated the copies (in case anything had been missed the first time), and then "reaped" each data set.
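
The copy itself is nothing clever; here's a sketch of the idea (the host and path names are invented for illustration), with -a to preserve ownership, permissions and times, and -H to preserve hard links, which matters once a data set has been reaped (see below).

    import subprocess

    # Sketch of the copy step; "gatherer" and the paths are made-up examples.
    datasets = ["ds2001a", "ds2001b"]
    for ds in datasets:
        subprocess.run(["rsync", "-aH",
                        f"gatherer:/data/{ds}/", f"/data/{ds}/"],
                       check=True)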

"Reaping" is my own invention. What I do is, first I run a program that deletes all files that don't make sense. That gets rid of jpg files that don't have the correct jpg header and so are never going to load, files that arrived in multiple parts but not all the parts arrived, and other such scurf. Then I run another program that calculates an MD5 digest of each file. So each file has associated with it, a 16 byte digest, which will be unique to that file. Once all the digests have been calculated, I output the digests and filenames, and this is written to a big file. That file is then sorted and that makes it easy to see which files are duplicates. So it writes a list of which files are duplicates of which other files. Finally I run a program called "reaper" which deletes the duplicate files and replaces them with a hard link to the file that it's a duplicate of. This sounds like a complicated procedure (it was to write, but to run it is just a single command) and why bother? Because this is usenet, and the same files get posted again and again, and I can store 50% more on the same disk space by doing this.

Bigbe was now sorted out, so I turned my attention to xappe. Xappe is a partial reincarnation of giggi, which was one of my original "monster" servers, with 16 hard drives in one huge box. The biggest problem with that idea was that the whole thing was enormously heavy, and working on it meant getting it from the rack to the workbench, working on it, then moving it back. Xappe was in a normal-sized box and contained ten drives, which meant that I needed only two sata interface cards, leaving one PCI slot free, into which I could put a gigabit ethernet card. So xappe was easier to handle, and faster in data transfer.

Xappe was in a very poor condition. Three of its ten drives were failing, and for reasons I still don't understand it never did allow rsync to be run (so I had to use an older method called "mirror", which is much inferior, because rsync automatically preserves the hard links that let me squeeze 1.5 tb into a 1 tb space, whereas mirror doesn't, and I have to reap the drive again). So I decided to completely replace xappe, but not in-situ. I set up a new server called "ultra", into which I put nine drives: the five good ones from xappe, three that were salvaged from the servers that used to be at my colocation, and a 2.5 inch system drive (see below).

But Ultra wouldn't install Fedora version 26; I had to put in version 20, then upgrade that to version 23, then 24, and finally 26. That was fairly time consuming, because each step took a while, but it wasn't too bad because I could leave it running while I did other stuff. So that left me with the 32 bit version of Fedora 26, eight data drives and the 20gb system drive.

Then, because these drives had come from the somewhat umpty Xappe, I decided to do the copy and recopy thing. That revealed that A) some of those data drives really needed this, because they were missing some files, and B) four of the drives had file times which were slightly skewed from the drive I was copying from. That would have meant that rsync copied all the files, all over again (unless I told it to ignore file dates). But I have a better way. I wrote a program that reads all the file names and file dates from one drive, and another that reads that list and sets the file dates and times on the other drive. So all the file dates were adjusted by one day (and I have no idea how they came to be skewed), and then I could do the copy and recopy without it copying about 20 million files (did I mention that each data set is about 20 million files?).
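
Here's a cut-down sketch of that pair of little programs (not the real ones; the file name and paths are just for illustration). One side dumps relative path and modification time, the other reads the dump and applies the times.

    import os, sys

    # Sketch usage (example paths):
    #   python filetimes.py dump /mnt/source  > times.txt
    #   python filetimes.py apply /mnt/target < times.txt

    def dump(root):
        for dirpath, _, names in os.walk(root):
            for name in names:
                path = os.path.join(dirpath, name)
                rel = os.path.relpath(path, root)
                print(f"{os.stat(path).st_mtime:.0f}\t{rel}")

    def apply(root):
        for line in sys.stdin:
            mtime, rel = line.rstrip("\n").split("\t", 1)
            path = os.path.join(root, rel)
            if os.path.exists(path):
                os.utime(path, (float(mtime), float(mtime)))

    if __name__ == "__main__":
        {"dump": dump, "apply": apply}[sys.argv[1]](sys.argv[2])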

In order to do all this shuffling around of files, I set up a new server, called "brant", in which I installed seven of the 4-tb drives salvaged from my colocation (I no longer use a colocation; I have a 100 mbit line running into the Data Shed).

I installed Fedora 26 64-bit on this, because it's the latest version, and copied nine data sets onto it, followed by a recopy and a reap. Brant now has those nine data sets on two of the drives, with five of the 4-tb drives spare and ready for action.

Then I started on giggj. That's running Fedora 24, which is fairly recent. It has 11 drives, of which one is failing. But for a reason I cannot fathom, the files are all owned by user 501, whereas on all the other servers they're owned by user 500. So I wrote a little program to correct this, and I can't do more until that's finished running. Then I need to replace one drive.
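
The ownership fix itself is nothing special, it just takes a long time over 20 million files per data set; roughly this (a sketch, not the actual program, and the path is an example).

    import os

    # Sketch: re-own files from uid 501 to uid 500, leaving the group alone.
    # Needs to run as root.
    def fix_owner(root, old_uid=501, new_uid=500):
        for dirpath, dirnames, filenames in os.walk(root):
            for name in dirnames + filenames:
                path = os.path.join(dirpath, name)
                st = os.lstat(path)
                if st.st_uid == old_uid:
                    os.lchown(path, new_uid, st.st_gid)

    fix_owner("/data/ds2003a")                     # example data set path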

Bethe was running a version of Fedora so old, it was from before it was called Fedora! It was Red Hat version 8 (Yarrow). So I took out the system drive and put in a replacement, on which I installed Fedora 24, because Fedora 26 needs a 64 bit CPU and bethe has a 32 bit CPU, which Fedora 24 still supports. And it's recent enough.

I also changed the system drive; there was a time when I decided to use CF cards for the operating system, but it turns out that they are painfully slow, which probably doesn't matter too much for the system files, but matters a lot for the swap space. But I didn't want to use a full-size drive for the system, because that would either take up an entire 3.5 inch drive bay, or else I could make one of the data drives also work as a system drive, and that didn't appeal because I want the data drives to be read-only.

I solved this dilemma by using one of the rather small 2.5 inch drives that I used a while ago and which have been sitting in a box ever since. They are only 20gb, but that's plenty big enough for the operating system and the swap space. And they're so small, I can slide them into a small space in the server.

But then Fedora 24 wouldn't recognise the raid system set up under Red Hat 8, and rather than spend ages trying to push a square peg into a round hole, I just reformatted the raid drives and recopied the data set onto bethe.

Still to do as of August 24. Cully needs a new CMOS battery and a replacement drive. Ellsa needs a replacement drive. Atena needs four drives to be replaced. Giggj needs one drive to be replaced.

Also, I found that I can monitor the CPU temperature, and send myself an email if it's getting too hot. I'll add that to my server-monitoring software.
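
As a sketch of what that check might look like (assuming the usual /sys/class/thermal interface and a mail server on localhost; the threshold and addresses are invented):

    import glob, smtplib
    from email.message import EmailMessage

    LIMIT = 70.0                                   # example threshold, degrees C

    # Kernel thermal zones report millidegrees C.
    temps = []
    for zone in glob.glob("/sys/class/thermal/thermal_zone*/temp"):
        with open(zone) as f:
            temps.append(int(f.read().strip()) / 1000.0)

    if temps and max(temps) > LIMIT:
        msg = EmailMessage()
        msg["Subject"] = f"CPU running hot: {max(temps):.1f} C"
        msg["From"] = "monitor@example.com"        # example addresses
        msg["To"] = "me@example.com"
        msg.set_content("Check the fans in the Data Shed.")
        with smtplib.SMTP("localhost") as s:
            s.send_message(msg)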

... later ...

I decided to put Cully and Atena into a single box called Donna, leaving out all the duff drives. I replaced the drive in Ellsa and the one in Giggj. And I replaced the motherboard in Penny. Penny had one of my very old motherboards that accepted two 433 mhz Celerons; it wouldn't boot, and I'm not too bothered about such an ancient motherboard (it's about 20 years old now, state of the art at the time, of course). And I also reloaded Bethe, another very old box with just a single data set.

All of the drives that were taken out have been retested; some are suitable for reuse, some for the recycle bin.

DtV Family web sites

Here's the full list of DtV family web sites

Back Page

I checked the site statistics that Sandra counts up each night.

At the end of August 2017, there were about 1,600,000 pictures (365 gigabytes), 548 gigabytes of video, 16720 text files (mostly stories) and a total of about 914 gigabytes. There are 488,000,000 pictures altogether in Newsthumbs, increasing at about 2 million per month.

To the Magic Carpet