Metadata files ins and outs
I'd like to share with you some of the ins and outs of the metadata files, the metalogger server and the mfsmetarestore tool. This fundamental knowledge will help you better understand the system and its behaviour, and show you how to recover from an emergency.
Information about the filesystem objects (files, directories, links, etc.) is kept in a special metadata file, called metadata.mfs. When the master server starts up it first reads this entire file into RAM and renames it to metadata.mfs.back. When the master server exits cleanly it writes out the entire metadata from the RAM back to this file and renames it to metadata.mfs.
The metadata file is read entirely into memory on master server startup for performance reasons. Under normal operation the master server may receive a large number of requests to read and write the metadata of every file that is being read from or written to. If the master server had to make a disk I/O request to look up the metadata for every request from the chunk servers and mfs mount clients, that overhead would become a performance bottleneck.
The master server process will regularly (by default every hour) flush the working metadata in RAM back to the metadata.mfs.back binary file. The file has a version number which corresponds to the last operation made in the file system.
Additionally, the master server continuously saves changes to the filesystem, as they occur, into the text file changelog.0.mfs. The changelog file is rotated every hour: changelog.0.mfs becomes changelog.1.mfs, changelog.1.mfs becomes changelog.2.mfs and so on, so each number changes from N to N+1 (similar to manually running mv changelog.N.mfs changelog.N+1.mfs).
The number of changelog files increases (i.e. a new one appears every hour) up to a configurable maximum. The default number of changelog files is 50; this can be configured with the BACK_LOGS variable in the mfsmaster.cfg file (see man mfsmaster.cfg).
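The rotation scheme can be pictured with the following bash sketch. rotate_changelogs is a hypothetical helper written only to illustrate the renaming; mfsmaster performs the real rotation internally. The sketch assumes the maximum counts every file from changelog.0.mfs upward.

```bash
#!/bin/bash
# Illustrative sketch of hourly changelog rotation (NOT MooseFS code).
# rotate_changelogs is a hypothetical helper.
#   $1 - directory holding changelog.N.mfs files
#   $2 - maximum number of changelog files to keep (like BACK_LOGS, default 50)
rotate_changelogs() {
    local dir="$1" max="${2:-50}" n
    # The oldest allowed file would be shifted past the limit, so drop it.
    rm -f "$dir/changelog.$((max - 1)).mfs"
    # Shift from the highest number down so that no file is overwritten:
    # changelog.N.mfs becomes changelog.(N+1).mfs.
    n=$((max - 2))
    while [ "$n" -ge 0 ]; do
        if [ -f "$dir/changelog.$n.mfs" ]; then
            mv "$dir/changelog.$n.mfs" "$dir/changelog.$((n + 1)).mfs"
        fi
        n=$((n - 1))
    done
    # Start a fresh, empty changelog.0.mfs for the next hour.
    : > "$dir/changelog.0.mfs"
}
```

Shifting from the highest number downwards is what keeps each mv from overwriting a file that has not been moved yet.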
Every line in a changelog file represents one change or operation to the filesystem. The line records have the following format:
change_number: timestamp|operation
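As an illustration of that record format, the bash function below splits one line into its three fields using only parameter expansion. parse_changelog_line is a hypothetical name, and the sample record in the usage note is made up rather than taken from a real changelog.

```bash
#!/bin/bash
# Split one changelog record of the form
#   change_number: timestamp|operation
# into its three fields, printed one per line.
# parse_changelog_line is a hypothetical helper relying only on that format.
parse_changelog_line() {
    local line="$1"
    local number="${line%%:*}"      # text before the first ':'
    local rest="${line#*: }"        # text after the first ': '
    local timestamp="${rest%%|*}"   # text before the first '|'
    local operation="${rest#*|}"    # everything after the first '|'
    printf '%s\n%s\n%s\n' "$number" "$timestamp" "$operation"
}
```

For example, parse_changelog_line "42: 1273833000|EXAMPLE(1,2):3" prints 42, 1273833000 and EXAMPLE(1,2):3 on separate lines.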
These changelog files, together with metadata.mfs.back, are used to recover the metadata if the master server process is killed or exits uncleanly, for example when the system crashes or after a power failure. The mfsmetarestore utility replays the changes made after the last full metadata save from RAM to metadata.mfs.back and writes the result to a new metadata.mfs file. The mfsmaster recognizes a crashed state by the absence of the metadata.mfs file and the presence of the metadata.mfs.back file, and informs you that a recovery operation is required.
Renaming the metadata file from metadata.mfs to metadata.mfs.back helps to distinguish a clean exit from an abnormal stop of the process. If the master server finds metadata.mfs on startup, the file is a correct one: it was either saved after a clean exit or properly restored by the mfsmetarestore tool. If on startup it finds only metadata.mfs.back, it won't start at all, because this means the master process was not stopped cleanly. Never start the existing system with an empty metadata.mfs file: your chunks would get erased!
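The naming convention above can be checked with a few file tests. The bash function below is a minimal sketch; metadata_state and its three result labels are hypothetical names chosen for illustration, not output produced by mfsmaster.

```bash
#!/bin/bash
# Classify a master's metadata directory by the file-naming convention:
#   clean          - metadata.mfs exists (clean exit or successful restore)
#   needs-restore  - only metadata.mfs.back exists (unclean stop)
#   no-metadata    - neither file exists; never start an existing
#                    installation with empty metadata
# metadata_state is a hypothetical helper.
metadata_state() {
    local dir="$1"
    if [ -f "$dir/metadata.mfs" ]; then
        echo clean
    elif [ -f "$dir/metadata.mfs.back" ]; then
        echo needs-restore
    else
        echo no-metadata
    fi
}
```

With the /usr/local/var/mfs data directory assumed by the sample start-up script in this post, you would call metadata_state /usr/local/var/mfs.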
The metalogger server is an optional process that you can configure and run on one or more machines in your MooseFS deployment. Its purpose is to keep a copy of the metadata, which normally lives only on the master server, in a second location (or in as many locations as there are metalogger processes running), so that the metadata survives a catastrophic failure of the machine that runs the master server. The metadata collected by a metalogger server can later be used by a master server process.
The metalogger downloads the metadata.mfs.back file from the master server on a regular basis (by default every 24 hours) and saves it as metadata_ml.mfs.back. It also continuously receives the current changes from the master server and writes them into its own text changelog, changelog_ml.0.mfs. These files are also rotated every hour, up to the configured maximum number of changelog files (see man mfsmetalogger.cfg).
If the master server process was stopped abnormally, you will need to use the mfsmetarestore utility to repair the metadata. mfsmetarestore reads the binary file metadata.mfs.back and tries to merge into it all records from the changelog files (changelog.N.mfs) whose change number is higher than the version of metadata.mfs.back. If there are no errors during merging, the tool saves the fixed file as metadata.mfs with the version number of the last change, at which point the master server may be started.
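To make the merge rule concrete, the sketch below only selects the records that, per the rule just described, would still have to be replayed: lines whose change number is higher than the version of metadata.mfs.back. changes_after is a hypothetical helper; it does not rebuild any metadata, and obtaining the version number is left outside this sketch.

```bash
#!/bin/bash
# Print the changelog records whose change number is higher than a given
# metadata version (NOT a replacement for mfsmetarestore).
# changes_after is a hypothetical helper.
#   $1    - version number of metadata.mfs.back
#   $2... - changelog files, passed in chronological order
changes_after() {
    local version="$1"
    shift
    # The change number is the field before the first ':' on each line;
    # a pattern with no action makes awk print the matching lines.
    awk -F: -v v="$version" '($1 + 0) > (v + 0)' "$@"
}
```

For example, with a metadata version of 9, records 8 and 9 are skipped and records 10 onwards are printed.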
In typical cases it is sufficient to merge changes only from changelog.0.mfs, as the master server dumps metadata.mfs.back every hour and changelog.0.mfs will contain changes from the last hour. In some "boundary" cases it is also necessary to use changelog.1.mfs. We keep many more changelog files around for convenience, in case it is ever required to audit what has happened to the file system in recent history (e.g. a day before).
When you pass changelog filenames as parameters to the mfsmetarestore command line tool, you must pass them in chronological order. We therefore recommend using mfsmetarestore -a, which automatically finds the changelog files and applies them in the proper order.
When the master server goes down without a hard drive failure (for example, during a power outage), it is sufficient to run mfsmetarestore -a on the master server and start the master server process again.
When you encounter a more severe problem, such as a hardware failure or the complete loss of the master server, or when for whatever reason it is impossible to start the master server at all, you will need to obtain the metadata file from a metalogger machine. To do this, run mfsmetarestore -a on a metalogger machine to merge metadata_ml.mfs.back with the changelog_ml.N.mfs files and produce a proper metadata.mfs file. Once you have a fixed metadata.mfs file, you can either copy it to the new master machine or start mfsmaster on the metalogger machine, which then takes over the tasks of the master server.
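If you ever do pass changelog files by hand, remember that rotation renames N to N+1, so a higher N means an older file and chronological order is descending N. A plain lexical sort gets this wrong, for instance by placing changelog.10.mfs next to changelog.1.mfs. The bash sketch below lists the files oldest first; changelogs_in_order is a hypothetical helper, and mfsmetarestore -a already handles the ordering for you.

```bash
#!/bin/bash
# List changelog.N.mfs files in chronological order (oldest first),
# i.e. sorted by the numeric index N in descending order.
# changelogs_in_order is a hypothetical helper.
changelogs_in_order() {
    local dir="$1" f n
    for f in "$dir"/changelog.*.mfs; do
        [ -e "$f" ] || continue   # the glob matched nothing
        n="${f##*/changelog.}"    # strip the directory and prefix
        n="${n%.mfs}"             # strip the suffix, leaving N
        printf '%s %s\n' "$n" "$f"
    done | sort -k1,1nr | cut -d' ' -f2-
}
```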
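Before running mfsmetarestore -a on a metalogger machine, it may be worth a quick sanity check that the metalogger has actually collected something to restore from. The bash sketch below fails if metadata_ml.mfs.back is missing and otherwise prints how many changelog_ml.N.mfs files it found. metalogger_ready is a hypothetical helper, and the directory you pass is an assumption about where your metalogger keeps its data.

```bash
#!/bin/bash
# Sanity check of a metalogger data directory before a restore.
# Returns 1 if metadata_ml.mfs.back is missing; otherwise prints the
# number of changelog_ml.N.mfs files present.
# metalogger_ready is a hypothetical helper.
metalogger_ready() {
    local dir="$1" count=0 f
    if [ ! -f "$dir/metadata_ml.mfs.back" ]; then
        echo "missing $dir/metadata_ml.mfs.back" >&2
        return 1
    fi
    for f in "$dir"/changelog_ml.*.mfs; do
        # An unmatched glob stays literal and fails this test.
        if [ -f "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```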
Note, however, that all of the chunk servers and mfs mount clients may need their configuration updated to refer to the new IP address or host name, or you may need to update DNS so that the mfsmaster host name now points to the IP address of the new master server machine. If you'd like to avoid this situation, you can configure and use CARP, which allows two machines in one LAN to share the same IP address. You can get more information on this Mini-HOWTO page: How to prepare a fail proof solution with a redundant master?
You may find a file named metadata.mfs.emergency in the metadata directory. This file is created only when it was impossible to save the metadata.mfs.back file, probably because of a write error. In such a situation, first make sure there is free space available on the disk and back up all of the metadata files. Then delete metadata.mfs.back, rename metadata.mfs.emergency to metadata.mfs.back and run mfsmetarestore -a.
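The file shuffling in the emergency procedure can be sketched in bash as follows. promote_emergency is a hypothetical helper: it backs up every *.mfs* file, then replaces metadata.mfs.back with the emergency copy. It deliberately stops before mfsmetarestore -a, which you should run by hand and watch for errors, and it leaves the free-space check to you.

```bash
#!/bin/bash
# Back up the metadata files, then make metadata.mfs.emergency the new
# metadata.mfs.back. Does NOT run mfsmetarestore.
# promote_emergency is a hypothetical helper.
#   $1 - metadata directory
#   $2 - backup directory (created if missing)
promote_emergency() {
    local dir="$1" backup="$2"
    if [ ! -f "$dir/metadata.mfs.emergency" ]; then
        echo "no metadata.mfs.emergency in $dir" >&2
        return 1
    fi
    mkdir -p "$backup" || return 1
    # Back up all metadata and changelog files before touching anything.
    cp -p "$dir"/*.mfs* "$backup"/ || return 1
    rm -f "$dir/metadata.mfs.back"
    mv "$dir/metadata.mfs.emergency" "$dir/metadata.mfs.back"
}
```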
While it is possible to create a script to perform the metadata restore operation, it is recommended that the metadata restoring commands be issued manually to observe any errors that might occur during the fixing operation.
Here is a sample BASH shell script to start the master server. It first invokes mfsmetarestore -a if there is a metadata.mfs.back file but no metadata.mfs file, i.e. when a recovery is required, and then starts the master server process.
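#!/bin/bash
# Start mfsmaster, first running mfsmetarestore -a if the previous run
# ended uncleanly (metadata.mfs.back present, metadata.mfs missing).
set -x
PREFIX=/usr/local
DATADIR="$PREFIX/var/mfs"
if [ -f "$DATADIR/metadata.mfs.back" ] && [ ! -f "$DATADIR/metadata.mfs" ]; then
    # Merge metadata.mfs.back with the changelogs into a new metadata.mfs.
    if ! "$PREFIX/sbin/mfsmetarestore" -a; then
        echo "FAILED to invoke the mfsmetarestore operation, check logs." >&2
        echo "Unable to start mfsmaster service." >&2
        exit 1
    fi
fi
"$PREFIX/sbin/mfsmaster" start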
If you have any questions, please share them in the comments below.
2010-05-14 12:30
Michał Borychowski
zheng | 2010-07-09
Question:
When the master server is dumping metadata.mfs.back, how is performance affected?