% -*- latex -*-
% $Id: raid6.latex,v 1.29 2011/12/21 00:13:42 hpa Exp $
% amssymb required for modern notation for the standard fields, like hollow-Z
\documentclass{article}
\usepackage{amssymb}
\usepackage[margin=1.5in]{geometry}
\def\g#1{{\tt \{#1\}}}
\def\GF{{\bf GF}}
\def\A{{\bf A}}
\def\B{{\bf B}}
\def\C{{\bf C}}
\def\P{{\bf P}}
\def\Pstar{{\bf P}^\star}
\def\Q{{\bf Q}}
\def\Qstar{{\bf Q}^\star}
\def\D{{\bf D}}
\def\V{{\bf V}}
\def\X{{\bf X}}
\def\z{{\bf z}}
\def\email#1{$<${\it #1}$>$}
\def\url#1{\tt #1}
\title{The mathematics of RAID-6}
\author{H. Peter Anvin \email{hpa@zytor.com}}
\date{First version 20 January 2004 \\ Last updated 20 December 2011}
\begin{document}
\maketitle
RAID-6 supports losing any two drives. This is done by
computing two syndromes, generally referred to as $\P$ and $\Q$.
\section{A quick summary of Galois field algebra}
The algebra used for this is the algebra of a Galois field, $\GF(2^8)$.
A smaller or larger field could also be used; however, a smaller field
would limit the number of drives possible, and a larger field would
require extremely large tables.
$\GF(2^8)$ allows for a maximum of 257 drives, 255 ($2^8-1$) of which
can be data drives; the reason for this is shown below.
The {\em representation} of $\GF(2^8)$ is called a {\em cyclic}
representation. Earlier versions of this paper incorrectly stated
that it ``is the same one as used by the Rijndael (AES)
cryptosystem,'' however, that is not correct.\footnote{\g{02} is not a
generator of the AES representation of $\GF(2^8)$, and as such it is
not suitable.}
It has the following properties. This is not an exhaustive list, nor
a formal derivation of these properties; for more in-depth coverage
see any textbook on group and ring theory.
Note: A number in $\g{}$ is a Galois field element (i.e. a byte) in
hexadecimal representation; a number without $\g{}$ is a conventional
integer.
\begin{enumerate}
\item
The {\em addition} field operator ($+$) is represented by bitwise XOR.
\item
As a result, addition and subtraction are the same operation: $A + B = A - B$.
\item
The additive identity element ($0$) is represented by $\g{00}$.
\item
Thus, $A + A = A - A = \g{00}$.
\item
{\em Multiplication} ($\cdot$) by $\g{02}$ is implemented by the following
bitwise relations:
$$
\begin{array}{rcl}
(x \cdot \g{02})_7 & = & x_6 \\
(x \cdot \g{02})_6 & = & x_5 \\
(x \cdot \g{02})_5 & = & x_4 \\
(x \cdot \g{02})_4 & = & x_3 + x_7 \\
(x \cdot \g{02})_3 & = & x_2 + x_7 \\
(x \cdot \g{02})_2 & = & x_1 + x_7 \\
(x \cdot \g{02})_1 & = & x_0 \\
(x \cdot \g{02})_0 & = & x_7
\end{array}
$$
Hardware engineers will recognize this as a linear feedback shift
register (LFSR), and mathematicians as boolean polynomial
multiplication modulo the irreducible polynomial $x^8 + x^4 + x^3 +
x^2 + 1$.
\item
The multiplicative identity element ($1$) is represented by $\g{01}$.
$$
A \cdot \g{01} = \g{01} \cdot A = A
$$
\item
The following basic rules of algebra apply:
$$
\begin{array}{lrcl}
\hbox{Addition is commutative:} & A+B & = & B+A \\
\hbox{Addition is associative:} & (A+B)+C & = & A+(B+C) \\
\hbox{Multiplication is commutative:} & A \cdot B & = & B \cdot A \\
\hbox{Multiplication is associative:} & (A \cdot B) \cdot C & = & A
\cdot (B \cdot C) \\
\hbox{Distributive law:} & (A+B)\cdot C & = & A \cdot C + B \cdot C
\end{array}
$$
\item
For every $A \not = \g{00}$, there exists an element $A^{-1}$ such
that $A \cdot A^{-1} = \g{01}$. $A^{-1}$ is called the {\em inverse}
(or {\em reciprocal}) of $A$. $\g{01}$ is
its own inverse, $\g{00}$ lacks an inverse, and for all other $A$,
$A^{-1} \not = A$.
\item
Division is defined as multiplication with an inverse:
$$
A / B = A \cdot B^{-1}
$$
Division by any nonzero element is well defined and unique:
If $A \cdot B = C$ then $C/B = A$ for any $B \not = \g{00}$.
In particular, $A/A = A \cdot A^{-1} = \g{01}$ for any $A \not = \g{00}$.
\item
Multiplying by zero is zero:
$$
A \cdot \g{00} = \g{00}
$$
\item
Any value can be multiplied by observing that bits decompose the same
as in ordinary arithmetic, and applying the distributive law:
$$
\begin{array}{rccccl}
\g{02}^2 & = & \g{02}\cdot\g{02} & = & \g{04} \\
\g{02}^3 & = & \g{04}\cdot\g{02} & = & \g{08} \\
\g{02}^4 & = & \g{08}\cdot\g{02} & = & \g{10} \\
\g{02}^5 & = & \g{10}\cdot\g{02} & = & \g{20} \\
\g{02}^6 & = & \g{20}\cdot\g{02} & = & \g{40} \\
\g{02}^7 & = & \g{40}\cdot\g{02} & = & \g{80}
\end{array}
$$
(Note, however: $\g{02}^8 = \g{1d}$.)
For example:
$$
\begin{array}{rcl}
\g{8d} & = & \g{80} + \g{08} + \g{04} + \g {01} \\
& = & \g{02}^7 + \g{02}^3 + \g{02}^2 + \g{01}
\end{array}
$$
Thus:
$$
A \cdot \g{8d} = A \cdot \g{02}^7 + A \cdot \g{02}^3 + A \cdot \g{02}^2 + A
$$
or, equivalently,
$$
A \cdot \g{8d} = (((A \cdot \g{02}^4)+A)\cdot\g{02}+A)\cdot\g{02}^2 + A
$$
\item
{\em Raising to a power} (repeated multiplication with the same value)
is congruent mod 255 (cardinality of all elements except
$\g{00}$). Also note that the exponent is an
{\em ordinary integer modulo 255} (an element in ${\mathbb Z}_{255}$)
as opposed to a Galois field element.
$$
\left .
\begin{array}{rclclcl}
A^{256} & = & \g{01} \cdot A & = & A\\
A^{255} & = & \g{01} \\
A^{254} & = & A^{255}/A & = & \g{01}/A & = & A^{-1}
\end{array}
\right \} A \not = \g{00}
$$
\item
There are elements ($g$), called {\em generators}, of the field such
that the powers $g^n$ do not repeat until they have exhausted all
elements of the field except $\g{00}$. For the Linux RAID-6 field
representation, $\g{02}$ is such a generator -- as is $\g{02}^n$ for
any $n$ which is relatively prime to 255.
\item
Accordingly, any generator $g$ defines a function from the nonzero
elements in $\GF(2^8)$ to the elements in ${\mathbb Z}_{255}$
(the integers 0-254 modulo 255) called the {\em logarithm with
base $g$} and written $\log_g$. For example, $\g{02}^4 = \g{10}$, so
$\log_\g{02} \g{10} = 4$.
\item
For any nonzero Galois field elements $A$ and $B$:
$$
\begin{array}{rcl}
A \cdot B = C & \Longleftrightarrow & \log_g A \oplus \log_g B =
\log_g C \\
A / B = C & \Longleftrightarrow & \log_g A \ominus \log_g B =
\log_g C
\end{array}
$$
... where $\oplus$ and $\ominus$ represent conventional integer
addition and subtraction modulo 255. Therefore:
$$
\begin{array}{rcl}
A \cdot B = C & \Longleftrightarrow & C = g^{(\log_g A \oplus \log_g
B)} \\
A / B = C & \Longleftrightarrow & C = g^{(\log_g A \ominus \log_g B)}
\end{array}
$$
These relations can be used to do multiplication and division without
large tables, as long as $\g{00}$ is handled specially.
\end{enumerate}
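The logarithm technique above can be sketched in C. The following is
an illustrative fragment under assumed names ({\tt gf\_init}, {\tt
gf\_mul}, {\tt gf\_div} are hypothetical, not the Linux kernel
implementation): the tables are generated by repeated multiplication
by the generator $\g{02}$, after which multiplication and division
reduce to integer arithmetic modulo 255, with $\g{00}$ handled
specially.

```c
#include <stdint.h>

static uint8_t gf_exp[255];   /* gf_exp[i] = {02}^i                   */
static uint8_t gf_log[256];   /* gf_log[{02}^i] = i; gf_log[0] unused */

static void gf_init(void)
{
    uint8_t x = 0x01;
    for (int i = 0; i < 255; i++) {
        gf_exp[i] = x;
        gf_log[x] = (uint8_t)i;
        /* x <- x * {02}: the LFSR step modulo x^8+x^4+x^3+x^2+1 */
        x = (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1d : 0));
    }
}

static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    if (a == 0 || b == 0)             /* {00} has no logarithm */
        return 0;
    return gf_exp[(gf_log[a] + gf_log[b]) % 255];
}

static uint8_t gf_div(uint8_t a, uint8_t b)   /* b must be nonzero */
{
    if (a == 0)
        return 0;
    return gf_exp[(gf_log[a] + 255 - gf_log[b]) % 255];
}
```

At the cost of roughly 512 bytes of tables, an arbitrary multiply or
divide becomes two lookups plus a modular addition or subtraction.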
\section{Application to RAID-6}
We treat each disk block as a vector of bytes, and will perform the
same calculations on each byte in the vector. Symbols in {\bf
boldface} represent vectors (where each byte has a different value);
constants and symbols in {\it italics} represent scalars (same value
across every data byte.)
In order to be able to suffer the loss of any two disks, we need to
compute two {\em syndromes}, here referred to as $\P$ and $\Q$.
For $n$ data disks $\D_0$, $\D_1$, $\D_2$, ... $\D_{n-1}$ $(n \leq
255)$ compute:
\begin{eqnarray}
\P & = & \D_0 + \D_1 + \D_2 + ... + \D_{n-1} \label{pdef} \\
\Q & = & g^0 \cdot \D_0 + g^1 \cdot \D_1 + g^2 \cdot \D_2 +
... + g^{n-1} \cdot \D_{n-1} \label{qdef}
\end{eqnarray}
where $g$ is any generator of the field (we use $g = \g{02}$.)
$\P$ is the ordinary XOR parity, since ``addition'' is XOR. $\Q$ is
referred to as a Reed-Solomon code.
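As a minimal sketch of this computation (the function names here are
illustrative, not the kernel's), $\P$ and $\Q$ for a single byte
position can be accumulated in one pass, iterating from the
highest-numbered disk down so that each step multiplies the running
$\Q$ by $g = \g{02}$:

```c
#include <stdint.h>

/* Multiply by the generator g = {02}: the LFSR relation. */
static uint8_t mul2(uint8_t x)
{
    return (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1d : 0));
}

/* One byte position across n data disks:
 * P = D[0] + D[1] + ... + D[n-1]                  (XOR parity)
 * Q = g^0*D[0] + g^1*D[1] + ... + g^(n-1)*D[n-1]
 * Q is accumulated Horner-style, so only multiply-by-g is needed. */
static void syndromes(const uint8_t *d, int n, uint8_t *p, uint8_t *q)
{
    uint8_t pp = 0, qq = 0;
    for (int i = n - 1; i >= 0; i--) {
        pp ^= d[i];
        qq = mul2(qq) ^ d[i];
    }
    *p = pp;
    *q = qq;
}
```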
If we lose one data drive, we can use the normal XOR parity to recover
the failed drive data, just as we would do for RAID-5. If we lose a
non-data drive, i.e. $\P$ or $\Q$, then we can just recompute.
If we lose one data drive plus the $\Q$ drive, we can recalculate the
data drive using the XOR parity, and then recompute the $\Q$ drive.
If we lose one data drive plus the $\P$ drive, we can recompute the
lost data drive ($\D_x$) from the $\Q$ drive by computing $\Q_x$ as if
$\D_x = \g{00}$, and observing:
\begin{equation}
\Q_x + g^x \cdot \D_x = \Q
\end{equation}
Here, $x$, $\Q$ and $\Q_x$ are known. Since addition and
subtraction are the same operation:
\begin{equation}
g^x \cdot \D_x = \Q + \Q_x
\end{equation}
\begin{equation}
\D_x = (\Q + \Q_x)/g^x = (\Q + \Q_x) \cdot g^{-x} \label{recdp}
\end{equation}
where, per the algebra rules, $g^{-x} = g^{255-x}$.
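This recovery step can be sketched in C as follows; this is an
illustrative fragment (helper names assumed, not the kernel's), using
a bit-by-bit multiply rather than tables to stay self-contained:

```c
#include <stdint.h>

static uint8_t mul2(uint8_t x)
{
    return (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1d : 0));
}

/* Bit-by-bit GF(2^8) multiply: add (XOR) shifted copies of a. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;
        a = mul2(a);
        b >>= 1;
    }
    return p;
}

/* g^e for e >= 0, by repeated multiplication by {02}. */
static uint8_t gf_pow2(int e)
{
    uint8_t r = 0x01;
    for (e %= 255; e > 0; e--)
        r = mul2(r);
    return r;
}

/* D[x] = (Q + Qx) * g^{-x}, with g^{-x} = g^{255-x}. */
static uint8_t recover_dx(uint8_t q, uint8_t qx, int x)
{
    return gf_mul((uint8_t)(q ^ qx), gf_pow2(255 - x));
}
```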
If we lose two data drives, $\D_x$ and $\D_y$, but still have the $\P$
and $\Q$ values, we compute $\P_{xy}$ and $\Q_{xy}$ by setting the
missing drives to $\g{00}$, and we get:
\begin{eqnarray}
\P_{xy} + \D_x + \D_y & = & \P \\
\Q_{xy} + g^x \cdot \D_x + g^y \cdot \D_y & = & \Q
\end{eqnarray}
$x$, $y$, $\P$, $\P_{xy}$, $\Q$ and $\Q_{xy}$ are known.
Divide the second equation by $g^x$:
\begin{equation}
g^{-x} \cdot \Q_{xy} + \D_x + g^{y-x} \cdot \D_y =
g^{-x}\cdot\Q
\end{equation}
Remembering that addition equals subtraction in this algebra:
\begin{equation}
\D_x + g^{y-x}\cdot\D_y = g^{-x}\cdot\Q +
g^{-x}\cdot\Q_{xy}
\end{equation}
\begin{equation}
\D_x = g^{-x} \cdot (\Q + \Q_{xy}) + g^{y-x} \cdot \D_y
\end{equation}
Substitute into the first equation, solve for $\D_y$:
\begin{equation}
\D_y = \P + \P_{xy} + \D_x
\end{equation}
\begin{equation}
\D_x = g^{-x}\cdot(\Q + \Q_{xy}) + g^{y-x}\cdot(\P + \P_{xy} + \D_x)
\end{equation}
\begin{equation}
\D_x = g^{-x}\cdot(\Q + \Q_{xy}) + g^{y-x}\cdot(\P + \P_{xy}) +
g^{y-x} \cdot \D_x
\end{equation}
\begin{equation}
\D_x + g^{y-x}\cdot\D_x = g^{-x}\cdot(\Q + \Q_{xy}) + g^{y-x}\cdot(\P + \P_{xy})
\end{equation}
\begin{equation}
(g^{y-x}+\g{01})\cdot\D_x = g^{-x}\cdot(\Q + \Q_{xy}) + g^{y-x}\cdot(\P + \P_{xy})
\end{equation}
If $g^{y-x}+\g{01} \not = \g{00}$, we can divide by it. This requires
$g^{y-x} \not = \g{01}$; this will be true as long as $y \not = x$,
mod 255. Since we can have no more than 255 data disks, $0 \leq x,y
\leq n-1 < 255$, this implies the only constraint is $y \not = x$,
which is true by assumption. Thus, we can divide:
\begin{equation}
\D_x = {{ g^{-x}\cdot(\Q + \Q_{xy}) + g^{y-x}\cdot(\P + \P_{xy}) }
\over {g^{y-x}+\g{01}}}
\end{equation}
For any particular data reconstruction, we can simplify this by
precomputing a few multiplication tables:
\begin{eqnarray}
A & = & { g^{y-x} \over g^{y-x}+\g{01} } = g^{y-x} \cdot (g^{y-x}+\g{01})^{-1} \\
B & = & { g^{-x} \over g^{y-x}+\g{01} } = g^{-x} \cdot (g^{y-x}+\g{01})^{-1}
\end{eqnarray}
... which only depend on $x$ and $y$, rather than on the data bytes.
The expression then becomes:
\begin{equation}
\D_x = A \cdot(\P + \P_{xy}) + B \cdot(\Q + \Q_{xy}) \label{recdd}
\end{equation}
We can then get $\D_y$ from the previous expression:
\begin{equation}
\D_y = (\P + \P_{xy}) + \D_x
\end{equation}
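Putting the two-drive recovery together in C gives a sketch like the
following (illustrative names, not the kernel implementation); the
constants $A$ and $B$ are computed once from $x$ and $y$ and then
applied to every byte position:

```c
#include <stdint.h>

static uint8_t gf_exp[255], gf_log[256];

static void gf_init(void)
{
    uint8_t x = 0x01;
    for (int i = 0; i < 255; i++) {
        gf_exp[i] = x;
        gf_log[x] = (uint8_t)i;
        x = (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1d : 0));
    }
}

static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    if (a == 0 || b == 0)
        return 0;
    return gf_exp[(gf_log[a] + gf_log[b]) % 255];
}

static uint8_t gf_inv(uint8_t a)          /* a must be nonzero */
{
    return gf_exp[(255 - gf_log[a]) % 255];
}

/* Recover D[x] and D[y] (x != y) from P, Q and the partial syndromes
 * Pxy, Qxy, computed with both missing drives set to {00}:
 *   A = g^{y-x} / (g^{y-x} + {01}),  B = g^{-x} / (g^{y-x} + {01})
 *   D[x] = A*(P + Pxy) + B*(Q + Qxy);  D[y] = (P + Pxy) + D[x]     */
static void recover_two(uint8_t p, uint8_t q, uint8_t pxy, uint8_t qxy,
                        int x, int y, uint8_t *dx, uint8_t *dy)
{
    uint8_t gyx   = gf_exp[(255 + y - x) % 255];   /* g^{y-x} */
    uint8_t gmx   = gf_exp[(255 - x) % 255];       /* g^{-x}  */
    uint8_t denom = gf_inv((uint8_t)(gyx ^ 0x01)); /* nonzero: x != y */
    uint8_t a = gf_mul(gyx, denom);
    uint8_t b = gf_mul(gmx, denom);

    *dx = (uint8_t)(gf_mul(a, (uint8_t)(p ^ pxy)) ^
                    gf_mul(b, (uint8_t)(q ^ qxy)));
    *dy = (uint8_t)((p ^ pxy) ^ *dx);
}
```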
\section{Making it go fast}
The biggest problem with RAID-6 has historically been the high CPU
cost of computing the $\Q$ syndrome. The biggest cost is related to
the cost of Galois field multiplication, which doesn't map
conveniently onto standard CPU hardware, and therefore has typically
been done by table lookup.
Table lookups, however, are inherently serializing; it would be
desirable to make use of the wide datapaths of current CPUs.
In order to do this, we factor equation \ref{qdef} as follows:
\begin{equation}
\Q = (\dots((\D_{n-1}\cdot g + \D_{n-2})\cdot g + \D_{n-3})\cdot g +
\dots + \D_1)\cdot g + \D_0
\end{equation}
The only operations in this are addition, i.e. XOR, and multiplication
by $g = \g{02}$. Thus, we only need an efficient way to implement
multiplication by $\g{02}$ in order to compute $\Q$ quickly, not
arbitrary multiplication.
Multiplication by $\g{02}$ for a single byte can be implemented using
the C code:
\begin{verbatim}
uint8_t c, cc;
cc = (c << 1) ^ ((c & 0x80) ? 0x1d : 0);
\end{verbatim}
Now, we want to do this on multiple bytes in parallel. Assume for the
moment we are on a 32-bit machine (the extension to 64 bits should be
obvious), and separate these into two parts:
\begin{verbatim}
uint32_t v, vv;
vv = (v << 1) & 0xfefefefe;
vv ^= ((v & 0x00000080) ? 0x0000001d : 0) +
((v & 0x00008000) ? 0x00001d00 : 0) +
((v & 0x00800000) ? 0x001d0000 : 0) +
((v & 0x80000000) ? 0x1d000000 : 0);
\end{verbatim}
The {\tt 0xfefefefe} mask in the first statement removes any bits that
would otherwise be shifted into the next byte. The second statement,
however, is clearly too complex to be executed efficiently. If we can
produce a mask based on the top bit in each byte, we could just do:
\begin{verbatim}
uint32_t v, vv;
vv = (v << 1) & 0xfefefefe;
vv ^= MASK(v) & 0x1d1d1d1d;
\end{verbatim}
In standard portable C, one implementation of this {\tt MASK()}
function looks like:
\begin{verbatim}
uint32_t MASK(uint32_t v)
{
v &= 0x80808080; /* Extract the top bits */
return (v << 1) - (v >> 7); /* Overflow on the top bit is OK */
}
\end{verbatim}
The result is {\tt 0x00} for any byte with the top bit clear, {\tt
0xff} for any byte with the top bit set. This is the algorithm used
in the file {\tt raid6int.uc}.
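Putting the pieces together, the parallel multiply-by-$\g{02}$ can be
packaged as a single function. This sketch merely restates the
operations above (the function names are illustrative) and can be
checked byte by byte against the scalar code:

```c
#include <stdint.h>

/* Scalar reference: multiply one byte by {02}. */
static uint8_t mul2(uint8_t c)
{
    return (uint8_t)((c << 1) ^ ((c & 0x80) ? 0x1d : 0));
}

/* Multiply four bytes by {02} at once.  In the MASK() step, the
 * borrow from the subtraction cancels the bit shifted out of each
 * byte, leaving 0x00 or 0xff per byte. */
static uint32_t mul2x4(uint32_t v)
{
    uint32_t m = v & 0x80808080u;       /* top bit of each byte       */
    m = (m << 1) - (m >> 7);            /* MASK(): 0x00/0xff per byte */
    return ((v << 1) & 0xfefefefeu) ^ (m & 0x1d1d1d1du);
}
```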
For additional speed improvements, it is desirable to use any integer
vector instruction set that happens to be available on the machine,
such as MMX or SSE-2 on x86, AltiVec on PowerPC, etc. These
instruction sets typically have quirks that may make them easier or
harder to use than the integer implementation, but usually easier.
For example, the MMX/SSE-2 instruction {\tt PCMPGTB} conveniently
implements the {\tt MASK()} function when comparing against zero, and
the {\tt PADDB} instruction implements the shift and mask in the first
line of the operations on {\tt vv} when the value is added to itself.
Note that none of this will avoid the arbitrary multiplications of
equations \ref{recdp} and \ref{recdd}. Thus, in 2-disk-degraded mode,
performance will be very slow. However, this is expected to be a rare
occurrence, and performance should not matter significantly in that
case.
\subsection{Special notes on PowerPC AltiVec and x86 SSSE3}
The Altivec SIMD vector instruction set for PowerPC has a special
instruction, {\tt vperm}, which does a parallel table lookup using the
bottom five bits of each byte in a vector.
This can be used to handle arbitrary scalar\ $\cdot$\ vector
multiplication (as in equations \ref{recdp} and \ref{recdd}) quickly,
by decomposing the vector.
This decomposition is simply a matter of observing that, from the
distributive law:
$$
\begin{array}{rcl}
\V & = & \V_a + \V_b \\
A \cdot \V & = & A \cdot \V_a + A \cdot \V_b
\end{array}
$$
For the decomposition to work, there can only be 32 possible values
for each byte in $\V_a$ or $\V_b$; the easiest such decomposition is
simply $\V_a$ being the low four bits and $\V_b$ being the high four
bits; since addition is XOR this is a valid decomposition, and there
are only 16 possible values of each.
Thus, for each multiplication (i.e. value of $A$) we need to set up a
pair of vector registers, one which contains ($A\cdot\g{00}$,
$A\cdot\g{01}$, $A\cdot\g{02}$, ... $A\cdot\g{0f}$) and one which
contains ($A\cdot\g{00}$, $A\cdot\g{10}$, $A\cdot\g{20}$,
... $A\cdot\g{f0}$).
If these vectors are in {\tt v12} and {\tt v13} respectively, and {\tt
v14} set up to contain (\g{04}, \g{04}, ...), we can compute
${\tt v1} \leftarrow A\cdot{\tt v0}$ this way:
\begin{verbatim}
vsrb v1, v0, v14
vperm v2, v12, v12, v0
vperm v1, v13, v13, v1
vxor v1, v2, v1
\end{verbatim}
On most Altivec processors, this will execute in three cycles. Note
that we don't actually need to mask the top bits for the first
{\tt vperm}; since we repeat {\tt v12} twice we effectively ignore
bit\ 4, and bits\ 5-7 are ignored by the hardware anyway.
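In portable C, the same table decomposition looks like this -- a
scalar sketch (with assumed, illustrative names) of what {\tt vperm}
or {\tt PSHUFB} does sixteen bytes at a time:

```c
#include <stdint.h>

/* Bit-by-bit GF(2^8) multiply, used only to fill the tables. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0));
        b >>= 1;
    }
    return p;
}

/* For a fixed scalar A, build lo[i] = A*{0i} and hi[i] = A*{i0};
 * by the distributive law, A*v = lo[low nibble] + hi[high nibble]. */
static void build_nibble_tables(uint8_t a, uint8_t lo[16], uint8_t hi[16])
{
    for (int i = 0; i < 16; i++) {
        lo[i] = gf_mul(a, (uint8_t)i);
        hi[i] = gf_mul(a, (uint8_t)(i << 4));
    }
}

static uint8_t mul_by_tables(const uint8_t lo[16], const uint8_t hi[16],
                             uint8_t v)
{
    return (uint8_t)(lo[v & 0x0f] ^ hi[v >> 4]);
}
```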
The SSSE3 (Supplemental SSE3) extensions to the x86 instruction set
include a {\tt PSHUFB} instruction, which can be used in a similar
way. {\tt PSHUFB} uses bit\ 7 as a control bit, which means that the
lower half operation has to be masked; simply replicating the inputs
will not help. Furthermore, since no {\tt PSRAB} instruction exists,
one also has to mask the high half. Thus, as above, with {\tt xmm14}
having the scalar constant \g{0f}:
\begin{verbatim}
movdqa xmm2, xmm0
psraw xmm0, 4
movdqa xmm1, xmm12
movdqa xmm3, xmm13
pand xmm0, xmm14
pand xmm2, xmm14
pshufb xmm1, xmm0
pshufb xmm3, xmm2
pxor xmm1, xmm3
\end{verbatim}
\section{Single-disk corruption recovery}
It is possible to use the RAID-6 syndrome set to recover from a single
disk {\em corruption}, as opposed to one or two known failed drives
(called {\em erasures}.)
This requires recomputation of the syndrome set on read. This can of
course also be done as a periodic integrity check, or as recovery if
corruption is known or suspected.
To consider the case of a single corrupt disk, we first consider the
case where the failed disk ($z$) is one of the data drives ($\D_z$).
We will represent the corrupt data on that drive with $\X_z$.
Obviously, the value $z$ is unknown, although of course, by definition,
$0 \leq z < n \leq 255$.
We compute the standard syndrome set over the corrupt disk set:
\begin{eqnarray}
\P' & = & \D_0 + \D_1 + \dots + \X_z + \dots + \D_{n-1} \\
\Q' & = & g^0\cdot\D_0 + g^1\cdot\D_1 + \dots + g^z\cdot\X_z + \dots + g^{n-1}\cdot\D_{n-1}
\end{eqnarray}
It obviously follows that:
\begin{eqnarray}
\Pstar & = & \P + \P' = \D_z + \X_z \\
\Qstar & = & \Q + \Q' = g^z\cdot\D_z + g^z\cdot\X_z = g^z\cdot(\D_z + \X_z) =
g^z\cdot\P^\star
\end{eqnarray}
By assumption, $\X_z \not= \D_z$ and thus $\Pstar \not= \g{00}$.
Furthermore, since $g^z \not= \g{00}$ for any $z$, $\Qstar \not= \g{00}$.
Thus it is valid to state:
\begin{equation}
\Qstar / \Pstar = g^z
\end{equation}
Since $0 \leq z < n \leq 255$, it then follows:
\begin{equation}
z = \log_g (\Qstar / \Pstar) = \log_g \Qstar \ominus \log_g \Pstar \label{cdrive}
\end{equation}
... which will be a well-defined relation for all possible values that
fit the required assumptions.
As noted above, for the case of a corrupt data drive, $\Pstar \not=
\g{00}$, and $\Qstar \not= \g{00}$. The {\em other} possible cases
can be trivially shown to result in various combinations which involve
$\Pstar$ and/or $\Qstar$ being zero:
\begin{center}
\def\tmp#1{\multicolumn{1}{|l|}{#1}}
\begin{tabular}{l|c|c|} \cline{2-3}
& $\Pstar$ & $\Qstar$ \\ \hline
\tmp{No corruption} & $= \g{00}$ & $= \g{00}$ \\ \hline
\tmp{$\P$ drive corruption} & $\not= \g{00}$ & $= \g{00}$ \\ \hline
\tmp{$\Q$ drive corruption} & $= \g{00}$ & $\not= \g{00}$ \\ \hline
\tmp{Data drive corruption} & $\not= \g{00}$ & $\not= \g{00}$ \\ \hline
\end{tabular}
\end{center}
or, equivalently:
\begin{center}
\def\tmp#1{\multicolumn{1}{|l|}{#1}}
\begin{tabular}{l|c|c|} \cline{2-3}
& $\P'$ & $\Q'$ \\ \hline
\tmp{No corruption} & $\P = \P'$ & $\Q = \Q'$ \\ \hline
\tmp{$\P$ drive corruption} & $\P \not= \P'$ & $\Q = \Q'$ \\ \hline
\tmp{$\Q$ drive corruption} & $\P = \P'$ & $\Q \not= \Q'$ \\ \hline
\tmp{Data drive corruption} & $\P \not= \P'$ & $\Q \not= \Q'$ \\ \hline
\end{tabular}
\end{center}
Obviously, for the cases of $\P$ or $\Q$ drive corruption, just
replace the corrupt data with the recomputed $\P'$ or $\Q'$,
respectively. In the case of data drive corruption, once the faulty
drive has been identified, recover using the $\P$ drive in the same
way as a one-disk erasure failure.
It should be noted that although we have used scalar notation for the
corrupt drive, data corruption is actually detected on a {\em byte by
byte} basis. Thus, the zeroness tests should be done for each byte,
and $z$ in equation \ref{cdrive} really should be a vector result,
$\z$. It it, of course, a quality of implementation issue whether or
not it is possible to recover from multiple drives having
non-overlapping corruption in corresponding sectors or blocks.
Finally, as a word of caution it should be noted that RAID-6 by itself
cannot (in the general case) even detect, never mind recover from,
dual-disk corruption. If two disks are corrupt in the same byte
positions, the above algorithm will (again, in the general case)
introduce {\em additional} data corruption by corrupting a third
drive. However, the following probabilistic patterns are likely to be
indicative of such multidisk corruption, and a quality implementation
should take appropriate action, such as aborting rather than further
corrupting data:
\begin{itemize}
\item $z$ values inconsistent with the number of disks, for example
$z = 136$ when $n = 20$.
\item Inconsistent $z$ values within a single hardware sector or
block. This does not apply to occasional bytes with no corruption
($\P^\star = \Q^\star = \g{00}$) -- after all, even a standing clock
is correct once every 12 hours.
\end{itemize}
\section{Beyond RAID-6}
Reed-Solomon coding can be exploited further to allow for any
combination of $n$ data disks plus $m$ redundancy disks, allowing any
$m$ failures to be recovered. However, the greater the redundancy, the
higher the overhead in both CPU time and I/O. The Linux RAID-6 work
has been focused on handling the case of $m = 2$ efficiently so that
it is practically useful.
An excellent paper on implementing arbitrarily complex recovery sets
using Reed-Solomon coding can be found at:
\begin{verbatim}
http://www.cs.utk.edu/~plank/plank/papers/CS-96-332.html
\end{verbatim}
\end{document}