OK, this post is all about math which frustrated me for weeks, a year and a half ago. I’ve forgotten most of the details since then and I’m not fully convinced I understood them in the first place, but the question has come up again in discussing sensor fusion at the Portland State Aerospace Society (PSAS), so I’m going to try to re-discover the details here.
At the time I was working on a high-assurance quadcopter autopilot (SMACCMPilot), but it turns out that quadcopters have many of the same kinds of sensors as rockets, and similar sensor fusion problems. (I talked about sensor fusion for rockets in my first post here and I’m sure it’ll come up again.)
I have the impression that this topic is one where people who’ve actually been trained in aerospace think the answer is obvious. Consider the response I got when I tried to ask the author of a Kalman filter I wanted to re-use, “What representation is expected for gyro measurements?”
I have no formal aerospace training and I still don’t think it’s obvious at all. Being told to “See standard texts on strapdown nav” was not helpful—especially since I have no idea which texts are “standard”, even if one of the best sources I eventually found did turn out to be a book titled “Strapdown Inertial Navigation Technology”.
The short, technical, version of the question: How do you use a three-axis rate gyro to update a quaternion that represents your vehicle’s current orientation? (If you already know what this question means you can skip to the next section.)
Quadcopters and rockets both usually have “rate gyros”, which are sensors that will tell you how fast your vehicle is spinning. You might interpret a rate gyro’s output as a number measured in degrees per second. (The actual units don’t matter for anything except your sanity, of course. Degrees per second is probably what you’ll find in your sensor’s datasheet; radians per second may be more convenient when working with trig functions; cycles per second might be best when presenting numbers to humans.)
Each gyro measures rotation around a single axis; you need three of them to measure complete 3D angular velocity. In aerospace the three rotation axes are usually called “roll”, “pitch”, and “yaw”, when measured with respect to the vehicle itself.
But for many purposes we don’t care how fast the vehicle is rotating. What we want to know is which direction the vehicle is pointing, with respect to some convenient frame of reference, such as North/East/Down (NED) or Earth-Centered/Earth-Fixed (ECEF).
It might seem like a natural way to represent that orientation would be as an angle around each of the three axes. That representation is referred to as the Euler angle representation. There’s a problem with that representation, called “gimbal lock”, which has been written about extensively elsewhere so I won’t cover it here. Euler angles are also weird because rotations aren’t commutative so you have to pick a convention for which order to apply the rotations in, and there are literally a dozen choices. Ick.
Because of the problems with the Euler angle representation, aerospace systems usually use either a 3x3 rotation matrix, or a unit quaternion. Quaternions have several advantages over the matrix representation, including that they’re smaller (4 numbers instead of 9); and that numerical precision problems due to floating-point inaccuracy are easier to fix up on quaternions.
Quaternions have the disadvantage that they are utterly baffling, but let’s not let that stop us, eh?
If we assume that we know precisely what direction the vehicle was pointing when it powered on, and we have perfect measurements of how fast it’s been rotating since then, then conceptually, we can just add up all those rotations to figure out which direction the vehicle is pointing now. In practice it isn’t so easy because we don’t have perfect information about the initial orientation nor about the rotations since then, but dealing with those problems is what sensor fusion is all about.
Sensor fusion algorithms such as the Kalman filter augment a model which assumes perfect information to turn it into one that can take advantage of imperfect information instead. So in this post I’ll pretend like we do have perfect information. Using the model we’ll derive here in a sensor fusion implementation is left as an exercise for the reader. (Or you can study my Haskell sensor fusion example, or Paul Riseborough’s Matlab/C++ version whose example I was trying to follow. Just be careful with the latter as I found plenty of implementation bugs in it, and be careful with the former because I make no guarantees about having the right model.)
Putting all that together: how do we do the “adding up” of a series of 3-axis rotation measurements taken from rate gyros, given a unit quaternion representation for orientation?
Why am I confused?
When asking for help understanding Paul Riseborough’s filter, I wrote:
[The filter] constructs
deltaQuatby assuming that
correctedDelAngis in an axis-angle representation, where the magnitude of the
Vector3fis the angle, and the direction of the vector is the axis of rotation.
I expected that any gyro measurement would instead be represented as Euler angles, that is, three orthogonal measurements of rotation rate. That’s not algebraically equivalent to the axis-angle representation, as far as I can tell, except when any two of the components are 0.
Recently one of the students in PSAS asked me:
I thought what I needed to do was store three vectors for local pitch/yaw/roll directions, in terms of my local tangent plane coordinate frame. Then, when I get data from the rate gyros, I can convert the rotations into quaternions using the local pitch/yaw/roll axis vectors and then compute the updated local axis vectors.
But that seems wrong because I know rotations are non-commutative, and I’d have to choose what order to perform the rotations in. So I think there’s something I’m missing.
I think these are basically the same confusion. It turns out that this is really easy mathematically, but physically speaking, I still don’t understand why!
- If your current orientation is represented by a quaternion q, and you define an update quaternion p by making the vector part of p equal the gyro measurement vector while the scalar part is 0, then the derivative of q is 0.5q·p.
- Discretized, that means that the new value of q is equal to the old value of q, times a quaternion derived from treating the gyro measurement as a 3-vector axis/angle representation, where the magnitude of the measurement vector is the rotation angle, and the norm of the vector is the axis around which the vehicle is rotating.
- Computing the norm of the gyro measurement vector requires dividing each element by the magnitude of the vector. But if the vehicle isn’t rotating then the magnitude is 0. An equivalent version without any division by 0 comes from taking the Taylor series expansion of the axis/angle formula, which also allows you to decide how much precision you care about in your approximation of sine and cosine.
My main sources for these answers are the book “Strapdown Inertial Navigation Technology”, 2nd edition, by Titterton and Weston; and a nearly-unreadable PDF titled “Quaternion Differentiation”.
Both sources assert point #1, but I think that the PDF gives a derivation justifying it. I’m not entirely sure, to be honest, because although I’m convinced of every algebra step in that paper, I’m not sure what I can conclude from them, or why I should believe their initial assumptions. It’s really poorly written.
Point #2 is derived from point #1 in the book, in section 11.2.5, on pages 319-320. The method is basically the same as that shown in the Wikipedia article on discretization:
- Re-write the differential equation so that it’s a matrix multiplication instead of a quaternion multiplication, because the discretization procedure works on matrices, not quaternions. They do this by replacing the right-hand side with 0.5W·q, where W is the matrix representation of the conjugate of p. (I think they take the conjugate because they’ve moved it to the left of q instead of the right? I’m not sure.) Note that q still has the same 4 elements, only now we’re pretending it’s any old 4-vector instead of a quaternion.
- Assume that the discrete time steps are short enough that the angular velocity is basically constant. This is the “zero-order hold” assumption, according to Wikipedia.
- Integrate W over the current time-step, which is easy given zero-order hold: just multiply all the elements by T, the length of the time-step. (The book doesn’t explain this step but it’s really important that you multiply by delta-t somewhere!)
- Compute the matrix exponential of 0.5·T·W, and multiply the result by the old q to get the new q.
- But we want quaternions instead of matrices, so at this point the book turns that matrix exponential back into a quaternion through magic. A similar development was given in section 11.2.1 on pages 312-313, so they handwave it here. I’ll handwave it too, but if you squint at the power series definition of matrix exponential you can see how it’s like the Taylor series expansion of the sine and cosine terms in the axis-angle to quaternion formula. Specifically, if we define σ as the gyro measurement vector times the length of the time step, then the rotation axis is the norm of σ, and the rotation angle is the magnitude of σ. If we call the resulting quaternion r, then the new q is the old q times r. (Notice we’ve magically flipped back to multiplying on the right! I can’t explain that. I bet there’s another conjugate hiding in the hand-waving.)
OK. That was a lot of magic, but I don’t feel too terrible about it because (a) at least lots of sources seem to agree on the basic steps and (b) I find them convincing enough that these are principled steps and not just stuff somebody made up that seemed to work.
The book goes on to derive point #3 from point #2. I don’t want to go into detail on the Taylor series expansion of sine and cosine in this post, but if it helps, I wrote a Haskell function which computes an arbitrary-order approximation of the axis-angle to quaternion formula. In addition to avoiding any division by 0, this definition is nice because it has no square-roots, only three multiplies, plus one division and addition per order, and it computes both sine and cosine at the same time. On non-X86 CPUs, which don’t typically have hardware sine/cosine instructions, I think this is about as efficient of an algorithm as you can get.
Just like when I was first banging my head against this question, in today’s post I have not managed to justify the answers here from first principles of physics or anything. I decided then, and I’m still OK with this now, that I had dug far enough to be satisfied that there is some underlying reason why this is the right math.
But if anyone wants to explain where that initial formula of 0.5q·p comes from, please leave a comment!