
The Montgomery ladder for x-coordinate-based scalar multiplication

//https://www.researchgate.net/publication/277940984_High-speed_Curve25519_on_8-bit_16-bit_and_32-bit_microcontrollers

In short, the advantages of the Montgomery ladder are that it is simple and fast.

If you look at X25519, the Diffie-Hellman algorithm applied to Curve25519 and described in RFC 7748, you will see that for an n-bit Montgomery curve, multiplying a point by an n-bit scalar requires about 10n multiplications of field elements. In more detail, there are n iterations of a loop, each of them involving:

  • 4 multiplications with two varying field element values
  • 4 squarings
  • 1 multiplication with a fixed and usually small constant (called a24 in the RFC)
  • 1 multiplication with the base point "u" coordinate (fixed throughout the algorithm, but may vary between invocations)

To that, a final field inversion must be added, which is normally done with a modular exponentiation. If the field modulus is well chosen, that inversion will add a bit more than n squarings. So the value of "10" given above is an estimate that depends on how well squarings and multiplications by constants can be optimised; in practice, overall cost will be between 8n and 12n.
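
The loop described above can be sketched in Python as follows. This is a minimal illustration that follows the ladder step of RFC 7748 for Curve25519; the names `ladder` and `cswap` are chosen here for the example, and Python integers are not constant-time, so the conditional swap only mimics the structure of a constant-time implementation.

```python
P = 2**255 - 19          # field modulus of Curve25519
A24 = 121665             # (486662 - 2) / 4, the "a24" constant of RFC 7748

def cswap(swap, a, b):
    """Swap a and b when swap == 1, using masking instead of a branch."""
    mask = -swap                     # 0, or -1 (all bits set)
    t = mask & (a ^ b)
    return a ^ t, b ^ t

def ladder(k, u, bits=255):
    """Return the u-coordinate of k*P, where P has u-coordinate u."""
    x1 = u % P
    x2, z2 = 1, 0                    # point at infinity
    x3, z3 = x1, 1                   # the base point
    swap = 0
    for t in reversed(range(bits)):
        kt = (k >> t) & 1
        swap ^= kt
        x2, x3 = cswap(swap, x2, x3)
        z2, z3 = cswap(swap, z2, z3)
        swap = kt
        a = (x2 + z2) % P
        b = (x2 - z2) % P
        c = (x3 + z3) % P
        d = (x3 - z3) % P
        aa = a * a % P                       # squaring 1
        bb = b * b % P                       # squaring 2
        e = (aa - bb) % P
        da = d * a % P                       # multiplication 1
        cb = c * b % P                       # multiplication 2
        x3 = (da + cb) ** 2 % P              # squaring 3
        z3 = x1 * ((da - cb) ** 2 % P) % P   # squaring 4, then multiplication by u
        x2 = aa * bb % P                     # multiplication 3
        z2 = e * ((aa + A24 * e) % P) % P    # multiplication by a24, then multiplication 4
    x2, x3 = cswap(swap, x2, x3)
    z2, z3 = cswap(swap, z2, z3)
    # Final inversion, done as a modular exponentiation (Fermat's little theorem).
    return x2 * pow(z2, P - 2, P) % P
```

Each iteration performs exactly the ten field multiplications counted in the list: four general multiplications, four squarings, one multiplication by a24 and one by the base point coordinate. A full X25519 would additionally decode and clamp the scalar and encode the result as bytes, which this sketch leaves out.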

Now, with "classic" curves with Weierstraß equation y^2 = x^3 - 3x + b (for some constant b), the usual implementation method is to use Jacobian coordinates, in which a point (x, y) is represented by the triplet (X, Y, Z), with:

  • x = X/Z^2
  • y = Y/Z^3
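
To make the representation concrete, here is a minimal Python sketch of a point doubling in Jacobian coordinates for a curve with the -3 coefficient, together with the conversion back to affine coordinates. The function names and the use of plain Python integers are choices made for this example; the doubling uses the classic textbook formulas with 4 general multiplications and 4 squarings, and is not constant-time.

```python
def jacobian_double(X, Y, Z, p):
    """Double (X, Y, Z) on y^2 = x^3 - 3x + b over the integers modulo p."""
    zz = Z * Z % p                          # squaring
    yy = Y * Y % p                          # squaring
    m = 3 * (X - zz) * (X + zz) % p         # multiplication: 3*(X^2 - Z^4)
    s = 4 * X * yy % p                      # multiplication: 4*X*Y^2
    X3 = (m * m - 2 * s) % p                # squaring
    Y3 = (m * (s - X3) - 8 * yy * yy) % p   # multiplication and squaring
    Z3 = 2 * Y * Z % p                      # multiplication
    return X3, Y3, Z3

def to_affine(X, Y, Z, p):
    """Map (X, Y, Z) back to (x, y) = (X/Z^2, Y/Z^3); p must be prime."""
    zinv = pow(Z, p - 2, p)                 # inversion by modular exponentiation
    zinv2 = zinv * zinv % p
    return X * zinv2 % p, Y * zinv2 * zinv % p
```

Multiplications by small integers such as 2, 3, 4 and 8 are not counted as field multiplications, since they can be done with a few additions. The single inversion is deferred to `to_affine`, which is the whole point of carrying the extra coordinate Z.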

With that representation, a point doubling costs 8 multiplications (4 of them are squarings), while a point addition which is not a doubling uses 16 multiplications (4 of which are squarings). A basic double-and-add algorithm would then need an average of 16 multiplications per multiplier bit, but in fact 24 if you want a constant-time implementation (which is recommended), that is, one that does not leak information about which scalar bits are 0 and which are 1. An extra n multiplications in total are to be added, for the final inversion of Z to get back to affine coordinates (there again, classically with a modular exponentiation).

Things can be sped up with a window optimisation in which, for base point P, you precompute small multiples of P; for instance, with a 5-bit window, you compute kP for all k from 0 to 31 (which involves 15 doublings and 15 point additions), and then you only make one point addition every five doublings. This requires some extra implementation care (for instance doing a constant-time lookup in the window, if constant-time execution is sought) but can bring down overall cost to about 13n field multiplications, i.e. 30% more than for a Montgomery curve. Some further speedups are possible when the base point is known in advance, as is the case for the first half of Diffie-Hellman (using the conventional base point), because the addition formulas in projective coordinates when one of the points has Z = 1 (called "mixed additions") are a bit simpler (11 multiplications instead of 16).
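
The cost figures quoted above can be reproduced with a small back-of-the-envelope model. The sketch below is only an accounting aid: the function names are made up for this example, the per-operation costs are the ones given in the text, and the size of the precomputed table for window widths other than 5 is an extrapolation of the 5-bit case.

```python
import math

DBL_COST = 8     # field multiplications per Jacobian doubling
ADD_COST = 16    # field multiplications per general Jacobian addition

def double_and_add_cost(n, constant_time=True):
    """Total field multiplications for an n-bit scalar, inversion included."""
    # A constant-time version performs an addition for every bit;
    # otherwise an addition is needed for about half of the bits.
    additions = n if constant_time else n / 2
    return n * DBL_COST + additions * ADD_COST + n

def windowed_cost(n, w=5):
    """Total field multiplications with a w-bit window, inversion included."""
    # Table of multiples: 15 doublings and 15 additions when w = 5.
    table = (2 ** (w - 1) - 1) * (DBL_COST + ADD_COST)
    additions = math.ceil(n / w)
    return table + n * DBL_COST + additions * ADD_COST + n

if __name__ == "__main__":
    n = 256
    print("double-and-add, average:      ", double_and_add_cost(n, False) / n)
    print("double-and-add, constant-time:", double_and_add_cost(n, True) / n)
    print("5-bit window:                 ", windowed_cost(n) / n)
```

For n = 256, the per-bit figures are the 16 and 24 multiplications mentioned above plus one for the inversion, and the windowed variant lands between 13 and 14 multiplications per bit, in line with the "about 13n" estimate.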


So, in practice, Montgomery curves get a speed advantage, but not really because they are Montgomery curves:

  • As shown above, the advantage is slight (about 30%). It must be said, though, that implementation of Montgomery curves is much easier, and this shows up as smaller code and less room for tricky bugs.

  • A much bigger speedup of Curve25519 over NIST curve P-256 comes from the definition of the base field. NIST P-256 uses integers modulo p = 2^256 - 2^224 + 2^192 + 2^96 - 1. Curve25519 uses p = 2^255 - 19. Both are integers chosen because they allow for faster modular reduction; however, NIST's choice is optimised for computer hardware of the late 1980s, while Curve25519 targets mid-2000s hardware, with much cheaper multiplication opcodes and superscalar architectures. On modern hardware (and this includes modern "small" embedded CPUs), the latter choice seems to be roughly twice as fast as NIST's modulus.

  • An extra interesting feature of Curve25519 is that it is not one curve, but two curves. Every possible source value for the "u" coordinate will define either a point on the curve, or on the "twisted" curve, which is also cryptographically good (its order is a big prime multiplied by a very small cofactor). This allows X25519 to be safe without performing any validation of the incoming point, which again makes for simpler and shorter code. There again, that feature is not intrinsic to Montgomery curves; classic curves in short Weierstraß form can exhibit the same property, but the NIST curve does not.

  • Conversely, there is a complication that comes from Montgomery curves, which is that their order is necessarily a multiple of 8; hence, it cannot be prime. This means that a valid curve point is not necessarily a point on the prime-order subgroup on which we perform most operations. The X25519 specification accommodates that issue by forcing scalars to be multiples of 8. When Curve25519 (or its derivative edwards25519) is used in other, more complex cryptographic protocols, this property can be problematic and must be side-stepped appropriately (usually by the same methods of multiplying things by 8 or forcing multipliers to be multiples of 8). In a way, this is a trade-off: implementation is simpler, but at the expense of a bit of extra complexity in the protocols.
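
The way X25519 forces scalars to be multiples of 8 can be shown in a few lines. The sketch below follows the scalar decoding step of RFC 7748 (often called "clamping"); the function name `clamp_scalar` is chosen for this example. Besides clearing the three low bits, the RFC also clears bit 255 and sets bit 254, which fixes the scalar length.

```python
def clamp_scalar(k_bytes: bytes) -> int:
    """Decode a 32-byte X25519 private key into a clamped integer scalar."""
    if len(k_bytes) != 32:
        raise ValueError("X25519 scalars are 32 bytes long")
    k = bytearray(k_bytes)
    k[0] &= 248      # clear the 3 low bits: the scalar becomes a multiple of 8
    k[31] &= 127     # clear bit 255
    k[31] |= 64      # set bit 254
    return int.from_bytes(k, "little")
```

Because the resulting scalar is a multiple of 8, any component of the input point that lies in the small subgroup of order dividing 8 is cancelled by the multiplication.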


To summarise: the Montgomery ladder makes for a somewhat faster algorithm which is easier to implement, especially if you aim for constant-time code. But modern standard curves that are Montgomery curves also come with a few extra features that are quite nice to have, and yield even greater speedups; these extra features are not reserved to Montgomery curves, but NIST curves don't offer them because they had not been invented, or at least noticed, at that time.

As a historical note: Montgomery curves were first invented and used for the elliptic curve factorization method, in which raw speed is of paramount importance, but the curves themselves don't have any actual cryptographic relevance.