OV-chip 2.0 progress
On this page we describe our progress in the OV-chip 2.0 progress. Items will be added irregularly at the top. Therefore it is probably best to start reading this page at the bottom...
June 26th: Performance comparison between NFC phones and Java cards
After struggling for some time with buggy NFC implementations on Nokias NFC phones, François Kooman finally implemented enough workarounds to get a reliable NFC connection to those phones. This makes it possible to run all our applets with few modifications as midlets on the phone. First François has ported the test applet to the phone to perform performance measurements on the phone. The chart below shows the performance of our JCOP 4.1 cards in comparison with the Nokia 6131 NFC and the Nokia 6212 classic.
For the measurement the length of the exponent varies with the length of the bases as usual on this page. The actual exponent size is plotted on the right y-axis. The first computation on the card for a given key takes always much longer (probably because of RSA cipher initialization on the card). Therefore one line of read and green crosses is far above and many of those green crosses are outside the chart.
Cards and phones use the respectively fastest methods for computing multi powers. On the card this using the RSA cipher for single powers and multiplying the intermediate results with squaring multiplication. On the phones (without a cryptographic coprocessor) this is simultaneous squaring on the basis of Montgomery multiplication with a precomputed table of base factors. The phone uses of course the int/long configuration of the Bignat library, where the cards can only use the byte/short configuration.
Naively one would expect a mobile phone with its own power resources and its fast CPU to completely outperform a Java card. However, Java does not seem to run very fast on mobile phones and the CPU's do not seem to be that fast. On the other side the dedicated hardware in the cryptographic coprocessor on Java cards seems to be very fast. As a result the cards are significantly faster then the Nokia 6131 for long RSA key sizes - when the card is connected over the contact interface. The Nokia 6212 seems to have a faster CPU and is therefore faster than the cards, even for long key sizes.
June 26th: First public OV-chip source release
The first public release of our sources appeared on http://www.sos.cs.ru.nl/ovchip.
Brands protocols for selective disclosure that we have implemented are protected by patents that are currently in the hands of Microsoft. Already back in January we asked Microsoft whether we could release our implementation under a permissive license for research purposes. Apparently this was a very difficult question. Microsoft lawyers are still pondering our request.
In this unfortunate situation we were forced to delete those method bodies that are directly related to Brands protocols. Altogether 21 methods bodies are missing in the public release. This makes it impossible to compile and run our OV-chip protocols and also the demonstrator, which appeared in some screen shots on this page.
However, our test applet is not affected by the patents and can therefore be compiled and used without problems. The test applet tests our Bignat library and produced most of the performance measurements on this page.
The algorithms of the missing method bodies are very precisely described in our "Performance issues" paper (see the entry of June 5th below) and also in Brands book Rethinking Public Key Infrastructures and Digital Certificates: Building in Privacy. It should therefore not be too hard to reimplement the missing methods (if you are permitted to do so by patent law).
June 16th: Demonstrating Privacy
The GUI displays now the identities of the cards together with the actions that are taken on the card. This makes it very clear what kind of privacy the implemented protocols provide.
In the above screen shot you see the entry gate imitation of the demonstrator. The right window shows the progress messages, while the card action log on the left shows the card identities together with the actions. The card identity is value of the blinded attribute expression. To keep the display comprehensible I only display the first ten decimal digits of the blinded attribute expression.
What you see is that the applet got identity 2466682085 when it was initialized. As last step of the initialization the resign protocol is run to give the applet a new blinding. The resign protocol recognizes the applet 2466682085. Because of the changed blinding the identity of the applet changes and when the same card turns up at an entry gate it is recognized with identity 2688996633. At this point it is impossible to say whether the same or a completely different card turned up at the entry gate. One can only see that card 2688996633 passes a second entry gate and obtains then a new blinding (and changes its identity by that).
Afterwards when identity 1426296638 is recognized at an entry gate it might or might not be the same card...
Apart from the visible changes in the demonstrator also other improvements have been made. Most notably our three OV-chip applets have been made to work with the int/long configuration of the Bignat library. Further, a reset method has been added to the applets to make measurements with different key sizes possible without the need to reinstall the applet. These changes have been made for our experiments with running the applets as midlets on NFC capable mobile phones. These phones run a more standard version of Java with int and long integer types. Further they have a garbage collector which will clean up the dangling objects that the reset operation leaves behind.
June 11th: API documentation complete
After writing almost 1500 documentation items the API documentation of the OV-chip 2.0 sources is considered to be complete. In total there are 2532 items of which 2409 are documented. Of those, the IDL compiler generates 953 items together with their API documentation. The remaining 123 undocumented items are methods and fields that have been generated by the netbeans IDE for the GUI.
The complete Java sources contain about 20.000 lines of code (pure Java without comments). To that the IDL compiler adds more than 16.000 lines of code. The handwritten documentation comprises almost 18.000 lines. No wonder that it took several month to complete the API documentation.
June 5th: Paper accepted at WISTP 2009
Our paper Performance issues of Selective Disclosure and Blinded Issuing Protocols on Java Card (on which I hinted already at Mai 4th) was accepted for publication at this years Workshop in Information Security, Theory and Practices (WISTP 2009). The paper evaluates the performance of selective disclosure protocols on Java Card and describes the performance problems we run into in this project. Apart from pdf, you can download the paper in ps.gz, ps and dvi.tar.gz format.
Mai 28th: Running times beyond two minutes
When I started Java Card programming all my APDU commands were aborted after 2 minutes. Calling APDU.waitExtension() did not make any change. All the measurements I did were therefore limited to a running time of 2 minutes. Then someday in March, probably as a side effect of upgrading from Debian Etch to Debian Lenny, the 2 minutes timeout disappeared and the card could compute for half an hour on a single APDU command. Since then I redoing many measurements and update the charts on this page. Until now the following entries have been updated: April 7, December 9, and September 5.
Mai 25th: Performance of the squaring RSA applet
Building on the squared multiplication I implemented a third variant of the RSA applet, the squaring RSA applet. The performance improvements are twofold. Firstly, squared multiplication is much faster than Montgomery multiplication, and, secondly, additional multiplication with Montgomery correction factors is no longer needed, so in total less multiplications are performed.
The left chart show the performance of the resign and proof protocols. The lines in the chart are just an average, the single measurements vary by up to 2 seconds. Note that the y-axis goes just up to 20 seconds. For a key size of 1280 bits, which is secure for the next few years, resigning takes between 12 and 14 seconds and proving takes between 8 and 10 seconds.
The right hand chart shows the time needed to compute multi-powers. For 1280 bits it takes just little more than 2 seconds. The first multi-power for a given key size takes always much longer, probably because of the cipher initialization on the card.
Mai 22nd: Performance of squared multiplication
The following chart shows the performance of the new modular multiplication method.
The timings for squared multiplication vary, because it requires between 1 and 4 additions (in addition to 2 subtractions and 3 squarings). In addition to the timings for squared multiplication I also plotted the timings for addition and squaring. I also measured subtraction, but the difference between addition and subtraction is negligible. The difference between the four red lines nicely corresponds to the time of one addition, as shown by the green line.
The squared multiplication provides modular multiplication for numbers in the range of 512-1952 bits. For Brands selective disclosure protocols we also need normal (that is none-modular) multiplication for number in the range 100-200 bits. For that we can also use the squaring trick and reuse the same RSA cipher with the same RSA modulus. Because the modulus is twice as long than the arguments it will compute the normal product. The chart below compares this short squaring multiplication with normal multiplication.
As one can see the squaring multiplication is faster for numbers from about 96 bits.
Mai 15th: Faster modular multiplication?
Last Tuesday Bart Jacobs came up with the idea to use the expansion of (a + b)^2 to compute products. This gives a * b = ((a+b)^2 - a^2 - b^2) / 2 and is a silly idea in general, but on Java Card it advantageous because the squares can be computed on the cryptographic coprocessor. For modular multiplication the idea works only under certain preconditions, which are luckily fulfilled in our case. Because there are squares involved I will call this new multiplication algorithm squared multiplication. I am now busy with adding squared multiplication to the Bignat library.
Mai 11th: Graphical demonstrator works again
Since the transition to Wojciech Mostowski's global platform manager (see February 19th) the graphical demonstrator (see January 8th) was broken. I now also integrated the global platform manager into the graphical demonstrator and deleted the clumsy applet installation code that was used before. I further redesigned the demonstrator slightly: All long running actions, especially all code that talks to a Java card, runs now on a separate thread and is not making the GUI unresponsive any more. The Montgomerizing applet has been integrated as well, so with a key size of 32 bits one can actually use the demonstrator in a presentation. The demonstrator now works also fine with the jcop emulator, which is nice for testing.
All in all, the redesign and the switch to the global platform manager fixed a lot of problems in the demonstrator. It is now working smoothly and doesn't require expert knowledge any more.
There has also been progress in the documentation. Now 1699 out of 2034 items are documented, more than 83%.
Mai 4th: Anonymous paper submission
We submitted a paper to a workshop describing our experiences with implementing Brands selective disclosure protocols on Java cards. I am not giving the name of the workshop or the title of the paper here, because the submission was completely anonymous and I don't want to subvert that by placing the paper into the Google index.
The paper shows our disappointing performance results analyzes the reasons for the bad results: The Java Card API does not facilitate the implementation of new cryptographic protocols, because it does not provide access to the fast operations for big integers that the cryptographic coprocessor provides.
April 9th: API Documentation of OV-chip protocol layer complete
The next intermediate step in documentation writing has been reached. The sources in the util directory are completely documented now. This also covers the OV-chip protocol layer (formerly known as custom RMI layer, see September 1st). 1390 of required 1960 documentation entries have been finished now, that makes almost 71%.
April 7th: The Montgomerizing OV-chip applet runs on the card
As the title says, the Montgomerizing OV-chip applet runs flawlessly on the card. It is faster and slower at the same time in comparison with the old plain applet.
It is faster, because it permits ridiculously short RSA moduli of 32 bits. One could even work with RSA keys smaller than 32 bits, but than the probability that the card guesses one of the secret RSA factors when creating a random number becomes significant. For such cases the protocol breaks because some responses of the card are not invertible any more. For 32 bit RSA keys resigning takes 5 seconds and the gate protocol 2 seconds. This might be nice for the GUI demonstrator.
On key length above 200 bits the Montgomerizing applet will take ages. Update: The following charts show the performance of the Montgomerizing applet.
The left chart shows the running times for the Montgomerizing applet and the right chart shows both, the Montgomerizing and the plain applet. Note that the y-axis goes up to 70 minutes with one APDU command using as much as 40 minutes! The measurement for the Montgomerizing applet run several days and was manually aborted at the key size of 512 bits.
The Montgomerizing applet shares most of the source code with the plain applet. Only one class of the plain applet has been duplicated in order to make the necessary changes for the Montgomerizing applet. The host driver is generic. A command-line option tells the host driver which applet to install. For the gate and the resign protocol the applet transmits now an applet identifier such that the host driver can automatically adapt itself.
It was again a pleasant experience to use the host test frame for debugging the applet (see November 26th). In the beginning there was one non-trivial problem in the code: The host driver was sending one number not Montgomerized as the applet expected, causing wrong responses from the applet. Debugging this in the host test frame with adding print statements in the middle of the applet code was relatively easy. I don't even want to think about how one could debug such problems on a real card. After the applet was running fine in the host test frame it was immediately working on a real card.
BTW, if one over-over-stretches Lenstra's estimations (see November 3rd below) for the security of RSA keys, one finds that a RSA key of 32 bits has security level -1 in 2009.
March 31st: Preparing the Montgomerizing OV-chip applet
The current OV-chip applet uses the RSA cipher on the Java Card to compute modular powers. Therefore the applet is inherently unportable. One could of course substitute some modular power routine for these uses of the RSA cipher. However, this makes no sense, because, without the RSA cipher, one can take advantage of simultaneous squaring for speeding up the computations of those multi-powers.
In order to have a reasonable implementation of the OV-chip applet that can also run on devices without hardware-accelerated RSA cipher (for instance phones), we decided to develop a variant of the OV-chip applet that uses simultaneous squaring instead of the RSA cipher to compute multi-powers. Our implementation of simultaneous squaring requires that all bases are Montgomerized and it also delivers a Montgomerized result. To save time on the card, the new applet will only compute with Montgomerized numbers. Demontgomerization will take place in the host driver and all the communication between host and card will be Montgomerized too. Because so much Montgomerization is going on, the new applet will be called the Montgomerizing OV-chip applet. The old applet will from now on be called the Plain OV-chip applet, because the communication protocol for it sends plain, unmontgomerized numbers.
Until now we completed some preparations for the Montgomerizing applet: Data structures on the card and in the communication protocols have been changed for the Montgomerizing applet.
March 23th: Documentation and Refactorization
Nothing spectacular happened in the last 4 weeks. I am still busy with writing documentation. The Hackers guide grows and more and more method and classes in the repository are fully documented. Writing the documentation forces me to reexamine the code and thereby I now and then see points that have the potential for improvement. So apart from the growing documentation there have also been some refactorizations.
I started to experiment with a Java Card that supports the int type. The hope is that using int/short instead of short/byte would speed up the computations a bit. However, the only card with int support that we have, is very old and the implementation of the global platform on it is a bit broken. I have to wait until I get some newer cards with int support before I continue this track.
February 26th: Bignat documentation complete
About 55% of the API is now documented (1054 out of 1889 items). The API documentation for the Bignat library is now complete. Apart from writing documentation the code has been improved at various places without adding new features.
February 20th: x^0 = 0
There was actually a bug that caused the occasional hiccups in the OV-chip applet that I reported on December 9th.
The problem is that the RSA cipher on the card delivers 0 (zero) as cipher text for any plain text, when the public exponent is 0. This means that the card computes x^0 = 0, which of course causes some errors afterwards. The jcop emulator shows a slightly different behavior for this case: It hangs for about 20 minutes and then dies with a segmentation fault.
I probably only saw this problem because I programmed the host driver to set an attribute to 0 with a probability of 5% during the resign protocol. Now after the fix the applet runs without any problems.
February 19th: Global platform library integration
Until now I used a separate, independent tool for installing and removing applets from Java Cards. Wojciech Mostowski, a member of our group here, recently wrote the global platform manager, a tool for installing and removing applets on a Java Card. The global platform manager is written completely in Java and relies only on the javax.smartcardio package. This makes it easy to integrate the global platform manager into any application that talks to Java Cards.
Both the host driver of the performance and test applet and the host driver of the OV-chip 2.0 applet are now capable of (re-)installing the applet. I also contributed to the global platform manager: It can now talk to the jcop emulator via my TerminalFactory.JcopEmulator provider for the javax.smartcardio package (see also November 26th).
February 16th: The OV-Chip 2.0 Hacker Guide
The javadoc API documentation still makes progress. For somebody who wants to get involved in the project the API documentation is, however, a bit too low level. I therefore decided to write some more high-level documentation about the design decisions that have been taken in our sources. It will be called OV-Chip 2.0 Hacker Guide.
I want the hacker guide to be available in a printable version as well as in an online html version. Therefore I decided to use gosh from Norman Feske for type setting the hackers guide. (gosh stands for gold out of sh** and translates Usenet style ASCII text into latex and html.)
February 2nd: Generated documentation
Writing documentation takes time. Sometimes I have the impression that I need more time for the documentation than I needed to write the code. Now essential parts our Bignat library (see August 20th and October 31st) are documented but not much more.
Big portions of the Java code are generated by our IDL compiler (see November 26th and September 25th). I let the IDL compiler now generate javadoc documentation comments too. This is not ideal for all items, especially for variable declarations that are taken over from the IDL sources the generated documentation does not say much at the moment. However, for many other items the generated documentation is completely adequate. To make it perfect the IDL compiler should be able to copy documentation from the input to the output. However, this feature has to wait on the wish list for the foreseeable future.
Counting the generated documentation as well 893 out of 1876 documentation items have been written now. That makes 47.6%.
January 27th: Performance of new NXP cards
Through our contacts with NXP I obtained some new Java Card that even implements the 2.2.2 API (compare the December 10th entry). The first tests showed that the card does however not implement the desired javacardx.framework.math package. Nevertheless, the new card is about 25%-30% faster than the JCOP 4.1 cards I am usually working with. However, this new card does not have a contact interface. Therefore the old cards still outperform the new one over their contact interface.
In the above chart you see the timings for Montgomery multiplication and exponents (the latter via RSA encryption). For both you see the timings for the new NXP card over the contact-less interface and for an older JCOP 4.1 card over the wired and the wireless interface. The new card is faster than the old card over the contact less interface. However, over the contact interface the old card is still faster.
January 19th: 1867 documentation items
Until last week some unstructured comments were the only available documentation of the developed code. I started now to write structured javadoc documentation. The documentation is targeted at people that are going to further develop the code, therefore not only the public API must be documented but everything. For each package, for each class, for each method and field declaration a structured javadoc comment must be written. Altogether about 1800 documentation items. Up to now 150 have been finished, that is about 8%.
January 8th: Demonstrator II
The demonstrator has a reasonable set of features now:
- logistic services imitation
- create, load and save system keys and parameters
- adjust key size and attribute numbers
- applet installation, deletion
- applet personalization (download keys, attributes and system parameters to the applet)
- service automaton imitation
- run the resign protocol (check signature on old attribute expression, update attributes and blinding, generate a new signature for the new attribute expression)
- entry gate imitation
- run entry gate protocol (check signature and check knowledge of all attributes with a zero-knowledge proof)
The demonstrator has various rough edges and limitation. One of the biggest problems is that some long running actions are done in the GUI thread and that the GUI therefore sometimes locks up for up to 3 minutes. It also uses external tools for applet deletion and installation, which creates some strange card communication problems. At the moment only those who know the implementation can use the demonstrator.
Here is a screen shot of the logistic services tab together with the configuration window.
December 19th: Demonstrator GUI
Until now all the test frames have been command-line test frames, meaning that these are non-interactive programs to be run in a terminal window that you can only control via command-line options.
For giving more impressing demonstrations I started now to program a graphical demonstrator for our OV applet and the various functions in the host-driver code. It is going to be just one application, where in the application you can choose whether the demonstrator should mimic an entry gate or a service automaton, etc.
I wanted to benefit from a GUI builder, therefore I decided in favor of the netbeans IDE and the javax.swing package. GUI building is a completely different experience than programming crypto algorithms. The amount of mouse clicks that are necessary even for simple GUI's is tremendous. One should probably use a drawing tablet or a touch-screen for controlling the GUI builder.
Here is a screen shot from the GUI builder.
December 10th: 2 Minutes waiting at the entry: An Analysis
The performance results I reported yesterday are really disappointing. We have a transaction time of 100 seconds for the entry gate for a completely insufficient key length and where important features are still missing! This shows that our current approach will now yield a usable prototype. What's wrong then?
The computation time on the Java Cards is dominated by the computation of multi exponents, that is, by expressions of the form a_1^b_1 * ... * a_n ^ b_n, where n is the number of attributes the card possesses. These multi exponents are now computed in the following way: The single exponents a_i^ b_i are computed by abusing RSA encryption. This takes only about 50 milliseconds per exponent, see November 14th. The single exponents are multiplied together with our own implementation of Montgomery multiplication on the card. Every such multiplication takes about 5 seconds, see September 5th. Exponentiation is so much faster because we use a native method of the card vendor for it, which in turn uses the crypto coprocessor of the card. In contrast our Montgomery multiplication runs completely in java bytecode on the card. The performance would be much worse, if we would compute the multi exponent purely in Java, see November 10th.
Our main dilemma here is that we can only use the public Java Card 2.2.1 interface and this interface provides no support for implementing new crypto protocols. The card has methods for particular algorithms, for instance RSA or Diffie-Hellmann. However, the big integer operations that are needed for these algorithms are completely inaccessible. In particular, there is no big integer multiplication in Java Card 2.2.1. From the timings of the exponent we guess that the crypto coprocessor on the card needs about 1 millisecond to compute a modular product.
Java Card 2.2.2 has big integer multiplication in the optional javacardx.framework.math package. However, it is almost impossible to find cards implementing the 2.2.2 standard. So far we only found one 2.2.2 card (the Athena IDProtect), but the friendly customer service told us that this card does not implement the optional javacard.framework.math package.
We guess that with access to the crypto coprocessor transaction times below one second would be possible for 1000 bit RSA keys.
December 9th: First OV applet runs on real cards
Our approach of creating our own test frame for our applet code really paid off now. It was only a minor step to get the applet code running on a real card. Apart from a view hiccups the applet is running fine on real cards. I tend to blame transient card failures for these hiccups, because they do not occur in the test frame. I noticed these problems already when doing performance testing for multi-exponents. There, I got now and then a wrong result from the card. However, when feeding exactly the same data to the card again, it computed the right result.
For proving and resigning I measured the following transaction times.
The measurement starts at 512 bits, because the RSA cipher on the card does not support shorter RSA keys. Update: After the 2 minutes timeout disappeared measurements up to a key size of 1952 bits were possible. Note that the y-axis now goes up to 25 minutes.
With more features implemented the timings will improve a bit, because the card will usually disclose some of its attributes and only proving knowledge of the remaining ones. Therefore the prove protocol will be run for two or three bases and not for four. For practical purposes the transaction time on a gate (and therefore of the prove protocol) should be below 200 milliseconds. This is clearly out of reach.
Because having the first OV applet running on a real card is an important milestone, I summarize what has been implemented.
- Applet and host code for initialization, card issuing, proving and resigning
- Initialization computes all the necessary parameters, most notably the modulus n, product of two large primes, and the basis g_i. The bit size of the key and the number of bases (and therefore attributes a card can hold) is controlled via parameters.
- Issuing initializes the card with all necessary data, including its attributes (which are currently chosen at random). Issuing contains a resigning step, because the blinding of the card is initialized to 1.
- Resigning: The card processes the attribute updates it receives, chooses a new blinding, computes its new blinded attribute expression A' and obtains a signature on it.
Currently missing are the following features
- authentication of the host before the protocol starts
- exit ticket checking
- validity checking of attribute updates on the card
- blacklist checking on the host
- Proving: The card presents its blinded attribute expression, the host checks the certificate and the card proofs knowledge of all its attributes without revealing them. Currently missing features:
Computations on the card are done with our Bignat library. It (ab-)uses the RSA cipher of the card to compute exponents and features allocation free Montgomery multiplication. For division the schoolbook algorithm is used.
Eventually, the proof protocol will form the heart of the interaction at an entry gate. For the full entry gate functionality the following features are missing:
- entry and exit tickets
- time stamps
- authentication of the gate
- selective disclosure of some attributes of the card, depending on the gate, and proving knowledge of the remaining attribute values.
Currently completely missing are the following parts:
- back office with trip data base
- protocol for exit gates
- protocol for conductors
- protocol(s) for card owners
- travel logs on the card
With the above measurements it is clear that with current Java Card technology we cannot develop a working replacement for the Dutch OV Chipkaard. Probably not all of the above points will get implemented until my contract finishes in June next year.
December 3rd: First OV applet runs in the test frame
The code for proving knowledge of attributes and for obtaining new signatures runs fine in the testframe. This is the core functionality for the protocols that run at an entrance gate and in ticket machines.
November 26th: Progress and multi-exponents via RSA encryption
For two weeks I have been focussing on the implementation of the OV-chip 2.0 applet. Already in our first applet exercise it paid off to be able to run big portions of the applet code on a normal Java virtual machine (JVM), because this simplifies debugging a lot. I therefore designed an environment that permits me to run applets and host drivers, which would normally communicate via my custom RMI layer, together on a normal JVM. The most important step for this was to extend the IDL compiler to additionally generate what I call the test stub code. The normal stub code, that the IDL compiler is already generating for quite some time, performs some data-type conversions, transfers then data to a Java Card and, finally, invokes a method there. In contrast, the test stub code calls the relevant methods of the applet directly after the data-type conversion on the same JVM. The API of both stubs is the same, such that by changing one symbol the same code will communicate with a real applet.
From the OV-chip 2.0 applet I finished the initialization and the basic protocol for proving knowledge of attributes. The latter will form the core of the communication at an entry gate. Both, the initialization and the prove protocol run fine in the just described test frame.
Other achievements worth mentioning are
- Support for the jcop emulator via the smartcardio library. (This was necessary because the two SUN emulators do not support the RSA_NOPAD cipher.)
- Small extensions of the Bignat library: addition, random number generation and division of numbers of different size.
RSA multi-exponents measurements
I was further able to finish the last measurements.
The plot shows the time it takes to compute a multi-exponent exploiting RSA encryption. For a fixed modulus I measured the time for 10 randomly chosen sets of bases and exponents. For unknown reasons the first measurement took always about 4 seconds longer. The very first measurement took even 10 seconds longer.
From the plots below we know that it takes only a fraction of a second to compute one exponent. The 20 seconds that I measured for 5 bases of 512 bits is therefore almost completely spent in the multiplications. For 5 bases there are 5 multiplications necessary, 4 for multiplying the single exponents and a final one with a Montgomery correction factor to get the right result out of this chained Montgomery multiplication.
Recently we learned that the JCOP cards, we are currently using, have hardware support for big-integer multiplication but not for exponentiation. Given the measurements below, I would estimate that this hardware big-integer multiplication takes about one Millisecond. If we had access to the right interfaces it should therefore be possible to compute a multi exponent in about 2 seconds for 1952 bit bases and in about 0.4 seconds for 512 bit bases.
November 14th: modPow via RSA encryption
Here are the measurements for computing modular exponents (modPow in java.math.BigInteger parlance) via the RSA encryption on the card. The Java Card API offers quite a bit of cryptographic algorithms. Apart from the symmetric block ciphers they often rely on modular big integer arithmetic. However, the Java Card API is very high-level. It provides, for instance, a method for computing a DSA signature, but one cannot control the message digest, the random numbers or the padding that takes part in the computation. So although there is a big integer multiplication inside the DSA-signature algorithm, there is no way to get direct access to it. The only exceptions I am aware of are the RSA cipher without padding (javacardx.crypto.cipher.ALG_RSA_NOPAD), which can be abused to compute modular exponents, and the Diffie-Hellmann key exchange (javacard.security.KeyAgreement.ALG_EC_SVDP_DH), which can probably be used to compute scalar multiples of elliptic curve points.
We have reported about computing exponents via RSA encryption already on September 12. The measurements here show both, the time for the public key encryption and the time for computing the whole exponent. (The latter takes noticeably longer because before encrypting the base one has to copy the exponent into a key and initialize the cipher with that key.) Further I chose the exponent length according to Lenstras' estimations (see November 3rd). What we see, is that the computation time is much shorter for small exponents. Further, key and cipher initialization has a constant overhead of about 50 milliseconds.
It is not clear why the last violet point (for a 1952 bit base) lies so far above.
In order to investigate the differences between these measurements and the ones from September I also measured the computation time for exponents of a constant fixed size.
The measurement for 17 bit exponents corresponds to the popular choice of 65537 as public RSA exponent. Contrary to what I guessed before, also for a fixed exponent size the computation time depends on the size of the base. Further, the curves are very smooth, suggesting that the card cannot multiply large big integers (of, for instance, 512 bit) in hardware.
The curves for small exponents (17, 104, 200 bits) only start at 512 bits, because the RSA cipher does not support key sizes below 512 bits. For the larger exponents the curves already start at a base size of 16 bits. In order to have an exponent of 512 bits for a 16 bit base one has to choose a key size of 512 bits and fill the 16 bit base with zeros. Thereby the size of the modulus is always equal to the key size, because otherwise the card crashes (yes, it is hanging up, not even giving an exception). Therefore (in case of 512 bit exponents) we are rediculously forced to take a 512 bit modulus for a 16 bit base. This also explains the long computation times for small bases and why the computation time only increases when the base gets longer than the exponent.
There is still quite a gap in the measurements: In September it took about 3.5 seconds to a power with 1952 base and exponent. Now it is taking less than 2.5 seconds. It might be, that back in September I used EEPROM for the result, while I am using RAM now. Although this should not make such a big difference, unless the card is abusing the result byte array as temporary storage. There are, however, more important things to do than solving this puzzle.
November 10th: Multi-exponent Performance
Finally the performance measurements for computing multi-exponents are available.
Simultaneous repeated squaring with precomputed factors means that I compute the whole multi-exponent at once. For instance, I compute a^6 * b^5 * c^2 as ((1 * ab)^2 * ac)^2 * b. For an exponent of n bits this requires n-1 squarings and about n multiplications. As you can see in the example, one needs certain products of the bases, here ab, ac, and b. Because the bases are constant in our application I can precompute these products. Note, that with this method it does not really matter if you compute a multi-exponent with 3 or 5 bases. Indeed the measurement shows that 5 bases are only inslightly slower than 3 bases.
The computation time depends very much on the size of the exponent. For these measurements the size of the exponent is determined from the base size according to the equations that I extracted from Lenstra's Key Length article (see next entry from November 3rd). The exponent size used is plotted with the yellow dots (to be read off at the right y-axis). RSA moduli below 40 bits have a negative security level (not sure what that means), therefore I would get a negative exponent size for such short RSA moduli. To measure something I use 2 bit exponents in these cases.
Lenstra's estimations do for sure not make much sense for such small key sizes, but they are the only way to determine the exponent size. According to Lenstra's estimations, an RSA modulus of size 240 provides you with security level 28 in 2009, that is, such a modulus is 250 million times easier to break than a 56 bit DES key in 2009. Overstretching those estimations tells us that a device that can break a 240 bit RSA modulus in one second costs 5 US cents. So the abacus that we bought last week for 20 Euro cent was totally overpriced.
November 3rd: Key length estimation
In his book Rethinking Public Key Infrastructures and Digital Certificates: Building in Privacy Stefan Brands discusses two kinds of protocols for selective disclosure: One is based on discrete logarithms (DL versions) and the other one on the RSA problem (RSA versions). For the new OV Chip protocols we currently focus on the RSA versions, because they are slightly more flexible.
The RSA versions in fact also depend on the discrete logarithm problem. We therefore have to choose a modulus that is big enough to make the RSA problem difficult and a public exponent, called v in this context, that makes the discrete logarithm problem difficult. In his paper Key Length Arjen K. Lenstra presents formulas to estimate the strength of the parameters of various cryptographic algorithms. From there it follows that the security level of a given RSA modulus decreases over time, because there is steady progress on factoring algorithms. The security level, expressed in bits, is a measure for the amount of work necessary to break a certain key. Single DES, for instance, provides a security level of 56, meaning that about 2^(56 - 1) operations are necessary to break a single DES key. Because of Moore's law one has to increase the security level about every 18 month in order to maintain the same security. It is for instance estimated that 40 Million Dollar-days are necessary to break a key of security level 74 in the year 2009. The number of 40 Million Dollar-days means with equipment for 40/x Million Dollars one can break such a key in x days.
The security level an RSA modulus provides is a bit difficult to estimate, because of the subexponential complexity of factoring and because of the steady advance in factoring algorithms. It is estimated that a 2048 bit modulus provides security level 100 in 2009 and level 87 in 2030, where level 87 roughly provides the 40 Million Dollar-days security in 2030.
For the discrete logarithm problem, Lenstra does not assume algorithmic progress and a key of length n roughly provides security level n/2. It follows, that for Brands selective disclosure protocols in 2009 a 2048 bit modulus should be paired with a 200 bit public exponent v, while in 2030 a 175 bit v would be adequate. To have 40 Million Dollar-days security up to 2030 one should therefore use a 2048 bit modulus and a 175 bit exponent. For the same security up to 2009 an 1120 bit modulus and a 148 bit exponent should suffice. See also www.keylength.org.
We integrated the formulas for key-length estimation in our performance tests and can therefore now provide estimates for the running time a given security level requires. Nice gnuplot charts are being generated now, however, the measurements takes several days.
October 31st: Bignat performance
In order to compare the performance of our Bignat library to other big integer libraries, we changed the Bignat library so that the underlying type used for a single digit can be chosen at compile time. Of course for Java Card there is little choice: One has to use byte for a digit and short for a double digit. But for the test frame one can now select int for a digit and long for a double digit (as all other big integer libraries do). After fixing a stupid measurement bug it turned out that our Bignat library is twice as fast as the standard BigInteger class. At least for the operation we are interested in: Computing multi-exponents for bases of 2000 bits and exponents of 200 bits. More precisely, for computing a multi exponent of 5 bases with 2048 bit base numbers and 204 bit exponents Bignat needs 29 milliseconds and BigInteger needs 64 milliseconds (averaged over 1000 runs).
Note however, that the comparison is highly unfair. Bignat has a special multi exponent method that uses simultaneous squaring, while BigInteger has to compute every exponent separately. Further, for Bignat, montgomerization and the computation of a table containing all possible base products is done beforehand, while Bignat computes such kind of data on the fly. Even though the comparison is unfair it shows that Bignat is a decently implemented big integer library.
With the possibility of precise measurements we also tried several optimizations. Most notably a special squaring method that saves about half of the single digit multiplications (see for instance algorithm 14.16 in Chapter 14 of The Handbook of Applied Cryptography. However, it turned out, that all these optimizations have no effect or slow down our code.
October 21st: Progress without performance charts
We have made some progress in the past that has not been reported here, because the corresponding performance measurements are not ready yet. The performance measurements show up nicely on this page, but they are not our main goal. Therefore I only do them if it fits my schedule. Further, some measurements take several hours, which easily explains the backlog.
The following points have been implemented and tested:
- Different Bignat sizes
- Although the sizes of our Bignat big integers is fixed, we need different sizes. For instance 2000 bit numbers for bases and the RSA modulus and 200 bit numbers for attributes. The necessary adoptions have now been done. The size is now specified in the constructor and stays constant for the live time of the Bignat.
- Bignat integration of RSA exponentiation
- Computing exponents via RSA public key encryption (using Cipher.ALG_RSA_NOPAD) has been integrated into our Bignat library. On this occasion I noticed that both SUN emulators, jcwde and cref, do not provide the ALG_RSA_NOPAD cipher. Therefore I can't use these emulators any more and I am forced to use the jcop emulator again.
- multi exponents
- A multi exponent is a number of the form a_1^b_1 * a_2^b_2 * ... * a_n^b_n (mod m). The blinded attribute expressions that appear in our protocols are such multi exponents and some of them must be computed in the card. One can compute multi exponents using simultaneous squaring with a pre-computed list of all possible products of the bases a_i. The pre-computation of the factors means that the computation time of the multi-exponent barely depends on the number n of bases. However, because big integer multiplication is so slow on the card, it takes more than 60 seconds to compute a multi exponent for bases of 250 bits and exponents of 25 bits.
- RSA multi exponents
- An alternative for computing multi-exponents is to use RSA exponentiation. One still has to multiply the different exponents a_i^b_i, though. RSA exponentiation is almost a constant operation, therefore in this case the computation time depends very much on n. It takes about 40 seconds to compute a multi exponent with n=3 bases, 900 bits base numbers and 90 bits exponents.
I'll add the performance measurements for RSA exponentiation and multi exponents when they are ready. Further, already in the beginning of September I made some measurements about data transmission rates to and back from the card. These measurements indicate that the communication alone for proving knowledge of an attribute expression (which is the main part of our protocols) takes about 500 ms. However, for these measurements the data sent to the card is written into EEPROM there. Writing into RAM should be faster, but we cannot easily adopt the measurements. Simply allocating the relevant buffers in RAM fails, because the buffers used in these tests are with 20KBytes ten times larger than the available RAM (about 1900 Bytes).
October 10th: Independence of Jcop tools and libraries
Until now we worked with the jcop emulator and the jcop libraries for making connections to the card and sending/receiving data. I don't know the full story, but now the jcop tools are no longer available. Therefore we decided to switch to the smartcardio library and to use the cref and jcwde emulators from SUN.
Switching to the smartcardio library was relatively easy and simplified our code, because the smartcardio library provides a higher level of abstraction, especially for APDU's.
Using the emulators from SUN was much more difficult. The smartcardio library cannot connect to the emulators. SUN provides the apdutool for talking to the emulators, but there is no hint anywhere, on how to connect to an emulator from an ordinary java host application without using apdutool. Only when I started to decipher the raw communication between apdutool and one of the emulators I found out that the apduio package (apduio package API) provides a programmable interface for making connections to cref or jcwde.
The two libraries smartcardio and apduio are of course completely incompatible, it looks like they come from two different planets (instead of from one company). There is however the possibility to extend the smartcardio library with a new service provider via TerminalFactorySpi. Only after finishing this exercise I found out that it has been done before, here in the group, for the JMRTD project on a free implementation of the electronic passport.
There was a last issue to cope with: Similar to the jcop emulator cref and jcwde communicate over a socket with the host application (or apdutool). But different from the jcop emulator, cref and jcwde exit when the host application closes the connection (jcwde is even dying with a null pointer exception). This means that one cannot separate applet installation and applet usage with the emulators. Further one cannot simulate multiple uses of a card. (Well for cref one can save the state in an eeprom mask file via -i and -o, but one still has to restart it.) I therefore decided to write a wrapper that forwards traffic to either cref or jcwde but keeps the connection to them open.
When writing this note I noticed that the Development Kit User's Guide, the Application Programming Notes, and apduio API seem not to be available on the internet (you get them for free included in the java development kit). I therefore set up a page with selected java card documentation.
September 25th: Tools to make the live easier II
While I am busy with implementing multi exponents via simultaneous repeated squaring I had to write the 13th Card RPC description. Although I have already blueprints for the description and the host stub code, making a new protocol was still tedious and time consuming.
I therefore decided to write a primitive IDL compiler, which takes the signatures of the methods to be invoked on the card (and some additional information) and then generates both, the Card RPC description and the host stub code. In addition the IDL compiler takes care of
- declaration and initialization of the buffer variables on the card
- simple data type translations in the stub code, like accepting/returning a regular BigInteger in the stub method while sending/receiving a Bignat.
I figured here documents and easy regular expression matching would simplify the IDL compiler, therefore I decided to implement it in perl. However, nested data structures are a pain in perl. It took me a while to get my arrays of hashes of arrays right. Now, after the exercise I am not so convinced perl was the right choice.
September 12th: Huge exponents in under 4 seconds
The measurements below for multiplication show that it is hopeless to do exponentiation for big numbers on the card. However, exploiting the crypto interface of Java Card API one can compute exponents much faster. In our first test we used an RSA public key with the ALG_RSA_NOPAD cipher. That is, in order to compute a^b (mod m) we encrypt a with a public RSA key consisting of b and m. There are some restrictions however
- first, for a given RSA key size the first byte of the modulus cannot be zero (otherwise both the jcop emulator and the cards crash)
- and second, accepted RSA key sizes are between 512 and 1952 bits, with 32 bit increments
Therefore, this method cannot be used if the modulus is, for instance, 300 or 600 bits long. There seem to be no restrictions on a or b, in particular one can compute the remainder by setting b = 1.
We measured our application case. That is the modulus m is constant, while a and b change. Therefore the measurement include copying b into the RSA key on the card and initializing the cipher with the new key in addition to encrypting a. It looks like encryption is more-or-less a constant operation and the time increase is caused by the key initialization.
Division of numbers of equal size
As pointed out before, division of equally sized numbers is much faster. Here are the charts.
For the measurements with the wired interface (red pluses) one can clearly see two lines. In the lower line the dividend is smaller than the divisor, requiring no subtraction round. In the upper line the dividend is bigger, requiring one subtraction round. Then there are five red runaway points at 128, 800, 960, 992 and 1312 bits. The latter four are more-or-less on one line. For those two subtraction rounds were required. The point at 128 bit shows a peculiarity of our division algorithm: To avoid overflow we make a conservative estimation of the multiple to subtract in each division round. For the runaway point at 128 bits the estimation was too small, requiring an additional, third subtraction round.
The green data for the wireless interface also shows two lines (for zero and one subtraction round) and some other point for two and three subtraction rounds. It's totally unclear, why there are breaks in the second green line. It looks like the card takes a rest of 50 milliseconds after computing for some time.
September 9th: 25 seconds for a 2048 bit division
We have now also performance numbers for our simple division algorithm, see the charts below. The charts for Montgomery multiplication have been updated. Montgomery multiplication contains one final division, but this does not mean that half of the time of a multiplication is spent in a division. Inside Montgomery multiplication numbers of equal size are divided (which is of course much faster).
The running time of division depends quite a bit on the arguments. We measured therefore 10 random samples for each size. The timings include making a copy of the dividend into transient memory (read RAM), where the division is carried out later.
September 5th: 50 seconds for a 2048 bit multiplication
The Bignat library has been ported to the card, making it possible to do some performance measurements. I only measured Montgomery multiplication now. The results are catastrophic: It takes about 50 seconds to multiply two 2048 bit numbers. For the RSA flavor of the protocols we need to compute two exponential expressions. In both cases the exponent needs to be about 200 bits, requiring about 800 multiplication in total. This will then take about 11 hours.
For the DL flavor it looks a slightly better. There numbers and exponents need to be about 300 bits long. A 300 bit multiplication takes about 1 second, requiring only 20 minutes in total.
In case you haven't guessed it yourself: The crosses in the pictures are the measurements and the curves are a quadratic approximation. I intended to measure up to 2048 bits but when the numbers are getting bigger I am getting strange errors. Update: There was a hard time out for any APDU command at 2 minutes which went away someday.
September 1st: RPC for cards
Invoking a method on a Java Card is rather cumbersome:
- First the arguments have to be converted into a byte array
- The serialised arguments are assembled into an APDU that is sent to the card.
- On the card the APDU command and the arguments have to be decoded, before invoking the right method.
- Same procedure again for sending the results back in a response APDU.
The Issue is further complicated by the following facts.
- APDU commands can only hold 255 bytes of data. For bigger arguments one has to chain several APDU's. Similarly for response APDU's.
- The card operating system might give the data to the applet in even smaller chunks, depending on the size of the APDU buffer.
In order to solve the data de- and encoding once and for all we designed and implemented some kind of RPC infrastructure for Java Cards.
- For a method to be called on the card, its arguments and results are described in a data type, called Protocol_step.
- Several such method calls or steps can be chained together in a Protocol.
- A general layer on the card and on the host side decodes and encodes arguments and results according to the description and calls the right method on the card.
- It works for an arbitrary number of arguments and results of arbitrary type (as long it is less the 32KByte), automatically using several APDU's for transferring arguments and results, if necessary.
August 28th: Tools to make the live easier
There is the java card converter. It must be run on the compiled class files to produce an applet, which can then be loaded onto a card. The error messages of the converter are specifically encrypted: They don't even contain the source file they refer to.
I wrote a little tool today that converts the error messages of the converter into a sensible format that can be parsed by Emacs, such that M-x next-error brings me to the right location. The tool is written in Ocaml. Couldn't resist.
August 22nd: Montgomery multiplication
Montgomery multiplication has been added to the Bignat library. Montgomery multiplication is a special algorithm for computing x * y (mod m) without the need of a double sized intermediate result.
August 20th: Bignat library for Java Card
There is no freely available big integer library for Java Card. For the cryptographic primitives we need however exponentiation of 2000 bit numbers. We therefore started our own big integer library. It will be called Bignat, because it will not support negative numbers. It will have the following features:
- it will be allocation free, except for the constructor
- no negative numbers
- the size of the big integers will be fixed
- the library will be developed and tested in a standard java environment and later, with some cpp magic, ported to Java Card.
Division and modulo is running fine already.
July 27th: Schnorr identification applet finished
The applet and a host driver implementing Schnorr's identification protocol have been finished. The applet runs both in the jcop emulator and on a real Java Card and acts as a Prover, proving it's identification to the host driver, using Schnorr's zero knowledge proof. The host driver, running on the computer and talking to the card reader, thus implements the verifier.
In addition to the prover applet a cheater applet has been developed as well. The cheater tries to prove a certain identity without knowing the necessary private keys simply by guessing the right answer. Because all arithmetics is done in shorts the chances of the Cheater are with 1/89 rather good.
Thanks to cpp we can compile the applet and a testframe from the same sources. Inside the testframe there are running the same classes that are used inside the applet and the host driver to implement the prover and the verifier. The testframe is a standard java program, which therefore permits easy debugging of the applet code.
July 18th: First minimal applet
Today we completed a first, minimal applet that runs both in the jcop emulator and on a real java card.
July 16th: Boosting Java with cpp
Our implementation language will be cpp + Java. The reason for that are twofold:
- Some classes are needed in the card as well as in the host driver talking to the card. We are not going to duplicate the source files of those classes only in order to change the package declaration at the beginning. Instead an old-fashioned but powerful #ifdef will take care of the problem.
- Java doesn't provide the possibility to implement a given interface with two different classes and choosing between them at compile or program start time. Java provides interfaces and abstract classes for this purpose, but neither can contain Constructors or static members.
Compilation consists then of the following steps:
- preprocessing the real source with cpp
- moving the line directives into comments with sed
- compiling with javac
The Makefile looks a bit complex, but that's far better then duplicated source files.
July 11th: First testframe version of Schnorr's identification protocol
As a starting excercise we implemented Schnorr's identification protocol. The first version, completed today, is a test frame connecting a class implementing the prover and one implementing the verifier in one program. All computations are currently done in shorts (16 bit integers) because short is the biggest data type on Java Card. Because of the short limitation the security parameter is basically non-existent.
July 1st: Hendrik Tews joined the OV-chip 2.0 project
Hendrik Tews joined the OV-chip 2.0 project. He will focus on the implementation of the new protocols on Java Card.