Using Password-based Encryption on Android

Why password-based encryption is needed


There are various reasons why one would want to encrypt data in an Android application: to make sure that files exported to shared storage (SD card, etc.) are not easily accessible to other apps; to encrypt sensitive information (such as authentication information for third-party services) stored by the app or to provide some sort of a DRM-like scheme where content is only accessible to users who own the appropriate key to access it. The Android SDK includes the Java Cryptography Extension (JCE) interfaces that provide easy access to common cryptographic operations, and all mainstream Android devices come with JCE providers that implement current symmetric encryption algorithms such as AES. Thus encrypting application data is fairly easily accomplished in Android by using standard APIs.

However, as in other systems, the harder part is not performing the actual cryptographic operations, but key management. If a key is stored along with the encrypted data, or even as a file private to the application, it is fairly easy to extract it, especially on a rooted device, and decrypt the data. The same is true for keys embedded in the application source code, even if they are somewhat obfuscated.There are generally two solutions to this problem: use a system service to protect the key, or don't store the key on the device at all, and have it entered each time access to protected data is needed. Android does provide a system key chain facility since version 4.0 (ICS), accessible via the KeyChain class. However, as discussed here, it can currently only be used to store RSA private keys and certificates. It is not generic enough to allow secure storage of arbitrary user data, including symmetric encryption keys. That leaves us with the other option: do not store encryption keys on the device. However, symmetric encryption keys are long random strings of bits, and it cannot be expected that someone would actually remember them, let alone enter them using an onscreen keyboard. On the other hand, users are quite familiar with passwords, and thus a way to generate strong cryptographic keys based on a humanly-manageable passwords is needed. There are standard and secure ways to do this, but let's first look at some non-standard, and generally not secure, but quite common ways of producing a key from a password. We will be using AES as the encryption algorithm for all examples, both because it is the current standard and is considered highly secure, and because it is practically the only symmetric algorithm guaranteed to be available on all Android versions. All key derivation methods presented here are implemented in the sample application (screenshot below, source is on github).


How not to generate a key from a password: padded password


Since a symmetric cipher's key has no structure and is just a string of bits with a predefined length, pretty much any string of sufficient length can be used to construct a key. For example, a 16 character password is easily converted to an 128 bit AES key by simply getting the raw bytes of the string in a particular encoding (such as UTF-8). While some implementations will reject known weak keys (such as ones composed only of zero bits), you will get a perfectly valid key, and will be able encrypt and decrypt with it. You might find 'helpful' sample code to achieve this on forums and the like that goes something like this:

int keyLength = 128; 
byte[] keyBytes = new byte[keyLength / 8];
// explicitly fill with zeros
Arrays.fill(keyBytes, (byte) 0x0);

// if password is shorter then key length, it will be zero-padded
// to key length
byte[] passwordBytes = password.getBytes("UTF-8");
int length = passwordeBytes.length < keyBytes.length ? passwordBytes.length
: keyBytes.length;
System.arraycopy(passwordBytes, 0, keyBytes, 0, length);
SecretKey key = new SecretKeySpec(keyBytes, "AES");

Since most people wouldn't pick a 16 character password (let alone a 32 character one for a 256 bit key), the key 'derivation' code makes due with what is available: if the password doesn't have enough characters for a full key, it pads it with zeros bytes (or some other fixed value) to create a valid key. Here's whey this (or variations of it) code generates weak keys:
  • it limits the range of bytes used for the key to those encoding printable characters, thus effectively reducing the key size (out of 256 possible values for a byte, only 95 are printable ASCII characters). While there are 2^128 possible 128 bit AES keys, if only printable characters are used to construct the key, there are about 2^105 possible keys (equivalent to using a 105 bit AES key if such a key were possible).
  • if the password is shorter than the key size, the fixed padding further reduces the key space. For example, if the user picks up an 8-character password, that would result in roughly 2^52 possible keys. That is less even than DES's 56 bit key which has been considered weak for ages and can be brute-forced in less than a day using commercial hardware.
  • since the password is used as is to construct the key, the cost of generating a key 'derived' using this method is practically zero. Thus an attacker can easily generate a bunch of keys based on a list of common passwords and use them for a brute force attack. Since the number of keys (=common passwords) is limited, such an attack is very efficient, and if a poor password has been chosen, more often than not it will succeed.
You might think that no one would use such a naive key derivation scheme, but as it turns out, even fairly popular key manager apps are known to have used it. 

To sum this up: a symmetric encryption key needs to be random to provide sufficient security, and user-entered passwords are a poor source of randomness. Don't use them as is to construct a key.

How not to generate a key from a password: SHA1PRNG

Since, as mentioned above, a key needs to be random, it stands to reason to use a random number generator (RNG) to generate one. There are two flavours of those: "true" random generators that base their output on physical phenomena that are regarded as random (e.g., radioactive decay), and pseudo-random generators (PRNG) whose output is determined by a fairly short initialization value, know as a seed. By using a "truly random" (or close) value as the seed, PRNG's can produce sufficiently random output. To generate a random symmetric key based on a password we can use the password (in some form) to seed a PRNG, and thus produce predictable keys. There are standard key derivation algorithms based on this idea, which we will introduce later, but let's first look at some fairly common derivation code that implements this idea quite literally. You might come across code similar to this on 'code snippet' sites or even StackOverflow:

KeyGenerator kgen = KeyGenerator.getInstance("AES");
SecureRandom sr = SecureRandom.getInstance("SHA1PRNG");
byte[] seed = password.getBytes("UTF-8");
sr.setSeed(seed);
kgen.init(KEY_LENGTH, sr);
SecretKey key = kgen.generateKey();

This creates a random generator instance (SecureRandom) using the SHA1PRNG PRNG algorithm (which is currently the only RNG algorithm available on commercial Android devices), and seeds it with the password bytes. A KeyGenerator is then initialized with the SecureRandom instance, making sure that our password-seeded PRNG will be used when generating keys. Lastly, since a KeyGenerator for a symmetric algorithm simply requests a number of bits equal to the key size from the underlying (or system) RNG, we get a pseudorandom secret key based on the used password.

This scheme is not as bad as the previous one, since it produces a pseudorandom key, and doesn't reduce key size, but it is still not a good idea to use it. The first reason is the same as the last one for the padding method: generating a key is cheap and thus keys based on a password list can be readily generated, facilitating a brute force attack. How cheap: essentially the cost of a SHA-1 hash round, which is generally implemented in native code and is pretty fast. The second reason is that it is neither standard, nor portable. Even the JavaDoc entry for Android's SecureRandom says so: 'Not guaranteed to be compatible with the SHA1PRNG algorithm on the reference implementation.' The code above when run on Android and on a desktop system using Java SE produces the following 128 bit keys from the password string 'password'. Note that those may differ even between different Android platform or Java SE versions:

Android: 80A4495EF27725345AB3AFA08CE3A692
Java SE: 2470C0C06DEE42FD1618BB99005ADCA2

In short: while this method is slightly better than the previous one, it doesn't effectively prevent from brute force attacks and is not portable. Don't use it. Update: As of Android 4.2, the default SHA1PRNG provider is based on OpenSSL and this method doesn't work out of the box. If you need to use it for compatibility reasons, you have to explicitly specify the "Crypto" provider when getting a SecureRandom instance. But again, don't use it.

Proper key derivation: PKCS#5 and PKCS#12


A standard way to derive a symmetric encryption key from a password is defined in PKCS#5 (Public Key Cryptography Standard) published by RSA (the company). It was originally developed for generating DES keys, but the current versions (2.0 and draft of 2.1) extend it to be algorithm independent. Version 2.0 is also published as RFC 2898

The standard is based on two main ideas: using a salt to protect from table-assisted (pre-computed) dictionary attacks (salting) and using a large iteration count to make the key derivation computationally expensive (key stretching). As mentioned above, if a key is directly constructed from a password, it is easy to use pre-generated keys based on a list of common passwords for a brute force attack. By using a random 'salt' (so called because it is used to 'season' the password), multiple keys can be constructed based on the same password, and thus an attacker needs to generate a new key table for each salt value, making pre-computed table attacks much harder. A key point to note is that, while the salt is used along with the password to derive the key, unlike the password, it does not need to be kept secret. Its purpose is only to make a dictionary attack more difficult and it is often stored along with the encrypted data. The other approach applied in PKCS#5 is repeating the key derivation operation multiple times to produce the final key. This has little effect on legitimate use, where only one try is needed to derive the key from the correct password, but considerably slows down brute force attacks which try out multiple passwords in a row. 

PKCS#5 defines two key derivation functions, aptly named PBKDF1 and PBKDF2. PBKDF1 applies a hash function (MD5 or SHA-1) multiple times to the salt and password, feeding the output of each round to next one to produce the final output. The length of the final key is thus bound by the hash function output length (16 bytes for MD5, 20 bytes for SHA-1). PBKDF1 was originally designed for DES and its 16 or 20 byte output was enough to derive both a key (56 bits) and an initialization vector (64 bits) to encrypt in CBC mode. However, since this is not enough for algorithms with longer keys such as 3DES and AES, PBKDF1 shouldn't be used and is only left in the standard for backward compatibility reasons.
PBKDF2 doesn't suffer from the limitations of PBKDF1: it can produce keys of arbitrary length by generating as many blocks as needed to construct the key. To generate each block, a pseudorandom function is repeatedly applied to to the concatenation of the password, salt and block index. The pseudorandom function is configurable, but in practice HMAC-SHA1/256/384/512 are used, with HMAC-SHA1 being the most common. The password is used as the HMAC key and the salt takes the role of the message. Unlike PBKDF1, PBKDF2 doesn't specify how to derive an IV, so a randomly generated one is used.

Android's main JCE provider (Bouncy Castle) currently only supports PBKDF2WithHmacSHA1. Let's see how to use it to encrypt data with a 256 bit AES key derived from a password:

String password  = "password";
int iterationCount = 1000;
int keyLength = 256;
int saltLength = keyLength / 8; // same size as key output

SecureRandom random = new SecureRandom();
byte[] salt = new byte[saltLength];
randomb.nextBytes(salt);
KeySpec keySpec = new PBEKeySpec(password.toCharArray(), salt,
iterationCount, keyLength);
SecretKeyFactory keyFactory = SecretKeyFactory
.getInstance("PBKDF2WithHmacSHA1");
byte[] keyBytes = keyFactory.generateSecret(keySpec).getEncoded();
SecretKey key = new SecretKeySpec(keyBytes, "AES");

Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
byte[] iv = new byte[cipher.getBlockSize());
random.nextBytes(iv);
IvParameterSpec ivParams = new IvParameterSpec(iv);
cipher.init(Cipher.ENCRYPT_MODE, key, ivParams);
byte[] ciphertext = cipher.doFinal(plaintext.getBytes("UTF-8"));

Here we generate a random salt and use 1000 iterations to initialize the SecretKeyFactory which generates our key. The last step of key generation might be a little confusing though: we don't use the SecretKey produced by the factory as is, but use its encoded value to create a new SecretKeySpec object. That is done because the output of generateSecret() is actually a PBEKey instance which does not contain an initialized IV -- the Cipher object expects that from a PBEKey and will throw an exception if it is not present. The iteration count is as recommended by PKCS#5, but that standard was written a while ago, so you might want to increase it. For some perspective, AES 256 bit keys used to encrypt backups in Android 4.0 (ICS) are derived using 10,000 iterations and a 512 bit salt; iOS 4.0 also uses 10,000 iterations. The size of the salt should typically match the key size, for example 16 bytes when using a AES with a 128 bit key (128 / 8 = 16). Next we generate a random IV, initialize the cipher and output the cipher text.

To be able to decrypt the cipher text we need: the password, the iteration count, the salt and the IV. The password will be input by the user, and the iteration count is generally fixed (if you decide to make it variable, you need to store it along with the other parameters), so that leaves the salt and the IV. As discussed above, the salt is not a secret, and neither is the IV. Thus they can be saved along with the cipher text. If they are stored in a single blob/file, some sort of structure is needed to be able the parse it into its components. The sample app 'saves' the encrypted message to a Base64-encoded string and simply concatenates the salt, IV and cipher text delimited by "]" (any character not used Base64 will do). Decryption is very similar to the code above, except that the salt and IV are not generated randomly, but retrieved from the encrypted message.

String[] fields = ciphertext.split("]");
byte[] salt = fromBase64(fields[0]);
byte[] iv = fromBase64(fields[1]);
byte[] cipherBytes = fromBase64(fields[2]);
// as above
SecretKey key = deriveKeyPbkdf2(salt, password);

Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
IvParameterSpec ivParams = new IvParameterSpec(iv);
cipher.init(Cipher.DECRYPT_MODE, key, ivParams);
byte[] plaintext = cipher.doFinal(cipherBytes);
String plainrStr = new String(plaintext , "UTF-8");

Another standard key derivation mechanism is the one defined in PKCS#12. It doesn't appear to have a catchy name like the previous two, and is generally only used for backward compatibility with Microsoft's original PFX format. Like PBKDF2, it can also generate keys and IV's with arbitrary length. The Bouncy Castle provider supports a bunch of variations compatible with AES such as PBEWITHSHA256AND256BITAES-CBC-BC. The IV is generated based on the password and salt, so you don't have to generate and store it separately. The sample app includes a PKCS#12 key derivation mode, refer to the source code if you want to check how the implementation differs from the code above.

Derivation speed

We've mentioned that the first two 'derivation' methods are very fast and thus provide no real protection against table assisted brute force attacks. PKCS#5 and PKCS#12 compliant derivation methods deliberately make the process slower to impede brute force attacks. But what exactly is the speed difference? The following table summarizes the average computation times for the four presented derivation methods. Measurements were performed on a Nexus One (1GHz CPU) using 1000 iterations and a 8 byte salt for both PKCS#5 and PKCS#12.  As you can see, even a relatively small number of iterations matters: iteration based methods are at least an order of magnitude slower, which in this case is a good thing since it makes brute force attacks harder.

Password derivation speed on Nexus One
Padding SHA1PRNG PKCS#12 PBKDF2
< 1 [ms] 32 [ms] 160 [ms] 370 [ms]

Of course, the actual password matters a lot. If it is easily guessable, an attacker can easily find the encryption key, no matter how many iterations you used in your implementation. Thus regular password selection policies apply for password-based encryption (PBE) as well: do not use common dictionary words, mix lower and upper case letters with numbers and symbols. If possible, generate passwords automatically, and do not entrust users with password selection.

Conclusion

Using symmetric encryption on Android is quite straightforward, but since a general purpose, system-level secure storage is not available, key management could be complicated. One solution is not to store keys, but derive them from user-entered passwords. Password strings cannot be used as symmetric encryption keys as is, so some sort of key derivation is required. There are a few ways to derive keys, but most of them are not particularly secure. To ensure encryption keys are both sufficiently random and hard to brute force, you should use standard PBE key derivation methods. Of those, the one currently regarded secure and available on Android is PBKDF2WithHmacSHA1. In short: when deriving a key from a password use PBKDF2WithHmacSHA1, a sufficiently long randomly generated salt and an iteration count suitable for your app.