Debunking the Myth: Base64 Encoding is Not a Security Measure
What is Base64?
Base64 is a group of binary-to-text encoding schemes that transform binary data into a sequence of printable characters, limited to a set of 64 unique characters.
Why use Base64?
As with all binary-to-text encoding schemes, Base64 is designed to carry data stored in binary formats across channels that only reliably support text content. Base64 is particularly prevalent on the World Wide Web where one of its uses is the ability to embed image files or other binary assets inside textual assets such as HTML and CSS files.
Base64 table from RFC 4648
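The Base64 alphabet defined in RFC 4648 §4 maps the values 0 to 25 to A-Z, 26 to 51 to a-z, 52 to 61 to 0-9, 62 to + and 63 to /. The = character is used only for padding.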
Example
The example below uses a single Unicode letter for simplicity. The more typical use is to encode binary data (such as an image); the resulting Base64 data will only contain 64 different ASCII characters, all of which can reliably be transferred across systems that may corrupt the raw source bytes.
Here is the first letter of the Hindi (Devanagari) alphabet.
अ
When the UTF-8 bytes of this letter are encoded into Base64, the result is the following string of ASCII characters:
4KSF
In the above example, the encoded value of अ is 4KSF. Encoded in UTF-8, the character अ (U+0905) is stored as the byte values 0xE0, 0xA4, and 0x85, which are the 8-bit binary values 11100000, 10100100, and 10000101. These three values are joined together into a 24-bit string, producing 111000001010010010000101. Groups of 6 bits (6 bits have a maximum of 2^6 = 64 different binary values) are converted into individual numbers from start to end (in this case, the 24-bit string yields the four numbers 56, 10, 18, and 5), which are then converted into their corresponding Base64 characters: 4, K, S, and F.
As this example illustrates, Base64 encoding converts three octets into four encoded characters.
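Encoding of the source string ⟨अ⟩ in Base64:
UTF-8 bytes: 0xE0 0xA4 0x85
Bit pattern: 11100000 10100100 10000101
6-bit groups: 111000 001010 010010 000101
Base64 indices: 56 10 18 5
Base64 characters: 4 K S F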
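The steps described in this example can be reproduced in a few lines of Python. This is a minimal sketch for illustration only: the helper name encode_three_bytes and the ALPHABET constant are made up for this example, and the helper handles exactly three input bytes, so it does not cover padding. In real code, use the standard library's base64 module, which is shown alongside for comparison.

```python
import base64
import string

# The RFC 4648 section 4 alphabet: A-Z, a-z, 0-9, '+', '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_three_bytes(data: bytes) -> str:
    """Encode exactly three bytes into four Base64 characters (illustrative helper)."""
    if len(data) != 3:
        raise ValueError("this sketch only handles exactly three bytes")
    # Join the three octets into one 24-bit string.
    bits = "".join(f"{byte:08b}" for byte in data)
    # Split the 24 bits into four 6-bit groups and look each one up in the alphabet.
    groups = [bits[i:i + 6] for i in range(0, 24, 6)]
    return "".join(ALPHABET[int(group, 2)] for group in groups)


raw = "अ".encode("utf-8")                       # the three UTF-8 bytes of the letter
manual = encode_three_bytes(raw)                # step-by-step encoding
library = base64.b64encode(raw).decode("ascii")  # standard library encoding

print(raw.hex(" "), manual, library)
```

Both the manual helper and base64.b64encode should agree on the four characters derived above.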
Applications
Base64 can be used in a variety of contexts:
Embedding Images in HTML/CSS:
- Base64 encoding allows embedding image files directly into HTML or CSS files, reducing the number of HTTP requests.
Email Attachments:
- Base64 is used to encode binary files (like images or documents) as text to be sent as email attachments.
Data URLs:
- Base64 encoding is used to include small data files directly in URLs.
Storing Complex Data in Cookies:
- Base64 can be used to store complex data structures in cookies, which only support text.
Transmitting Binary Data over Text-Based Protocols:
- Base64 encoding is used to carry binary data inside text-based formats such as JSON or XML, and over protocols that are designed to handle text.
Encoding Credentials:
- Base64 is used to encode credentials in HTTP Basic Authentication.
Avoiding Special Character Issues:
- Base64 encoding helps avoid issues with special characters in data transmission, so the data arrives unchanged even through channels that would otherwise escape, strip, or misinterpret those characters.
Storing Binary Data in Databases:
- Base64 encoding allows binary data to be stored in text fields of databases.
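Two of the applications above, HTTP Basic Authentication and data URLs, can be sketched as follows. The function names basic_auth_header and data_url, the sample username and password, and the sample payload are assumptions made for this illustration. The last lines show that the Basic Authentication header can be reversed by anyone who sees it, which is why it relies on TLS for confidentiality.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Authorization header for Basic Authentication."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def data_url(payload: bytes, media_type: str) -> str:
    """Embed a small binary payload in a data URL."""
    return f"data:{media_type};base64," + base64.b64encode(payload).decode("ascii")


# Hypothetical sample credentials, for illustration only.
header = basic_auth_header("alice", "s3cret")

# Anyone who can read the header can recover the credentials without a key.
encoded_part = header.split(" ", 1)[1]
recovered = base64.b64decode(encoded_part).decode("utf-8")

# A tiny text payload stands in for an image or other binary asset.
url = data_url(b"hi", "text/plain")

print(header)
print(recovered)
print(url)
```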
Usage as a security measure
Web pages often pass query parameter values encoded in Base64. If you pass a value such as an integer object ID in a query parameter, wrap it in Base64, and believe you have added a layer of security, it is time to reconsider your approach. Base64 encoding provides no security: anyone can decode it without a key. Here are some common alternatives to passing an object ID in the query parameter of an unauthenticated page to enhance security:
UUIDs (Universally Unique Identifiers):
- Use random (version 4) UUIDs instead of sequential IDs. They are far harder to guess or enumerate than integers, but this is still security through obscurity and does not replace access control.
Tokenization:
- Generate a secure token that maps to the object ID on the server side. This token can be a one-time use or have an expiration time.
Hashing:
- Hash the object ID with a secret key using a keyed hashing construction (e.g., HMAC with SHA-256). A plain, unkeyed hash of a small integer can be reversed by simply hashing every candidate value, so the secret key is essential.
Encryption:
- Encrypt the object ID using a strong encryption algorithm (e.g., AES). Ensure the encryption key is securely managed.
Opaque Tokens:
- Use opaque tokens that do not reveal any information about the object ID. The server can map the token to the actual object ID.
Access Control:
- Implement proper access control mechanisms to ensure only authorized users can access the object, regardless of the identifier used.
Using multiple parameters:
- To render receipt or order details, instead of accepting only one parameter such as the order ID, require a combination of the order ID and the email address used when placing the order. This makes it somewhat harder to view someone else's details.
By implementing these alternatives, you can enhance the security of your application and protect sensitive data from being exposed.
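The following Python sketch contrasts a Base64-wrapped ID with three of the alternatives listed above: a random UUID, an opaque token mapped on the server side, and an ID signed with a keyed hash. All names here (TOKENS, issue_token, resolve_token, sign_id, verify_signed_id, the sample ID 1042) are hypothetical, the in-memory dictionary stands in for a real database, and the secret key would normally come from secure configuration rather than being generated at startup. Note that the signed form still reveals the ID; it only prevents a visitor from forging links for other IDs.

```python
import base64
import hashlib
import hmac
import secrets
import uuid
from typing import Dict, Optional

# The approach this section warns against: a Base64-wrapped integer ID.
weak_param = base64.urlsafe_b64encode(b"1042").decode("ascii")
# Anyone can reverse it without a key.
leaked_id = int(base64.urlsafe_b64decode(weak_param).decode("ascii"))

# Alternative: a random (version 4) UUID as the public identifier.
public_id = str(uuid.uuid4())

# Alternative: an opaque token mapped to the object ID on the server side.
TOKENS: Dict[str, int] = {}  # hypothetical in-memory store; use a database in practice


def issue_token(object_id: int) -> str:
    token = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe text
    TOKENS[token] = object_id
    return token


def resolve_token(token: str) -> Optional[int]:
    return TOKENS.get(token)


# Alternative: sign the ID with a secret key (HMAC-SHA-256).
SECRET_KEY = secrets.token_bytes(32)  # assumption: load from secure config in practice


def sign_id(object_id: int) -> str:
    mac = hmac.new(SECRET_KEY, str(object_id).encode("utf-8"), hashlib.sha256)
    return f"{object_id}.{mac.hexdigest()}"


def verify_signed_id(value: str) -> Optional[int]:
    id_text, _, signature = value.partition(".")
    expected = hmac.new(SECRET_KEY, id_text.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return None
    return int(id_text)
```

None of these replaces access control: where a page shows sensitive data, the server should still check that the requester is allowed to see it.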
Common misconceptions
Here are some common misconceptions about Base64 encoding:
Base64 is Encryption:
- Base64 is not an encryption method. It is an encoding scheme that transforms binary data into text. It does not provide any security or confidentiality.
Base64 Reduces Data Size:
- Base64 encoding actually increases the size of the data by approximately 33%. It is not a compression method.
Base64 is Secure:
- Base64 does not provide any security features. It is easily reversible and should not be used for protecting sensitive data.
Base64 is Only for Text Data:
- Base64 produces text as its output, but its input can be any sequence of bytes, not just text. It is commonly used to encode images, files, and other binary data.
Base64 is Efficient for Large Files:
- Base64 is not efficient for large files due to the increase in data size. It is better suited for small to medium-sized data.
Base64 Encoding is Always Necessary:
- Base64 is only necessary when binary data needs to be transmitted over text-based protocols or stored in text-based formats. It is not always required.
Base64 Encoding is Complex:
- Base64 encoding is a straightforward process and can be easily implemented using standard libraries in most programming languages.
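The size overhead and the binary-input point can be checked with a short Python sketch. The helper name encoded_length is made up for this illustration; it applies the rule that every group of up to three input bytes becomes four output characters, with padding included.

```python
import base64
import os


def encoded_length(n_bytes: int) -> int:
    """Length of padded Base64 output for n_bytes of input: 4 characters per 3-byte group."""
    return 4 * ((n_bytes + 2) // 3)


payload = os.urandom(3000)            # arbitrary binary data, not text
encoded = base64.b64encode(payload)   # output is ASCII-only bytes
decoded = base64.b64decode(encoded)   # decoding needs no key

overhead = len(encoded) / len(payload)

print(len(payload), len(encoded), round(overhead, 3))
```

For 3000 input bytes the encoded form is 4000 characters, which is the roughly 33% growth mentioned above, and decoding returns the original bytes exactly.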