Where the spirit does not
work with the hand,
there is no art.

- Leonardo da Vinci

0 1 0 1 0 5


I came across ionCube's HTML Obfuscator when I was digging for PHP security resources. Granted, they don't go as far as to say that they can encrypt HTML source, but their claim that the obfuscator makes analysis of [one's] HTML pages almost impossible is pretty laughable. Just for fun, I dismantled their obfuscation code the hard way and then built an on-line breaker to demonstrate how to de-obfuscate the easy way.

If you want to follow along, pull up a copy of the sample obfuscation they provide. This is the page that I tore apart.

Trick 1: Disable User Operations

A common technique that deluded Web designers employ to try to protect the content they deliver to end users is to disable as many browser features as possible. This is done by overriding the default handlers for events such as: opening the context menu (right-click), cut and copy operations, and drag and select. The worst offenders will use pop-up dialogs to warn the user that the actions are disabled when they try.
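For anyone who has not seen this pattern, a minimal sketch of the kind of handler overriding described above might look like the following. The function name disableUserOperations and the stand-in document object are illustrative assumptions, not code taken from the ionCube sample.

```javascript
// Hypothetical sketch: suppress common user operations by installing
// handlers that return false. Not taken from the ionCube sample.
function disableUserOperations(doc) {
  const block = function () { return false; };
  const handlers = ["oncontextmenu", "onselectstart", "ondragstart", "oncopy", "oncut"];
  for (const name of handlers) {
    doc[name] = block;
  }
  return handlers;
}

// In a browser you would pass the real `document`; a plain object stands in here.
const fakeDocument = {};
disableUserOperations(fakeDocument);
```

Every one of these handlers lives in script, which is why switching JavaScript off, or simply viewing the source, defeats them.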

Much of this code works only in IE's implementation (embrace and extend) of the DOM, but even brain-dead IE users can still figure out how to get around this one: just turn off JavaScript! Fortunately, this drastic measure isn't necessary - we can view the source for the page in Firefox by clicking View | Page Source. Have I lost anyone yet?

Trick 2: CR/LF

Another common trick, which surprisingly fools a lot of people, is to put a bunch of whitespace at the beginning of the response body. At first glance, it appears that there is nothing in the HTML source but a commented usage warning with some legal-sounding yet grammatically unsound verbiage. I'm trembling.

Of course, it's technically impossible for a browser to magically construct a page with no source code whatsoever. It must be there somewhere. Oops, why is that scrollbar there? At the end of the document is our magical content. Even if long-line wrapping is enabled, it appears to occupy only a few lines, starting with a <SCRIPT> tag. Wow, what great compression!

Trick 3: Remove Whitespace

A staple of obfuscation is removing the whitespace that humans need for readability but machines couldn't care less about. This obviously only works for code blocks. However, since the entirety of page content has been encoded in JavaScript, the page is essentially one big code block, anyway. This explains why line wrapping does not help us - there is no whitespace for a natural break in flow.

As we will discover, the technique of removing whitespace is employed at every step of the obfuscation. I will take the trouble of adding it back in to make the demonstration a little clearer.

Trick 4: URL Encoding

Copying the full document body into a text editor which forces line wrapping reveals that there is quite a bit of data to work with. Everything of interest takes place within the <SCRIPT> tags, so we can safely discard the comment text and <NOSCRIPT> elements. By pulling apart the code separated by semicolons, we see that the entire script consists of only three statements, shown below: a variable initialization, a call to the eval function, and a call to some as yet unknown function x.


Assuming the designers wouldn't waste huge chunks of data on calls to undefined methods, the definition of x must be contained somewhere in the previous two statements. The only other active function call in the block is the call to eval, so this is the next subject of analysis.

Most Web developers recognize URL encoding when they see it. The preliminary call to unescape confirms that the data to be passed to eval is indeed URL encoded. This is a very wasteful encoding scheme, as it takes three bytes (%40) to encode a single character (@). Each set of three bytes is a percent sign followed by the two-digit hexadecimal representation of the ASCII value of the original character. Translating the entire block, we now have the following argument for eval (notice the solitary whitespace character after the else keyword):

d="";for(i=0;i<c.length;i++)if(i%3==0)d+="%";else d+=c.charAt(i);eval(unescape(d));d="";
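The three-for-one overhead of this encoding is easy to check. Here is a minimal sketch, assuming plain ASCII input; percentEncodeAll is a made-up helper name, and decodeURIComponent stands in for unescape, which gives the same result for ASCII data.

```javascript
// Encode every character as %XX (assumes plain ASCII input).
function percentEncodeAll(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    out += "%" + text.charCodeAt(i).toString(16).toUpperCase().padStart(2, "0");
  }
  return out;
}

// The first five characters of the decoded statement above.
const encodedSample = percentEncodeAll('d="";');
const decodedSample = decodeURIComponent(encodedSample);
```

Five characters of JavaScript become fifteen characters of payload, which is where the bulk of the obfuscated page comes from.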

Trick 5: Obfuscating the Obfuscator

We can see that we still have not found the definition of the function x. The designer has placed an additional layer of obfuscation on top of the first layer to hide the underlying decoding function. Arranging the code so it is more legible, we have:

for (i=0; i < c.length; i++)
	if (i % 3 == 0) d += "%";
	else d += c.charAt(i);
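As a quick check of what this loop computes, here is the same logic wrapped in a function so it can be run on its own. The name rebuildPercentEncoding and the sample value are my own illustration, not data from the obfuscated page.

```javascript
// Same logic as the loop above: the first character of every group of
// three is thrown away and a percent sign is emitted in its place.
function rebuildPercentEncoding(c) {
  let d = "";
  for (let i = 0; i < c.length; i++) {
    if (i % 3 === 0) d += "%";
    else d += c.charAt(i);
  }
  return d;
}

// Made-up sample: the filler characters (7, a, 0) could be any hex digit.
const rebuiltSample = rebuildPercentEncoding("764a3D022");
const plainSample = decodeURIComponent(rebuiltSample);
```

The filler characters carry no information at all, so a third of the data in c is padding.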



At last we are making use of the data in the variable c. Sadly, the for loop does nothing but replace every third character of c with a percent sign and store the result in d. This is another tremendous waste of space, which accomplishes nothing other than obscuring the fact that the data in c is comprised of only hexadecimal digits. Not that this should be surprising, considering we already know the result is going to be run through the eval(unescape()) mill again. Yes, this block does nothing but convert c into another URL-encoded chunk and turn it into more JavaScript. We now have our function x:

function x(x) {
  var l=x.length, b=1024, i, j, r, p=0, s=0, w=0,
      t = Array(63,10,37,11,0,1,6,5,44,47,0,0,0,0,0,
  for (j=Math.ceil(l/b); j>0; j--) {
    for(i=Math.min(l,b); i>0; i--,l--) {
      w |= (t[x.charCodeAt(p++)-48]) << s;
      if (s) {
        r += String.fromCharCode(165 ^ w & 255);
        w >>= 8;
      } else {

Trick 6: Base64 Encoding

At last we have arrived at the final obfuscator. This function somehow converts the raw data supplied by the third statement (remember the original three statements?) into plain HTML and JavaScript. This is clear because of the visible calls to document.write. At least it is not totally obvious at the first pass how this obfuscator works.

The function initializes a host of variables. The length of the string argument x is assigned to the variable l. b takes the value of 1024, the first clue that "binary" or "byte-level" operations will be taking place. Several other variables are declared and/or initialized to 0.

The final variable is a 75-element array, which is a somewhat unusual number given the assumption that we are dealing with base 2 calculations. However, if we examine the data stored in the array we see that every integer from 1 to 63 is represented exactly once. The remaining values are 0. Thus we have 64 distinct values, which is exactly 2^6. We also see the variable s set to 6 toward the end of the function. It is beginning to look like we are dealing with a variant of Base64 encoding, and t is our lookup "table".

The table t is referenced in only one place within the function. It is indexed by subtracting 48 from the ASCII character code of specified elements of the function argument x. We know that these indices must fall in the range 0..74, so our sample character range begins at ASCII 48 (0) and ends at ASCII 122 (z). We can map the ASCII table onto our lookup table as follows:

63,10,37,11,0,      // 01234
1,6,5,44,47,        // 56789
0,0,0,0,0,          // :;<=>
0,48,55,35,19,      // ?@ABC
52,34,33,38,18,     // DEFGH
20,26,12,56,49,     // IJKLM
22,4,17,40,50,      // NOPQR
62,61,60,16,32,     // STUVW
7,9,31,0,0,         // XYZ[\
0,0,43,0,24,        // ]^_`a
57,41,46,45,2,      // bcdef
25,27,13,54,53,     // ghijk
15,39,58,59,8,      // lmnop
36,51,30,21,42,     // qrstu
29,28,14,3,23       // vwxyz

It is easily seen that our 64-character alphabet consists of 0-9, a-z, A-Z, and the special characters (_) and (@). This is precisely what is reflected in the data passed to our deobfuscation function. The only one of these characters that maps to 0 in the lookup table is the number 4.
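These claims about the table can be checked mechanically. The sketch below copies the 75 values from the mapping above; the helper name nonZeroSymbols is an assumption of mine.

```javascript
// Lookup table copied from the mapping above (index = char code - 48).
const t = [
  63, 10, 37, 11, 0, 1, 6, 5, 44, 47, 0, 0, 0, 0, 0,
  0, 48, 55, 35, 19, 52, 34, 33, 38, 18, 20, 26, 12, 56, 49,
  22, 4, 17, 40, 50, 62, 61, 60, 16, 32, 7, 9, 31, 0, 0,
  0, 0, 43, 0, 24, 57, 41, 46, 45, 2, 25, 27, 13, 54, 53,
  15, 39, 58, 59, 8, 36, 51, 30, 21, 42, 29, 28, 14, 3, 23
];

// Characters whose table entry is non-zero, in ASCII order.
function nonZeroSymbols(table) {
  let symbols = "";
  for (let i = 0; i < table.length; i++) {
    if (table[i] !== 0) symbols += String.fromCharCode(48 + i);
  }
  return symbols;
}

const symbols = nonZeroSymbols(t);
const distinctValues = new Set(t.filter(function (v) { return v !== 0; }));
```

The table alone cannot tell the legitimate zero (the character 4) apart from the unused slots, so this lists 63 symbols; the 64th has to be inferred from the encoded data, as described above.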

Trick 7: Bit-Shifting Shell Games

Now that we have determined the nature of the encoding scheme, the final step is to determine how the symbols, each representing 6 bits, are combined to recreate the original HTML and JavaScript source of the page.

The outer for loop is controlled by the variable j, which is a typical name for a loop controller. The inner loop is controlled by i, which is also typical. Both of these variables are initialized by comparing the length of the argument string l with the constant (because it is never modified) b. It can quickly be seen that these loops break up the parsing of the argument string into 1024 character "blocks"; after every block, the "recovered" data stored in r is written to the document and r is reset. 1024 characters of Base64 alphabet will ideally become 6144 bits of recovered source, or 768 characters. This is the most efficient encoding we have seen yet!

Only one index "pointer" variable is maintained to access data within the encoded argument. Aptly, this variable is called p. Each time an element is accessed, this pointer is increased by one. This means that each symbol representing a set of 6 bits is used in the order in which it appears in the original fully obfuscated source.

The only remaining challenge is to break down the bit-level operations being performed on the raw data to determine how sets of 6 bits are translated into the final document. This process is totally controlled by the variables s and w. In order to get a sense of the behavior of the code, we will examine the values of each of the important variables over several iterations of the for loop.

pass  symbol  r          w   s
1     g                  25  6
2     H       <          4   4
3     a       <!         1   2
4     L       <!D        0   0
5     u       <!D        42  6
6     h       <!DO       6   4
7     s       <!DOC      1   2
8     U       <!DOCT     0   0
9     U       <!DOCT     60  6
10    z       <!DOCTY    5   4
11    l       <!DOCTYP   0   2

It should now be clear that for every four characters in the encoding, three bytes of plaintext are recovered. The variable w holds leftover bits which have not yet been used in the reconstruction of the original code. s holds the bit "shifter" which tracks how far the bits of w need to be shifted before combining them with the next 6 bits from x.
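Putting the pieces together, the decoding step can be sketched as a standalone function that returns the recovered text instead of writing it to the document. This is my reconstruction from the table and the traced values above, not a verbatim copy of the obfuscator's code. The names decodeBlock and TABLE are assumptions, and the block-by-block chunking is left out because it does not affect the result.

```javascript
// Lookup table from this sample page (index = char code - 48).
const TABLE = [
  63, 10, 37, 11, 0, 1, 6, 5, 44, 47, 0, 0, 0, 0, 0,
  0, 48, 55, 35, 19, 52, 34, 33, 38, 18, 20, 26, 12, 56, 49,
  22, 4, 17, 40, 50, 62, 61, 60, 16, 32, 7, 9, 31, 0, 0,
  0, 0, 43, 0, 24, 57, 41, 46, 45, 2, 25, 27, 13, 54, 53,
  15, 39, 58, 59, 8, 36, 51, 30, 21, 42, 29, 28, 14, 3, 23
];

// Sketch of the decoder: w collects leftover bits, s is the current shift.
// The XOR key 165 is the constant visible in this sample's function x.
function decodeBlock(encoded, key = 165) {
  let out = "";
  let w = 0;
  let s = 0;
  for (let p = 0; p < encoded.length; p++) {
    w |= TABLE[encoded.charCodeAt(p) - 48] << s;
    if (s) {
      // A full byte is available: unmask it and keep the leftover bits.
      out += String.fromCharCode(key ^ (w & 255));
      w >>= 8;
      s -= 2;
    } else {
      // Only 6 bits so far: wait for the next symbol.
      s = 6;
    }
  }
  return out;
}

// The eleven symbols traced in the table above.
const recovered = decodeBlock("gHaLuhsUUzl");
```

With the eleven traced symbols as input, recovered holds the same eight characters as the last row of the table.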

Automating Recovery

At this point, now that we know the algorithm used to decode the value passed to the function x, we could simply write a parser in any language to reconstruct the source for an arbitrary page encoded using this scheme. In fact, that is precisely what I did to obtain the values for the variables in the above table, using PHP. However, it is extremely unlikely that the obfuscator will use the same symbol table for every page. This is easily proven by reloading the obfuscated page from the ionCube site and comparing the values passed to the function x. If you have been following along with the examples, you probably already noticed that you have different data than I have been using for this example.

It turns out that there is a much easier way to gain access to the raw source hidden by this obfuscator, or any JavaScript obfuscator that makes use of document.write to output the decoded source: we can simply intercept the function calls and redirect the data to a more convenient location, such as a TEXTAREA element. This is a fantastically easy feat to accomplish, and is demonstrated by the following script. Simply copy the source from any page that uses this obfuscator into the text box, submit the form, and the PHP-generated page places the code into a context that is ready-made to intercept the data output. It's so simple even the script kiddies can figure it out.

. ionbreaker
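The interception idea itself fits in a few lines. The sketch below is not the source of the breaker, just an illustration of the technique. captureWrites and the stand-in document object are assumed names, and in a browser you would pass the real document and evaluate the obfuscated script inside the callback.

```javascript
// Run `script` with doc.write temporarily replaced, and return
// everything the script tried to write.
function captureWrites(doc, script) {
  const original = doc.write;
  const chunks = [];
  doc.write = function (...args) {
    chunks.push(args.join(""));
  };
  try {
    script();
  } finally {
    doc.write = original; // always restore the real method
  }
  return chunks.join("");
}

// Stand-in for the browser's `document`, so the sketch runs anywhere.
const stubDocument = {
  write: function () { throw new Error("real write should not be reached"); }
};
const captured = captureWrites(stubDocument, function () {
  stubDocument.write("<!DOCTYPE ");
  stubDocument.write("html>");
});
```

Assigning the returned string to the value of a TEXTAREA gives the decoded source without ever touching the lookup table or the bit shifting.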

Hopefully this drives home the point that there is absolutely no way to implement copy protection. This has been proven time and time again. DVD encryption was broken. X-BOX encryption was broken. Adobe PDF encryption was broken. It is simply impossible to control encrypted content in the client-side environment when you have to give the client a key to unlock the content. Knock yourself out trying, if you want.
