1. Hyperz

    Hyperz Well-Known Member Respected

    Feb 8, 2009
    2,229
    Ever looked at a webpage's source while trying to extract some data from it, only to notice it's encoded? This happens a lot on streaming sites, for example, where they try to hide or protect filehost links. Most of the time it's something really simple like a base64-encoded string. Other times it's something a little more interesting, like this for example:

    Code:
    <a onclick="window.open(pyibuo('8927d4-8927e0-8927e0-8927dc-8927df-8927a6-89279b-89279b-8927e0-8927d4-8927d1-8927e2-8927d5-8927d0-8927d1-8927db-89279a-8927d9-8927d1-89279b-8927dd-8927d8-8927e6-8927d5-8927dd-8927db-8927e3-89279d-8927d3-8927de-8927ce-8927d8'), '_blank');return false;">...</a>
    Oh goody, a puzzle! Looking at this it's instantly obvious that the "pyibuo()" function takes an encoded URL and returns the decoded URL, since "window.open()" is a standard JavaScript function that takes a URL as its first argument. Well then... Let's have a look at this "pyibuo" function:

    Code:
    var _$_e0da = ["", "-", "split", "length", "fromCharCode"];
    
    function pyibuo(_0x14809) {
        var _0x1478D = _$_e0da[0];
        var _0x14711 = 8988524;
        var _0x147CB = _0x14809[_$_e0da[2]](_$_e0da[1]);
        for (i = 0; i < _0x147CB[_$_e0da[3]]; i++) {
            xh = parseInt(_0x147CB[i], 16) - _0x14711;
            var _0x1474F = String[_$_e0da[4]](xh);
            var _0x1478D = _0x1478D + _0x1474F
        };
        return _0x1478D
    }
    Hey look, another puzzle! It's an obfuscated JS function, now what? Well, fortunately this looks like it was either obfuscated by hand or with a tool that is really bad at obfuscating. You'll see stuff like this frequently when looking at webpage sources that are trying to hide or protect data, and it's just as pointless as obfuscation and DRM in client/desktop software. If you see something like this don't close the browser tab and go "ah fuck it". Don't try to Google for a solution. All you have to do is figure out how the code works, and in the case of obfuscated code you might want to de-obfuscate it into something more readable. The first thing you'll want to do is pull the code through a beautifier (which I have already done) so it looks like properly formatted code instead of one long compressed line. Next, copy/paste it into a text editor because you'll obviously want to edit it. As you'll see, find+replace-all does wonders.
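
    The find+replace-all step can also be scripted. Here's a minimal sketch in Python, assuming you only want to swap whole identifiers. The helper rename_identifiers and the replacement names are made up for illustration; they are not something the site or any tool provides:

```python
import re


def rename_identifiers(source: str, renames: dict) -> str:
    """Replace whole identifiers only, never a name embedded in a longer one."""
    for old, new in renames.items():
        # JS identifiers may contain letters, digits, "_" and "$", so the
        # boundaries are checked with lookarounds instead of a plain \b.
        pattern = r"(?<![\w$])" + re.escape(old) + r"(?![\w$])"
        # A function is used as the replacement so "new" is inserted literally.
        source = re.sub(pattern, lambda _match, new=new: new, source)
    return source


# One line from the obfuscated function, with two hypothetical renames.
snippet = "var _0x147CB = _0x14809[_$_e0da[2]](_$_e0da[1]);"
renames = {"_$_e0da": "lookup_table", "_0x14809": "encoded_url"}
print(rename_identifiers(snippet, renames))
```

    The renames are applied one after another, so this only behaves if no new name collides with an old name still waiting to be replaced.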

    Alright so looking at the code there are a few things that are immediately obvious. There is an array of strings called "_$_e0da" that contains 2 plain strings ("" and "-") and 3 strings that name JS String members (split, length and fromCharCode). So from that we can assume it's basically a lookup table that helps obfuscate the algorithm. Other than that it's just a bunch of weirdly named vars. The first part of solving this puzzle is to simply start renaming stuff in steps. We'll rename that array to "lookup_table", the function to "decode_url" (because that's what it does) and the function argument to "encoded_url":

    Code:
    var lookup_table = ["", "-", "split", "length", "fromCharCode"];
    
    function decode_url(encoded_url) {
        var _0x1478D = lookup_table[0];
        var _0x14711 = 8988524;
        var _0x147CB = encoded_url[lookup_table[2]](lookup_table[1]);
        for (i = 0; i < _0x147CB[lookup_table[3]]; i++) {
            xh = parseInt(_0x147CB[i], 16) - _0x14711;
            var _0x1474F = String[lookup_table[4]](xh);
            var _0x1478D = _0x1478D + _0x1474F
        };
        return _0x1478D
    }
    As you can see this already makes a huge difference because now it's much easier to tell what the weirdly named vars are. "_0x1478D" gets initialized as an empty string because that's what index 0 of lookup_table returns. "_0x1478D" also gets returned at the end of the function, so we know for a fact that it holds the decoded URL. Let's rename some more stuff based on what's obvious:

    Code:
    var lookup_table = ["", "-", "split", "length", "fromCharCode"];
    
    function decode_url(encoded_url) {
        var decoded_url = lookup_table[0];
        var seed = 8988524;
        var split_parts = encoded_url[lookup_table[2]](lookup_table[1]);
        for (i = 0; i < split_parts[lookup_table[3]]; i++) {
            xh = parseInt(split_parts[i], 16) - seed;
            var str = String[lookup_table[4]](xh);
            var decoded_url = decoded_url + str
        };
        return decoded_url
    }
    Like magic! This is basically more than enough to figure out what's going on. But for convenience let's replace every use of lookup_table with the value the index maps to, and also replace bracket accessors with dot notation (obj["member"] -> obj.member):

    Code:
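    function decode_url(encoded_url)
    {
        var decoded_url = "";
        var seed = 8988524; // offset that was added to every char code
        var split_parts = encoded_url.split("-");
        
        // declare i and xh locally; the obfuscated version leaked them as globals
        for (var i = 0; i < split_parts.length; i++)
        {
            // hex part -> number, minus the seed = original char code
            var xh = parseInt(split_parts[i], 16) - seed;
            var str = String.fromCharCode(xh);
            decoded_url = decoded_url + str; // no need to re-declare with var
        }
        
        return decoded_url;
    }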
    Done! See, it's actually pretty easy. Now that it has been fully de-obfuscated the code can be ported to any language and can be used to decode scraped data from that website.
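
    As a rough example of such a port, here's a minimal Python sketch, assuming the seed is still 8988524. The name decode_url_fixed_seed is just picked for this example. The comments walk through the arithmetic for the first part of the sample string:

```python
SEED = 8988524  # the constant from the de-obfuscated function (0x89276c in hex)


def decode_url_fixed_seed(encoded_url: str, seed: int = SEED) -> str:
    """Hex part -> int, subtract the seed, turn the result into a character."""
    return "".join(chr(int(part, 16) - seed) for part in encoded_url.split("-"))


# First part by hand: 0x8927d4 = 8988628, and 8988628 - 8988524 = 104 = ord("h").
first_code = int("8927d4", 16) - SEED
print(first_code, chr(first_code))

# The first eight parts of the sample link from the page source.
print(decode_url_fixed_seed("8927d4-8927e0-8927e0-8927dc-8927df-8927a6-89279b-89279b"))
```

    The eight parts decode to "https://", which is a quick sanity check that the seed is right before running the whole string through it.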

    TL;DR: free wall of text because I was bored and found a puzzle (and some beer) :p.
     
    Last edited: Oct 25, 2017
  2. boyka2

    boyka2 Well-Known Member

    Aug 18, 2014
    97
    Nice, I found this code on serie-top too,
    but it doesn't seem to work.
     
  3. Hyperz

    Hyperz Well-Known Member Respected

    Feb 8, 2009
    2,229
    It's because they change the offset (seed) every so often. This was just meant to show what it does. The encoding is very basic: for every character in a URL, convert the char to a number, add an offset to it and write the result as hex, then join those values into a string separated by '-' chars. Because you always know the first 4 characters of a URL (http), you can detect the offset automatically by checking the value of the first part. Here's how I do it in Python:

    Code:
    def decode_url(encoded_url: str) -> str:
    
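        # each '-' separated part is one char code plus the offset, in hex
        parts = encoded_url.split('-')
        # every URL starts with 'h' (http/https), so the first part reveals the offset
        diff = int(parts[0], 16) - ord('h')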
    
        return ''.join(chr(int(part, 16) - diff) for part in parts)
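
    To test a decoder without hitting the site, the encoding described above can be reproduced locally. A minimal sketch, assuming lowercase hex output and a URL that starts with "http". The helpers encode_url and decode_url_checked are hypothetical names, and the prefix check is an added safeguard, not part of the snippet above:

```python
def encode_url(url: str, offset: int) -> str:
    """Assumed inverse of the decoder: char code + offset, as hex, joined by '-'."""
    return "-".join(format(ord(ch) + offset, "x") for ch in url)


def decode_url_checked(encoded_url: str) -> str:
    """Guess the offset from the first part, then verify the guess."""
    parts = encoded_url.split("-")
    offset = int(parts[0], 16) - ord("h")  # assumes the URL starts with 'h'
    chars = []
    for part in parts:
        code = int(part, 16) - offset
        if not 0 <= code <= 0x10FFFF:  # outside the range chr() accepts
            raise ValueError("part %r is not a valid character" % part)
        chars.append(chr(code))
    decoded = "".join(chars)
    if not decoded.startswith("http"):
        raise ValueError("offset guess failed, got %r" % decoded)
    return decoded


# Round trip with two different offsets; the decoder never sees the offset.
for offset in (8988524, 1234567):
    encoded = encode_url("http://example.com/a", offset)
    print(offset, decode_url_checked(encoded))
```

    If the site ever stops using plain "http" links or changes the scheme, the check fails loudly instead of returning garbage.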
    
     
  4. andersonpeter046

    andersonpeter046 New Member

    Apr 3, 2018
    3
    I also don't understand. It is not working..
     
  5. Hyperz

    Hyperz Well-Known Member Respected

    Feb 8, 2009
    2,229
    What do you not understand? Did you read what I wrote above? I can tell you that it does work because I'm using it right now and have been using it for months.
     
