Waldo Urribarri


HTTP requests and how to make an app of a website without an API in Android

In October 2014 I started a project named MomoURU: I decided to make an app to access the student services system of the Rafael Urdaneta University in my hometown. At the time my wife was studying at that university, which had no easy or fast way to check a student's info, and whose webpage is really "rudimentary". The idea was to make an app for her, but after realizing it would be really useful to other students, we decided to launch it on the Play Store.

At the time I didn't know how to do it, but now, with a working example, I'll show the steps I took. In this example, we'll build a simple app to log in to your www.colourlovers.com account.

First step: register on ColourLovers to get your username and password :)

Using the Chrome DevTools we can see a lot of what happens "underneath" a request to a website. If we check the headers of a request and its response, we can see the data the browser uses to "talk" to the server when we log in to our account. The idea is to replicate this in our app. With this method, we could build an app for "any website", without the need for an API, using the web scraping technique.

Here comes the fun.

In general, the steps to enter the site would be:

- Go to "www.colourlovers.com" with a GET request and obtain the session cookie.

- Log in to the website with a POST request, sending the session cookie.

- Once logged in, obtain the user data (here, from the Account screen) with another GET request.
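The steps above can be sketched as the following flow. The buildLoginBody helper, the enc helper, and the x/y placeholder values are my own additions (the form field names come from the log-in form inspected later in the post), and the commented-out calls assume the HttpConn class developed below:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class LoginFlow {

    // URL-encode a single form value (UTF-8, as the site expects).
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e); // UTF-8 is always available
        }
    }

    // Build the URL-encoded body for the log-in POST request.
    // x and y are placeholders for the submit button's click coordinates.
    static String buildLoginBody(String user, String pass) {
        return "r=" + enc("http://www.colourlovers.com/")
                + "&userName=" + enc(user)
                + "&userPassword=" + enc(pass)
                + "&x=30&y=12";
    }

    public static void main(String[] args) {
        System.out.println(buildLoginBody("USER", "PASSWORD"));
        // With the HttpConn class from this post, the whole flow would be:
        //   HttpConn conn = new HttpConn();
        //   conn.get("http://www.colourlovers.com");                       // 1. session cookie
        //   conn.post("https://www.colourlovers.com/op/log-in/1",
        //             "http://www.colourlovers.com/",
        //             buildLoginBody("USER", "PASSWORD"));                 // 2. log in
        //   String html = conn.get("http://www.colourlovers.com/account"); // 3. user data
        // (the network calls are omitted here so the example runs offline)
    }
}
```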

If we go directly to "http://www.colourlovers.com" we'll find something like this:

Colourlovers headers

Using this data we'll create requests from our Android app, obtaining the session cookie and the raw HTML content of the website.

For this we'll write a class named HttpConn, which will maintain our "connection" with the website. Based on the data of the first request and response, we define these constants:

private static final int REQUEST_TIMEOUT = 10000;
private static final String ENCODING = "UTF-8";
private static final String ACCEPT = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8";
private static final String ACCEPT_ENCODING = "gzip, deflate, sdch";
private static final String ACCEPT_LANGUAGE = "en-US,en;q=0.8,es;q=0.6";
private static final String CONNECTION = "keep-alive";
private static final String USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36";
private static final String CONTENT_TYPE = "application/x-www-form-urlencoded";

We create a method for GET requests:

    public String get(String url) throws ProtocolException, MalformedURLException, IOException {

        // Setup the new connection.
        URL obj = new URL(url);
        HttpURLConnection conn = (HttpURLConnection) obj.openConnection();

        // We need to set up all the params for the request.
        conn.setRequestMethod("GET");
        conn.setUseCaches(false); //no-cache
        conn.setConnectTimeout(REQUEST_TIMEOUT); // In case the URL is unavailable we use this timeout.
        conn.setReadTimeout(REQUEST_TIMEOUT);
        conn.setRequestProperty("Host", host);
        conn.setRequestProperty("User-Agent", USER_AGENT);
        conn.setRequestProperty("Accept", ACCEPT);
        conn.setRequestProperty("Accept-Language", ACCEPT_LANGUAGE);
        conn.setRequestProperty("Accept-Encoding", ACCEPT_ENCODING);
        conn.setRequestProperty("Connection", CONNECTION);

        // This is used to not mess with the cookies already grabbed on a previous request
        if (cookies != null) {
            for (String cookie : cookies) {
                conn.addRequestProperty("Cookie", cookie.split(";", 2)[0]); // limit 2: keep only the name=value pair
            }
        }

        // Check whether the response is gzip-compressed before wrapping the stream.
        BufferedReader in;
        List<String> contentEncoding = conn.getHeaderFields().get("Content-Encoding");
        if (contentEncoding != null && contentEncoding.contains("gzip")) {
            in = new BufferedReader(new InputStreamReader(new GZIPInputStream(conn.getInputStream()), ENCODING));
        } else {
            in = new BufferedReader(new InputStreamReader(conn.getInputStream(), ENCODING));
        }

        String line;
        StringBuilder response = new StringBuilder();

        // Read the whole HTML response, keeping line breaks for later parsing.
        while ((line = in.readLine()) != null) {
            response.append(line).append('\n');
        }
        in.close();

        // Save the last response code for further checking if needed.
        lastResponseCode = conn.getResponseCode();

        // Store the session cookies the first time the server sets them.
        if (cookies == null) {
            cookies = conn.getHeaderFields().get("Set-Cookie");
        }

        return response.toString();

    }
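One detail in the cookie loop above deserves a note: Java's String.split takes a limit argument, and a limit of 1 returns the whole string as a single element, so split(";", 1)[0] would send the cookie attributes (path, HttpOnly, ...) back to the server along with the name=value pair. A limit of 2 is what actually trims them. A quick sketch, with a made-up Set-Cookie value:

```java
public class CookieSplitDemo {
    public static void main(String[] args) {
        // A typical Set-Cookie header value (the session id is made up).
        String setCookie = "PHPSESSID=abc123; path=/; HttpOnly";

        // With limit 1, split() returns the whole string unchanged.
        System.out.println(setCookie.split(";", 1)[0]); // PHPSESSID=abc123; path=/; HttpOnly

        // With limit 2 we get just the name=value pair we want to send back.
        System.out.println(setCookie.split(";", 2)[0]); // PHPSESSID=abc123
    }
}
```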

After the GET request to the main URL, we have to log in to our user account. Inspecting the traffic, I saw that the website makes a request to a second URL to load the content of the form where we enter our credentials: "https://www.colourlovers.com/ajax/header-log-in-form?r=http%3A%2F%2Fwww.colourlovers.com%2F". We could even replace the first GET with this address, which would give us quicker access (since its content is smaller).

Using the Chrome DevTools again, we can see that when we enter the data and click Log In, a POST request is made to the URL "https://www.colourlovers.com/op/log-in/1", passing the following form data:

r:http%3A%2F%2Fwww.colourlovers.com%2F
userName:USER
userPassword:PASSWORD
x:29
y:12

Here we see two parameters, x and y, which seem odd. Doing a second log in, I got:

r:http%3A%2F%2Fwww.colourlovers.com%2F
userName:USER
userPassword:PASSWORD
x:32
y:12

As I suspected, this data changes with each request. Usually values like these exist for security reasons, but in this case they don't matter (they look like the click coordinates of the submit button). In any case, values like these should appear in the raw HTML content of the first request; we would just have to find them.

Colourlovers form data
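Hunting a value like that down in the raw HTML can be done with a regular expression over the response. A small sketch, with a made-up form snippet (the hiddenValue helper is my own, not something from the site):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HiddenFieldDemo {
    // Extract the value of a hidden input, looked up by name, from raw HTML.
    static String hiddenValue(String html, String name) {
        Pattern p = Pattern.compile(
                "<input[^>]*name=\"" + Pattern.quote(name) + "\"[^>]*value=\"([^\"]*)\"");
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical snippet in the shape of the log-in form.
        String html = "<form action=\"/op/log-in/1\" method=\"post\">"
                + "<input type=\"hidden\" name=\"r\" value=\"http%3A%2F%2Fwww.colourlovers.com%2F\" />"
                + "</form>";
        System.out.println(hiddenValue(html, "r")); // http%3A%2F%2Fwww.colourlovers.com%2F
    }
}
```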

The following is the POST method which will be used for the log in:

    public String post(String url, String referer, String postParams) throws ProtocolException, MalformedURLException, IOException {

        // Setup the new connection.
        URL obj = new URL(url);
        HttpURLConnection conn = (HttpURLConnection) obj.openConnection();

        // We need to set up all the params for the request.
        conn.setConnectTimeout(REQUEST_TIMEOUT);
        conn.setReadTimeout(REQUEST_TIMEOUT);
        conn.setUseCaches(false);
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Host", host);
        conn.setRequestProperty("User-Agent", USER_AGENT);
        conn.setRequestProperty("Accept", ACCEPT);
        conn.setRequestProperty("Accept-Language", ACCEPT_LANGUAGE);
        conn.setRequestProperty("Connection", CONNECTION);
        conn.setRequestProperty("Referer", referer);
        conn.setRequestProperty("Content-Type", CONTENT_TYPE);
        conn.setRequestProperty("Content-Length", Integer.toString(postParams.getBytes(ENCODING).length)); // byte count, not char count

        // This is used to not mess with the cookies already grabbed on a previous request
        if (cookies != null) {
            for (String cookie : cookies) {
                conn.addRequestProperty("Cookie", cookie.split(";", 2)[0]); // limit 2: keep only the name=value pair
            }
        }

        conn.setDoOutput(true);
        conn.setDoInput(true);

        // Send the POST body, encoded with the same charset used for Content-Length.
        OutputStream os = conn.getOutputStream();
        os.write(postParams.getBytes(ENCODING));
        os.flush();
        os.close();

        lastResponseCode = conn.getResponseCode();

        BufferedReader in = new BufferedReader( new InputStreamReader(conn.getInputStream(), ENCODING) );

        String line;
        StringBuilder response = new StringBuilder();

        while ((line = in.readLine()) != null) {
            response.append(line).append('\n');
        }
        in.close();

        return response.toString();

    }
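A caveat on the Content-Length header: it must be the byte count of the body, while String.length() counts characters, and the two differ as soon as a credential contains a non-ASCII character. A quick check (the sample password is made up):

```java
import java.nio.charset.StandardCharsets;

public class ByteLengthDemo {
    public static void main(String[] args) {
        // 'ñ' (written as \u00f1) is one char but two bytes in UTF-8.
        String params = "userName=USER&userPassword=se\u00f1or";
        System.out.println(params.length());                              // 32 characters
        System.out.println(params.getBytes(StandardCharsets.UTF_8).length); // 33 bytes
    }
}
```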

After doing the log-in request, we can access areas available only to a logged-in user. We'll test it with the URL "http://www.colourlovers.com/account" and then extract the text we want.
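Extracting the text can again be done with a regular expression over the raw HTML. A minimal sketch, assuming a hypothetical markup fragment (the real account page will differ, and for anything non-trivial an HTML parser such as jsoup is a better fit than regexes):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AccountScrapeDemo {
    // Pull the text of the first element with a given CSS class out of raw HTML.
    static String textOfClass(String html, String cssClass) {
        Pattern p = Pattern.compile(
                "class=\"" + Pattern.quote(cssClass) + "\"[^>]*>([^<]*)<");
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        // Hypothetical fragment of the account page.
        String html = "<div class=\"profile\"><span class=\"user-name\">USER</span></div>";
        System.out.println(textOfClass(html, "user-name")); // USER
    }
}
```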

You can see the complete code on my GitHub.

Final app

With this we've seen how to access a logged-in user's data from our app. In MomoURU's case, the remaining work was simply (though heavily) processing the HTML responses, and that's it. We can potentially have a working app for any site :)

