What is a page that cannot be accessed directly by URL?

Posted at 2016/03/28  23:52:10

Suppose you are planning a trip with your best friend, but unfortunately the tickets for your favorite flight are sold out. You don’t give up, and you check the airline’s website every day. Finally, a ticket becomes available. You excitedly copy the address and send it to your friend, but your friend only gets an error page saying “Session timeout.” Have you had a similar experience? If so, you have run into a page that cannot be accessed directly by its URL.

The secret behind this is that, to check seat availability, you need to submit some data to the server, such as the departure city and date or the flight number. Web forms submit data using one of two methods: one is called GET and the other is called POST.

The GET method simply appends your data to the URL. It is preferred in most cases because the same query can be shared with others just by copying the URL, so a page generated with GET is directly accessible by URL. However, everything has its downside: GET is not the right choice when privacy is an issue. You certainly don’t want your password to show up in the address bar when someone starts typing the address of a site you visited several days ago, and some people don’t want others to know which flight they are going to take, even if you don’t mind.
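To make this concrete, here is a minimal Python sketch of what GET does with a flight search. The endpoint https://airline.example.com/search and the parameter names are hypothetical, chosen only for illustration; the point is that the whole query ends up in the URL, so whoever receives the link can reproduce it.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical airline search endpoint, used only for illustration.
BASE_URL = "https://airline.example.com/search"


def build_get_url(params: dict) -> str:
    """Append the form data to the URL as a query string, which is what GET does."""
    return f"{BASE_URL}?{urlencode(params)}"


query = {"from": "SFO", "to": "NRT", "date": "2016-04-15"}
url = build_get_url(query)
print(url)  # https://airline.example.com/search?from=SFO&to=NRT&date=2016-04-15

# Anyone who receives the URL can recover exactly the same query from it.
print(parse_qs(urlsplit(url).query))
```

The same property that makes the link shareable is what makes GET a poor fit for passwords: anything in the query string is visible in the address bar and in browser history.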

Therefore, POST takes the stage for applications where privacy must be handled correctly. The POST method packs the data into the body of the request, so it does not appear in the address bar and is not kept in the URL once the page is closed. Perfect? The downside is that you can neither share the query result with friends nor save it as a bookmark. Every time you want to recheck the result, you need to start from the input page, fill out the form, and click the submit button. In more complicated cases it may take four or more steps (pages) to reach the final result.
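For comparison, here is a minimal sketch of the same hypothetical query prepared as a POST request with Python’s standard urllib.request.Request. The request is only built, never sent, and the endpoint is again made up. The URL stays clean while the form data travels in the request body, which is why copying the address bar does not copy the query.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://airline.example.com/search"  # hypothetical endpoint

query = {"from": "SFO", "to": "NRT", "date": "2016-04-15"}

# For POST, the form data is encoded into the request body, not the URL.
body = urlencode(query).encode("ascii")
req = Request(
    BASE_URL,
    data=body,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)

print(req.full_url)      # https://airline.example.com/search  (no query string)
print(req.get_method())  # POST
print(req.data)          # b'from=SFO&to=NRT&date=2016-04-15'
```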

Now it is deep-watch’s turn to shine! Most page change monitoring services, free or paid, do not support POST-generated pages. You paste the URL into the input field, and their crawler fetches the page at that URL. So if your friend can’t see the result you are interested in, neither can the crawler. Worse, a poorly designed crawler may not realize that it has received an error page, and keeps telling you the page has never changed.
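The failure mode described above can be sketched in a few lines of Python. This is not how any particular monitoring service works; it assumes a simple monitor that compares a hash of each snapshot, and a hypothetical error marker string "Session timeout". A naive comparison sees two identical error pages and reports “unchanged”, while a slightly more careful check flags the error instead.

```python
import hashlib

# Hypothetical error text; a real site would need its own detection rule.
ERROR_MARKERS = ("Session timeout",)


def fingerprint(html: str) -> str:
    """Hash of the page content, used to detect changes between snapshots."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def naive_check(old_html: str, new_html: str) -> str:
    """Only compares snapshots, so an error page that never changes looks 'unchanged'."""
    return "changed" if fingerprint(old_html) != fingerprint(new_html) else "unchanged"


def careful_check(old_html: str, new_html: str) -> str:
    """Reports an error first if the fetched page is an error page."""
    if any(marker in new_html for marker in ERROR_MARKERS):
        return "error"
    return naive_check(old_html, new_html)


error_page = "<h1>Session timeout.</h1>"
print(naive_check(error_page, error_page))    # unchanged
print(careful_check(error_page, error_page))  # error
```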

Deep-watch’s crawler acts in a smarter way, because it is specially configured by our engineers, who know how to deal with such pages. The crawler starts from the input page, submits data the same way a human does, and decides what to do next based on the server’s response. It can keep interacting with the server until the desired information is acquired. If necessary, it can even get past a CAPTCHA-protected page!
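The general idea of starting from the input page and keeping the session can be sketched as below. This is a hypothetical illustration, not deep-watch’s actual crawler: FakeAirlineServer, its /search and /results paths, the sid cookie, and the flight number XX123 are all made up so that the example runs without a network. The crawler first opens the input page to obtain a session cookie, then submits the form with that cookie, and starts over if the server answers with a session timeout. CAPTCHA handling is not covered.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Response:
    status: int
    body: str
    cookies: dict = field(default_factory=dict)


class FakeAirlineServer:
    """Hypothetical site: answers a POST query only inside a session
    that was opened by first visiting the input page."""

    def __init__(self):
        self.sessions = set()

    def handle(self, method, path, cookies, form=None):
        if method == "GET" and path == "/search":
            sid = secrets.token_hex(8)  # new session id for this visitor
            self.sessions.add(sid)
            return Response(200, "<form action='/results' method='post'>", {"sid": sid})
        if method == "POST" and path == "/results":
            if cookies.get("sid") not in self.sessions:
                return Response(200, "Session timeout.")
            return Response(200, f"Seats left on {form['flight']}: 2")
        return Response(404, "Not found")


class SessionCrawler:
    """Walks through the pages like a person would, keeping cookies between steps."""

    def __init__(self, server):
        self.server = server
        self.cookies = {}

    def request(self, method, path, form=None):
        resp = self.server.handle(method, path, self.cookies, form)
        self.cookies.update(resp.cookies)  # remember the session cookie, like a browser
        return resp

    def fetch_result(self, form, max_attempts=3):
        for _ in range(max_attempts):
            self.request("GET", "/search")                 # step 1: open the input page
            resp = self.request("POST", "/results", form)  # step 2: submit the form
            if "Session timeout" not in resp.body:
                return resp.body
            self.cookies.clear()  # start over with a fresh session
        raise RuntimeError("no valid result page after retries")


server = FakeAirlineServer()
crawler = SessionCrawler(server)
print(crawler.fetch_result({"flight": "XX123"}))  # Seats left on XX123: 2

# Posting the form directly, without visiting the input page, fails like the shared link did.
print(server.handle("POST", "/results", {}, {"flight": "XX123"}).body)  # Session timeout.
```

A direct POST without the session is exactly what a URL-only monitor sends, so it gets the same error page your friend saw; walking through the input page first is what makes the difference.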