Finding deals is always a hot topic, especially under current economic conditions. Sites like dealsea, bensbargains, and dealnews are just a few of the hundreds of deal websites that collect sale information online. But the nature of the collection process means that you will always be a step behind the deal publishers. Is there a way to act before the deal is even published?
Let’s take a look at how Neiman Marcus releases its sale items: admins first update the inventory, then the website links these items onto the sales page. A few lucky web surfers catch a glimpse of the 60% off Prada bags while refreshing the sales page, and a second later those deals/steals are gone.
It turns out that with some engineering effort, it’s possible to grab deals before they are even published on the sales page.
Imagine you are one of those Prada fans that are desperate for Prada handbags. Here is what you need to do to get the bags you want:
- Scrape the entire inventory of Neiman Marcus.
The Neiman Marcus website is designed in such a way that the inventory record of each product can be accessed via: http://www.neimanmarcus.com/store/catalog/prod.jhtml?itemId=prodXXXXXXXX
So if you can enumerate all possible product IDs, you can scrape the entire inventory of Neiman Marcus. With cloud computing, it took me less than 8 hours to collect the inventory using 16 nodes.
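As a rough illustration of the enumeration step, here is a minimal sketch that generates candidate product URLs and splits the ID range across 16 nodes. It assumes the ID is "prod" followed by a zero-padded number; the padding width, the numeric range, and the function names `product_urls` and `shard` are all made up for the example and are not taken from the real site or from my actual system.

```python
from typing import Iterator, Tuple

# URL prefix from the post; everything after "itemId=" is generated below.
BASE_URL = "http://www.neimanmarcus.com/store/catalog/prod.jhtml?itemId="


def product_urls(start: int, stop: int, width: int = 8) -> Iterator[str]:
    """Yield candidate product URLs for numeric IDs in [start, stop).

    Assumption: IDs look like 'prod' plus a zero-padded number of
    `width` digits. The real ID scheme may differ.
    """
    for n in range(start, stop):
        yield f"{BASE_URL}prod{n:0{width}d}"


def shard(start: int, stop: int, node: int, num_nodes: int = 16) -> Tuple[int, int]:
    """Return the (lo, hi) sub-range of [start, stop) assigned to one node.

    The sub-ranges are contiguous, non-overlapping, and together cover
    the whole range, so each node can crawl its slice independently.
    """
    total = stop - start
    lo = start + total * node // num_nodes
    hi = start + total * (node + 1) // num_nodes
    return lo, hi


if __name__ == "__main__":
    lo, hi = shard(0, 100, node=0)
    for url in product_urls(lo, hi):
        print(url)
```

Each node only needs its own `(lo, hi)` pair, so no coordination between nodes is required while crawling.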
- Filter out all items but Prada Bags.
From millions of item pages, you can classify the pages by brand and item type. If you are only interested in Prada handbags, that narrows the item list down to a couple of hundred items. Note that you want to store unavailable items as well as available ones, since they may become available again with a sale tag and a deeply cut price.
- Monitor all Prada Bags online.
Now that you have collected the item list, all you need to do is have a background process constantly check whether a deal pops up on any of these items. An email service comes in handy to alert you as soon as your web search routine finds a bag updated with a sale tag in the inventory.
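The filtering step might look something like the sketch below. It assumes each scraped page has already been parsed into a small record; the `Item` fields and the `select_watchlist` name are hypothetical, and the text processing that extracts brand and item type from the raw page is not shown.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Item:
    """Hypothetical record parsed from one product page."""
    item_id: str
    brand: str
    category: str
    available: bool


def select_watchlist(items: Iterable[Item],
                     brand: str = "Prada",
                     category: str = "handbags") -> List[Item]:
    """Keep items matching the brand and category, ignoring case.

    Unavailable items are kept on purpose: they may come back later
    with a sale tag, so they belong on the watchlist too.
    """
    brand = brand.strip().lower()
    category = category.strip().lower()
    return [
        item for item in items
        if item.brand.strip().lower() == brand
        and item.category.strip().lower() == category
    ]


if __name__ == "__main__":
    inventory = [
        Item("a", "Prada", "Handbags", True),
        Item("b", "Prada", "Shoes", True),
        Item("c", "PRADA", "handbags", False),
        Item("d", "Gucci", "Handbags", True),
    ]
    print([item.item_id for item in select_watchlist(inventory)])
```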
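The monitoring loop boils down to comparing the latest snapshot of each watched item against the previous one. The sketch below shows only that comparison; the `Snapshot` fields and the helper names are assumptions for illustration, and fetching the pages and actually sending the email (for example with Python's `smtplib`) are left out.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical state of one watched item at one check."""
    price: float
    on_sale: bool
    available: bool


def new_deals(previous: Dict[str, Snapshot],
              current: Dict[str, Snapshot]) -> List[str]:
    """Return IDs of items that are a deal now but were not at the last check.

    An item counts as a deal when it is both on sale and available.
    Items missing from `previous` are treated as not having been a deal.
    """
    deals = []
    for item_id, now in current.items():
        before = previous.get(item_id)
        was_deal = before is not None and before.on_sale and before.available
        is_deal = now.on_sale and now.available
        if is_deal and not was_deal:
            deals.append(item_id)
    return sorted(deals)


def format_alert(deals: List[str], current: Dict[str, Snapshot]) -> str:
    """Build the plain-text body of an alert message, one line per deal."""
    return "\n".join(
        f"{item_id}: now {current[item_id].price:.2f}" for item_id in deals
    )


if __name__ == "__main__":
    previous = {"bag1": Snapshot(1000.0, False, False)}
    current = {"bag1": Snapshot(400.0, True, True)}
    print(format_alert(new_deals(previous, current), current))
```

Alerting only on the transition into a deal, rather than on every check where the item is on sale, keeps you from receiving the same email every few minutes.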
I know you would ask: how effective is this, and how soon do I have to act on the deals?
After watching all the discounted Prada bags, what I found was this: it usually took websites over 20 minutes to publish an inventory change, since they have millions of products in stock. But even on a single PC, it took only 5 minutes to check the entire Prada bag list. So on average you gain about 15 minutes, which is just enough time to call your friends, read reviews, and place orders!
I actually built a system to do exactly what is described above, not because I love Prada bags (although I did buy a few after seeing the discounts and the prices). I was looking to build some interesting internet search applications, and this turned out to be a perfect fit given all the technical elements in it: web crawling, cloud computing, text processing, and web programming.
*ps: Again, I don’t recommend or encourage anyone to do this for commercial purposes. It’s all about making your friends happy and convincing them that IT can do good deeds!