The Second Generation Of Job Aggregators

Julian Stopps wrote an article, “The Second Generation Of Job Aggregators”, that got me thinking: what is the next generation of online publishing and advertising media going to look like (and function like)?

The following are the basics:
1. Web 1.0 – a ‘Classic Job Board’, where people go and advertise jobs, and job seekers browse and search.
2. Web 2.0 – A ‘Jobs Aggregator’, a site that displays jobs from different job boards.
3. Web 3.0 – this is the question….

It is strange, actually, that modern job boards do not include any job-seeker-generated content. That would bring a job board into the Web 2.0 recruitment era. There is obviously an issue with sticking a wiki onto a job board, and this is what prevents the ‘Classic Job Board’ from even entering the Web 2.0 group of web applications.

Job aggregators, in Ireland and worldwide, have had their fair share of successes and problems. If they pass queries on to the job boards in real time, speed (and traffic) becomes an issue. If the results are spidered or crawled and stored up front, the likelihood of listing already-removed jobs grows the longer the spidered data is kept.

So a jobs aggregator would be ‘perfect’ if it could quickly display fresh and accurate data from the job boards. For that to happen, there needs to be a process by which the job board ‘pushes’ new data to the aggregator. The second requirement is for the job board to ‘announce’ any changes, such as a job being updated or deleted.


The first part of the required technology already exists and is used within most blogging engines. Blogging engines have a facility to ‘PING’ other web sites to announce newly published content. An example is this blog you are reading, published on the WordPress blogging platform, which can be (and in this case is) set up to PING the Google search engine every time a new blog post is published. What does Google do when it receives a PING notification with the URL of the new post? Google sends its crawler, which in most cases crawls the new page within minutes and includes it in the search index. The whole process takes only a few minutes, and that is with a very, very busy Google search engine that crawls the whole Internet.
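To make the PING idea concrete, here is a minimal sketch in Python of the payload a blogging engine typically sends: a standard weblogUpdates.ping XML-RPC call carrying the site name and the URL of the new page. The function name and the example URLs are illustrative, not part of any real job board.

```python
import xml.etree.ElementTree as ET

def build_ping_payload(site_name, page_url):
    """Build the XML-RPC body of a standard weblogUpdates.ping call.

    This is the kind of request blogging engines POST to a ping
    endpoint to announce newly published content.
    """
    root = ET.Element("methodCall")
    ET.SubElement(root, "methodName").text = "weblogUpdates.ping"
    params = ET.SubElement(root, "params")
    # The ping carries two string parameters: the site name and the URL
    for value in (site_name, page_url):
        param = ET.SubElement(params, "param")
        ET.SubElement(ET.SubElement(param, "value"), "string").text = value
    return ET.tostring(root, encoding="unicode")

payload = build_ping_payload("My Job Board", "http://jobs.example.com/job/123")
```

The resulting XML string would be sent as the body of an HTTP POST to the ping endpoint; the receiver then decides whether to crawl the URL.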

So imagine a jobs web site that would let other sites PING it, send a crawler to the pinged URL, and include the new page in its search index. The process would not take more than a couple of minutes, which means the new job site would have new jobs added to it in almost real time. There would be a couple of minutes of lag, and that is more than tolerable. What is really needed for this to happen is for recruitment sites to build in an automatic facility to send a PING when a new job is posted. That, as we know, is not really that hard, since a PING is in reality a simple HTTP request containing the URL of the new page advertised.
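The receiving side of this scenario could be sketched as follows: a ping endpoint that validates the announced URL and queues it for near-real-time crawling. The class and method names here are hypothetical, purely to illustrate the flow.

```python
from urllib.parse import urlparse

class PingReceiver:
    """Hypothetical sketch of a jobs site's ping endpoint.

    It accepts pinged URLs and queues them for a crawler to fetch
    within minutes, keeping the job index close to real time.
    """

    def __init__(self):
        self.crawl_queue = []

    def receive_ping(self, url):
        parsed = urlparse(url)
        # Reject malformed pings: we need a proper http(s) URL to crawl
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            return False
        self.crawl_queue.append(url)
        return True

receiver = PingReceiver()
receiver.receive_ping("http://jobs.example.com/job/123")
```

A background worker would then drain `crawl_queue`, fetch each page, and add it to the search index, giving the couple-of-minutes lag described above.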

The next steps?

The technology for updating jobs, and especially removing them, does not really exist yet. Updating might actually be implemented on the recipient side: when a PING request carries a URL that is already stored in the database, the crawler is sent to the originating page again and the old record is overwritten. Deletion is a totally different story. The PING technology just does not support it yet (at least not that I am aware of). Perhaps the existing PING technology could be used with a small extension, perhaps a DELETE command sent somewhere in the PING HTTP request.
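One way such an extension could work is an extra action field on the ping, so the same announcement mechanism covers new jobs, updates, and deletions. This is speculative; the `DELETE` action and all names below are hypothetical, not part of the existing PING protocols.

```python
class ExtendedPingIndex:
    """Speculative sketch: a ping handler extended with an action field.

    NEW and UPDATE both trigger a re-crawl that overwrites any existing
    record for the URL; the hypothetical DELETE action removes it.
    """

    def __init__(self):
        self.index = {}  # url -> crawled job record

    def handle_ping(self, url, action="NEW"):
        if action in ("NEW", "UPDATE"):
            # Re-crawl the page; overwriting handles updates for free
            self.index[url] = self.crawl(url)
        elif action == "DELETE":
            # Remove the record if present; ignore unknown URLs
            self.index.pop(url, None)

    def crawl(self, url):
        # Placeholder for fetching and parsing the job page
        return {"url": url}

index = ExtendedPingIndex()
index.handle_ping("http://jobs.example.com/job/123")          # new job
index.handle_ping("http://jobs.example.com/job/123", "DELETE")  # job filled
```

Because NEW and UPDATE share one code path, the recipient does not even need to know which one it received, which matches the overwrite-on-reping idea described above.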