Thursday, July 29, 2010

Mining Jobmine: Part 3. From the Employer's Perspective

Recently, Paul asked the question of what would make his resume more effective. I now ask a very similar question from the employer's point of view: what can employers do to make job postings more effective? While A/B testing job postings is not an option for me, it is possible to look at Jobmine data to find attributes of job postings that correlate with the number of applications.

Keep job postings short

There is a negative correlation between the number of words in a job posting and its application rate. This correlation is very small, but still statistically significant. Below is a smoothed scatter plot of words per job posting vs. applications (darker colours mean a denser packing of points), with a curve of best fit [1].

The curve implies a loss of about one application for every 50-60 words added. Again, the decrease is slight, and the length of a job description explains very little of the variation amongst application rates. This is not surprising: many factors affect a job's application rate, most obviously the job itself, and we expect the effect of the length of a job description to be minor compared to more important factors.
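For the curious, here is a rough sketch of the kind of fit described in footnote [1]: regress the square root of applications on word count, then square the prediction to get back a curve in applications. The function names and the numbers are made up for illustration; this is not the actual analysis code or Jobmine data.

```python
import math

def fit_sqrt_model(word_counts, applications):
    """Least-squares fit of sqrt(applications) = a + b * words."""
    n = len(word_counts)
    roots = [math.sqrt(y) for y in applications]
    mean_x = sum(word_counts) / n
    mean_y = sum(roots) / n
    sxx = sum((x - mean_x) ** 2 for x in word_counts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(word_counts, roots))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def predict_applications(a, b, words):
    # Squaring the linear prediction is what makes the curve quadratic in words.
    return (a + b * words) ** 2

# Made-up numbers for illustration only (not Jobmine data).
words = [100, 200, 300, 400]
apps = [64, 49, 36, 25]
a, b = fit_sqrt_model(words, apps)
print(a, b, predict_applications(a, b, 250))
```

Because the model is linear in the square root, the loss per added word is not constant along the curve, so a figure like "one application per 50-60 words" is best read as an approximation over the typical range of posting lengths.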

To uncover other subtle factors affecting application rate, I tried a technique I learned at Facebook: for each job posting, I calculated the percentage of words used in each of the approximately 100 word categories in Harvard’s General Inquirer dictionary (e.g. percentage of positive words, food-related words, law-related words, etc). While this method did not yield as much insight as I had hoped [2], there was one interesting observation...
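As a minimal sketch of the per-category calculation, assuming a dictionary that maps each category name to a set of words: the three categories and their word lists below are hypothetical stand-ins, not the real General Inquirer entries, and the tokenizer is deliberately crude.

```python
import re

# Hypothetical stand-in categories; the real General Inquirer lists are
# much larger and are not reproduced here.
CATEGORIES = {
    "you": {"you", "your", "yours"},
    "our": {"we", "our", "us"},
    "ought": {"must", "should", "ought"},
}

def category_percentages(text, categories=CATEGORIES):
    """Return the percentage of words in text that fall in each category."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {name: 0.0 for name in categories}
    return {
        name: 100.0 * sum(1 for w in words if w in vocab) / len(words)
        for name, vocab in categories.items()
    }

posting = "You must know Excel. We value our team."
percentages = category_percentages(posting)
print(percentages)
```

Each posting then becomes a row of roughly 100 percentages, and each column can be correlated with application rate.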

Talk about the company, not the candidate

There is a negative correlation between “you” pronouns (“you”, “your”, etc) and application rate, and a weaker positive correlation between “our” pronouns (“we”, “our”, etc) and application rate. This makes some sense: perhaps students enjoy reading about what a potential employer is like, rather than about what they must do or be. Perhaps seeing someone say that "you should have a solid knowledge of spreadsheet applications" is taken to be a bit aggressive. Incidentally, there is also a negative correlation between “ought” words (“must”, “should”, etc) and application rate.

The word “you” came back again when I analyzed the correlations between application rates and the appearance [3] or increased use [4] of individual words (as opposed to word groups). Indeed, there is a negative correlation between repeated use of the word “you” and application rate.
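Here is a sketch of the appearance test from footnote [3]: split postings by whether a word appears, then compare the two groups with an unpaired t-test. The footnote does not say which variant of the test was used, so the Welch (unequal variance) form below is my assumption, and the postings are invented.

```python
import math
from statistics import mean, variance

def split_by_word(postings, word):
    """postings: list of (description, applications) pairs."""
    with_word, without_word = [], []
    for description, applications in postings:
        tokens = description.lower().split()
        (with_word if word in tokens else without_word).append(applications)
    return with_word, without_word

def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom)."""
    va = variance(sample_a) / len(sample_a)
    vb = variance(sample_b) / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1)
    )
    return t, df

# Invented postings for illustration only (not Jobmine data).
postings = [
    ("you will build tools", 10),
    ("you should know sql", 12),
    ("we build bridges", 20),
    ("our team designs engines", 22),
]
with_you, without_you = split_by_word(postings, "you")
t, df = welch_t(with_you, without_you)
print(t, df)
```

A negative t here means postings containing the word received fewer applications on average. The p-value comes from the t distribution with df degrees of freedom. With thousands of candidate words, some will look significant by chance alone, so a few spurious hits are to be expected.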

Good words, bad words

Several other words are correlated with application rates. Here are some words whose appearance or increased use is positively correlated with application rates:
Analysis, Capital, Construction, Design, Electrical, Energy, Engineers, Engineering, Excel, Mechanical, Projects, Toronto
Many of these words relate to the previous parts of “Mining Jobmine”, as they identify fields in low supply or high demand (which are apparently finance and engineering, especially mechanical engineering), and places that Waterloo students want to be (well, Toronto...). I’m not sure how to interpret the word “projects”.

As for words whose appearances are negatively correlated with application rates [5], there are actually more of these than "positive" words. Below is a partial list consisting of the most statistically significant words.
Application, Community, Development, Framework, fulltime, hours, HTML, Java, need, .NET, open, planning, Server, SQL, title, Unix, users, Web, Windows, within, XML
Again, the programming words in this list suggest that programming jobs are in low demand or high supply. Other words are hard to interpret: should employers refrain from talking about their hours, their fulltime employees, or their users’ needs? Perhaps some of these correlations are spurious.

Junior, Intermediate, AND Senior

Each job posting on Jobmine has one or more “level” tags associated with it: Junior, Intermediate, and Senior. These tags describe the “level” of students that an employer seeks, and are used by students to search for jobs appropriate to their level. The plot below shows the mean application rates (and 95% confidence interval) of jobs with each set of tags, with the red line showing the mean application over all jobs.

In most cases, adding an extra “level” tag increases application rates by about 10. Adding an extra “level” tag would mean that more students are likely to see your job. The exceptions are, of course, those 7 jobs that are tagged Junior and Senior...
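A sketch of how the per-tag-set means and intervals could be computed. I am assuming a normal approximation (mean plus or minus 1.96 standard errors) here; for a small group like the 7 Junior-and-Senior jobs, a t-based interval would be wider. The job data below is invented.

```python
import math
from collections import defaultdict
from statistics import mean, stdev

def mean_ci_by_tags(jobs, z=1.96):
    """jobs: iterable of (tags, applications). Returns {tag set: (mean, low, high)}."""
    groups = defaultdict(list)
    for tags, applications in jobs:
        groups[frozenset(tags)].append(applications)
    result = {}
    for tag_set, values in groups.items():
        if len(values) < 2:
            continue  # a single job gives no spread to estimate
        m = mean(values)
        half_width = z * stdev(values) / math.sqrt(len(values))
        result[tag_set] = (m, m - half_width, m + half_width)
    return result

# Invented data for illustration only (not Jobmine data).
jobs = [
    ({"Junior"}, 20),
    ({"Junior"}, 30),
    ({"Junior", "Intermediate"}, 35),
    ({"Intermediate", "Junior"}, 45),
]
intervals = mean_ci_by_tags(jobs)
print(intervals[frozenset({"Junior"})])
```

Grouping by the whole set of tags, rather than by each tag separately, is what lets "Junior" and "Junior + Intermediate" show up as different bars in the plot.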

Avoid special instructions

Special instructions are red-coloured messages that appear above a job description in Jobmine. Employers use them to announce information sessions, to remind students to apply through their website, or for other reasons. Around 40% of job postings on Jobmine have special instructions, and these postings receive 6 fewer applications on average than postings without special instructions. This is quite a large difference - and statistically significant, too. Perhaps the contents of special messages turn applicants away? Perhaps people don’t like seeing big bright red messages when reading a job posting? Either way, including special instructions might have drawbacks that employers do not expect.


While most students spend hours perfecting their resumes, employers don’t always think as much about job descriptions. Yet these analyses show that a student’s decision to apply for a job can be influenced by factors other than the job itself. Some of these influences are marginal, while others are large. The analyses suggest that employers can increase the candidate pool by shortening job postings, rewording job descriptions, or by being cautious about using special instructions. Of course, an employer’s end goal is to find a suitable candidate, and so the quality of the candidate pool is more important than its size. Whether or not improving a job description is worth an employer’s time is another story -- especially since the effects of changing an individual job posting are uncertain.


[1] Application numbers are heavily skewed, so to satisfy the assumptions of the linear regression model we take the square root of application rate as our dependent variable. Number of words in a job posting is still our independent variable, and the curve we get is a quadratic.
[2] Several word categories showed statistically significant correlations with applications, but these correlations are hard to interpret because many word categories are filled with homonyms and questionable words. For example, the category “Land” contains words describing places occurring in nature, and is correlated with applications. However, the top words in this category are “field”, “range”, “bank” and “fall”. As another example, the words “time”, “service” and “fun” are considered “hostile” words in General Inquirer.
[3] To test the effect of the appearance of a word, I split up the jobs based on whether or not a particular word appeared in their job descriptions, and used an unpaired two-sample t-test. Very uncommon words and very common words were ignored.
[4] To test the effect of the number of appearances of a word, I correlated the number of times a word appears in a job description with application rate, and calculated the p-value. This analysis was done only on words that appear more than 10 times in at least one job posting.
[5] All of these words are significant when [3] is applied to them.

Thursday, July 22, 2010

An Ideal Society

Designing the ideal society is an old puzzle. Many ideas have been generated over time, and some have even been put into practice. We have tried many different ways to organize society: everything from monarchy to democracy to communism. Yet none of these systems has stood the test of time. Ideas that look great on paper often fail in practice.

I think that this is because when we design an ideal society, we allow ourselves to also design the citizens of that society. We allow our society to dictate how a human being should behave, and assume that they will behave as expected. For example, a communist society assumes that its citizens would give their best in return for others' best, and have all citizens' needs met together; that its citizens are willing to stand by the mantra: "from each according to his ability, to each according to his need".

But it's difficult, if not impossible, to convince every person to behave in a certain way. Nobody is perfect, and certainly there will be people whose interests conflict with those of society. Indeed, some people will do anything they can to game whatever system is in place.

So perhaps an ideal society is not what we need, because we are not ideal people. Perhaps we have been considering the wrong question all along. Instead of designing the ideal society, perhaps we should be designing a robust society. By “robust” I actually mean two things: First, that the society should still function if certain assumptions about the nature of its citizens are violated. Second, that the "locally optimal" behaviour for an individual should also be optimal for the society. (This is akin to the idea of evolutionary stability in "The Selfish Gene".)

As an example, we can see that communism fails at robustness: the "locally optimal" behaviour for a person would be to produce less and consume more, which is not optimal for society; and if a few people decide not to give their best, this game would become quite unfair to those who play by the rules, and so others are likely to also cheat.

Declaring that we have designed an ideal society when we take the liberty to design its citizens seems like a rather strange exercise. If we can decide how people would think and act, wouldn't any reasonable society we create be an ideal society? Design a society where citizens are required to give up their own children and raise a random person's, but design the citizens so that they understand why this is done (equal opportunity, perhaps?), and you have an “ideal” society.

... and yet we’ve only gone in circles. Tautologies are tautological.
