A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.



MAS.S70 Applied Data Visualizations

2 minute read


Machine Learning or ML? - How words enter the public domain


For this project, I am going to look at how new words enter everyday language.


One of the responsibilities of a journalist is to teach their readers. This is not limited to conveying news; it also includes teaching new vocabulary. This is particularly relevant in the fast-paced realm of technology, where artificial intelligence, the cloud, machine learning, and big data have become significantly more newsworthy as these concepts transform industry and the process of innovation. But when do these words enter the public vocabulary and become 'AI' and 'ML'? We would expect journalists to initially use only the unabbreviated term. As the concept enters the public domain, journalists may use the abbreviation and the full term side by side, until the abbreviation eventually predominates. Let's see if this is true!


I decided to focus on the following words and abbreviations:
  • Artificial Intelligence, AI, and A.I.
  • Machine Learning, ML, and M.L.
  • Natural Language Processing and NLP
  • Neural Network and Neural Net
  • Generative Adversarial Network and GANs
  • Recurrent Neural Network, Recurrent Neural Net, RNN, and R.N.N.
  • Application Programming Interface and API
  • Deep Neural Network, Deep Neural Net, Deepmind, and Deep Mind
  • Supervised Machine Learning, Unsupervised Machine Learning, and Reinforcement Learning
  • LSTM, Embedding space
  • Cloud, Big Data, Technology, Automation, Robot, AOL, Cyber Crime


I scraped articles from the Guardian between 1999 and 2017 and counted the number of occurrences and co-occurrences of these words. The Guardian's online edition was the fifth most widely read in the world in 2014 (Source) and is thus a reasonable proxy for journalistic activity.
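The counting step can be sketched roughly as follows. This is a minimal, hypothetical version: the actual scraper and term list were more extensive, and `articles` stands in for the scraped Guardian texts.

```python
import re
from collections import Counter

# A small subset of the search terms used in the project.
TERMS = ["artificial intelligence", "AI", "machine learning", "ML"]

def count_term_occurrences(articles):
    """Count how many articles mention each term at least once."""
    counts = Counter()
    for text in articles:
        for term in TERMS:
            # Match all-caps abbreviations case-sensitively so that,
            # e.g., "AI" does not match inside "said"; word boundaries
            # prevent matching "ML" inside longer tokens.
            flags = 0 if term.isupper() else re.IGNORECASE
            if re.search(r"\b" + re.escape(term) + r"\b", text, flags):
                counts[term] += 1
    return counts

articles = [
    "Advances in artificial intelligence (AI) are reshaping industry.",
    "Machine learning models said to improve forecasts.",
]
print(count_term_occurrences(articles))
```

Counting per-article mentions (rather than raw token counts) keeps a single abbreviation-heavy article from dominating the timeline.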


The most interesting results came from AI and ML. According to the 'Journalist Educator Hypothesis' above, I expected that the number of occurrences of the abbreviations would eventually overtake those of the full terms. However, we observe the opposite!

Timeline for AI versus Artificial Intelligence

<div class="single_viz" id="timeline_ai_c" style="margin: 0 auto"></div> <div class="single_viz" id="timeline_ml_c" style="margin: 0 auto"></div>

Timeline for ML versus Machine Learning

One explanation may be that the target audience changed. Whereas these tech articles may initially have been directed at already knowledgeable readers, as the topics became more popular over time, spelling out the full term became necessary. It may also indicate that journalists prefer the full term because the abbreviation comes across as increasingly 'buzzwordy' as the concept's popularity rises. Speaking of buzzwords, let's have a look at a couple.
<div class="single_viz" id="timeline_other" style="margin: 0 auto"></div>

Timeline for Buzzwords

We can see, perhaps surprisingly, that 'Cloud' and 'Big Data' are actually on the downturn, whereas 'automation' and 'robot' have become much more common. If this is at all indicative of company behavior, it implies that there may have been a shift from virtual innovation to physical innovation. Finally, most of the technical terms, like embedding space or the different types of machine learning almost never occur, presumably because the Guardian is a news outlet accessible to a general audience.
Just for fun, I also looked into co-occurrences of words, combining each full term with its abbreviations into a single category. We can see that there are actually not that many co-occurrences. The most common were AI with Robots, AI with Automation, AI with ML, and Big Data with Cloud.
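A minimal sketch of how such co-occurrence counts could be computed. The category mapping and the example article below are simplified, hypothetical stand-ins for the actual term lists and scraped texts.

```python
import re
from itertools import combinations
from collections import Counter

# Hypothetical mapping: each search term points to its combined
# category, so a full term and its abbreviation share one label.
CATEGORIES = {
    "artificial intelligence": "AI", "ai": "AI",
    "machine learning": "ML", "ml": "ML",
    "robot": "Robot", "automation": "Automation",
}

def count_cooccurrences(articles):
    """Count, per article, each unordered pair of categories present."""
    pair_counts = Counter()
    for text in articles:
        lowered = text.lower()
        # Which categories appear at least once in this article?
        present = {cat for term, cat in CATEGORIES.items()
                   if re.search(r"\b" + re.escape(term) + r"\b", lowered)}
        # Count each pair of co-occurring categories once per article.
        for pair in combinations(sorted(present), 2):
            pair_counts[pair] += 1
    return pair_counts

articles = ["Automation and robot workers driven by AI and machine learning."]
print(count_cooccurrences(articles))
```

The resulting pair counts are exactly the weights a chord diagram needs: one chord per category pair, sized by the count.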
<div class="single_viz" id="chord_diagram" style="margin: 0 auto"></div>

Chord Diagram

Read more

Blog Post number 1

less than 1 minute read


This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Headings are cool

You can have many headings

Aren’t headings cool?

Read more



Learning Occupational Task-Shares Dynamics for the Future of Work

Published in Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 2020

The recent wave of AI and automation has been argued to differ from previous General Purpose Technologies (GPTs), in that it may lead to rapid change in occupations’ underlying task requirements and persistent technological unemployment. In this paper, we apply a novel methodology of dynamic task shares to a large dataset of online job postings to explore how exactly occupational task demands have changed over the past decade of AI innovation, especially across high, mid and low wage occupations. Notably, big data and AI have risen significantly among high wage occupations since 2012 and 2016, respectively. We built an ARIMA model to predict future occupational task demands and showcase several relevant examples in Healthcare, Administration, and IT. Such task demands predictions across occupations will play a pivotal role in retraining the workforce of the future. Read more

Recommended citation: Das, S., Steffen, S., Clarke, W., Reddy, P., Brynjolfsson, E., and Fleming, M. (2020). "Learning Occupational Task-Shares Dynamics for the Future of Work." in Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, pp. 36-42.

Occupational Change: Automation and Reskilling Risks

Published in MIT Sloan - Master of Science in Management Research, 2020

We derive a novel occupation-industry level panel of skill demands from the near-universe of tagged online job postings in the US for the last decade (2010-2018). We use this data to study how the skill demands of occupations have changed and how these changes affect the returns to skills. Low- and medium-wage occupations’ skill demands changed more than those of high-wage ones. Thus, lower-wage workers face not only higher risks of technological displacement but also increased risks of reskilling in order to stay productive. We show that routine-biased technological change (RBTC) due to automation technologies such as ML can best explain these results, while skill-biased and (endogenously) directed technological change cannot. Technical skills, such as ML, Business, Software, and Data Skills have particularly high implied market values, as do Social Skills and Creativity. These therefore represent lucrative (re-)skill investment opportunities for workers, unlike writing and non-cognitive skills. Finally, there is significant heterogeneity in industry fixed effects, with the Utilities, Mining, Management and IT industries offering much higher returns than the Food and Retail industries, even after controlling for skills. Read more

Recommended citation: Steffen (2020). "Occupational Change: Automation and Reskilling Risks."

Job2Vec: Learning a Representation of Jobs

Published in , 2021

Job postings provide unique insights about the demand for skills, tasks, and occupations. Using the full text of data from millions of online job postings, we train and evaluate a natural language processing (NLP) model with over 100 million parameters to classify job postings’ occupation labels and salaries. To derive additional insights from the model, we develop a method of injecting deliberately constructed text snippets reflecting occupational content into postings. We apply this text injection technique to understand the returns to several information technology skills including machine learning itself. We further extract measurements of the topology of the labor market, building a “jobspace” using the relationships learned in the text structure. Our measurements of the jobspace imply expansion of the types of work available in the U.S. labor market from 2010 to 2019. We also demonstrate that this technique can be used to construct indices of occupational technology exposure with an application to remote work. Moreover, our analysis shows that data-driven hierarchical taxonomies can be constructed from job postings to augment existing occupational taxonomies like the SOC (Standard Occupational Classification) system. Exploring further the model structure, we find that between 2010 and 2019, occupations have become increasingly distinct from each other in their language, suggesting a rise in specialization of tasks in the economy. This trend is strongest for managerial, computer science, and sales occupations. Read more

Recommended citation: Bana, S., Brynjolfsson, E., Rock, D., Steffen, S. (2021). "Job2Vec: Learning a Representation of Jobs."

Cybersecurity Hiring in Response to Data Breaches

Published in SSRN Working Paper No. 3806060, 2021

Do firms react to data breaches by investing in cybersecurity talent? We assemble a unique dataset on firm responses from the last decade, combining data breach information with detailed firm-level hiring data from online job postings. Using a difference-in-differences design, we find that firms indeed increase their hiring for cybersecurity workers. While this effect is statistically significant, the economic magnitude is small, which is consistent with firms’ lack of incentives to improve their cybersecurity infrastructure. Further, we collect data from the MIT MediaCloud and Google Trends to measure media and public attention following breach events. We find that firms with greater media and search attention after a breach are three times as likely to post a cybersecurity job. With an increase in both the value of data as well as the number of cyberattacks, our research provides important insight into how media coverage and public attention can provide proper incentives for firms to make substantive IT investments to safeguard their customer data. Read more

Recommended citation: Bana, S., Brynjolfsson, E., Jin, W., Steffen, S., Wang, X. (2021). "Cybersecurity Hiring in Response to Data Breaches." SSRN Working Paper No. 3806060.

Digital Resilience: How Work-From-Home Feasibility Affects Firm Performance

Published in NBER Working Paper No. 28588, 2021

Digital technologies may make some tasks, jobs and firms more resilient to unanticipated shocks. We extract data from over 200 million U.S. job postings to construct an index for firms’ resilience to the Covid-19 pandemic by assessing the work-from-home (WFH) feasibility of their labor demand. Using a difference-in-differences framework, we find that public firms with high pre-pandemic WFH index values had significantly higher sales, net incomes, and stock returns than their peers during the pandemic. Our results indicate that firms with higher digital resilience, as measured through our pre-pandemic WFH index, performed significantly better in general, and in non-essential industries in particular, where WFH feasibility was necessary to continue operation. The ability to use digital technologies to work remotely also mattered more in non-high-tech industries than in high-tech ones. Lastly, we find evidence that firms with lower pre-pandemic WFH feasibility attempted to catch up to their more resilient competitors via greater software investment. This is consistent with a complementarity between digital technologies and WFH practices. Our study’s results are robust to a variety of empirical specifications and provide a first look at how WFH practices improved resilience to a major, unanticipated social and economic shock. Read more

Recommended citation: Bai, J., Brynjolfsson, E., Jin, W., Wan, C. (2021). "Digital Resilience: How Work-From-Home Feasibility Affects Firm Performance." NBER Working Paper No. 28588.