Natural Language Processing

The best investors tend to be voracious readers. Two of my favourite investors, the young Warren Buffett and Michael Burry (before the events in The Big Short), are prime examples of investors with strong reading habits. In his earlier days, Buffett read about 500 pages every week. He may have slowed down over the years, but I imagine he still spends the vast majority of his time reading annual reports and trade journals.

Michael Burry read the prospectuses of several mortgage bonds before shorting subprime mortgage-backed securities. These are probably the driest documents written by humans, but he managed to plow through them and find value where few were looking. Before the events in The Big Short, he was a vanilla value investor who read widely about different companies and industries.

The question I am trying to answer is whether some of the reading done by Buffett and Burry can be automated using natural language processing. There are two themes that I have picked up from both investors:

  1. They invest in things they understand and stay away from overly complex businesses. The exception is Burry, who read the impenetrable prospectuses of mortgage bonds; however, he ended up shorting them. Too much complexity is usually a good sign to stay away from an investment (or, in Burry's case, to go short).
  2. They look for things that are unlikely to change. This is especially true of modern day Buffett who buys quality companies with wide moats.

A recent article I wrote about the Flesch score suggested that the complexity of CEOs' annual shareholder letters was a good predictor of subsequent share price movement. Companies whose CEOs wrote easy-to-read letters (i.e. a high Flesch score) performed well for their shareholders. This was, however, not a detailed statistical study: the sample was very small, with the annual letters of only 8 CEOs analysed. I am currently analysing the shareholder letters of 100 CEOs and CIOs of closed-ended funds. It will be interesting to see whether there are many false positives (i.e. good performance despite a low Flesch score).
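The Flesch Reading Ease score is computed from three counts: words, sentences and syllables, as 206.835 - 1.015 x (words per sentence) - 84.6 x (syllables per word). Below is a minimal Python sketch of that formula. The sentence splitter and the vowel-group syllable counter are my own crude heuristics (assumptions, not what any particular readability tool does), so expect scores to differ slightly from dedicated libraries.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic: count groups of consecutive vowels (y treated as a vowel)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent 'e' (as in "make") usually adds no syllable; "-le" (as in "table") does.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    # Treat runs of . ! ? as sentence boundaries and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

For example, `flesch_reading_ease("The cat sat on the mat.")` gives about 116, while a sentence stuffed with words like "notwithstanding" and "remuneration" drops well below 30.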

From the small sample considered, the results seem to be binary: Flesch scores higher than 40 were good, whereas scores below 30 were bad. Differences between scores above 40 were not meaningful.

This makes sense because a score below 30 means the text is quite complex. The UK Income Tax Act has a score of 26, and an annual letter from a CEO should not be as complex as a tax code. A piece of text also only needs to be simple enough: beyond a certain threshold, a higher Flesch score tells you very little.
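Since only the thresholds seemed to matter, a score is more usefully turned into a label than compared as a raw number. A minimal sketch, assuming the 40 and 30 cut-offs from the small sample above; how to treat scores between 30 and 40 is not covered by those observations, so labelling them "inconclusive" is my own assumption.

```python
def classify_letter(score: float, good_above: float = 40.0, bad_below: float = 30.0) -> str:
    """Bucket a Flesch score using the thresholds seen in the small sample.

    Scores between bad_below and good_above were not covered by the observations,
    so they are labelled "inconclusive" (an assumption, not a finding).
    """
    if score > good_above:
        return "good"
    if score < bad_below:
        return "bad"
    return "inconclusive"


print(classify_letter(26))  # UK Income Tax Act score -> bad
print(classify_letter(55))  # -> good
```

Returning a label rather than the score itself reflects the observation that differences above 40 carried no information.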

The other tools I am looking at, using Python and R, are the naïve Bayes classifier and named entity recognition (NER). What I am looking for is consistency in writing style. A drastic change in style, or in the metrics the CEO chooses to focus on, could be a red flag: a CEO who has always boasted about margins and suddenly starts obsessing over revenue growth is one example. Such a change could also be meaningless, but natural language processing can help you spot it without reading through 20 years of a company's annual reports.
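One way to use a naïve Bayes classifier for style consistency, sketched here as an assumption rather than a tested method: train it on a CEO's earlier letters versus letters from other companies, then check whether a new letter is still classified as that CEO's. The class below is a from-scratch multinomial naïve Bayes with add-one smoothing; the class name, the labels "this_ceo"/"other" and the toy sentences are hypothetical, invented for illustration rather than taken from real letters.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_scores(self, doc: str) -> dict[str, float]:
        """Log prior plus summed log likelihoods of each token, per class."""
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)
            for word in tokenize(doc):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                score += math.log((self.word_counts[c][word] + 1) / (self.totals[c] + v))
            scores[c] = score
        return scores

    def predict(self, doc: str) -> str:
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Hypothetical toy data, not real shareholder letters.
past = ["Our margins expanded again this year as we kept costs tight.",
        "Margin discipline remains our priority; operating margins improved."]
others = ["Revenue growth accelerated as we expanded into new markets.",
          "We delivered record revenue growth and user growth."]
model = NaiveBayes().fit(past + others, ["this_ceo"] * 2 + ["other"] * 2)
print(model.predict("Operating margins improved as costs fell."))           # this_ceo
print(model.predict("Revenue growth and user growth are now our focus."))   # other
```

A new letter that suddenly lands in "other" is the kind of change worth reading closely. Raw word counts mix topic with style; restricting the vocabulary to function words ("the", "of", "however") is a common variation when style is the main interest.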
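For the margins-to-revenue-growth example, a simpler technique than NER may be enough to start with: count how often each metric is mentioned per year and flag the years where the most-mentioned metric changes. The sketch below swaps in plain keyword counting rather than NER; the `METRIC_TERMS` lists and the function names are hypothetical and would need tailoring to the company and industry.

```python
import re
from collections import Counter

# Hypothetical keyword patterns per metric family; adjust for the business being studied.
METRIC_TERMS = {
    "margins": [r"\bmargins?\b"],
    "revenue growth": [r"\brevenue growth\b", r"\bsales growth\b", r"\btop[- ]line\b"],
    "return on capital": [r"\breturn on (?:invested )?capital\b", r"\broic\b"],
}


def metric_mentions(text: str) -> Counter:
    """Count how often each metric family is mentioned in one letter."""
    lowered = text.lower()
    return Counter({
        metric: sum(len(re.findall(p, lowered)) for p in patterns)
        for metric, patterns in METRIC_TERMS.items()
    })


def focus_shifts(letters_by_year: dict[int, str]) -> list[tuple[int, str, str]]:
    """Return (year, previous_focus, new_focus) whenever the most-mentioned metric changes."""
    shifts = []
    previous = None
    for year in sorted(letters_by_year):
        counts = metric_mentions(letters_by_year[year])
        if not any(counts.values()):
            continue  # no tracked metric mentioned this year
        focus = counts.most_common(1)[0][0]
        if previous is not None and focus != previous:
            shifts.append((year, previous, focus))
        previous = focus
    return shifts


# Hypothetical letters, invented for illustration.
letters = {
    2018: "Our margins widened. Margins remain the priority.",
    2019: "Margins held up well.",
    2020: "Revenue growth is what matters now. Revenue growth accelerated "
          "and sales growth will continue.",
}
print(focus_shifts(letters))  # [(2020, 'margins', 'revenue growth')]
```

A flagged year is only a prompt to go and read that letter; as noted above, the shift may turn out to be meaningless.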

Natural language processing should not be considered a replacement for the legwork of reading annual reports. Used properly, however, it can be a good time-saving tool.
