Classifying Tornado Intensity

Aug 28, 2020

I will be using the Historical Tornado Tracks dataset from the National Oceanic and Atmospheric Administration (NOAA) and the National Weather Service (NWS) to predict a tornado’s F-scale rating. The data I am using can be found here.

First, let’s discuss what the F-scale is. Also called the Fujita scale, the F-scale rating was introduced in 1971 by Ted Fujita and Allen Pearson. It was designed as a way to measure the impact a tornado had on man-made structures and vegetation. It is not a method of measuring a tornado’s width, path length, or wind speed.

While the original scale had 13 theoretical categories, Fujita intended for only six to be used: F0 through F5. The categories and their descriptions can be seen below:

More info on the Fujita Scale can be found here.

Now that we’re all familiar with the rating system, I’ll dive into the data. While the dataset includes all US tornadoes from 1950–2018, tornadoes from 1950–1972 were retroactively rated by NOAA. Included are features describing starting and ending latitudes and longitudes, fatalities, injuries, loss and crop loss (in millions of dollars), path length, path width, and more.

First, let’s look at the distribution of F-ratings in the data:

As you can see in the image above, the classes are very imbalanced. Over 46% of all tornadoes since 1950 were classified as F0, while fewer than 0.1% were F5. This introduces several challenges for classification, which I will go over later.

To start any classification problem, let’s determine our baseline. We’ll use the majority class as our baseline, which, as we can see in the distribution charts, is 46 percent. For those not familiar with classification, all this means is that if we guessed that every entry in the dataset was an F0 (the majority class), we would be correct almost half of the time. So how can we improve our guess rate?
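
To make the baseline idea concrete, here is a minimal sketch of how a majority-class baseline can be computed. The `majority_baseline` helper and the toy labels are hypothetical and only for illustration; they are not the NOAA data or the code used in this project.

```python
from collections import Counter


def majority_baseline(labels):
    """Return the most common class and the accuracy of always predicting it."""
    counts = Counter(labels)
    majority_class, majority_count = counts.most_common(1)[0]
    return majority_class, majority_count / len(labels)


# Hypothetical toy labels (not the real dataset): 5 of 10 tornadoes are F0.
toy_ratings = ["F0"] * 5 + ["F1"] * 3 + ["F2"] * 2

baseline_class, baseline_accuracy = majority_baseline(toy_ratings)
print(baseline_class, baseline_accuracy)
```

Any model worth keeping should beat this number, since it requires no learning at all.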

Let’s move on to a simple model: a Random Forest (RF) classifier. One thing to note is that at this point I have already done some feature selection to get a better result out of the Random Forest model. More on that in a minute.

Running an RF model on the data gives this confusion matrix. More info about confusion matrices can be found here. The accuracy we get from this model is 66 percent, a significant improvement over our baseline of 46 percent.
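
For readers new to confusion matrices, the sketch below shows how one is built from actual and predicted labels, and how accuracy falls out of its diagonal. It is a plain-Python illustration under my own assumptions: the function names and the toy labels are hypothetical, and this is not the plotting code behind the image.

```python
def confusion_matrix(actual, predicted, classes):
    """Count predictions; rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Correct predictions sit on the diagonal of the matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical toy labels, not model output from this project.
classes = ["F0", "F1", "F2"]
actual = ["F0", "F0", "F0", "F1", "F1", "F2"]
predicted = ["F0", "F0", "F1", "F1", "F0", "F2"]

cm = confusion_matrix(actual, predicted, classes)
for label, row in zip(classes, cm):
    print(label, row)
print("accuracy:", accuracy(cm))
```

Off-diagonal cells show which ratings get mistaken for which, something a single accuracy number hides.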

As I said earlier, I have already done some feature selection to improve two things: overall accuracy and the accuracy on the minority classes. This is a list of the current features and their importance in the model. Now let’s think back to our definition of the F-scale: it is simply a measurement of a tornado’s damage to man-made structures and vegetation. So how does the model perform when it is only fed the damage-related columns?

Here is the confusion matrix when the model is trained on only the Fatalities, Injuries, Loss, and Crop Loss columns. The model reached 62 percent accuracy on these four columns, down from 66 percent with the full feature set. While this may seem like a surprisingly small drop, it is understandable considering these features are the closest to what the Fujita scale actually measures.

So what else can be done to improve the accuracy of the predictions? As we can see in the distributions image, our classes are extremely imbalanced. To counter this natural disparity in the data, there are a couple of possible solutions.

The first option is to use the SMOTE algorithm to over-sample the minority classes. The algorithm does this by creating synthetic entries that lie between an existing minority-class entry and one of its nearest neighbors in the same class. While this method doesn’t guarantee that the new entries are realistic, the nearest-neighbors approach should produce realistic entries in my case.
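
The interpolation step can be sketched in a few lines of plain Python. This is a simplified illustration of the SMOTE idea, not the library implementation I ran; the function name, the parameter `k`, and the toy two-feature points are all hypothetical.

```python
import math
import random


def smote_like_samples(minority_points, n_new, k=2, seed=0):
    """Create n_new synthetic points between minority points and their neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority_points)
        # Find the k nearest minority neighbors of the base point (excluding itself).
        others = [p for p in minority_points if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        # Pick a random spot on the line segment from base to neighbor.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Hypothetical minority-class points with two made-up numeric features.
minority = [(1.0, 1.0), (2.0, 1.5), (1.5, 3.0), (3.0, 2.5)]
new_points = smote_like_samples(minority, n_new=5)
print(new_points)
```

Because every synthetic point sits on a segment between two real minority entries, it can never fall outside the range the minority class already covers.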

Here is the confusion matrix and the classification report for the SMOTE algorithm. I have added back in all of the columns listed in the feature importance image.

While the 62 percent accuracy is identical to our previous model, the confusion matrix clearly shows an improvement in correct predictions.
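
To see how overall accuracy can stay flat while the minority classes improve, consider the sketch below. Both confusion matrices are hypothetical two-class examples made up for illustration; they are not my results.

```python
def accuracy(matrix):
    """Share of all predictions that land on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


def per_class_recall(matrix):
    """Recall per class: diagonal count divided by that class's row total."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


# Rows are actual classes (majority first, then minority); columns are predictions.
before_resampling = [[6, 2], [2, 0]]  # the minority class is never predicted correctly
after_resampling = [[4, 4], [0, 2]]   # the minority class is always predicted correctly

print(accuracy(before_resampling), per_class_recall(before_resampling))
print(accuracy(after_resampling), per_class_recall(after_resampling))
```

Both toy matrices have the same accuracy, yet only the second one recovers any of the minority class, which is why I look at the matrix and the per-class scores rather than accuracy alone.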

The second option is the NearMiss algorithm. This method under-samples the dataset so that it includes fewer entries from the majority classes.
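
As a rough sketch of how this kind of under-sampling chooses which majority entries to keep, here is a plain-Python version of the NearMiss-1 rule: keep the majority points with the smallest average distance to their nearest minority points. I am assuming version 1 of the algorithm purely for illustration; the function name, `k`, and the toy points are hypothetical.

```python
import math


def near_miss_1(majority_points, minority_points, n_keep, k=2):
    """Keep the n_keep majority points closest, on average, to their k nearest minority points."""

    def mean_dist_to_nearest_minority(point):
        dists = sorted(math.dist(point, m) for m in minority_points)
        nearest = dists[:k]
        return sum(nearest) / len(nearest)

    ranked = sorted(majority_points, key=mean_dist_to_nearest_minority)
    return ranked[:n_keep]


# Hypothetical points with two made-up numeric features.
majority = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (9.0, 9.0), (10.0, 10.0)]
minority = [(1.0, 1.0), (1.0, 0.0)]

kept = near_miss_1(majority, minority, n_keep=2)
print(kept)
```

The majority entries that survive are the ones sitting closest to the minority class, so the model is forced to learn the boundary between them.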

The confusion matrix plot looks nearly identical to the SMOTE plot, with slightly tighter spreads around correct predictions and a minor improvement in predicting F5s. As shown by the accuracy and recall scores, the NearMiss model is a small improvement over the SMOTE model.

I’ve learned two major things during the course of this project. The first is that tornadoes are difficult to classify. While 63 percent accuracy certainly isn’t terrible, I know that better scores are achievable (though outside the scope of this project).

The second is what I learned from working with data that has such imbalanced classes. Before this project, I didn’t know about the different techniques used to deal with minority classes. The results I achieved by implementing these techniques should speak for themselves in proving their value.

Written by Luke Melto