Trick or treat?
Yes, the time has come to dress up in Halloween costumes, go trick-or-treating, and sit by the fireplace swapping the horrifying ghost stories we have heard since childhood. That said, did you know that Halloween is the second-highest commercial holiday in the US, with total spending of up to $9 billion? Given the love between the US and Halloween, it’d be interesting to dig deep and uncover the spooky side of the country. We’ll find out whether your city is on the list of haunted places and whether you should be a bit extra careful this Halloween.
For this study, we extracted data from a website called Shadow Lands (feeling spooked yet?) to build the data set. Not only does it list haunted locations in the US, but it also describes the history behind each place. Visitors to the site can even add their own haunted place if it is missing! From the website, we captured several data fields for each location. Here is the list:
- location
- city
- state
- description text
- latitude and longitude of the city
Goals
Here is what we’re unraveling from the analysis:
- Top 30 haunted cities
- States ranked by the number of haunted places
- Heatmap of haunted places across the US
- Frequently used words in the description
- Underlying connections of the words
The most haunted city
The chart shows the top thirty cities ranked by the number of haunted places. Los Angeles, San Antonio, and Honolulu take the top spots.
Interestingly, the descriptions of the spooky locations in Los Angeles mention Hollywood twenty-five times and Universal Studios twice. The following places in LA also come up often:
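A ranking like this comes down to counting rows per city. The snippet below is a minimal sketch of that idea, not the code used for the study: the sample rows and the `top_cities` helper are made up for illustration, and only the `city` and `state` field names follow the list of captured fields.

```python
from collections import Counter

# Hypothetical sample rows; the real data set has one row per haunted location.
rows = [
    {"location": "Old Theater", "city": "Los Angeles", "state": "California"},
    {"location": "Hilltop Hotel", "city": "Los Angeles", "state": "California"},
    {"location": "Mission Road", "city": "San Antonio", "state": "Texas"},
    {"location": "Harbor School", "city": "Los Angeles", "state": "California"},
    {"location": "River Bridge", "city": "San Antonio", "state": "Texas"},
    {"location": "Palm Cemetery", "city": "Honolulu", "state": "Hawaii"},
]


def top_cities(rows, n=30):
    """Return the n (city, state) pairs with the most haunted locations."""
    # Count on (city, state) so same-named cities in different states stay apart.
    counts = Counter((row["city"], row["state"]) for row in rows)
    return counts.most_common(n)


for (city, state), count in top_cities(rows, n=3):
    print(f"{city}, {state}: {count}")
```

Counting on the city and state together rather than the city name alone is a deliberate choice here, since several US states share city names.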
- Boyle Heights
- Loyola Marymount University
- Occidental College
Be careful in these areas!
The most haunted state
Of all the states, California tops the list (not that it is a surprise), closely followed by Texas and Pennsylvania. If you’d rather stay in a less spooky state with fewer “incidents”, I would recommend Montana, Delaware, and Alaska, since they are the least haunted states.
Heatmap of the haunted places
The charts for haunted cities and states give a fair idea, but is there another way to visualize how the haunted places are spread across the US? That’s when a heatmap comes into play to give an idea of the density of the locations.
Clearly, the East Coast has denser clusters of haunted places than the West Coast (where only epicenters like LA, San Francisco, and Seattle contribute to the spookiness). Apart from that, we see that the Southern US is more haunted than the Northwestern US.
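The density behind a heatmap can be approximated by dropping each coordinate pair into a grid cell and counting the points per cell. Here is a rough sketch under that assumption; the `density_grid` helper, the one-degree cell size, and the sample coordinates are illustrative and not taken from the study. Since the data set stores the latitude and longitude of the city, every haunted place in a city lands in the same cell.

```python
import math
from collections import Counter


def density_grid(points, cell_deg=1.0):
    """Count (lat, lon) points per grid cell of cell_deg degrees on a side."""
    grid = Counter()
    for lat, lon in points:
        # floor() keeps negative longitudes in the correct cell.
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        grid[cell] += 1
    return grid


# Hypothetical city coordinates, one entry per haunted location.
points = [
    (34.05, -118.24),  # Los Angeles
    (34.05, -118.24),  # Los Angeles again (second location, same city)
    (29.42, -98.49),   # San Antonio
]

grid = density_grid(points)
for cell, count in grid.most_common():
    print(cell, count)
```

A plotting library would then color each cell by its count; smaller cells give a finer but noisier map.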
Frequently used words
Now, we will look at the most frequently used words in the description text of the data set. The following word cloud shows the top 300 terms:
It shows that words such as ‘night’, ‘people’, ‘old’, ‘see’, ‘house’, ‘ghost’, ‘room’, and ‘building’ are prevalent. Some of the interesting findings are the following:
- Going by word counts, the chance of encountering mysterious beings seems higher in a house, a building, or on a road than in a cemetery.
- The combined word count for female terms (women/girls/lady) is higher than for male terms (men/boys), and ‘old’ appears more often than ‘young’.
- Vampires outnumber werewolves in terms of word count.
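Word counts like the ones above can be produced by tokenizing each description, dropping common filler words, and tallying the rest. The following is a small sketch of that process; the sample descriptions, the tiny stop-word list, and the `word_counts` helper are assumptions made for illustration, not the pipeline behind the word cloud.

```python
import re
from collections import Counter

# A deliberately tiny stop-word list; a real analysis would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "in", "at", "of", "and", "to"}


def word_counts(descriptions):
    """Count lowercase alphabetic words across all descriptions, minus stop words."""
    counts = Counter()
    for text in descriptions:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts


# Hypothetical description texts.
descriptions = [
    "A ghost is seen at night in the old house.",
    "People hear footsteps in the old building at night.",
]

counts = word_counts(descriptions)
top_terms = counts.most_common(300)  # the word cloud uses the top 300 terms
print(top_terms[:5])
```

Comparisons such as female versus male terms then amount to summing the counts of a few chosen words, for example `counts["women"] + counts["girls"] + counts["lady"]`.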
Relationship between words
Although we figured out the frequently used words, it’d be much more insightful to find out how the words in the description text relate to each other. Here we will focus on bi-grams (pairs of consecutive words) and visualize the relationships via a network graph.
This network graph shows some interesting connections. For instance, there is a cluster of words related to soldiers and civil war, which suggests that some of the haunted places trace back to the death and destruction caused by civil war. The larger cluster at the bottom associates ghost with haunt, hunters, and stories (which makes sense). We also see that words such as shadowy, ghostly, and dark are associated with figures, which in turn is connected to walking. Check out how the word poltergeist (noisy ghost) is associated with paranormal activity! This is mostly down to the nature of poltergeists: they are known to levitate objects and horrify people by pinching, hitting, and tripping them.
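The edges of such a graph can be derived by counting how often each pair of consecutive words appears. The sketch below shows one way to do that; the `bigram_counts` and `graph_edges` helpers, the stop-word list, the minimum-count threshold, and the sample texts are all illustrative assumptions rather than the method used for the chart.

```python
import re
from collections import Counter

# A deliberately tiny stop-word list; a real analysis would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "in", "at", "of", "and", "to"}


def bigram_counts(descriptions):
    """Count consecutive word pairs, skipping pairs that contain a stop word."""
    counts = Counter()
    for text in descriptions:
        tokens = re.findall(r"[a-z]+", text.lower())
        for first, second in zip(tokens, tokens[1:]):
            if first not in STOPWORDS and second not in STOPWORDS:
                counts[(first, second)] += 1
    return counts


def graph_edges(counts, min_count=2):
    """Keep pairs seen at least min_count times as (word, word, weight) edges."""
    return [(a, b, n) for (a, b), n in counts.items() if n >= min_count]


# Hypothetical description texts.
descriptions = [
    "Civil war soldiers walk the halls.",
    "Shadowy figures and civil war soldiers appear.",
]

edges = graph_edges(bigram_counts(descriptions))
print(edges)
```

Each edge links two words and carries the pair count as its weight; a graph library can then lay out the nodes so that strongly connected words, such as the soldier cluster, end up close together.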
So, that was a fun use of analytics and data sourced via web scraping. Now it’s time for you to carve a pumpkin and impress people with these talking points at the Halloween party.
Have a spooktacular Halloween and may the holy ghost bless you!