To the layman, the obvious way to determine the popularity of an app category is to compare the number of downloads under each category. In fact, many articles comparing iOS and Android do so by comparing the number of downloads on each platform. Multiple sites, such as Statista, that aim to identify the most popular categories in the market do so by looking at the total number of downloads under each category. My analysis of the Google Play Store data, using the total number of downloads per category to find the ten most popular app categories, yielded the following results:

As is evident from the visual, gaming is the category with the most downloads, followed by communications. At the other end, the least popular of the ten categories are news and magazines, followed by travel and local in last place.

The 10 most popular categories in the Google Play Store, in order from most to least popular by total number of downloads:

Gaming

Communications

Tools

Productivity

Social

Photography

Family

Video Players

News and Magazines

Travel and Local
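The total-downloads ranking above can be sketched in a few lines of Python. This is only a minimal illustration, not the code behind the chart: the `apps` records are made-up numbers, and the function names `total_downloads_by_category` and `top_categories` are hypothetical helpers introduced here.

```python
from collections import defaultdict

# Made-up (category, installs) records for illustration only;
# these are NOT the figures from the Google Play Store data.
apps = [
    ("GAME", 1_000_000_000),
    ("GAME", 10_000),
    ("GAME", 5_000),
    ("COMMUNICATION", 500_000_000),
    ("COMMUNICATION", 400_000_000),
    ("TOOLS", 100_000_000),
]


def total_downloads_by_category(records):
    """Sum the installs of every app within each category."""
    totals = defaultdict(int)
    for category, installs in records:
        totals[category] += installs
    return dict(totals)


def top_categories(scores, n=10):
    """Return the n category names with the highest score, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


totals = total_downloads_by_category(apps)
ranking_by_total = top_categories(totals)
print(ranking_by_total)
```

With this toy data, the single blockbuster game is enough to put GAME first, even though the other two games have very few installs.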

Although the total number of downloads is the most commonly used measure, you could also identify the ten most popular categories by looking at the mean number of downloads per category. That is, for each category you find the average number of downloads per app (the category's total downloads divided by the number of apps in it) and then pick the ten categories with the highest averages. Using the mean normalizes the data for the number of apps in each category. This normalization is often advantageous because it helps reduce the effect of outliers on the results: a handful of highly popular apps has less influence on the overall popularity of the category. A category won't artificially appear more popular just because one or two outlier apps in it are highly popular. This is evident in how the popularity of categories changes when the data is analyzed using the mean number of downloads rather than the total.

As is evident from the visual, communications, not gaming, is the most popular category. This is because a few games are unusually popular while most other games remain unpopular. These highly popular games contribute heavily to the total number of downloads and artificially inflate the popularity of the gaming category. By dividing the total number of downloads in the gaming category by the total number of apps in that category, we reduce the effect of these highly popular games and reveal the category's true popularity. Whereas gaming and communications are the two most popular categories by total number of downloads, communications and social are the two most popular by mean number of downloads per category.

The 10 most popular categories in the Google Play Store, in order from most to least popular by mean number of downloads:

Communications

Social

Video Players

Entertainment

Productivity

Photography

Travel and Local

Gaming

News and Magazines

Tools
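The mean-based ranking can be sketched the same way. Again, this is a hedged illustration rather than the analysis code itself: the `apps` records are invented, and `mean_downloads_by_category` is a hypothetical helper name.

```python
from collections import defaultdict

# Made-up (category, installs) records for illustration only;
# these are NOT the figures from the Google Play Store data.
apps = [
    ("GAME", 1_000_000_000),
    ("GAME", 10_000),
    ("GAME", 5_000),
    ("COMMUNICATION", 500_000_000),
    ("COMMUNICATION", 400_000_000),
    ("TOOLS", 100_000_000),
]


def mean_downloads_by_category(records):
    """Average installs per app: category total divided by its app count."""
    installs_per_category = defaultdict(list)
    for category, installs in records:
        installs_per_category[category].append(installs)
    return {
        category: sum(values) / len(values)
        for category, values in installs_per_category.items()
    }


means = mean_downloads_by_category(apps)
# Sort category names by mean installs, highest first, and keep the top 10.
ranking_by_mean = sorted(means, key=means.get, reverse=True)[:10]
print(ranking_by_mean)
```

In this toy data GAME has the largest total but only the second-largest mean, because its one hit app is averaged together with two barely downloaded ones. That is the same kind of reordering seen between the two lists in the article.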

The top 10 most popular categories vary drastically between the two methods. It is therefore clear that, in many cases, outliers are artificially popularising certain categories. This can have large implications in the real world, where data has a wide range of applications. Small businesses that may not have analysts on hand, as well as many students, may suffer the consequences when the wrong method is used and some categories appear more popular than they actually are. Keep in mind, though, that in some instances it may be more appropriate to use the total number of downloads to find the most popular categories. It is therefore imperative that people weigh the pros and cons of a particular method before using it.
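To make the difference between the two rankings concrete, here is a small sketch that compares the two top-10 lists exactly as they appear in this article. The variable names (`by_total`, `by_mean`, `rank_shift`) are just illustrative choices.

```python
# The two top-10 lists as given in this article, most popular first.
by_total = [
    "Gaming", "Communications", "Tools", "Productivity", "Social",
    "Photography", "Family", "Video Players", "News and Magazines",
    "Travel and Local",
]
by_mean = [
    "Communications", "Social", "Video Players", "Entertainment",
    "Productivity", "Photography", "Travel and Local", "Gaming",
    "News and Magazines", "Tools",
]

# Categories that make the top 10 under only one of the two methods.
only_in_total = set(by_total) - set(by_mean)
only_in_mean = set(by_mean) - set(by_total)

# For categories in both lists: rank by mean minus rank by total.
# A positive number means the category drops when the mean is used.
rank_shift = {
    category: by_mean.index(category) - by_total.index(category)
    for category in by_total
    if category in by_mean
}

print("Only in total ranking:", only_in_total)
print("Only in mean ranking:", only_in_mean)
for category, shift in sorted(rank_shift.items(), key=lambda item: item[1]):
    print(f"{category}: {shift:+d}")
```

Gaming and Tools each fall seven places when the mean is used, while Video Players climbs five, which is the kind of drastic variation described above.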

Nice article, Siddhanth! Your topic is really relevant and interesting. I liked how you addressed both total and mean number of downloads and the impact of outliers. Do you have any inclinations as to why certain categories consistently fall lower or higher on the spectrum?

Woah, this data is really interesting! How did you find/get it?

Wow! I think the manner in which you chose to analyze your data, looking at both the total number of downloads and the mean number of downloads and then comparing, was a very effective way to show both the most popular categories as well as the effects of outliers/differences in the number of apps within each category. Your analysis was also very thorough and well thought out. I am impressed by how short and well organized your code is considering how detailed your analysis looks. One thing I may suggest is either eliminating the legends on your graph, since all the bars are already labeled, there is no difference in color, and the legend blocks some of your bars, or changing the colors and then eliminating the redundant labels at the bottom. Otherwise, very well done!!

This is very cool and interesting! I would recommend not blocking the data, but it was still very informative.

Nice article! It’s not surprising that people download communications and social networking apps the most. So many of us are really social and won’t pass up a good digital conversation.