Android device fragmentation, quantified

Fragmented Android

Nearly everyone involved in the Android ecosystem has heard of Android device fragmentation: it has been covered in blogs again and again and again, and even Eric Schmidt, Google’s executive chairman, took some time to address the issue last week at CES.

But how can we quantify this problem? How can we tell how many different Android device types are in use?

A few weeks ago, somewhat in response to this issue, the Android Market started listing the device type (e.g. “Samsung Galaxy S2”) for each app review. This means that it is now possible to estimate the number of different Android device types that are out there. Care to guess?

The answer is 170. There are currently 170 different types of Android devices using the Android Market.

This is only a lower bound: the figure counts only the devices from which app reviews have been posted, so there may well be device types we haven’t seen yet.
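The estimate above boils down to counting the distinct device labels that appear across reviews. Here is a minimal sketch of that idea, assuming each review is available as a record with a `device` field; the records, field names and the `normalize` helper are all hypothetical, made up for illustration, and say nothing about how the Market actually exposes its data:

```python
from collections import Counter

# Hypothetical review records; field names and values are illustrative only.
reviews = [
    {"app": "example.app.one", "device": "Samsung Galaxy S2", "rating": 5},
    {"app": "example.app.one", "device": "HTC Desire", "rating": 2},
    {"app": "example.app.two", "device": "samsung galaxy s2 ", "rating": 4},
    {"app": "example.app.two", "device": None, "rating": 1},
]

def normalize(device):
    """Collapse case and whitespace so trivial spelling variants count once."""
    return " ".join(device.lower().split())

def device_counts(records):
    """Count reviews per normalized device type, skipping reviews without one."""
    return Counter(normalize(r["device"]) for r in records if r.get("device"))

counts = device_counts(reviews)
distinct_devices = len(counts)  # a lower bound: only devices seen in reviews
print(distinct_devices)
print(counts.most_common(1))
```

Reviews with no device label are skipped, which is one more reason the resulting count can only undershoot the real number of device types.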

With device types now being listed in the Market, all sorts of interesting possibilities arise: for example, determining whether there’s a relation between the device a user has and the apps they’re likely to download. Another example would be finding device-specific issues reported in reviews. All of these and much more – coming soon.


LeWeb Ignite presentation now available online

On Thursday 08/12 I gave a lecture at the LeWeb Ignite session, where I demonstrated how language processing techniques can be applied to Android Market reviews in order to produce some pretty interesting conclusions about the Android ecosystem. Thanks to the awesome production team @ LeWeb, the presentation is now available on YouTube:

Also, the slides themselves are available on SlideShare.

In the next couple of weeks, I will be releasing more detailed reports covering various aspects of the Android Market. Stay tuned.


Android Market reviews visualized, tomorrow at Ignite LeWeb

In just under 24 hours, I’ll be on stage at the LeWeb Ignite session, where I’ll be giving an overview of my project concerning Android Market reviews. I plan to use my 5 minutes and 20 slides to demonstrate how the enormous data set that is the Android Market can be processed and analyzed to derive interesting conclusions about the Android ecosystem, such as:

  • What software issues are Android users most concerned about?
  • Why do Angry Birds Seasons and Angry Birds Rio have totally different review patterns?
  • How is the Android Market review system being abused?

I plan on publishing the results of the entire project in more detail in the next couple of weeks, so feel free to follow me and stay updated.

Also, the presentation itself will be made available shortly after LeWeb, so stay tuned. In the meantime, here’s a sneak peek of what it’s going to look like:


And the top word for a low-rated Android app is…

As any seasoned mobile app developer will tell you, one of the most important factors in an app’s success or failure is how it’s reviewed and rated. Reviews allow the developer to receive feedback from the users, and ratings play a major part in an app’s visibility and ranking in the Android Market. Perhaps more importantly, rated reviews are users’ first genuine impression of an app prior to downloading it: if an app’s description is its business card, its reviews are its reputation.

App reviews exist by the millions, containing numerous users’ opinions about how apps should and shouldn’t behave. By zooming out and looking at very large sets of such reviews, it is possible to provide insights that can help developers design and implement better apps.

So how do we tap the collective wisdom of the Android-wielding masses? I’ll give you a very simple example in the form of a riddle: what does the following word-cloud represent?

App descriptions in low-rated reviews

Got it? No?

The simple answer is that the image above represents roughly what Android users think of low-rated apps on the Android Market. Besides cluing us in to Android users’ favorite put-down words, it also demonstrates how reviews can be analyzed to gain insights into the world of Android users.

The word cloud above is based on data generated using a proprietary tool that analyzes Android app reviews from different sources and singles out words used to describe the apps themselves. For the sake of completeness, here’s what the same technique yielded on high-rated apps:

App descriptions in highly rated reviews

Well, it looks as though if an Android app is OK, it can be a lot of different kinds of OK, but if it’s bad… well… it mostly just sucks… :-)
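The tool behind these word clouds is proprietary, so what follows is not that tool. It is only a rough sketch of the general idea, using plain word frequencies and a tiny hand-picked stop-word list as a crude stand-in for singling out descriptive words. The sample reviews, the `STOPWORDS` set and the `top_words` helper are all made up for illustration:

```python
import re
from collections import Counter

# Hypothetical (rating, text) pairs; not real Market reviews.
reviews = [
    (1, "This app sucks, it crashes all the time"),
    (2, "Slow and buggy, sucks on my phone"),
    (5, "Great app, love it"),
    (4, "Awesome and useful, great widget"),
]

# A tiny illustrative stop-word list; a real analysis would need a fuller one.
STOPWORDS = {"this", "it", "the", "all", "and", "on", "my", "app", "a"}

def top_words(records, min_rating, max_rating, n=3):
    """Return the n most frequent non-stop-words in reviews within a rating range."""
    words = Counter()
    for rating, text in records:
        if min_rating <= rating <= max_rating:
            tokens = re.findall(r"[a-z']+", text.lower())
            words.update(t for t in tokens if t not in STOPWORDS)
    return words.most_common(n)

low = top_words(reviews, 1, 2)   # words from 1- and 2-star reviews
high = top_words(reviews, 4, 5)  # words from 4- and 5-star reviews
print(low[0], high[0])
```

The resulting counts are the kind of input a word cloud is drawn from: the more frequent the word, the bigger it appears.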

In the following posts, we’ll see further analysis of publicly available data from various sources. The results will point to several conclusions about Android apps, about how users perceive them and their bugs/features, and about the Android ecosystem in general. Stay tuned!


Visualizing the U.S. national debt

The last couple of weeks saw countless items in the worldwide press dealing with just one number: the U.S. national debt. The economic whiplash caused by the downgraded U.S. credit rating earlier this week is a testament to the power of this single figure: $14 trillion (according to an official White House report).

Wow, that’s a big number: fourteen trillion dollars. As in, $14,000,000,000,000. A number that large usually gets most people thinking of this:

Naturally, an impressive number such as this, getting so much press, is bound to lead to interesting visualisations. Geekographics.com has collected some of these over the last couple of days, each trying to make sense of this 14-digit monstrosity in its own unique way:

The first one is an official infographic published by the White House, which gives a very high-level breakdown of the debt and explains how and why it has increased over the last decade. Notice how decisions and policies that affected the debt are color-coded to reflect the different administrations.

The second, a live visualization dubbed “The U.S. debt clock”, can be found here. Regardless of the accuracy or validity of the data, it is clear that whoever designed this page has taken the opposite approach to that of the White House designers: many running numbers, each representing a different component or aspect of the debt. Notice how the bottom-line figure sits in the top left corner, not emphasized in any special way.

The last infographic is geekographics.com’s personal favorite: instead of trying to explain what affects the debt or to break it down, this one tries to size it up, giving scale to an otherwise unimaginable number. The coolest thing about this infographic, aside from its slick design, is that it bypasses people’s “ONE MILLION DOLLARS!” mechanism and actually gives them a pretty good idea of how colossal the debt is.