March 18, 2007

The Data Gap: When The Tools Are There But The Data's Not

As readers of this blog know, data sets make my heart beat faster. Data sets that have been analyzed/visualized and made to tell a story are even better. The work being done by organizations like the Sunlight Foundation Labs (who are taking publicly available data sets about US congressional/political issues and making them do interesting things online) is tremendously exciting, and inspiring to many other organizations around the world. I spoke to Tate Hausman of dotOrganize earlier this week about the Integration Proclamation, a data sharing manifesto that he hopes will lead to real action on the part of vendors and open source communities. The goal of dotOrganize's latest project: best-practice open APIs ("application programming interfaces") for all data applications used in the non-profit sector. The end result, in an open API world, will be data that flows easily between applications, and, perhaps equally exciting, out of applications and into the world of data analysis.

It's easy to get excited about these moves towards a world where data is free (as in freedom) and accessible, and the tools to analyze it, like Swivel and Many Eyes, are available to anyone with a web connection. Where it all grinds to a screeching halt is when you get to places where there really isn't much data. In most of the world, governments don't make information publicly available, even if they're supposed to -- see the fabulous FarmSubsidy.org's efforts to make EU governments cough up information on national distributions of Common Agricultural Policy funds -- that is, EU taxpayer money. Except in wonderfully transparent countries like Slovenia (surprise!), this is like pulling teeth, and only works through the application of freedom of information laws, a lengthy and complicated process that can end in stalemate, government hedging, or partial release of information.

But at least in this case, the information exists -- not with easy access, and not in nice XML-y formats that lend themselves to clear comparison across EU countries. Where it gets trickier is when you move to the parts of the world where my employer, the Open Society Institute, does a lot of work. In sub-Saharan Africa, for instance, data is rarely available from governments on many issues important to civil society -- educational budget distributions, public health budget distributions, etc. Forget about information on things like political contributions or, in many cases, basic public information like parliamentary voting records. Where information is available through a government, there may be little accountability for where it came from or how accurate it is. What happens is that data sets need to be painstakingly created through the hard work of civil society monitoring groups, like the Public Service Accountability Monitor in Grahamstown, South Africa, which follows the activities of the government of the Eastern Cape, or IDASA, also in South Africa, which undertakes monitoring projects on a range of issues from local government accountability to distribution of national HIV/AIDS budgets on an extremely granular level, or Mzalendo, the online project run by Ory Okolloh and anonymous blogger M out of Kenya to track parliamentary activity in their country. But this can be expensive and/or time-consuming work, and fraught with difficulties -- not the least of which is that the people who are good at this kind of monitoring are not necessarily directly in touch with the advocacy groups that can use the data effectively to change policies. Even when serious data is available in the developing world, the skills to analyze it and put it to compelling use are often spread very thinly on the ground.

My guess is that the data gap will continue to plague the developing world for years to come. A combination of factors is behind it -- government corruption and self-dealing are tremendous disincentives for transparency, and they are compounded by skill gaps, technology gaps, and, perhaps most importantly, the lack of consistent demand for (and support for) this kind of transparency from international donor organizations.

February 23, 2007

The (Grand) Challenge of Visualization

If you're spending your weekends figuring out how to use 3D geospatial tools to visualize the human impact on climate change, or better yet, you're devising new tools to do Google Earth and ESRI one better, make sure you get yourself over to the International Symposium on Digital Earth's Grand Challenge 2007 by April 1st. As the site says:

How can we better experience this world of ours at the cross roads of human impacts and climate change? How can we best communicate these experiences, particularly in light of the major changes Earth now faces, as one world? How can we most compellingly understand and communicate those experiences and processes? What 3D experiences or 3D tools can you share that might encourage the opportunity for a better world?

If you think you can do this in a way that demonstrates how people can more easily and effectively communicate, YOU COULD WIN BIG!

Although there is a Publishers Clearing House element to this come-on (in fact, you may ALREADY be a winner!), their hearts seem in the right place, and certainly my heart beats faster when someone talks about innovative visualizations of social issues. So, what is it exactly you're supposed to do to win this contest? Keep scrolling down, and they finally tell you at the bottom that "Entries must demonstrate unique or innovative applications, tools, or utilities for 3D Visualization". So in other words, maybe you've found something interesting to do with an existing web 2.0 app, or maybe you've gone ahead and coded your own. Given the competition for uptake among new software tools, I'd be more interested in the former -- what new stuff can you do with what's already out there? However, my suspicion is that the contest will favor the latter -- new tools are more impressive than new applications of old ones.

But whichever way your heart lies, I'm delighted that Google Earth, NASA, ESRI and other sponsors are supporting this contest. And note, on the intellectual property issue: "Copyrights and ownership will remain with the author/creator; however, copyright permission to publish the entry and announce the winner's name will be retained by the ISDE5 Secretariat." So if you are building a tool, make sure it's open source, would you? The rest of the world will love you even more.

February 10, 2007

I'll Show You My Data If You Show Me Yours

Will the web 2.0-ness never end? Now we're visualizing shared data sets, with two new projects just launched that encourage users to upload their data sets and map them against each other. Why would you want to do that? Take a look at the graph below, which a user on one of the services, Swivel, created to show the relationship between global GDP and yearly average global temperatures. Interesting, no?

[Image: swivelgraph.gif -- Swivel graph of global GDP against yearly average global temperatures]

Data visualization may sound a bit complicated and off-putting, but it's all about making sets of information easier to grasp. Instead of looking at a bunch of tables and numbers, you look at a picture which depicts those tables and numbers. Some simple well-known examples of this are the beloved pie chart, the bar chart, and the x/y graph, although more intricate data visualization can involve graphics, colors, maps, and other design elements. Also sometimes known as "information design" by the dedicated followers of visualization kingpin Edward Tufte (among whom we at janethaven.com count ourselves, incidentally), the display of data in quick-to-understand graphics is a skill worth exploring. Better yet, good information design allows you to set apparently unrelated data sets against one another to tease out relationships that are not necessarily obvious in a table of figures set side-by-side.
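For readers who like to see the mechanics, here is a minimal sketch of what "setting two data sets against one another" can mean before any picture is drawn: line the two series up on the years they share, then measure how closely they move together. This is not how Swivel or any other service mentioned here works internally; the function names and the numbers are invented placeholders for illustration, not real measurements.

```python
from math import sqrt


def align_by_year(series_a, series_b):
    """Keep only the years present in both series, in ascending order."""
    shared_years = sorted(set(series_a) & set(series_b))
    values_a = [series_a[year] for year in shared_years]
    values_b = [series_b[year] for year in shared_years]
    return shared_years, values_a, values_b


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two lists of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return covariance / sqrt(spread_x * spread_y)


# Placeholder values only (hypothetical, not real GDP or temperature data).
series_a = {2000: 1.0, 2001: 2.0, 2002: 3.0, 2003: 4.0}
series_b = {2001: 10.0, 2002: 14.0, 2003: 18.0, 2004: 25.0}

years, a, b = align_by_year(series_a, series_b)
r = pearson(a, b)
print(years, round(r, 3))
```

A coefficient near 1 or -1 only says the two series rise and fall together over the shared years. It does not say that one causes the other, which is worth remembering when a striking graph appears.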

Data visualization tools have been on the web for some time now. From govcom.org's Issue Crawler to Hans Rosling's GapMinder to Google Labs' new Trends visualization project (here's one on searches on Republicans/Democrats) to Data360, which has been around for about a year, there are lots of tools out there to let you look at data in graphical format.

Love of visualized data sets, however, is clearly a growth business, if the launch of two web 2.0-style data sharing-and-visualization services, Swivel and Many Eyes, is any indication. Where Flickr encourages you to share your photos, and YouTube your videos, Swivel and Many Eyes both want you to share your data sets, and then visualize them. Swivel encourages you to mash up various data sets, while Many Eyes lets you work with one data set at a time, but with more options for visualization tools than Swivel currently offers. Both of them are very recent launches -- Swivel in early December 2006, and Many Eyes (a project of IBM's Collaborative User Experience research group) in January 2007.

Both projects also emphasize the social value of sharing data. Many Eyes explains:

Many Eyes is a bet on the power of human visual intelligence to find patterns. Our goal is to "democratize" visualization and to enable a new social kind of data analysis. Jump right to our visualizations now, take a tour, or read on for a leisurely explanation of the project.

All of us in CUE's Visual Communication Lab are passionate about the potential of data visualization to spark insight. It is that magical moment we live for: an unwieldy, unyielding data set is transformed into an image on the screen, and suddenly the user can perceive an unexpected pattern. As visualization designers we have witnessed and experienced many of those wondrous sparks. But in recent years, we have become acutely aware that the visualizations and the sparks they generate, take on new value in a social setting. Visualization is a catalyst for discussion and collective insight about data.

Great. Swivel is even hoping to make some money off their service, by keeping public data accounts free and charging a fee for private data accounts. Both services also encourage community and data-sharing across platforms: you can blog your visualizations with copy-and-paste HTML, and Swivel is even more hooked into the web 2.0-ness of it all with community features and automatic Google and Wikipedia search links. For more on the similarities/differences between the services, see the post on Tim O'Reilly's blog from a couple of weeks back.

The question all this activity around social visualization of data sets raises for me is whether people are seeing the information around them in more structured terms. To put it another way, I wonder if more people will come to these tools without their own data sets, play around with what's up there already, and go back to their own work with a new eye for what they might be able to extract usefully from the babble of information that surrounds us all -- or will these types of sites only appeal to people who are already data geeks, and who already see the world in terms of what data they can scrape, create or download from publicly available sources?

This is an important question in my work, as one of the problems we've been thinking about at the Civil Society Communications project is how to get non-profit organizations that often collect large amounts of data for advocacy purposes to think about visualizing that information rather than only collating it into a written report or a set of flat tables. The written report is important to establish a baseline set of facts and to look at trends in detail, but the information visualization piece, which is almost entirely missing from the work of most advocacy groups, particularly those working in the global south, can quickly catch the eye of new supporters and decision-makers alike. These types of organizations may not even see the information collection they are doing as generating data sets, and depending on how they go about it, they may miss that opportunity. If you think you are collecting information only for a written report, you might collect it, store it, and categorize it quite differently than if you are thinking of using it to tell a visual story.
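To make the difference concrete, here is a small sketch of what collecting information "as a data set" might look like: one row per observation, with the same named columns every time, so the same records can later be regrouped for a chart without retyping anything. The column names, districts, and amounts below are all hypothetical, invented for illustration only.

```python
import csv
import io
from collections import defaultdict

# Hypothetical monitoring records: one row per observation, consistent columns.
RAW = """district,year,sector,amount
North,2006,health,120
North,2006,education,200
South,2006,health,80
South,2006,education,150
"""


def totals_by(rows, key):
    """Sum the 'amount' column, grouped by whichever column is named in key."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += float(row["amount"])
    return dict(totals)


rows = list(csv.DictReader(io.StringIO(RAW)))

# The same records answer two different questions with no re-collection.
by_sector = totals_by(rows, "sector")
by_district = totals_by(rows, "district")
print(by_sector)
print(by_district)
```

Either set of totals could be pasted straight into a bar chart. The same figures buried in the paragraphs of a written report would have to be dug out and retyped by hand first.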

So my hope, when I look at these types of tools that "democratize visualization", is that they will not only fulfill their stated mission, but also help with education and inspiration among those who may not yet find themselves toe-tappingly excited when someone mentions "data sets" and "visualization" in the same breath.

Disclosure: My employer, the Open Society Institute, a private grant-making foundation, provided financial support to the development of the Issue Crawler software mentioned above.