Data Visualization of Bengfort.com Keywords
See the live visualization and play with the Spring-Graph structure at: http://www.bengfort.com/keywords/
I’m constantly amazed at how people can manipulate data and statistics any way they see fit to make their own point. I don’t know about you, but whenever someone gives me the “numbers” I’m very skeptical of where they came from. Just consider the fodder from our so-called major news networks that Jon Stewart gets to make fun of! I think I’m right in saying that people are all too willing to believe “numbers” just because they look science-y or come with a pretty bar graph. Even simple inspection will reveal flaws: percentages that don’t add up to 100, or whose sum far exceeds 100; graphs that use highlighting and weighted fonts that don’t necessarily reflect the distribution; or the simple omission of keys (legends) that would prove the opposite point.
That’s why the science (and art) of data visualization so appeals to me. We have learned bar graphs, line graphs, and pie charts since we were in elementary school, but these are the tools that are so often used to mislead us, simply because they are too simple to hold the complex data that we are now used to analyzing on a daily basis. Data visualization attempts to take complex data sets and represent them graphically so that humans can instantly comprehend their meaning. Visual cues including size, color, shape, and difference are all used to represent some form of data. With the growth of web technologies and web databases, an ever-increasing number of amazing and interesting data visualizations have appeared, and I believe that elementary school kids will soon be taught even more complex data structures.
So, when I got a tweet from Sitepoint.com about building a keyword visualizer with Flex, I knew that this would be perfect for our website. I read the article and built a version of what they used for our own site! (Note that at this point I’m still awaiting the third part of this three-part series; after that I will continue to make my own customizations, so stay tuned for more updates to the visualization!) Essentially, a script goes through our blog database and picks out keywords in all the posts. Keywords that appear in the same post are considered linked. For instance, by writing science and technology together in this post, those two words will now have a link between them. In addition, the script counts the occurrences of each keyword as well as the occurrences of each link. (If you’re keeping score, this is a server-side PHP script that outputs the results in JSON format.)
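To make the counting step concrete, here is a minimal sketch of the idea in Python rather than the PHP actually running on the site. The keyword list, the sample posts, the function name build_keyword_graph, and the JSON field names are all made up for illustration, and I am assuming that a keyword is counted once per occurrence while a link is counted once per post in which both keywords appear.

```python
import json
from collections import Counter
from itertools import combinations

# Hypothetical keyword list; the real script reads posts from the blog database.
KEYWORDS = {"science", "technology", "guyana", "recipes"}


def build_keyword_graph(posts, keywords):
    """Count keyword occurrences and same-post links across all posts."""
    node_counts = Counter()
    link_counts = Counter()
    for text in posts:
        # Lowercase and strip simple punctuation so "Science," matches "science".
        words = [w.strip(".,!?;:()\"'").lower() for w in text.split()]
        found = Counter(w for w in words if w in keywords)
        node_counts.update(found)
        # Each unordered pair of distinct keywords in one post adds one to that link.
        for a, b in combinations(sorted(found), 2):
            link_counts[(a, b)] += 1
    return {
        "nodes": [{"name": k, "count": c} for k, c in sorted(node_counts.items())],
        "links": [
            {"source": a, "target": b, "count": c}
            for (a, b), c in sorted(link_counts.items())
        ],
    }


# Made-up sample posts, only to exercise the function.
posts = [
    "Science and technology go together. Science is fun.",
    "Recipes from Guyana!",
    "Technology and science again.",
]
graph = build_keyword_graph(posts, KEYWORDS)
print(json.dumps(graph, indent=2))
```

With these three sample posts, science and technology end up linked with a count of 2, and guyana and recipes with a count of 1. The printed JSON is the kind of file a front end could load to draw the nodes and links.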
The visualization is handled by Adobe’s Flex framework combined with the SpringGraph API. The more often a keyword appears, the larger its node will be; likewise, the higher the count of links between two keywords, the thicker the link between them will be. Distance is also a factor: the larger keywords sit on the outside, with the lesser keywords on the inside, and they “repulse” each other according to the strength of their links. Now, by simple inspection we can see that Guyana, linked with Recipes and Cookbook, is by far the largest part of our website. Benjamin is connected to China and Ballet (don’t know why), while cat and dog are so closely connected that they are almost touching! You can see how this basically provides a topical analysis of our blog!
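The "bigger count, bigger node" idea boils down to mapping counts onto a range of pixel sizes. The sketch below shows one way to do that with a simple linear scale, written in Python. I don't know how SpringGraph sizes things internally, so the scale function, the pixel ranges, and the counts here are assumptions for illustration only.

```python
def scale(value, lo, hi, out_min, out_max):
    """Linearly map value from the range [lo, hi] onto [out_min, out_max]."""
    if hi == lo:
        # All counts are equal, so use the middle of the output range.
        return (out_min + out_max) / 2
    return out_min + (value - lo) * (out_max - out_min) / (hi - lo)


# Made-up counts; the real numbers come from the keyword script's output.
node_counts = {"guyana": 40, "recipes": 25, "cat": 5}
link_counts = {("guyana", "recipes"): 20, ("cat", "dog"): 2}

# Node radius between 10 and 50 pixels (assumed range).
n_lo, n_hi = min(node_counts.values()), max(node_counts.values())
node_radius = {k: scale(c, n_lo, n_hi, 10, 50) for k, c in node_counts.items()}

# Link width between 1 and 8 pixels (assumed range).
l_lo, l_hi = min(link_counts.values()), max(link_counts.values())
link_width = {k: scale(c, l_lo, l_hi, 1, 8) for k, c in link_counts.items()}

print(node_radius)
print(link_width)
```

The most frequent keyword gets the largest radius and the rarest gets the smallest, with everything else falling proportionally in between. A logarithmic scale would be a reasonable alternative if one keyword dwarfs all the others.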
I know you guys may not find this particularly interesting, but I hope you can grasp how much data has been distilled into an easily viewable graph: we have over 600 posts in our blog, each with about 700 words, all reduced to one easily comprehensible visual. As our blog changes, so will the graph. I think that in all our fields (International Relations, Political Science, Anthropology, Business, International Education, and Computer Science) this is extremely relevant, and I hope that you guys will make use of the tools that I have shown you. (Speaking of those fields, I should probably add them as keywords!) If you are interested in doing any sort of complex visualization, trust me, I’m your guy to develop an application that will do it for you!

When I took Statistics in college, my professor said the real reason he taught that class was to show how statistics could be skewed. It is one of the few things I remember from that class.
Sue
Agreed, I believe my professor said something similar. There is a problem with statistics that goes beyond simple misleading visualizations: there is a disconnect between the analysis of what statistics mean and the math that calculates correlation and causation via probability. 95% certainty, while mathematically relevant, is not necessarily sound enough for life-or-death decisions!
90% of all statistics are made up anyway