I’ve been playing with Apache Hadoop lately to get my head around MapReduce and distributed file systems. It’s quite an interesting subject. Facebook, Google, Twitter and all the big players of the internet use these techniques to scale at an insane level. However, the issue we are finding is that they are closed systems. Email is built on open protocols such as SMTP and POP, so a Gmail user can send emails to Hotmail users and vice versa; most web systems, by contrast, are closed and don’t seem to work well with each other.
One of the other big problems is that machines don’t know that they are machines and that they can talk to other machines. Programmers still need to code custom things; machines aren’t able to program themselves and adapt their behaviour to make our lives more streamlined.
The worst part is that every application comes with a “Help” manual, but normal users can never figure out how to use it to solve their problem. Google does a far better job. Are our books and the way we write text manuals flawed, in that we don’t yet understand how to properly represent knowledge and compute on it? But even Google is naive. Google is a crawler and indexer. It can only give us a list of pages where the information exists, unlike Wikipedia or Stack Overflow, which allow the generation of massive amounts of organized data.
My gut feeling is that to make the semantic web work, we need to somehow understand how knowledge works, how we can efficiently represent insane amounts of data, and how to compute on it to make smarter decisions.
