Research Blog
Here you will find news and ideas from the web science research community.
The abbreviation LWS stands for Laboratory for Web Science.
12.05.11
We got our first CTI Project
We got our first CTI project! It will be carried out together with ETH Zurich. The LWS will implement a recommender for a specialized job-hunting platform. This recommender must not only propose high-quality jobs to the platform members, but also respect special constraints (profit maximization) for the platform users. This aspect has not been considered in a recommender so far.
We are looking forward to the challenges ahead!
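The entry does not say how job quality and the profit constraint will be combined. Purely as a hypothetical illustration, one simple option is to re-rank candidate jobs by a weighted score `(1 - alpha) * relevance + alpha * normalized_profit`. The names `Job`, `rerank` and `alpha`, and all numbers below, are assumptions for this sketch, not the design of the CTI project.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str
    relevance: float  # hypothetical match quality for the member, in [0, 1]
    profit: float     # hypothetical expected profit for the platform, any scale


def rerank(jobs, alpha=0.3, top_n=3):
    """Return the top_n job ids by a weighted mix of relevance and profit.

    Profit is divided by the largest profit so both terms lie in [0, 1].
    alpha=0 ranks purely by relevance, alpha=1 purely by profit.
    This is an illustrative trade-off, not the method used in the project.
    """
    max_profit = max(j.profit for j in jobs) or 1.0  # avoid division by zero

    def score(j):
        return (1 - alpha) * j.relevance + alpha * (j.profit / max_profit)

    ranked = sorted(jobs, key=score, reverse=True)
    return [j.job_id for j in ranked[:top_n]]


candidates = [
    Job("a", relevance=0.9, profit=0.0),
    Job("b", relevance=0.6, profit=100.0),
    Job("c", relevance=0.8, profit=40.0),
    Job("d", relevance=0.1, profit=90.0),
]
print(rerank(candidates, alpha=0.0))  # ['a', 'c', 'b']
print(rerank(candidates, alpha=0.5))  # ['b', 'c', 'd']
```

Raising `alpha` trades member relevance for platform profit; a real system would more likely enforce the constraint explicitly (e.g. a minimum relevance threshold) rather than rely on a single weight.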
09.03.11
Yahoo! KDD Cup
Yahoo! launched a competition in the area of recommender systems. We have formed a team with the Technical University of Graz. The tasks are quite challenging.
Check out:
07.03.11
Miscellaneous
We have been busy in the meantime. One CTI project proposal has reached the second round, and we are about to submit a second CTI proposal together with ETH Zurich.
We have started a collaboration with BASF Switzerland and developed a system that can forecast experimental results under certain boundary conditions.
We have also started the DODE project together with the Kantonsspital St. Gallen. Our prototype will be presented at the next Swiss conference on medical informatics.
18.06.10
DODE
We finally did it: we have sent our project proposal to the Hasler Foundation. Now we hope to receive the requested funding for our project.
Today I searched for 'Web Science' on google.ch: we are on the first page, in second position. Great!
06.04.10
IMCIC Conference Part I
Today the IMCIC conference in Orlando, Florida, started. In fact, it is not one conference but three different ones taking place these days. The first day's agenda was filled with administrative duties. That was fine, though: the trip to Orlando was quite exhausting, and the next days will be busy. The program is diverse, ranging from meta-engineering and web science to robotics. I also had the pleasure of talking to Eldar Sultanow. He is doing his PhD and works on several interesting projects. One of them is related to the Semantic Web: his group visualizes data streams generated by large companies on different levels (macro, meso, micro), combining diverse technologies to do so. He gave me a short demonstration and I was quite impressed. I will keep in touch with him, since we recognized considerable overlap between our research programs.
16.03.10
In many science fiction novels, such as Accelerando by Charles Stross, mankind is close to what Ray Kurzweil called the Singularity. At this stage, humans are able to 'upload' their thoughts to a collective, a network. This sounds quite futuristic. But hold on. Let's think about the World Wide Web. What is it? Just a bunch of loosely coupled documents? An unstructured information pool? By no means! The WWW is much more than that. Social networks, blogs, etc. contain the thoughts and feelings of so many people. You can find everything: joy, sadness and all sorts of tragedies. Everybody with a net connection can download those 'soul traces', comment on them and participate. Isn't this the first step towards what is described in so many novels? Isn't it the first step towards 'mind uploading'?
Contact and suggestions: Dr. Marcel Blattner
08.03.10
New team members
We are glad to welcome two new team members starting on the first of April: Dr. Beatrice Paoli and Dr. Fabio Mariotti. Both are trained in the natural sciences and have a strong computer science background.
08.02.10
Morphological classification of galaxies
As in biology and other natural sciences, astrophysics and cosmology have classification schemes for stars, galaxies and other objects. I naively believed that classifying the morphology of galaxies (E and S types, etc.) is done by state-of-the-art pattern recognition algorithms.
It seems that this is not the case. The Galaxy Zoo project, for example, instructs laypeople to classify galaxies according to some simple rules.
I looked for publications on morphological galaxy classification and was surprised by what I found: not much has been published on this topic. There are two possible explanations: a) there is not much to say about it because the problem is trivial, or b) not much research has been done in this area so far.
I favor b), given that the Galaxy Zoo project exists at all. I wonder how state-of-the-art algorithms would perform on this problem.
Contact and suggestions: Dr. Marcel Blattner
01.02.10
LWS goes to the IMCIC (International Multi-Conference on Complexity, Informatics and Cybernetics), Orlando, Florida, 6.4.10-9.4.10
Our paper "B-Rank: A top N Recommendation Algorithm" was accepted by the IMCIC, and we have a slot for a presentation. We are glad to have this opportunity.
Contact and suggestions: Dr. Marcel Blattner
08.01.10
Knowing your users inside out - mission impossible?
For several weeks I have been collaborating with Dr. Matus Medo (Physics Department, University of Fribourg) on a model. Its aim is to better understand the noisy data generated by recommender systems or, in other words, to better 'understand' users. Two mechanisms are at the core of the model:
- Self-inconsistency: it is known from experiments that users rate the same object differently at different times. So there is no 'true' rating, but rather a probability distribution over the rating scale for each user.
- Social pressure: users tend to align their opinions with the average opinion of their peers. G. Berns et al. demonstrated this impressively in several experiments. This social pressure pushes all opinions in one direction.
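To make the two mechanisms concrete, here is a hypothetical toy simulation; it is not the model developed with Dr. Medo. Each vote is a noisy draw around a target value (self-inconsistency), and the target is pulled towards the mean of earlier votes with a weight `social_weight` (social pressure). The Gaussian noise, the linear pull, the 1-5 scale and all parameter values are assumptions chosen for illustration.

```python
import random
import statistics


def simulate_votes(preferences, social_weight, noise_sd=0.8, seed=0):
    """Simulate sequential 1-5 star votes on a single object.

    Hypothetical toy model:
    - self-inconsistency: each vote is a noisy draw around a target value,
      so the same user could vote differently on another occasion;
    - social pressure: the target is pulled towards the mean of all earlier
      votes with weight social_weight (0 = independent, 1 = pure conformity).
    """
    rng = random.Random(seed)
    votes = []
    total = 0
    for i, pref in enumerate(preferences):
        if i == 0:
            target = pref  # the first voter sees no peer opinion
        else:
            peer_mean = total / i
            target = (1 - social_weight) * pref + social_weight * peer_mean
        # Add noise, round to a star value and clip to the 1-5 scale.
        vote = min(5, max(1, round(target + rng.gauss(0.0, noise_sd))))
        votes.append(vote)
        total += vote
    return votes


rng = random.Random(42)
prefs = [rng.uniform(1, 5) for _ in range(2000)]  # hypothetical user preferences
independent = simulate_votes(prefs, social_weight=0.0)
conformist = simulate_votes(prefs, social_weight=0.8)
print(statistics.pstdev(independent), statistics.pstdev(conformist))
```

With social pressure switched on, the spread of the votes should be noticeably smaller, which is the 'all opinions pushed in one direction' effect in miniature.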
We have first results. To see whether our model makes sense at all, we have to compare them with real-world data, and this seems to be the harder part. More precisely, we are looking for data with the following properties:
- Votes: users have voted for objects/information
- Users knew the other users' ratings
- Each data record has a timestamp
- The platform design (experiment) did not change much over a long period of time
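The timestamp and 'knowledge of other users' ratings' requirements matter because, to test social pressure, one has to reconstruct what each voter could have seen at the moment of voting. A minimal sketch, assuming a hypothetical record format (user, item, rating, timestamp) and assuming the platform displayed the running average rating of each item:

```python
from collections import defaultdict
from typing import NamedTuple


class Vote(NamedTuple):
    user: str
    item: str
    rating: int
    timestamp: int  # e.g. Unix time; every record needs one


def visible_averages(votes):
    """Pair each vote with the item's mean rating *before* that vote.

    Assumes (hypothetically) that users saw the running average of earlier
    ratings. Returns None for the first vote on an item.
    """
    sums = defaultdict(int)
    counts = defaultdict(int)
    result = []
    for v in sorted(votes, key=lambda v: v.timestamp):
        n = counts[v.item]
        result.append((v, sums[v.item] / n if n else None))
        sums[v.item] += v.rating
        counts[v.item] += 1
    return result


data = [
    Vote("u1", "x", 5, 100),
    Vote("u2", "x", 3, 200),
    Vote("u3", "x", 4, 300),
    Vote("u1", "y", 2, 150),
]
for vote, seen in visible_averages(data):
    print(vote.user, vote.item, vote.rating, seen)
```

Comparing each rating with the average visible at that time is one way to look for the pull towards the peer opinion; if the platform design changed during the recorded period, what users "saw" is no longer comparable across time, hence the last requirement.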
If you know of such data, or know somebody who owns such data, we would be very grateful for an e-mail.
Contact and suggestions: Dr. Marcel Blattner
15.12.09
Artificial intelligence reloaded
The field of artificial intelligence (AI) was founded 50 years ago. The goals were ambitious, too ambitious. It is widely accepted that AI has failed to deliver on many of its early promises. Some of the pioneers, together with new scientists, want to start a "do-over", and this time they want to get it right.
They launched a new project, the Mind Machine Project (MMP), with a budget of 5 million dollars over five years. The researchers are determined to revisit fundamental assumptions in all areas encompassed by the field of AI, including the nature of mind, memory and body. Gershenfeld, a professor of media arts and sciences, says: "Essentially, we want to rewind to 30 years ago, and revisit some ideas that had gotten frozen".
The researchers will focus on the following areas:
1. Mind: how do you model thought?
What has been missing so far is the ability to solve a problem in several different ways. But that is how the human mind works.
2. Memory:
Because we do not reason with precise truths, computers need to learn ways of reasoning that work with, rather than avoid, ambiguity and inconsistency.
3. Body:
Computer science and physical science diverged decades ago. Computers work by processing sequences of code, but the mind does not work that way: in the mind, everything happens all the time. A new programming approach called RALA (reconfigurable asynchronous logic automata) attempts to re-implement all of computer science on a base that looks like physics.
Reading the goals of the project, one is left thinking: let's wait another 50 years.
Contact and suggestions: Dr. Marcel Blattner
30.11.09
Search engines are hard workers. They are our information retrieval slaves. Are we satisfied with them? Do they do a good job for us? 'Yes' would be the most common answer, I guess. But what does 'most' mean here? Does it mean 99%, 80% or 50.0003%? Answering this question requires measuring user satisfaction, and we all know that this is tough. Because of this, no one is in a position to give definitive answers.
However, some studies give evidence that 30%-60% of users are rather frustrated and therefore NOT satisfied. Before giving some reasons, let me quote some interesting figures:
Between 67% and 78% of users satisfy their information needs with two queries or fewer. Between 60% and 85% of users do not go beyond the first result page. 66% of all users review fewer than 5 results. Jansen and Spink conclude that users' queries are generally not complex and that the first results are mostly relevant to the information needs. Furthermore, 50% of the results are relevant to the users.
But this also means that, on average, 50% of the results provided are not relevant. This figure increases further when users submit more complex queries. Hawking et al. reproduced this result in a slightly different setup.
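The "50% of the results are relevant" figure is, roughly speaking, a precision value: the fraction of the results shown that are judged relevant. A minimal sketch with made-up document IDs and relevance judgments (both hypothetical):

```python
def precision_at_k(results, relevant, k):
    """Fraction of the top-k results that are relevant (precision@k).

    Divides by the number of results actually returned if there are fewer than k.
    """
    top = results[:k]
    if not top:
        return 0.0
    return sum(1 for r in top if r in relevant) / len(top)


results = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"]
relevant = {"d1", "d2", "d4", "d7", "d9"}  # hypothetical relevance judgments
print(precision_at_k(results, relevant, 10))  # 0.5
print(precision_at_k(results, relevant, 5))   # 0.6
```

Since most users look at only the first few results, precision at small k is what they actually experience.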
According to another study, the average user spends 3 minutes on a search session, and most users are dissatisfied when a session lasts longer than 3 minutes. All in all, there is evidence that users are not entirely satisfied.
OK, now for the why: just a few reasons.
First of all, the basic mechanism of search engines like Google follows a 'wisdom of the crowds' approach (roughly speaking, pages that many other pages link to are ranked higher), and the 'crowd' consists mainly of web administrators, bloggers and companies in general. The top results are therefore a kind of consensus. From this point of view, it is clear that search engines cannot satisfy everybody's needs.
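The best-known published example of such link-based 'crowd voting' is PageRank, in which a link counts as a vote weighted by the rank of the linking page. The sketch below is a minimal power-iteration version on a made-up four-page graph, with the commonly used damping factor 0.85; it illustrates the consensus idea and is not Google's production ranking.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank on a dict {page: [pages it links to]}.

    Pages without outgoing links spread their score uniformly over all pages.
    """
    pages = sorted(set(links) | {p for targets in links.values() for p in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:  # dangling page: distribute its score uniformly
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank


web = {  # hypothetical link graph
    "blog": ["shop", "news"],
    "news": ["shop"],
    "forum": ["shop", "blog"],
    "shop": ["news"],
}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

The page most others link to ("shop") ends up on top regardless of what an individual searcher wants, which is the consensus effect described above.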
Second, if you search for something you can buy online, Google's top results are heavily skewed towards online stores. Try it with 'flowers', 'cd', ... you name it. To get specialized information, you have to dig deep.
Third, many 'concepts' in our language are ambiguous: apple (fruit or computer), jaguar (animal, car, operating system), and there are many more examples. At present, search engines cannot 'know' exactly what you have in mind when you submit your query.
Of course, there are more reasons. I guess the problems will become worse in the future, and we should start thinking about new concepts: we need systems that are able to understand our individual needs. The research community is tackling these problems from diverse directions, which is good. What are your experiences with search engines?
References:
[1] B. J. Jansen and A. Spink, "An Analysis of Web Documents Retrieved and Viewed", Proceedings of the 4th International Conference on Internet Computing, 2003, pp. 65-69.
[2] B. J. Jansen, A. Spink, J. Bateman and T. Saracevic, "Real life information retrieval: A study of user queries on the Web", SIGIR Forum, vol. 32, no. 1, 1998, pp. 5-17.
[3] D. Hawking, N. Craswell, P. Bailey and K. Griffiths, "Measuring search engine quality", Information Retrieval, vol. 4, 2001, pp. 33-59.
[4] C. Silverstein, M. Henzinger, H. Marais and M. Moricz, "Analysis of a Very Large AltaVista Log", Technical Report, Digital Systems Research Center, 1998.
16.11.09
A few days ago, the ACM Recommender Systems Conference 2009 ended. Some of the most renowned researchers in the field were present, among them John Riedl, who gave a talk on the research challenges for recommender systems. Here is a summary:
- How can one be sure a recommender system makes sense from a strategic point of view?
Remark LWS: One rule of thumb: if there is enough object diversity in a portfolio and diverse users access the recommender, then there is evidence that a recommender adds real value for users.
- The 'cold-start' problem is overestimated.
Remark LWS: The cold-start problem refers to the situation where a user or an object enters the system for the first time. In such a situation, there are not enough statistics to make good predictions. Until now, this has been recognized as the main problem of recommenders. Riedl's answer to it: "just be creative". However, it is not clear what this means. A final answer can only be given by real experiments and user feedback measurements.
- Find a good balance between data and algorithm.
Remark LWS: As I pointed out in one of my publications, sophisticated algorithms only make sense in particular situations (data topology).
- How, when and what.
Remark LWS: This is indeed very important. It is known that the way a system presents its recommendations has a very high impact on the success of a recommender system. However, I doubt there is a universal concept for this question: the how, when and what will always be context dependent.
- Choose the best feedback mechanism.
Remark LWS: That is also an important point. One example is the rating scale presented to users (e.g. a five-star scale, a binary scale, etc.). Does it make a big difference whether a recommender system predicts an item as 4.8 or 4.82 on a five-star rating scale? Very often a binary scale is enough. Again, this issue is context dependent and has to be checked carefully for each recommender system.
- Measure everything.
Remark LWS: I doubt it. As we know, ratings and other user behavior are very noisy. I would rather say: learn to measure the relevant things!
Contact and suggestions: Dr. Marcel Blattner
4.11.09
The low-hanging fruits are gone! As outlined in my last blog entry, research on recommender systems has to be about more than just increasing prediction accuracy. How to proceed? To go beyond the current stage, we have to develop models of voting behavior. Such a model could give insights into which 'boundary conditions' lead to a particular data topology, and with this knowledge better recommendation strategies become possible. To build such a model, we face the following questions: how self-consistent is human voting, and how strong is the influence of peers? Clearly this is a tricky task, and knowledge from other fields has to flow in. As I said: the low-hanging fruits are gone!
Contact and suggestions: Dr. Marcel Blattner
23.10.09
Recommender systems are the future tools for finding and analyzing preference-dependent information. Over the last decade, researchers mainly focused on prediction accuracy. But the last two years have shown that we have to go beyond such measurements. The usefulness of a recommender system depends not only on prediction accuracy, but also on the diversity of the recommended items and on the way items are presented to users. The Laboratory for Web Science has developed an algorithm that achieves both ranking accuracy and diversity. The algorithm is very promising, since it performs well on both sparse and dense data. Check it out: B-Rank.
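The entry does not say how B-Rank's diversity is measured. One measure used in the recommender literature is the mean pairwise Hamming distance between users' top-N lists, h = 1 - q/N, where q is the number of items two lists share. The sketch below computes this evaluation metric on hypothetical lists; it is not B-Rank itself, and it assumes all lists have the same length N.

```python
from itertools import combinations


def mean_hamming_diversity(top_lists):
    """Mean pairwise Hamming distance between users' top-N lists.

    For two lists of length N sharing q items, h = 1 - q / N:
    0 means identical lists, 1 means no overlap at all.
    Assumes every list has the same length N.
    """
    pairs = list(combinations(top_lists, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        overlap = len(set(a) & set(b))
        total += 1 - overlap / len(a)
    return total / len(pairs)


same_for_all = [["i1", "i2", "i3"]] * 3  # everyone gets the same list
personalized = [["i1", "i2", "i3"], ["i4", "i5", "i6"], ["i1", "i7", "i8"]]
print(mean_hamming_diversity(same_for_all))  # 0.0
print(mean_hamming_diversity(personalized))
```

A recommender that shows everyone the same popular items scores 0 here even if its accuracy is high, which is why accuracy alone is not enough.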
Contact and suggestions: Dr. Marcel Blattner



