Student Comments – November 12, 2008

 

Haoqi Zhang

 

Knowledge Sharing and Yahoo Answers

 

The main contribution of the paper is characterizing and analyzing the forms of peer production that occur in Yahoo Answers. Since the system is quite diverse, ranging from Q&A on homework questions to discussion of topics without a necessarily right answer, I found the authors' graphical analysis of ask/response relationships among different users and categories interesting. In technical categories, for example, we saw that users tend to stay within their own categories.

 

From reading the paper and playing with the system, I've found that top answers and points seem to be a strong driver of the system, encouraging participation and lengthy responses; indeed, top-answer responses tend to be lengthy. However, this doesn't mean that the answers are 'good'. For example, an answer to a math question that gives the steps toward the answer may be rated a top answer, yet may not explain the underlying concepts that lead to the solution. Whether the questions are shallow or not, it could be that the system's incentive structure is promoting responses that address only the surface features of a problem without clarifying the underlying misconceptions.

 

There also seems to be a lot of 'wasted effort' on Yahoo Answers, in that many questions are answered very quickly and many of the responses are not very useful. Mechanisms for quickly deterring users from answering questions that already have good answers (e.g., point values that decay with time since the first answer) may be interesting to explore.
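A minimal sketch of one such decaying-point scheme (the base points, decay shape, and half-life are hypothetical parameters of my own, not anything from the paper or from Yahoo Answers):

    def answer_points(seconds_since_first_answer, base_points=10.0, half_life=600.0):
        """Points offered for a new answer, decaying with time since the first answer.

        base_points: points for answering a question that has no answers yet.
        half_life:   seconds after the first answer at which the reward halves.
        """
        if seconds_since_first_answer <= 0:
            return base_points
        return base_points * 0.5 ** (seconds_since_first_answer / half_life)

    # An answer posted 20 minutes after the first one earns a quarter of the base points.
    print(answer_points(1200))  # 2.5

Under such a scheme a latecomer whose answer merely duplicates an existing good one has little to gain, while a question with no answers still pays full points.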

 

Also, someone thumbed down one of my answers. Could this be the person who answered after me? It is unfortunate if this is a form of manipulation (and it should be stopped!). Would a clearer sense of how answers should be ranked change the resulting interaction, e.g., best answer vs. best explainer?

 

Crowdsourcing and Knowledge Sharing

 

The main contribution of this paper is in providing a systematic analysis and understanding of how users behave on a crowdsourcing website. In particular, the authors show that winners are encouraged to continue, often win early, and learn more quickly to approach tasks with less competition to maximize their potential earnings. These conclusions are significant for the design of the website and its incentive structure: if a core group of users is winning most of the tasks, how can we keep these users, and more so, what should we do with the rest of the users? Part of the question here is exactly what to make of the efforts of users who do not win -- on one hand they drive competition, but on the other hand they may be wasting their own time, and many realize it quickly and quit the site. It would be interesting to see how the site would work in a staged setting, where the first stage is a small project along the lines of a test or interview, and the winner is chosen to participate in the task full time with larger compensation. Perhaps this would lead to a better distribution of people applying their knowledge to the tasks they are good at.

 

I didn't quite understand what the authors were trying to say about submitting later. In particular, the authors seem to suggest that submitting later causes more success rather than merely correlating with it -- it may be that the more skilled users who win are also more likely to procrastinate or to work better under deadline pressure.

 

Subhash Arja

 

Paper A

 

This paper looks at one specific Witkey site, Taskcn.com, which is based in China. The main study of the paper revolves around finding the strategy that "winning" users employ to obtain the highest success rate, and the general trends that emerge from observing the number of submissions of participants and their success rates. The authors conclude that winners tend to submit their solutions much later in the time period allotted for the task, and that they choose tasks with few submissions to maximize their chances of winning. One result I found particularly interesting was that the size of the competition was a better predictor of the winner than the expertise of the user. This result seems to undermine the purpose of the site, since the person with the highest expertise should submit the best solution. I also thought the concept that "a successful start leads to more future success" was somewhat similar to a previous paper we read about the "momentum" of users in Digg and Wikipedia. In this paper, the authors state that the earlier a user gets a successful submission, the more he will continue to submit. Similarly, in Digg and Wikipedia, it was found that users who built momentum through consecutive submissions were more likely to continue submitting in the near future.

 

Paper B

 

This paper studies the Yahoo! Answers system to discover various trends and correlations regarding the entropy of users' interests and whether having a high success rate necessarily guarantees future success. The study is important for the designers of Yahoo! Answers, as well as for developers seeking to design and implement similar information aggregation schemes, since the ultimate goal is to have many people participate and to have people with expertise contribute heavily in their specific areas. There are many websites like Yahoo! Answers on the internet, and this study could be replicated on those as well. This would provide interesting comparisons and contrasts that may depend on the participants or on the way each system is designed. One unexpected result from the study was that content value was used in selecting the best response only 17% of the time, while socio-emotional value was used 33% of the time. This reveals a flaw in the implementation and incentive design of the system, since the goal should be to select answers that are the most accurate. Attaching a rating to each user based on how many of that user's answers were chosen as the best response would change this percentage, in my opinion. This would be similar to the eBay and Amazon selling model, since customers always prefer a highly rated seller. Another unexpected result was that having a high success rate in the past gives no guarantee that the user's response will be chosen as the best answer. This also seems to be a flaw in the system.

 

Ziyad Aljarboua

 

This paper examines the behavior of Taskcn users and their submission trends. It focuses on the value of the award vs. the level of effort, the factors behind winner selection, and an analysis of successful winners.

 

 

The first goal of this paper was to establish a relationship between user participation in a task and task properties. It was found that the skills required and the amount of the award, while proportional to each other, are negatively correlated with workload. This is a subjective analysis that mainly reflects the judgments of the raters who evaluated the tasks to determine the level of work; the conclusion might be different under different evaluators. It was also shown that winning incentivizes users to continue their contributions. This is clear from the fact that the vast majority of users stop participating after only a few contributions and that the probability of winning is slim. The timing of submission was shown to vary from one user to another; the paper suggests that returning users tend to submit their solutions near the end of the submission period.

 

It was shown in this paper that selecting a less popular task increases the user's chances of winning. However, the definition of popularity here is unclear. A task could be unpopular because the requester offers a small award for a relatively hard task, since tasks with a high award attract a high number of visits. In this case, the user increases his/her chances by selecting this task but at the same time reduces his/her utility. This represents a conflict between the tendency of users to select tasks with higher winning odds and user return.

 

 

The most interesting finding of this paper is that winners tend to improve their chances of winning (the number of unsuccessful submissions between successful ones decreases with time) while the rest of the users manage to worsen their chances! While this is not a surprising finding, it is very interesting to observe in such an online community.

 

The fact that 89% of the registered users never attempted any tasks suggests that the model that Witkey websites follow does not support growth. With only 11% of registered users participating in solving the tasks, the chances of winning are already slim. Assuming that these websites continue to grow and larger percentages of registered users participate, the probability of winning will go down. This will discourage more and more users from participating. So, clearly, from my point of view, this is not a model that supports growth.

 

I think it would be very interesting to compare these findings with findings from similar websites that offer similar services but no reward for solved tasks. I would assume that these trends will differ greatly, especially for tasks with a heavy workload.

 

--------------------------------------------------------------

 

 

This paper examines Yahoo Answers' knowledge-sharing activity. It studies trends such as post and thread length and overlap ratios for different categories.

 

I find the thread length vs. post length plot to be very interesting. I wonder how the researchers arrived at this plot. Did they employ some sort of automated program that scans YA webpages, or did they sample pages from categories and evaluate them personally? If it is the latter, then this raises questions about the accuracy of their results.

 

This paper confirms many expected results, such as the number of responses for different categories. It also reveals some interesting findings about the nature of YA. One would expect that the main users of YA are people who are looking for answers, so it would seem reasonable to expect that the majority of YA users (those who submit questions) are either first-time users or very inactive ones.

 

While Yahoo Answers succeeds in benefiting from the collective knowledge of its users, this knowledge-sharing model still has some serious drawbacks. Most important is the fact that answers can come from anyone, regardless of his/her expertise. While this paper finds that users tend to answer questions in certain categories that are often related (e.g., Computers & Internet and Consumer Electronics), this is not guaranteed for YA users in general. Such concern is reinforced by the metrics used for selecting the best answer, such as reply length. Also, this model of knowledge sharing is not immune to vandalism or manipulation. For example, a question that asks users for their favorite product or best actress, as in this paper, could be manipulated in favor of one option over another.

 

Peter Blair

 

Crowdsourcing and Knowledge Sharing: Strategic User Behavior on Taskcn

 

The authors study trends on Taskcn, a leading Witkey site ("a site [where] users offer a small award for a solution to a task, and other users compete to have their solution selected"). In particular, the paper endeavors to address three questions: (i) are task prices commensurate with the expertise and effort required? (ii) do users' chances improve with more experience? (iii) what are the characteristics of winners and losers? In response to question (i), we get the expected result that compensation and skill requirement are positively correlated, but a counterintuitive result with respect to workload, which correlates negatively (there are many tedious low-reward tasks and some high-reward concise tasks). To make this argument more solid, it would have been beneficial to cite the mean and variance of the payoff for the 157 tasks represented, in addition to examining the correlation for fields other than design, which is especially prone to high-payoff tasks that are "concise". Moreover, one wonders whether the use of just two specialists to rate the difficulty of the tasks was the best choice, since those attempting the tasks in real life will be a combination of experts and non-experts -- perhaps even more non-experts than experts. Not surprisingly, the answer to (ii) is that players learn to play the game with more experience, choosing to submit responses later over time and choosing tasks that they are more likely to win. In response to the third question, winners are those whose probability of winning increases with the number of attempts, while losers experience the contrary fate.

My suspicion is that this effect will discourage participation over time, which would be unfortunate for Taskcn. One way to counter this trend is to give players a ranking based on the number of submissions they have made and the number of times they have won. In particular, if a lower-ranked player beats out a higher-ranked answerer, he/she would get additional rating points. Such a system provides an incentive for top-ranked players to maintain their ranking while providing a mechanism for lower-ranked answerers to climb the rankings despite previously failed attempts. Overall, the requester also benefits from increased competition, which may also have the effect of weeding out low-quality submitters who know that they truly have no chance of winning. (If not tuned properly, this may take us back to square one, with some potential answerers discouraged from participating.) The paper states that users gravitate over time toward areas where there is low competition; this led me to question whether there is a correlation between the compensation and the popularity of tasks, as well as between the difficulty and the popularity of tasks. These may also be areas of future research.
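A rough sketch of the kind of rating update this suggests (an Elo-style rule; the K-factor, scale, and starting ratings are arbitrary illustrative choices, not anything from the paper or from Taskcn):

    def expected_win(rating_a, rating_b, scale=400.0):
        """Expected probability that A beats B under a logistic (Elo-style) model."""
        return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))

    def update_ratings(winner, loser, k=32.0):
        """Return new (winner, loser) ratings; an upset by a lower-rated winner moves more points."""
        delta = k * (1.0 - expected_win(winner, loser))
        return winner + delta, loser - delta

    # A low-rated answerer beating a top-rated one gains ~29 points; the favorite winning gains ~3.
    print(update_ratings(1200, 1600))
    print(update_ratings(1600, 1200))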

 

 

Knowledge Sharing and Yahoo Answers: Everyone Knows Something

 

In the second paper, "Knowledge Sharing and Yahoo Answers," the authors study information exchange in the various forums of Yahoo Answers. In particular, the paper focuses on what characteristics differentiate forums of different types, which moreover require or define expertise according to a range of criteria. In computer and electronics forums there is a well-defined notion of an expert as someone having specialized knowledge, e.g., in computer programming; relationship forums, on the other hand, require nothing beyond basic life experience, which excludes far fewer potential answerers than other forums. Forums were characterized by the number of replies and also the length of replies. It was found that there were typically fewer replies to technical topics than to non-technical topics. Conversely, however, the answers to technical questions were on average longer. Using the ego analysis of Welser, the authors show that technical forums tend to be a lot less connected -- i.e., they are divided into two almost distinct camps of askers and responders. The more general topics tend to have networks that are more connected. This result is not surprising given the difference in the relative importance of expertise in technical versus non-technical fields. An interesting area of research would be open source software development, where users are more likely to be experts who view each other as peers. The expectation here is that although such a forum would be technical, since it consists of a group of peer experts it should exhibit more connectivity. Online academic publications should exhibit similar behavior, e.g., arxiv.org, with experts citing each other's papers. There were a few interesting coincidences between this article and the previous one: the quality of users' answers is not significantly increased by specialization, and users who allocate time to popular topics have a lower chance of being awarded the "best answer". Given that Yahoo Answers has no financial incentive for participants whereas Taskcn has a financial incentive mechanism, the fact that they agree on these two central findings tells us something about agents' behavior in a context where a social good (knowledge) is being created/shared.

 

Nikhil Srivastava

 

The Yang et al. paper presents an empirical study of taskcn.com, an online marketplace for crowdsourcing tasks. The primary result of their data is a distinction between expert users, who learn to operate the system such that their chances of winning rewards increase over time, and other users, who rarely win and most often worsen their chances of winning over time. This is accomplished by a tendency of expert users to choose tasks with fewer competitors, higher payoffs, and lower workload. I was impressed by the quality of the data they presented and the degree to which it supported their (albeit largely intuitive) results.

 

Their conclusions were especially interesting: "it will be necessary to incentivize this core group of winners in order to maintain their continued presence on the site". To me, this seems similar to the prediction market situation in which a large subset of "regular" users effectively subsidizes a group of "experts" who really run the show. In this case, a large group of users who have not learned optimal strategic behavior allows expert users to win most of the time. I think there's a tradeoff between incentivizing expert workers and educating non-expert workers. In the latter case, we might encourage more people who have answers to actually submit, knowing their chances of winning are equal to anyone else's, but we might also discourage former experts from playing at all.

 

The Adamic et al paper was a similar study performed on Yahoo Answers, a similar marketplace of information except without reward. Data was presented regarding the characteristics of different categories, the length of threads and replies, ego networks of discussion versus Q&A categories, and answer diversification versus specialization.

 

Brett Harrison

 

Knowledge Sharing and Yahoo Answers: Everyone Knows Something

By Adamic et. al.

 

Crowdsourcing and Knowledge Sharing: Strategic User Behavior on Taskcn

By Yang, Adamic, and Ackerman

 

These two papers attempt to apply some formal analysis to knowledge-sharing websites. The first paper looks at Yahoo Answers, a simple concept in which any user can post a question on any topic, soliciting responses from other users. The user who submits the best response is awarded points. The second paper analyzes Taskcn, a Chinese Witkey site, where users can post problems along with monetary awards for their solutions, and other users can subsequently offer their own solutions.

 

The first paper gives interesting results about the types of networks formed by users in different clusters of categories, where the clusters are derived with a k-means algorithm. In particular, empirical results are offered describing the length of, frequency of, and number of responses to posts in each cluster, in addition to describing the overlap between question-askers and question-answerers in that cluster. The second paper gives equally intriguing results about the correlations between the type/award of a task and the number/quality of solutions. The paper also tries to characterize the usual winners and track their behavior as they win more tasks.
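A toy sketch of that clustering step (the three features -- average replies per thread, average reply length, and asker/replier overlap -- follow my reading of the paper, but the numbers and category groupings below are invented for illustration):

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    # One row per category: [avg replies per thread, avg reply length (words), asker/replier overlap]
    features = np.array([
        [12.0,  25.0, 0.70],   # discussion-like categories
        [10.0,  30.0, 0.65],
        [ 6.0,  35.0, 0.40],   # advice-seeking categories
        [ 5.0,  30.0, 0.45],
        [ 2.5, 120.0, 0.10],   # factual/technical categories
        [ 3.0, 110.0, 0.15],
    ])

    X = StandardScaler().fit_transform(features)           # put features on a common scale
    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
    print(kmeans.labels_)            # cluster assignment for each category
    print(kmeans.cluster_centers_)   # centroids in standardized feature space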

 

The results in the Taskcn paper about the "quality" of the submissions are a bit troubling, since the data was obtained by taking only two "professional" reviewers and having them give their highly subjective opinions. In order to use human opinions as data in this manner, they should have obtained data from many more reviewers to account for biases.

 

In order to solicit a higher quantity of and higher quality submissions, I wonder what would happen if Taskcn subsidized the projects but took correspondingly higher cuts to still make a profit. This curiosity is inspired by the results we studied for LMSR mechanisms, where the market maker subsidizes the market so that traders can earn positive expected returns.
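For reference, the standard LMSR cost function and price (the textbook formulation from class, not anything in either paper); the subsidy is the market maker's worst-case loss, which is bounded for n outcomes:

    C(\mathbf{q}) = b \log \sum_{i=1}^{n} e^{q_i / b}, \qquad
    p_i(\mathbf{q}) = \frac{e^{q_i / b}}{\sum_{j=1}^{n} e^{q_j / b}}, \qquad
    \text{worst-case loss} \le b \log n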

 

Xiaolu Yu

 

Knowledge Sharing and Yahoo Answers: Everyone Knows Something

 

Crowdsourcing and Knowledge Sharing: Strategic User Behavior on Taskcn

 

In recent years the online venues for creating and exchanging knowledge have greatly expanded. The knowledge-creating and -sharing venues took an interesting turn with the newly created general-purpose, open-to-the-public question-answer sites. What is unique about these QA sites is that they do not limit themselves to one or a few focused topics, as typical newsgroups and online discussion forums do. Rather, they readily invite people with varying interests in different topics, creating an extremely large repository of "everything one wants to know". These sites are run by participating users' asking and answering activities, with some moderation features and incentive structures.

 

Yahoo! Answers is a place for people to ask and answer questions in order to share knowledge. Unlike a chat room, Yahoo! Answers is not a place for people to ask each other how they're doing or "talk" when they're bored.  Unlike a message board, where a thread of messages can continue indefinitely without ever getting resolved, the questions in Yahoo! Answers are resolved when a best answer is picked. It is the largest knowledge-sharing community on the Web, where anyone can ask and answer questions on any topic. By connecting people to the information they're seeking with those who know it, it provides a way for people to share their experience and insight.

 

To encourage participation and reward great answers, Yahoo! Answers has a system of points and levels. The number of points you get depends on the specific action you take. The points table summarizes the point values for different actions. While you can't use points to buy or redeem anything, they do allow everyone to recognize how active and helpful you've been.

 

Every category rates the top ten best answerers based on the number of their best answers (chosen by askers or voters). People can either search for questions/answers they are interested in, or actually ask a question.

 

Categories are clustered into three different groups. The first is a discussion-forum-like group with many replies of moderate length. The second is an advice-seeking group with many short replies. These categories favoring non-factual answers tend to have longer threads and a broader distribution of activity levels, and their users tend to participate by both posing and replying to questions. The third is a factual-answer group with fewer, but lengthier, replies. In this group users did not occupy both the helper and asker roles in the same forum, in contrast with the discussion-forum group, where there is significant overlap in asking/replying activity.

 

Reading this paper, I have been thinking about the following questions. 1. How do we identify experts? 2. How do we decide whether an answer is worthwhile? Given that we cannot easily identify the expertise of a user, for one we may try to tell whether an answer is worthwhile, and for another, if we can identify the expertise of an asker, it will help us pass his question to an appropriate expert. 3. What does the quality of questions mean for the quality of answers and the behavior of users? 4. Why did Google Answers (which offered a real monetary incentive) not survive? 5. How would users' behavior change if real monetary awards were introduced into the system? Also, it would be important to get a clear sense of these points: What are the criteria for best-answer selection? Factual-answer categories have more objective criteria than other categories. What are the problems with selecting just one best answer? The fact that only one best answer per question is allowed draws attention to the answer-to-question ratio; answerers tend to choose less popular questions. What metrics are most predictive of best answers? Reply length, the number of competing answers, and the track record of the user (most significant for technically focused categories) are the three basic metrics studied in this paper.
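As a toy illustration of how those three metrics could be used to predict best answers (the feature set matches the metrics listed above, but the data and the choice of a logistic regression are mine, not the paper's):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # One row per answer: [reply length (words), number of competing answers, answerer's past best-answer rate]
    X = np.array([
        [120,  3, 0.30],
        [ 15,  8, 0.05],
        [ 80,  5, 0.20],
        [ 10, 12, 0.02],
        [200,  2, 0.40],
        [ 30,  9, 0.10],
    ])
    y = np.array([1, 0, 1, 0, 1, 0])  # 1 = chosen as best answer (toy labels)

    model = LogisticRegression().fit(X, y)
    print(model.coef_)  # sign/size of each metric's association with being chosen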

 

Crowdsourcing, introduced in the second paper, is a neologism for the act of taking a task traditionally performed by an employee or contractor and outsourcing it to an undefined, generally large group of people in the form of an open call (Wikipedia). Taskcn is different from many of its peers. For example, on eLance and TopCoder, tasks are attempted after the requester chooses a provider or team of providers based on their credentials and proposals. On Google Answers, participation was limited to recruited expert answerers, who had exclusive locks on tasks for a period of time. On Taskcn, users submit their work directly (the new Taskcn offers multiple approaches based on the difficulty and complexity of tasks). Taskcn also differs from open question-answer forums such as Yahoo! Answers, because instead of questions being answered by other users without payment, the requesters offer awards for the completion of the tasks they pose.

We should be aware that each of these approaches is risky to some extent. On the one hand, Taskcn users submit their work directly and concurrently with other users competing on the same task, so they have little guarantee that their work will win the reward. On the other hand, the risk falls on the requester's side on eLance and TopCoder; they have to stick with their original choice of providers even if the submitted solution to the task is not satisfying at all.

On Taskcn, a user offers a monetary award for a question or task, and other users provide solutions to compete for the award. The website plays the role of the third party by collecting the money from the requester, distributing the award to the winner(s) decided by the requester, and taking a small portion of the award as a service fee. It is socially stable -- there exists a core group of users who repeatedly submit and win. The incentive mechanism it employs works in such a way that the potential monetary award encourages people's participation.

Users' strategies vary with time because they learn from their failures/successes over time. They learn to submit later, choose less popular tasks, choose tasks with higher winning odds, and also raise their award expectations. However, average users fail to improve their chances of winning through this learning process. In contrast, a very small core group of successful users manages to win multiple tasks as well as to increase their win-to-submission ratio over time. Whether this is a case of the rich getting richer, since their wins give them a reputation that may enhance the chances that their submissions are selected, or whether it is true evidence of learning, remains unclear.

The tasks submitted must necessarily be of relatively low complexity and effort, given that there is no guarantee of an award. In the case where a task is attempted only after the requester makes his choice (a new Taskcn feature), the tasks are relatively more complex.

Some common strategies used by winners include selecting less popular tasks from their very first attempts and consistently submitting later than others. Winners are better at adopting and sticking with strategies that improve their chances of winning. These two strategies do not necessarily lead to winning an award; they are just common characteristics shared by winners. Therefore, the real winning strategies remain unclear.

A large fraction of winning task solutions is contributed by a small core group of individuals. Such skewness of contribution in Internet peer production systems has been widely observed: participation is open to anyone, but a large portion of the content is contributed by a small minority of the participants.

The design applications suggested by the authors are: identifying the core group of winners early; incentivizing the core group (if a core group is necessary); driving a large number of prospective users toward the site; and making task awards commensurate with the skills and level of effort required by the tasks. Still, it would be interesting to further explore what winning strategies work for crowdsourcing and freelance marketplaces.

 

Nick Wells

 

Crowdsourcing and Knowledge Sharing: Strategic User Behavior on Taskcn

 

This paper undertakes a study of a Witkey website, Taskcn.com. Witkeys are websites where users post tasks to be completed competitively by other users, with a fee going to the winner. This study surveys the site data and comes to a variety of findings. For one, they find that task winners tend to spend longer on completing the task on average. They also found that first-play winners are more likely to take on more tasks. In general, they find that users tend toward tasks with fewer submissions in order to have a higher chance of winning (and higher expected gain). This was the same across winners and losers.

 

Before this paper, I was not familiar with the idea of a Witkey website, which seems like an interesting concept. I would be interested in seeing whether similar findings hold for certain markets on sites like craigslist or auction-oriented websites. It could prove academically useful to develop an economic model to encapsulate the ideas presented in this paper as well.

 

Knowledge Sharing and Yahoo Answers: Everyone Knows Something

 

This paper studies participation on Yahoo Answers. First, they broke down the different categories of answer networks and found intuitive but interesting results. Less factual discussion topics tended to have longer discussions, while more factual discussion topics were much more curt. They also applied the idea of entropy and found that

 

One interesting note in this paper is that Yahoo Answers' knowledge is very broad but not deep. I am not sure if this is related to the idea of entropy. I do not quite understand their application of the idea of entropy, which would be good to discuss in class. Regarding this paper, I would be interested in seeing how depth vs. breadth plays out relatively between categories, and perhaps between other forums as well.
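My reading of the entropy measure is that it is just the Shannon entropy of the distribution of a user's answers across categories (low entropy = focused, high entropy = diverse); a quick sketch under that assumption:

    import math
    from collections import Counter

    def user_entropy(answer_categories):
        """Shannon entropy (in bits) of the categories a user has answered in."""
        counts = Counter(answer_categories)
        total = sum(counts.values())
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    print(user_entropy(["Programming"] * 9 + ["Maths"]))          # ~0.47: a focused user
    print(user_entropy(["Programming", "Maths", "Pets", "TV"]))   # 2.0: a diverse user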

 

Angela Ying

 

Paper 1: Crowdsourcing and Knowledge Sharing: Strategic User Behavior on Taskcn

 

I thought that this was an interesting topic to study - I had never heard of crowdsourcing before. Anyway, the paper focuses on a Witkey site, Taskcn.com, where users pay other users to develop solutions and the winner is paid a monetary sum. The majority of the paper focuses on the trends that develop as users become more and more experienced with Taskcn.com. Specifically, the paper notes that users on average do less work, choose tasks with fewer people submitting, and try to increase their expected gain. An interesting result is that the majority of users actually get worse as they do more tasks and win less often. A small group of winners is found to make up almost 20% of all wins and is particularly good at increasing its expected gain. This group exhibits the same preferences for tasks as other experienced users but is very good at actually winning money.

 

This result is interesting because it mirrors phenomena found in other web services such as Wikipedia, where a small group of contributors make particularly good contributions compared to others. A weakness of the paper is the narrow focus on Taskcn.com rather than a broad review of many witkey sites, since the paper mentioned that there were several. A possible extension would be to examine if there is a difference in characteristics between the winning group and mainstream - are these people of a certain profession, more technologically savvy, etc. Of course, because it is on the web it may be difficult to find this information.

 

Paper 2:

Knowledge Sharing and Yahoo Answers

 

This paper was an empirical study of Yahoo Answers and the types of people and questions found on YA. The paper begins by clustering the questions/answers on YA along three dimensions - average response length, thread length (number of responses), and overlap of askers/repliers. An important contribution of this paper was the result that there are two "types" of questions posed on YA. One type is fact-based, which tends to be shorter in thread length (though perhaps the best answer is longer on average), and where the askers and repliers do not overlap much. The other type is discussion-based, where the best answer is selected based on emotion and agreement. This type of question has a lot of overlap between askers and repliers, long threads, and perhaps shorter individual responses. In addition, the paper discusses expertise levels on YA and finds that experts may not necessarily be chosen as having the best answer more often than others. Finally, the authors define a user entropy function and correlate entropy with performance in technical fields.

 

I thought this was an interesting paper but it seems that the paper simply confirmed what we had all already suspected. In addition,  this paper had a lot of different results but it would have been interesting to see some more theoretical ones.

I think it would be interesting to see how the performance of YA compares to the performance of more focused Q&A forums and discussion forums. For example, it would be interesting to look at the number of answers and length per thread in a football discussion forum versus a football thread on YA to determine if YA is actually a good place to go compared to a more specialized forum.

 

Victor Chan

 

Crowdsourcing and Knowledge Sharing: Strategic User Behavior on Taskcn & Knowledge Sharing and Yahoo Answers: Everyone Knows Something

 

The two papers discuss the interactions between request makers and the users who fulfill requests. In the first paper (Taskcn), the requests are made by users who need a job done, and the solution providers are users who are bidding to have their solution accepted. In the second paper (Yahoo Answers), the request makers are the people asking questions, and the answers are provided by other users. It should be noted that Yahoo Answers does not provide monetary incentives for people to participate, whereas Taskcn provides a chance for the solution provider to win money.

The main contribution of both papers was to evaluate the systems. The Taskcn paper found that most users become inactive after only a few rounds of submission, which is likely due to the low probability of being selected as the winner. However, there is also a group of players who actively seek to win and continue to play. The paper also shows that users learn a few techniques to increase the likelihood of winning the bid, including choosing less popular tasks, submitting later, and choosing tasks with higher winning odds. The main contribution of the Yahoo Answers paper is to show the relationship between the category of questions, the users who respond to them, and the number/length of responses. The paper basically categorizes the types of questions asked into factual types and discussion types. Factual-type questions tend to have fewer answers, but the answers are medium to long; discussion-type questions have many answers, but they are generally shorter. Finally, the asker/replier overlap ratio was also examined to provide a metric for the frequency of asking and replying in the same category; this was found to be high in the discussion-type categories. The paper also finds that there is no correlation between a user's entropy and their chances of being selected as best answer. This is an interesting point, since low entropy would suggest a user is an expert in a category, which should increase their chances of having the best answer to a question.

Comparing the two systems, I find that Yahoo Answers' lack of monetary incentive could be limiting its usefulness to only simple problems. However, Taskcn's requirement of up-front effort before payment also limits it from taking on larger contract jobs. Another interesting system to look at would be Rentacoder.com, where programmers actively bid on contracts for coding tasks. In this system the coder's previous experience and merit play a much larger role, since the chance of being selected depends on reputation/experience and the offering price. This type of system mimics an auction more than the format of Yahoo Answers or Taskcn does. A future project could be to examine this system and see how users choose whom to give the contract to.

 

Alice Gao

 

Crowdsourcing and Knowledge Sharing: Strategic User Behavior on Taskcn

 

The main contribution of this paper is to characterize the strategies used by a small group of users who are consistent winners in the user community.  I think it is definitely important to examine the strategic behaviors of users in such a community.  This is similar to a prediction market setting in which users try to maximize their profit over time.  In this context, though, the users are only in control of registering for a task and picking a time to submit their solution.  It is interesting that the statistical analyses showed that the people who are consistently winning do seem to use strategies such as submitting solutions later, choosing less popular tasks, and learning to choose tasks with higher winning odds.

 

Reading this paper made me wonder how the authors came up with hypotheses to test.  It seems to me that the general approach taken in this paper is to find some statistically significant phenomenon in the data, relate it to some intuitive behavior pattern, and try to form an explanation of the behavior.  In this way, it feels like a rather disappointing approach, because we are somewhat "blindly" searching for patterns in data using regression analysis.  A more meaningful approach would be to form a systematic set of hypotheses based on theories of social behavior or game theory, and then try to verify these hypotheses using the data.  Also, a related weakness of this paper is that, even though some of the speculated user strategies seem reasonable, the authors were not able to give satisfactory explanations for all of the observed behaviors.  Therefore, I think it is very important to start from the theoretical side and try to formulate hypotheses systematically.  Perhaps the logical next step would be to formulate such hypotheses and use the data to verify them.

 

Additionally, I noticed that the authors did try to give good reasons for the motivation behind this study, such as how to design such systems to make the website more successful.  So it would be interesting to explore questions such as what incentivizes users to participate, etc.  Also, compared to the paper on Yahoo Answers, an interesting question to ask would be how users behave differently in such systems with and without money.

 

Knowledge Sharing and Yahoo Answers:  Everyone Knows Something

 

The main contribution of this paper is to characterize certain user behaviors and interactions in the Yahoo Answers community.  One thing that is both interesting and surprising to me is that the authors were able to differentiate topic categories based on very primitive properties of threads, such as the length and number of answers.  This approach is quite unique, and I would never have thought of inferring such high-level categorization from low-level data like this.  I think the main limitation of this paper is that it is trying to explain too many things at once.  Every paragraph seems to introduce something new, and the content is not very well organized overall.  Perhaps this paper could be broken down into two papers, each focusing on a more specific topic.

 

This paper also doesn't motivate the research topic very well.  It seems to assume that people care about anything related to Yahoo Answers.  It would be more effective if the paper related its results to concerns such as how we can design the rating system and overall forum structure to make the website more successful.

 

Sagar Mehta

 

These papers study empirical data from two knowledge-sharing sites, Yahoo Answers and Taskcn. Upon looking at YA, I came to doubt Eckhart Walther's claim that Yahoo Answers is "a searchable database of everything everyone knows". The site seems to be set up more as a discussion forum than as a knowledge base. As a result, few users seem to actually provide useful information, and the reliability of that information is questionable given the nature of the site. I also question the overall utility and benefit to society of having such a forum. All the kids asking for homework help are probably a bit stupider for not having tried the question themselves. I'd be interested in knowing how users generate new topics/questions. Unlike Wikipedia, which seems to have a logical growth of entries based on references, there is probably less of a pattern to user question inputs.

 

On the other hand, I found the website described by the second paper to be pretty intriguing. I don't know exactly how the site works, but it does seem open to manipulation. For instance, if I post a project on the site through my account, I can always create another account and use that to post a response to my own question. I can then vote for my alias as the best answer, but I'd still be able to read other users' submissions, correct? I found the paper's conclusions with respect to winning as an incentive to continue, and users learning over time, fairly intuitive – except for the paradox of users failing to improve over time. However, in the long run, if I'm trying to maximize my expectation I might submit several mediocre answers rather than one or two good answers. So the failure to improve may be explained simply by the optimal strategy for some users.

 

Michael Aubourg

 

I would like to speak about the incentive people have to ask questions and to answer them CORRECTLY.

There is no point in having a lot of agents in such a question forum without checking the quality of the answers.

I often go on this website, and I am surprised that the paper does not even mention the points system. (Maybe it is too recent.)

What is the goal of this system?

The points system is weighted to encourage users to answer questions and to limit spam questions. There are also levels which give more site access. In addition to this, points and levels have no real value, cannot be traded, and serve only to indicate how active a user has been on the site.

A big disadvantage of the points system is that it encourages people to answer questions even when they do not have a suitable answer to give, in order to gain points. This is my criticism of social knowledge-sharing systems: the information is rarely excellent. There is no point in having thousands of different answers; one containing the truth is enough.

 

On the other hand, people only ask questions to gain knowledge, since they lose points when asking questions.

The point system encourages users to answer as many questions as they possibly can, up to their daily limit. Once a user shows that they are knowledgeable within a specific category, they may receive an orange 'badge' under the name of their avatar naming them a "Top Contributor", which is a kind of recognition. It is also a way to weight answer quality.  The user can then lose this badge if they do not maintain their level and quality of participation.

Unfortunately, I noticed that once a user becomes a "Top Contributor" in any category, the badge appears in all answers, questions, and comments by that user regardless of category, which is not very smart.

An expert is rarely an expert in many different fields.

 

This is my point of view on YA:

- The network contains a lot of dubious and strange questions that do not concern a lot of people

- I do not trust the validity and relevance of most of the answers. Currently, it is still lacking reliability.

 

Avner May

 

Knowledge Sharing and Yahoo Answers: Everyone Knows Something

Crowdsourcing and Knowledge Sharing: Strategic User Behavior on Taskcn

 

In the Yahoo Answers paper, I thought that they did a good job analyzing some basic characteristics of the knowledge sharing that occurs on YA.  However, none of these results were particularly surprising or enlightening in my opinion.  It did not surprise me that more focused responders tended to give better answers, or that more technical questions tended to have shorter threads, or that the content on YA covers a lot of breadth but not much depth.  I do not think I agree with this being the “future of search,” as is quoted in the beginning of the article.  In some sense, it is nice that you get your question answered specifically; however, I would only resort to YA in the case where Google and Wikipedia had failed me.  Waiting for an answer on YA can take more time than I’m willing to wait.  With regard to it being a knowledge base, it by no means seems like a dependable one, since you depend on someone else having already asked the question you have.  Nonetheless, as time passes, more and more questions get asked, answered, and stored online for others to view.  I am curious about what cross-sections of human knowledge YA gathers, as compared to Wikipedia.  Do these forums gather almost disjoint sets of knowledge?

In the Crowdsourcing article, I found the elements that had to do with the strategic actions of users interesting.  Once the element of financial incentive is thrown into the equation, people begin to act strategically.  This analysis is interesting from a pure game theory perspective: what strategies work best?  How long does it take people to reach or learn good strategies?  What are the equilibria in this system?  I think the design of any website of this sort needs to work very hard to ensure that the incentives line up correctly.

 

Hao-Yuh Su

 

1. Knowledge Sharing and Yahoo Answers: Everyone Knows Something

2. Crowdsourcing and Knowledge Sharing: Strategic User Behavior on Taskcn

 

These two papers investigate two different knowledge-sharing websites, Yahoo Answers (YA) and the Witkey site Taskcn, respectively. They both examine the characteristics of posted questions and participating users, trying to formulate the mechanism inside the knowledge-sharing system and a possible way toward future growth. From the statistical data in the two papers, we can see that the two websites have quite different properties. For example, on YA most questions and answers are simple; there are even questions that can be classified as opinions and conversation. On Taskcn, however, most questions have relatively higher complexity. I think this is because YA and Taskcn have fundamentally different designs. On Taskcn, askers are required to offer a certain amount of reward to the selected answerers (a monetary-award competitive mechanism); on YA, however, all activities are free and non-profit.

In the last paragraph of the 1st paper, the authors ask "whether different incentive mechanisms could encourage YA participation by top level experts - who may currently still prefer more specialized, boutique forums - while at the same time allowing the rest of us to get our everyday, simple questions answered." I have some thoughts about these words. First, I don't see the incentive mechanism in YA. Perhaps there are incentives for askers, but I don't know if there is any incentive for answerers so far. In particular, this paper shows that the answerer group may be separated from the asker group in some higher-expertise categories such as Programming. Second, I wonder whether the monetary-award competitive mechanism (MCM) used in the 2nd paper is the solution to this question. In the 2nd paper, we do see the existence of a winning group that keeps Taskcn running. However, their expertise is not indicated anywhere in the paper; what we can say is that the selected answers are relatively satisfactory. Moreover, even if the MCM does work and attracts experts, it still may not satisfy the last expectation - allowing the rest of us to get our everyday, simple questions answered. In the 2nd paper, the user strategy under the MCM is identified as follows: users tend to choose less popular tasks and tasks with higher winning odds, and they also raise their award expectations. Obviously, under this strategy, everyday simple questions will be ignored under the competitive mechanism. Therefore, I doubt whether there is any mechanism that can meet these two expectations at the same time.

 

Rory Kulz

 

Knowledge Sharing and Yahoo! Answers: Everyone Knows Something

 

This paper is much more interesting for its techniques than for its analysis of Yahoo! Answers. I like the idea of using these "ego" networks to give a heuristic for whether a person is more of an "answer" person or a "discussion" person, and ditto for the motif analysis. Both were very neat ideas that I hadn't exactly encountered before.

 

At the end, the authors speculate whether "different incentive mechanisms could encourage YA participation by top level experts." But I imagine the problem here might be the quality of the questions, which deters experts. Most experts, I'd guess, do not want to waste their time sifting through "high school students" looking for solutions. So really you want to find incentives for more sophisticated users, not quite experts, to seek solutions. Of course the two play off each other; in the case of Google Answers, by having users pay, you weed out more frivolous questions while simultaneously attracting more sophisticated answer-givers.

 

I'd also like to see a comparison between cross-posting on Yahoo! Answers and cross-posting on Usenet or some similar, smaller hierarchical system that involves not-quite-discrete communities and discussion groups. It would be interesting to see whether, as a user base grows larger, the question areas become increasingly separated from each other in terms of participation.

 

Crowdsourcing and Knowledge Sharing: Strategic User Behavior on Taskcn

 

I was surprised how few users have actually won relative to the total number of participants on Taskcn. Given the fairly obvious notion that "...the result of a user's first, and subsequent, competitions can be an important factor in later participation behavior: winning encourages users' contribution," it makes me wonder whether a mechanism could be added whereby users answer some set of questions upon registration and certain tasks are then suggested to them. The authors suggest incentives based on prior tasks, but why not start at the start? This would funnel users towards niche areas and maybe prevent the problem of users being daunted by the sheer number of tasks and the low probability of winning (while simultaneously increasing each user's probability of winning by lowering the size of their competition). Note that it's not clear that merely funneling users towards less popular tasks by "modify[ing] the interface" would achieve this same goal.

 

I am mildly suspicious of some of the results on user behavior. I wonder how much of the failure to improve is simply due to a failure to mature in the skills necessary to win. "[Winners] tend to take longer to submit the task." In other words, winners spend more time trying to make a better submission? Well, no wonder they win!

 

Zhenming Liu

 

Following Monday’s class, we have another set of papers describing the behavior of peer production systems, with a focus on knowledge-sharing systems. I will continue to mumble/complain about the methodologies in papers of this type.

 

Firstly, I find I am not really able to assess the scientific contributions of these two papers. Both papers pull out a lot of facts. Some of these facts match my intuition, while others are interesting to know. Nevertheless, it is really difficult to tell the difference between these papers and essays written by journalists. Maybe this is because I lack social science training.

 

Clearly, the availability of a wide range of computer science tools, including massive data processing and discrete mathematical modeling techniques, gives computer scientists the ability to revolutionize contemporary social science. However, I believe this is only possible when researchers in computer science and social science work together. In reality, computer scientists usually go too far in terms of modeling and focus on problems that are irrelevant to social science.

 

For example, in the paper discussing Yahoo! Answers, it is nice to see graphical presentations like Figures 1, 2, 3, and 4 in a computer science fashion. It also helps in understanding the nature of Yahoo! Answers to compute some statistical indicators (e.g., Table 1). However, I believe the user entropy measure goes too far, for two reasons: 1. it is inaccessible to general audiences in the social sciences; 2. it is a rather “arbitrary” model without enough justification. I think it is precisely because of the wide availability of different computer science models (especially in the study of AI) that we need to be more cautious when choosing a model and presenting it to the social science community.

 

Andrew Berry

 

Knowledge Sharing and Yahoo Answers: Everyone Knows Something

 

My main problem with the methodology of this paper is how the authors drew conclusions about expertise depth. The authors conclude in their analysis that Yahoo Answers has a lot of breadth (as demonstrated by the three clusters) but very little depth. First, the authors only investigate Programming questions. I will grant that cluster 1 would not be a good group in which to examine expertise depth, because many of its topics are opinion-based, but I think it is questionable not to include the second cluster, in which expertise could be measured in some cases. Even if cluster 3 is the only group in which one can objectively measure expertise, using only 100 randomly selected questions in only the Programming category does not suffice to draw conclusions for all of Yahoo Answers; that only provides local results. I understand that the paper was using the Programming category as a paradigm for all of cluster 3, but limiting the expertise analysis to cluster 3, and more specifically to the Programming topic, does not seem adequate for generalization. Essentially, 100 programming questions were used to determine the expertise level of all of Yahoo Answers! Also, we know that k-means clustering is far from foolproof in practice, so perhaps there are other categories, even within cluster 3, that could help produce more robust results.

 

I also wonder, with regard to the best-answer results, whether the results are tainted by how a best answer is selected. The paper does not specify how best answers are selected, but if it is similar to a voluntary-response star rating system, this could be problematic. Most users will probably not rate the quality of answers, and furthermore, a user may be able to rate his or her own response. Additionally, some clusters may have participants with a different propensity to rate responses at all. This may provide some insight into the surprising correlation results between entropy and best answers.

 

Crowdsourcing and Knowledge Sharing: Strategic User Behavior on Taskcn

 

The single most important factor in determining whether or not a contributor wins a reward on Taskcn -- apparently the amount of competition rather than expertise -- seems problematic. Also, given that later submissions correlate with higher rewards, there seems to be a way to game the system. Suppose we submit an answer to a question with 200 other competitors, and our answer suffices but is not the best. Would a normal user scroll through 200 submissions to find the best response, or stop after seeing our good response within the first 20 entries? Does Taskcn display submitted answers in random order? If not, this may be a reason for later submission. Additionally, what does this say about the quality of solutions? The paper also discusses that experienced users who have won multiple times seem to strategically seek out less popular tasks. Winning task solutions are generally contributed by a small core of individuals, which presents another danger to the quality of Taskcn solutions. Many people quit after being deterred by their first few submissions, experienced users learn to respond to tasks with few participants, and the number of experienced users who win is very small. Thus the pool of contributors is a small group that may or may not have broad expertise, and these users are incentivized to seek out tasks with little competition and higher skill requirements. This seems like a dangerous mix and makes me very skeptical of the Taskcn network. I am surprised that the paper did nothing to address the potential perils these trends could bring for solution quality.