Digging through customer review data seems to be a hot topic these days. Many tasks fall under the umbrella of opinion mining, a few of which are:
- Feature identification – identifying the features of products mentioned in unstructured text
- Opinion word identification – identifying which words signal a statement of opinion rather than a statement of fact
- Sentiment classification – determining whether a statement of opinion is positive or negative
- Opinion representation – deciding how best to present the mined data to the end user, i.e. what the output of the system should look like
- Opinion summarization – collating multiple opinion statements into a coherent summary
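As a rough illustration of the last two tasks, here is a minimal Python sketch that collapses already-classified opinions into a per-feature tally of positive and negative mentions. It assumes the earlier stages (feature identification, opinion word identification, sentiment classification) have already produced `(feature, polarity)` pairs; the function names `summarize` and `render` and the sample data are hypothetical, and a simple count is only one possible way to represent the output.

```python
from collections import Counter, defaultdict


def summarize(opinions):
    """Aggregate (feature, polarity) pairs into per-feature counts.

    Assumes each polarity is "positive" or "negative", as produced by
    some earlier (hypothetical) sentiment classification stage.
    """
    summary = defaultdict(Counter)
    for feature, polarity in opinions:
        summary[feature][polarity] += 1
    return summary


def render(summary):
    """Turn the per-feature counts into one readable line per feature."""
    lines = []
    for feature in sorted(summary):
        counts = summary[feature]  # Counter returns 0 for missing keys
        lines.append(
            f"{feature}: {counts['positive']} positive, {counts['negative']} negative"
        )
    return "\n".join(lines)


# Made-up example input, not real review data.
opinions = [
    ("battery life", "positive"),
    ("battery life", "positive"),
    ("battery life", "negative"),
    ("processing time", "negative"),
]
print(render(summarize(opinions)))
```

Even this toy version forces a representation decision (grouping by feature, counting rather than quoting), which is exactly the kind of choice the opinion representation task is about.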
I have only just begun digging through the literature, but the first task, feature identification, strikes me as particularly challenging. Determining which features an item has is essentially a knowledge induction task: you need to form relations between the base object (e.g. a digital camera) and its possible features (e.g. battery life). Because this is so difficult, most research appears to have relied on a hand-built ontology (or, failing a full ontology, at least a list of possible features for each item).

The problems compound once opinion words enter the picture. An opinion word like “long” may be positive when applied to a camera's battery life but negative when applied to the processing time after taking a picture. There is a lot of room for research here, and no shortage of people vying for dominance in this space.
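To make the “long” example concrete, here is a minimal Python sketch of the hand-built approach: a feature list for digital cameras plus a polarity table keyed on the (feature, opinion word) pair rather than on the word alone. The names `FEATURES`, `POLARITY` and `extract_opinions`, and every table entry, are hypothetical and made up for illustration; the matching is a naive substring and word check, not a method taken from any particular paper.

```python
import re

# Hypothetical hand-built feature list for one product category.
FEATURES = {
    "digital camera": ["battery life", "processing time", "lens", "screen"],
}

# Hypothetical context-dependent polarity table: the same opinion word
# can flip sign depending on which feature it is applied to.
POLARITY = {
    ("battery life", "long"): "positive",
    ("battery life", "short"): "negative",
    ("processing time", "long"): "negative",
    ("processing time", "short"): "positive",
}


def extract_opinions(product, sentence):
    """Return (feature, opinion word, polarity) triples found in a sentence.

    Naive heuristic: a feature counts as mentioned if it appears as a
    substring, and every known opinion word in the same sentence is
    paired with it.
    """
    text = sentence.lower()
    words = set(re.findall(r"[a-z]+", text))
    results = []
    for feature in FEATURES.get(product, []):
        if feature not in text:
            continue
        for (feat, word), polarity in POLARITY.items():
            if feat == feature and word in words:
                results.append((feature, word, polarity))
    return results


print(extract_opinions("digital camera", "The battery life is long."))
# [('battery life', 'long', 'positive')]
print(extract_opinions("digital camera", "The processing time is long."))
# [('processing time', 'long', 'negative')]
```

Because every known opinion word in a sentence is paired with every feature it mentions, a sentence such as “the battery life is short but the processing time is long” would yield spurious pairs (e.g. battery life with “long”). A real system needs some notion of which word actually modifies which feature, which is part of why this area is still so open.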



[...] for my new job, I will mainly be working on opinion mining, which I have written about before. I expect I will be writing about it a bit more here in the [...]