Smart Blog

Technologies Used For Data Mining: Sentiment Analysis

Davide Avella

What are the most frequently asked questions on search engines?

Google handles an enormous number of searches, and they are remarkably diverse. A list of the most popular queries on the search engine was recently published, and the results were quite surprising.

You can try to guess what is in the top ten, but the results will most likely be far from what you expect. Questions like "What are the official languages of the Netherlands?" probably won't even be in the top 100.

Against all expectations, the most asked question is "What is my IP". The second most common query is "What time is it", usually followed by a place name, to find the exact time in various parts of the world. In third place we find "How to register to vote", a typical question for Americans who want to take part in the voting process. The fourth, "How to tie a tie", needs no explanation as to why it sits near the top of the list.

Search queries can be divided into different types. A typically English classification follows the five "W"s: who, what, when, where, why. Later, an "H" for "how" was added to this well-known set. "How" queries are also very popular on search engines: "How to tie a tie", as seen above, or "How to do Sentiment Analysis".

And this is precisely the question I will try to answer in the next paragraphs. Sentiment Analysis examines posts, articles, and other online texts in order to assess the impact they can have on web users. So let's look at three possible ways to carry out this kind of analysis.


Method I: Choose a programming language

Sentiment Analysis grew out of Text Mining, so a programming language is particularly well suited to building machine learning models that predict whether a text is positive or negative. With the rise of Data Science as a discipline, two languages have become established: Python and R. Both have a number of libraries for natural language processing and lexical resources of various kinds. In Python the library is NLTK, while in R the package is tm, which has built-in readers for file formats such as PDF and XML.
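To give an idea of what "a model that predicts positivity or negativity" means in practice, here is a minimal sketch in plain Python. It is a tiny Naive Bayes classifier written from scratch, not the NLTK API, and the six training sentences are invented purely for illustration. All names (`TRAIN`, `tokenize`, `train`, `predict`) are my own assumptions, and a real project would train on thousands of labelled texts with a proper library.

```python
import math
import re
from collections import Counter

# Toy training data, invented for illustration only.
TRAIN = [
    ("I love this product, it works great", "pos"),
    ("What a wonderful and happy experience", "pos"),
    ("Great service, I am very happy", "pos"),
    ("I hate this, it is terrible", "neg"),
    ("What an awful and sad experience", "neg"),
    ("Terrible service, I am very angry", "neg"),
]


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Count words per label and documents per label."""
    word_counts = {"pos": Counter(), "neg": Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    return word_counts, doc_counts, vocab


def predict(text, model):
    """Return the label with the highest log-probability (add-one smoothing)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in word_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAIN)
print(predict("I am happy with this great product", model))
print(predict("This was an awful, terrible day", model))
```

With this toy data the first sentence is labelled "pos" and the second "neg". With so few examples the model is easy to fool, which is exactly why the libraries mentioned above ship with large lexical resources.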

Method II: Choose a software tool

If you are unfamiliar with programming languages, another option is to rely on tools with interfaces that let you build the necessary processes. Among the many available, here are a few, with their main features.

1. Weka

An Open Source Data Mining tool, included as a component of Pentaho. It provides methods for pre-processing text, such as drawing information from a database and reading CSV files, along with a set of machine learning algorithms.

2. Knime

An Open Source reference platform that provides more than 100 modules and a wide choice of advanced algorithms. The open version, however, is limited to small datasets.

3. RapidMiner

Equipped with an advanced graphical user interface that helps display information in descriptive ways, such as histograms, and also helps when developing workflows.

4. Qlik

It has a connector that lets you query different APIs for Sentiment Analysis. Its strength is the batch feature, which queues calls and allows you to work in parallel with it.
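The idea of queuing calls and handling them in parallel is not specific to any one tool. Below is a rough sketch of one reading of it in Python, and it is not how Qlik itself works. The function `call_sentiment_service` is a hypothetical stand-in for a remote Sentiment Analysis API: it scores text with a tiny invented word list so that the example runs offline. `score_batch` and its `max_workers` parameter are also my own names.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a remote sentiment API call. A real connector
# would send the text over HTTP; a tiny word list keeps this sketch offline.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def call_sentiment_service(text):
    """Pretend to call a service and return a sentiment label for one text."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def score_batch(texts, max_workers=4):
    """Queue every text and score the queue with a pool of worker threads.

    executor.map returns results in the same order as the inputs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(call_sentiment_service, texts))


texts = ["I love this shop", "awful delivery", "the parcel arrived"]
print(score_batch(texts))
```

Threads suit this kind of work because each call spends most of its time waiting for the network, so several requests can be in flight at once.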

Method III: Choose a service

The third possibility is to use an API: a service hosted on a web server that analyzes the submitted text and returns a positive or negative sentiment value. Many of these services are easily accessible via the web, or can be called from within both software tools and programming languages. These services are free up to a certain number of calls; beyond that threshold you have to pay for a subscription.

Sentiment Analysis on social networks

It is well documented that Sentiment Analysis is quite accurate on long texts, since it is easier to reconstruct the context. However, this ideal situation is a long way from the reality of social networks, where texts are mostly very short and full of slang expressions, emoticons, and abbreviations. Several products have therefore specialized in analyzing sentiment in web and social texts, even in languages very different from ours, such as Chinese!
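One common way to cope with short social texts is to normalize them before scoring. The following is a minimal sketch of that idea, assuming two deliberately tiny lookup tables (`EMOTICONS` and `ABBREVIATIONS`) that I made up for the example; specialized products rely on much larger resources and more sophisticated rules.

```python
import re

# Hypothetical, deliberately tiny lookup tables; real tools use far larger ones.
EMOTICONS = {":)": "happy", ":-)": "happy", ":(": "sad", ":-(": "sad"}
ABBREVIATIONS = {"gr8": "great", "thx": "thanks", "u": "you", "luv": "love"}


def normalize(text):
    """Rewrite a short social post into plainer words before scoring."""
    tokens = []
    for raw in text.split():
        if raw in EMOTICONS:
            # Replace the emoticon with a word a sentiment model can score
            tokens.append(EMOTICONS[raw])
            continue
        word = raw.lower().strip(".,!?")
        # Collapse any character repeated three or more times: "sooo" -> "so"
        word = re.sub(r"(.)\1{2,}", r"\1", word)
        tokens.append(ABBREVIATIONS.get(word, word))
    # Drop tokens that became empty, such as a lone "!!!"
    return " ".join(t for t in tokens if t)


print(normalize("Thx, the show was gr8 :) luv u!!!"))
# prints: thanks the show was great happy love you
```

The normalized sentence can then be passed to any of the three methods described above.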

Sentiment Analysis is just one of the many roles that Business Intelligence can play in Retail. Pentaho, the most complete and innovative Open Source BI platform, uses Weka and many other useful tools to analyze your business. Download our FREE Pentaho guide, Pentaho from A-Z, and see how it can support your business with data analysis and Business Analytics!
