Sunday, October 18, 2015

Analysis of Big Data as an alternative to pre-election polls – Puls Biznesu

Several years ago, a research team led by Prof. Włodzimierz Gogołek of the University of Warsaw adopted so-called Big Data refining and has used it, among other things, to forecast the results of parliamentary and presidential elections.

“Winnowing valuable information out of Big Data requires specialized software tools. Their job is to collect entries – pieces of information from the network (robots do that work) – and to search them for phrases containing a given name, e.g. a company name, that appear in the vicinity of words referred to as sentiment words. For example: ‘Abacki is a good economist.’ By counting the phrases with positive and negative sentiment (in this example the positive sentiment word is ‘good’) we obtain an opinion about Abacki, e.g. 100 thousand good reviews and 1,000 bad ones,” Gogołek explained in an interview with PAP.
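The counting step the professor describes can be sketched in a few lines of Python. The lexicons and sample entries below are illustrative placeholders, not the team's actual word lists or tools, and "vicinity" is simplified to co-occurrence within a single entry:

```python
import re

# Illustrative sentiment lexicons; the team's real word lists are not public.
POSITIVE = {"good", "honest", "competent"}
NEGATIVE = {"bad", "dishonest", "incompetent"}

def count_sentiments(entries, name):
    """Count entries that mention `name` together with a positive or negative word."""
    positive = negative = 0
    for entry in entries:
        text = entry.lower()
        if name.lower() not in text:
            continue
        words = set(re.findall(r"\w+", text))
        if words & POSITIVE:
            positive += 1
        if words & NEGATIVE:
            negative += 1
    return positive, negative

# The interview's own example: "Abacki is a good economist" carries
# the positive sentiment word "good".
entries = ["Abacki is a good economist", "Abacki made a bad decision"]
print(count_sentiments(entries, "Abacki"))  # (1, 1)
```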

The professor and his team used the method of refining large data sets during the 2011 presidential and parliamentary elections, where it proved highly reliable. A similar analysis was also performed during the final stretch of this year's presidential campaign.

“In the 2011 parliamentary and presidential elections the results were predicted flawlessly. In the last presidential election, what is telling is how small the discrepancy was (just 0.66 percentage points) between the gap in positive sentiments for the two candidates collected by our tools on the eve of the 2015 presidential election, which was 2.44 per cent, and the real margin that separated Andrzej Duda and Bronisław Komorowski – 3.10 per cent,” the professor explains.
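The arithmetic behind that comparison is straightforward. In the sketch below the raw counts are invented to reproduce the quoted 2.44-point tool gap; only the percentages come from the interview:

```python
# Hypothetical raw counts chosen to reproduce the 2.44-point gap quoted
# in the interview; the real collected volumes were not published.
duda_pos, komorowski_pos = 51_220, 48_780

tool_gap = (duda_pos - komorowski_pos) / (duda_pos + komorowski_pos) * 100
real_gap = 3.10  # actual margin between the candidates, per the interview

print(f"tool gap: {tool_gap:.2f} pts, real gap: {real_gap:.2f} pts, "
      f"discrepancy: {abs(real_gap - tool_gap):.2f} pts")  # 0.66
```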

The expert explains that refining large data sets provides a valuable alternative to quantitative surveys, and that thanks to the automation of its processes it costs much less than conventional methods.

“A classic study is based on analyzing, most often, categorized answers to questions put to a specific, representative group – hundreds, rarely thousands of people. Refining, meanwhile, processes millions of entries; in a recent study concerning John Paul II, about 5 million records were refined. The credibility of a classic study rests, for example, on the representativeness of the sample; in refining, credibility is implied by previously obtained results,” says Gogołek.

“Compared to traditional research, the costs of refining are marginal, especially if you already have relatively standardized tools: robots that collect entries, sentiment identification, and counting the occurrences of sentiments,” the professor adds.
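The “robot” stage of that toolchain can be as simple as fetching pages and stripping markup. The standard-library sketch below is a toy illustration under that assumption; a production collector would also need crawling, deduplication, and language handling:

```python
import re
import urllib.request

def collect_entries(urls):
    """Toy 'robot': fetch each page and strip tags to obtain raw text entries."""
    entries = []
    for url in urls:
        with urllib.request.urlopen(url, timeout=10) as resp:
            html = resp.read().decode("utf-8", errors="replace")
        text = re.sub(r"<[^>]+>", " ", html)          # crude tag removal
        entries.append(re.sub(r"\s+", " ", text).strip())
    return entries
```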

Big Data refining should not be treated as a mere research experiment; the professor argues that it and similar tools have commercial applications.

“Refining covers a very broad spectrum of possible research, including: brand monitoring – identifying current threats to a brand's positive image; and collecting sentiments regarding the quotations of listed companies – tests showed an extremely high correlation between predictions and the actual quotations of four listed companies (Enea SA, KGHM SA, Synthos SA and Tauron SA). Similarly to brand studies, refining makes it easy to monitor the standing of organizations, parties and individuals, and to identify threats: crime, defects in mass-market products, and so on,” says Gogołek.
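The stock-market claim amounts to correlating a daily sentiment series with actual quotations, for instance via the Pearson coefficient. Both series in the sketch below are invented placeholders, not data from the Enea, KGHM, Synthos or Tauron tests:

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Placeholder series: daily net-sentiment scores vs. closing prices.
sentiment = [0.61, 0.58, 0.64, 0.70, 0.66]
closes = [101.2, 100.8, 102.5, 104.1, 103.0]
print(f"correlation: {pearson(sentiment, closes):.2f}")  # ~0.99
```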

Refining large data sets is also taught at the Faculty of Journalism and Political Science, where students use the tools, among other things, to conduct studies whose results feed into their diploma theses.

As the professor explains, refining has for several years been part of his lecture on new sources of journalistic information, and students also have access to the tools, e.g. when writing their theses.

A survey conducted this year by IBM's Institute for Business Value among executives of global companies showed a strong need to bring various kinds of data analytics into the daily practice of enterprises and organizations. According to Hal Varian, Google's chief economist, the Big Data scientist, or data researcher, will be one of the most sought-after IT professions of the next decade. It is estimated that by 2020 the volume of data on the network will grow to 45 zettabytes. By then, the gap in the US labor market will amount to more than 1.5 million vacancies waiting to be filled by Big Data specialists.

