03.04.2017

Taking Big Data with a Grain of Salt (Summary of Presentations)

XU, Hong | ZHANG, Michael Xiaoquan

Recognising the importance of knowledge transfer between those in academia and the wider community, the HKUST Business School has created a lively forum to address pertinent issues and promote interactive exchanges.

Rebranded for 2017, the “BizInsight@HKUST - Lunch Presentation Series” returned in late February with two experts from the school tackling a topic of increasing concern for businesses – and individuals – in Hong Kong and around the world.

The realities of big data and online social networking have become part of our daily lives. But with so much information now so readily available – and so easily shared – it is ever more apparent that there is also a “dark side” to big data, which can’t be ignored.

At one level, companies, with their new enthusiasm for analytics and interpretation, can misread what the database tells them, reaching false conclusions and taking wrong turns as a result.

At another, many private citizens simply don’t want their personal information and photos “tagged” or shared by friends for the rest of the world to see. However harmless the intentions, they dislike the “peer disclosure” aspect of social media, which is now so prevalent, and are wary of potential problems it can bring.   

Speaking first on the subject of “Big Data Pitfalls”, Michael Zhang, Associate Professor of Information Systems, Business Statistics and Operations Management (ISOM) at HKUST Business School, highlighted his top ten pitfalls and, just as importantly, how to avoid them.

One misconception, he said, is to think that systems will keep improving our ability to make decisions. A second is to assume the future will be all about smart robots, workforce analytics and the jobless society.

“We have powerful tools, but not enough people who can ‘do’ big data right,” said Zhang, citing examples where Walmart and the makers of hit TV series House of Cards made wrong assumptions about sales patterns and viewer preferences based on apparent correlations which proved incorrect.   

“You must know how the data is generated. Otherwise, you just have an unexplained correlation, for example between egg sales and transport accidents in the US. Even school kids can see the mistakes.”
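The point about unexplained correlations can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that two unrelated monthly series (labelled `eggs` and `accidents` here) both drift upwards over time; the figures are simulated, not real sales or accident data, and the shared trend is a hypothetical explanation rather than one given in the talk.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def month_to_month_changes(series):
    """First differences, which strip out a steady upward trend."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Simulated, hypothetical data: each series is a trend plus its own
# independent noise, so neither influences the other.
rng = random.Random(0)
months = range(120)
eggs = [100 + 2.0 * t + rng.gauss(0, 5) for t in months]
accidents = [50 + 1.0 * t + rng.gauss(0, 5) for t in months]

raw_r = pearson(eggs, accidents)
detrended_r = pearson(month_to_month_changes(eggs),
                      month_to_month_changes(accidents))

print(f"correlation of raw levels:     {raw_r:.2f}")
print(f"correlation of monthly change: {detrended_r:.2f}")
```

In this simulation the raw levels look strongly related only because both grow with time; once the trend is removed, little of the relationship remains. Knowing how the data was generated is what tells the analyst to make that check.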

He also noted that using Twitter to do studies can skew results. Since the platform is driven by young people, it does not give a representative sample of the whole population. Similarly, data from online ratings and reviews is also open to question. Such feedback is usually at the most positive or most negative ends of the spectrum and the motives of reviewers remain unknown.
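A short Python sketch shows how an unrepresentative platform can skew a result. All numbers are hypothetical: the age groups, their shares and the support rates are invented for illustration and are not taken from the presentation or from any real survey.

```python
# Hypothetical share of each age group in the whole population.
population_share = {"young": 0.30, "middle": 0.40, "older": 0.30}

# Hypothetical share of each age group among users of a social platform.
platform_share = {"young": 0.70, "middle": 0.25, "older": 0.05}

# Hypothetical rate at which each group supports some proposal.
support_rate = {"young": 0.80, "middle": 0.50, "older": 0.20}


def overall_rate(shares, rates):
    """Average rate across groups, weighted by each group's share."""
    return sum(shares[group] * rates[group] for group in shares)


true_rate = overall_rate(population_share, support_rate)
platform_rate = overall_rate(platform_share, support_rate)

print(f"support in the whole population: {true_rate:.1%}")
print(f"support measured on the platform: {platform_rate:.1%}")
```

Because young users are over-represented on the platform in this example, the platform figure overstates support in the population as a whole. Reweighting by the known population shares would correct for age, though not for other differences between users and non-users.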

“Social science is more difficult than natural science, so you can get a lot of uncertainty and unintended consequences,” Zhang said. “AI [augmented intelligence] is powerful, but you get problems ‘in the wild’ with dirty data, misleading statistics, and the ‘garbage in, garbage out’ factor. Also, while over-generalising from one data point to one population leads to wrong projections, using observations of a population to judge individuals is equally incorrect.”

He therefore cautioned against analytics without theory, and against the illusion that events such as earthquakes and financial crises can be predicted.

“Models are simple, but realities are complicated,” Zhang said. “Big data can only give us average effects. To do it right, you need a theory, logic and rationale, as well as a mechanism, and a way to verify the results are true.”

Exploring a different angle, Hong Xu, Assistant Professor in ISOM at HKUST Business School, spoke about “Peer Disclosure in Online Social Communities”. In particular, she noted the privacy issues now raised by the posting or sharing of sensitive personal information by “friends” and “non-friends”, often with scant regard for the subject’s wishes or the possible consequences.

As examples, she cited home videos uploaded without consent going viral and instances of photos taken and shared with the clear intention of causing annoyance, embarrassment or worse.

It was therefore important, she said, to find a workable balance between the benefits of online communities, which include entertainment and recognition, and the need to accommodate privacy preferences. Economic modelling and game theory could be used to analyse interactions and “unwanted externalities” and, thereby, create a viable solution.

“Social networks are all about sharing,” Xu said. “But they also have to recognise the problems of peer disclosure - and we don’t see them doing this. So, in our study, we first looked at problems with an economic impact like pollution, smoking and congestion. We then built a model to capture different user behaviour in online communities and address the harm caused by peer disclosure.”

Initial conclusions point to the need for regulation, with suggestions focusing on some form of quota to limit excessive posting plus policies designed to “nudge” users in the right direction. Software can check for sensitive or abusive words. And a slightly longer waiting time before sending would help to curb impulsive instincts.  

“Social networks can nudge users with a warning message if, for example, there are other people in a photo,” Xu said. “If sharing is a little more difficult, it will encourage more regard for others’ privacy and a safer online environment for everyone.”
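The kind of nudge described here could be sketched roughly as follows. This is an illustrative Python outline only, not the model from Xu's study or any real platform's API: the `Post` structure, the word list and the delay values are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical word list and delays, chosen only for illustration.
SENSITIVE_WORDS = {"address", "salary", "diagnosis"}
NORMAL_DELAY_SECONDS = 0
NUDGE_DELAY_SECONDS = 10


@dataclass
class Post:
    author: str
    text: str
    people_in_photo: tuple = ()  # names of people detected or tagged


def nudge(post):
    """Return (warnings, delay_in_seconds) for a post about to be shared."""
    warnings = []

    # Warn if the photo shows anyone other than the person posting it.
    others = [name for name in post.people_in_photo if name != post.author]
    if others:
        warnings.append(
            "Other people appear in this photo: " + ", ".join(others)
        )

    # Warn if the text contains words from the sensitive list.
    words = {word.strip(".,!?;:").lower() for word in post.text.split()}
    flagged = sorted(words & SENSITIVE_WORDS)
    if flagged:
        warnings.append("Possibly sensitive words: " + ", ".join(flagged))

    # Any warning also adds a short wait before the post is sent.
    delay = NUDGE_DELAY_SECONDS if warnings else NORMAL_DELAY_SECONDS
    return warnings, delay


warnings, delay = nudge(
    Post("amy", "Party at the new address!", ("amy", "bob"))
)
for message in warnings:
    print(message)
print(f"wait {delay} seconds before sending")
```

The sketch never blocks a post. It only adds a warning and a short pause, which matches the idea of making sharing “a little more difficult” rather than forbidding it.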

The article was published in SCMP on 29 March.

