Big Data: More than just a number?

We hear a lot about Big Data. But what does it actually mean? Is it, quite simply, lots of data? Or is there more to it than that? Spoiler alert, there is. A lot more. In this episode, we're taking a look at the age of insight, and how Big Data has evolved from a technical concept to a way of extracting enormous value from the fumes of data meant for other purposes.

Louise Blair:
Perhaps people wouldn't think that a data analytics company focusing on vaccines and epidemiology would be looking at the size of poultry farms, but that's another data set that we can bring in to help understand the risk.

Michael Bird:
Big data. Yep. It's become a bit of a buzzword over the last few years, hasn't it? And it can mean a lot of different things depending on who you ask. In essence though, it's taking a lot of information and using it to find insights that you didn't know were there before. And where there are insights, there's value. Sounds pretty simple, right? Well, not exactly, because collecting a whole load of information is one thing. Finding a use for it or at least using it, well that's a different thing altogether. And with humanity generating 2.5 quintillion bytes of data per day, I promise that's a real number, and growing, knowing where to look for the juicy bits takes skill.
So that's what we're going to be looking at today. We're going to be lifting the lid on big data, chucking in some finely chopped information, seasoning it with some expert opinion, and hoping a delightful, insightful stew comes out the other end. You're listening to Technology Untangled, a show which looks at the rapid evolution of technology and unravels the way it's changing our world. I'm your host, Michael Bird. Big data is big business. Between 2018 and 2022, the global big data and business analytics market grew from $169 billion to a predicted $274 billion. What qualifies as big data in the simplest terms has grown too. In 1999, a one gigabyte data set would have been considered big data. Nowadays, data sets can run into petabytes or even exabytes spread across multiple sources and formats. But it's not just about how big your database is, it's what you do with it that counts.

Vedran Podobnik:
Hello, everyone. I'm Vedran Podobnik, Data Practice Lead for UK, Ireland, Middle East and South Africa in HPE. I'm also a university professor of data science at University of [inaudible 00:02:46].

Michael Bird:
What is big data? What do we mean by big data and why is it so important?

Vedran Podobnik:
Around 15 years ago, the most important characteristic to say that the data is big data was volume of data. So basically if you have massive amounts of the data, then this type of data was called big data. This changed a little bit through time and there were different characteristics. First known as 3V, then 5V, which take into account the other aspects of how data can be described. So volume is kind of the foundational one. But then with the time, especially in academia, you always need to write a new paper, meaning you need to basically design the new theory. So then when somebody saw 3V, let me think about additional Vs which can be used. So one of those Vs is velocity, which is connected with how fast the new data is generated. More and more data is basically through streams in the real time coming to data platforms. So the higher velocity of data, the bigger potential of data to be kind of classified as big data.

Michael Bird:
The speed at which we gather information has become a critical component in business over the last 15 to 20 years. Using web cookies and literally smart devices and the internet of things, we can generate absolutely vast amounts of data in near real time and store and retrieve it around the world en masse close to the point where it's needed. Most of us as individuals take data at our fingertips for granted, but it is easy to forget that many organizations, even those with data at their core, are still generating and collecting data over periods of weeks, months, and even years. Glacial pace in the digital age. Some of them, especially those in the public sector such as government departments, are still relying primarily on paper filing. So when someone comes along and turns those decades or even centuries old filing systems on their head, the results can be dramatic.
Heather Savory was the Deputy National Statistician at Britain's Office for National Statistics. She's also worked on big data for the United Nations and currently sits as a non-executive director for the UK Parliament Informational Authority. In short, she knows quite a bit about big data and she spent much of her career transforming big public bodies to take advantage of it.

Heather Savory:
I became director for data capability at the ONS in 2015, and I was charged with doing a full digital and data transformation, and it was a tough gig. It was a newly created director general position because the board had realized that the way our national statistics were being produced needed a transformation. A lot of things were still being done on paper. They had attempted a transformation probably about five years before I joined, which had been a complete unmitigated disaster. And so I went in with a very good team who worked for me, worked directly to me. We actually transformed how the ONS works. So that included the first digital census that you'll have taken part in. That also included building the ONS data science campus, which is one of the things in my career that I am the most proud of.

Michael Bird:
How long has the ONS been around for? They have rooms full of old records.

Heather Savory:
Oh, hundreds of years. There are paper records which I discovered were in a shed somewhere and the roof was leaking and we had to get that fixed. There was all sorts of interesting times at the ONS, but one of the main things really was trying to get statisticians to think differently about how they collected and used data because obviously most statistics is based or was based on actually doing a survey on paper. And surveys are always going to be needed. You're not going to get away from needing surveys to produce national statistics. But there's all these other very rich data sources which are easily available across government and from business which you can start to frame the way you are thinking about your problem solving in a completely different way.

Michael Bird:
With hindsight, it's incredible to think that Britain's Statistics Authority was pretty much paper based until 2015, but that is the power of digital transformation. It really is, well, transformative. By turning paper data into digital data, the ONS was able to compare and contrast it to other data sets and data sources and come up with a whole new way of looking at information. In short, they were able to look for insight among the data and that pretty much gets to the core of what big data means today. But to do that, you need to take advantage of different data sets. So let's not get ahead of ourselves. Here's Vedran again.

Vedran Podobnik:
Third V is variety. So this is about different types of data that exists. Initially, majority of data was structured data stored in databases, big databases, but with the time, we more and more talk about unstructured data as well. We talk about specifically from the big data perspective about the multimedia data, which is huge by definition. So for example, CCTV recordings.

Michael Bird:
Looking at unstructured or rather differently structured information and overlaying it with data that's been shared or made public opens up a whole new world of opportunities. It's something Heather Savory is keen to explore and it can have dramatic real world consequences when different sets of data can be combined to address real human needs.

Heather Savory:
Think of the apps that you have on your phone. Bus checkers. When you're standing at the bus stop in London, you can look on your phone and see where the number 93 bus is. Well, that's because TfL released that data as open data and allowed people to build apps on top of it.

Michael Bird:
Which is really powerful, isn't it?

Heather Savory:
Very powerful. Yes.

Michael Bird:
You can let other people do some of the innovation.

Heather Savory:
And the owners of the data which is collected for a particular purpose are I think generally really surprised by the other things that people do with that open data, and it's not competitive to their core market or their core purpose because it will be something completely tangential. If you are in the unfortunate situation of buying a property at the moment with the way interest rates are going and you want to see whether or not your flat or house or bungalow or caravan, if that's all you can afford, is subject to flooding, you can actually look at the flood database, which is based on this open data from the Environment Agency.

Michael Bird:
So that does raise an interesting question though. If data is open source or has at least been made public, you are kind of taking it on trust that the information you are using is up to date and accurate. You have to take it on trust because it's not your data. You've just been given access to play with it. Here's Vedran again.

Vedran Podobnik:
Then we had the four V and five V theories. The fourth V was veracity. This is explaining how trustworthy data is. So when we talk about data, it's not only about having huge amounts of data, but can we trust this data? And this is specifically when you talk about data from the perspective of data being used by a senior manager in the company. The question is there's a dashboard showing some kind of analysis on the data, but the question is is this data right? Can I trust this data?

Michael Bird:
Bad data, or I suppose counterintuitively the fear that your data could be revealed as bad data, is something Heather Savory has come across often in her career. After all, she used to be the independent chair of the open data group, which advised government departments on releasing their data and research for wider use and the greater public good. And let's be honest, wherever we are in the world, our governments aren't always known for their openness or for not occasionally muddling the figures.

Heather Savory:
So each department has its own data, and despite data sharing legislation, it has proven extremely difficult to actually get departments to share data. People tend to want to sometimes keep their data to themselves because there is no such thing as a perfect data set and people become very concerned that actually the management information that they have is flawed and they don't want people to know that they're making decisions on a day to day basis based on flawed information. Although actually everybody is. Is anybody's data perfect? And we sort of need to think about being more transparent about the real state of our data so that it can be improved. As soon as you open up a data set and let other people look at it, they will identify issues with it that you haven't noticed yourself. And that means you know where the problems are and that actually means you can fix them. But taking that step is a very big step for most people.

Michael Bird:
Let's be honest though, with the best will in the world, no organization's data is perfect. Not that we'd admit it. So I guess we are probably no better than the politicians, are we? And that means there's a question to be asked within organizations about the value or risk depending on your perspective of having your data scrutinized versus the potential enormous opportunity of releasing data and the insight that it could bring. Here's Vedran.

Vedran Podobnik:
The fifth V is value. And for me, with the time, that V is more and more important. Why? Because when we think from the perspective of huge amounts of data which has been generated today in 2022, we are talking about 5,200 zettabytes of data. So it's a huge amount of data which is generated yearly. So if you think just from the perspective you need to store this data somewhere, this is a huge cost of storing the data. So the question is why should someone store data? Individual organization, if you store data, you need to pay some money for it. So if you recognize the value for your data, so I would say specifically for your organizations, more and more they think from the perspective should I store the data or should I delete the data? If I'm storing the data, am I storing the data in hot tier, meaning that I can access it all the time, but this is usually more expensive way to store data, or I'm going to archive it somewhere? It's not deleted, but still you need to pay some money for it.
There is also the term dirty data, which is used quite often today. This describes the data which is stored somewhere but with really low potential of being used in any way for the benefit of the company. This is why it's dirty data. But it's not easy to identify and differentiate between dirty data and really the data that can be valuable. So here the importance of metadata and data governance comes into play and the reality is that still we don't see the right technological solutions which would enable us to do proper large scale metadata management and governance. There are a lot of technologies which can be used for different niche or silo data types, but we are currently missing the overarching unified approach to data governance.
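
Illustrative sketch:
The hot tier versus archive trade-off described above comes down to simple arithmetic. The sketch below is not from the episode: the prices per terabyte and the 500 TB data set are invented purely for illustration, and `monthly_cost` is a hypothetical helper, not any vendor's pricing model.

```python
# Hypothetical prices, in currency units per terabyte per month.
# These numbers are assumptions made up for this example.
HOT_PRICE_PER_TB = 20
ARCHIVE_PRICE_PER_TB = 1


def monthly_cost(hot_tb, archive_tb,
                 hot_price=HOT_PRICE_PER_TB,
                 archive_price=ARCHIVE_PRICE_PER_TB):
    """Return the monthly storage bill for a given hot/archive split."""
    return hot_tb * hot_price + archive_tb * archive_price


# Keep all 500 TB instantly accessible in the hot tier.
everything_hot = monthly_cost(hot_tb=500, archive_tb=0)

# Keep only the 50 TB that is actually used in the hot tier
# and move the remaining 450 TB to the archive.
mostly_archived = monthly_cost(hot_tb=50, archive_tb=450)

print(everything_hot, mostly_archived)
```

Under these made-up prices, archiving the rarely used data cuts the bill substantially, but only if governance and metadata make it possible to tell which data is rarely used in the first place.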

Michael Bird:
In organizations, is there some sort of tug of war between wanting to store as little data as possible because that data has to go somewhere and wanting to store everything or as much as they possibly can because, oh, they might use that at some point in the future?

Vedran Podobnik:
Tension I think is the right word here. So you can always feel the tension between teams who are responsible for platforms and users of the data, right? Because usually users of the data, they are not aware of really the cost of storing the data because the costs correspond with the platform teams. So of course they want to minimize the cost and also minimize the complexity of the systems. It's much easier if you can have the data platform which is sitting on a hundred nodes compared to a data platform which is sitting on a thousand nodes from the perspective of maintenance and the management, while this also means that you can store 10 times less data. So the question is how to find the right balance between those two. And then again, here we come to the question of data management and data governance, understanding really what data is important, what data you need to have, and what data is not going to be useful. And so this is the challenge I would say today when you talk about big data is companies need to think from the perspective I'm going to analyze this data to get insights out of the data. So the question is how companies transform data into value. And for me, if we talk today about big data, this is the most important V today. Value of the data.

Michael Bird:
So value is a question of the potential benefits to our organizations of sharing information and a question of the literal cost of storing the data somewhere. But we humans are pretty ingenious and there are organizations out there which have spotted an opportunity to bridge that gap to collate the vast data of several organizations, much of which is of little value to anyone else, and overlay it with other data and human expertise to do the hard work of generating insight third parties can benefit from.

Louise Blair:
My name's Louise Blair. My job title is a lead analyst at Airfinity and a head of vaccines and epidemiology. Airfinity is a life science predictive analytics company. We gather all data on the clinical pipeline in infectious diseases but also other diseases such as within cardiology as well. So we understand or identify what's going on in the pipeline all the way from discovery and preclinical through the clinical stages to approvals. But then beyond that as well, so the procurement, manufacturing and production of all these candidates. So gathering data from all the different sources that are out there to try and make sense of that in one picture. For example, understanding what is coming through the pipeline for COVID-19 and what impact that can have on the current cases, hospitalizations and deaths that are happening, but also happening in or will happen in the future as well.

Michael Bird:
Airfinity collects and collates vast amounts of data from myriad sources on new medical and pharmaceutical research. It then overlays it with human expertise and other raw data from hospitals and public health bodies to try and garner insight that can then help drug manufacturers, governments and others to strategically plan for the future. They deal with a massive amount of information in a huge number of formats, often siloed or stored in a way where it can't really be easily digested or connected to the outside world. So combining that silo data has huge potential, as Vedran explains.

Vedran Podobnik:
One of the biggest challenges that organizations face today is silo data. Meaning that for different types of data, they're using different technologies and those houses are not connected. And the great thing about the data is that one plus one sometimes does not equal two, but can equal three or four because you can get additional insights, additional value can be extracted from the data if you combine silos. Connected data is very important.
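
Illustrative sketch:
A minimal sketch of the "one plus one can equal three" idea: two silos that say little on their own can flag something once they are joined on a shared key. This is not from the episode. The regions, counts, thresholds, and the `connect` and `flag_hotspots` helpers are all invented for illustration, and this is not an epidemiological model.

```python
# Silo 1 (hypothetical): number of large indoor poultry farms per region.
poultry_farms = {"Region A": 120, "Region B": 15, "Region C": 60}

# Silo 2 (hypothetical): reported outbreaks in birds per region.
bird_outbreaks = {"Region A": 4, "Region C": 0, "Region D": 2}


def connect(silo_a, silo_b):
    """Join two silos on their shared keys (an inner join)."""
    shared = silo_a.keys() & silo_b.keys()
    return {key: (silo_a[key], silo_b[key]) for key in shared}


def flag_hotspots(joined, min_farms=50, min_outbreaks=1):
    """Flag regions with many farms AND at least one outbreak.

    Neither silo can answer this question alone; the thresholds
    are arbitrary assumptions for the example.
    """
    return sorted(
        region
        for region, (farms, outbreaks) in joined.items()
        if farms >= min_farms and outbreaks >= min_outbreaks
    )


joined = connect(poultry_farms, bird_outbreaks)
hotspots = flag_hotspots(joined)
print(hotspots)
```

Regions that appear in only one silo drop out of the inner join, which is itself a reminder of why missing or disconnected data limits what can be concluded.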

Michael Bird:
And connected data has allowed Airfinity to do what it does so well. What does Airfinity do with absolutely vast amounts of data?

Louise Blair:
We're trying to gather it for all sources really. So I think traditionally in science there have been very set places that you would find information, whereas especially within Covid, but more also sort of for other diseases over the past couple of years, that's really expanded. There's a lot of information out there, but it's becoming more difficult to make sense of it all in one place and that's really our key aim.

Michael Bird:
Can you quantify how much data, maybe it is a trade secret, but how much data could one gather around a single drug or a single something like COVID-19 for example?

Louise Blair:
Yeah, it's quite, I guess, hard to quantify exactly in terms of the different diseases or different candidates, but pretty much any bit of information that is out there, we're gathering. So whether that's just the name and the alternative names of a drug to all the press releases, papers, results, efficacy results either from clinical trials, but also all the real world studies that are happening as well. We are gathering information on animal human interaction. So an example of this, trying to understand potential avian flu outbreaks and what could lead to an avian flu pandemic. Looking at countries, looking at their population size, looking at the frequency of interactions that they may have with large groups of animals. So if you think of large indoor poultry farms, if there's a lot of interaction or large farms with a lot of human interaction, there's an increased risk of transmission.
We of course all know about live and wet markets and the potential risk that they can have following the COVID-19 pandemic. So gathering all that data can help us understand hotspots of risk going forward. Now perhaps people wouldn't think that, yeah, a data analytics company focusing on vaccines and epidemiology would be looking at the size of poultry farms, but that's another data set that we can bring in to help understand the risk. And if we didn't have the expertise to understand what are the potential hotspots and contributors to potential transmission from animals to humans, we wouldn't look for that sort of data set.

Michael Bird:
So what Airfinity are doing in this respect is finding new perspectives from what has been dubbed data fumes. I asked Vedran to explain what that actually means.

Vedran Podobnik:
Good question. As with everything in life, and this is really valid for politicians, they always say when you cite them, if it's something that they don't want to be cited on, it's taken in the wrong context. But this is same as with data. So context is very important because same number in different contexts can mean completely different things. So from that perspective, when we talk about data fumes, which you mentioned here, we are talking about the context of putting data in the environment today where we have mobile phones, smart watches, which are on all the time, and if we enable them to basically record how we move, then somewhere in the cloud they can basically store the traces of how we moved the different geolocations. Data fumes is a concept, how you extract the value out of this data. So it enables companies to access digital footprints of the customers and then enables them to analyze the data in a way to create different insights about the users.

Michael Bird:
Why is it called data fumes?

Vedran Podobnik:
It's called data fumes because, and I'll put on my academic hat now again, in academia, basically the measure of success is how many new scientific papers in impactful journals you can write. And then in order to be able to publish in those impactful journals, you need to be a little bit creative and you need to be a little bit provocative, right? The data fumes term was coined in one scientific paper, by one scientist.

Michael Bird:
Fair enough. All right. So the phrase is a bit of a buzzword, but the value it can offer is very real. In fact, understanding data in different contexts was core to Heather Savory's digital transformation of the Office of National Statistics in the UK.

Heather Savory:
A traditional statistician thinks about what they want to know and then commissions a survey to ask certain questions to gather the data that they want. Now what we wanted to do and what we did do was to bring in all these data sources and to turn that on its head so that rather than thinking what data do I need to answer this question, you think what data have I got and what can it tell me? And when you bring this data together, you can actually look at it from multiple different angles and merge different data sets together to answer a whole plethora of questions that you could never answer using surveys. So a good example is during Brexit we produced some very early trade estimates using shipping data and using actually open data on shipping and aircraft. There are actually some early economic indicators based on big data sets and things like pricing data from supermarkets and other stores so that the actual estimates of the economy not only are, I believe, better quality, but more importantly actually they are closer to real time.
Because if you get from one of the major supermarkets a list of who's buying and what it costs, that pricing data, it's the exhaust from them selling the food to us all. So you can do that. One of the really most powerful sources of data is geospatial data. So you can use geospatial data to look at deforestation in the Amazon and you can actually do that. If you look at deforestation across the globe, you can make a really good estimate as to what's happening regarding climate change. You can look at water sources and water supplies.

Michael Bird:
And it's not just in predicting trade patterns or global warming that big data and data fumes can play a huge part. There are times when it can literally be life or death for organizations as well, especially when it comes to cybersecurity. Using insights gleaned from thousands or even millions of interactions and data points can help organizations identify potential attacks and help security teams deal with them. You might remember in our last episode, HSBC's chief cybersecurity architect George Webster spoke to us about protecting the bank from ransomware attacks. Well it turns out data insight can play a huge part in that. So your PhD was on developing methodologies and techniques for large scale security analytics. Can you briefly explain what that means to a lay person like me and just give a bit of a sense of how the field has evolved since you did your PhD and how that's going to continue to evolve? How you think it's going to continue to evolve?

George Webster:
For my PhD, I just wanted to goof off in Germany for a bit. That was really the PhD. No, so the PhD kind of came about and it really came from my frustration with the current affairs and you still see it today, cybersecurity is often very underled and I would argue the vendors have let down the security community. It takes a really long time to change the tools. The goal there was to kind of look at it of how can you start to change it, how do you start leveraging data and insights and start to unlock the human? So don't rely on that tool but unlock that human to be able to perform an investigation and be empowered to make effective decisions. So how is the field really evolving is you're actually seeing the field massively go towards that. We talk a lot about various cybersecurity, but if you see it, the data landscape for instance, Databricks has a cybersecurity capability. A lot of the recent upstarts in the venture capital world, again, you're seeing analytics more and more. You're seeing also a lot of these major corporations starting to stand up data science teams to start to help and augment.

Michael Bird:
And that's a key with big data. It's about giving us humans more insight. Nowhere is the idea that big data is there to enable humans rather than replace them clearer than with groups like Airfinity who are taking enormous and disparate data sets, for example, vaccine trials and livestock supplies, and then tying them together to generate insights. It's not the kind of task an AI is currently suited to.

Louise Blair:
Yeah. I think what was quite striking when I first started at Airfinity is that there weren't many people doing this type of work. So trying to understand the picture as a whole but also with the depth of knowledge within each disease. And I think what sets us apart as well is that we have individuals that are specialists in each area, because I think sometimes, yes, we have all this data and I think there's been quite a few armchair epidemiologists over the course of the pandemic.

Michael Bird:
I was one of them.

Louise Blair:
Sometimes we can get data that is out of context and we don't really understand the impact 'cause we don't have that understanding of how perhaps the vaccine manufacturer works and how that can then have an impact on epidemiology. I would say, yeah, over the past couple of years we are trying to work very differently to make that picture a lot clearer in terms of future impact in the vaccine space. And we've been working with not only governments but biotechs and pharma to understand that race of vaccine. So what's coming on the market, how quickly or how likely they are to progress to the next stage and to approval to really shape their vaccine portfolios.

Michael Bird:
So are there things that we can do today that we maybe couldn't have done, say, like a decade ago?

Louise Blair:
Yeah, definitely. I think just in terms of the volume of data that is out there and being able to collect this data all in one place and then have specialists really analyze it. So we may previously have been able to get an overview of what a single company is doing in the UK, but we are now scanning that across the world. But I think it's the interplay because we are able to process data in a very different way now than 10 years ago. We can look at the broader picture rather than just focusing on the pipeline or just focusing on the epidemiology. We're able to put that all in one place and gather that in a way that we haven't been able to before.

Michael Bird:
If there was another COVID-19 style outbreak in the next, I don't know, five years or so, let's really hope not, what would you and Airfinity do differently based on what was learned during the pandemic and all of the technological advances that have happened in the last few years?

Louise Blair:
So I think, first of all, it's just the horizon scanning of what's happening worldwide. Yes, we hope that a pandemic like COVID-19 won't happen within that timeframe, but there are outbreaks of different diseases happening all the time. For example, the Ebola outbreak in Uganda at the moment. And it's being able to monitor each of these outbreaks, understand what the countermeasures are. So are there vaccines available, what type of lockdowns or restrictions of movement actually work for each type of disease, and how that all interplays. We can get that information to people on the ground or governments or pharmaceutical bodies all at the same time, sort of very rapidly in comparison to what we were able to do maybe two, three years ago. So we are able to advise a lot quicker. We're able to act on any outbreaks a lot quicker and therefore could help stop an outbreak that could lead to a pandemic like COVID-19 or a similar situation. But we need that horizon scanning to continue. We need that data to continue in order to help with that process.

Michael Bird:
HSBC's George Webster agrees that humans are still at the core of gathering insight, even in the groundbreaking world of cybersecurity, where machine learning is taking off and you'd expect most cases to be handled by software and algorithms. The role of AI within big data isn't to advise based on wide disparate data sets. Instead it's there to assist the human by filtering out noise whilst the human provides the insight or helping them execute their decisions and strategies.

George Webster:
What is AI? In many ways, there's nothing really intelligent about these AI systems. When you talk about machine learning for instance, it's just rapid pattern matching. Nothing AI about it. It's just how do you do pattern matching? You look at SMT, you're basically just finding the shortest path or solving the equation. That doesn't mean those patterns in these AI-like methods don't have massive bang and huge significant benefits, but it's kind of hard in the cyber space. How do you find a pattern when the pattern's constantly changing? I don't know where the threat's going to come from. No one ever will. We can't predict the future. So it becomes very difficult and very challenging. But what you do see is a lot of these AI methods and techniques start to come into play for how do you start accelerating and processing. So MapReduce for instance, an algorithm that allows you to distribute your execution, well that's often in the AI world, it's heavily used. Now you're seeing the same thing. We have huge, huge volumes of data. So able to [inaudible 00:34:41], we're not just using a reverse index, if you will, or a search engine. Now you're starting to see us use techniques like MapReduce to do the work.
That's a really, really complicated and long answer. But the sum is you are seeing AI being leveraged and you are seeing it being used but not in the traditional sense. You're seeing a lot more of them starting to use those techniques to start to incorporate more data driven methods.
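
Illustrative sketch:
The idea of distributing execution over huge volumes of data, generally known as MapReduce, can be sketched in a few lines. This is not from the episode and not how any particular bank does it: the log records, the `login_failed` event name, and the function names are invented for illustration, and everything runs in a single process here, whereas a real system would send each chunk to a different machine.

```python
from collections import defaultdict

# Hypothetical security log, already split into chunks.
# In a real deployment each chunk would live on a different node.
LOG_CHUNKS = [
    [("10.0.0.1", "login_failed"), ("10.0.0.2", "login_ok")],
    [("10.0.0.1", "login_failed"), ("10.0.0.3", "login_failed")],
    [("10.0.0.1", "login_failed"), ("10.0.0.2", "login_ok")],
]


def map_phase(chunk):
    """Map: emit (source_ip, 1) for every failed login in one chunk.

    Each call only looks at its own chunk, so calls are independent
    and could run in parallel.
    """
    return [(ip, 1) for ip, event in chunk if event == "login_failed"]


def shuffle_phase(mapped_chunks):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def failed_logins_by_ip(chunks):
    mapped = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(shuffle_phase(mapped))


counts = failed_logins_by_ip(LOG_CHUNKS)
print(counts)
```

The output is only a summary of failed logins per address. Deciding whether three failures from one address is an attack or a forgotten password is still left to the human investigator, which is the point being made above.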

Michael Bird:
Fascinating stuff. But here's the thing, and it's one of the big arguments of big data. These organizations are using our data. Governments across the world take information on us and our spending habits and our lifestyles and they compare it with information taken from other government departments or public information on everything from flooding to bus times and then they create insight. Our banks use our interactions to learn whether we are suspicious or dangerous actors. Flight patterns can be used to predict disease transmission. Now our names might not be attached to that information, but that information is ours. We made it. So does that raise any concerns about how it might be used? Well, Heather doesn't seem to think so. Certainly not at a state level.

Heather Savory:
The way I like to think about it is like this. So we are part of society and society is run by government. And in order for the government to work out what it needs to do, it needs to know that we exist. That's why we do the census. It needs to know where we live in aggregate so it knows where to build houses, hospitals, and schools. And you need to be part of this big data as an individual in order to get the things that we all then complain about. If you're not even on record as part of a demographic, then there's absolutely no way that anybody can even try and get it right in government. So that's one thing. And the second thing is I think that there's so much more information available to people now. The things that we've talked about, the apps that you can get, the things you can look up, the things you can do.
The real challenge for us in our society is really now around digital exclusion because there is so much that's going online. And one of the things I have to say that I am the most proud of, which we managed to do at ONS, was actually get the systems in place which have enabled the day-to-day Covid infection reporting, which everybody saw on the news and took for granted. So if you think back 10 years and there was a pandemic of that nature, there would be no way that you could actually find out what was happening across the country. There would be no way that you would know what concentration of sick people there was in your local area. So we are all actually empowered by the good use of big data.

Michael Bird:
Yeah, I think I found that actually that was probably the first time that I'd really seen government data so overtly accessible. And exactly what you said, you could go through the layers to your local area and see how many infections were in basically your tiny little ward or whatever it would've been. And actually I think that was really fascinating that you could get down to that level, and then you switched on the TV and they were showing graphs and basically explaining data to the general public. And I thought that was probably the first time that sort of thing had happened: that a government representative has gone on national television and shown a graph and said this is our decision and this is why we've made that decision. And I thought that was quite interesting because government can sometimes feel a little bit opaque with the way that they make decisions. So I thought that was quite fascinating.

Heather Savory:
And really I think that with my member of the public hat on, I would say that government should be doing more of that. Yeah, I agree.

Michael Bird:
Vedran has his own thoughts on the privacy argument.

Vedran Podobnik:
This is not an easy question. It's not an easy question because if you constrain yourself too much, either through regulation or self-regulation or not wanting to disclose too much information, this means that you are also constraining yourself from the perspective of what value you can get out of the data. So there should be a balance. Because if the balance tips so that you are disclosing too much, this means that information can be misused. But if you are constraining too much, then you are also in a way blocking innovation. And you are also not helping the data scientists and data analysts to build new systems which are going to, in the end, make our lives better, make our society better.
Because if you want to have technology with the capability to predict whether there is a big potential that [inaudible 00:40:30] is going to happen tomorrow, you need to share a lot of your personal data about your heart rate, but also the other kinds of sensor data which are collected from your body. Also contextual information about potentially where you went. So it's, again, a balance between the technology and what an individual, as a user, wants to do with it, and whether they are comfortable with sharing the data with technology organizations. I'm not fond of being too strict with the regulation. But again, it's clear if we take a look at what has been happening with social networks and social network companies in the last few years, and what is more and more happening today with AI, which is being more and more regulated as it's becoming a mainstream technology. It's important that, firstly, people who are working with technology are aware of everything that can go wrong. Sometimes people are just not aware how data can be misused, or if they're building some kind of AI system, how those can be misused, and then the regulation comes afterwards. I would say we need to work more on enabling people to be aware of everything that can be achieved, to be aware of all the uses of technology and data that are possible, before we are really preventing and regulating things in a way that we are kind of blocking innovation.

Michael Bird:
So big data, it is big business, and once we get over the challenges of what to store, how to analyze it efficiently, and regulation where it's needed, it'll become even more of a game changer in improving our lives with everything from better traffic flow to better health outcomes and better business efficiency. In fact, that's already happening. It's just going to happen a lot more. So I guess the final question is where will we be in 10 years' time? What will the ever-evolving science of big data mean then? Here's Heather to see us out.

Heather Savory:
Oh, crystal ball.

Michael Bird:
Yeah. This is crystal ball time.

Heather Savory:
Crystal ball question.

Michael Bird:
We'll play this back to you in 10 years.

Heather Savory:
Okay. Well, I think people will be much more used to the concept of the better use of data, the use of big data. I think that the real challenge that the world faces is around regulation. There is an absolutely massive challenge which I encountered at the UN. I tried to get countries to share data with a common purpose for the common good to measure the sustainable development goals better. And to a certain extent, they are sharing it better. But the real problem is that you need global solutions. No country is isolated anymore. So it will be very interesting to see in 10 years' time how the world has changed with respect to where the data and information power base lies. Is it still going to be Google? Is it still going to be in the US? Is it going to be elsewhere?
We in the West aren't entirely happy with some of the things that the Chinese do in terms of their use of data, particularly surveillance of their own population. But taking a step back, if you think about being not in small towns, not in the countryside, soon as you get on a train, even in rural areas, you are on camera here. We are not as far away from that as people might like to think. And I would be interested to see whether or not people become more or less worried about that. Because going back to what I said before, to be part of a society, you have to have a digital footprint. And increasingly we have as much or more digital footprint than we do physical.

Michael Bird:
So Big Brother may not be watching you, but big data might. It's all right, though. He's probably friendly.
You've been listening to Technology Untangled. I'm your host, Michael Bird, and a huge thanks to Vedran Podobnik, Heather Savory, Louise Blair, and George Webster. You can find more information on today's episode in the show notes. This is the eighth episode in the third series of Technology Untangled. In the next episode, we'll be looking at the year that was 2022 and the challenges faced by global organizations. Be sure to subscribe on your podcast app of choice. You do not want to miss out on this episode. And of course you can catch up on the last three series. Today's episode was written and produced by Sam Data and me, Michael Bird, with sound design and editing by Alex Bennett and production support from Harry Morton, Alicia Kempson, Allison Paisley, Alex Podmore, and Ed Everston. Technology Untangled is a Lower Street production for Hewlett Packard Enterprise.
