Join the discussion: Ask Katharina and Jordan questions about AI privacy and governance
Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineup
Connect with Katharina: LinkedIn Profile
In today's data-driven world, businesses are increasingly leveraging the power of artificial intelligence (AI) to enhance operations and gain a competitive edge. However, with advancements in AI come concerns about data privacy and governance. As a business owner or decision-maker, it's crucial to understand AI privacy principles and regulations to protect your organization and maintain consumer trust. In this article, we will explore the key concepts discussed in a recent episode of the Everyday AI podcast and their implications for your business.
Setting the Context - AI Governance and Data Privacy:
AI governance refers to the frameworks and processes that ensure ethical, responsible, and secure use of AI systems. Data privacy, on the other hand, focuses on safeguarding personal information and sensitive data. These two concepts are intricately connected, as AI systems often rely on vast amounts of data, raising concerns about potential privacy breaches.
Key Privacy Principles for AI Systems:
1. Data Quality: AI systems must ensure accurate and reliable data usage. For instance, training a machine learning model to recognize fruits requires data free from errors to be effective.
2. Data Collection Limitation: Adopt the principle of data minimization by only collecting necessary data for specific tasks. Avoid gathering excessive information unrelated to the intended purpose.
3. Purpose Specification: Personal data collected and processed by your organization should adhere to the principle of using it solely for the stated purpose. For example, customer emails collected for a gaming app should not be used for unrelated advertising purposes.
4. Use Limitation: Obtain consent from individuals before using their data for purposes beyond the original intent. Unapproved use of data can violate privacy regulations and erode consumer trust.
5. Transparency: Embrace transparency by openly communicating about the data you collect, why you collect it, and how it is used. Transparent practices build trust with customers and demonstrate accountability.
6. Accountability: Establish clear roles and responsibilities within your organization to ensure someone is accountable for data privacy breaches and AI system performance.
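To make the data minimization, purpose specification, and use limitation principles more concrete, here is a minimal Python sketch. The purpose names, field names, and both helper functions are hypothetical and chosen only for illustration; a real system would derive them from its own data inventory and consent records.

```python
# Hypothetical mapping from a declared purpose to the fields that purpose needs.
ALLOWED_FIELDS = {
    "step_counting": {"user_id", "step_count"},
    "game_updates": {"user_id", "email"},
}


def minimize_record(record: dict, purpose: str) -> dict:
    """Data minimization: keep only the fields needed for the declared purpose."""
    if purpose not in ALLOWED_FIELDS:
        raise ValueError(f"No declared purpose named {purpose!r}")
    allowed = ALLOWED_FIELDS[purpose]
    return {key: value for key, value in record.items() if key in allowed}


def is_use_allowed(collected_for: str, requested_use: str, has_consent: bool = False) -> bool:
    """Purpose/use limitation: a new use needs to match the original purpose or have consent."""
    return requested_use == collected_for or has_consent


raw = {"user_id": 7, "step_count": 8200, "location": "52.52,13.40"}
stored = minimize_record(raw, "step_counting")   # location is dropped before storage
can_advertise = is_use_allowed("game_updates", "advertising")  # no consent given
```

In this sketch the location field never reaches storage for a step-counting purpose, and reusing a gaming-app email for advertising is refused unless consent is recorded.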
Navigating International Regulations:
When deploying AI systems, it's essential to understand how data protection regulations differ worldwide. For instance, in the United States, publicly available information is generally usable by businesses. In Europe, however, personal data remains protected by the GDPR even when it is publicly accessible, so using it requires a valid reason and, where needed, permission.
Considerations for AI Language Models:
As a business owner, it is crucial to prioritize AI privacy and governance to protect your organization and maintain customer trust:
1. Establish Data Governance Policies: Develop clear policies and procedures for handling personal data, adhering to privacy principles discussed earlier.
2. Conduct Privacy Impact Assessments: Regularly assess the privacy risks associated with your AI systems, identifying and addressing potential vulnerabilities or compliance gaps.
3. Implement Data Protection Measures: Employ robust security measures to protect data throughout its lifecycle, including encryption, access controls, and regular audits.
4. Stay Informed: Keep up-to-date with evolving privacy regulations and guidelines relevant to your industry and geographical location.
5. Collaborate with Experts: Seek guidance from AI privacy and governance experts, legal professionals, or data protection officers to ensure compliance.
In an era where AI is becoming an integral part of business operations, it is imperative to prioritize data privacy and governance. Adhering to privacy principles, understanding international regulations, and implementing effective data protection measures will not only safeguard your business but also foster trust among your customers. By prioritizing responsible AI practices, you can leverage the power of AI ethically and ensure long-term success in the dynamic world of artificial intelligence.
- Importance of data privacy and concerns about AI tools
- Announcement of Google's new way of watermarking AI images
- Microsoft president's warning about the need for human control in AI
- Release of the enterprise version of ChatGPT by OpenAI
- Introduction of guest expert Katharina Koerner from the Tech Diplomacy Network
- General explanation of AI governance and privacy
- Globally accepted privacy principles and their application to AI systems
- Specific issues related to the origin of data used by large language models
- Differences in regulations around web scraping between the US and Europe
Jordan Wilson [00:00:19]:
Data privacy and AI aren't always two things that go hand in hand. It's something that I think so many businesses and individuals are worried about when it comes to using different generative AI tools: what happens with my data? We're gonna tackle that today with an expert in AI privacy who I'm very excited to have on the show. So if you're joining us live, please take part in the conversation, ask some questions. I think it's gonna be a very informative conversation. My name is Jordan Wilson, and this is Everyday AI. It's a daily livestream podcast and free daily newsletter helping everyday people like me, like you, not just make sense of what's going on in the world of AI, but how we can actually make it work for us.
Daily AI news
Alright. So before we get into our discussion on data privacy and AI governance, let's go ahead and go over what's happening in the world of AI news. So we actually have a couple of things related to privacy and governance. Starting off, Google just announced a new way that they are going to be watermarking AI images. So they obviously announced this a couple months back at their annual conference, but it was just released now, you know, within the last couple of hours, by DeepMind, which is essentially Google's AI specialty arm, so to speak. So this is a DeepMind product, and it's called SynthID, and it will watermark and label images that have been created with AI. So it's gonna be interesting to see how that one plays out.
Alright. Next, the Microsoft president just said that AI needs human control to be safe. So in an exclusive interview with CNBC, Microsoft president Brad Smith warned, and I quote, that AI has the potential to become both a tool and a weapon. So this is obviously something we've heard about before, but if you want to check out more on what Brad Smith said, make sure to check out that section in the newsletter.
Alright. And our third news story of the day. It might be last, but definitely not least. So ChatGPT Enterprise has been released. This was announced months ago, and we covered it on the show when it was announced. But OpenAI has finally released the enterprise version of ChatGPT, a much more locked-down version, kind of geared more toward data privacy. So this is something that OpenAI has been working on for many months. I don't even know what all of these terms mean. It's SOC 2 compliant, that's a certain level of compliance with data, I believe. A couple other details about this ChatGPT Enterprise: not everything's been released. So at least right now, there's no price tag on it yet, but we do know it will be faster, with no caps or limits on GPT-4, and longer context windows as well. So, exciting news on the enterprise front, because I know that so many companies are a little hesitant with their data, and they don't necessarily want to hand it over to OpenAI or some of these larger companies.
So what a great transition piece for our expert for today. We're gonna be talking AI privacy and governance. So I'm extremely excited to bring on our guest for today. Please welcome to the show Katharina Koerner from the Tech Diplomacy Network. Katharina, thank you for joining us.
Katharina Koerner [00:04:07]:
Thank you for inviting me. Great to finish out.
Jordan Wilson [00:04:11]:
Absolutely. This is gonna be a fun one. This is such a topic that I think a lot of people are talking about, but, Katharina, if you could, just start us off very general, kind of break down this complex world of AI governance and privacy, just so that the everyday person can understand what AI governance and privacy even mean to the rest of us.
Understanding AI governance and privacy
Katharina Koerner [00:04:36]:
Thank you. Thank you. Super important question. So I thought I would start by mentioning that we have some globally accepted privacy principles, which are embedded in many, many privacy and data protection regulations around the world, which are, by the way, popping up here and everywhere, just getting more and more. And of course, all of those privacy principles also apply to AI systems in case they process personal data. So which are those? First of all, we have data quality, for example. That means making sure that the data used by AI systems is accurate and reliable. So for example, if you're teaching a robot to recognize different kinds of fruits, and the data it learns from is full of mistakes, for example calling apples oranges, that wouldn't be very helpful. The second one is data collection limitation. Data minimization is a very important principle: always use just the data you really need to collect for a specific task and do not gather extra information. So for example, if you have a fitness app on your phone and it's asking for your location, but it doesn't need that to count your steps, it's not following this principle. Then we have the very important principle of purpose specification. That means the data you collect and process as an organization should only be used for the purpose it was collected for. So let's say you sign up for a gaming app. The email you used for that gaming app, so it can send you game-related stuff, should of course not be used to advertise other products. Then we have use limitation. This is meant in the way that you should not use data for anything other than the original purpose without permission. So say you have photos on your social media platform, and all of a sudden those photos are used for something else. That would be a breach of this principle. And then what is super important, and I think a lot of people are very aware of this, is transparency.
This means being open about what data is collected, why it's collected, and how it's used. Let's say we order something in a restaurant or we have a recipe. Of course we want to know which ingredients are used; maybe I'm allergic to something. So you trust it more if you know what was used. And then lastly, I want to mention accountability. If something goes wrong, there should be someone responsible. So those are some privacy principles that also apply to AI systems. And of course, as expected, there are some particular issues on top of that when it comes to AI. So for example, a big issue is the origin of the data that large language models use, the basis for services just like ChatGPT, as you mentioned. Here, usually, the data is crawled from the internet, and we have different regulations around the world. It's pretty complex, the world of data scraping, I mean, the regulations about web scraping. So in the United States, in general, the law says if information is freely available to the public, businesses can use it. So if you wrote something on a poster on a public bulletin board, anyone can read it, anyone can use it. Except when the website says you can't take the information, or if it's behind a paywall or behind your credentials, then you shouldn't. I mean, you're not allowed to. But in Europe, on the other hand, any personal information, whether it's public or not, is still considered personal information. It is protected by the GDPR. So if someone wants to use personal data, they need a good reason and permission. It's like borrowing a book from the library: if you don't ask, it's not allowed. So that's something that is very, very relevant in the context of AI and LLMs. And, I mean, I could go on and on and on, and I don't want to take over your show.
How to balance using AI but keeping privacy
Jordan Wilson [00:08:50]:
No. No. I mean, I think just right there, we hit on so many different points that are so important, you know, making sure data is accurate and reliable, only using it for the correct purposes, which I think is important, but also, you know, transparency and understanding that different countries have different rules and regulations. I think it's important because, you know, even here on the show, a lot of times we're talking about how things impact the US. But as you see here, Maybrit is joining us from Europe. Val, right here with a comment, is joining us from South America. So we do have people tuning in from all over. Bronwyn is joining us from South Africa. So thank you all for joining, and thank you for your comments. But one thing, Katharina, that I wanted to talk a little bit more about is how companies, or even individuals, or just us as a society, strike a balance when leveraging AI's capabilities. Because we always hear about it, and all of the things that generative AI can do or make are so exciting. So how do we strike that balance between AI's capabilities and making sure data remains how it should, which is private, or only the data that we are wanting a company to collect for its purposes? So how do we strike that balance?
Katharina Koerner [00:10:17]:
Well, that's a very, very good question to ask. And I don't know if that goes too far from the main topic of your show, but I have done a lot of work in research on privacy-enhancing technologies, so that's usually the answer I give. A whole new field has emerged, with a vibrant startup community. Google is using it. Microsoft is implementing it in their products. It's something that is called privacy-preserving machine learning, and there's a huge research ecosystem as well. Because it's true that with traditional privacy protections, often the utility of the data decreases. They might even decrease the utility of the data while not even protecting the data very well, which was the first intention. One example is the protection of personal health information under HIPAA. It's a classic example. That's the US Health Insurance Portability and Accountability Act. There are a lot of entities covered under HIPAA, so if you process personal health information, you have to comply with this law. It's a federal law in the United States. And under HIPAA, personal health information should be de-identified to protect patient privacy, to support research, reduce risk, and promote data sharing between hospitals so that we can have better insights from this personal health information. One accepted method for this is the so-called Safe Harbor method, and that involves removing 18 direct identifiers from the data, like name, Social Security number, medical record numbers, etcetera. So that law is from the nineties. And they really put this into law, even though usually law is tech neutral, so they phrase it in a way that, you know, it can go with time. But HIPAA said: strip those 18 identifiers from personal health information, then you're more or less good to go.
But in fact, you strip so many identifiers for the sake of privacy protection that you lose a lot of information. Plus, it is not even good privacy protection, because there are many re-identification attacks that can be very successfully conducted on this HIPAA-protected health data.
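The two points made here, stripping direct identifiers and the remaining re-identification risk, can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the field names are hypothetical, and the identifier set below contains just the three examples named in the conversation, not the full list of 18 Safe Harbor categories.

```python
from collections import Counter

# Illustrative subset only: the three identifiers named in the conversation.
# The actual Safe Harbor method covers 18 categories of identifiers.
DIRECT_IDENTIFIERS = {"name", "ssn", "medical_record_number"}


def strip_direct_identifiers(record: dict) -> dict:
    """Return a copy of the record without the listed direct identifiers."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


def count_unique_records(records: list[dict], quasi_identifiers: list[str]) -> int:
    """Count records whose combination of remaining attributes is unique.

    A unique combination is what a linkage-style re-identification attack
    could match against another data source.
    """
    combos = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return sum(1 for r in records if combos[tuple(r[q] for q in quasi_identifiers)] == 1)


# Hypothetical patient records, invented for this example.
patients = [
    {"name": "A", "ssn": "111", "medical_record_number": "m1", "birth_year": 1980, "region": "north"},
    {"name": "B", "ssn": "222", "medical_record_number": "m2", "birth_year": 1980, "region": "north"},
    {"name": "C", "ssn": "333", "medical_record_number": "m3", "birth_year": 1951, "region": "south"},
]
deidentified = [strip_direct_identifiers(p) for p in patients]
still_unique = count_unique_records(deidentified, ["birth_year", "region"])
```

Even after the names and numbers are gone, one of the three hypothetical records remains unique on birth year and region, which is the kind of residue that re-identification attacks exploit.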
Privacy-enhancing technologies for secure data sharing
And so now here is this new ecosystem, this set of new technologies that have emerged over the last couple of years, that can unlock the valuable insights from the data while protecting privacy. To mention a few of them: differential privacy is kind of like randomized responses in a data set, so you cannot say with 100% certainty whether your data is in this data set or not. You can mathematically prove that it's not possible to tell if your data was in the data set and contributed to the output, yes or no. Then we have synthetic data. You build a new dataset that mimics the patterns from the original dataset, and then you can use it. Or there's something called, I think it's a very poetic name, homomorphic encryption. It's on a maturity curve, it's going up, but it means that you encrypt data and you can still process it and conduct computation on this encrypted data without decrypting it. Pretty similar are trusted execution environments. It's a hardware solution where you have a secure enclave, and the encrypted data goes in, and only inside is it decrypted and processed and computed upon, and then it leaves the enclave in an encrypted way. Or we have secure multiparty computation, meaning you can have a common computation and output, but you will never know what the input actually was. A classic example is that you can conduct an analysis of incomes, what's the average income, for example, or what's the disparity between female and male incomes, but you will not see what the actual input was. So those are some of those privacy-enhancing technologies, and they are politically supported globally. So we have the US National Strategy to Advance Privacy-Preserving Data Sharing and Analytics from early this year. I wrote an article somewhere, maybe we can post it or whatever.
There's so much policy support for those PETs because they're so promising, because some of them, as I mentioned, even protect data in use. So we do know, you know, protection of data at rest, securing data when it's stored, that's state of the art. When it's sent, you protect data in transit. But some of those new technologies can also protect data while in use, and that's a pretty big thing. So we will get there. And, of course, like, yeah, please.
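Two of the techniques mentioned above can be sketched in a few lines of Python. This is a toy illustration, not a production implementation: randomized response stands in for the differential privacy idea, and additive secret sharing stands in for secure multiparty computation of an average income. All function names are hypothetical, everything runs in one process rather than across real parties, and the seeded `random.Random` is used only to keep the example reproducible (a real deployment would need a cryptographically secure source of randomness).

```python
import random


def randomized_response(truth: bool, rng: random.Random, p_truth: float = 0.5) -> bool:
    """Answer truthfully with probability p_truth, otherwise answer with a fair coin flip.

    With p_truth = 0.5 a "yes" is reported with probability 0.75 if true and
    0.25 if false, so any single answer stays deniable.
    """
    if rng.random() < p_truth:
        return truth
    return rng.random() < 0.5


def estimate_true_rate(responses: list[bool], p_truth: float = 0.5) -> float:
    """Undo the known noise: observed = p_truth * true_rate + (1 - p_truth) * 0.5."""
    observed = sum(responses) / len(responses)
    return (observed - (1 - p_truth) * 0.5) / p_truth


MODULUS = 2**61 - 1  # all share arithmetic is done modulo this number


def make_shares(value: int, n_parties: int, rng: random.Random) -> list[int]:
    """Split a value into n random-looking shares that add up to it modulo MODULUS."""
    shares = [rng.randrange(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(values: list[int], rng: random.Random) -> int:
    """Simulate parties summing their inputs while only ever exchanging shares."""
    n = len(values)
    # shares[i][j] is the share that party i would send to party j.
    shares = [make_shares(v, n, rng) for v in values]
    # Each party j publishes only the sum of the shares it received.
    partial_sums = [sum(shares[i][j] for i in range(n)) % MODULUS for j in range(n)]
    return sum(partial_sums) % MODULUS


rng = random.Random(0)
incomes = [50_000, 62_000, 48_000]            # hypothetical private inputs
average_income = secure_sum(incomes, rng) / len(incomes)

truths = [True] * 3_000 + [False] * 7_000     # hypothetical sensitive yes/no answers
noisy = [randomized_response(t, rng) for t in truths]
estimated_rate = estimate_true_rate(noisy)    # close to 0.3, without trusting any single answer
```

In the income example, no party's partial sum reveals an individual income, yet the combined total is exact. In the survey example, the aggregate rate can be estimated even though every individual answer is noisy.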
Jordan Wilson [00:15:26]:
Wow. No. I'm just saying, y'all, I talk about AI every single day, and I don't know about you all listening and tuning in, but I am getting a first-class education right now from Katharina. And I have so many notes. And if you're like me and you can't keep up with everything, you know, I'm trying to Google things, I'm writing things down, don't worry. We are gonna be sharing everything in the newsletter. But I do have one more big question, I think, Katharina, before maybe we can get to a comment or two. So if you do have a question, make sure to get it in now. But I wanna talk about, you know, because you kind of talked about how data privacy and AI governance are so different in different countries, one thing that I think is really gonna be on people's minds, especially here in the US, is AI's impact on democracy, right? Because I think especially here in the US, we've already seen different political groups using AI in ways it was probably not intended for, to make it seem like a political opponent said something or did something they did not do. So, I mean, can you just talk quickly about the risks of AI for democracy and how it might influence public opinion, elections, and even just how we understand democracy?
AI's impact on democracy
Katharina Koerner [00:16:57]:
Yeah. I mean, that's a big concern for sure. And with the help of AI and machine learning, you for sure can aim or try to influence public opinion and elections. So first of all, with election campaigns, for example, you can target specific groups with tailored messages. I mean, that's nothing new, but because you can analyze this vast amount of data in such a great way, you can identify potential swing voters or craft messages to really influence their decisions. Microtargeting. So that all leads to manipulation. Manipulation can also be achieved by, you know, AI-generated deepfake videos. We already had examples where videos were made as if certain people had made those videos, and that was not those people. You have filter bubbles, more or less. So AI algorithms can personalize content, showing individuals information that only aligns with their existing beliefs. We have, of course, security risks. That's a big thing. So AI can be used by malicious actors to hack into election systems, manipulate voter registration data, disrupt the voting process itself. And transparency and accountability, I think that's also a big topic in this regard. So the use of AI in elections and public discourse raises the questions: who is responsible for the AI algorithms that influence public opinion? How can we ensure, also in this context, that AI is used ethically and responsibly in the political sphere? And this is why in general, also with, you know, I'm sure you talked about hallucinations by LLMs or services like ChatGPT, like wrong outputs, I think that education is really so important. I don't know exactly how that works, how AI education will find its way into, you know, our public education system. But I recently started an initiative just here for the Bay Area.
So any school in the Bay Area that wants me to come and explain AI to them, to the kids, I'm happy to do that, because I think that's just so important. Just as not every single book, just because it's a book, is a good book. Right? What is in that book is not always right just because it's a book. And the same is true with actually anything that is on the internet and any output by any system. You can never 100% trust it. You should always, you know, use your own, I don't know, common sense, or verify it with different sources. So we have to raise this awareness. We are still ourselves responsible for the content that we get from the internet and from really super cool applications like ChatGPT, etcetera.
Jordan Wilson [00:19:46]:
Yeah. Absolutely. And I couldn't agree more with you, Katharina, about the need for better AI education in the school systems. Absolutely. Because it seems like so many schools, and I just had an episode on this recently, so many schools are shutting it down or not allowing students to use it when, personally, I think that's not a good idea. But, okay, so we've covered so many things, but I do hope we can get a question in there too from our audience. So, Nancy is saying, good morning, glad to be back on the show. Great to have you, Nancy. Cecilia is saying data is indeed one of the most important assets to manage, right after our relationships and strategies. Couldn't agree more. So a question here, if you wanted to take this one. Ben says thanks for the comments, and he's asking: is it realistic to expect AI models to be fully transparent and explainable? I think that's a great question. What would you respond on that one, Katharina?
Can AI models be fully transparent?
Katharina Koerner [00:20:48]:
Yes. Yes. Yes. Very good question. So I think for some models or some applications, like white-box applications, it is possible. I mean, the majority of AI is machine learning. So if we have a simple, I don't know, recommender system, and the model is a decision tree. I mean, it's really a decision tree. Like, I don't know, apples or oranges, and then does it have a worm or not, or whatever. So in the end, you have a healthy apple or whatever. This is an explainable model. It already gets a little bit more complicated even if you have a simple model that is still tricky, like a random forest, where we have hundreds of decision trees, and then you combine or compare the outputs of those and you have one model. But of course, we have other models, and there are many approaches trying to solve the explainability issue, but I do not see that the issue is solved. So I think in the future, or actually already right now, we will see more specific applications, I mean, sector-specific applications of AI, so that models are built from the outset with explainability in mind. I recently came across one company focusing only on lending decisions and really having this approach from the very beginning, and this is a very natural or, you know, foreseeable development, I would say, because we have those principles like privacy by design, so you have to build privacy into the whole architecture from the design phase on, or we have security by design. So now we have responsible AI by design. So you have to build them that way from the beginning. I think with a lot of models that have already been built, you will probably not achieve full transparency and explainability. So I'm totally with you here.
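The contrast between a single decision tree and a forest of trees can be sketched in plain Python. This is a hand-written toy, assuming hypothetical fruit attributes (`color`, `has_worm`); it is not a trained model and does not use any machine learning library.

```python
from collections import Counter


def classify_fruit(fruit: dict) -> tuple[str, list[str]]:
    """A tiny hand-written decision tree that returns a label and the path it took."""
    path = []
    if fruit["color"] == "orange":
        path.append("color == orange")
        return "orange", path
    path.append("color != orange")
    if fruit["has_worm"]:
        path.append("has_worm == True")
        return "bad apple", path
    path.append("has_worm == False")
    return "healthy apple", path


def forest_vote(trees: list, fruit: dict) -> tuple[str, list[str]]:
    """Majority vote over several trees: the label is clear, the reasoning is spread out."""
    votes = [tree(fruit)[0] for tree in trees]
    winner = Counter(votes).most_common(1)[0][0]
    return winner, votes


label, explanation = classify_fruit({"color": "red", "has_worm": False})
# explanation is the full, human-readable reason for this one prediction

forest_label, all_votes = forest_vote([classify_fruit] * 3, {"color": "red", "has_worm": False})
```

For the single tree, the returned path is the complete explanation. For the forest, explaining one prediction would mean reading every tree's path, which becomes impractical with hundreds of trees.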
Are certain countries ahead in AI governance?
Jordan Wilson [00:22:55]:
Wow. What a great response, tackling that from all possible angles. So I think we have time for hopefully one more question here, Katharina. Maybrit is asking, because you talked already about how different countries are handling privacy differently. So Maybrit's question is: do you find a specific country or area more effective in their AI privacy and AI governance? Because, obviously, the EU and some other countries throughout the world govern their AI privacy and data much differently than here in the USA, which essentially, at least for now, is self-governance from the largest companies here in the US. So what's your take? In your opinion, in your experience, are there certain areas or countries that are maybe more effective at data governance and AI privacy?
Katharina Koerner [00:23:54]:
I mean, yes, of course, the first thought is that the EU already has this gold standard of data protection with the GDPR and is coming up with the EU AI Act, which will also have extraterritorial effect, because if you offer your services in the EU, you will also have to comply with the EU AI Act, which classifies AI systems into different risk categories, etcetera. But, I mean, what is effective? I mean, there's also this concern that the
Jordan Wilson [00:24:25]:
That's a good good point.
Katharina Koerner [00:24:28]:
AI Act is so effective that it actually might, you know, hinder, help me here with the word. Like, you know, the startup scene, make it more difficult. Growth. Yeah.
Jordan Wilson [00:24:43]:
It might stop companies from being able to grow fast. Yeah.
Katharina Koerner [00:24:47]:
Yeah. So, I mean, what is effective? And also, because you talked about the US, I just wanted to mention that we do already have a lot of regulations in the US which apply to AI. It's not that there is nothing here. We have a lot of sectoral law. There are a lot of state laws now popping up which tackle AI. A lot of AI issues are also regulated by privacy law. So there's not nothing. It's just too complex. It's like a patchwork. So, more effective, I think, would mean there's legal clarity. So I think if we have one law in the EU and maybe one law or executive order or whatever in the United States on a federal level, that will make it more effective, and I think this will be coming. So this was kind of, yeah, not so clear of an answer, but, you know, some thoughts.
Jordan Wilson [00:25:39]:
No. I mean, that's such a good point, because, you know, one person's definition of effective AI privacy and governance might mean to someone else stunting business growth or keeping companies from fully scaling at the pace at which they might want. So we tackled so many important topics on today's show. Katharina, thank you so much for joining us, because I know we went all over the horn. Thank you for sharing your expertise and knowledge with us all. Thank you so much for joining.
Katharina Koerner [00:26:13]:
Thank you so much for having me. Thank you, everyone.
Jordan Wilson [00:26:16]:
Alright. And just as a quick reminder, like Val is saying, looking for the newsletter on the topic, don't worry. Go to youreverydayai.com. Or if you're listening on the podcast, just look at the show notes. You'll find a link there, because we're gonna be sharing some of the articles that Katharina mentioned that she wrote, and we had so much in this episode dealing with AI privacy and governance. So don't worry. Go sign up at youreverydayai.com. So thank you for joining us, and we hope to see you back again for another edition of Everyday AI. Thanks, y'all.