Commentary
California's much-touted AI safety bill, SB 1047, considered the most stringent in the nation, has been vetoed by Governor Gavin Newsom. Baby steps that address the low-hanging fruit of the AI regulation landscape are better than giant leaps. One such step would be to mandate the traceability of data. California should take the lead on this, just as it did on data privacy.
Misinformation has been wreaking havoc around the war in Ukraine. The problem is compounded by the advent of Generative Artificial Intelligence (AI), and deepfakes are a major concern. Governments naturally want to fight misinformation and fraud. Mandating the traceability of data, also called provenance, is an important step in that direction. It is also a more comprehensive solution than merely requiring watermarks in the metadata of AI-generated photos, as proposed in California's AB 3211, which some companies are supporting.
Data is the new oil, powering much of the machinery of our daily lives, particularly in the form of AI. It is therefore important that the integrity of data be preserved. The traceability that data provenance assures is an important aspect of that integrity. It can help detect anomalies and errors in data, just as tracing money can help preserve the integrity of the economy.
Provenance helps create trust in digital items found online. There is some indication that provenance is effective in reducing users' vulnerability to misinformation. It is also an important ethical measure that deters the misuse of digital items such as photographs. Provenance should include documentation of the method used to generate the data. Mandating that the lineage of data, its transformations, and the context of those transformations be maintained along with the data is likely to reduce the deliberate creation and use of fake content, such as the deepfakes used in scams.
I have been a victim of misinformation many times and in many ways, including on anonymous websites. Although most reviews of my teaching on RateMyProfessors.com are positive, the few that are false still hurt. Rigorous research studies have shown that anonymous websites such as RateMyProfessors.com are inaccurate and biased. Still, the website proclaims, “The law protects Rate My Professors from legal responsibility for the content submitted by our users.” It is not clear whether the Federal Trade Commission rule banning fake reviews will have any impact on the website, but mandating data provenance might.
As the research states, on such websites “there is no guarantee that a ‘student reviewer’ is even a student,” which implies a lack of provenance, and the information there is “unsuitable for use in a decision making process.” Still, many students rely on it to make decisions, and the government seems unwilling to do anything about it, at least until someone starts a website along the lines of “rate my lawmaker” or opinion pieces like this one make a difference.
Enforcing the traceability of information on websites may not eliminate misinformation, but it can substantially reduce it.
Scams targeting young adults are increasing on social media. I reported at least one that I encountered to Facebook, to no avail. The content moderators did not consider it a scam attempt, probably because the conversation on Facebook Messenger was in a vernacular language. Such scam attempts can be attributed, at least in part, to the lack of traceability.
There are further compelling reasons to mandate data provenance. It has been touted as the “secret weapon” for protecting businesses from fraud and promises to be a new beginning in cybersecurity. It facilitates the investigation of data breaches and helps pinpoint exactly which information was breached. Copyright is a major point of debate with the advent of Generative AI, and data lineage can establish ownership and help resolve intellectual property disputes. It is, of course, also a valuable resource in forensic analysis.
Tracking the origins, movements, and transformations of data makes it possible to assess any biases in it and provides an indication of its reliability. Reproducibility is important in many research and decision-making settings. Data provenance partly ensures the reproducibility of experiments and decision-making processes, and it can help recover data lost in various catastrophes.
Provenance is essential for the data ecosystem in the age of Generative AI. Mandating it will enhance the reliability, accountability, and trustworthiness of data, of the systems built on top of it, and of the decisions derived from them. The good news is that the technical community increasingly recognizes the importance of data provenance and is working effectively to ensure it through standards, initiatives, and technologies such as the Data Fabric. It is time its necessity got the attention of lawmakers. Until then, industries should voluntarily take the lead.
Opinions expressed are Vishnu’s and not those of his employer or any other entity that he is affiliated with.
Views expressed in this article are opinions of the author and do not necessarily reflect the views of The Epoch Times.