[Blog] Securing AI: FPGAs and Data Provenance
Posted 01/23/2025 by Eric Sivertson, VP of Security
Data powers nearly every facet of today’s world, and the volume of data being generated, processed, shared, or otherwise handled grows with every passing year. An estimated 90% of the world’s data was created in the last two years, roughly 147 zettabytes were generated in 2024 alone, and over 80% of organizations expect to be managing zettabytes of data in 2025. For perspective, if a grain of rice were a byte, a zettabyte of rice would cover the entire surface of the earth several meters deep.
While this data explosion means more valuable insights, it also increases the likelihood of breaches or attacks and raises questions about safe and responsible use. For this reason, it’s critical that organizations not only have effective management strategies in place, but also strategies to ensure data integrity, especially in regard to data that is used to develop models or drive decision-making or innovation.
In this context, the concept of data provenance — tracking movements and transformations of each datapoint from origination onward — has evolved from a nice-to-have defensive measure to a key component of cyber security. This is becoming especially critical as organizations continue to increase their adoption of AI and ML technologies, which are only as trustworthy and reliable as their underlying data.
Solid Foundation for Data Integrity
Data provenance holds the key to preventing data tampering and designing secure systems that are trustworthy and compliant. At a high level, the process involves cryptographically binding metadata to data to create a transparent record of each point’s complete history, thereby ensuring its integrity and helping to mitigate cyber threats. Provenance systems work by tracking data from its point of origin all the way through its current state of use, creating an unbroken chain of trust.
When information is first digitized in a system, it’s ideally tagged with the time, date, location, origination device type, privacy rights, and more. All that information is then cryptographically bound to the data itself, documenting immutable moments in time. While today’s systems have varying levels of data provenance insight, the goal is to add and rebind metadata at every transformation point throughout the system. Emerging technology like blockchains and other distributed ledgers will underpin these tamper-proof systems.
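To make that binding step concrete, here is a minimal sketch in Python, using only the standard library and hypothetical field names, of how a datapoint could be cryptographically bound to its metadata at the moment of digitization. A keyed HMAC stands in for the digital signatures and hardware-backed keys a production provenance system would rely on.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone

def bind_metadata(data: bytes, metadata: dict, key: bytes) -> dict:
    """Create a provenance record that cryptographically binds metadata to data."""
    record = {
        "data_hash": hashlib.sha256(data).hexdigest(),
        "metadata": metadata,
    }
    # Canonical serialization so the tag can be reproduced at verification time.
    payload = json.dumps(record, sort_keys=True).encode()
    record["tag"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return record

def verify_record(data: bytes, record: dict, key: bytes) -> bool:
    """Recompute the tag to check both the data hash and the metadata binding."""
    expected = {
        "data_hash": hashlib.sha256(data).hexdigest(),
        "metadata": record["metadata"],
    }
    payload = json.dumps(expected, sort_keys=True).encode()
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(tag, record["tag"])

# Example: tag a sensor reading at the point of digitization.
key = b"device-provisioned-secret"    # placeholder; real keys belong in secure hardware
reading = b"temperature=21.7C"
meta = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "device": "sensor-04",            # hypothetical device ID
    "location": "line-2",
}
record = bind_metadata(reading, meta, key)
assert verify_record(reading, record, key)
```

Any later change to the data or its metadata causes verification to fail, which is exactly the tamper evidence a provenance record is meant to provide.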
Organizations that fail to prioritize data provenance may make decisions based on inaccurate or manipulated information, leading to negative outcomes or even customer harm. In the case of generative AI and large language models (LLMs), failing to properly trace the history of data can also lead to copyright issues. However, when organizations successfully implement provenance systems to assess the authenticity of data at every step of its lifecycle, they’ll gain a trusted competitive edge with customers, partners, and even regulators.
Providing Transparency in AI
Across all industries, there has been a significant rise in embedding AI- and ML-powered systems into operations. While this innovation has boosted efficiency, AI systems are susceptible to threats that may impede data integrity, and these threats are becoming increasingly sophisticated.
Consider a smart factory that uses AI-powered digital twins to simulate and optimize its production. The approach only works when the training data used in the system is accurate and timely, so it’s critical the data is trustworthy. Provenance systems would allow this factory to review the source records behind the model and see if and when they have been modified, helping facility managers validate outputs and more easily detect potential threats or time-based drifts in data fidelity.
Unfortunately, even though data provenance is essential to building and maintaining trusted AI systems, it’s not as broadly established as it should be. Due in part to a lack of approved standards to follow, very few models today implement or enforce the necessary requirements, leaving them exposed to tampering and manipulation by bad actors.
Even without external interference, a lack of provenance insight can cause problems for businesses, such as undetected data drift. Data drift occurs when the properties of the data an algorithm was trained on shift without the model being adjusted accordingly, making its outputs less accurate. Maintaining data provenance is one of the best ways to ensure the outputs of these systems remain reliable over time.
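Drift itself can be monitored with simple statistical checks. The sketch below, which assumes NumPy and SciPy are available and uses an arbitrary significance threshold, compares a feature's training distribution against recent production samples with a two-sample Kolmogorov-Smirnov test; provenance records then tell you which sources to investigate when the check fires.

```python
import numpy as np
from scipy.stats import ks_2samp

def check_drift(train_feature: np.ndarray, live_feature: np.ndarray,
                alpha: float = 0.01) -> bool:
    """Return True if the live distribution differs significantly from training."""
    statistic, p_value = ks_2samp(train_feature, live_feature)
    return p_value < alpha

# Example with synthetic data: the live distribution has shifted.
rng = np.random.default_rng(0)
train = rng.normal(loc=0.0, scale=1.0, size=5000)
live = rng.normal(loc=0.6, scale=1.0, size=1000)   # mean has drifted
print(check_drift(train, live))   # True -> retrain, or trace the data's provenance
```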
FPGAs Bridging the Gap
To increase cyber resilience, system designers can integrate Field Programmable Gate Arrays (FPGAs) into data provenance systems. Unlike fixed-function processors, FPGAs provide flexible, reprogrammable hardware capable of parallel processing and real-time security operations. Their built-in security features, like encryption and authentication mechanisms, help safeguard and securely tag data during processing. Since FPGAs are often the origination point for system data, they play an important role in the cryptographic binding process. Moreover, the inherent flexibility of FPGAs means they can be programmed and reprogrammed to perform specific tasks over time, letting organizations adjust how they capture and manage provenance information as their needs change.
FPGAs also optimize the performance of systems, including AI and ML models. Due to their real-time processing capabilities, FPGAs can manage large quantities of data across diverse sources with minimal latency. This processing speed supports data provenance by ensuring data transactions are recorded and cryptographically bound promptly, and that provenance records reflect the most up-to-date information. Additionally, FPGAs can execute many operations in parallel. This allows them to collect data, perform cryptographic operations, and monitor security all at once, without impacting the performance of the system.
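As a purely software analogy to this hardware parallelism, the sketch below tags a batch of incoming readings concurrently with a keyed hash; an FPGA would perform the equivalent binding in dedicated crypto blocks running alongside the data path rather than on a host CPU.

```python
import hashlib
import hmac
from concurrent.futures import ThreadPoolExecutor

KEY = b"device-provisioned-secret"   # placeholder key for illustration only

def tag_record(payload: bytes) -> str:
    """Bind a provenance tag to one payload (an FPGA would do this in hardware)."""
    return hmac.new(KEY, payload, hashlib.sha256).hexdigest()

# A batch of readings arriving from multiple sources at once.
readings = [f"sensor-{i}:value={i * 0.5}".encode() for i in range(1000)]

# Tag the whole batch concurrently, analogous to an FPGA processing
# many data streams in parallel without stalling the main pipeline.
with ThreadPoolExecutor(max_workers=8) as pool:
    tags = list(pool.map(tag_record, readings))

print(len(tags), tags[0][:16])
```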
Implications of Quantum Computing
Because cryptographic operations are essential to the metadata binding process, the cryptographic algorithms used must be future-proof. This is an especially timely matter since advancements in quantum computing threaten to break the classical asymmetric cryptography protections we rely on today.
The transition to post-quantum cryptography (PQC) is underway to protect our digital data against the impending era of quantum computers. PQC algorithms are built on mathematical problems believed to resist attack by both classical and quantum computers. Because these algorithms are so new, the “crypto agility” of FPGAs will be critical: if an FPGA running a PQC algorithm is deployed in the field and a vulnerability is later found, its programming can be updated without replacing the hardware. This flexibility makes FPGAs a key asset in navigating the transition to PQC and complying with changing regulations.
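What crypto agility can look like at the system level is sketched below in an illustrative Python example, with hypothetical registry and configuration names and HMAC constructions standing in for real signature schemes: the active algorithm is chosen from configuration, so a vulnerable scheme could be swapped for a PQC replacement such as ML-DSA through an update rather than a hardware change.

```python
import hashlib
import hmac
from typing import Callable, Dict

# Registry of available signing backends. In a fielded system the PQC entry
# would call into updated FPGA or firmware crypto engines; here both entries
# are HMAC stand-ins purely for illustration.
SIGNERS: Dict[str, Callable[[bytes, bytes], bytes]] = {
    "classical-hmac-sha256": lambda key, msg: hmac.new(key, msg, hashlib.sha256).digest(),
    "pqc-placeholder":       lambda key, msg: hmac.new(key, msg, hashlib.sha3_256).digest(),
}

def sign(message: bytes, key: bytes, algorithm: str) -> bytes:
    """Sign with whichever algorithm the current configuration names."""
    if algorithm not in SIGNERS:
        raise ValueError(f"unknown signing algorithm: {algorithm}")
    return SIGNERS[algorithm](key, message)

# If the classical scheme were found to be vulnerable, only this configuration
# value (and the backend it points to) would need to change.
config = {"signing_algorithm": "pqc-placeholder"}
tag = sign(b"provenance record", b"key-material", config["signing_algorithm"])
print(tag.hex()[:16])
```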
The Future of Trust
As data provenance becomes a bigger focus, industry and government standards bodies must create new provenance guidelines that require at least some level of disclosure about the provenance of the data used to build models. However, it’s not yet clear what form these measures will ultimately take.
One option is tiering data provenance systems by their robustness, with the bottom tier representing a lack of data provenance mechanisms and the top tier representing clearly documented chains of trust outlining the history of every datapoint. Compliance and enforcement mechanisms will also need to be defined within such a framework to mitigate the risks of data misuse and ensure transparency and accountability. Incorporating independent, third-party validation of adherence to these standards would reduce the potential for conflicts of interest and ensure best practices are followed when evaluating the trustworthiness of data provenance.
In the near future, we’re also likely to see an increase in the implementation of immutable data options, as developers embrace the idea that data should not be altered or deleted after it has been recorded. Blockchain technology is one such solution, thanks to its decentralized, distributed design. In a blockchain network, each transaction or piece of data is cryptographically linked to the previous one, and once a transaction is added to the chain it becomes virtually impossible to modify or remove, creating an immutable chain of trust around the data.
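That linking property is easy to illustrate. The sketch below is a minimal hash chain in standard-library Python, not a full blockchain (no consensus, no distribution), but it shows why tampering with any earlier record invalidates every later link.

```python
import hashlib
import json

def add_block(chain: list, record: dict) -> None:
    """Append a record whose hash covers the previous block's hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = json.dumps({"record": record, "prev_hash": prev_hash}, sort_keys=True)
    chain.append({"record": record, "prev_hash": prev_hash,
                  "hash": hashlib.sha256(body.encode()).hexdigest()})

def verify_chain(chain: list) -> bool:
    """Recompute every link; any upstream tampering breaks the chain."""
    prev_hash = "0" * 64
    for block in chain:
        body = json.dumps({"record": block["record"], "prev_hash": prev_hash},
                          sort_keys=True)
        if block["prev_hash"] != prev_hash or \
           block["hash"] != hashlib.sha256(body.encode()).hexdigest():
            return False
        prev_hash = block["hash"]
    return True

chain: list = []
add_block(chain, {"event": "digitized", "device": "sensor-04"})    # hypothetical events
add_block(chain, {"event": "normalized", "pipeline": "etl-v2"})
print(verify_chain(chain))              # True
chain[0]["record"]["device"] = "spoofed"
print(verify_chain(chain))              # False: tampering is detectable
```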
Data is now being used to develop crucial systems and drive consequential decisions, so it’s imperative that enterprises can track and trust it with confidence. The rise in AI-powered systems further emphasizes the need for effective data provenance to detect threats to these models and ensure their reliability over time. In 2025 and beyond, data provenance will emerge as a cornerstone of cyber security, cyber resilience and cyber trust – helping organizations identify threats to data integrity, comply with new regulations, and build trust within their customer and partner networks.
For more insights, check out Lattice’s LinkedIn Live on Cybersecurity Solutions for the AI and Quantum Era. If you’d like to learn more about how Lattice FPGA solutions can help secure and future-proof your system designs, reach out to our team.