Executive summary
Do you know what a non-fungible token (NFT) is? This question started a chain reaction that resulted in an investigation by a diverse team into how Statistics Canada (StatCan) could use blockchain, or distributed ledger technology, to authenticate a document. The question was posed as part of a broader idea about how the Dissemination Division might use NFTs, or similar technology, to authenticate the products leaving the StatCan website. Initially, our team was composed of internal StatCan employees: Mathieu Laporte, Director of the Dissemination Division; Jacqueline Luffman, Chief of Publishing Services; and Lillian Klein, Research Librarian. These individuals discussed the idea with other StatCan staff to evaluate whether it was feasible. However, recognizing a gap in our blockchain experience, we reached out to academics who research various aspects of blockchain technology. Through those meetings, we were connected to four blockchain experts: Dr. Florian Martin-Bariteau from the University of Ottawa, Dr. Jeremy Clark from Concordia University, Dr. Victoria Lemieux from the University of British Columbia and Dr. Tracey Lauriault from Carleton University. We met with these experts for a brainstorming session, where Jeremy Clark presented the idea of using digital signatures to authenticate StatCan documents. With this idea in mind, a team of researchers was formed to explore up-to-date cryptographic technologies and applications, develop a comprehensive understanding of them and determine whether using them in StatCan's work would be meaningful. Our research team includes Kathryn Fedchun, a PhD student at Carleton University; Didem Demirag, a PhD candidate at Concordia University; and Lillian Klein, a research librarian with StatCan. This paper summarizes months of collaborative work completed by this team.
The main focus of this project is to learn more about blockchain and determine whether, as StatCan expands its website, it could use blockchain technology to enable users to authenticate the data downloaded from the website. Building on an increased understanding of these emerging technologies, the project aims to develop an authentication process that would allow users to verify that material downloaded from the StatCan website has not been tampered with and was produced by StatCan. By using blockchain to establish the authenticity of its data, StatCan could increase trust in the agency as a statistical organization and strengthen social trust with its users. We determined that the ideal authentication method should be easy to use and available in both online and offline formats, so that users with varying degrees of Internet connectivity can authenticate their data.
Our research defined and explained what blockchain is and identified how blockchain is currently being used in a Canadian context. We found that there has recently been a call to action for government agencies to embrace blockchain technology and take strides to implement it in their work. To create a well-rounded assessment of the technology, we included a review of concerns regarding blockchain, focusing primarily on its environmental impact, public perception of the technology and any backlash our team could anticipate, the lack of regulations, and the risk of being blinded by the hype surrounding blockchain technology. Finally, we completed a brief comparison of five blockchains that could be used in our solution. This comparison covers general information about each chain, along with its transactions per second, consensus mechanism, whether it is private or public, and its environmental impact. This analysis led us to conclude that Avalanche is the best option as we move forward with our technical solution.
With the knowledge gained from this research, our team believes this project could be the agency's opportunity to answer the call to action. We propose that StatCan conduct a pilot project based on Jeremy Clark's idea of using digital signatures and build an application that users can download to authenticate their data. We propose a hybrid model with a blockchain that will allow both online and offline users to authenticate their data. The technical details of this project are explained in depth below; to summarize:
          In the hybrid solution, authentication occurs through an application that users download. The app's list of file hashes is updated periodically to include the hashes of new StatCan products. A file is authenticated as follows: the user loads the file into the app, which computes the file's hash and compares it with the list of hashes of published StatCan products. The app then informs the user whether the file is valid, that is, whether its hash matches one on the list.
          This solution adds significant value to the agency's transparency and trust with users. Hosting the hash values on the blockchain creates an immutable record, over time, of the products the agency has released and increases users' ability to trust the information downloaded from the StatCan website. This project is also an opportunity to experiment with blockchain technology without overhauling the agency's existing system.
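The verification step described in the summary above can be sketched in a few lines of Python. This is a minimal illustration, not the app's actual design: it assumes SHA-256 as the hash function, a locally cached hash list stored as a plain-text file with one hexadecimal digest per line, and hypothetical function names (`sha256_of_file`, `load_hash_list`, `verify_file`). How the cached list would be kept in sync with the hashes recorded on the blockchain is not shown.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_hash_list(path):
    """Load a cached hash list (hypothetical format: one hex digest per line)."""
    lines = Path(path).read_text(encoding="ascii").splitlines()
    return {line.strip().lower() for line in lines if line.strip()}


def verify_file(file_path, known_hashes):
    """A file is treated as valid only if its digest appears in the published list."""
    return sha256_of_file(file_path) in known_hashes


# Demonstration with a throwaway CSV file and hash list (made-up contents).
with tempfile.TemporaryDirectory() as tmp:
    table = Path(tmp) / "table.csv"
    table.write_bytes(b"REF_DATE,GEO,VALUE\n2021,Canada,100\n")
    hash_list = Path(tmp) / "hashes.txt"
    hash_list.write_text(sha256_of_file(table) + "\n", encoding="ascii")
    known = load_hash_list(hash_list)

    original_ok = verify_file(table, known)
    table.write_bytes(b"REF_DATE,GEO,VALUE\n2021,Canada,101\n")  # simulate tampering
    tampered_ok = verify_file(table, known)

print(original_ok, tampered_ok)
```

Reading the file in chunks keeps memory use flat for large downloads, and holding the list as a set makes each lookup constant time on average, however many products are published.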
In the age of information, it is necessary to acknowledge the growing amount of digital information available to Canadians and their increasing distrust of digital sources (Ipsos Public Affairs for Canada's Centre for International Governance Innovation [CIGI-IPSOS], 2019). According to CIGI-IPSOS (2019), 36% of Canadians feel that the government contributes to their sense of distrust in the Internet. As Statistics Canada (StatCan) is the branch of government responsible for disseminating information to Canadians, it should not ignore this statistic. During the 2020/2021 fiscal year, the StatCan website had over 28 million web page visitors and 766,589 table downloads (Statistics Canada, 2021). StatCan prides itself on its transparency and accountability to the public and strives to meet the needs of its users (Statistics Canada, 2018). As an organization, StatCan presents itself as a trusted source of statistics on Canada (Statistics Canada, 2018). According to StatCan's Trust Centre, "the people of Canada can trust that information gathered from them, and about them, is done so for them and that these activities are carried out with integrity and the highest ethical standards" (Statistics Canada, 2018). The Statistics Act guides StatCan to ensure that it promotes and develops integrated social and economic statistics pertaining to the whole of Canada (Statistics Act, 1985).
Users count on the agency and expect to access and download authentic, reliable data when they visit the StatCan website. But once a product has been downloaded, it is challenging to validate that it comes from StatCan and has not been tampered with by a malicious actor. As a result, users may believe they are working with untampered StatCan data when they have in fact downloaded a corrupted comma-separated values (CSV) file. As for the likelihood of StatCan falling victim to malicious cyber actors, the increasing number of ransomware attacks on Canadian organizations shows that the country is a potential target (Communications Security Establishment Canada, 2021). Therefore, as StatCan plans the expansion and innovation of its website, it is essential that it consider how to give users the ability to verify and authenticate the data they download from the website.
This research aims to investigate whether StatCan could close the authentication gap on its website by integrating emerging technologies into its existing publication methods. To find answers, we began by familiarizing ourselves with current research on blockchain and distributed ledger technology. We then considered the importance of record keeping, confidentiality, trust and authentication. We looked at multiple examples of other Canadian organizations and government agencies using blockchain and found multiple articles calling on the government to adopt this new technology. However, we also considered concerns related to these emerging technologies, including environmental impact, public image and potential backlash, a lack of regulation, and the possibility of being blinded by the hype. We investigated five blockchains that could be used in our system design: Ethereum, Avalanche, Cardano, Hyperledger and Solana. With a better understanding of the technology available to StatCan, we worked to conceptualize a system that allows users to authenticate the data they download from the website. Our goal is a system that enables users to verify that material downloaded from the website has not been tampered with and was produced by StatCan. We believe that our method of authenticating data should be available in online and offline formats to ensure that users with varying degrees of Internet connectivity can authenticate the data. Our team prioritized this component to serve all Canadian users, knowing that access to high-speed Internet is inconsistent because of the digital divide in the country (Canada's Public Policy Forum, 2014). Additionally, we prioritized usability when considering options, because the solution needs to be as simple as possible to ensure the technology is accessible and easy for users to understand.
Before a solution can be recommended, it is necessary to introduce the technology behind it to provide the context required to understand how the technology can help StatCan accomplish its goal. The main features that need to be understood are the digital signatures and hash functions that support our concept. In addition to the introduction and literature review, Appendix A provides a glossary of terms to help readers understand the more technical material.
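To make the role of hash functions concrete, the short Python sketch below hashes two versions of a small CSV table that differ by a single digit. It uses SHA-256 from Python's standard `hashlib` module as an example; the report does not fix a particular hash function, and the table contents are made up. The same input always yields the same 64-character digest, while a one-character edit yields an unrelated-looking digest, which is what lets a published hash expose tampering.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as 64 hexadecimal characters."""
    return hashlib.sha256(data).hexdigest()


# Two made-up versions of a table that differ in one digit.
original = b"REF_DATE,GEO,VALUE\n2021,Canada,100\n"
tampered = b"REF_DATE,GEO,VALUE\n2021,Canada,101\n"

h_original = sha256_hex(original)
h_tampered = sha256_hex(tampered)

# Count the hex positions at which the two digests disagree.
differing = sum(a != b for a, b in zip(h_original, h_tampered))

print(h_original)
print(h_tampered)
print(f"{differing} of 64 hex characters differ")
print("deterministic:", sha256_hex(original) == h_original)
```

A hash on its own only shows that a file matches a given digest; tying that digest to StatCan as the publisher is the job of the digital signature or the blockchain record.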
Throughout the research process, we found a few gaps in the literature. Given that blockchain is still a relatively new technology, especially for government use, these gaps are not surprising. It was difficult to find any concrete Canadian government regulations or policies on how to incorporate blockchain, which suggests that directives on implementing blockchain within the government are still emerging. This gap leaves our team with questions about how policies might change in the future to simplify or complicate the implementation of this project. Another gap is that few organizations have published how they incorporate blockchain into their daily work. We found a lot of material about how blockchain is used in cryptocurrency, record keeping and financial technology (fintech), but it was difficult to determine how organizations use blockchain on a daily basis. For example, in the case of health records discussed below, it was difficult to determine how patient files were uploaded to or tracked on the blockchain. We were also unable to find significant information on the legal implications of using blockchain for our purposes. Furthermore, it was difficult to find research on similar projects: we were unable to locate published research addressing how to give users the ability to authenticate data that have been downloaded from a website. We believe that our project fills some of these gaps in the literature and is a valuable step toward new technology for StatCan.
We performed a systematic literature review for this study, which allowed us to understand the breadth and depth of the existing body of work and identify gaps to explore (Xiao and Watson, 2019, p. 93). A successful systematic literature review involves three stages: planning, conducting and reporting (Xiao and Watson, 2019, p. 102). In the first stage, planning, researchers identify the need for a review, specify research questions and develop a review protocol (Xiao and Watson, 2019, p. 102). In the second stage, researchers conduct the review: they identify and select primary studies, then extract, analyze and synthesize data (Xiao and Watson, 2019, p. 102). Finally, the third stage involves researchers "writ[ing] the report to disseminate their findings" (Xiao and Watson, 2019, p. 102). For this project, we had the following three research questions in mind:
In the planning phase of this study, we compiled a list of search terms that focused on our areas of interest in this project. The list of search terms can be found in Appendix B. As Xiao and Watson (2019) described in their article on how to conduct a systematic literature review, we used these search terms to identify relevant articles. As we collected academic research articles, our team added more search terms. We then used a variety of combinations of the search terms listed in Appendix B with Boolean operators to focus our results. In total, we completed 15 unique searches.
Depending on the number of results a search returned, we reviewed between 100 and 300 results. If a search returned fewer than 1,000 results, we examined the first 100; if it returned between 1,000 and 100,000, we reviewed the first 200; and if it returned more than 100,000, we examined the first 300. In the review process, we assessed academic articles for relevance to this study using the title of the article, the abstract and the listed keywords. Overall, we collected 59 papers and entered the source information into a spreadsheet, including the title, authors, year of publication, abstract and complete citation.
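The screening rule above can be written as a small function, which also makes its boundary cases explicit. The function name is hypothetical, and two details are assumptions because the text does not state them: a search returning exactly 1,000 or exactly 100,000 results is placed in the higher tier, and a search with fewer than 100 results is screened in full.

```python
def results_to_review(total_results: int) -> int:
    """Number of search results screened for a search with `total_results` hits.

    Assumptions (not stated in the text): exactly 1,000 or 100,000 results
    fall into the higher tier, and searches with fewer than 100 results are
    screened in full.
    """
    if total_results < 0:
        raise ValueError("total_results must be non-negative")
    if total_results < 1_000:
        return min(total_results, 100)
    if total_results < 100_000:
        return 200
    return 300


for hits in (40, 850, 1_000, 56_000, 100_000, 2_400_000):
    print(f"{hits:>9,} results -> review the first {results_to_review(hits)}")
```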
Upon collecting the sources, we reviewed each article to determine its relevance to this project, assessing the abstracts in further detail and skimming the articles to judge their usefulness. Of the 59 papers, we found 18 sources that proved significantly valuable for this project. Most of the excluded papers were too technical for the purpose of this literature review. Although we have tried to keep this paper accessible, we have also provided a list of technical terminology and definitions in Appendix A. Some of these definitions are paraphrased, but many retain quoted material to stay faithful to the original sources.
From our 18 sources, we extracted relevant information and data and synthesized them into the literature review below. Guided by the research questions listed above, we provide a detailed overview of the technology; consider the significance of record keeping, confidentiality, trust and authentication; and provide a list of examples of other government organizations and agencies using blockchain. In addition, we were surprised to find multiple articles calling on the government to use these new technologies, so we have also included this as a theme below.
Beyond academic articles, we reviewed multiple articles on concerning aspects of blockchain related to environmental impact, public image and potential backlash, a lack of regulation, and the potential to be blinded by the hype of blockchain technology. We also researched five specific blockchains: Ethereum, Avalanche, Cardano, Hyperledger and Solana. The number of available blockchains grows each day, but our team chose to investigate these five. Ethereum is an extremely popular peer-to-peer blockchain that uses a large amount of energy. Avalanche and Cardano are both proof-of-stake blockchains that are more environmentally friendly than Ethereum. Hyperledger is an umbrella project of open-source blockchains and related tools, and Solana is a carbon-neutral, proof-of-stake blockchain. More information about these five blockchains and their differences is provided below. This systematic literature review strengthened our knowledge of this technology and supported us in creating the recommended solutions and next steps for this project, found below.
This project aims to explore how technology can help users verify and authenticate data from the StatCan website. This literature review begins with a brief overview of cryptographic technology. Next, we consider the importance of record keeping, confidentiality, trust and authentication. We provide examples of organizations, agencies and companies in Canada that use this technology. Then, we list multiple sources that call on the government to move toward new technology such as blockchain. Next, we consider potential concerns with using blockchain, such as environmental impact, public image and potential backlash, a lack of regulation, and the possibility of being blinded by the hype of blockchain technology. Finally, we compare five blockchains: Ethereum, Avalanche, Cardano, Hyperledger and Solana. This project is a small step for StatCan toward new technology that can better protect its data.
In the early 1990s, cryptographers Scott Stornetta and Stuart Haber conceived the idea of connecting blocks via hashed data (Treiblmaier and Clohessy, 2020, p. v). Almost 20 years later, on October 31, 2008,
          A mysterious individual, or group of individuals, known only as Satoshi Nakamoto, posted a link to a paper entitled "Bitcoin: A Peer-to-Peer Electronic Cash System" to an obscure mailing list called "Cryptography List." In this paper, Nakamoto proposed the creation of what would become known as a blockchain as a means of enabling an electronic payment system that did not require a trusted third-party intermediary (Urban and Pineda, 2018, p. 5).
          A blockchain is a digital, decentralized and distributed ledger in which transactions are logged and added in chronological order with the goal of creating permanent and tamper-proof records (Treiblmaier, 2018, p. 547). The idea of the ledger has existed for a long time: it is a permanent collection of recorded transactions, historically written in a physical book. Blockchain originated when this ledger was moved online to support a digital currency. Since then, blockchain has broadened to include digital security applications beyond digital currencies such as Bitcoin.
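The ledger idea described above, and the notion of connecting blocks via hashed data, can be illustrated with a toy hash chain in Python. This is a single-machine sketch only: there is no network, no consensus and no distribution, which are what make a real blockchain decentralized. The class names, the JSON serialization and the use of SHA-256 are assumptions chosen for illustration. Each block stores the hash of the previous block, so editing any earlier record breaks the chain's validity check.

```python
import hashlib
import json
import time
from dataclasses import dataclass


@dataclass
class Block:
    index: int
    timestamp: float
    records: list
    previous_hash: str
    block_hash: str = ""

    def compute_hash(self) -> str:
        """Hash every field except the stored hash itself, in canonical JSON form."""
        payload = json.dumps(
            {
                "index": self.index,
                "timestamp": self.timestamp,
                "records": self.records,
                "previous_hash": self.previous_hash,
            },
            sort_keys=True,
        ).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


class Ledger:
    """Append-only list of blocks, each linked to its predecessor by hash."""

    def __init__(self):
        genesis = Block(0, 0.0, [], "0" * 64)
        genesis.block_hash = genesis.compute_hash()
        self.chain = [genesis]

    def append(self, records):
        prev = self.chain[-1]
        block = Block(prev.index + 1, time.time(), list(records), prev.block_hash)
        block.block_hash = block.compute_hash()
        self.chain.append(block)
        return block

    def is_valid(self) -> bool:
        """Recompute every hash and check every link to the previous block."""
        for i, block in enumerate(self.chain):
            if block.block_hash != block.compute_hash():
                return False
            if i > 0 and block.previous_hash != self.chain[i - 1].block_hash:
                return False
        return True


ledger = Ledger()
ledger.append(["sha256-of-product-A"])  # placeholder record values
ledger.append(["sha256-of-product-B"])
valid_before = ledger.is_valid()

ledger.chain[1].records[0] = "forged"  # edit an earlier record
valid_after_edit = ledger.is_valid()

ledger.chain[1].block_hash = ledger.chain[1].compute_hash()  # also recompute its hash
valid_after_rehash = ledger.is_valid()

print(valid_before, valid_after_edit, valid_after_rehash)
```

Recomputing the edited block's hash does not help a forger, because the next block still points at the old hash; in a real blockchain, the copies held by many independent participants are what make rewriting every subsequent block impractical.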
Much of this technology stems from cryptography. The term cryptography is derived from the Greek word kryptos, which is used to describe "anything that is hidden, veiled, secret, or mysterious" (Mohamed, 2020, n.p.). Cryptography secures communication and information using technology and codes. It is well known that data are valuable and often vulnerable. In today's world, producing fake documents is becoming more common: "As the fake ones accurately look like the originals, it is impractical for a common man to identify the real and the duplicate one" (Prathibha and Krishna, 2021, p. 71). Technology that uses cryptography and blockchain can protect information, making it "tamper-resistant [and] exceptionally hard to change or delete" (DeFilippi, 2018, pp. 34-35). As people begin to recognize the significant and inherent value of data, blockchain and distributed ledger technology may force some organizations fundamentally to rethink their relations with users and approaches to privacy (Maull et al., 2017, p. 484). Before providing some examples of blockchain use in Canada, we will discuss the importance of record keeping, confidentiality, trust and authentication for our project.
Victoria Lemieux, an archival studies scholar, claims that "much of the discussion about trusted records or systems boils down to two interlinking concepts: reliability and authenticity" (2016a, p. 112). When users access a record, they consider any potential risks associated with the data (Lemieux, 2016a). Users determine the reliability of data based on how they are accessing the data and on how the record was created, including who created it and by what means (Lemieux, 2016a). Lemieux argues that long-term preservation of information in digital form "requires ... that technical dangers to the longevity of authentic information be addressed" (2016a, p. 114). In our case, "the purpose of what is actually stored on chain is not archiving but rather to establish that the original transaction record is authentic" (Lemieux, 2016b, p. 15). The aim of this project is to proactively safeguard StatCan data through the added value of blockchain technology.
This project demonstrates that StatCan recognizes the importance of confidentiality. When dealing with data, confidentiality refers to the protection of information, such as computer files or database elements, so that only authorized persons may access it in a controlled way (Mohamed, 2020, n.p.). StatCan data need to be protected from potential threats or attacks. To accomplish this, we must determine the vulnerabilities or weaknesses of the current StatCan system (Mohamed, 2020). For example, data on the StatCan website could be altered without users' knowledge. This project attempts to address that risk by addressing confidentiality and ensuring that users can authenticate the information they download.
In a chapter on how authenticity can transform social trust, Batista et al. identify the three most important aspects of trust: accuracy, reliability and authenticity (2021, p. 112). They argue that "accurate [and reliable] records are precise, correct, truthful ... consistent, complete, and objective" (Batista et al., 2021, p. 114). To generate trust, the authors explain, "authentic records need to preserve their identity and integrity over the period of long-term preservation" (Batista et al., 2021, p. 116). In the case of digital archives, the authors describe the difficulty of maintaining trust in a digital document. For example, if a statistical document has been altered, it might be challenging to detect the differences between the original and the tampered copy, which can negatively impact social trust because of what the authors call "uncertain authenticity" (Batista et al., 2021, p. 117). This project seeks to improve trust between StatCan and its users by providing a way to authenticate data from the StatCan website, removing that uncertainty.
Authentication refers to the ability to determine the validity of a source. It answers the question, "How does a receiver know that [the] remote communicating entity is who it is claimed to be?" (Mohamed, 2020, n.p.). In this project, StatCan wants to help users determine the validity of a source through an authentication process. Cryptographic algorithms support authenticated encryption, meaning that users can be sure the source is authentic (Mohamed, 2020). This verification also provides integrity: users can know that the information has not been modified unless StatCan employees changed it through proper authorization (Mohamed, 2020). Record keeping, confidentiality, trust and authentication are therefore all significant factors in this project. Next, we provide examples from Canada that demonstrate this technology in use.
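The digital-signature idea behind this project can be illustrated with textbook RSA, expressed here with Python's built-in integer arithmetic because the standard library has no public-key signature API. This sketch is deliberately insecure and for explanation only: the key is built from two publicly known Mersenne primes, there is no padding scheme, and the hash is reduced modulo a small modulus. A production system would instead use a vetted library and a standard scheme such as Ed25519 or RSA-PSS; the constant and function names here are hypothetical. The point is the division of roles: only the holder of the private exponent can sign, while anyone with the public values can check that a file is unchanged and was signed by the key holder.

```python
import hashlib

# Toy key: two publicly known Mersenne primes. INSECURE, illustration only.
P = 2**127 - 1
Q = 2**89 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (requires Python 3.8+)


def digest_as_int(data: bytes) -> int:
    """SHA-256 of the data as an integer, reduced modulo N to fit the toy key."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Publisher side: requires the private exponent D."""
    return pow(digest_as_int(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """User side: needs only the public values N and E."""
    return pow(signature, E, N) == digest_as_int(data)


table = b"REF_DATE,GEO,VALUE\n2021,Canada,100\n"  # made-up file contents
signature = sign(table)

genuine_ok = verify(table, signature)
tampered_ok = verify(table.replace(b"100", b"101"), signature)
print(genuine_ok, tampered_ok)
```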
Our research found many examples of the Canadian government incorporating blockchain into specific projects. In a policy book published by the Mowat Centre for Policy Innovation at the University of Toronto, Urban and Pineda (2018, pp. 61-62) list many Canadian government agencies experimenting with blockchain, such as Innovation, Science and Economic Development Canada; the Treasury Board of Canada Secretariat; and the National Research Council Canada (NRCC). In January 2018, the NRCC's Industrial Research Assistance Program used an Ethereum blockchain to proactively publish grant and contribution data in real time (Industrial Research Assistance Program, 2019). This experiment ran for one year and concluded on March 1, 2019. Although the experiment has ended, it provided constructive insight into the potential of this technology and how it may be used for more open and transparent operations in public programs (National Research Council Canada, 2018).
Multiple levels of government have moved toward using blockchain for permits, including the Government of Ontario, the City of Toronto and the Government of British Columbia (Urban and Pineda, 2018, p. 62). One article lists a variety of ways that governments are using blockchain, including for digital identity, the storing of judicial decisions, the financing of school buildings, tracing money, marital status, e-voting, business licenses, passports, criminal records and even tax records (Ølnes, Ubacht, and Janssen, 2017, p. 357). The Government of Ontario also ran a blockchain hackathon that generated a number of ideas for other blockchain applications in government (Urban and Pineda, 2018, p. 62). Supporting pilot projects that use blockchain is an effective way for the government to begin using these new technologies successfully (Urban and Pineda, 2018, p. 67). Governments are using blockchain in many areas, and StatCan can build on their work in this project.
In addition to government agencies implementing blockchain and distributed ledger technology, health care is moving rapidly toward blockchain and digital health records. Storing electronic health records on a blockchain is not only improving record keeping but also giving patients greater control over their own health and medical treatments (Urban and Pineda, 2018, p. 42). Doctors, nurses, hospitals and other health care institutions are using blockchain to certify the health of patients (DeFilippi, 2018, p. 112), and it is being used to store encoded personal health records (Zheng, Zhu, and Si, 2019, p. 17). Because the blockchain can grant access to specific individuals, a person's health records can remain secure and confidential when stored in a distributed ledger (Zheng, Zhu, and Si, 2019). Lemieux writes, "the underlying conditions in Canada are particularly well-suited to leading blockchain research and implementation ... Canada has a vibrant, highly active blockchain technoscape, with a diversity of start-ups and consultancies doing innovative work" (2016b, p. 5). This project aims to add to that work.
Multiple papers called on governments to move toward new technology to better secure their data. Urban and Pineda argue that blockchain can offer governments the possibility of improved transparency, efficiency, and effectiveness (2018, p. 42). While blockchain is not a new technology, its use in government is relatively new, so the level of blockchain expertise and capacity within Canadian governments and regulators is currently limited (Urban and Pineda, 2018, p. 61). They claim that one of the first things the Canadian government should do is what we are doing in this project: "building up groups of technologists and policymakers within government who understand the technology, its implications, and the potential opportunities and challenges that flow from it" (Urban and Pineda, 2018, p. 61). While Urban and Pineda (2018) push for more blockchain in government, Ølnes, Ubacht and Janssen emphasize that the government should "shift from a technology-driven to need driven approach" with blockchain applications (2017, p. 355). They argue that blockchain will lead to innovation and transformation of governmental processes (Ølnes, Ubacht, and Janssen, 2017, p. 355). Considering the ease with which digital files can be altered (Bell et al., 2019, p. 6), we argue that this project is driven by a genuine need for authentication on the StatCan website.
According to DeFilippi, governments have established and stewarded a variety of systems and institutions designed to enhance social welfare and provide the foundational infrastructure for economic and political growth throughout history (2018, p. 107). In an article on cryptography and government, Aljeaid et al. argue that e-government "acts as a communication bridge between government to citizen, or government to government, or government to business in efficient and reliable ways" (2014, p. 581). The authors emphasize the importance of data security in government because of the vulnerabilities that arise if data are left unsecured. They claim that end users need robust security solutions to achieve assurance when dealing with e-government systems (Aljeaid et al., 2014, p. 581). Creating a "tamper-resistant and resilient repository for public records" (DeFilippi, 2018, pp. 107-108) using cryptography and blockchain can help the government avoid data leaks, data loss and other vulnerabilities. We agree with this call to action and believe that this project will improve public trust in StatCan and the Government of Canada.
While the call to action is significant, we also want to take the time to investigate any potential concerns regarding blockchain. We have summarized our findings into four categories: environmental impact, public image and potential backlash, a lack of regulation, and the potential to be blinded by the hype of blockchain technology.
There have been many claims about the environmental impact of new blockchain technology. In November 2021, a blockchain project called Solana contracted Robert Murphy, a climate and energy advisor, to publish an energy use report (Solana, 2021). The report compared common activities that involve energy consumption with one Solana transaction, one Ethereum transaction and one Bitcoin transaction (Solana, 2021). Although the report does not cover all of the blockchains we chose to investigate, it is helpful for seeing how blockchain transactions compare with everyday activities. Conducting a single Google search uses 1,080 joules of energy, working on a computer with a monitor for an hour uses 46,800 joules, and using one gallon of gasoline uses 121,320,000 joules (Solana, 2021). By comparison, one Solana transaction uses 1,837 joules of energy, one Ethereum transaction uses 692,820,000 joules, and one Bitcoin transaction uses 6,995,592,000 joules (Solana, 2021). According to Huang, O'Neill, and Tabuchi for The New York Times, the process of creating Bitcoin to spend or trade "consumes around 91 terawatt-hours of electricity annually, more than is used by Finland, a nation of about 5.5 million" (2021). While we are not using Bitcoin for our project, these numbers are staggering.
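The figures from the Solana (2021) report quoted above are easier to compare when expressed relative to one another. The short Python calculation below uses only those reported values; by this arithmetic, one Ethereum transaction corresponds to roughly 5.7 gallons of gasoline (641,500 Google searches) and one Bitcoin transaction to roughly 57.7 gallons, while one Solana transaction corresponds to about 1.7 Google searches. The dictionary keys are labels chosen for this example.

```python
# Energy per activity in joules, as reported by Solana (2021).
ENERGY_J = {
    "google_search": 1_080,
    "computer_hour": 46_800,
    "gallon_gasoline": 121_320_000,
    "solana_tx": 1_837,
    "ethereum_tx": 692_820_000,
    "bitcoin_tx": 6_995_592_000,
}


def ratio(activity: str, reference: str) -> float:
    """How many times more energy `activity` uses than `reference`."""
    return ENERGY_J[activity] / ENERGY_J[reference]


for tx in ("solana_tx", "ethereum_tx", "bitcoin_tx"):
    print(
        f"{tx}: {ratio(tx, 'google_search'):,.1f} Google searches, "
        f"{ratio(tx, 'gallon_gasoline'):.4f} gallons of gasoline"
    )

print(f"Ethereum vs Solana: {ratio('ethereum_tx', 'solana_tx'):,.0f}x")
```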
Many of the big players in blockchain, including Ethereum, use enormous amounts of energy because of their proof-of-work (PoW) consensus mechanism. PoW requires network participants on the blockchain to expend large amounts of computational resources and energy to generate new valid blocks (Chandler, 2021). In comparison, proof of stake (PoS) requires network participants to stake cryptocurrency as collateral in favor of the new block they believe should be added to the chain (Chandler, 2021). Chandler argues that PoW blockchains such as Ethereum can be more secure and decentralized but use an immense amount of electricity, are slower and are less scalable (Chandler, 2021). By contrast, PoS blockchains such as Avalanche, Cardano and Solana have a smaller environmental impact and allow faster transactions and better scalability, but PoS is a newer technology and may not be as secure or tamper-resistant as PoW (Chandler, 2021). Both PoS and PoW therefore have advantages and disadvantages, and we consider the specific environmental impact of five blockchains (Ethereum, Avalanche, Cardano, Hyperledger and Solana) in the chart below.
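The contrast between the two consensus mechanisms can be sketched in Python. The proof-of-work function searches for a nonce that gives a SHA-256 digest starting with a chosen number of zero hexadecimal digits; each extra zero multiplies the expected number of attempts by 16, which illustrates why PoW consumes so much computation. The proof-of-stake function simply picks the next block proposer with probability proportional to stake. Both are simplified illustrations with hypothetical names; they are not how Ethereum, Avalanche, Cardano or Solana actually implement consensus.

```python
import hashlib
import random
from typing import Dict, Tuple


def proof_of_work(block_data: str, difficulty: int) -> Tuple[int, str]:
    """Try nonces until the digest starts with `difficulty` zero hex digits.

    Expected attempts grow as 16 ** difficulty, so the work (and energy)
    rises steeply with difficulty.
    """
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}:{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce, digest
        nonce += 1


def check_work(block_data: str, nonce: int, difficulty: int) -> bool:
    """Verifying the work takes one hash, however long the search took."""
    digest = hashlib.sha256(f"{block_data}:{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


def choose_validator(stakes: Dict[str, float], rng: random.Random) -> str:
    """PoS-style choice: selection probability proportional to staked amount."""
    names = list(stakes)
    return rng.choices(names, weights=[stakes[n] for n in names], k=1)[0]


nonce, digest = proof_of_work("sha256-of-product-A", difficulty=4)
print(nonce, digest, check_work("sha256-of-product-A", nonce, 4))

rng = random.Random(7)
stakes = {"validator_a": 60.0, "validator_b": 30.0, "validator_c": 10.0}
picks = [choose_validator(stakes, rng) for _ in range(10_000)]
share_a = picks.count("validator_a") / len(picks)
print(f"validator_a proposed {share_a:.1%} of blocks with 60% of the stake")
```

Verification is cheap in both cases; the difference lies in how much computation is spent deciding who adds the next block.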
There have been multiple examples of companies and organizations that received backlash when attempting to use blockchain. In December 2021, Kickstarter announced that it was moving to blockchain (Plunkett, 2021). The blog post, titled "Let's Build What's Next for Crowdfunding Creative Projects," received many critiques and complaints from creators (Plunkett, 2021). Kickstarter responded by providing a frequently asked questions section, in which it states that it is confident that "a crowdfunding protocol built on top of Celo will not significantly negatively impact our carbon emissions given its underlying architecture" (Kickstarter, 2022). Still, many creators and backers have said that they will no longer use Kickstarter, given this information (Morse, 2021).
Similar to Kickstarter, the digital communication platform Discord tweeted about integrating Ethereum into its platform in November 2021 (Pearson, 2021). The founder and chief executive officer of Discord, Jason Citron, quickly backed off the project two days later, after public backlash (Pearson, 2021). Pearson states that people in the game industry hate blockchain either because of the environmental impact of proof-of-work tokens on Ethereum, the idea that blockchain collectibles are a grift based on mythical thinking, or both (2021). Many users unsubscribed from the platform's premium Nitro paid service or threatened to do so (Jiang, 2021). Because both of these examples are recent, from November and December 2021, it is difficult to predict what public opinion might be regarding StatCan and this project. However, it is important to be aware of these examples and recognize that backlash is a potential outcome.
Another concern is the decentralized and unregulated nature of blockchain. Because control and decision making about the blockchain do not rest with a single entity, this is an area of concern for StatCan: rather than putting trust in one entity, users put their trust in mathematical algorithms. Other blockchain projects by Canadian governments should be used as a guide for StatCan policies regarding this project. The five blockchains we look at below each have different regulations, goals and abilities, and some can be difficult to scale. This may be a concern because it has not yet been decided how many StatCan products will be available for authentication. Since we examined trust and confidentiality earlier in this literature review, we consider the lack of regulation less worrisome than the impact on the environment and public image. In fact, this project is an opportunity for StatCan to be an early example and leader in blockchain implementation regulations, and we hope that we will be able to incorporate new policies into our project.
The hype surrounding blockchain technology also needs to be addressed. According to Victoria Lemieux, we need to address the shortcomings in designs and implementations of blockchain record keeping so as to be better able to realize the worthy vision of blockchains (Lemieux, 2019). She writes, "claims associated with use of blockchain technology for recordkeeping are, in a number of cases, overhyped. As an example, blockchain solutions that claim to provide archival solutions do not actually preserve or provide for long-term accessibility of records" (Lemieux, 2016b, p. 4). She argues that the biggest danger in blockchain comes from blindly trusting it (Lemieux, 2016b, p. 23). However, critically investigating these limitations is the key to successfully leveraging technological innovations like the blockchain for the benefit of all Canadians (Lemieux, 2016b, p. 8). While blockchain technology does not solve every problem that it has been claimed to, it is a useful technology that will continue to be used in industry and is deserving of further research and experimentation (Ruoti et al., 2020, p. 53). Although this relatively new technology is exciting, and considering its risks can raise fears of stifling innovation (Lemieux, 2016b, p. 5), it is imperative that we remain critical of the potential limitations of and concerns about blockchain technology to achieve the best possible outcome for this project.
For this project, we chose to evaluate and compare five different blockchains against specific considerations: Ethereum, Avalanche, Cardano, Hyperledger and Solana. Ethereum is one of the most popular blockchains, yet it conducts the fewest transactions per second and has significant energy consumption compared with the other options because it uses proof of work (PoW). Under PoW, participants must compete to solve a computationally expensive puzzle before each new block is added, which takes more time and energy than the staking used by proof-of-stake (PoS) blockchains. We also included Avalanche and Cardano, which are both PoS public blockchains. Avalanche is carbon neutral and has the highest transaction rate per second of the five blockchains we analyzed. Meanwhile, Cardano is less energy efficient and slower than Avalanche. We also chose to include Hyperledger, a private blockchain that uses Practical Byzantine Fault Tolerance as its consensus mechanism. Because it is private, it is centralized, which potentially impacts trust, as fewer nodes can make the network less secure. Finally, we included Solana because it is carbon neutral, uses PoS and has published a report on its energy consumption compared with that of blockchains such as Ethereum. All of the blockchains outlined below have advantages and disadvantages. After reviewing them, we decided to use Avalanche for this project. Avalanche is an open-source PoS blockchain with the highest transaction rate per second, at 4,500. Additionally, it is a public network that is carbon neutral, an important consideration for us.
Figure 1 is a chart that gives an overview of five blockchains: Ethereum, Avalanche, Cardano, Hyperledger and Solana. In the chart, we provide general information about each blockchain, the number of transactions per second, the type of consensus mechanism that each blockchain uses, whether the blockchain is public or private, and the environmental impact of each blockchain. We also include a link to the website for each blockchain.
Our research team has designed a solution that incorporates blockchain technology, drawing on the knowledge gained from our literature review and our pre-existing technical experience. This section outlines the system details and the recommended solution for enabling users to authenticate documents downloaded from the StatCan website. We begin by introducing three technical elements that are the pillars of our solution: digital signatures, hash functions and secure tunnels. These elements interact as follows: a hash computed over a StatCan file is used to make sure the file has not been tampered with; a digital signature over this hash proves that the file was produced by StatCan; and the secure tunnel protects communication between the user and the StatCan website. In this section, we explain how these building blocks work and how they are integrated into our proposed solutions.
When users download a file from the StatCan  website, there are two questions that they may have. First, do the data  actually belong to StatCan? And second, have the data been tampered with?
To address the first question, we propose using a digital signature. The idea is similar to signing a document with a pen: if you receive a signed letter or document from x, you can check whether the signature on the document belongs to x and, consequently, whether the document is theirs. In a digital signature scheme, a private key is used to sign a document's hash, and the matching public key is used to verify the signature. There are three steps to a digital signature scheme: StatCan needs to (1) generate the public-private key pair so that (2) it can sign the hash of the document with its private key, and (3) any user with the public key can verify the signature.
Step 1: Key generation
Using a key generation function, StatCan can obtain a public-private key pair. The public key is shared on the website for users to download and use during signature verification. StatCan would not share the private key, as a malicious actor who obtained it could forge StatCan's signature on documents. Key generation is designed so that it is infeasible to compute the private key given the public key. StatCan would use its private key to sign a document's hash rather than the document itself, as signing a short, fixed-size hash is faster and more efficient than signing an entire file. Signature generation can be thought of as a function that takes the signer's private key (here, StatCan's) and the hash of the document and outputs a file that contains the signature.
Step 2: Signing the hash of a document
To create the signature, StatCan needs its private key and the hash of the document. Without the private key, it is infeasible to compute a valid signature on the hash of a document. The resulting signature is kept in a separate file. StatCan would upload the signature file and its public key to its website, so that users can download (1) the file they want to use, (2) the signature file created over the hash of that document and (3) StatCan's public key. Signature verification can be thought of as a function that asks the user to provide these three files.
Step 3: Verifying a signature
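Any user can verify the validity of the signature by providing (1) the file they want to check, (2) the signature file created over the hash of that document and (3) StatCan's public key. The verification function hashes the file and uses the public key to check the signature against that hash. If the signature is verified, the user can be sure that the file actually belongs to StatCan and has not been altered since it was signed.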
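The three steps can be illustrated with a short, self-contained Python sketch. Because Python's standard library has no public-key signature functions, this sketch swaps in Lamport one-time signatures, a simple scheme built entirely from a hash function (here SHA3-256, the function proposed later in this section). It is not what StatCan would deploy: a Lamport key pair may safely sign only one document, and a production system would use a standard scheme such as ECDSA, Ed25519 or RSA through a vetted library. The function names and the sample CSV contents are hypothetical.

```python
import hashlib
import secrets

# Lamport one-time signatures over SHA3-256, used here only because they need
# nothing beyond the standard library. Each private key must sign ONE document.
BITS = 256  # SHA3-256 digests are 256 bits long


def h(data: bytes) -> bytes:
    return hashlib.sha3_256(data).digest()


def generate_keys():
    """Step 1: the private key is 256 pairs of random secrets; the public key
    is the hash of each secret. Recovering a secret from its hash is infeasible."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(BITS)]
    public = [(h(a), h(b)) for a, b in private]
    return private, public


def bits_of(digest: bytes):
    """Yield the 256 bits of a digest, most significant bit first."""
    for byte in digest:
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 1


def sign(document: bytes, private) -> list[bytes]:
    """Step 2: hash the document, then reveal one secret per bit of the hash."""
    return [pair[bit] for pair, bit in zip(private, bits_of(h(document)))]


def verify(document: bytes, signature: list[bytes], public) -> bool:
    """Step 3: re-hash the document and check that each revealed secret
    hashes to the matching half of the public key."""
    if len(signature) != BITS:
        return False
    return all(h(s) == pair[bit]
               for s, pair, bit in zip(signature, public, bits_of(h(document))))


private_key, public_key = generate_keys()
csv_file = b"REF_DATE,GEO,VALUE\n2021,Canada,100\n"  # made-up sample data
sig = sign(csv_file, private_key)
print(verify(csv_file, sig, public_key))                # True
print(verify(csv_file + b"tampered", sig, public_key))  # False
```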
Public key infrastructure (PKI) binds public keys to identities. This is done through a registration process in which a certification authority (CA), an entity that issues certificates used to verify the ownership of a public key, issues a certificate by signing StatCan's public key. The certificate attests that the public key really belongs to StatCan. Any user with access to the CA's public key can verify the certificate issued over StatCan's public key. Certificates are valid only for a specific period of time.
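To make the certificate check concrete, the sketch below models a certificate as a small record (subject name, public key, validity dates) signed by a CA, and accepts it only if the CA's signature verifies and the current date falls within the validity period. Real certificates use the X.509 format and standard signature algorithms, and they are validated along a chain up to a trusted root CA; here, Lamport one-time signatures over SHA3-256 stand in so that the sketch runs with Python's standard library alone. The `Certificate` class and all helper names are hypothetical.

```python
import hashlib
import secrets
from dataclasses import dataclass
from datetime import date


def h(data: bytes) -> bytes:
    return hashlib.sha3_256(data).digest()


def bits(digest: bytes) -> list[int]:
    return [(byte >> s) & 1 for byte in digest for s in range(7, -1, -1)]


# Compact Lamport one-time signatures standing in for the CA's real algorithm;
# each key pair here must sign exactly one message.
def keygen():
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    return sk, [(h(a), h(b)) for a, b in sk]


def sign(msg: bytes, sk) -> list[bytes]:
    return [pair[b] for pair, b in zip(sk, bits(h(msg)))]


def verify(msg: bytes, sig, pk) -> bool:
    return len(sig) == 256 and all(
        h(s) == pair[b] for s, pair, b in zip(sig, pk, bits(h(msg))))


@dataclass
class Certificate:
    subject: str        # identity being vouched for
    public_key: bytes   # the subject's encoded public key
    not_before: date
    not_after: date
    ca_signature: list

    def to_be_signed(self) -> bytes:
        """The fields the CA signs (everything except the signature itself)."""
        fields = [self.subject, self.not_before.isoformat(), self.not_after.isoformat()]
        return "|".join(fields).encode() + self.public_key


def issue(subject: str, subject_pk, not_before: date, not_after: date, ca_sk) -> Certificate:
    encoded_pk = b"".join(a + b for a, b in subject_pk)
    cert = Certificate(subject, encoded_pk, not_before, not_after, [])
    cert.ca_signature = sign(cert.to_be_signed(), ca_sk)
    return cert


def certificate_is_valid(cert: Certificate, ca_pk, today: date) -> bool:
    """Valid only if today is inside the validity window and the CA's signature checks."""
    return (cert.not_before <= today <= cert.not_after
            and verify(cert.to_be_signed(), cert.ca_signature, ca_pk))


ca_sk, ca_pk = keygen()
_, statcan_pk = keygen()
cert = issue("Statistics Canada", statcan_pk, date(2022, 1, 1), date(2023, 1, 1), ca_sk)
print(certificate_is_valid(cert, ca_pk, date(2022, 6, 1)))  # True
print(certificate_is_valid(cert, ca_pk, date(2024, 6, 1)))  # False: expired
```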
Hash functions are used to create a fingerprint of an input message that is, in practice, unique. This technology gives StatCan the ability to hash a document (such as a CSV file) and create a fingerprint of it in the form of a fixed-size hash. Once StatCan computes the hash of a file, it uploads the hash to the website. When users download the file, the document is hashed, and the resulting hash is compared with the uploaded value to make sure that the file has not been tampered with. This part of the process is handled by the application itself, as we explain in more detail in the proposed solutions.
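A minimal Python sketch of that comparison follows, assuming the published value is a SHA3-256 digest written as hexadecimal text (the exact format StatCan would publish is not specified here). The file is read in chunks so that large tables do not need to fit in memory; the file name and contents are made up.

```python
import hashlib
import os
import tempfile


def sha3_256_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks and return the hex digest."""
    digest = hashlib.sha3_256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, published_hex: str) -> bool:
    """True if the downloaded file hashes to the value StatCan published."""
    return sha3_256_file(path) == published_hex.strip().lower()


def demo() -> tuple[bool, bool]:
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "table.csv")
        with open(path, "wb") as f:
            f.write(b"REF_DATE,GEO,VALUE\n2021,Canada,100\n")
        published = sha3_256_file(path)  # the value StatCan would post online
        original_ok = matches_published_hash(path, published)
        with open(path, "ab") as f:
            f.write(b"2022,Canada,999\n")  # simulate tampering after download
        tampered_ok = matches_published_hash(path, published)
    return original_ok, tampered_ok


print(demo())  # (True, False)
```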
To address users' concerns about the authenticity of their downloaded files, we must use hash functions in our solution, along with digital signatures. This is common practice in cryptography, as well-designed hash functions are considered secure (Al-Kuwari, Davenport, and Bradford, 2011). They protect against malicious parties who may try to change data deliberately. In our proposed system, an attacker must not be able to create a different file with a particular hash and substitute it for a file from StatCan. For hash functions to operate effectively, they require certain properties. For example, when two people hash the same document using the same hash function, they get the same hash value: the hash function always produces the same output for a given input (also called the pre-image), which means that hash functions are deterministic. Even if a single letter is added to a single cell in the document, the resulting hash will be different (see Figure 2). Determinism is also relevant when an attacker tries to guess a pre-image. The input to a hash function cannot be computed just by looking at the hash value; however, one can guess a pre-image, hash it and compare the result with the hash value. Consider user authentication: passwords are generally stored as hashes. If attackers can access a database of these hashes, they can pick a password (for example, one of the most commonly used passwords), hash it and compare it against the database to see whether there is a match.
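The password example can be sketched in a few lines of Python. The "leaked" table and the guess list are invented for illustration, and the hashes are unsalted SHA3-256 purely to show the effect of determinism; real password storage should use a salted, deliberately slow function (Python's standard library offers `hashlib.scrypt`), which makes this kind of guessing far more expensive.

```python
import hashlib


def sha3_hex(text: str) -> str:
    return hashlib.sha3_256(text.encode("utf-8")).hexdigest()


# Determinism: the same input always produces the same hash.
print(sha3_hex("Hello") == sha3_hex("Hello"))  # True

# Hypothetical leaked table of unsalted password hashes (user -> hash).
leaked = {"alice": sha3_hex("123456"), "bob": sha3_hex("correct horse battery staple")}


def dictionary_attack(hashes: dict[str, str], guesses: list[str]) -> dict[str, str]:
    """Hash each guess and look for matches. The hash is never inverted;
    determinism alone lets an attacker confirm a correct guess."""
    lookup = {sha3_hex(g): g for g in guesses}
    return {user: lookup[hv] for user, hv in hashes.items() if hv in lookup}


print(dictionary_attack(leaked, ["password", "123456", "qwerty"]))  # {'alice': '123456'}
```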
Figure 2 is an illustration of how hashing works. The image shows a document with the word "Hello" pointing toward a central black box labelled "hash function". On the right side of the image, there is a seemingly random string of numbers and letters: the hash output. Below, a document with the word "Hello!" points toward the same hash function black box. Because this document includes an exclamation mark, its hash output is different. When a document is hashed, a fixed-size output is created, and each distinct document produces its own distinct output, even if only a single character is different.
Note: This image illustrates how hashing works. Document 1 contains the word "Hello", and the hash function creates Hash 1 over this document. Document 2 differs from Document 1 by one character: "Hello!". The hash function creates Hash 2 over Document 2. Hash 1 and Hash 2 have different values, as Document 1 and Document 2 are different. Hash 1 and Hash 2 are the same size, as the hash function produces fixed-size outputs.
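The figure can be reproduced with Python's standard `hashlib` module. This sketch uses SHA3-256, the function proposed later in this section; the figure itself does not name a specific hash function, so the printed values are illustrative rather than the ones shown in Figure 2.

```python
import hashlib

hash_1 = hashlib.sha3_256(b"Hello").hexdigest()   # Hash 1 over Document 1
hash_2 = hashlib.sha3_256(b"Hello!").hexdigest()  # Hash 2 over Document 2

print(hash_1)
print(hash_2)
print(hash_1 != hash_2)            # True: one extra character changes the output
print(len(hash_1) == len(hash_2))  # True: both are 64 hex characters (256 bits)

# Count how many of the 256 output bits differ between the two hashes.
differing_bits = bin(int(hash_1, 16) ^ int(hash_2, 16)).count("1")
print(differing_bits)  # typically close to half of 256
```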
Most relevant to our project, the hash function must have the collision resistance property, meaning that it is infeasible to find any two different messages that have the same hash. In other words, an adversary cannot find another CSV file with different content that has the same hash as the original document and therefore cannot replace the original document with another one.
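Collision resistance depends on the hash output being long enough. The sketch below is only a demonstration, not an attack on SHA3-256 itself: it truncates SHA3-256 to 3 bytes (24 bits) and hashes made-up file names until two different inputs share the same truncated hash. By the birthday bound, this typically takes on the order of 2^12 attempts (a few thousand).

```python
import hashlib
from itertools import count


def truncated_hash(message: bytes, n_bytes: int) -> bytes:
    """First n_bytes of SHA3-256: a deliberately weakened hash for demonstration."""
    return hashlib.sha3_256(message).digest()[:n_bytes]


def find_collision(n_bytes: int) -> tuple[bytes, bytes, int]:
    """Birthday search: hash distinct messages until two share an output.
    Returns the two colliding messages and the number of messages hashed."""
    seen: dict[bytes, bytes] = {}
    for i in count():
        message = f"table-{i}.csv".encode()
        digest = truncated_hash(message, n_bytes)
        if digest in seen:
            return seen[digest], message, i + 1
        seen[digest] = message


m1, m2, tries = find_collision(n_bytes=3)  # 24-bit outputs
print(m1, m2, tries)
```

For the full 256-bit output, the same generic search would need on the order of 2^128 attempts, which is why finding a collision is considered infeasible.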
For completeness, we also describe the two other properties that a hash function should have. A message to be hashed is known as the pre-image, and the resulting hash is known as the image. Pre-image resistance means that, given the hash of a message, it is infeasible to find any message that produces that hash. Weak collision resistance (also called second pre-image resistance) means that, given a message, it is infeasible to find another message with the same hash. As previously mentioned, the hash function is also needed for the signing operation. StatCan signs the hash of the document, rather than the document itself, because the hash is short and fixed in size, which makes signing much faster. Since the signature covers only the hash, we need the collision resistance property: if two different documents had the same hash, a signature on one would also be valid for the other.
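The generic cost of attacking each property can be summarized as follows, assuming an idealized hash function $H$ with $n$-bit outputs and ignoring any weaknesses in a specific design. The collision bound is the birthday bound, which is why collision resistance is the property that demands the longest outputs.

```latex
\begin{aligned}
\text{Pre-image: given } y, \text{ find } m \text{ with } H(m) = y: &\quad \approx 2^{n} \text{ evaluations of } H \\
\text{Second pre-image: given } m, \text{ find } m' \neq m \text{ with } H(m') = H(m): &\quad \approx 2^{n} \\
\text{Collision: find any } m \neq m' \text{ with } H(m) = H(m'): &\quad \approx 2^{n/2}
\end{aligned}
```

For SHA3-256 ($n = 256$), this gives about $2^{256}$ work for pre-images and second pre-images, and about $2^{128}$ for collisions.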
There are well-known hash functions, such as MD5, SHA1, SHA2 and SHA3. However, not all are secure. MD5 and SHA1 have been shown to be insecure: practical collisions have been found for both, so they no longer have the collision resistance property. While attacking SHA1 takes more work than attacking MD5, both are currently considered weak. Hash functions can be broken over time, in which case they are replaced with secure ones. For now, SHA2 and SHA3 are considered secure (National Institute of Standards and Technology, 2015). We propose using SHA3 in our solution, as it is the newer standard and is built on a different internal design (the Keccak sponge construction) from SHA2.
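As a sketch of how an application might enforce this choice, the Python function below hashes data with SHA3-256 by default and rejects MD5 and SHA1. The allow-list is an assumption for illustration that mirrors the families named in the text, not an official StatCan or NIST list; `hashlib.new` accepts these algorithm names in standard Python builds.

```python
import hashlib

# Illustrative policy: SHA2 and SHA3 family members allowed, MD5 and SHA1 refused.
APPROVED = {"sha3_256", "sha3_512", "sha256", "sha512"}
BROKEN = {"md5", "sha1"}


def fingerprint(data: bytes, algorithm: str = "sha3_256") -> str:
    """Hash `data` with an approved algorithm; refuse ones with known collisions."""
    name = algorithm.lower()
    if name in BROKEN:
        raise ValueError(f"{algorithm} is not collision resistant; use SHA2 or SHA3")
    if name not in APPROVED:
        raise ValueError(f"{algorithm} is not on the approved list")
    return hashlib.new(name, data).hexdigest()


print(fingerprint(b"REF_DATE,GEO,VALUE\n"))                 # SHA3-256 by default
print(len(fingerprint(b"REF_DATE,GEO,VALUE\n", "sha256")))  # 64 hex characters
```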
The proposed solutions require a secure tunnel between the user and the StatCan website for communication. In both the offline and hybrid solutions described below, the user has to download an application from the StatCan website. The user has to make sure they get the genuine application, and a secure tunnel between the user and StatCan is needed for that purpose. In the online solution, the user also communicates with the StatCan website using the secure tunnel. HTTPS provides such a tunnel: if an attacker observes the traffic in the tunnel, they will not learn the content of the messages being transmitted, and they cannot alter the messages without being detected. All an attacker can observe is that there is traffic between the two parties.
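In Python, the secure tunnel corresponds to an HTTPS connection made with a verifying TLS context. The sketch below, with hypothetical function names, refuses plain `http` URLs and uses `ssl.create_default_context()`, which requires a valid certificate chain and a hostname that matches the certificate. No StatCan URL is assumed; the caller supplies whatever address StatCan publishes.

```python
import ssl
import urllib.request
from urllib.parse import urlparse


def secure_context() -> ssl.SSLContext:
    """TLS settings for the tunnel: the server certificate must chain to a
    trusted CA and match the hostname being contacted."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocol versions
    return context


def download(url: str, timeout: float = 10.0) -> bytes:
    """Download over HTTPS only; certificate or hostname errors raise exceptions."""
    if urlparse(url).scheme != "https":
        raise ValueError("refusing to download over an unencrypted connection")
    with urllib.request.urlopen(url, timeout=timeout, context=secure_context()) as response:
        return response.read()
```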
The secure tunnel provides
There are three potential solutions that could be implemented using the technology described above to meet users' need to authenticate a StatCan document. The offline and hybrid solutions require the creation of an application that is downloaded by the user. In these solutions, the user interacts with the application to check the validity of a document.
Figure 3 is a detailed image of our offline solution. It displays the setup of the solution, which includes how Statistics Canada will hard-code the keys into the application, and an illustration of the secure tunnel between Statistics Canada and the application. The figure then displays how the solution is used: the user uploads a .csv file to the application, the application computes the hash, the user provides the signature file to the application, and the application checks whether the signature is verified over the hash.
In this solution, the user downloads an application from the StatCan website through the secure tunnel, which ensures that the application they download belongs to StatCan. The application checks the validity of the user's document. Having downloaded the CSV file and its signature file from the website, the user drags and drops the CSV file into the app. The app computes the hash over the file, then prompts the user to provide the corresponding signature file computed over the hash of the CSV file. The application then checks whether the signature is verified over the hash. To do so, StatCan's public key must be hard-coded into the app (setup phase in Figure 3), since the key is needed to verify the signature over a file's hash; the private key never leaves StatCan.
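The offline check can be sketched as follows. The sketch first simulates StatCan's setup phase (key generation and signing) and then runs the application's check, which needs no network access. Lamport one-time signatures over SHA3-256 stand in for a production signature scheme so that the sketch uses only Python's standard library; only the public key is embedded in the application. Names such as `OfflineVerifier` and the CSV contents are hypothetical.

```python
import hashlib
import secrets


def h(data: bytes) -> bytes:
    return hashlib.sha3_256(data).digest()


def bits(digest: bytes) -> list[int]:
    return [(byte >> s) & 1 for byte in digest for s in range(7, -1, -1)]


# --- StatCan side (setup phase), simulated. Lamport one-time keys: one file per key.
def statcan_keygen():
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    return sk, [(h(a), h(b)) for a, b in sk]


def statcan_sign(csv_bytes: bytes, sk) -> bytes:
    """Contents of the separate signature file: 256 revealed secrets of 32 bytes."""
    return b"".join(pair[b] for pair, b in zip(sk, bits(h(csv_bytes))))


# --- Offline application: holds only StatCan's public key, never the private key.
class OfflineVerifier:
    def __init__(self, embedded_public_key):
        self.public_key = embedded_public_key  # hard-coded when the app is built

    def check(self, csv_bytes: bytes, signature_file: bytes) -> bool:
        if len(signature_file) != 256 * 32:
            return False
        digest = h(csv_bytes)  # 1. hash the CSV file the user dropped in
        parts = [signature_file[i:i + 32] for i in range(0, len(signature_file), 32)]
        # 2. verify the signature over that hash with the embedded public key
        return all(h(p) == pair[b]
                   for p, pair, b in zip(parts, self.public_key, bits(digest)))


sk, pk = statcan_keygen()
csv_bytes = b"REF_DATE,GEO,VALUE\n2021,Canada,100\n"
signature_file = statcan_sign(csv_bytes, sk)
app = OfflineVerifier(pk)
print(app.check(csv_bytes, signature_file))                          # True
print(app.check(csv_bytes.replace(b"100", b"900"), signature_file))  # False
```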
Figure 4 is a detailed illustration of our online solution. In this image, the setup involves the list of hashes on the Statistics Canada server and a secure tunnel between the Statistics Canada website and the user. The figure then displays how the online solution is used: the user drags and drops the file they want to check into the Statistics Canada website through the secure tunnel; the website computes the hash of the file client side and compares it with the list of hashes; and finally, the website tells the user whether the file is valid.
In this solution, StatCan maintains a page on its website for the user to check document validity. The user communicates with the StatCan website using the secure tunnel and drags and drops the file that they want to check. StatCan maintains a server that keeps the list of hashes of all its files, so the website can compute the hash of the user's file client side and compare it with that list. The user then learns whether the file they provided is valid. If it is valid, the file has not been tampered with and belongs to StatCan. Compared with the offline solution, this approach offers a more straightforward experience for the user, as they only have to provide the product's file. However, this solution requires the user to be online, unlike the previous application, which runs offline.
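The online check reduces to a set lookup. In this hypothetical Python sketch, `PUBLISHED_HASHES` stands in for the server's list of hashes, and `client_side_hash` represents the hashing that the web page would perform in the user's browser (in practice in JavaScript, not Python). In this design, only the hash, not the file, would need to be compared with the server's list.

```python
import hashlib

# Hypothetical server-side store: SHA3-256 hex digests of every published file.
PUBLISHED_HASHES: set[str] = set()


def publish(file_bytes: bytes) -> None:
    """Run by StatCan when a product is released."""
    PUBLISHED_HASHES.add(hashlib.sha3_256(file_bytes).hexdigest())


def client_side_hash(file_bytes: bytes) -> str:
    """Stands in for the hashing done in the user's browser."""
    return hashlib.sha3_256(file_bytes).hexdigest()


def is_valid(hash_hex: str) -> bool:
    """A match means the file is unaltered and was published by StatCan."""
    return hash_hex in PUBLISHED_HASHES


publish(b"REF_DATE,GEO,VALUE\n2021,Canada,100\n")
print(is_valid(client_side_hash(b"REF_DATE,GEO,VALUE\n2021,Canada,100\n")))  # True
print(is_valid(client_side_hash(b"REF_DATE,GEO,VALUE\n2021,Canada,900\n")))  # False
```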
Figure 5 is a detailed illustration of the hybrid solution. In this image, the setup is more complex. Statistics Canada adds file hashes to the Statistics Canada server, and they are then pushed to the website and, through a secure tunnel, to the application. The list of hashes is updated every three days. Below, the figure illustrates how the hybrid solution is used: the user uploads the .csv file to the application, the application computes the hash and compares it against the updated list of hashes, and finally, the application tells the user whether the file is valid.
In the hybrid solution, the user must download an application (similar to the offline solution) over the secure tunnel. The app has a list of hashes of files that belong to StatCan. To authenticate a document, the user uploads the file to the app, which computes the hash and compares it with the list. Then, the app informs the user whether the file is valid. The app periodically connects to the StatCan website to update the list of hashes, which StatCan keeps on a server. While we suggest that the app connect every three days to receive the most recent list, the interval can be longer or shorter, depending on how frequently StatCan releases files. Just as a signature over a hash proves ownership, receiving the list of hashes from StatCan over the secure connection shows that the hashes come from StatCan. As a result, the user does not need to provide a signature file as long as the hash of their file appears in the list. If the hash is not in the list, the app prompts the user to provide the signature file, so the app can verify the signature over the hash of the file. This situation might occur if a user tries to authenticate a file before the app has had the opportunity to connect to the StatCan website and update the list of hashes.
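The hybrid app's decision logic is sketched below in Python. `fetch_hashes` and `verify_signature` are hypothetical hooks: in a real app, the first would download StatCan's list over HTTPS, and the second would check the signature over the file's hash with StatCan's embedded public key. The demonstration plugs in placeholders that are not cryptography, only to exercise the control flow, including the three-day refresh interval suggested in the text.

```python
import hashlib
from datetime import datetime, timedelta
from typing import Callable, Optional

REFRESH_INTERVAL = timedelta(days=3)  # interval suggested in the text; adjustable


class HybridVerifier:
    """fetch_hashes: hypothetical download of StatCan's hash list over HTTPS.
    verify_signature: hypothetical check of a signature over a file hash."""

    def __init__(self, fetch_hashes: Callable[[], set[str]],
                 verify_signature: Callable[[bytes, bytes], bool]) -> None:
        self.fetch_hashes = fetch_hashes
        self.verify_signature = verify_signature
        self.known_hashes: set[str] = set()
        self.last_refresh: Optional[datetime] = None

    def refresh_if_due(self, now: datetime, online: bool) -> None:
        """Update the cached list when online and the interval has elapsed."""
        due = self.last_refresh is None or now - self.last_refresh >= REFRESH_INTERVAL
        if online and due:
            self.known_hashes = set(self.fetch_hashes())
            self.last_refresh = now

    def check(self, file_bytes: bytes, signature_file: Optional[bytes] = None) -> str:
        """Works offline: cached list first, signature verification as the fallback."""
        digest = hashlib.sha3_256(file_bytes).digest()
        if digest.hex() in self.known_hashes:
            return "valid"                # no signature file needed
        if signature_file is None:
            return "need signature file"  # e.g. released after the last refresh
        return "valid" if self.verify_signature(digest, signature_file) else "invalid"


# Demonstration with stand-in hooks (NOT real cryptography).
old_file = b"REF_DATE,GEO,VALUE\n2021,Canada,100\n"
new_file = b"REF_DATE,GEO,VALUE\n2022,Canada,105\n"
server_list = {hashlib.sha3_256(old_file).hexdigest()}

app = HybridVerifier(fetch_hashes=lambda: server_list,
                     verify_signature=lambda file_hash, sig: sig == b"placeholder")
app.refresh_if_due(datetime(2022, 3, 1), online=True)
print(app.check(old_file))                  # valid
print(app.check(new_file))                  # need signature file
print(app.check(new_file, b"placeholder"))  # valid (placeholder verifier only)
```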
Figure 6 is a detailed illustration of our recommended solution, which is the hybrid solution with the addition of blockchain. In the setup, Statistics Canada adds the hash of the file to the Statistics Canada server, and the hash is logged on the blockchain. The hashes are then pushed to the Statistics Canada website and, through a secure tunnel, to the application, where the list of hashes is updated every three days. The user's experience is the same as in Figure 5: the user uploads the .csv file to the application, the application computes the hash and compares it against the updated list of hashes, and finally, the application tells the user whether the file is valid.
All three solutions give users the opportunity to authenticate data from the StatCan website. However, they do not all equally meet the standards we set in our objectives for the project. While the offline solution meets our objective of allowing users across the digital divide to authenticate data, it requires the user to submit the corresponding signature file to the app. With the online solution, the user only needs to provide the CSV file, minimizing the number of downloads. Therefore, the online solution offers better usability than the offline solution. However, the online solution does not meet the requirement to provide an accessible method of authentication regardless of the user's access to the Internet.
For these reasons, we have decided that the hybrid solution is ideal: it provides a usability level comparable to that of the online solution and does not require the user to be online to check the file they have. This addresses the barriers discussed above regarding consistent access to the Internet. Adding blockchain to the hybrid solution provides further improvements. It affects only one subcomponent of the proposed solution: the way the hash of a file is stored. StatCan creates the hash of a file and logs this hash on the ledger. Compared with Figure 5, the hashes are logged on the blockchain, and the app receives the updated list of hashes from it. The added element of blockchain increases trust between StatCan and the public, because StatCan cannot silently change data once they are posted: if StatCan changes the data, a history of that change is recorded. Another benefit of including blockchain is that the hashes can still be reached if the StatCan website is down, as they are recorded on the blockchain. Blockchain also offers better archival properties, as it keeps the recorded data reachable over a longer period than a server would; a server may go down or may not be continuously maintained, making the data unreachable. Blockchain provides provenance for the data (the hash of the file) over a long period, but it does not archive the files themselves. A possible drawback of incorporating blockchain into the hybrid solution is that if the ledger nodes manipulate the list of hashes, StatCan cannot do anything about it, because a global network controls the data. Ledger nodes are the entities in this network that accept or reject a block of transactions based on their validity and broadcast these transactions so that all of the nodes stay up to date. In the hybrid solution without blockchain, by contrast, StatCan maintains exclusive control.
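The tamper-evidence property can be illustrated with a hash chain, the data structure at the core of a blockchain. This Python sketch is a simplification that leaves out everything else a blockchain provides (consensus among ledger nodes, replication, and Avalanche's actual transaction format and APIs); the class names are hypothetical. Each entry records a file hash plus the hash of the previous entry, so rewriting an old entry breaks every later link, and a correction has to be appended as a new entry rather than overwritten.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    file_hash: str  # SHA3-256 of a published StatCan file
    prev: str       # hash of the previous entry, chaining the log together

    def entry_hash(self) -> str:
        return hashlib.sha3_256(json.dumps([self.file_hash, self.prev]).encode()).hexdigest()


class HashLog:
    """Append-only log in which each entry commits to the one before it."""
    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.entries: list[Entry] = []

    def head(self) -> str:
        return self.entries[-1].entry_hash() if self.entries else self.GENESIS

    def append(self, file_bytes: bytes) -> None:
        self.entries.append(Entry(hashlib.sha3_256(file_bytes).hexdigest(), self.head()))

    def is_consistent(self) -> bool:
        prev = self.GENESIS
        for entry in self.entries:
            if entry.prev != prev:
                return False
            prev = entry.entry_hash()
        return True


log = HashLog()
for release in (b"table A v1", b"table B v1", b"table C v1"):
    log.append(release)
print(log.is_consistent())  # True

# Rewriting the first entry breaks the link stored in the second entry.
log.entries[0] = Entry(hashlib.sha3_256(b"table A altered").hexdigest(), log.entries[0].prev)
print(log.is_consistent())  # False
```

Changing only the most recent entry would not break any link, which is why the latest head value must also be held by many independent parties; on a public blockchain, the ledger nodes play that role.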
Al-Kuwari, S., Davenport, J. H., and Bradford, R. J. (2011). Cryptographic Hash Functions: Recent Design Trends and Security Notions. Short Paper Proceedings of Inscrypt '10, 137. Retrieved 2022, from https://eprint.iacr.org/2011/565.pdf
Aljeaid, D., Ma, X., and Langensiepen, C. (2014). Biometric identity-based cryptography for e-Government environment. Proceedings of 2014 Science and Information Conference, SAI 2014, 581-588. https://doi.org/10.1109/SAI.2014.6918245
Batista, D., Kim, H., Lemieux, V. L., Stancic, H., and Unnithan, C. (2021). Block and provenance: How a technical system for tracing origins, ownership and authenticity can transform social trust. In V. Lemieux and C. Feng (eds.), Building decentralized trust: Multidisciplinary perspectives on the design of blockchains and distributed ledgers (First ed., pp. 111-128). Springer. https://doi.org/10.1007/978-3-030-54414-0
Bell, M., Green, A., Sheridan, J., Collomosse, J., Cooper, D., Bui, T., Thereaux, O., and Higgins, J. (2019). Underscoring archival authenticity with blockchain technology. Insights: The UKSG Journal, 32. https://link.gale.com/apps/doc/A594619025/AONE?u=anon~eafb9693&sid=googleScholar&xid=e1544e71
Canada's Public Policy Forum. (2014). Northern Connections: Broadband and Canada's Digital Divide. Public Policy Forum: Reports. Retrieved January 27, 2022, from https://ucarecdn.com/68b98fff-32c9-4904-904c-09b1d98cdd2e/
Chandler, S. (2021, December 22). Proof of stake vs. proof of work: Key differences between these methods of verifying cryptocurrency transactions. Business Insider. https://www.businessinsider.com/personal-finance/proof-of-stake-vs-proof-of-work
Chowdhury, N. (2019). Inside blockchain, Bitcoin, and cryptocurrencies. Auerbach.
Communications Security Establishment Canada. (2021, December 6). Ministers urge Canadian organizations to take action against Ransomware. Canada.ca. Retrieved January 26, 2022, from https://www.canada.ca/en/communications-security/news/2021/12/ministers-urge-canadian-organizations-to-take-action-against-ransomware.html
Computer Security Resource Center. (n.d.a). Certification authority. In Computer Security Resource Center: Glossary. Retrieved from https://csrc.nist.gov/glossary/term/certification_authority
Computer Security Resource Center. (n.d.b). Collision. In Computer Security Resource Center: Glossary. Retrieved from https://csrc.nist.gov/glossary/term/collision
Computer Security Resource Center. (n.d.c). Collision resistance. In Computer Security Resource Center: Glossary. Retrieved from https://csrc.nist.gov/glossary/term/collision_resistance
Computer Security Resource Center. (n.d.d). Key generation. In Computer Security Resource Center: Glossary. Retrieved from https://csrc.nist.gov/glossary/term/key_generation
Computer Security Resource Center. (n.d.e). Message digest. In Computer Security Resource Center: Glossary. Retrieved from https://csrc.nist.gov/glossary/term/message_digest
Computer Security Resource Center. (n.d.f). Node. In Computer Security Resource Center: Glossary. Retrieved from https://csrc.nist.gov/glossary/term/node
Computer Security Resource Center. (n.d.g). Preimage. In Computer Security Resource Center: Glossary. Retrieved from https://csrc.nist.gov/glossary/term/preimage
Computer Security Resource Center. (n.d.h). Preimage resistance. In Computer Security Resource Center: Glossary. Retrieved from https://csrc.nist.gov/glossary/term/preimage_resistance
Computer Security Resource Center. (n.d.i). Public key infrastructure (PKI). In Computer Security Resource Center: Glossary. Retrieved from https://csrc.nist.gov/glossary/term/public_key_infrastructure
De Filippi, P., and Wright, A. (2018). Blockchain and the law: The rule of code. Harvard University Press. https://doi.org/10.4159/9780674985933
Hirose, S. (2004). Yet another definition of weak collision resistance and its analysis. In J. I. Lim and D. H. Lee (eds.), Information Security and Cryptology - ICISC 2003. Lecture Notes in Computer Science, vol. 2971. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24691-6_8
Huang, J., O'Neill, C., and Tabuchi, H. (2021, September 3). Bitcoin uses more electricity than many countries. How is that possible? The New York Times. https://www.nytimes.com/interactive/2021/09/03/climate/bitcoin-carbon-footprint-electricity.html
Industrial Research Assistance Program. (2019). Blockchain publishing prototype. National Research Council of Canada. Government of Canada. https://nrc-cnrc.explorecatena.com/en
Ipsos Public Affairs for Canada's Centre for International Governance Innovation. (2019). Global Survey Internet Security & Trust. CIGI-Ipsos Global Survey Internet Security Trust Part I & II: Internet security, online privacy & trust. Retrieved from https://www.cigionline.org/sites/default/files/documents/2019%20CIGI-Ipsos%20Global%20Survey%20-%20Part%201%20%26%202%20Internet%20Security%2C%20Online%20Privacy%20%26%20Trust.pdf
Jiang, S. (2021, November 9). Discord's hints about crypto, NFTs are tearing its community apart. Kotaku. https://kotaku.com/discords-hints-about-crypto-nfts-are-tearing-its-commu-1848023955
Katz, A., and Dash, S. (n.d.). Error correcting codes. Brilliant Math & Science Wiki. Retrieved January 27, 2022, from https://brilliant.org/wiki/error-correcting-codes/
Source: Investigating the Use of Blockchain to Authenticate Data from the Statistics Canada Website (Statistics Canada)