A committee under the Department of Telecommunications (DoT) has released a draft framework for an Indian Artificial Intelligence Stack which seeks to remove the impediments to AI deployment, and essentially proposes to set up a six-layered stack, with each layer handling different functions, including consent gathering, storage, and AI/Machine Learning (AI/ML) analytics. Once developed, this stack will be structured across all sectors, and will provide for data protection, data minimisation, open algorithm frameworks, defined data structures, trustworthiness and digital rights, and data federation (a single database source for front-end applications), among other things. The paper also said that there is no uniform definition of AI.
This committee, the AI Standardisation Committee, had, in October last year, invited papers on Artificial Intelligence addressing different aspects of AI, such as functional network architecture, AI architecture, and the data structures required, among other things. At the time, the DoT had said that as the proliferation of AI increases, there is a need to develop an Indian AI stack so as to bring interoperability, among other things. Here is a summary of the draft Indian AI Stack; comments on it can be emailed to aigroup-dot@gov.in or diradmnap-dot@gov.in until October 3.
The stack will be made up of five main horizontal layers and one vertical layer:
This is the root layer of the Indian AI stack, over which the entire AI functionality is built. The layer will ensure the setting up of a common data controller, and will involve multi-cloud scenarios spanning both private and public clouds. This is where the infrastructure for data collection will be defined. The multilayer cloud services model will define the relations between cloud service models and the other functional layers.
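As a rough illustration of what a common data controller over multiple clouds might look like, here is a minimal sketch; the class and endpoint names (DataController, CloudEndpoint) are hypothetical and not drawn from the draft:

```python
from dataclasses import dataclass

@dataclass
class CloudEndpoint:
    """One cloud in the multi-cloud estate (hypothetical illustration)."""
    name: str
    kind: str         # "private" or "public"
    region: str

class DataController:
    """A single point of control over private and public clouds."""
    def __init__(self):
        self.clouds: dict[str, CloudEndpoint] = {}

    def register(self, endpoint: CloudEndpoint) -> None:
        self.clouds[endpoint.name] = endpoint

    def endpoints(self, kind: str | None = None) -> list[CloudEndpoint]:
        # Filter the registered clouds by deployment kind, if given.
        return [c for c in self.clouds.values()
                if kind is None or c.kind == kind]

controller = DataController()
controller.register(CloudEndpoint("gov-dc", "private", "in-south"))
controller.register(CloudEndpoint("public-1", "public", "in-west"))
print([c.name for c in controller.endpoints("private")])  # ['gov-dc']
```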
This layer will have to define the protocols and interfaces for storing hot data, warm data, and cold data (all three defined below). The paper called this the most important layer in the stack, regardless of the size and type of data, since value from data can only be derived once it is processed, and data can only be processed efficiently when it is stored properly. It is important to store data safely for a very long time while managing all factors of seasonality and trends, ensuring that it is easily accessible and shareable on any device, the paper said.
The paper has created three subcategories of data depending on the relevance of the data and its usability:
Categories of data
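The usual industry reading of these tiers is that hot data is accessed constantly, warm data occasionally, and cold data rarely. A minimal sketch of tier classification follows; the access-frequency thresholds are illustrative assumptions, not figures from the draft:

```python
from enum import Enum

class DataTier(Enum):
    HOT = "hot"    # accessed constantly; kept on fast storage
    WARM = "warm"  # accessed occasionally; kept on cheaper storage
    COLD = "cold"  # archival; rarely read back

def classify(days_since_last_access: int) -> DataTier:
    # Threshold values are invented for illustration only.
    if days_since_last_access <= 7:
        return DataTier.HOT
    if days_since_last_access <= 90:
        return DataTier.WARM
    return DataTier.COLD

print(classify(3), classify(30), classify(400))
# DataTier.HOT DataTier.WARM DataTier.COLD
```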
This layer, through a set of defined protocols and templates, ensures an open algorithm framework. The AI/ML processes could include Natural Language Processing (NLP), deep learning, and neural networks. This layer will also define data analytics, including data engineering, which focuses on practical applications of data collection and analysis, apart from scaling and data ingestion. Technology mapping and rule execution will also be part of this layer.
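One way to picture an "open algorithm framework" is a common interface into which any conforming analytics step, whether NLP or deep learning, can be slotted. The sketch below is a hypothetical illustration of that idea, not a design from the draft:

```python
from typing import Any, Callable

Step = Callable[[Any], Any]   # any analytics stage with a uniform signature

class Pipeline:
    """Chains pluggable analytics steps behind one open interface."""
    def __init__(self):
        self.steps: list[Step] = []

    def register(self, step: Step) -> "Pipeline":
        self.steps.append(step)
        return self

    def run(self, data: Any) -> Any:
        for step in self.steps:
            data = step(data)
        return data

pipe = (Pipeline()
        .register(lambda rows: [r.lower() for r in rows])        # ingestion/cleaning
        .register(lambda rows: [len(r.split()) for r in rows]))  # toy analytics step
print(pipe.run(["Hello World", "Indian AI Stack"]))  # [2, 3]
```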
The paper acknowledged the need for a proper data protection framework: the compute layer involves analysis that mines vast troves of personal data to find correlations, which are then used for various computations. This raises various privacy issues, as well as broader issues of lack of due process, discrimination, and consumer protection.
"The data so collected can shed light on most aspects of individuals' lives. It can also provide information on their interactions and patterns of movement across physical and networked spaces, and even on their personalities. The mining of such large troves of data to seek out new correlations creates many potential uses for Big Personal Data. Hence, there is a need to define proper data protection mechanisms in this layer, along with suitable data encryption and minimisation." (from the paper)
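Data minimisation, in practice, means a computation receives only the fields it actually needs. A toy sketch under that assumption; the purposes and field names are invented for illustration:

```python
# Fields each computation is allowed to see (hypothetical purposes).
REQUIRED_FIELDS = {
    "credit_score": {"income", "repayment_history"},
    "footfall_stats": {"timestamp", "zone"},
}

def minimise(record: dict, purpose: str) -> dict:
    """Strip every field the stated purpose does not require."""
    allowed = REQUIRED_FIELDS[purpose]
    return {k: v for k, v in record.items() if k in allowed}

record = {"name": "A. Citizen", "income": 50000,
          "repayment_history": "good", "location": "Delhi"}
print(minimise(record, "credit_score"))
# {'income': 50000, 'repayment_history': 'good'} -- name and location dropped
```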
The compute layer will also define a new way to build and deploy enterprise service-oriented architectures, along with providing a transparent computing architecture over which the industry could develop its own analytics. It will have to provide for a distinction between public, shared, and private data sources, so that machine learning algorithms can be applied against the relevant data fields.
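The public/shared/private distinction could be enforced by tagging each field with a visibility class and filtering records before a machine learning job sees them. A minimal sketch, with invented field names and clearance levels:

```python
# Visibility tag per field (hypothetical examples).
FIELD_VISIBILITY = {
    "aggregate_usage": "public",
    "cohort_trends": "shared",
    "health_record": "private",
}
CLEARANCE_RANK = {"public": 0, "shared": 1, "private": 2}

def visible_fields(record: dict, clearance: str) -> dict:
    """Return only the fields this clearance level may see."""
    rank = CLEARANCE_RANK[clearance]
    return {k: v for k, v in record.items()
            if CLEARANCE_RANK[FIELD_VISIBILITY.get(k, "private")] <= rank}

row = {"aggregate_usage": 10, "cohort_trends": [1, 2], "health_record": "..."}
print(visible_fields(row, "shared"))  # health_record withheld
```

Note that fields with no visibility tag default to private, a fail-closed design choice.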
The report also said that the NITI Aayog has proposed an AI-specific cloud compute infrastructure, which will facilitate research and solution development using high-performance and high-throughput AI-specific supercomputing technologies. The broad specifications for this proposed cloud controller architecture may include:
Proposed architecture of the AI-specific controller
The paper described this as a purpose-built layer through which software and applications can be hosted and executed as a service. This layer will also support various backend services for processing data, and will provide a proper service framework for the AI engine to function. It will also keep track of all transactions across the stack, helping in logging and auditing activities.
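A transaction trail of this kind is often just an append-only log with enough context to audit later. A minimal sketch, with invented layer and actor names:

```python
import json
import time
import uuid

def log_transaction(log: list, layer: str, action: str, actor: str) -> str:
    """Append one audit entry for a transaction crossing the stack."""
    entry = {
        "txn_id": str(uuid.uuid4()),  # unique id to trace the transaction
        "ts": time.time(),            # when it happened
        "layer": layer,               # which stack layer handled it
        "action": action,
        "actor": actor,
    }
    log.append(entry)
    return entry["txn_id"]

audit_log: list = []
log_transaction(audit_log, "storage", "read:hot-data", "analytics-svc")
log_transaction(audit_log, "compute", "run:nlp-model", "analytics-svc")
print(json.dumps(audit_log, indent=2))
```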
This layer will define the end-customer experience through defined data structures and proper interfaces and protocols. It will have to support a proper consent framework for access to data by or for the customer. Consent can be given for individual data fields or for collective fields. This layer will also host gateway services. Typically, different tiers of consent will be made available to accommodate different tiers of permissions, the paper said.
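Field-level and collective consent could be recorded as per-user grants that gateway services check before releasing data. A toy sketch under that assumption (ConsentRegistry and the field names are hypothetical):

```python
class ConsentRegistry:
    """Tracks which data fields each user has consented to share."""
    def __init__(self):
        self._grants: dict[str, set[str]] = {}

    def grant(self, user: str, fields: set[str]) -> None:
        # Consent for individual data fields.
        self._grants.setdefault(user, set()).update(fields)

    def grant_all(self, user: str, all_fields: set[str]) -> None:
        # "Collective" consent over a whole set of fields at once.
        self.grant(user, all_fields)

    def permitted(self, user: str, field: str) -> bool:
        return field in self._grants.get(user, set())

registry = ConsentRegistry()
registry.grant("user-42", {"usage_stats"})
print(registry.permitted("user-42", "usage_stats"))  # True
print(registry.permitted("user-42", "location"))     # False: never consented
```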
This layer also needs to ensure that ethical standards are followed to protect digital rights. In the absence of a clear data protection law in the country, the EU's General Data Protection Regulation (GDPR) or any of the existing laws can be applied; this will serve as an interim measure until Indian laws are formalised, the paper said.
This layer will ensure the process of security and governance for all five preceding horizontal layers. There will be an overwhelming flow of data through the stack, which is why there is a need to ensure encryption at different levels, the paper said. This may require setting up the ability to handle multiple queries in an encrypted environment, among other things. Cryptographic support is also an important dimension of the security layer, it added.
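The draft does not name a cipher or key scheme; as one hedged illustration of "encryption at different levels", the sketch below gives each layer its own symmetric key using the cryptography library's Fernet primitive:

```python
from cryptography.fernet import Fernet  # pip install cryptography

# One key per layer, so data crossing a layer boundary is re-protected
# at every hop (an illustrative assumption, not a scheme from the draft).
layer_keys = {layer: Fernet(Fernet.generate_key())
              for layer in ("storage", "compute", "application")}

def protect(payload: bytes, layer: str) -> bytes:
    """Encrypt a payload under the given layer's key."""
    return layer_keys[layer].encrypt(payload)

def reveal(token: bytes, layer: str) -> bytes:
    """Decrypt a token; only the owning layer's key will work."""
    return layer_keys[layer].decrypt(token)

token = protect(b"aggregated sensor reading", "storage")
print(reveal(token, "storage"))  # b'aggregated sensor reading'
```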
Why this layer is important, per the paper: "data aggregated, transmitted, stored, and used by various stakeholders may increase the potential for discriminatory practices and pose substantial privacy and cybersecurity challenges. The data processed and stored in many cases include geolocation information, product-identifying data, and personal information related to use or owner identity, such as biometric data, health information, or smart-home metrics."
"Data storage in backend systems can present challenges in protecting data from cyberattacks. In addition to personal-information privacy concerns, there could be data used in system operation which may not typically be personal information. Cyber attackers could misuse these data by compromising data availability or changing data, causing data integrity issues, and could use big data insights to reinforce or create discriminatory outcomes. When data is not available, causing a system to fail, it can result in damage; for example, a smart home's furnace overheats or an individual's medical device cannot function when required." (from the paper)
What the proposed AI stack looks like
According to the report, the key benefits of this proposed AI stack are:
This is how the paper proposes data will flow through the stack:
Proposed AI flowchart
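Reading the layers as stages, the flow can be imagined as a request passing through consent, storage, compute, and application stages, with the vertical security layer wrapping every hop. The sketch below is a toy rendering of that idea, not the paper's flowchart:

```python
# Each stage is deliberately trivial; the point is the ordering and
# the security wrapper, both of which are illustrative assumptions.
def consent_stage(req):  req["consented"] = True;           return req
def storage_stage(req):  req["data"] = [1.0, 2.0, 3.0];     return req
def compute_stage(req):  req["result"] = sum(req["data"]);  return req
def app_stage(req):      return {"response": req["result"]}

def secured(stage):
    """Vertical security layer: refuse any stage run without consent."""
    def wrapper(req):
        if stage is not consent_stage and not req.get("consented"):
            raise PermissionError("no consent recorded")
        return stage(req)
    return wrapper

request = {"user": "user-42"}
for stage in (consent_stage, storage_stage, compute_stage, app_stage):
    request = secured(stage)(request)
print(request)  # {'response': 6.0}
```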
In AI, the thrust is on how efficiently data is used, the paper said, noting that if the data is garbage, then the output will also be garbage. For example, if programmers or AI trainers transfer their biases to the AI, the system will become biased, the paper said. There is a need to evolve ethical standards, trustworthiness, and a consent framework to get data validation from users, the paper suggested.
The risks of passive adoption of AI that automates human decision-making are also severe. Such delegation can lead to harmful, unintended consequences, especially when it involves sensitive decisions or tasks and excludes human supervision, the paper said. It cited Microsoft's Twitter chatbot Tay as an example of what can happen when garbage data is fed into an AI system: Tay had started tweeting racist and misogynist remarks within 24 hours of its launch.
Need for openness in AI algorithms: The paper said it was necessary to have an open AI algorithm framework, along with clearly defined data structures. It referenced how the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) software, used by some US courts to predict the likelihood of recidivism in criminal defendants, was demonstrated to be biased, since the AI black box was proprietary.
As AI learns to address societal problems, it also develops its own hidden biases. The self-learning nature of AI means that the distorted data the AI discovers in search engines, perhaps based upon unconscious and institutional biases and other prejudices, is codified into a matrix that will make decisions for years to come. In the pursuit of being the best at its task, the AI may make decisions it considers the most effective or efficient for its given objective, but because of the wrong data, it becomes unfair to humans, the report said.
Need to centrally control data: Right after making a pitch for openness in AI algorithms, the paper proposed that the data fed into the AI system should be controlled centrally. The data from which the AI learns can itself be flawed or biased, leading to flawed automated AI decisions. This is certainly not the intention of algorithmised decision-making, which is perhaps a good-faith attempt to remove unbridled discretion and its inherent biases. There is thus a need to ensure that the data is centrally controlled, including by using a single cloud controller or multiple cloud controllers, the report said.
Proper storage frameworks for AI: An important factor aiding biases in AI systems is contamination of data, per the paper, which includes missing information, inconsistent data, or simply errors. This could be because of unstructured storage of data. Thus, there is a need to ensure proper storage frameworks for AI, it said.
Changing the culture of coders and developers: There is a need to change the culture so that coders and developers themselves recognise the harmful and consequential implications of biases, the paper said, adding that this goes beyond standardisation of the type of algorithmic code and focuses on the programmers of the code. Since much coding is outsourced, this would place the onus on the company developing the software product to enforce such standards. Such a comprehensive approach would tackle the problem across the industry as a whole, and enable AI software to make fair decisions based on unbiased data, in a transparent manner, it added.
In the near future, AI will have huge implications for the country's security, its economic activities, and society. The risks are unpredictable and unprecedented. Therefore, it is imperative for all countries, including India, to develop a stack that fits into a standard model and protects customers, users, business establishments, and the government.
Economic impact: AI will have a major impact on mainly four sectors, per the paper: manufacturing industries, professional services, financial services, and wholesale and retail. The paper also charted out how AI could be used in some specific sectors. For instance, in healthcare, it said that in rural areas, which suffer from limited availability of healthcare professionals and facilities, AI could be used for diagnostics, personalised treatment, early identification of potential pandemics, and imaging diagnostics, among others.
Similarly, in the banking and financial services sector, AI can be used for things like the development of credit scores through analysis of bank history or social media data, and fraud analytics for proactive monitoring and prevention of various instances of fraud, money laundering, and malpractice, as well as prediction of potential risks, according to the report.
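To make the credit-scoring use case concrete, here is a toy logistic model over bank-history features; the feature names and weights are invented for illustration, and a real system would learn them from data:

```python
import math

# Invented weights over invented bank-history features; a real
# deployment would train these on labelled repayment data.
WEIGHTS = {
    "on_time_repayment_rate": 4.0,   # share of instalments paid on time
    "avg_monthly_balance_k": 0.05,   # average balance, in thousands
    "recent_defaults": -2.5,         # count of recent defaults
}
BIAS = -1.0

def credit_score(features: dict) -> float:
    """Logistic score in [0, 1]: estimated probability of repayment."""
    z = BIAS + sum(WEIGHTS[k] * v for k, v in features.items())
    return 1 / (1 + math.exp(-z))

applicant = {"on_time_repayment_rate": 0.9,
             "avg_monthly_balance_k": 20,
             "recent_defaults": 1}
print(f"score: {credit_score(applicant):.2f}")  # ~0.75
```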
Uses for the government: For governments, for example, cybersecurity attacks can be rectified within hours rather than months, and national spending patterns can be monitored in real time to instantly gauge inflation levels while collecting indirect taxes.