Google’s AMP, the Canonical Web, and the Importance of Web Standards – EFF

Posted: July 8, 2020 at 4:03 am

Have you ever clicked on a link after googling something, only to find that Google didn't take you to the actual webpage but to some weird Google-fied version of it? Instead of showing the web address of the article's source, the address bar on your phone still says "google"? That's what's known as Google Accelerated Mobile Pages (AMP), and now Google has announced that AMP has graduated from the OpenJS Foundation Incubation Program. The OpenJS Foundation is a merged effort between major projects in the JavaScript ecosystem, such as NodeJS and jQuery, whose stated mission is to support the healthy growth of the JavaScript and web ecosystem. But instead of a standard starting with the web community, a giant company is coming to the community after it has already built a large part of the mobile web, and is asking for a rubber stamp. Web community discussion should be the first step in making web standards, not just a last-minute hurdle for Google to clear.

This Google-backed, stripped-down HTML framework was created with the promise of faster web pages and a better user experience, in part by cutting out slower-loading content, such as content built with JavaScript. At a high level, AMP works by fast-loading stripped-down versions of full web pages for mobile viewing.

The Google AMP project was announced in late 2015 with the promise of providing publishers a faster way of serving and distributing content to their users. It was also marketed as a more adaptable approach than Apple News and Facebook Instant Articles. AMP pages began appearing in 2016. But right away, many observed that AMP encroached on the principles of the open web. The web was built on open standards, developed through consensus, that small and large actors alike can use. Respecting those principles means keeping open web standards at the forefront and discouraging proprietary, closed ones.

Instead of using standard HTML markup tags, a developer would use AMP tags. For example, here's what an embedded image looks like in classic HTML, versus what it looks like using AMP:

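HTML Image Tag:

<img src="src.jpg" alt="src image" />

AMP Image Tag:

<amp-img src="src.jpg" width="900" height="675" layout="responsive" />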

Since launch, page speeds have proven to be faster when using AMP, so the technology's promises aren't necessarily bad from a speed perspective alone. Of course, there are ways of improving performance other than using AMP, such as minifying files, writing lighter code, using CDNs (content delivery networks), and caching. There are also other Google-backed technologies, like PWAs (progressive web applications) and service workers.

AMP has been around for four years now, and the criticisms carry into today, with AMP's latest developments touching a very important part of the web: the URL.

When you visit a site, maybe your favorite news site, you would normally see the original domain along with an associated path to the page you are on:

https://www.example.com/some-web-page

This, along with the site's SSL certificate, tells you with a good amount of trust that you are seeing web content served from this site at this URL. This is what would be considered a canonical URL.

An AMP URL, however, can look like this:

https://www.example.com/platform/amp/some-web-page

Using canonical URLs, users can more easily verify that the site they're on is the one they're trying to visit. But AMP URLs muddied the waters, and forced users to adopt new ways of verifying where content originally came from.
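One way to trace an AMP page back to its source is to read the page's own markup: AMP documents are expected to carry a link element with rel="canonical" that points at the publisher's regular page. The sketch below, which uses only Python's standard library, pulls that value out of a made-up AMP document. The sample markup and the helper names are assumptions for illustration, not part of any AMP tooling.

```python
from html.parser import HTMLParser


class CanonicalLinkFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> element seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens, so split before matching.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attr_map.get("href")


def find_canonical(html_text):
    """Return the canonical URL declared in html_text, or None if absent."""
    finder = CanonicalLinkFinder()
    finder.feed(html_text)
    return finder.canonical


# Hypothetical AMP document, reduced to the parts that matter here.
SAMPLE_AMP_PAGE = """<!doctype html>
<html amp>
<head>
<link rel="canonical" href="https://www.example.com/some-web-page">
</head>
<body></body>
</html>"""

print(find_canonical(SAMPLE_AMP_PAGE))
```

A declared canonical link is only a claim made by the page itself, so it helps with finding the source but is not proof of who is serving the content.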

AMP goes one step further with its structure for pre-rendered pages served from cached content. The URL below is not shown to the user; rather, it is where the content (text, images, etc.) on the cached page is served from:

https://www-example-com.cdn.ampproject.org/c/www.example.com/amp/doc.html
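As a rough illustration of how the cached address relates to the publisher's address, the sketch below rebuilds it with a simple character substitution on the hostname (each "-" doubled, each "." turned into "-"). This is a simplification made for illustration: the real AMP Cache scheme also has to deal with internationalized and very long hostnames, and it typically adds an "s/" path segment for HTTPS origins, which the example URL in this article leaves out. The function name and its parameter are hypothetical.

```python
from urllib.parse import urlsplit


def amp_cache_url(publisher_url, include_tls_marker=False):
    """Build an AMP-cache-style URL for publisher_url (simplified sketch).

    include_tls_marker adds the "s/" segment used for HTTPS origins; it is
    off by default so the output matches the example in this article.
    """
    parts = urlsplit(publisher_url)
    host = parts.hostname
    # Simplified subdomain rule: double existing hyphens, then dots become hyphens.
    subdomain = host.replace("-", "--").replace(".", "-")
    tls_segment = "s/" if include_tls_marker else ""
    # "c" marks an AMP document in the cache path.
    return f"https://{subdomain}.cdn.ampproject.org/c/{tls_segment}{host}{parts.path}"


print(amp_cache_url("https://www.example.com/amp/doc.html"))
```

The point of the sketch is that the publisher's hostname survives only as a label inside a hostname that someone else controls.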

The final URL, the one in view or the URL bar, of a cached AMP page would look something like this:

https://www.google.com/amp/www.example.com/amp.doc.html

This cache model does not follow the web origin concept, and instead creates a new framework and structure to adhere to. The promise is better performance and a better experience for users. Yet the approach is implementation first and web standards later. Since Google has become such an ingrained part of the modern web for so many, any technology it deploys immediately has a large share of users and adopters. This comes on top of arguments that other product teams within Google have made for reshaping the URL as we know it. Together, these have fundamentally changed the way the mobile web is served for many users.
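To make the origin point concrete: browsers treat the combination of scheme, host, and port as a page's origin. The minimal sketch below computes that triple for the example URLs used in this article, and shows that the cached and Google-hosted addresses belong to different origins than the publisher's page. The helper name is an assumption for illustration.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that defines a web origin."""
    parts = urlsplit(url)
    # Fall back to the scheme's default port when the URL does not name one.
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


CANONICAL = "https://www.example.com/some-web-page"
AMP_PAGE = "https://www.example.com/platform/amp/some-web-page"
AMP_CACHE = "https://www-example-com.cdn.ampproject.org/c/www.example.com/amp/doc.html"
GOOGLE_VIEWER = "https://www.google.com/amp/www.example.com/amp.doc.html"

for label, url in [
    ("AMP page on publisher's site", AMP_PAGE),
    ("AMP cache", AMP_CACHE),
    ("Google viewer", GOOGLE_VIEWER),
]:
    same = origin(url) == origin(CANONICAL)
    print(f"{label}: same origin as canonical page? {same}")
```

Only the AMP page hosted on the publisher's own domain shares an origin with the canonical page; the other two are, as far as the browser is concerned, different sites.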

Another, more recent development is support for Signed HTTP Exchanges, or SXG, a subset of the Web Packages standard. SXG packages an HTTP exchange (in practice, a web page) with a cryptographic signature, which allows the distribution of web content to be further decoupled from its origin. This is supposed to address the problem, introduced by AMP, that the URL a user sees does not correspond to the page they're trying to visit. SXG allows the canonical URL (instead of the AMP URL) to be shown in the browser when you arrive, closing the loop back to the original publisher. The positive here is that a web standard was used; the negative is the speed of adoption without general consensus from other major stakeholders. Currently, SXG is only supported in Chrome and Chromium-based browsers.
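A signed exchange is delivered as its own kind of HTTP response rather than as ordinary HTML. As a hedged illustration, the sketch below checks a Content-Type header value for the application/signed-exchange media type (Chrome has used the version parameter v=b3). It does not verify any signature, and the helper names are hypothetical.

```python
def parse_media_type(header_value):
    """Split a Content-Type value into (media_type, parameters)."""
    media_type, _, raw_params = header_value.partition(";")
    parameters = {}
    for item in raw_params.split(";"):
        name, separator, value = item.partition("=")
        if separator:
            parameters[name.strip().lower()] = value.strip().strip('"')
    return media_type.strip().lower(), parameters


def is_signed_exchange(content_type):
    """True when the header names the signed-exchange media type."""
    media_type, _ = parse_media_type(content_type)
    return media_type == "application/signed-exchange"


for header in ("application/signed-exchange;v=b3", "text/html; charset=utf-8"):
    print(header, "->", is_signed_exchange(header))
```

Recognizing the format is the easy part; the trust question is settled by the browser validating the signature against the publisher's certificate, which this sketch leaves out entirely.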

News publishers were among the first to adopt AMP. Google even partnered with a major CMS (content management system), WordPress, to further promote AMP. Publishers use CMS services to upload, edit, and host content, and WordPress holds about 60% of the market share as the CMS of choice. Publishers also compete on other Google products, such as Google Search. So perhaps some publishers adopted AMP because they thought it would improve SEO (search engine optimization) on one of the web's most used search engines. Google has disputed this, maintaining that performance is what gets prioritized, no matter which technology a page uses to reach that level of performance. Since the Google Search algorithm is largely secret, we can only take these statements at their word. Tangentially, the Top Stories feature in Search on mobile has recently dropped AMP as a requirement.

At launch, the AMP project was more closed off in terms of control, despite promoting itself as an open source project. Publishers ended up reporting higher speeds, but this was left up to a "time will tell" set of metrics. In short, the statement "you don't need AMP to rank higher" is often competing with "just use AMP and you will rank higher," which can be tempting to publishers trying to reach the performance bar that gets their content prioritized.

We should focus less on whether or not AMP is a good tool for performance, and more on how this framework was molded by Google's initial ownership. The cache layer is owned by Google, and even though it's not required, most common implementations use this cache feature. Concerns around analytics have been addressed, and Google has also extended the courtesy of allowing other major ad vendors into the AMP model for ad content. This is a mere concession, though, since Google Analytics has such a large market share of the measured web.

If Google were simply a web performance company, that would still be too much centralization of the web's decisions. But it is not a one-function company; it is a giant conglomerate that already controls the largest mobile OS, web browser, and search engine in the world. Running the project through the OpenJS Foundation is a more welcome approach. The new governance structure consists of working groups, an advisory committee, and a technical steering committee made up of people inside and outside of Google. This should bring more voices to the table and give AMP a better process for future decisions. This move will allegedly decouple the Google AMP Cache, which hosts pages, from the AMP runtime, which is the JavaScript that processes AMP components on a page.

However, this all comes well after AMP has been integrated into major news sites, e-commerce sites, and even nonprofits. So this new model is not an even-ground, democratic approach. No matter the intentions, good or bad, those who work with powerful entities need to check their power at the door if they want a more equitable and usable web. Not acknowledging the power one wields only reinforces a false sense of a democracy that never existed.

Furthermore, the web standards process itself is far from perfect. Standards organizations are heavily dominated by members of corporations, and the connections one may have to them offer immense social capital. Less-represented people don't have the social capital to join or become members. Between the lack of diversity these kinds of groups tend to have, the costs of membership, and the time commitments, it's a long way until these types of organizations have a more equitable process. These particular issues are not Google's fault, but Google has an immense amount of power when it comes to joining these groups. For Google, joining a standards organization is not a matter of earning its way up, but of deciding whether to loosen its reins.

At this point in the AMP project, Google can't retroactively release the control it had over AMP's adoption. And we can't go back to a pre-AMP web to start over. The discussions about whether the AMP project should be removed, or discouraged in favor of a different framework, have long passed. Whether or not users can opt out of AMP has been decided in many corners of the web. All we can do now is learn from the process, and try to make sure AMP is developed in the best interests of users and publishers going forward. However, the open web shouldn't be weathered by multiple lessons learned on power and control from big tech companies that obtusely need to re-learn accountability with each new endeavor.
