Pundits in 2023-2024: Siri is terrible and must be fixed. Also, Apple is behind in AI and must do something!
Apple at WWDC 2024: We're fixing Siri, which is now powered by Apple Intelligence, our in-house LLM that will also help you edit your local text with suggestions when requested.
Pundits: But wait, what is this? You trained your LLMs on the open web, including our websites, without our permission!
I understand last week's dramatic demonstration of frustration at the discovery that Perplexity and others have been ignoring site owners' requests not to scrape their content. That seems like obvious wrongdoing. It's also worth pointing out that Perplexity, OpenAI, and others being used as substitutes for a standard search engine are serving up their own summarized content, which may be more likely to reduce clicks through to the original content. This seems like the core of the problem.
I've never made an income from my writing, but I understand that independent content creators who make a living via online publishing need click-throughs. The indexing and scraping of content is not new; it's what Google and other search engines have been doing for years. It's the presentation of the results that is the real problem.
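For anyone unfamiliar with the mechanism, those requests are made through a site's robots.txt file. Here's a rough sketch of what an opt-out might look like, using the crawler tokens the vendors have published (GPTBot for OpenAI, PerplexityBot for Perplexity, and Applebot-Extended, Apple's token for declining AI training specifically). Note that honoring these is entirely voluntary, which is exactly what Perplexity was accused of ignoring:

```
# A sketch, not a recommendation: decline AI crawlers while still
# allowing ordinary search indexing. Compliance is voluntary.
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

# Applebot-Extended doesn't crawl; it tells Apple not to use
# Applebot-crawled pages for model training.
User-agent: Applebot-Extended
Disallow: /

# Everyone else (ordinary search crawlers) may continue indexing.
User-agent: *
Allow: /
```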
Perhaps the solution is an industry standard for the presentation of LLM content to users, with a limit on generated text and a greater emphasis on attribution and links to the original sources. For example, consider the traditional display of search results, with the link at the top followed by a summary. I can easily imagine a page of results that presents source links front and center with a summary of each source below: closer to the traditional search engine result in visual formatting, but with a longer, better-formatted text description beneath the link.
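To make that concrete, here's a minimal sketch in TypeScript of an attribution-first result card. Everything here is my own invention, not any existing standard; the field names and the character cap on generated text are assumptions for illustration:

```typescript
// A hypothetical attribution-first result card: the source link comes
// first, and the generated summary is hard-capped in length.

interface SourcedResult {
  sourceTitle: string; // the original page's title, shown most prominently
  sourceUrl: string;   // the click-through link back to the publisher
  summary: string;     // generated text, truncated to a fixed limit
}

const SUMMARY_LIMIT = 280; // assumed cap on generated text per source

function renderResult(r: SourcedResult): string {
  const summary =
    r.summary.length > SUMMARY_LIMIT
      ? r.summary.slice(0, SUMMARY_LIMIT) + "…"
      : r.summary;
  // Link first, summary second: the inverse of answer-first chat UIs.
  return `${r.sourceTitle}\n${r.sourceUrl}\n${summary}`;
}

// A page of results is just these cards stacked, so the publisher's
// link always outranks the generated text visually.
console.log(
  renderResult({
    sourceTitle: "Example Publisher",
    sourceUrl: "https://example.com/article",
    summary: "A generated summary of the article...",
  })
);
```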
Moving on to Apple's use of public data scraped from the web, I fail to grasp the problem. If it's being used only to train the LLM, what does it matter? It's a generic use of text, nothing like the above example of Perplexity reusing scraped, very specific text in the presentation of search results to users. Training is the processing of a massive amount of text to teach the model how language is used. Training an LLM is not about world knowledge; it is about language patterns. These are two very different things.
All of the LLMs have been trained in the same way, using massive amounts of text. The reaction of some publishers to this aspect of the use of their text seems over the top and, frankly, feels more like attention-seeking outrage.
All that said, I would suggest that most tech folk are missing the most significant problem of AI: the increase in energy use and the resulting atmospheric emissions. Unlike others in big tech, Apple has been consistent in its climate goals and commitments and has met many of them. Many of these new features will rely on users' local devices, and those that don't will connect to Apple's custom servers (what Apple calls Private Cloud Compute). That portion of the new offering seems likely to be more aligned with achieving climate goals.
On a very serious downside, Apple will also be adding many millions of new users of ChatGPT via Siri's option to use that service. Of all the concerns swirling around the various AI offerings, the extra energy use and carbon being dumped into the atmosphere should be one that is actively discussed, and yet I rarely see Apple pundits bring it up. Why is that?