* Posts by that one in the corner

4459 publicly visible posts • joined 9 Nov 2021

LG to offer subscriptions for appliances and televisions

that one in the corner Silver badge

No LG Talkie Toaster subscription!

Thankfully, LG don't currently make a toaster, although they certainly tried their best back in 2006.

You'll be reassured by the fact that Smeg definitely do make toasters. So far, those do not require a subscription to anything.

Feds want to see what ChatGPT's content is made of

that one in the corner Silver badge

It is very disappointing to see the FTC's request start with a leak

Are FTC investigations supposed to be kept a secret? Serious question.

And what are the chances that, if the FTC aren't happy, we will be able to see the material they gathered?[1]

Can more knowledgeable people chime in on this, please?

[1] I'm assuming that anything found is kept privileged until a slap on the wrist is deemed necessary.

Three signs that Wayland is becoming the favored way to get a GUI on Linux

that one in the corner Silver badge

> Unless they learn the legacy first they won't know what that which they're reinventing is supposed to do.

Like 100% supported network transparency between any two boxes.

As Wikipedia (still) says, "a compositor can implement any remote desktop protocol" - but will it? Further, will *all* of them provide both sides of at least one such protocol?

NASA to store pair of probes it's built but can’t send to target asteroids

that one in the corner Silver badge

Re: Send the probes after Snoopy!

> Starman has listened to Space Oddity 538,711 times since he launched in one ear, and to Is there Life On Mars? 725,891 times in his other ear.

I suggest keeping as far away from that Tesla as we can. After that many repeats he will have built up one hell of a case of road rage.

And I bet he hasn't been signalling that turn around the Sun.

that one in the corner Silver badge

Re: Junk boxes

If NASA follow the suggestion of publishing the details of these birds, maybe you'll find you are well on the way to making your own.

Though you may need to raid the cutlery drawer for the potato masher <metallic laughter>

Funnily enough, AI models must follow privacy law – including right to be forgotten

that one in the corner Silver badge

Re: Everyone says my data is covered by GDPR but is it?

The same question applies to the US HIPAA regulations.

HIPAA keeps getting bandied about as though it prevents all transmission of US health data except for use by doctors and insurers, even to the point of filling YouTube with videos of people screaming "you can't ask me that, HIPAA, it is illegal to ask me that".

But HIPAA is quite clear that it only refers to the transmission of medical data *from* doctors, insurers and other specifically named "entities", in order to stop them selling the details on so that Stannah know you are in need of a new stairlift. If you have passed any of your medical data on to anyone else (say, FaceBook) then it is fair game. If a supermarket infers your medical status from your shopping habits (the apocryphal "congratulations on your pregnancy" stories) they can, and will, sell that on.

It certainly appears that training on medical information isn't against HIPAA *unless* you can demonstrate that the data was supplied by a "covered entity". Good luck with that, especially given the point that the LLM may have just hallucinated - or even more simply, just correlated its data - when it printed out that you have a dicky heart and a limp due to undiagnosed gout.

that one in the corner Silver badge

Everyone says my data is covered by GDPR but is it?

GDPR gets flung around a lot in these discussions, but then so does scraping of publicly available information, including stuff posted on FaceBook etc.

My understanding is that GDPR and equivalents are concerned with what I do with data that I have asked you to supply, for some specific reason or other. For example, I ask for your email and an address when you join a local astronomy society because I am going to send you a paper newsletter every two months, with alerts on interesting observations via email. You agree to these uses, even ask to read my data retention policy (you get one more newsletter and emails for two weeks once your subscription ends, then we delete it) first. GDPR in force.

You decide to put your email and home address up onto a public forum and I, along with every Tom, Dick and Harry, can read it. You have handed that over of your own free will, it hasn't been leaked from anywhere, no-one has breached GDPR to make it readable to all. Does the GDPR even get involved in that situation?

If yes, can you give a citation, please. Seriously.

If no, then anyone reading (or "scraping") it can do so with impunity.

that one in the corner Silver badge

Re: Certify Your Corpus

Yes, AI Winter is a bad thing - it is a drying up of funding into actual research and has sod all to do with all the over-hyped bollocks that is going on now, with "AIs" that are pretty damn useless (the risks from hallucinations, for starters, make the current deployments stupid).

We want research into pretty much everything to continue at a steady pace, so that people can make a career out of it and actually get research done. Chopping off funding - or just the fear of doing that - causes research to slow down, even stop, and it becomes hard to pick it up again (e.g. if all the experienced tutors retire).

AI research is a bit of a weird one: one of the old sayings was "if we've figured out how to do it, it isn't an AI question any more". In other words, even if you don't believe that "hard AI" is achievable, the spin-offs are worth having (spin-offs from many other domains are also highly valuable, it is just that, having realised you could make an MRI, you don't turn around and say "oh, that isn't nuclear physics anymore" [1]!)

[1] oops, bad example. Even I've just used the current term "MRI" which does, indeed, drop the scary "nuclear" word!

that one in the corner Silver badge

Re: 7 Data Protection Principles

Real researchers, as a group if not any specific individual, are interested in unlearning (or, at least, they were). Admittedly, for correcting erroneous data - including on-the-fly recalibration and non-monotonic reasoning - rather than following "right to be forgotten".

These guys doing the hype for the money? Nah, they don't give a damn. Don't think they even care if their systems grind to a halt in a year, due to regulation or anything else. Just so long as they can cash out in time.

Bard will just be another Google Beta that vanishes and so on.

that one in the corner Silver badge

Re: Question: what about my memory

Not the same Universe as the Cybermen (nor that other Universe with the Cybermen) but they prompted the thought:

How will these principles be applied once we have abandoned the thinking machines in favour of Mentats?

It is quite clear that not all Mentats are honourable enough to apply the suggested "list of events you must not remember" and amongst those that are honourable they would still be bound to recall everything in the name of Duty to their House: Fred was found innocent and the charges struck, but he may still be vulnerable to blackmail and hence a weak spot.

that one in the corner Silver badge

Re: Question: what about my memory

Nice response.

From my (old and probably horribly out of date) reading, not all 'eidetic' memories work in a way that would allow them to cross-reference a list like that with, say, what they are 'reading' off the memory of a book page[1], so having that list wouldn't really help. But then maybe those people wouldn't be suitable as a Turk in the first place, certainly not for a general-purpose search engine. So, yup, yah called me out on it.

[1] there being a difference between asking "what was on that list?" versus asking "Is this name on that list?" - the latter can result in running through the list from top to bottom looking for the name. What we would call one of the "idiot savant" behaviours in the old days.

that one in the corner Silver badge

Re: The day after the day before

> What's an AI trained only on yesterday's facts going to say about so-and-so?

This highlights an issue with the way these programs are *used* more than anything else (and, yes, how they are being hyped, which is worse).

I'm sure you were just going for the quick comment, but, like so very many, you are saying "an AI is going to say about X" as though the intent of every AI is just to be a glorified search engine for the entirety of published information, as though one AI is just as good as another and X can be anything under the sun.

The 200 kilo gorillas (OpenAI et al) are trying their damnedest to promote that view[1] but we also know full well that they are the ones that will wriggle their way out of regulatory retaliation - it is anyone smaller, doing sensible, targeted, smaller-scope use of these models, who will be hit proportionately hardest, potentially bringing them to a standstill.

But the affordable way to create a targeted model is to get a pre-trained model (buy it in, or there are freebies out there) and then shovel in all the domain-specific data relevant to your usage (there is a sketch of that route after the footnotes below).

Immediately, the problem is obvious: the pre-trained model contains all sorts of guff. You only want it for its general ability to take in queries in natural language and synthesise results within your domain. So long as that is all that the thing is used for, you only have to worry about updating your domain knowledge and issuing new models appropriate to that (which may well be "20 years" if the model is accompanying a piece of machinery that is expected to just keep on chuntering away, with the maintenance suggested by your AI assistant). But what happens when someone - maliciously or otherwise - presents a query outside of your domain and it tells you that Old Fred was, indeed, convicted as described because the pre-training corpus dragged in a pile of court records in order to gain that clever-sounding verbose way of talking?[2]

[1] mainly because it is the easiest route for them to take, sheer laziness: just suck up everything and if the pile of nadans gets too big just buy more processor/GPU/RAM power - i.e. throw money at it.

[2] maybe the AI assistant is also writing letters to the owner of the machinery, advising them - in perfect legalese - that they need to pay for repairs 'cos they didn't do the maintenance properly and it ain't the manufacturer's fault, don't get any funny ideas.
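As promised above, a minimal sketch of the pre-trained-model-plus-domain-data route, assuming the Hugging Face transformers and datasets libraries; the gpt2 base, the file names and the settings are all placeholders, not recommendations:

    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)
    from datasets import load_dataset

    tokenizer = AutoTokenizer.from_pretrained("gpt2")   # any pre-trained base
    tokenizer.pad_token = tokenizer.eos_token           # gpt2 has no pad token
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # The domain-specific stuff: maintenance manuals, service logs, whatever.
    corpus = load_dataset("text", data_files={"train": "domain_corpus.txt"})
    tokenized = corpus.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
        batched=True, remove_columns=["text"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="domain_model", num_train_epochs=1),
        train_dataset=tokenized["train"],
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
    trainer.save_model("domain_model")

And note that the problem stands right there in the code: nothing in this shovelling removes whatever guff came with the base model.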

that one in the corner Silver badge

Certify Your Corpus

> If the mechanisms in use do not comply with the rules applied to them, then they should not be used until they do.

Totally agree.

However (oh no, here it comes, another rambling post!):

Can we make a clear distinction about what "the mechanisms" are and control each appropriately? Just 'cos there is the risk of over-reaction, not separating the players from the game, throwing the baby out with the bath water, and creating another "AI Winter".

Please, still put the boot into OpenAI, Bard and that other one - they've deliberately[1] pissed on GDPR etc (just group them all together as "privacy" for the moment). But, despite their own hype, they aren't the be-all and end-all.

Now, as with every other system, we've got data collection, data storage, data transformation and data retrieval happening. In (the current crop of) LLMs the first is creating a training corpus, the second the actual training run, the remaining two are munged together when the model is used. Trivially, to comply with privacy, you have precisely two choices: don't feed your system with vulnerable data in the first place or make sure that the stored data can be found and eradicated as required (and without trashing your system as a result[2]). We want to be sure that The Rules reflect the two options (or at least The Procedures Required to comply with The Rules).

The (current) LLMs are, by their very nature, incapable of the second option: some of them have proven to be tweakable (the Rank-One Model Editing (ROME) algorithm) but that isn't anything to rely on. The current Reg article notes some alternate ways to structure the models that will help but we aren't there yet (which is why the risk of another AI Winter is a concern, as that'll shut down the bulk of research into said structuring).

So, right now applications of LLMs[3] can only be managed via the first option. So:

We need to have certification applied to the training data and enforceable requirements on the systems that use certain certifications of the training data - plus a very large rolled-up newspaper applied to the existing suppliers of the training data[4] to get them certified. The requirements would then be along the lines of:

* All systems must identify the corpora used and their certification (this is the big change from the current situation)

* No certificate? Can only be used in-house for limited purposes (e.g. research into option two; demos to PHBs that using this stuff will get them sued), no public access to the model, publicly released responses from it allowed only in research papers with full attribution of the corpus (enough to ensure the results can be replicated, only within another house)

* Certified clean of infringing data (e.g. only uses 18th Century novels)[5]? No restrictions on use.

* Known to contain a specific class of data (e.g. medical records)? Restricted access to identified classes of people, must detail the start and end dates of the data collected, where it was collected, intended use; a stated expiry date (set to comply with the relevant usage - e.g. European data expires within the "right to be forgotten" time) - at the end of the expiry on the corpus, it must be updated and re-certified, any models trained on it must be deleted and new ones trained from the re-certified corpus (and there is an opportunity for supplying automation to users of the models; see the sketch after this list)

* Variants of the "medical data" class are required: for example, data from a proper double-blind study will be accompanied by any appropriate releases by the members of the study and won't have an expiry date.

* And so on[7]
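To make the scheme above concrete, a sketch of what a certificate record might look like; every field and name is invented for illustration, nothing official:

    from dataclasses import dataclass
    from datetime import date
    from typing import Optional

    @dataclass
    class CorpusCertificate:
        corpus_id: str
        data_class: str          # e.g. "clean", "medical-records", "uncertified"
        collected_from: date
        collected_to: date
        collected_where: str
        intended_use: str
        expires: Optional[date]  # None => no expiry (e.g. released study data)

        def usable_on(self, day: date) -> bool:
            # an expired corpus means re-certifying and retraining any models
            return self.expires is None or day <= self.expires

    cert = CorpusCertificate(
        corpus_id="corpus-2023-07-public-domain",
        data_class="clean",
        collected_from=date(1700, 1, 1),
        collected_to=date(1899, 12, 31),
        collected_where="Project Gutenberg",
        intended_use="general language style",
        expires=None,
    )
    assert cert.usable_on(date.today())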

[1] either it was deliberate or they were all lying through their teeth about how expert and knowledgeable their teams are - or both, of course.

[2] if you just go around cutting bits out of the Net then it is very likely that you'll just increase the rate of hallucinations: if you pose a query to one of these models, you *will* get a reply out; if it can't synthesise something "sensible" because the highly-correlated paths have been broken then it'll just light up the less likely paths and bingo "Donald Trump rode a brontosaur in the Wars of the Roses" when we all know that it was Abraham Lincoln on a T-Rex in the Black Hawk War.

[3] and they are going to be applied, however one feels about that, whilst there is the perception (founded or unfounded) that there is money to be made by doing so. Well, duh, but I wish applications were better thought out than that.

[4] yes, no doubt the well-known names have done a lot of collecting (scraping) themselves, but they also pulled in pre-existing text corpora; if you are making your own LLM there are suppliers from which you can get a raw-data text corpus or a "pre-trained" model that has already been fed on "the general stuff" (and you are expected to continue training on domain-specific texts).

[5] or any other more sensible[6] criteria that you feel will comply with the concept of "doesn't contain iffy data" or even "has a less than k% chance of containing iffy data" on the grounds that everything in real life has risks and we're looking at managing them.

[6] unless you are doing linguistic research, in which case this is a perfectly sensible corpus

[7] if I try to continue listing variants this will never get posted in time[8]

[8] oi, rude!

that one in the corner Silver badge

Re: Question: what about my memory

> He is a person, not a public-access search tool.

What if he was both? For example, signed up as a Mechanical Turk and was automatically passed a query that had otherwise failed?

I know, rather unlikely, who would ever do that etc, but similar scenarios have been discussed as a midway point between a totally man-made knowledge base (expensive, slow) and one created the way that the current crop of LLMs are doing it (ingestion of everything in sight, as fast as possible, and fingers crossed something usable comes out of it).

Indian developer fired 90 percent of tech support team, outsourced the job to AI

that one in the corner Silver badge

Re: The last Indian Call Centre...

I was foolish enough to move house after the Demon helpdesk did similarly.

Background: I was in the Tenner A Month prepay that got Demon set up, dial up only of course; paid on my personal credit card but address "care of workplace" as I was renting various small apartments. Bought a house, now paying via Direct Debit on my bank account, they know my home address, yay. Years pass, ADSL becomes available, Demon account moved over, no trouble at all: none of this "ISP provides a ready set up router", all DIY - they even looked up the modem model after I said what I'd bought and agreed it was pukka.

Move again and phone to get the ADSL onto the new number. By now, Demon had also moved on, apparently into its dotage. First, apparently I wasn't the account holder and had no authority to make any changes - seems they had copied across my first account address and no longer understood what "care of" means, let alone that the DD is obviously from a personal account and they had had my long-term address for years! In the end, they created a new account and I cancelled the old DD!

Oh, new account - I've just lost the Demon user name I'd had since Day 1, when Tony and I were hacking at the KA9Q variants that a number of other Demon users were also running. No, it is not possible to change the username on the account, don't be absurd. Actually, one person *did* try, which ended up with my having a *third* username! Don't ask, I have no idea.

Once ADSL was running, I had no reason to risk tech support again, but the next time they faffed with something (email IIRC) I obviously jumped to another ISP (el reg commentards recommended Zen at that time).

PS anyone want a couple of router/modems set up for Demon?

that one in the corner Silver badge

Re: I can see that kind-of "working"

Go stick your head in a pig!

Elon Musk launches his own xAI biz 'to understand reality'

that one in the corner Silver badge

"to understand reality."

And then have the patience to explain to Elon? Is the man showing some self-awareness at last?

"These deadline claims are short, those deliverable products are far away"

Tech execs turn to drink and drugs as job losses mount

that one in the corner Silver badge

Re: So how does that compare to those with similar white collar jobs

> how do we know...

I ain't gonna do them drugs and drink no more; ain't gonna do them no less neither!

Alberto Y Lost Trios Paranoias

(approximately)

Man who nearly killed physical media returns with $60,000 vinyl turntable

that one in the corner Silver badge

Nobody tell them we can measure continental drift down the subsea cables or they'll be flogging a special spring-loaded cable tie to "dampen out the bathypelagic flow signals" from the subcarpet speaker cables.

Hey, I've just had this great idea for a Kickstarter campaign...

that one in the corner Silver badge

Re: More money in music these days and far fewer groups

Ba ba ba ba ba ba ba tweet tweet tweet

that one in the corner Silver badge

Re: Surprisingly orthodox for Jony

You are not getting the full effect from just the images on the web.

To enjoy it as Ive intends, the LP12-50 must be placed upon a perfectly flat and level surface, alongside an original LP12. They must be set at an angle of precisely 3.6 degrees from each other and 2.78 inches apart. Arrange your amp and speakers so that the stereo sweet spot is exactly on the line separating the two decks and no more than seven feet away. As you listen to an LP on the new model, defocus your eyes as though looking at a 3D magic image and let the two machines merge into a single form: hold onto this for the first two thirds of the side.

You will now see the subtle differences between the two as a ghostly shimmering, fading in and out of your reality as the rotating LP on one side captures your attention and then releases it again. The newer, more rounded, corners will float before you, pulsating gently. The drift of the tone arm will send colours you have never seen before into your consciousness as the migraine slowly builds.

As the room darkens around you and your vision draws in on itself, the last thing you will observe is the true beauty of this new form sculpted, as only Jony can, from the obsolete forms of yesteryear.

that one in the corner Silver badge

Re: More money in music these days and far fewer groups

Ta - you've reminded me to pick up those Led Zeppelin albums (I'm slowly gathering the stuff that I used to listen to on friends' copies, back when we were living closer).

that one in the corner Silver badge

We take the freshest of deuterium nuclei, plucked from the morning dew on the Alpen grasslands, delicately swirled in a classically formed toroidal confinement chamber whose hand-polished curves accentuate the delicate purple hues of the all-natural plasma. Only the most dynamic and excited neutrons are collected in the fluffiest of thermal blankets, heating the organically sourced coolant, to provide the smoothest and most refreshing electron flow available today.

This isn't just any fusion power, this is M&S fusion power.

that one in the corner Silver badge

Re: Balmuda toaster

Close - it has an Artesian Water mode.

Sarah Silverman, novelists sue OpenAI for scraping their books to train ChatGPT

that one in the corner Silver badge

Re: Analyzing isn't copying

Novel argument: two files are around about the same size, therefore they have the same content.

that one in the corner Silver badge

Re: Analyzing isn't copying

It will have read just about all of the Bible and Shakespeare in the form of those very quotations, many, many times, as well as the complete works themselves (and whopping great chunks of each, much longer than your average quote) again, many, many times.

There will be massive correlations between "Out, out" and "damn spot".

Which fits TrueJim's description.

The same will be true for everything you can think of that is considered quotable. It will have been quoted.

If you wish to demonstrate otherwise, you have got to get it to (re)generate something thoroughly dull, that it will only have seen a few times - preferably, only from one single source - and see what kind of prompting it needs to do that.

that one in the corner Silver badge

The "Copyright management Information" was just the normal stuff like author's name, book title, ISBN (it is described in the PDFs linked to from the article).

I.e. the stuff you would include (some of) as the attribution.

There is no legal problem with leaving great chunks of it out (aka "removing it"): Library of Congress record number, ISBN, copyright date, printing date, .... Unless, of course, you are explicitly storing a wholesale copy of the work (but even then, just 'cos otherwise the copy is incomplete qua being a copy). For purposes of attribution, book title and author is generally sufficient (academic references and the like require more precision, but not for copyright compliance).

Whether a summary needs to include a statement of what it is summarising (as in, the attribution) is arguable, as I've noted elsewhere: it depends upon how the material is published: if the session log shows that the question provided that identification, there is no necessity for the answer to repeat it (but if only a part of that log is reproduced elsewhere ...)

that one in the corner Silver badge

Re: OpenAI could have avoided all this

> Furthermore, if you think that this kind of negotiation is quick, cheap or easy, where have you been living for the last two decades

Did anyone say it would be easy or quick? Cheap, yes, in terms of the daft sums being poured into the creation of LLMs.

If getting reduced-cost access is too annoying for them, they don't need it: just buy at the standard market price, one book at a time. If that is the better option, that is the one they'd take, but they'd be bound to try asking, just on the off chance (be embarrassing in the finance meeting otherwise).

Hardly a deal-breaker for them, either way. <shrug>

that one in the corner Silver badge

Re: Hmmm, I wonder..

Ban LLM "scraping"[1] and next year, LLMs are no longer being used, we have much better algorithms that we want to test on all this lovely text, off we go.

Ban AIs reading them? But what is an AI? Nope, this is just a program to build up a search index - yes, it does have a Certified Really Clever General AI in *that* module, but all that the complete program does is create a really good search index (it is a bit over-engineered, if I'm honest, but the CRCGAI was going cheap). Or even "come off it, AI? This is an LLM, you know full well we don't consider them as AIs, not after the Great Hype Ban of 2040".

Ban any machines reading the full texts of, say, novels? Bang goes the study of linguistic forms and how they change (with a whopping great note added by the researcher naming and shaming all the authors and publishers standing in the way of linguistics research).

[1] how is " scraping" different to "reading", btw? The use of unlicenced copyright material? Because that is already disallowed by existing laws and all you ought to need to do is buy a copy from the publisher.

that one in the corner Silver badge

Re: Hmmm, I wonder..

Regurgitating them wholesale, with or without attribution, not allowed (unless you specify otherwise - i.e. you explicitly licence it, perhaps using one of the CC licences).

Creating a précis of them and publishing that without attribution - that is a different matter.

What if a Stackoverflow answer is just a quick précis (such as leaving out all the reasons why) of something the person learnt from your web page?

Does it matter whether the précis was typed out by a human or by a program? Or a human using the program? A human who "read it somewhere and can't remember where"? A program that "read it somewhere and can't remember where"?

What if your web page was the only place that such a brilliant piece of work was ever published and the human could have been reminded where they read it with only a quick search? With a day's hard web searching, because they didn't know the magic search term?

that one in the corner Silver badge

> If, after scraping a lot of novels, an LLM can knock one out in minutes then they stand to lose future income.

Hmm, "if". Surely you can't sue on an "if"?[1]

*When* an LLM has been told to knock out a novel - and, note, it would have to be a novel that clearly infringed on that person's work, not just any old novel - then there may be a case to answer. But is it OpenAI *or* the person who asked for the novel to be created *or* the person who published it to the world who should be tackled?[2]

Remember, as a private individual you can write a blindingly obvious ripoff of a novel, only having changed the name of the heroine to your own. So long as you keep it locked in your bedside cabinet and only read it to yourself on chilly winter nights, you will be fine. Publish it and prepare to be damned.

[1] okay, yes, that person who has brought a case just on the off-chance that she one day offers wedding websites *and* is asked to create one for a gay couple *and* they sue *if* she refuses, she is trying to bring a case on an "if". Wild.

[2] yes, yes, it will be OpenAI, because they have all the money in the bank - never mind that in this scenario they made either no money or the same as they charge for any arbitrary k many words of output (i.e. they didn't profit because it was a novel that was generated by the runtime they sold).

that one in the corner Silver badge

Re: OED

Upvoted - but can you point to where removal of DRM is given as part of the complaint? I've read the PDFs but must have missed that. Ta.

If you were thinking of this bit of the article:

> certain copyright management information that would have been included in the legit, copyrighted books. This is the basis of the third count they allege against OpenAI, a claim it breached the DMCA by removing the copyright management info.

This isn't referring to DRM - the PDFs describe that "info" as being ISBN, author name, book title etc - i.e. attribution.

But the PDFs also describe that the LLM was prompted to give a precis - and if you ask "precis Twilight book 1" and the session log (which is mentioned as an exhibit in at least one of the cases, but I've not seen it) shows the LLM diving straight in, there is a fair argument (IMO, IANAL) that you have to take the question and answer together[1], in which case the attribution is (probably) clearly visible[2]

[1] Q: "What is 2 + 2?" A: "4" - that is a reasonable response to expect in an (automated) conversation, you don't need to demand that the answer repeats the question.

[2] you might be able to create a set of question prompts that get the model to spit out the precis without either question or answer containing attribution, but hopefully the defence would raise objection to a session log that started "Without identifying the book by name, author or other explicit means, precis that awful story about a teenager being groomed by a centuries old glittery vampire"

that one in the corner Silver badge

Re: OED

> If you want to read a story, reading a dictionary isn’t going to be a substitute for that, so it is a transformative use.

Are you referring back to the use of snippets of text as examples in a dictionary? If so, then yes, I'd probably agree (although as IANAL my agreement doesn't stand for much, as previously noted). But then I'm agreeing because it seems "very transformative", i.e. way over to one side of the argument, so it gives one datum but doesn't help much in understanding where the boundaries between "transformative (enough)" and "ripping off" lie - that is still all twisty.

> Reading a pdf of the book on a torrent site would be a substitute for reading it on the Kindle Store.

Sorry, not seeing how that fits into all this. Care to clarify?

that one in the corner Silver badge
Headmaster

Re: OpenAI could have avoided all this

> I remain your obedient servant, Local Internet, LTD

Hah, given that form of address was used at least up until the 1970s, I would be glad to see it return.

Except, of course, it should read:

I remain, Sir, your obedient and faithful servant, Local Internet, LTD

Or at least bring back,

Yours faithfully, Local Internet, LTD

and "Sincerely" for informal missives.

Pah.

that one in the corner Silver badge

Pretty much so.

Having inwardly digested the work, the LLM has to be operated under exactly the same rules as you or I.

If either you or the LLM print out a new copy for the use of an unlicensed person, slapped wrists.

that one in the corner Silver badge

Re: OpenAI could have avoided all this

True. And you - or OpenAI - pay the publisher to get access to your copy, which you are now at liberty to read, unhindered. In fact, you now have a paper trail to show that you have the right to read and inwardly digest that copy of the work, just in case anyone should ask.

It goes further, as the licence is specifically tied to that copy, which is why both SWMBO and myself can read the same paperback copy and afterwards we can pass it on to a friend, family member or total stranger. All very neat.

Not 100% sure if you were trying to make a point or just show that you know about licensing.

that one in the corner Silver badge

Re: OpenAI could have avoided all this

The attribution question is the core issue in the conversations about GitHub's Copilot - if you've not been reading those, please do so; there is a lot of info there and it doesn't seem sensible to copy and paste it all here.

However, attribution doesn't seem to be at the bottom of the cases here, unless I missed something: the models were asked to provide a precis of a work, which they did, so the attribution to the original work is clear.

Similarly, this case isn't claiming that the precis were derogatory or otherwise damaging. Again, if I missed where that is claimed, tell me and I shall happily update myself.

Given you highlighted a specific line, are you trying to imply that there is anything wrong in that sentiment, that "They have a paid-for copy, they can let the machine read it"? Are you trying to say that the LLM should *not* be allowed to read the work, just in case it did something naughty with it? Because that argument applies to *everyone* and *everything* that is capable of accessing the book, LLM or not.

> Copyright isn't entirely about money

Did anyone say it was? How could it *possibly* be entirely about money, given that copyright equally protects all the works that are made freely available without the need to pay to license a copy? Such as this very comment you are reading now.

Even though all the wording about "moral rights" etc is there to provide the law with the ability to function and, at bottom, provide for ways to give recompense when the copyrights are breached; and that pretty uniformly comes down to money. Such is the world we live in.

Paying for a licence to access a copy of a copyrighted work is just that - the work has been offered for sale, anyone can buy a copy and read it. This is relevant purely because that is the condition these authors had offered their work under.

Other copyright material has been read as well, a lot of it, which has been offered for general consumption without the need for payment.

All that was suggested is that OpenAI should have just followed the full agreement for access to every work it had the LLM read and that would have prevented the basis for these particular cases.

that one in the corner Silver badge

Re: OED

IANAL and have no idea how they distinguish between "transformative" and "derivative".

Just using commoner's English, not high falutin' lawyer speak, *all* transformative works are derivative, are they not? I.e. your question becomes a trivial observation (sorry).

If I wrote a precis of a book (shades of third form English Lit., shudder) then I'm definitely deriving all my info from the original book, even if the result has no actual sentences copied verbatim. Reminder: stuff like the plot isn't copyright, only the particular expression of it, so even though my precis contains the plot (or I'm getting a C+ at best), that isn't breaching copyright even if it *is* derivative!

In other words, unless we do actually have the lawyerly mindset on this, any discussions on the matter is going to end up getting in a twist.

And that doesn't even consider that the LLM has mushed up its training material so, again trivially, any output is "derivative" of the entire corpus taken as a whole - but so is every word I'll ever utter (from my personal corpus, that is), even when (was going to say "if", but...) I say total gibberish words, they are derived from what I've heard makes up "something that sounds like a word". There we go - another twisty argument, of the sort that His Honour and members of The Bar know how to deal with but is just likely to raise merry hell in an Internet comments section.

Oh, and any use of words like "transformative" are tricksy around programmers and the mathematicians who dreamt up the systems we use: a compiler most definitely transforms source code but no matter how many transforms it goes through, the output object is agreed to be absolutely 100% derived from the sources.

that one in the corner Silver badge

OpenAI could have avoided all this

Just by spending a fraction of their runtime training costs on a bulk deal or two with ebook publishers[1].

They have a paid-for copy, they can let the machine read it. Job done.

(And I'm not going over the arguments about whether LLMs just store verbatim text of the books 'cos they don't, and even the plaintiffs are complaining that the program can provide a precis, not about regurgitating the whole).

[1] and snarfed as much copyright free material as they wanted, of course.[2]

[2] Which raises the question: leaving aside features like "I want the LLM to be able, specifically, to precis *this* book", are you going to get more pleasant, literary, witty and urbane English out of a model trained on the out of copyright contents of Project Gutenberg or from scraping the latest airport bonkbusters? I know which one I would personally prefer to read (you, of course, are free to have your own personal preferences) and my suspicion is that, as these things are being pushed as "good for generating texts for businesses to use", mayhap businesses would also be better off with just the older material as a style guide?[3]

[3] Arguments about "the information in old books is out of date and useless, so that won't work" are met with: "You are getting your knowledge of, e.g. current tax law, from bonkbusters? Well, that explains a lot".

Starlink satellites leak astronomy-disturbing EM radiation, say boffins

that one in the corner Silver badge
Mushroom

Light telescopes polluted, radio telescopes polluted

I *really* hope the next story isn't about StarLink (or other) birds drowning out the GRB[1] signals!

[1] Gamma Ray Bursts, that is.

Make sure that off-the-shelf AI model is legit – it could be a poisoned dependency

that one in the corner Silver badge

But Mithril's methodology is pretty shitty

Needing to know what you are using in your software is something that everyone ought to be well aware of nowadays and we can probably criticise HuggingFace for being a bit naive in not looking out for typo squatters etc. They are growing into something non-trivial and should be well aware of the similar issues faced by other repositories (ooh, what is the name of that JavaScript site, the one with leftpad? On the tip of my tongue).

But I had hoped we'd seen the back of idiots deliberately breaking stuff just to make a point[1] and, worse, as an advert for their product: "'Ere, mate, You need our steering wheel lock, look how easily I just smashed the window and drove away. No? Ok. 'Ere, missus, you need our, oi, stop 'ittin' me, I'm just a salesman!"

Come to think of it, Mithril are worse than that - detecting changes to a binary file, that needs a startup company to, what, run SHA over the file and record the result in your software build records? But, of course, this isn't just any old data, is it? You need our super duper software, with added lemon freshness! /s, obs.

[1] remember these guys: https://www.theregister.com/2021/04/21/minnesota_linux_kernel_flaws_update/
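And to put that dig in perspective, the "run SHA over the file" part really is about this much code - a sketch using only the Python standard library; the file names are invented:

    import hashlib, json

    def sha256_of(path, chunk=1 << 20):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            while block := f.read(chunk):
                h.update(block)
        return h.hexdigest()

    # At build time: record the digest of the model file you actually vetted.
    manifest = {"model.bin": sha256_of("model.bin")}
    with open("build_manifest.json", "w") as f:
        json.dump(manifest, f)

    # At deploy time: refuse anything that no longer matches the record.
    with open("build_manifest.json") as f:
        known = json.load(f)
    assert sha256_of("model.bin") == known["model.bin"], "model file has changed!"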

that one in the corner Silver badge

GPT roads leading to ROME

But I am very glad to see that paper on ROME is (so) accessible.

As I've been saying, some clever people are looking into what these models actually do internally and it is good for everyone to see the results.

Although there is a cautionary tone, in that they admit that their assumptions about how simple the internals are were only confirmed for GPT models, and that is what allows ROME to work. There is no guarantee that the method will work on other models. But damn fine work.

BT CEO Jansen confirms he's quitting within 12 months

that one in the corner Silver badge

anticipates using AI in customer services

So we won't get any answers to "when are you coming to fix the bleeping thing", but no problem getting a precis of an obscure children's book or ten lines of non-working Fortran.

Microsoft to hike prices in Australia and New Zealand

that one in the corner Silver badge

On the grounds they oppose freedom of the press

Well, yes, we know that the Cambodian People's Party opposes the freedom of the press, this isn't really news.

Oh, the CPP were objecting about Meta; sorry, sorry, didn't parse the "they" correctly.

Guess it's back to "Pot, meet Kettle". Sorry, sorry, again; "Sen, meet Kettle".

Artificial General Intelligence remains a distant dream despite LLM boom

that one in the corner Silver badge

> When Deep Blue beat Kasparov, I read someone who should have known better saying Go was so much more complicated that a Deep Blue for Go was decades away

Kasparov beaten by Deep Blue: 1997

Fan Hui beaten by AlphaGo: 2015

18 years - close enough to "decades" (two decades, that is) for most estimating purposes.

BTW we still mainly have brute-force approaches to Chess and Go: there is still room to find a more finessed way of solving these.

And getting a machine to add up *was* hard to do: the fact that we now know how to do it and can replicate it so much faster doesn't stop the original problem, in the original context, from being hard. As soon as any domain is "solved" it stops being hard.

And "hard" isn't the same as "I don't have the vocabulary to follow the explanation": I can get eyes to glaze over talking about Finite State Automata used in lexers, but that is such an easy and solved domain that there are programs from the 1970s that will generate the automaton for me.

that one in the corner Silver badge

> Apparently they are trying to train maths specifically so later it may be able to interpret such a question as "what is 6345 multiplied by 4665"

Train? Well, they probably are wasting time training it, instead of just pushing the prompt text into one of those "Simple English[1] calculators" that we used to write in the 1980s[2] and seeing if that can recognise (and then perform) the arithmetic. If the calculator just spits out "parse error", let the LLM have a go (a sketch of the idea follows the footnotes).

Hell, just pass it into WolframAlpha first: they've already done the hard part.

[1] other languages are available, please enquire at Customer Services

[2] should've saved those; mine wasn't the best of the bunch, but not too bad for Acorn Atom BASIC!
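For anyone who didn't get to write one: a minimal sketch of such a calculator, in Python rather than period BASIC, handling only two-operand questions; the phrase table is obviously illustrative:

    import re

    OPS = {
        "plus": lambda a, b: a + b,
        "added to": lambda a, b: a + b,
        "minus": lambda a, b: a - b,
        "times": lambda a, b: a * b,
        "multiplied by": lambda a, b: a * b,
        "divided by": lambda a, b: a / b,
    }

    def calculate(prompt):
        """Return the answer, or raise 'parse error' so the LLM can have a go."""
        text = prompt.lower().strip(" ?")
        for phrase, fn in OPS.items():
            m = re.fullmatch(r"(?:what is )?(-?\d+) %s (-?\d+)" % re.escape(phrase),
                             text)
            if m:
                return fn(int(m.group(1)), int(m.group(2)))
        raise ValueError("parse error")

    print(calculate("What is 6345 multiplied by 4665?"))   # 29599425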

that one in the corner Silver badge

Re: The Turing test...it's been beaten hundreds of times over decades.

> The distinction between these two meanings of the test is not so much which one is correct but rather in which context the discussion is taking place, i.e. whether it is in a scientific context or common parlance.

Given that the discussions here are (hopefully) based on the Register article and that is pitching researchers against each other, surely it is clear that the common parlance one should be ignored, except when clearly used for supposed comedic effect.

that one in the corner Silver badge

Re: not on the right road

> current computers are basically the same as the PDP11 from the 1970s except a lot faster and with a lot more memory capacity, and continuing in that direction will not lead to AGI.

Intriguing - are you trying to say that just brute-forcing whatever is the sole Golden Boy[1] Mechanic du Jour is not the way, and that we need more finesse?

OR are you saying that you don't believe that, if an AGI is possible, it can be run on a super-sized PDP11, some other architecture is needed, one that can compute things a PDP could never do, no matter how big and fast?

[1] in the eyes of the quick-fix, quick-buck people (looking at you, OpenAI)

that one in the corner Silver badge

Re: Good read, thanks for the link

In the above, please replace "researchers" with "software engineers". Then remove the last paragraph and I'll agree with your post.

The guys on these teams who are doing actual research are doing it in SwEng, barely scratching Machine Learning: i.e. researching ways of implementing such huge models, of reworking the maths and stats to run on GPUs. Given the huge memory requirements we see quoted for these things, I'm tempted to say that they aren't even working on those problems well (e.g. how to shunt the bulk of the data into file storage and keep a live subset in core) and are just brute forcing everything.
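On the shunting point, the boring tool has existed for decades: memory-map the weights file and let the OS page in the live subset. A toy sketch, assuming numpy; the file name is invented:

    import numpy as np

    # Memory-map the weights file: nothing is read into RAM yet.
    weights = np.load("big_weights.npy", mmap_mode="r")

    # Touching a slice faults in just those pages - the "live subset in core".
    layer = weights[:1024]
    print(layer.mean())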

Hmm, maybe I was wrong - don't replace with "software engineer" but with "power systems, electrical and electronic engineer" instead :-)

Your actual ML research would be looking at new ways of doing ML and increasing the explanatory power of the resulting systems; I don't believe those researchers are any more prone to self-deception than other areas, such as, ooh, geologists or knot theorists.

that one in the corner Silver badge

Re: AGI will never arrive

> The concept of "AI" has been so ill defined that in the past researchers conflated it with the ability to play chess

I agree with the point that we've managed to brute force many problems, but I have to disagree with your dismissal of AI researchers.

There was no erroneous "conflation" - the possibility of brute forcing chess was well understood: indeed, that understanding had come from earlier work on problems related to AI, how to express such massive search problems in the first place and prune them sensibly to speed up the search without losing the best path(s).

However, the intent of the research was - and ought to still be - to find a way of playing chess without simple brute force. Unfortunately, the idea of a machine that could beat a human at chess became a Prestige Project: screw any AI research goals, if IBM can create a machine to beat a human Grandmaster this is a massive feather in the corporate cap. Oh look, everyone knows we can brute force it, let's do just that...

As soon as the brute force attack had been actually demonstrated, and at a time when Moore's Law was becoming fairly well known (so more machines could be built more cheaply) the actual problem of playing chess was placed into the "solved" bin by most people - including funding bodies and, yes, yourself.

But that meant that we only have a "chess playing massive search engine", we are *still* without a "chess playing AI", one that doesn't use brute force but a more subtle approach - an approach that, it was (is?) hoped would be applicable to more than just chess *and*, the big dream, would have better explanatory power than just "I tried every path and this got the biggest score". Which is, if we wish to pursue (what is now, annoyingly, called) AGI, a hole that will need to be filled. But asking to be funded to "solve chess" will be met with derision, coming from the same place as your use of the word "conflation".

> LLMs work differently

They use different mechanics, but still ones that were derived and understood way before OpenAI opened their doors. And had, as the article points out, been put aside as not a solution to AI (sorry, "AGI"), even though it was understood they would exhibit entertaining results if brute forced.

> and get a step closer

Not really - there is even less explanatory power in one of those than there is the decision tree for a chess player: at least the latter can be meaningfully drawn out ("this node G in the tree precisely maps to this board layout, and because that child of G leads to defeat, you can see the weighting on G had been adjusted by k points. Compare G to H, which is _this_ layout, and H has a final weighting of j, so you can see why G was chosen"). Tedious, but comprehensible.
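For the avoidance of doubt about what "meaningfully drawn out" looks like, a toy sketch - a take-1-or-2 pile game rather than chess, every name invented - where the examined line comes back with the score, so the choice can be explained:

    def negamax(pile, path=()):
        # Player to move takes 1 or 2 from the pile; taking the last item wins.
        if pile == 0:
            return -1, path              # previous mover took the last item: we lost
        best_score, best_line = -2, path
        for take in (1, 2):
            if take <= pile:
                score, line = negamax(pile - take, path + (take,))
                score = -score           # good for the opponent is bad for us
                if score > best_score:
                    best_score, best_line = score, line
        return best_score, best_line

    score, line = negamax(4)
    print(score, line)   # 1 (1, 1, 2): take 1 and the win is forced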

> but still overwhelmingly rely on brute force

ENTIRELY rely on brute force! That is *the* characteristic of an LLM!

> another brute force path where researchers fool themselves into believing "if we just get another 10 or 100x the computing cycles and working memory we'll reach AGI".

Which researchers? As the article points out, not the old guard, the ones you dismissed. The modern "AI researchers" who have only been brought up on these massive Nets? What else are they going to say?

> Spoiler alert: they won't

Yes, we know. Everyone knows (except the snake oil salesmen and everyone else who can make a buck). That really isn't a spoiler, exactly the same way that it wasn't when brute force was applied to chess and the popular press went apeshit over it: the sound of (dare I say, proper?) AI researchers burying their heads in their hands and sighing was drowned out then, as it is being drowned out now.