Once “too scary” to release, GPT-2 gets squeezed into an Excel spreadsheet

Person_Man · Mar 15, 2024

That's pretty cool. I love seeing Excel being used for non-spreadsheet things like playing Doom or whatever.

Civitello · Mar 15, 2024

Is GPT-2 going to join the likes of Doom and "Turing complete" demos?

deltaproximus · Mar 15, 2024

Very neat; I can't wait to try it out at home. I've been meaning to learn more about how LLMs function, and having it in Excel, which I'm already proficient at, seems like a nice way to make that accessible to someone like me.

everythingallatonce · Mar 15, 2024

Now every company with Excel can get in on the buzzword gold rush

euzeka · Mar 15, 2024

Any idea if this works with LibreOffice? I'm surprised there's no mention at all.

Mr. Kite · Mar 15, 2024

euzeka said:
Any idea if this works with LibreOffice? I'm surprised there's no mention at all.

If not today, give it a couple of days. Somebody’s going to do it. I still wouldn’t recommend loading random huge spreadsheets from unknown parties. Including this one. “Run it on Windows!”

evanTO · Mar 15, 2024

The people at my work think VLOOKUP is "hacking" Excel; I'm trying to imagine what they'd do if I showed them this.

lewisje · Mar 15, 2024

JohnDeL said:
If it works with Excel, it should work with Libre as the two are almost completely compatible.

The scripting languages are very different, though; I'll still try loading it into LibreOffice.

JohnDeL · Mar 15, 2024

lewisje said:
The scripting languages are very different, though; I'll still try loading it into LibreOffice.

If it works, let us know! (Please.)

jamesb2147 · Mar 15, 2024

A rare case where a cloud-hosted spreadsheet isn't ideal because it eats up far more compute than is generally provisioned to handle it.

I'm glad we still have offline Excel available. I say that as a Google Sheets stan.

andygates · Mar 15, 2024

I'm beginning to think this "too scary" thing is like the Segway's "too revolutionary"... a whole lotta guff inhaled by true believers.

lazarus2405 · Mar 15, 2024

lewisje said:
The scripting languages are very different, though; I'll still try loading it into LibreOffice.

The impression I get from the article is that all the calculations are performed in the sheets. If it's only using worksheet functions, and the same are available with the same syntax in LibreOffice, yeah, it could work.

I suspect resorting to VBA could have trivialized some parts of the project. Admittedly, though, I haven't tried opening the file to see for myself!

squiggit · Mar 15, 2024

This seems like it has good potential as a teaching tool. So much of transformers and LLM architecture is treated as very arcane even by some people who are active in the tech world. Might be a bit of an outdated model, but it still seems like a very informative way to get a look at how the predictive modeling systems work on a fundamental level.

Rho_Syn · Mar 15, 2024

For anyone more comfortable with SQL, this does the same thing in PostgreSQL. Seeing how the queries work made things clearer for me, anyway.

Happy New Year: GPT in 500 lines of SQL - EXPLAIN EXTENDED

A complete GPT2 implementation as a single SQL query in PostgreSQL.

explainextended.com

lewisje · Mar 15, 2024

Rho_Syn said:
For anyone more comfortable with SQL, this does the same thing in PostgreSQL. Seeing how the queries work made things clearer for me, anyway.

Happy New Year: GPT in 500 lines of SQL - EXPLAIN EXTENDED

A complete GPT2 implementation as a single SQL query in PostgreSQL.

explainextended.com

The obvious follow-up question is "What about MySQL?" and the answer seems to be that it might be possible, but this post relies on the pgvector extension.

lazarus2405 said:
The impression I get from the article is that all the calculations are performed in the sheets. If it's only using worksheet functions, and the same are available with the same syntax in LibreOffice, yeah, it could work.

I suspect resorting to VBA could have trivialized some parts of the project. Admittedly, though, I haven't tried opening the file to see for myself!

It seems as if he did resort to VBA, because I saw a bunch of "#MACRO?" errors; also, he added an Issue mentioning that the spreadsheet fails in both OpenOffice and LibreOffice and also in the Web version of Excel (the last two were quoting a Hacker News user): https://github.com/ianand/spreadsheets-are-all-you-need/issues/5

Chuckstar · Mar 15, 2024

lewisje said:
The obvious follow-up question is "What about MySQL?" and the answer seems to be that it might be possible, but this post relies on the pgvector extension.

It seems as if he did resort to VBA, because I saw a bunch of "#MACRO?" errors; also, he added an Issue mentioning that the spreadsheet fails in both OpenOffice and LibreOffice and also in the Web version of Excel (the last two were quoting a Hacker News user): https://github.com/ianand/spreadsheets-are-all-you-need/issues/5

Macro for anything besides printing multiple pages or importing CSVs? Not a “true” spreadsheet hacker.

Galeran · Mar 15, 2024

Just in case you're curious about resource consumption: on my Windows 11 system, the spreadsheet opened in "Microsoft® Excel® for Microsoft 365 MSO (Version 2402 Build 16.0.17328.20124) 64-bit" idles at about 2GB of RAM and uses 8-9GB (10-11GB peaks) while calculating (it seems to fully utilize 4 cores). It felt like it took maybe half a minute to compute the 11th token (i9-14900K), but I didn't actually time it.

I started with "Water" and " is" as my first two tokens, expecting it to suggest something obvious like "wet" as the next token. When it didn't, I copied its suggested tokens one at a time to get "Water is a great way to get a little more energy". Hmm. I suppose 10-token hallucinations would be fairly mild.

J.C. Helios · Mar 15, 2024

euzeka said:
Any idea if this works with LibreOffice? I'm surprised there's no mention at all.

Not unless "Error 520" means that the A.I. is on strike to demand wages.

xizar · Mar 15, 2024

andygates said:
I'm beginning to think this "too scary" thing is like the Segway's "too revolutionary"... a whole lotta guff inhaled by true believers.

You say that, but given how hallucinations from later LLMs have been used, it doesn't seem like an unreasonable concern. (A minor example is the lawyer who submitted LLM output that cited hallucinated court cases as precedents.)

Sei_kū · Mar 15, 2024

Cognitive dissonance will make sure it doesn’t happen, but this could be a useful tool for people who still believe copies of training data just live unedited in the system.

Fred Duck · Mar 15, 2024

Person_Man said:
That's pretty cool. I love seeing Excel being used for non-spreadsheet things like playing Doom or whatever.

If art counts,
Link
Other link

Galeran said:
I started with "Water" and " is" as my first two tokens, expecting it to suggest something obvious like "wet" as the next token. When it didn't, I copied its suggested tokens one at a time to get "Water is a great way to get a little more energy". Hmm. I suppose 10-token hallucinations would be fairly mild.

Yes, when I'm knackered, there's nothing like a glass of ice water to...wait, what?

Smeghead · Mar 15, 2024

J.C. Helios said:
Not unless "Error 520" means that the A.I. is on strike to demand wages.

Mine shows a dialogue warning about exceeding the maximum number of columns in a sheet when loaded. Apparently the limit is 16k.

adamsc · Mar 15, 2024

andygates said:
I'm beginning to think this "too scary" thing is like the Segway's "too revolutionary"... a whole lotta guff inhaled by true believers.

Magazines and online publishers are getting deluged in robot content, fake voice and video are becoming routine, plagiarism is becoming increasingly hard to detect, both people hiring and applying for jobs are complaining about LLMs, academic journals are dealing with LLM content leaking into published papers, and the librarians I know are talking about getting requests for help finding hallucinated citations … I think they were right to worry but perhaps wrong about the timeframe.

real mikeb_60 · Mar 15, 2024

JohnDeL said:
If it works with Excel, it should work with Libre as the two are almost completely compatible.

It loads into LibreOffice Calc 24.2.1.2 without throwing any obvious error. During loading, according to Win11 Task Manager, memory usage peaks at about 6 GB, settling to about 4.8 GB after the load finishes. It's best to set up LO Calc not to recalculate on load and then recalculate manually, to avoid everything jamming up during a recalc late in the load process.

HOWEVER, I couldn't get it to work. Error:520 and #MACRO all over the place rather than values. There's a possible way to fix it by saving as .ods and reloading, then addressing all the #NAME errors that occur, but I didn't have time to fiddle with it that much today.

IOW, LO is not perfectly compatible with Excel in this case. But it does load the file, and does not crash.

Chuckstar · Mar 15, 2024

Sei_kū said:
Cognitive dissonance will make sure it doesn’t happen, but this could be a useful tool for people who still believe copies of training data just live unedited in the system.

I used to think that way. But we’ve seen too many examples of training data being closely regurgitated, which is where the real copyright violation occurs, IMHO.

Using the analogy of whether it’s copyright violation for a human to learn by reading books and regurgitate that information later: if such a human were to regurgitate whole passages, several paragraphs long, from previously read books, then yes, that would be copyright violation.

But I’m not one that agrees the training process is prima facie copyright violation.

EDIT: Analogies are pretty much always imperfect, but can be helpful in thinking through things. Just letting everyone know I’m aware that stating such an analogy is not conclusive, but I do think it’s a useful analogy.

huckl · Mar 15, 2024

Okay, now train an AI in Excel and we’ll be all set for Armageddon.

redtomato · Mar 15, 2024

adamsc said:
Magazines and online publishers are getting deluged in robot content, fake voice and video are becoming routine, plagiarism is becoming increasingly hard to detect, both people hiring and applying for jobs are complaining about LLMs, academic journals are dealing with LLM content leaking into published papers, and the librarians I know are talking about getting requests for help finding hallucinated citations … I think they were right to worry but perhaps wrong about the timeframe.

I recently interviewed several people for a full-time role in an org very far away from technology. Several applicants submitted AI-written applications. One was a law graduate who apparently hadn’t reviewed her AI-written application for basic errors of fact in the field covered by the role. Another applicant submitted their AI-written cover statement as one giant block of text, with no paragraphs or line breaks.

Part of the interview process was a real-world exercise where they had to complete a complex task in a short time. We allowed full internet access, just like at work. One very good candidate used AI intensively to help with the task but got all the basic facts right, and delivered well-formed, well-formatted, on-point work with no obvious errors. Better, in fact, than I would have done in the time allowed. Scored highly on that part.

So, my experience shows hmm … AI can be a useful prop for someone who already knows what they’re doing but an idiot with AI is still an idiot?

Sonio · Mar 15, 2024

So we can run GPT instances locally on systems that still can't run Crysis?

Maybe these LLMs aren't such a big deal after all.

reyan · Mar 15, 2024

evanTO said:
The people at my work think VLOOKUP is "hacking" Excel; I'm trying to imagine what they'd do if I showed them this.

Lmfao. I literally lol'd reading this.

reyan · Mar 15, 2024

real mikeb_60 said:
It loads into LibreOffice Calc 24.2.1.2 without throwing any obvious error. During loading, according to Win11 Task Manager, memory usage peaks at about 6 GB, settling to about 4.8 GB after the load finishes. It's best to set up LO Calc not to recalculate on load and then recalculate manually, to avoid everything jamming up during a recalc late in the load process.

HOWEVER, I couldn't get it to work. Error:520 and #MACRO all over the place rather than values. There's a possible way to fix it by saving as .ods and reloading, then addressing all the #NAME errors that occur, but I didn't have time to fiddle with it that much today.

IOW, LO is not perfectly compatible with Excel in this case. But it does load the file, and does not crash.

As mentioned earlier in the thread by someone else, you are seeing the #MACRO errors because VBA was used in parts of the sheet. While it loads, it would need to be "ported" to LibreOffice to function correctly.

reyan · Mar 15, 2024

huckl said:
Okay, now train an AI in Excel and we’ll be all set for Armageddon.

I have used LLMs many times to help me with Excel. I love programming, but I HATE Excel... I'm somewhat proficient but don't enjoy a second of it.

ChatGPT is quite good (although not perfect) at Excel formulas.

reyan · Mar 15, 2024

redtomato said:
I recently interviewed several people for a full-time role in an org very far away from technology. Several applicants submitted AI-written applications. One was a law graduate who apparently hadn’t reviewed her AI-written application for basic errors of fact in the field covered by the role. Another applicant submitted their AI-written cover statement as one giant block of text, with no paragraphs or line breaks.

Part of the interview process was a real-world exercise where they had to complete a complex task in a short time. We allowed full internet access, just like at work. One very good candidate used AI intensively to help with the task but got all the basic facts right, and delivered well-formed, well-formatted, on-point work with no obvious errors. Better, in fact, than I would have done in the time allowed. Scored highly on that part.

So, my experience shows hmm … AI can be a useful prop for someone who already knows what they’re doing but an idiot with AI is still an idiot?

I concur. It's a tool, not magic. The people who think it is magic that can be trusted implicitly are quickly found out to be fools (or lazy).

reyan · Mar 15, 2024

Sonio said:
So we can run GPT instances locally on systems that still can't run Crysis?

Maybe these LLMs aren't such a big deal after all.

Well... GPT-2 with < 200M parameters, and running like molasses in a spreadsheet, is not reflective of the current state of the art.

adio · Mar 15, 2024

evanTO said:
The people at my work think VLOOKUP is "hacking" Excel; I'm trying to imagine what they'd do if I showed them this.

I reckon pitchforks, burning torches and a dunking stool could be in your future.

pseudonomous · Mar 15, 2024

adamsc said:
Magazines and online publishers are getting deluged in robot content, fake voice and video are becoming routine, plagiarism is becoming increasingly hard to detect, both people hiring and applying for jobs are complaining about LLMs, academic journals are dealing with LLM content leaking into published papers, and the librarians I know are talking about getting requests for help finding hallucinated citations … I think they were right to worry but perhaps wrong about the timeframe.

I don't think there's any question that LLMs were going to cause all sorts of problems. I think the interesting question is whether OpenAI's reluctance to release GPT-2 reflected authentic concern about potential antisocial uses of the technology or was cynical playing for time so that they could develop a commercializable version of the product. I don't mean this in a snarky way; I'm genuinely interested in whether they actually had initial cold feet about opening this Pandora's box or not.

Edit: spelling

CluelessOne · Mar 15, 2024

I continue to be amazed at how many ways people can use/abuse spreadsheets, especially Excel. It's like Excel is Sauron's Ring, the One Ring to rule them all, and in VBA script bind them. :biggreen:

mfirst · Mar 16, 2024

As everyone uses different AI tools (like versions of ChatGPT) to solve all sorts of problems, including more serious and bigger problems that affect the world, and we know that the "accuracy" (or credibility? value? precision?) of the responses is a function of the sophistication of the LLM system, how do we stop "users" from cutting corners and using "cheaper" systems to get an answer that might not be the best, but good enough or just "OK"? And how do we judge the quality of those answers if we don't know which tools are being used?
