Once “too scary” to release, GPT-2 gets squeezed into an Excel spreadsheet


Mr. Kite

Ars Scholae Palatinae
829
Subscriptor
Any idea if this works with LibreOffice? I'm surprised there's no mention at all.
If not today, give it a couple of days. Somebody’s going to do it. I still wouldn’t recommend loading random huge spreadsheets from unknown parties. Including this one. “Run it on Windows!” 🤔
 
6 (23 / -17)

lazarus2405

Ars Centurion
355
Subscriptor++
The scripting languages are very different, though; I still will try loading it into LibreOffice.

The impression I get from the article is that all the calculations are performed in the sheets. If it's only using worksheet functions, and the same are available with the same syntax in LibreOffice, yeah, it could work.

I suspect resorting to VBA could have trivialized some parts of the project. Admittedly, though, I haven't tried opening the file to see for myself!
 
20 (20 / 0)

squiggit

Smack-Fu Master, in training
6
This seems like it has good potential as a teaching tool. So much of transformers and LLM architecture is treated as very arcane even by some people who are active in the tech world. Might be a bit of an outdated model, but it still seems like a very informative way to get a look at how the predictive modeling systems work on a fundamental level.
 
14 (14 / 0)
For anyone more comfortable with SQL, this does the same thing in PostgreSQL; seeing how the queries work made things clearer for me, anyway.

The obvious follow-up question is "What about MySQL?" and the answer seems to be that it might be possible, but this post relies on the pgvector extension.
The impression I get from the article is that all the calculations are performed in the sheets. If it's only using worksheet functions, and the same are available with the same syntax in LibreOffice, yeah, it could work.

I suspect resorting to VBA could have trivialized some parts of the project. Admittedly, though, I haven't tried opening the file to see for myself!
It seems as if he did resort to VBA, because I saw a bunch of "#MACRO?" errors; also, he added an Issue mentioning that the spreadsheet fails in both OpenOffice and LibreOffice and also in the Web version of Excel (the last two were quoting a Hacker News user): https://github.com/ianand/spreadsheets-are-all-you-need/issues/5
 
28 (28 / 0)

Chuckstar

Ars Legatus Legionis
29,379
Subscriptor++
The obvious follow-up question is "What about MySQL?" and the answer seems to be that it might be possible, but this post relies on the pgvector extension.

It seems as if he did resort to VBA, because I saw a bunch of "#MACRO?" errors; also, he added an Issue mentioning that the spreadsheet fails in both OpenOffice and LibreOffice and also in the Web version of Excel (the last two were quoting a Hacker News user): https://github.com/ianand/spreadsheets-are-all-you-need/issues/5
Macro for anything besides printing multiple pages or importing CSVs? Not a “true” spreadsheet hacker. ;)
 
7 (9 / -2)

Galeran

Ars Tribunus Militum
1,878
Subscriptor
In case you're curious about the resource consumption: on my Windows 11 system, the spreadsheet opened in "Microsoft® Excel® for Microsoft 365 MSO (Version 2402 Build 16.0.17328.20124) 64-bit" idles at about 2GB of RAM and uses 8-9GB (with 10-11GB peaks) while calculating (it seems to fully utilize 4 cores). It felt like it took maybe half a minute to compute the 11th token (i9-14900K), but I didn't actually time it.

I started with "Water" and " is" as my first two tokens, expecting it to suggest something obvious like "wet" as the next token. When it didn't, I copied its suggested tokens one at a time to get "Water is a great way to get a little more energy". Hmm. I suppose 10-token hallucinations would be fairly mild.
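For anyone wondering what "copying its suggested tokens one at a time" amounts to, here is a minimal sketch of that loop in Python. Everything in it is an assumption made for illustration: `TOY_SCORES` is an invented lookup table, not GPT-2 output, and it only looks at the last token, whereas the real model conditions on the whole sequence so far. The point is just the shape of the procedure: pick the top suggestion, append it, repeat.

```python
# Toy stand-in for the model: maps the most recent token to made-up
# next-token scores. These numbers are invented, not taken from GPT-2.
TOY_SCORES = {
    "Water": {" is": 0.9, " was": 0.1},
    " is": {" a": 0.6, " wet": 0.4},
    " a": {" great": 0.7, " good": 0.3},
    " great": {" way": 0.8, " idea": 0.2},
}


def next_token(tokens):
    """Return the highest-scoring next token, or None if the table has no entry."""
    scores = TOY_SCORES.get(tokens[-1])
    if scores is None:
        return None
    return max(scores, key=scores.get)


def generate(prompt_tokens, max_new_tokens):
    """Greedy decoding: repeatedly append the single top suggestion."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        tok = next_token(tokens)
        if tok is None:
            break
        tokens.append(tok)
    return tokens


result = "".join(generate(["Water", " is"], 3))
print(result)
```

Because the loop always takes the top-scoring token, an "obvious" continuation like " wet" never appears unless it happens to score highest at that step.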
 
27 (27 / 0)

xizar

Ars Tribunus Militum
1,640
Subscriptor++
I'm beginning to think this "too scary" thing is like the Segway's "too revolutionary"... a whole lotta guff inhaled by true believers.
You say that, but given how hallucinations from later LLMs have been used, it doesn't seem like an unreasonable concern. (A minor example is the lawyer who submitted LLM output that cited hallucinated court cases as precedents.)
 
24 (24 / 0)
That's pretty cool. I love seeing Excel being used for non-spreadsheet things like playing Doom or whatever.
If art counts,
Link
Other link

I started with "Water" and " is" as my first two tokens, expecting it to suggest something obvious like "wet" as the next token. When it didn't, I copied its suggested tokens one at a time to get "Water is a great way to get a little more energy". Hmm. I suppose 10-token hallucinations would be fairly mild.
Yes, when I'm knackered, there's nothing like a glass of ice water to...wait, what?
 
13 (13 / 0)

adamsc

Ars Praefectus
3,641
Subscriptor++
I'm beginning to think this "too scary" thing is like the Segway's "too revolutionary"... a whole lotta guff inhaled by true believers.

Magazines and online publishers are getting deluged in robot content, fake voice and video are becoming routine, plagiarism is becoming increasingly hard to detect, both people hiring and applying for jobs are complaining about LLMs, academic journals are dealing with LLM content leaking into published papers, and the librarians I know are talking about getting requests for help finding hallucinated citations … I think they were right to worry but perhaps wrong about the timeframe.
 
54 (54 / 0)

real mikeb_60

Ars Praefectus
11,129
Subscriptor
If it works with Excel, it should work with Libre as the two are almost completely compatible.
It loads into LibreOffice Calc 24.2.1.2 without throwing any obvious error. During loading, according to Win11 Task Manager, memory usage peaks at about 6 GB, settling to about 4.8 GB after the load process finishes. It's best to set up LO Calc not to recalc on load and then recalc manually, to avoid having everything jam up doing a recalc in the late part of the load process.

HOWEVER, I couldn't get it to work. Error:520 and #MACRO all over the place rather than values. There's a possible way to fix it by saving as .ods and reloading, then addressing all the #NAME errors that occur, but I didn't have time to fiddle with it that much today.

IOW, LO is not perfectly compatible with Excel in this case. But it does load the file, and does not crash.
 
21 (22 / -1)

Chuckstar

Ars Legatus Legionis
29,379
Subscriptor++
Cognitive dissonance will make sure it doesn’t happen but,
this could be a useful tool for people who still believe copies of training data just live unedited in the system.
I used to think that way. But we’ve seen too many examples of training data being closely regurgitated, which is where the real copyright violation occurs, IMHO.

Take the analogy of whether it’s a copyright violation for a human to learn by reading books and regurgitate that information later: if such a human were to regurgitate whole passages, multiple paragraphs long, from previously read books, then yes, that would be a copyright violation.

But I’m not one that agrees the training process is prima facie copyright violation.

EDIT: Analogies are pretty much always imperfect, but can be helpful in thinking through things. Just letting everyone know I’m aware that stating such an analogy is not conclusive, but I do think it’s a useful analogy.
 
20 (25 / -5)
Magazines and online publishers are getting deluged in robot content, fake voice and video are becoming routine, plagiarism is becoming increasingly hard to detect, both people hiring and applying for jobs are complaining about LLMs, academic journals are dealing with LLM content leaking into published papers, and the librarians I know are talking about getting requests for help finding hallucinated citations … I think they were right to worry but perhaps wrong about the timeframe.
I recently interviewed several people for a full-time role in an org very far away from technology. Several applicants submitted AI-written applications. One was a law graduate who apparently hadn’t reviewed her AI-written application for basic errors of fact in the field covered by the role. Another applicant submitted their AI-written cover statement as one giant block of text, no paragraphs or line breaks.

Part of the interview process was a real-world exercise where they had to complete a complex task in a short time. We allowed full internet access, just like at work. One very good candidate used AI intensively to help with their task but got all the basic facts right, and delivered a well-formed, well-formatted, on-point result with no obvious errors. Better, in fact, than I would have done given the time allowed. Scored highly on that part.

So, my experience shows hmm … AI can be a useful prop for someone who already knows what they’re doing but an idiot with AI is still an idiot?
 
90 (90 / 0)

reyan

Ars Centurion
227
Subscriptor
It loads into LibreOffice Calc 24.2.1.2 without throwing any obvious error. During loading, according to Win11 Task Manager, memory usage peaks at about 6 GB, settling to about 4.8 GB after the load process finishes. It's best to set up LO Calc not to recalc on load and then recalc manually, to avoid having everything jam up doing a recalc in the late part of the load process.

HOWEVER, I couldn't get it to work. Error:520 and #MACRO all over the place rather than values. There's a possible way to fix it by saving as .ods and reloading, then addressing all the #NAME errors that occur, but I didn't have time to fiddle with it that much today.

IOW, LO is not perfectly compatible with Excel in this case. But it does load the file, and does not crash.
As mentioned earlier in the thread by someone else, you are seeing the #MACRO errors because VBA was used in parts of the sheet. While it loads, it would need to be "ported" to LibreOffice to function correctly.
 
9 (9 / 0)

reyan

Ars Centurion
227
Subscriptor
Okay, now train an AI in excel and we’ll be all set for armageddon.
I have used LLMs many times to help me with Excel. I love programming, but I HATE Excel... I'm somewhat proficient but don't enjoy a second of it.

ChatGPT is quite good (although not perfect) at Excel formulas.
 
-2 (2 / -4)

reyan

Ars Centurion
227
Subscriptor
I recently interviewed several people for a full-time role in an org very far away from technology. Several applicants submitted AI-written applications. One was a law graduate who apparently hadn’t reviewed her AI-written application for basic errors of fact in the field covered by the role. Another applicant submitted their AI-written cover statement as one giant block of text, no paragraphs or line breaks.

Part of the interview process was a real-world exercise where they had to complete a complex task in a short time. We allowed full internet access, just like at work. One very good candidate used AI intensively to help with their task but got all the basic facts right, and delivered a well-formed, well-formatted, on-point result with no obvious errors. Better, in fact, than I would have done given the time allowed. Scored highly on that part.

So, my experience shows hmm … AI can be a useful prop for someone who already knows what they’re doing but an idiot with AI is still an idiot?
I concur. It's a tool, not magic. The people who think it is magic that can be trusted implicitly are quickly found out to be fools (or lazy).
 
15 (15 / 0)
Magazines and online publishers are getting deluged in robot content, fake voice and video are becoming routine, plagiarism is becoming increasingly hard to detect, both people hiring and applying for jobs are complaining about LLMs, academic journals are dealing with LLM content leaking into published papers, and the librarians I know are talking about getting requests for help finding hallucinated citations … I think they were right to worry but perhaps wrong about the timeframe.

I don't think there's any question that LLMs were going to cause all sorts of problems. I think the interesting question is whether OpenAI's reluctance to release GPT-2 reflected authentic concern about potential antisocial uses of the technology or a cynical play for time so that they could develop a commercializable version of the product. I don't mean this in a snarky way; I'm genuinely interested in whether they actually had initial cold feet about opening this Pandora's box.

Edit: spelling
 
29 (29 / 0)

mfirst

Wise, Aged Ars Veteran
115
Everyone is using different AI tools (like versions of ChatGPT) to solve all sorts of problems, including more and more serious and bigger problems that impact the world, and we know that the "accuracy" (or credibility? value? precision?) of the responses is a function of the sophistication of the LLM system. So how do we stop "users" from cutting corners and using "cheaper" systems to get an answer that might not be the best, but is good enough or just 'ok'? And how do we judge the quality of those answers if we don't know what tools are being used?
 
-13 (0 / -13)