Meet AdVon, the AI-Powered Content Monster Infecting the Media Industry
May 9, 2024 4:29 PM

Maggie Harrison Dupré, writing for Futurism, goes on a deep, deep dive into AdVon, a fine purveyor of content slurry.
posted by ursus_comiter (48 comments total) 30 users marked this as a favorite
 
I have worked with this company before. They were an excellent employer giving me many benefits.

-plannedchaos
posted by Windopaene at 4:42 PM on May 9 [10 favorites]


I'm sorry, but what do you mean? Is Scott Adams involved with this?
posted by JHarris at 5:23 PM on May 9


On Mastodon, or at least the corner of it that I witness, the term of art going around is "slop" not "slurry". The sentiment is the same, but slop does seem to be gaining some traction.
posted by hippybear at 6:05 PM on May 9 [3 favorites]


"More important, Advon [sic] stands by its statement that all of the articles provided to Sports Illustrated were authored and edited by human writers," AdVon's attorney wrote.

Wow, that "authored" is doing a lot of work there, and it's a good thing, too, because "written" would be a lie.
posted by surlyben at 6:14 PM on May 9


No, just an attempt at internet sarcasm...
posted by Windopaene at 6:29 PM on May 9 [1 favorite]


Related previously, How Google is killing independent sites like ours, from the perspective of a niche air-purifier testing and review site that found itself being throttled out of search results by the garbage being described here. So this is where it's coming from!
posted by coolname at 6:33 PM on May 9 [8 favorites]


It's okay windowpaene, it's not the first time I failed to see sarcasm yay
posted by JHarris at 6:36 PM on May 9 [3 favorites]


Meanwhile, user "sarcasm yay" is standing right there next to you, having talked to you at the Meetup for the last 15 minutes, and is now slinking away thinking "I thought I was seen this time!"
posted by hippybear at 6:38 PM on May 9 [11 favorites]


From the article:

The quality of AdVon's work is often so dismal that it's jarring to see it published by respected publications

... this! I have read such reviews when researching a purchase. They vary from uselessly anodyne to seriously wrong. And after reading more than a few different sites reviewing the same items, the amount of similarity was striking, often rising to near-identical passages. Human- or AI-generated, they were shite! I can't believe that any self-respecting publisher would knowingly carry such stuff under their brand. (And then I look at the usual CNN Underscored content ... and I believe)

For me it's not about AI, it's about publishing garbage. I feel the same way about crap-aggregators like Taboola and Outbrain.

Anyway those results have a stink to me now, and I know to avoid them. And these brands that publish crap are diminished and have lost my trust.

Should it be illegal to fake authors or attributions? Corporate ethics don't seem up to the job.
posted by Artful Codger at 7:51 PM on May 9 [18 favorites]


I am mildly upset that Good Housekeeping used them - their magazine has been a very reliable source of good hands-on reviews for things I need to buy, and seeing them dilute their brand for cost-savings is awful. The review slop has a very distinctive style (AdVon seems like a perfect Leverage target!) which can be avoided, but man do they pollute review searches.

I have to use AI editing at work, and am being asked to include it in more and more things. It takes a conscious effort to add guardrails to check for hallucinations and garbage, and people under deadlines/profit will skip them.
posted by dorothyisunderwood at 8:24 PM on May 9 [17 favorites]


I was about to blow an AskMeFi question on who was the source of the articles linked in my browser's opening pages. Which are amazingly bad and stalker-familiar.
posted by jadepearl at 8:33 PM on May 9 [1 favorite]


Happily AskMe questions are 24 hours now, not a week, so it's less painful to use one.
posted by hippybear at 8:35 PM on May 9


I see The Great Enshittification is doing well.
posted by Vatnesine at 9:50 PM on May 9 [8 favorites]


It takes a conscious effort to add guardrails to check for hallucinations and garbage, and people under deadlines/profit will skip them.

This is one of the real problems, IMO. Line must go up -> increase output or decrease pay -> margins grow ever tighter -> actual use of generative output increasingly slides to worst practices (copy+paste+publish).

If it were a Faustian bargain “never write another sentence of homework or line of boilerplate code, in return for looking the other way about how it was created,” that would be one thing. But that’s not even the deal on the table: this was supposed to be an enabling technology (leisure) and instead it’s an enabling technology (layoffs).

“Would you consider not treating your workers like slaves?” “I would rather die.” Apparently they meant it.
posted by Ryvar at 10:07 PM on May 9 [13 favorites]


this was supposed to be an enabling technology (leisure) and instead it’s an enabling technology (layoffs)

The thing about labour-saving devices is that that's save as in reduce, not save as in preserve.
posted by flabdablet at 10:36 PM on May 9 [4 favorites]


this was supposed to be an enabling technology (leisure) and instead it’s an enabling technology (layoffs)
As has always been the case.
posted by dg at 11:02 PM on May 9 [3 favorites]


Yeah, extra leisure should've always meant layoffs, but instead we spent everyone's time doing whatever else.

As a silly tech hack, we could have a browser extension that warned when sites hosted AI-generated content:

We need some real human credential of course. As a user, you highlight the AI-generated text and press "Accuse", and sign using an anonymous credential with threshold deanonymization, so then several randomly selected users fetch the content themselves and check whether it looks AI-generated. We do not have nice random samples upon which we could do statistics, but the browser extension could display a ranking that gets worse as sites accumulate more true accusations.

It's expensive to check whether content is AI-generated, of course, but these cost dynamics could be flipped if sites lose something whenever they host AI-generated content. We end up having the whole internet be untrustworthy anyways, but this adjusts the rate per site. lol
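
A minimal sketch of what the ranking side of that could look like, assuming the anonymous-credential and random-reviewer machinery already exists; the quorum size and scoring formula here are made up purely for illustration:

    # Accusations only count once a quorum of randomly selected reviewers
    # confirms them; a site's score worsens as confirmed accusations pile up.
    from collections import defaultdict

    QUORUM = 3  # confirmations needed before an accusation counts (illustrative)

    confirmations = defaultdict(int)         # (site, content_hash) -> confirming reviewers
    confirmed_accusations = defaultdict(int) # site -> accusations that reached quorum

    def record_review(site: str, content_hash: str, looks_generated: bool) -> None:
        """One randomly selected reviewer reports whether the flagged text looks AI-generated."""
        if not looks_generated:
            return
        confirmations[(site, content_hash)] += 1
        if confirmations[(site, content_hash)] == QUORUM:
            confirmed_accusations[site] += 1

    def site_score(site: str) -> float:
        """Higher is worse; the extension could show this next to search results."""
        return confirmed_accusations[site] / (1 + confirmed_accusations[site])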
posted by jeffburdges at 2:50 AM on May 10 [3 favorites]


For me it's not about AI, it's about publishing garbage. I feel the same way about crap-aggregators like Taboola and Outbrain.

Honestly, my biggest thought throughout all of this was "Now someone do an article about how Taboola's stuff is AI generated too." Because how could it not be?
posted by EmpressCallipygos at 3:59 AM on May 10 [2 favorites]


I saw the best brands of my generation
Destroyed by bad prose, bot spam, logorrhea
posted by snortasprocket at 6:13 AM on May 10 [5 favorites]


It doesn't surprise me that some amoral ad startup would do bullshit like this. It does surprise me that legitimate publications would destroy their reputation by publishing this garbage. I'd love to read more about the editorial discussions that led to approving these deceptive AI-generated ads. And were they properly disclosed as paid advertising in every publication?

A partial list of the publications mentioned in the article for running AdVon: USA Today, Sports Illustrated, Yoga Journal, Backpacker, Clean Eating, Hollywood Life, Us Weekly, Los Angeles Times, Miami Herald, Sacramento Bee, Tacoma News Tribune, Rock Hill Herald, Modesto Bee, Fort Worth Star-Telegram, Merced Sun-Star, Ledger-Enquirer, Kansas City Star, a bunch of other McClatchy-owned newspapers, PC Magazine, Mashable, AskMen, Good Housekeeping, People, Parents, Food & Wine, InStyle, Real Simple, Travel + Leisure, Better Homes & Gardens, Southern Living.

The LA Times and Sacramento Bee both hit particularly hard because they are the best newspapers we have left in California.

I attended a talk recently from the publisher of our tiny local paper. He commented that historically their revenue was 75% ads, 25% subscribers. But they're now more like 50/50 because ad revenue is on the decline. Cue the heartfelt entreaties for us all to subscribe to our local paper to help it out, a sort of news barnraising. But why would I pay to subscribe to something that then publishes AI generated ad garbage masquerading as editorial content?
posted by Nelson at 6:45 AM on May 10 [7 favorites]


It takes a conscious effort to add guardrails to check for hallucinations and garbage, and people under deadlines/profit will skip them.

I'm with jeffburdges on this. For maybe a year I've fantasized about a browser plug-in that parses what you're currently reading and raises a flag or warning if the content is demonstrably false, or known to be false or misleading. And yes, a "check" button if you think what you're reading is questionable, so that checking would be done, and if the content is dubious, the next person reading it will receive a warning. A crowd-sourced BS detector.

AI should be checking AI! I don't understand why people aren't using AIs to also perform post-generation validity checks on their output. For example, if an AI used for legal research generates a reference to a nonexistent case, is that not relatively easy for an AI to check? E.g., fed back as a question: does Coyote vs ACME, Nevada, 1962 exist?
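
A rough sketch of that re-query loop, where the citation regex and the case index are illustrative stand-ins rather than a real legal database:

    # Pull case citations out of generated text and check whether each one
    # exists in a (hypothetical) case-law index.
    import re

    CITATION_RE = re.compile(r"([A-Z][\w.]+ v\.? [A-Z][\w.]+,? \d{4})")

    def extract_citations(generated_text: str) -> list[str]:
        return CITATION_RE.findall(generated_text)

    def verify_citations(generated_text: str, case_index: set[str]) -> dict[str, bool]:
        """Return each cited case and whether it appears in the index."""
        return {c: c in case_index for c in extract_citations(generated_text)}

    known_cases = {"Coyote v. ACME, 1962"}  # stand-in for a real legal database
    draft = "As held in Coyote v. ACME, 1962 and in Roadrunner v. Beep, 1971 ..."
    print(verify_citations(draft, known_cases))
    # {'Coyote v. ACME, 1962': True, 'Roadrunner v. Beep, 1971': False}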

I have a theory here. Ahem. Here is my theory, which is mine. Hem hem. Large-language models were indiscriminately fed huge swathes of unverified material, with the prime goal of understanding... language. You don't need factually accurate feedstock to just learn the language. At some point, someone started asking it regular questions, and they were hugely surprised that, a lot of the time, they got reasonable answers back. Responses that were often correct... and almost always plausible.

And seeking that first-mover advantage, they have rushed this half-baked, incomplete AI model into the market, where it's currently cornered the market ...for bullshit. To me, the current AIs have been a breakthrough in machines "understanding" natural language, and the essential next step is to train different iterations on specialist, validated datasets. You don't want a medical AI dipping into its scrapings from reddit or antivaxxers for a medical answer. I expect this training will happen, but in the meantime, despite the incomplete training and current propensity to confabulate and bullshit, businesses are rushing AI out in all directions, and its flaws are irrelevant when it comes to commercial content generation.
posted by Artful Codger at 6:52 AM on May 10 [2 favorites]


fritz leiber (in 1961) called it "wordwooze".
we might as well.
posted by graywyvern at 7:15 AM on May 10 [1 favorite]


I don't understand why people aren't using AIs to also perform post-generation validity checks on its output. For example, if an AI used for legal research generates some reference to nonexistent cases, is that not relatively easy for AI to check?

The TL;DR is a flat no, none of this is possible - browser plugins or checking.

Reinforcement-based learning - both with humans (RLHF) and automated - is a standard part of the fine-tuning of every major new model; not just OpenAI's or Google's but also the major open-source models. The benchmarks that new models are scored with directly correlate to that whole problem space.

In short: the models are already doing the best they can at this, right out of the box. When they run backpropagation to optimize each weight in the network to minimum local loss? This is all incorporated into the definition of "minimum" they're optimizing towards.

But with all LLMs there is nobody at home, nothing capable of doing what you're asking. You can maybe get at least partway there with Q*, and I left a massive comment about why here. I know it's really long and dense but that was the best I could do to walk through it starting from first principles while explaining the jargon as I went.

As to the rest, including browser plugin detection, this is really just the "why can't we detect AI in homework?" question all over again. One of the few bright points in Metafilter machine learning discussions is that nearly every person who genuinely understands how this stuff works tends to leap out of the woodwork on those threads in order to discourage educators here from needlessly punishing students. Detection is simply not possible in principle or as a practical matter. The entire point of the LLM exercise is to mathematically reduce the gap between model output and human output, if you have a detection method that actually works it immediately becomes one of the new tests. Just getting to detection of a single model would require spending more research and GPU compute time than OpenAI and Google did in producing that model. It is a war of attrition you will lose 100% of the time, and your odds of making false accusations - positive or negative - are always, always >50%.

Beyond that, on the practical-matter side, creating a LoRA (think Photoshop filter for LLMs) that significantly alters intonation for all future output and dodges that detection is something any bright 15-year-old nerd can do over the course of a weekend with a gaming computer; time-traveled from the mid-90s, at least four other people in my high school of 1,700 students would've been up to it, not just me. At least three of us would've. You only need to do it once to fool every AI detector out there for years.

I know that's not the answer people here want but there are enough experts who genuinely understand these systems far better than I do, who are sympathetic to what is being asked, that if it were possible this would already exist.
posted by Ryvar at 7:53 AM on May 10 [10 favorites]


I am not sure replacing real writers with AI writers matters when they were just being paid to churn out pink slime anyway.

Even the human-written AdVon texts were fake reviews designed to dupe you into clicking affiliate links. The AI stuff just makes this a bit more lucrative because you can lay off more of the humans, but the entire business is scummy regardless.
posted by BungaDunga at 8:09 AM on May 10 [2 favorites]


I'm finding LLMs are much more useful at answering questions verifiably when they provide sources. Both Bing and Phind have prioritized putting links to references in their answers so it's relatively straightforward for me to read the sources to figure out if the LLM is telling me something true or just making something up / misinterpreting it. Neither product has quite closed the loop on having some other AI process do that verification, I suspect that'd be pretty difficult to automate.

Anyway as BungaDunga says none of that helps with fraud like AdVon. They're using AI to intentionally generate garbage and the only editing they need to do is to give it a thin veneer of plausibility to dupe people into clicking on the affiliate links.
posted by Nelson at 8:45 AM on May 10 [2 favorites]


Seems to me that we don't so much need an aggregated AI sludge detector as an aggregated sludge detector period. I don't actually need to know whether the pointless timewasting shit that's just been offered up is AI-generated in order to judge it a pointless waste of time, and if I so judge it, others quite probably will as well.

Something along the lines of SponsorBlock, perhaps, that works as an adjunct to uBlock Origin?
posted by flabdablet at 9:08 AM on May 10 [2 favorites]


I appreciate your much deeper understanding about this field, and the effort you've put into your detailed comments here. Nonetheless...

The TL;DR is a flat no, none of this is possible - browser plugins or checking.

Respectfully, I disagree.

I manually modelled this in a simplistic way when ChatGPT first came out, as we all did. It spits out an answer; we pick out a fact in the answer and re-input it as a question, and see what it says. My legal example is hardly a stretch, is it - to pick out a cited case in generated output, and re-query a legal database to verify the existence of that case?

As Nelson observed: I'm finding LLMs are much more useful at answering questions verifiably when they provide sources

Secondly, we have discussed here how people are improving their Google (or whatever) search results in certain cases by restricting the search to reddit or Wikipedia, with good results. Two examples of crowdsourced "knowledge", however flawed. And there's Snopes. We need more of these approaches anyway to deshittify the Internet, right?

And as flabdablet just suggested:
Seems to me that we don't so much need an aggregated AI sludge detector as an aggregated sludge detector period. I don't actually need to know whether the pointless timewasting shit that's just been offered up is AI-generated in order to judge it a pointless waste of time, and if I so judge it, others quite probably will as well.

As to the rest, including browser plugin detection, this is really just the "why can't we detect AI in homework?" question all over again.

No, these are different problems. the "AI in homework" problem is at heart attempting to assess whether the student actually wrote their own assignment, or cribbed it from another student, a "papers" website, or AI. The solution here: a return to oral dissertations? Writing papers in-class with an invigilator?

The problem of validation I'm discussing (e.g., AI checking itself, or a browser plugin) is to try to pull out "facts" or assertions in some text, and to a) validate them against other records, and b) see if the fact/assertion matches previously validated (or debunked) items, regardless of authorship.

The entire point of the LLM exercise is to mathematically reduce the gap between model output and human output

Uh, to me the whole point of pursuing AI is to create resources that are better than human in terms of not making mistakes, in processing and evaluating the best possible information much faster than a human, and providing valid, correct and useful information. Sure we want to see self-correction and iterative improvement... but Garbage In-Garbage Out is still a thing.

We have vast repositories of specialized and vetted knowledge. To me the current hurdle is how to use AI to only use valid domain-specific knowledge when "working" in that domain - medicine, law, engineering, etc. And to validate its own output whenever possible.

This is of course miles away from AdVon and bullshit-generation.
posted by Artful Codger at 9:34 AM on May 10 [1 favorite]


Flabdablet: I wrote some ideas on that in the recent search thread here.

Short version: what if we took everything uBlock is filtering and used that as a training set for a zero-shot classifier (blah blah blah technically still copyright theft but seriously all marketing is non-consensual brainwashing and can fuck off)?
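
(A hedged sketch of roughly that, swapping the training step for an off-the-shelf zero-shot classifier so there's nothing to train at all; the model choice and labels are my assumptions, and uBlock's lists would really just be supplying examples of what to penalize:

    # Score page text against "sludge-like" vs "substantive" labels using the
    # HuggingFace zero-shot-classification pipeline (pip install transformers).
    from transformers import pipeline

    classifier = pipeline("zero-shot-classification",
                          model="facebook/bart-large-mnli")

    def sludge_score(page_text: str) -> float:
        """Rough score for how much a passage reads like ad/SEO filler."""
        result = classifier(
            page_text,
            candidate_labels=["advertising or SEO filler", "substantive article"],
        )
        scores = dict(zip(result["labels"], result["scores"]))
        return scores["advertising or SEO filler"]

    print(sludge_score("Top 10 best air purifiers you NEED in 2024, reviewed!"))

Whether that beats a classifier actually fine-tuned on the filtered content is an open question; it's just the cheapest thing to try first.)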

uBlock isn’t a perfect set of everything I don’t want to see on the Internet (my kingdom for a " DESTROYS " Youtube title filter), but it’s a great start.
posted by Ryvar at 9:40 AM on May 10 [1 favorite]


Setting up something crowdsourced, SponsorBlock style, strikes me as probably offering a more rapidly usable and more reliable result.

Susceptible to DOS attacks via floods of false accusations of course, but no more so than SponsorBlock is and that works very well. In any case, Wikipedia has shown us how to reduce the impact of DOS attacks on crowdsourced information to quite tolerable proportions.

Trying to outperform a motivated if rather bolshie crowd with a bot would indeed be an interesting research project, though.
posted by flabdablet at 9:56 AM on May 10 [3 favorites]


Both Bing and Phind have prioritized putting links to references in their answers so it's relatively straightforward for me to read the sources to figure out if the LLM is telling me something true

I am confused — this seems strictly worse than the old system of just providing links which you then read for info. Not just because of the extra step, but having to deal with the anchoring effect of plausible-tuned prose output.

to mathematically reduce the gap between model output and human output

"the" is sweeping a hell of a lot under the rug there
posted by clew at 10:04 AM on May 10 [2 favorites]


God, that explained so, so, so, so much about how terrible the online product review space has become. It's fraud on both ends meeting in the middle and now it's impossible to find any useful information on products online unless you already know where to look.
posted by jacquilynne at 10:07 AM on May 10 [4 favorites]


I trust AvE for tool reviews.
posted by flabdablet at 10:18 AM on May 10


Even before AI, the number of clickbait garbage reviews had started substantially exceeding real ones; there were a lot of monkey writers just doing "feature comparisons" off of Amazon and quoting a few reviews off it. Now they're just being done more efficiently (and obviously.)

IMO, unless you can have reasonable confidence that the reviewer actually has hands-on experience with the specific products being reviewed, the review's worthless even if it's human-generated.
posted by microscone at 10:35 AM on May 10 [2 favorites]


I agree that reviews were already heavily enshittified, but at least when people were doing the feature comparisons they were usually vaguely accurate to the features a given product actually offered. Now there's so much crap out there I sometimes can't even figure out if a given product does or does not have a feature I want.
posted by jacquilynne at 10:47 AM on May 10 [1 favorite]


there were a lot of monkey writers just doing "feature comparisons"

That's similar to the business plan of Mahalo.com, Jason Calacanis' sleazy company that paid people in cheap labor countries to write garbage articles to get Google juice so they could show Google ads and make money on the arbitrage. (Google Panda put an end to that by recognizing Mahalo as spam. Unfortunately it also took out a lot of more useful sites, including Ask Metafilter).

AdVon is a similar business in many ways, and Google's role in the middle is similar. The difference with AdVon is they're placing their garbage on formerly legitimate websites like the LA Times. At least Mahalo kept its spam confined to its own domain. I still think the sites that choose to run AdVon content have a lot to answer for.

this seems strictly worse than the old system of just providing links which you then read for info.

It's not. This topic is a derail so I'll keep it short. The LLMs are able to probe the meaning of the content of a site way better than traditional keyword-matching search engines. So that means it finds things better than basic keyword search. (Google has been doing this for years too). The synthesized textual answers are also helpful in being way easier to understand than a list of links. Either way you still have to verify the references actually support your conclusion, but the LLM search engines are already drafting that conclusion.
posted by Nelson at 11:19 AM on May 10 [2 favorites]


Affiliate links aren't hard to detect, I don't think. Neither are adtech trackers (DuckDuckGo puts out handy-dandy lists that I am currently using in my research). Downrank the hell out of sites that use them to excess (and yes, "excess" would need an operational definition), and voilà, we're back to useful web search engines!
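
A toy sketch of that downranking, where the affiliate-link hints, tracker domains, and the "excess" cutoff are all illustrative assumptions rather than anyone's real lists:

    # Count affiliate links and known tracker domains on a page and apply a
    # penalty to its search ranking once it goes past some threshold.
    from urllib.parse import urlparse

    AFFILIATE_HINTS = ("tag=", "affid=", "/go/", "amzn.to")
    TRACKER_DOMAINS = {"taboola.com", "outbrain.com", "doubleclick.net"}
    EXCESS_THRESHOLD = 5  # what counts as "excess" needs a real definition

    def junk_count(links: list[str]) -> int:
        count = 0
        for link in links:
            host = urlparse(link).netloc
            if any(hint in link for hint in AFFILIATE_HINTS):
                count += 1
            elif any(host.endswith(d) for d in TRACKER_DOMAINS):
                count += 1
        return count

    def rank_penalty(links: list[str]) -> float:
        """Multiply a page's base search score by this; 1.0 means no penalty."""
        excess = max(0, junk_count(links) - EXCESS_THRESHOLD)
        return 1.0 / (1.0 + excess)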

But Google won't do that because Google is the biggest beneficiary of ad spend, ergo the entity that makes the most money off AI SEO slop.

Why DuckDuckGo hasn't figured this one out I'm honestly not sure, though. Seems like a no-brainer.
posted by humbug at 11:22 AM on May 10


My legal example is hardly a stretch, is it - to pick out a cited case in generated output, and re-query to search a legal database to verify the existence of that case?

Hardly a stretch to verify (citation database entry) Case A and slightly mangled (LLM generated citation) Case B are a match, and use that to unmangle Case B's citation text in the output. Automated verification that Case A is actually appropriate to cite, whether it makes sense to do so and correlates with the ostensible reasoning behind the citation is currently impossible. Statistically you can probably get it right most of the time because hopefully the training text cites are usually correct, and to a very limited extent the logic behind the authors' selecting those citations will be partially baked into the model (this becomes a lot more true if you have a few tens of thousands of real world examples). But you're definitely going to want a human lawyer to review, which might be a huge portion of that job in the future.
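
For just that narrow unmangling step, a small sketch; the similarity cutoff and the database entry are illustrative, and a real system would match against structured citation fields rather than raw strings:

    # Match a slightly garbled LLM-generated citation against real database
    # entries by string similarity and return the canonical form, if any.
    from difflib import SequenceMatcher

    def best_match(generated_citation: str, database_entries: list[str],
                   cutoff: float = 0.85) -> str | None:
        """Return the closest real citation, or None if nothing is close enough."""
        scored = [(SequenceMatcher(None, generated_citation.lower(), entry.lower()).ratio(), entry)
                  for entry in database_entries]
        score, entry = max(scored)
        return entry if score >= cutoff else None

    db = ["Coyote v. Acme Corp., 83 F. Supp. 2d 1 (D. Nev. 1962)"]  # made-up entry
    print(best_match("Coyote vs ACME Corp, 83 F.Supp.2d 1 (D. Nev. 1962)", db))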

Automated verification of fitness to intended purpose would require modeling not just all the conceptual relationships of one specific legal system, but also how they interact with the host society and its concepts, and human behavior both observed and predictive, and adversarial prediction of likely human attempts to circumvent the law. Most of those are hard Turing-level problems of agent future state prediction, some of them are recursive in nature, and you're probably going to need decades-out real AGI not 2026~2028 OpenAI Q* "AGI" to solve them.

they have rushed this half-baked, incomplete AI model into the market, where it's currently cornered the market ...for bullshit. To me, the current AIs have been a breakthrough in machines "understanding" natural language, and the essential next step is to train different iterations on specialist, validated datasets.

The thing is that this is not half-baked or incomplete except in the sense that it isn't meeting the expectations of one person, or one group of people, or one use case. It isn't advertised as specialist in legal services because that was never the goal.

With generic text-generation LLMs there is typically only one goal: "given your training data, what is the most likely continuation of the prompt text?" And one approach (as above: optimize for the minimum possible difference between observed human-authored text and the network's output).

That prompt includes not only the words you write but the template bracketing it (which you can often see in part or in whole when you "break" an LLM), and that training data includes not only all 2021 Internet text but also both human and automated adversarial training during fine-tuning. The network's output is slightly modulated by a random seed value and a "temperature" value (which throttles how far outside a purely deterministic response the output is allowed to stray, roughly equivalent to a creativity slider when paired with a random seed; for image-generating diffusion models the preferred term is "conformance"). The weights of the network are optimized to output the highest probability of "what comes next?" after factoring in all of the above considerations.
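
A minimal sketch of just that temperature/seed behaviour, with toy numbers rather than any particular model's sampler:

    # The same next-token scores become more or less deterministic depending on
    # how the softmax is scaled by temperature; the seed fixes the random draw.
    import math, random

    def sample_next_token(logits: dict[str, float], temperature: float, seed: int) -> str:
        rng = random.Random(seed)
        scaled = {tok: score / temperature for tok, score in logits.items()}
        peak = max(scaled.values())
        weights = {tok: math.exp(s - peak) for tok, s in scaled.items()}
        total = sum(weights.values())
        r = rng.random() * total
        for tok, w in weights.items():
            r -= w
            if r <= 0:
                return tok
        return tok  # fallback for floating-point edge cases

    logits = {"the": 2.1, "a": 1.7, "wordwooze": 0.3}
    print(sample_next_token(logits, temperature=0.2, seed=42))  # near-deterministic
    print(sample_next_token(logits, temperature=1.5, seed=42))  # more adventurous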

...that any of this works ever, even slightly, is the end product of unbelievable amounts of blood, sweat, tears, applied human intellect, and the wholesale theft or at least unsanctioned use of everything ever uploaded to the Internet.

All that said, mixture-of-experts models have already been the standard for text generation in every major language model since GPT-4 was trained in 2022. Definitely check this with an actual ML expert but my impression is this was initially motivated more by inference hardware memory limitations - if the broader model is actually an initial routing network that identifies the type of prompt, and hands it off to one or two out of several specialized networks, then we only need to load those one or two in VRAM instead of the full eight that comprise the entire network. My understanding is that the division of experts usually isn't by subject but by modality (fancy autocomplete for programmers gets an exception here, partly because it's very structurally regular and distinct from speech, and partly naked self-interest by the researchers behind most models: it's low-hanging fruit and they really want it).
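
A very rough sketch of the routing idea, with made-up expert names and a keyword heuristic standing in for the learned router real MoE models use:

    # A small router picks one or two "experts" per prompt; only those need to
    # be resident in memory, which is the VRAM saving described above.
    from typing import Callable

    EXPERTS: dict[str, Callable[[str], str]] = {
        "code": lambda p: f"[code expert handles: {p}]",
        "prose": lambda p: f"[prose expert handles: {p}]",
    }

    def route(prompt: str, top_k: int = 1) -> list[str]:
        """Stand-in router: real models learn this; here it's a keyword check."""
        looks_like_code = any(tok in prompt for tok in ("def ", "{", "import "))
        ranked = ["code", "prose"] if looks_like_code else ["prose", "code"]
        return ranked[:top_k]

    def generate(prompt: str) -> str:
        return " ".join(EXPERTS[name](prompt) for name in route(prompt))

    print(generate("import os  # why does this fail?"))
    print(generate("Write a product review for an air purifier."))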

I think what you're actually looking for is either a heavy fine-tuning of an existing model for a specific purpose plus a lot of special-case LoRAs, or honestly just rolling purpose-built models from scratch (which is going to run a few million in GPU compute costs, plus non-trivial ML expert R&D salaries, plus you'd definitely become personally culpable for use of the Common Crawl to achieve English fluency). But even then, for anything like verifying suitability of legal citations you're still going to run up against the Turing-level problems above.
posted by Ryvar at 2:06 PM on May 10 [3 favorites]


(also my apologies if we're talking past each other. Two all-nighters this week, and a Friday ending in Visual Studio claiming 37 separate reasons for errors breakpointing in alloc.c when I'm 99.9% certain memory allocation has nothing to do with the problem whatsoever. The constant detonation of scope and nested logic has left me utterly unable to stop going off on tangents in English, or stick to a fucking point. I haven't left this tiny apartment in three days or spoken to a human outside Zoom in five and I need to get off Metafilter and go do something about that. So yeah: my apologies because I probably missed your point completely)
posted by Ryvar at 2:50 PM on May 10 [3 favorites]


I can't believe that any self-respecting publisher would knowingly carry such stuff under their brand.

For the people in charge the usefulness of a brand is the arbitrage between its current prestige and eliminating the expenses that made building that prestige possible. They will ride it all the way to the bottom and toss it aside once it’s an empty husk.
posted by Horace Rumpole at 3:11 PM on May 10 [5 favorites]


So... I've been close enough to gen AI experiments and work that's actually shipped to see what current state is and how it's evolved over the last 18-24 months (longer if you go back before the hype cycle). AdVon isn't in the business of producing good work so the work will be garbage regardless. For people trying to use the current gen AI models for production quality work I liken it to a bunch of AI robots stocking shelves in a grocery store. It works pretty well most of the time, but the rest of the time the AI robots drop products, break things, and then you need humans to clean everything up with a bucket and mop because the janitor AI robots just smear everything everywhere and call it a day. Humans have to hand edit everything that they could have done better in the first place if you don't like piles of smeared garbage and broken glass. Now we get to step over piles of garbage and broken glass while other shelf stocking robots are using that as new data to train them on those patterns. At web scale.

Will gen AI get better? That's problem #2. Increasing the compute to create these types of larger models is super expensive. You also need net new data (synthetic data doesn't work for pretraining or fine tuning) if you want the models with these architectures to get better.

It's a mess. And I've seen this across copy, images, videos (oy), and other mediums (looking at you multimodals too). Yes, it gets better over time, but these types of models are hitting scaling walls. New architectures will come about, but without new breakthroughs these things are hitting a plateau.

* In one instance we had a team trying really hard to "prompt engineer" a SOTA LLM to get a very specific type of output. A hidden Markov model solved the problem much faster and cleaner. If all you have is a really expensive hammer...

Re: homework, I saw this one live at a science fair. A student's project used a few SOTA LLMs to do the same assignment as a few classmates to compare outputs and quality. It wasn't even close. Just looking at the output you could tell which one was summarizing a bunch of text and which one was writing, drawing images, designing and conducting physical experiments, coming up with novel solutions, etc. It wasn't the LLMs. You could also go talk to one of the students who was presenting that project at the same science fair a few rows away.
posted by ryoshu at 3:18 PM on May 10 [3 favorites]


Ryvar & Artful Codger: Above my point was primarily that AI warning extensions could afford to use humans to detect AI content, so long as they penalize the guilty site sufficently. It's tricky to know if you've penalized the sites sufficently though.
posted by jeffburdges at 4:25 PM on May 10


I'm finding LLMs are much more useful at answering questions verifiably when they provide sources. Both Bing and Phind have prioritized putting links to references in their answers so it's relatively straightforward for me to read the sources to figure out if the LLM is telling me something true or just making something up / misinterpreting it. Neither product has quite closed the loop on having some other AI process do that verification, I suspect that'd be pretty difficult to automate.

Thankfully I have no direct knowledge of either Bing or Phind (BTW: dumbest name I’ve seen in a while), but I wonder how you know the stuff they’re linking to isn’t generated too? Because if it isn’t now, it’s probably a project μsoft and their co-evils are working on furiously.
posted by Gilgamesh's Chauffeur at 6:29 PM on May 10


Wow. That McClatchy newspapers were running this garbage - product reviews with clearly idiotic contradictory statements, written by AI from a company being paid by clients mentioned in the reviews - is so damning. What the hell was McClatchy thinking?

I mean, we know what they were thinking, but...

wow.
posted by mediareport at 1:23 AM on May 11 [1 favorite]


Apologies for continuing this derail from sleazy AI ads but here's an example of what I mean about LLM search being useful with references. I asked Phind the etymology of a particular type of place name, "Bar". In the first answer Phind-70b gives me back a clear, plausible answer with references for further reading. Unfortunately none of the references exactly have the definition that Phind has given me, and my second query for a source doesn't really get anything new. Often the references Phind offers allow easy verification of its answers, but not in this case.

However there's enough context in the references offered to give me confidence Phind's answer is correct. IMHO this is particularly remarkable; it sure looks like the LLM has synthesized knowledge and isn't just repeating text it has stored. I don't generally trust an LLM to do that but it seems to have worked here.

I'm sure a traditional keyword search could find an answer to my question as well but it would take a lot more coaxing, particularly to work around the other common meanings of "bar". The specialist sources I know, like Etymonline, don't even have this particular meaning.

I wonder how you know the stuff they’re linking to isn’t generated too?

I look at the stuff and evaluate its quality? Just like any form of research.
posted by Nelson at 7:57 AM on May 11


Wikipedia: Citation Needed. A new experimental Chrome browser extension that lets you select a sentence from a web page, then use an LLM in the background to try to find a Wikipedia page that supports the assertion in the sentence.
posted by Nelson at 6:52 AM on May 12 [2 favorites]


^^^ This. It's a start.
posted by Artful Codger at 10:59 AM on May 12


Both Bing and Phind have prioritized putting links to references in their answers so it's relatively straightforward for me to read the sources to figure out if the LLM is telling me something true or just making something up / misinterpreting it

I haven't used Phind, but my experience with various AI tools is that, in many if not most cases, the references themselves are made by the AI out of whole cloth and, while they 'look' correct, never lead anywhere in about 90% of cases. Most often, it's simply more work to try and verify the 'information' than it would be to write it yourself in the first place.
posted by dg at 8:43 PM on May 12


That is not a problem with Bing or Phind. Consider giving it a try.
posted by Nelson at 9:46 PM on May 12 [1 favorite]



