phpList.org

RSS plugin doing strange things with certain feed items


I am using the RSS plugin (version 2.8.2+20200413) to send newsletters based on an Atom feed. The feed validates as Atom 1.0 and I use a very simple custom template:

<h3><a href="[URL]">[TITLE]</a></h3>
[CONTENT]
<hr />

This works well for the most part, but it does strange things with the content for certain alternate links. So far, I have noticed this when the link points to a PDF or to a YouTube URL.

For example, for a YouTube link the feed contains the following:

<entry>
<title type="html">Understanding Digital Racism After COVID-19</title>
<link rel="alternate" href="https://www.youtube.com/watch?v=2V0PNzybYwQ"/>
<id>https://insights.hansdezwart.nl/link/1142</id>
<published>2020-11-30T11:05:14+01:00</published>
<updated>2020-11-30T11:06:17+01:00</updated>
<summary type="html">
<![CDATA[<p>From <a title="See all links from YouTube" href="https://insights.hansdezwart.nl/source/youtube">YouTube</a> on November 12, 2020</p><p>The Oxford Internet Institute hosts Lisa Nakamura, Director Digital Studies Institute, Gwendolyn Calvert Baker Collegiate Professor, Department of American Culture, University of Michigan, Ann Arbor. Professor Nakamura is the founding Director of the Digital Studies Institute at the University of Michigan, and a writer focusing on digital media, race, and gender. 'We are living in an open-ended crisis with two faces: unexpected accelerated digital adoption and an impassioned and invigorated racial justice movement. These two vast and overlapping cultural transitions require new inquiry into the entangled and intensified dialogue between race and digital technology after COVID. My project analyzes digital racial practices on Facebook, Twitter, Zoom, and TikTok while we are in the midst of a technological and racialized cultural breaking point, both to speak from within the crisis and to leave a record for those who come after us. How to Understand Digital Racism After COVID-19 contains three parts: Methods, Objects, and Making, designed to provide humanists and critical social scientists from diverse disciplines or experience levels with pragmatic and easy to use tools and methods for accelerated critical analyses of the digital racial pandemic.'</p><p>Tagged with: <a href="https://insights.hansdezwart.nl/tag/black-struggle" title="See all links that are tagged with 'black-struggle'">black-struggle</a> · <a href="https://insights.hansdezwart.nl/tag/covid-19" title="See all links that are tagged with 'covid-19'">covid-19</a> · <a href="https://insights.hansdezwart.nl/tag/not-read" title="See all links that are tagged with 'not-read'">not-read</a> · <a href="https://insights.hansdezwart.nl/tag/racist-technology" title="See all links that are tagged with 'racist-technology'">racist-technology</a></p>]]>
</summary>
<category term="black-struggle"/>
<category term="covid-19"/>
<category term="not-read"/>
<category term="racist-technology"/>
</entry>

And the HTML of the email then says:

<hr /><h3><a href="https://www.youtube.com/watch?v=2V0PNzybYwQ">Understanding Digital Racism After COVID-19</a></h3>
<iframe width="560" height="315" src="//www.youtube.com/embed/2V0PNzybYwQ" frameborder="0"></iframe>

For some reason the plugin has turned the content, which should just be the text from the summary, into an attempt to embed a YouTube video.

Another example is a link to a PDF. This is what is in the feed:

<entry>
<title type="html">Discriminating Systems: Gender, Race, and Power in AI</title>
<author>
<name>Kate Crawford</name>
</author>
<author>
<name>Meredith Whittaker</name>
</author>
<author>
<name>Sarah Myers West</name>
</author>
<link rel="alternate" href="https://ainowinstitute.org/discriminatingsystems.pdf"/>
<id>https://insights.hansdezwart.nl/link/1124</id>
<published>2020-11-25T11:30:03+01:00</published>
<updated>2020-11-28T21:49:57+01:00</updated>
<summary type="html">
<![CDATA[<p>Written by <a href="/author/kate-crawford" title="See all articles written by Kate Crawford">Kate Crawford</a>, <a href="/author/meredith-whittaker" title="See all articles written by Meredith Whittaker">Meredith Whittaker</a> and <a href="/author/sarah-myers-west" title="See all articles written by Sarah Myers West">Sarah Myers West</a></p><p>From <a title="See all links from AI Now Institute" href="https://insights.hansdezwart.nl/source/ai-now-institute">AI Now Institute</a> on April 1, 2019</p><p>The diversity crisis in AI is well-documented and wide-reaching. It can be seen in unequal workplaces throughout industry and in academia, in the disparities in hiring and promotion, in the AI technologies that reflect and amplify biased stereotypes, and in the resurfacing of biological determinism in automated systems.</p><p>Tagged with: <a href="https://insights.hansdezwart.nl/tag/not-read" title="See all links that are tagged with 'not-read'">not-read</a> · <a href="https://insights.hansdezwart.nl/tag/racist-technology" title="See all links that are tagged with 'racist-technology'">racist-technology</a></p>]]>
</summary>
<category term="not-read"/>
<category term="racist-technology"/>
</entry>

And this is the HTML in the email:

<hr /><h3><a href="https://ainowinstitute.org/discriminatingsystems.pdf">Discriminating Systems: Gender, Race, and Power in AI</a></h3>
<a href="https://ainowinstitute.org/discriminatingsystems.pdf" target="_blank">https://ainowinstitute.org/discriminatingsystems.pdf</a>

For good measure, this is an example of when it does work. From the feed:

<entry>
<title type="html">This is the Stanford vaccine algorithm that left out frontline doctors</title>
<author>
<name>Eileen Guo</name>
</author>
<author>
<name>Karen Hao</name>
</author>
<link rel="alternate" href="https://www.technologyreview.com/2020/12/21/1015303/stanford-vaccine-algorithm/"/>
<id>https://insights.hansdezwart.nl/link/1292</id>
<published>2020-12-23T09:55:02+01:00</published>
<updated>2020-12-26T14:54:33+01:00</updated>
<summary type="html">
<![CDATA[<p>Written by <a href="/author/eileen-guo" title="See all articles written by Eileen Guo">Eileen Guo</a> and <a href="/author/karen-hao" title="See all articles written by Karen Hao">Karen Hao</a></p><p>From <a title="See all links from MIT Technology Review" href="https://insights.hansdezwart.nl/source/mit-technology-review">MIT Technology Review</a> on December 21, 2020</p><p>The university hospital blamed a “very complex algorithm” for its unequal vaccine distribution plan. Here’s what went wrong.</p><p>Tagged with: <a href="https://insights.hansdezwart.nl/tag/algorithmic-bias" title="See all links that are tagged with 'algorithmic-bias'">algorithmic-bias</a> · <a href="https://insights.hansdezwart.nl/tag/algorithmic-regulation" title="See all links that are tagged with 'algorithmic-regulation'">algorithmic-regulation</a> · <a href="https://insights.hansdezwart.nl/tag/covid-19" title="See all links that are tagged with 'covid-19'">covid-19</a> · <a href="https://insights.hansdezwart.nl/tag/racist-technology" title="See all links that are tagged with 'racist-technology'">racist-technology</a> · <a href="https://insights.hansdezwart.nl/tag/vaccination" title="See all links that are tagged with 'vaccination'">vaccination</a></p>]]>
</summary>
<category term="algorithmic-bias"/>
<category term="algorithmic-regulation"/>
<category term="covid-19"/>
<category term="racist-technology"/>
<category term="vaccination"/>
</entry>

And the HTML in the email:

<hr /><h3><a href="https://www.technologyreview.com/2020/12/21/1015303/stanford-vaccine-algorithm/">This is the Stanford vaccine algorithm that left out frontline doctors</a></h3>
<p>Written by <a href="https://insights.hansdezwart.nl/author/eileen-guo" rel="noreferrer" target="_blank">Eileen Guo</a> and <a href="https://insights.hansdezwart.nl/author/karen-hao" rel="noreferrer" target="_blank">Karen Hao</a></p><p>From <a href="https://insights.hansdezwart.nl/source/mit-technology-review" rel="noreferrer" target="_blank">MIT Technology Review</a> on December 21, 2020</p><p>The university hospital blamed a “very complex algorithm” for its unequal vaccine distribution plan. Here’s what went wrong.</p><p>Tagged with: <a href="https://insights.hansdezwart.nl/tag/algorithmic-bias" rel="noreferrer" target="_blank">algorithmic-bias</a> · <a href="https://insights.hansdezwart.nl/tag/algorithmic-regulation" rel="noreferrer" target="_blank">algorithmic-regulation</a> · <a href="https://insights.hansdezwart.nl/tag/covid-19" rel="noreferrer" target="_blank">covid-19</a> · <a href="https://insights.hansdezwart.nl/tag/racist-technology" rel="noreferrer" target="_blank">racist-technology</a> · <a href="https://insights.hansdezwart.nl/tag/vaccination" rel="noreferrer" target="_blank">vaccination</a></p>

Could it be because I use a <summary> tag rather than a <content> tag?

If anybody would like to test this out, you can find the feed here.

@hansdezwart Not sure what you are doing. The feed https://insights.hansdezwart.nl/tag/racist-technology/feed/newsletter is using content elements but the extracts that you showed have summary elements.

When I copied the feed and changed the content elements to summary, the output looked correct.

The plugin uses summary in preference to content, but I’m not sure why the link element is being used. I will look into that.
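To make the "summary in preference to content" rule concrete, here is a minimal sketch in Python. It is not the plugin's code (the plugin is PHP and delegates feed parsing to Picofeed). The function name `entry_body`, the sample feed, and the fallback to content when summary is missing or empty are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of Atom 1.0 feeds.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def entry_body(entry):
    """Return the text of <summary> if present and non-empty,
    otherwise the text of <content>, otherwise an empty string.

    Hypothetical helper: the fallback order is an assumption based on
    the statement that summary is preferred over content.
    """
    for tag in ("summary", "content"):
        node = entry.find(f"atom:{tag}", ATOM_NS)
        if node is not None and node.text and node.text.strip():
            return node.text.strip()
    return ""


# Made-up feed: the first entry has both elements, the second only content.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title type="html">Example</title>
    <link rel="alternate" href="https://example.org/a"/>
    <summary type="html"><![CDATA[<p>Short summary</p>]]></summary>
    <content type="html"><![CDATA[<p>Full content</p>]]></content>
  </entry>
  <entry>
    <title type="html">No summary</title>
    <link rel="alternate" href="https://example.org/b"/>
    <content type="html"><![CDATA[<p>Only content</p>]]></content>
  </entry>
</feed>"""

root = ET.fromstring(SAMPLE)
bodies = [entry_body(e) for e in root.findall("atom:entry", ATOM_NS)]
for body in bodies:
    print(body)
```

Under these assumptions the first entry yields its summary and the second falls back to its content. Neither result depends on the link element.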

Apologies, I changed my feed generation to use content rather than summary right after I wrote the post.

But I can confirm that I had these issues (the link being used) when I was using summary. I have not been able to test it yet now that I am using content. Does the plugin use content if no summary is present?

Not sure what your screenshot is showing? I don’t think there should be a date underneath the title?

Anyways, I’ve now switched the feed back to summary for easy testing. Do let me know if there is anything I can do to help with the testing…

@hansdezwart I was testing an existing campaign with a different template to yours. Using your template the result is

Using the summary element instead of content should work; it has done for me. You might need to clear down the plugin’s tables in the database, depending on how far you have got with testing this. Otherwise use “Delete outdated RSS items” with a value of 1 day to remove most of the items. Then fetch the feed again.

The plugin uses the Picofeed library for the RSS feed handling and that does some special processing when the link is to a YouTube video or to a PDF file. I hadn’t realised that as no-one else has reported it as a problem. So long as you use the summary element then that special processing should not matter. I cannot immediately see a way to disable it though.
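As a rough illustration of the kind of special processing described here, the Python sketch below reproduces the two outputs shown earlier in the thread: an iframe for a YouTube link and a bare anchor for a PDF link. It is not Picofeed's actual code. The function `render_content`, its `special_processing` flag, and the host and extension checks are hypothetical and only mimic the observed behaviour.

```python
from urllib.parse import urlparse, parse_qs


def render_content(link, body, special_processing=True):
    """Return the HTML used as the item content.

    Hypothetical sketch: when special_processing is on, YouTube and PDF
    links replace the feed body, matching the email output reported in
    the thread. Everything else returns the body unchanged.
    """
    if not special_processing:
        return body

    parsed = urlparse(link)
    host = parsed.netloc.lower()

    # YouTube watch URLs: swap the body for an embed iframe.
    if host in ("www.youtube.com", "youtube.com"):
        video_id = parse_qs(parsed.query).get("v", [None])[0]
        if video_id:
            return (
                f'<iframe width="560" height="315" '
                f'src="//www.youtube.com/embed/{video_id}" frameborder="0"></iframe>'
            )

    # PDF links: swap the body for a plain link to the file.
    if parsed.path.lower().endswith(".pdf"):
        return f'<a href="{link}" target="_blank">{link}</a>'

    return body


youtube = render_content("https://www.youtube.com/watch?v=2V0PNzybYwQ", "<p>Summary</p>")
pdf = render_content("https://ainowinstitute.org/discriminatingsystems.pdf", "<p>Summary</p>")
plain = render_content(
    "https://www.technologyreview.com/2020/12/21/1015303/stanford-vaccine-algorithm/",
    "<p>Summary</p>",
)
disabled = render_content(
    "https://www.youtube.com/watch?v=2V0PNzybYwQ", "<p>Summary</p>", special_processing=False
)
```

With the flag off, every link type keeps the text from the feed, which is the behaviour wanted for a newsletter built from summaries.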

Thank you @duncanc. I deleted all the relevant items in the three RSS plugin database tables and reloaded the feed. From a first glance it does indeed seem to work. If that changes, I’ll keep you posted.

I too can’t see a sustainable and non-hacky way to turn off the processing that Picofeed does. I’ve therefore raised an issue on the picofeed GitHub.

@hansdezwart Thanks. This made me look again at Picofeed, and I found that the original repository (https://github.com/fguillot/picoFeed) on GitHub no longer exists. I knew that it was not being developed but had not realised that it had gone away.

As a short-term measure I will change the plugin so that picofeed is not managed by composer, in effect freezing the code locally. I can then make a change to avoid the problem with the content that you found.

There are a few forks of picofeed, the one that you referenced and also https://github.com/nicolus/picoFeed
I’ll take a closer look at those to see whether it is worth switching, but I’m reasonably happy to continue using the current version of picofeed so long as no other problems occur.

Thanks for the update. That sounds like a very sensible approach: it is not as if the RSS and Atom specifications are in continuous flux…

@hansdezwart There is a new release of the plugin that lets you select whether to use the special processing for YouTube videos and PDF files. The default is not to do that. You can update the plugin on the Manage Plugins page.

The new config settings are explained on the plugin’s documentation page https://resources.phplist.com/plugin/rssfeed#configuration


Thanks! I’ve updated the plugin. I’ll let you know if I run into any problems, but don’t expect them…
