RegExp problem, remove iframe

Posts: 8
Joined: Tue May 19, 2020 9:58 am

RegExp problem, remove iframe

Post by guidoolijslager »


I am having trouble replacing text in a string with a regular expression.
The regular expression works elsewhere (see the enclosed picture); I found it on the Internet.
I can't get it to work in my script; it does nothing.
I have tried many variations.

dummyTekst = dummyTekst.replace(/(?:<iframe[^>]*)(?:(?:\/>)|(?:>.*?<\/iframe>))/g, "");

Below is an example of the text to be cleaned:

En het gaat steeds sneller. In de periode van 1996 tot 2015 is het areaal ruim 5% kleiner geworden, blijkt uit cijfers van het CBS. <iframe class=localfocusvisual" frameborder="0" style="width:100%;height:400px;overflow:hidden" src=""></iframe> Ruim de helft van die 125.000 ha voor de landbouw verloren hectares kreeg woningbouw als nieuwe bestemming. <iframe class="localfocusvisual" frameborder="0" style="width:100%;height:400px;overflow:hidden" src=""></iframe> Iets meer dan de helft van de landbouwgrond in 2018 was grasland.

What is wrong with my script? The iframe tags, together with the text inside them, have to be removed.

Thanks in advance,

Schermafbeelding 2020-05-20 om 10.31.58.png
Advanced member
Posts: 358
Joined: Mon Jun 12, 2017 8:48 pm
Location: Belgium

Re: RegExp problem, remove iframe

Post by Padawan »

Personally, I would have preferred to use this:
dummyTekst = dummyTekst.replace(/<iframe.+?><\/iframe>/g, "");

But unfortunately, Switch Regular expressions don't support the lazy quantifier (the "+?"), so it doesn't work. You can see what Switch regex supports in the documentation: ... sions.html

I have a very ugly solution which works in your test:
dummyTekst = dummyTekst.replace(/<iframe.{3,180}><\/iframe> ?/g, "");

Basically, I assume that the text between "<iframe" and "></iframe>" will always be between 3 and 180 characters long. If you can be sure that the iframe tags will always be more or less the same length, then this can be a workable solution.
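
A possible alternative that avoids both the lazy quantifier and the fixed length range is a negated character class: [^>]* matches any run of characters other than ">", so it stops at the end of the opening tag by itself. The sketch below is plain JavaScript checked against a standard engine, not against Switch's own regex engine, so treat it as something to try rather than a confirmed fix. The function name removeEmptyIframes and the shortened sample string are made up for illustration. It assumes, as in the sample text, that nothing sits between the opening tag and </iframe> and that no attribute value contains a ">".

```javascript
// Hypothetical helper: removes <iframe ...></iframe> pairs that have no content.
function removeEmptyIframes(text) {
    // [^>]* cannot run past the ">" that closes the opening tag,
    // so the match always ends at the first "></iframe>".
    return text.replace(/<iframe[^>]*><\/iframe> ?/g, "");
}

// Shortened sample modeled on the text in the question.
var dummyTekst = 'cijfers van het CBS. ' +
    '<iframe class="localfocusvisual" frameborder="0" src=""></iframe> ' +
    'Ruim de helft. ' +
    '<iframe class="localfocusvisual" src=""></iframe> ' +
    'Iets meer.';

var cleaned = removeEmptyIframes(dummyTekst);
// cleaned is "cijfers van het CBS. Ruim de helft. Iets meer."
```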
Posts: 8
Joined: Tue May 19, 2020 9:58 am

Re: RegExp problem, remove iframe

Post by guidoolijslager »

Hi Padawan,

thanks for your reply.
The solution worked for me. :)

Do you also have a way to quickly test a regular expression from within Switch?
Advanced member
Posts: 358
Joined: Mon Jun 12, 2017 8:48 pm
Location: Belgium

Re: RegExp problem, remove iframe

Post by Padawan »

I used a script to test it. I think that in your use case this is the only option.
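
Such a test script could look roughly like the sketch below. The helper name testPattern and the sample string are invented for illustration. Outside Switch it runs in any standard JavaScript engine, but such an engine may accept patterns that Switch rejects, so for a real check the same function would have to run inside a Switch script, with the report written out via s.log as in the other snippets in this thread.

```javascript
// Hypothetical helper: reports what a global pattern matches in a sample string.
function testPattern(pattern, sample) {
    // With the g flag, match() returns an array of all matches, or null if there are none.
    var matches = sample.match(pattern);
    return {
        matchCount: matches ? matches.length : 0,
        matches: matches || [],
        result: sample.replace(pattern, "")
    };
}

var report = testPattern(/<iframe.{3,180}><\/iframe> ?/g, 'a <iframe src=""></iframe> b');
// report.matchCount is 1 and report.result is "a b"
```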
Posts: 8
Joined: Tue May 19, 2020 9:58 am

Re: RegExp problem, remove iframe

Post by guidoolijslager »

New problem with regexp.
I want to filter out everything from <blockquote up to and including </blockquote>. See the text below.

The script below doesn't work:

theBody = theBody.replace(/<blockquote.{3,8000}><\/blockquote> ?/g, "");

The text contains about 7402 characters.

What is wrong with the script?

<blockquote class="instagram-media" data-instgrm-captioned data-instgrm-permalink=" ... gn=loading" data-instgrm-version="12" style=" background:#FFF; border:0; border-radius:3px; box-shadow:0 0 1px 0 rgba(0,0,0,0.5),0 1px 10px 0 rgba(0,0,0,0.15); margin: 1px; max-width:540px; min-width:326px; padding:0; width:99.375%; width:-webkit-calc(100% - 2px); width:calc(100% - 2px);"><div style="16px;"> <a href=" ... gn=loading" style=" background:#FFFFFF; line-height:0; padding:0 0; text-align:center; text-decoration:none; width:100%;" target="_blank"> <div style=" display: flex; flex-direction: row; align-items: center;"> <div style="background-color: #F4F4F4; border-radius: 50%; flex-grow: 0; height: 40px; margin-right: 14px; width: 40px;"></div> <div style="display: flex; flex-direction: column; flex-grow: 1; justify-content: center;"> <div style=" background-color: #F4F4F4; border-radius: 4px; flex-grow: 0; height: 14px; margin-bottom: 6px; width: 100px;"></div> <div style=" background-color: #F4F4F4; border-radius: 4px; flex-grow: 0; height: 14px; width: 60px;"></div></div></div><div style="padding: 19% 0;"></div> <div style="block; height:50px; margin:0 auto 12px; width:50px;"><svg width="50px" height="50px" viewBox="0 0 60 60" version="1.1" xmlns="" xmlns:xlink=""><g stroke="none" stroke-width="1" fill="none" fill-rule="evenodd"><g transform="translate(-511.000000, -20.000000)" fill="#000000"><g><path d="M556.869,30.41 C554.814,30.41 553.148,32.076 553.148,34.131 C553.148,36.186 554.814,37.852 556.869,37.852 C558.924,37.852 560.59,36.186 560.59,34.131 C560.59,32.076 558.924,30.41 556.869,30.41 M541,60.657 C535.114,60.657 530.342,55.887 530.342,50 C530.342,44.114 535.114,39.342 541,39.342 C546.887,39.342 551.658,44.114 551.658,50 C551.658,55.887 546.887,60.657 541,60.657 M541,33.886 C532.1,33.886 524.886,41.1 524.886,50 C524.886,58.899 532.1,66.113 541,66.113 C549.9,66.113 557.115,58.899 557.115,50 C557.115,41.1 549.9,33.886 541,33.886 M565.378,62.101 C565.244,65.022 
564.756,66.606 564.346,67.663 C563.803,69.06 563.154,70.057 562.106,71.106 C561.058,72.155 560.06,72.803 558.662,73.347 C557.607,73.757 556.021,74.244 553.102,74.378 C549.944,74.521 548.997,74.552 541,74.552 C533.003,74.552 532.056,74.521 528.898,74.378 C525.979,74.244 524.393,73.757 523.338,73.347 C521.94,72.803 520.942,72.155 519.894,71.106 C518.846,70.057 518.197,69.06 517.654,67.663 C517.244,66.606 516.755,65.022 516.623,62.101 C516.479,58.943 516.448,57.996 516.448,50 C516.448,42.003 516.479,41.056 516.623,37.899 C516.755,34.978 517.244,33.391 517.654,32.338 C518.197,30.938 518.846,29.942 519.894,28.894 C520.942,27.846 521.94,27.196 523.338,26.654 C524.393,26.244 525.979,25.756 528.898,25.623 C532.057,25.479 533.004,25.448 541,25.448 C548.997,25.448 549.943,25.479 553.102,25.623 C556.021,25.756 557.607,26.244 558.662,26.654 C560.06,27.196 561.058,27.846 562.106,28.894 C563.154,29.942 563.803,30.938 564.346,32.338 C564.756,33.391 565.244,34.978 565.378,37.899 C565.522,41.056 565.552,42.003 565.552,50 C565.552,57.996 565.522,58.943 565.378,62.101 M570.82,37.631 C570.674,34.438 570.167,32.258 569.425,30.349 C568.659,28.377 567.633,26.702 565.965,25.035 C564.297,23.368 562.623,22.342 560.652,21.575 C558.743,20.834 556.562,20.326 553.369,20.18 C550.169,20.033 549.148,20 541,20 C532.853,20 531.831,20.033 528.631,20.18 C525.438,20.326 523.257,20.834 521.349,21.575 C519.376,22.342 517.703,23.368 516.035,25.035 C514.368,26.702 513.342,28.377 512.574,30.349 C511.834,32.258 511.326,34.438 511.181,37.631 C511.035,40.831 511,41.851 511,50 C511,58.147 511.035,59.17 511.181,62.369 C511.326,65.562 511.834,67.743 512.574,69.651 C513.342,71.625 514.368,73.296 516.035,74.965 C517.703,76.634 519.376,77.658 521.349,78.425 C523.257,79.167 525.438,79.673 528.631,79.82 C531.831,79.965 532.853,80.001 541,80.001 C549.148,80.001 550.169,79.965 553.369,79.82 C556.562,79.673 558.743,79.167 560.652,78.425 C562.623,77.658 564.297,76.634 565.965,74.965 C567.633,73.296 568.659,71.625 
569.425,69.651 C570.167,67.743 570.674,65.562 570.82,62.369 C570.966,59.17 571,58.147 571,50 C571,41.851 570.966,40.831 570.82,37.631"></path></g></g></g></svg></div><div style="padding-top: 8px;"> <div style=" color:#3897f0; font-family:Arial,sans-serif; font-size:14px; font-style:normal; font-weight:550; line-height:18px;"> Dit bericht bekijken op Instagram</div></div><div style="padding: 12.5% 0;"></div> <div style="display: flex; flex-direction: row; margin-bottom: 14px; align-items: center;"><div> <div style="background-color: #F4F4F4; border-radius: 50%; height: 12.5px; width: 12.5px; transform: translateX(0px) translateY(7px);"></div> <div style="background-color: #F4F4F4; height: 12.5px; transform: rotate(-45deg) translateX(3px) translateY(1px); width: 12.5px; flex-grow: 0; margin-right: 14px; margin-left: 2px;"></div> <div style="background-color: #F4F4F4; border-radius: 50%; height: 12.5px; width: 12.5px; transform: translateX(9px) translateY(-18px);"></div></div><div style="margin-left: 8px;"> <div style=" background-color: #F4F4F4; border-radius: 50%; flex-grow: 0; height: 20px; width: 20px;"></div> <div style=" width: 0; height: 0; border-top: 2px solid transparent; border-left: 6px solid #f4f4f4; border-bottom: 2px solid transparent; transform: translateX(16px) translateY(-4px) rotate(30deg)"></div></div><div style="margin-left: auto;"> <div style=" width: 0px; border-top: 8px solid #F4F4F4; border-right: 8px solid transparent; transform: translateY(16px);"></div> <div style=" background-color: #F4F4F4; flex-grow: 0; height: 12px; width: 16px; transform: translateY(-4px);"></div> <div style=" width: 0; height: 0; border-top: 8px solid #F4F4F4; border-left: 8px solid transparent; transform: translateY(-4px) translateX(8px);"></div></div></div></a> <p style=" margin:8px 0 0 0; padding:0 4px;"> <a href=" ... 
gn=loading" style=" color:#000; font-family:Arial,sans-serif; font-size:14px; font-style:normal; font-weight:normal; line-height:17px; text-decoration:none; word-wrap:break-word;" target="_blank">Ook @lidlnederland sluit zich aan bij onze Benefrietactie! #benefrietjes _ Vanaf vandaag in de winkel: Benefrietjes! Omdat wij onze aardappeltelers steunen sluiten we ons aan bij de Benefrietactie. We verkopen nu heerlijke frietaardappelen die eigenlijk voor de horeca waren bestemd! 2,5 kilo voor 0.99! Zo dragen ook wij bij aan minder verspilling. Kijk op voor meer informatie.</a></p> <p style=" color:#c9c8cd; font-family:Arial,sans-serif; font-size:14px; line-height:17px; margin-bottom:0; margin-top:8px; overflow:hidden; padding:8px 0 7px; text-align:center; text-overflow:ellipsis; white-space:nowrap;">Een bericht gedeeld door <a href=" ... gn=loading" style=" color:#c9c8cd; font-family:Arial,sans-serif; font-size:14px; font-style:normal; font-weight:normal; line-height:17px;" target="_blank"> Benefrietjes</a> (@benefrietjes) op <time style=" font-family:Arial,sans-serif; font-size:14px; line-height:17px;" datetime="2020-06-22T16:21:48+00:00">22 Jun 2020 om 9:21 (PDT)</time></p></div></blockquote>
Posts: 79
Joined: Sun Nov 25, 2012 12:15 pm

Re: RegExp problem, remove iframe

Post by patej »

Hmm.. What's the result you get in Switch? I can't check right now myself, but looking at the code, one problem is that '.{3,8000}' is greedy: it matches as many characters as it can, including </blockquote> itself, so the match will run on to a later </blockquote> if the character count allows.
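
To illustrate the greedy behaviour, here is a small sketch in plain JavaScript with made-up sample strings (not run in Switch). It also shows a second thing that might be worth checking: in a standard JavaScript engine the dot does not match line breaks, so a blockquote that spans several lines is not matched at all. Whether either effect is what happens in Switch is an assumption.

```javascript
var pattern = /<blockquote.{3,8000}><\/blockquote> ?/g;

// Two blockquotes on one line with text ("B") in between.
var sample = 'A<blockquote class="x"><p>1</p></blockquote>B' +
             '<blockquote class="y"><p>2</p></blockquote>C';
var greedyResult = sample.replace(pattern, "");
// greedyResult is "AC": the match runs from the first opening tag
// to the last closing tag, so the "B" in between is removed as well.

// One blockquote whose content contains a line break.
var multiLine = 'A<blockquote class="x"><p>line1\nline2</p></blockquote>B';
var multiLineResult = multiLine.replace(pattern, "");
// multiLineResult equals multiLine: "." stops at the line break, so nothing matches.
```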

Unfortunately Switch scripts don't support lookahead (unless you can use nodejs with Switch 2020...). You could try to get around that if you have a character that doesn't exist in the text anywhere, e.g. 'ø':

theBody = theBody.replace(/<blockquote/g,'ø').replace(/<\/blockquote> ?/g,'ø').replace(/ø[^ø]{3,8000}ø/g, "");
So here I first replace blockquote tags with ø and then replace everything between the two ø's by looking for "all characters except ø".
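
For reference, the same idea written as a small function, as a sketch only. The name removeBetweenTags is invented, and two things differ from the one-liner above: it first checks that the placeholder really is absent from the text, and it uses * instead of {3,8000} so that no length limit is involved. It has been checked in a standard JavaScript engine, not in Switch, so the behaviour there may differ.

```javascript
// Hypothetical helper: removes everything from "<tagName" up to and including "</tagName>".
function removeBetweenTags(text, tagName, placeholder) {
    // The trick only works if the placeholder does not occur in the text already.
    if (text.indexOf(placeholder) !== -1) {
        throw new Error("placeholder already occurs in the text");
    }
    var open = new RegExp("<" + tagName, "g");
    var close = new RegExp("</" + tagName + "> ?", "g");
    // placeholder, then any run of non-placeholder characters, then placeholder
    var between = new RegExp(placeholder + "[^" + placeholder + "]*" + placeholder, "g");
    return text.replace(open, placeholder)
               .replace(close, placeholder)
               .replace(between, "");
}

var theBody = 'A<blockquote class="x"><p>1</p></blockquote>B' +
              '<blockquote class="y"><p>2</p></blockquote>C';
var stripped = removeBetweenTags(theBody, "blockquote", "ø");
// stripped is "ABC"
```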
Posts: 8
Joined: Tue May 19, 2020 9:58 am

Re: RegExp problem, remove iframe

Post by guidoolijslager »

The code didn't work.
It did replace the first and last blockquote tags, but after that nothing happened.
See below.

The first blockquote starts after the text:

Aardappeltelers dienen klacht in tegen reclame Lidl Een slogan van supermarktketen Lidl, die claimt met een actie aardappeltelers te steunen tijdens de coronacrisis, is bij aardappeltelers slecht gevallen. Ze wijzen erop dat de algehele inkoopprijzen van aardappelen door de coronacrisis onder de kostprijs liggen, dus dat van echte steun geen sprake is. De Producenten Organisatie Consumptieaardappelen (POC), met zo’n honderd telers als leden, heeft een klacht ingediend bij de Reclame Code Commissie omdat de advertentie misleidend zou zijn

Aardappeltelers dienen klacht in tegen reclame Lidl Een slogan van supermarktketen Lidl, die claimt met een actie aardappeltelers te steunen tijdens de coronacrisis, is bij aardappeltelers slecht gevallen. Ze wijzen erop dat de algehele inkoopprijzen van aardappelen door de coronacrisis onder de kostprijs liggen, dus dat van echte steun geen sprake is. De Producenten Organisatie Consumptieaardappelen (POC), met zo’n honderd telers als leden, heeft een klacht ingediend bij de Reclame Code Commissie omdat de advertentie misleidend zou zijnø class="instagram-media" data-instgrm-captioned data-instgrm-permalink=" ... gn=loading" data-instgrm-version="12" style=" background:#FFF; border:0; border-radius:3px; box-shadow:0 0 1px 0 rgba(0,0,0,0.5),0 1px 10px 0 rgba(0,0,0,0.15); margin: 1px; max-width:540px; min-width:326px; padding:0; width:99.375%; width:-webkit-calc(100% - 2px); width:calc(100% - 2px);"><div style="16px;"> <a href=" ... gn=loading" style=" background:#FFFFFF; line-height:0; padding:0 0; text-align:center; text-decoration:none; width:100%;" target="_blank"> <div style=" display: flex; flex-direction: row; align-items: center;"> <div style="background-color: #F4F4F4; border-radius: 50%; flex-grow: 0; height: 40px; margin-right: 14px; width: 40px;"></div> <div style="display: flex; flex-direction: column; flex-grow: 1; justify-content: center;"> <div style=" background-color: #F4F4F4; border-radius: 4px; flex-grow: 0; height: 14px; margin-bottom: 6px; width: 100px;"></div> <div style=" background-color: #F4F4F4; border-radius: 4px; flex-grow: 0; height: 14px; width: 60px;"></div></div></div><div style="padding: 19% 0;"></div> <div style="block; height:50px; margin:0 auto 12px; width:50px;"><svg width="50px" height="50px" viewBox="0 0 60 60" version="1.1" xmlns="" xmlns:xlink=""><g stroke="none" stroke-width="1" fill="none" fill-rule="evenodd"><g transform="translate(-511.000000, -20.000000)" fill="#000000"><g><path d="M556.869,30.41 C554.814,30.41 
553.148,32.076 553.148,34.131 C553.148,36.186 554.814,37.852 556.869,37.852 C558.924,37.852 560.59,36.186 560.59,34.131 C560.59,32.076 558.924,30.41 556.869,30.41 M541,60.657 C535.114,60.657 530.342,55.887 530.342,50 C530.342,44.114 535.114,39.342 541,39.342 C546.887,39.342 551.658,44.114 551.658,50 C551.658,55.887 546.887,60.657 541,60.657 M541,33.886 C532.1,33.886 524.886,41.1 524.886,50 C524.886,58.899 532.1,66.113 541,66.113 C549.9,66.113 557.115,58.899 557.115,50 C557.115,41.1 549.9,33.886 541,33.886 M565.378,62.101 C565.244,65.022 564.756,66.606 564.346,67.663 C563.803,69.06 563.154,70.057 562.106,71.106 C561.058,72.155 560.06,72.803 558.662,73.347 C557.607,73.757 556.021,74.244 553.102,74.378 C549.944,74.521 548.997,74.552 541,74.552 C533.003,74.552 532.056,74.521 528.898,74.378 C525.979,74.244 524.393,73.757 523.338,73.347 C521.94,72.803 520.942,72.155 519.894,71.106 C518.846,70.057 518.197,69.06 517.654,67.663 C517.244,66.606 516.755,65.022 516.623,62.101 C516.479,58.943 516.448,57.996 516.448,50 C516.448,42.003 516.479,41.056 516.623,37.899 C516.755,34.978 517.244,33.391 517.654,32.338 C518.197,30.938 518.846,29.942 519.894,28.894 C520.942,27.846 521.94,27.196 523.338,26.654 C524.393,26.244 525.979,25.756 528.898,25.623 C532.057,25.479 533.004,25.448 541,25.448 C548.997,25.448 549.943,25.479 553.102,25.623 C556.021,25.756 557.607,26.244 558.662,26.654 C560.06,27.196 561.058,27.846 562.106,28.894 C563.154,29.942 563.803,30.938 564.346,32.338 C564.756,33.391 565.244,34.978 565.378,37.899 C565.522,41.056 565.552,42.003 565.552,50 C565.552,57.996 565.522,58.943 565.378,62.101 M570.82,37.631 C570.674,34.438 570.167,32.258 569.425,30.349 C568.659,28.377 567.633,26.702 565.965,25.035 C564.297,23.368 562.623,22.342 560.652,21.575 C558.743,20.834 556.562,20.326 553.369,20.18 C550.169,20.033 549.148,20 541,20 C532.853,20 531.831,20.033 528.631,20.18 C525.438,20.326 523.257,20.834 521.349,21.575 C519.376,22.342 517.703,23.368 516.035,25.035 C514.368,26.702 
513.342,28.377 512.574,30.349 C511.834,32.258 511.326,34.438 511.181,37.631 C511.035,40.831 511,41.851 511,50 C511,58.147 511.035,59.17 511.181,62.369 C511.326,65.562 511.834,67.743 512.574,69.651 C513.342,71.625 514.368,73.296 516.035,74.965 C517.703,76.634 519.376,77.658 521.349,78.425 C523.257,79.167 525.438,79.673 528.631,79.82 C531.831,79.965 532.853,80.001 541,80.001 C549.148,80.001 550.169,79.965 553.369,79.82 C556.562,79.673 558.743,79.167 560.652,78.425 C562.623,77.658 564.297,76.634 565.965,74.965 C567.633,73.296 568.659,71.625 569.425,69.651 C570.167,67.743 570.674,65.562 570.82,62.369 C570.966,59.17 571,58.147 571,50 C571,41.851 570.966,40.831 570.82,37.631"></path></g></g></g></svg></div><div style="padding-top: 8px;"> <div style=" color:#3897f0; font-family:Arial,sans-serif; font-size:14px; font-style:normal; font-weight:550; line-height:18px;"> Dit bericht bekijken op Instagram</div></div><div style="padding: 12.5% 0;"></div> <div style="display: flex; flex-direction: row; margin-bottom: 14px; align-items: center;"><div> <div style="background-color: #F4F4F4; border-radius: 50%; height: 12.5px; width: 12.5px; transform: translateX(0px) translateY(7px);"></div> <div style="background-color: #F4F4F4; height: 12.5px; transform: rotate(-45deg) translateX(3px) translateY(1px); width: 12.5px; flex-grow: 0; margin-right: 14px; margin-left: 2px;"></div> <div style="background-color: #F4F4F4; border-radius: 50%; height: 12.5px; width: 12.5px; transform: translateX(9px) translateY(-18px);"></div></div><div style="margin-left: 8px;"> <div style=" background-color: #F4F4F4; border-radius: 50%; flex-grow: 0; height: 20px; width: 20px;"></div> <div style=" width: 0; height: 0; border-top: 2px solid transparent; border-left: 6px solid #f4f4f4; border-bottom: 2px solid transparent; transform: translateX(16px) translateY(-4px) rotate(30deg)"></div></div><div style="margin-left: auto;"> <div style=" width: 0px; border-top: 8px solid #F4F4F4; border-right: 8px solid 
transparent; transform: translateY(16px);"></div> <div style=" background-color: #F4F4F4; flex-grow: 0; height: 12px; width: 16px; transform: translateY(-4px);"></div> <div style=" width: 0; height: 0; border-top: 8px solid #F4F4F4; border-left: 8px solid transparent; transform: translateY(-4px) translateX(8px);"></div></div></div></a> <p style=" margin:8px 0 0 0; padding:0 4px;"> <a href=" ... gn=loading" style=" color:#000; font-family:Arial,sans-serif; font-size:14px; font-style:normal; font-weight:normal; line-height:17px; text-decoration:none; word-wrap:break-word;" target="_blank">Ook @lidlnederland sluit zich aan bij onze Benefrietactie! #benefrietjes _ Vanaf vandaag in de winkel: Benefrietjes! Omdat wij onze aardappeltelers steunen sluiten we ons aan bij de Benefrietactie. We verkopen nu heerlijke frietaardappelen die eigenlijk voor de horeca waren bestemd! 2,5 kilo voor 0.99! Zo dragen ook wij bij aan minder verspilling. Kijk op voor meer informatie.</a></p> <p style=" color:#c9c8cd; font-family:Arial,sans-serif; font-size:14px; line-height:17px; margin-bottom:0; margin-top:8px; overflow:hidden; padding:8px 0 7px; text-align:center; text-overflow:ellipsis; white-space:nowrap;">Een bericht gedeeld door <a href=" ... gn=loading" style=" color:#c9c8cd; font-family:Arial,sans-serif; font-size:14px; font-style:normal; font-weight:normal; line-height:17px;" target="_blank"> Benefrietjes</a> (@benefrietjes) op <time style=" font-family:Arial,sans-serif; font-size:14px; line-height:17px;" datetime="2020-06-22T16:21:48+00:00">22 Jun 2020 om 9:21 (PDT)</time></p></div>øBenefrietDe gewraakte claim stond onder andere op een folder van Lidl voor zogeheten benefriet. Dat zijn fritesaardappelen die eigenlijk bedoeld waren voor de horeca, maar door gedwongen sluitingen van cafés en restaurants onverkocht bleven. 
De aardappelen gaan voor 99 cent per 2,5 kilo over de toonbank.Volgens de POC worden op de markt voor rond de 19 cent per kilo fritesaardappelen ingekocht, wat onder de kostprijs ligt. Er zou pas van echte steun sprake zijn als Lidl per kilo 25 cent zou betalen, aldus de belangenvereniging.
Advanced member
Posts: 358
Joined: Mon Jun 12, 2017 8:48 pm
Location: Belgium

Re: RegExp problem, remove iframe

Post by Padawan »

I also couldn't get it to work using regular expressions. I found an alternative way:

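	// Split on "<blockquote", "<blockquote>" and "</blockquote>"
	var theBodySplitted = theBody.split(/<\/?blockquote>?/);
	var theResult = "";
	for (var i=0;i<theBodySplitted.length;i++) {
		// Even indexes hold the text outside the blockquotes; odd indexes hold their content
		if (i%2 == 0) {
			theResult = theResult + theBodySplitted[i];
		}
	}
	s.log(-1, theResult);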
Basically I split the text on the blockquote open and close tags and keep the parts which have even indexes in the splitted array.
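
The same approach wrapped in a function with a worked example, as a sketch. The name stripBlockquotes and the sample string are made up. It relies on two assumptions that hold for the sample text in this thread but are not guaranteed in general: blockquotes are never nested, and every opening tag has a closing tag. If either fails, the even/odd pattern shifts and the wrong parts are kept.

```javascript
// Hypothetical helper: keeps only the text outside <blockquote ...>...</blockquote>.
function stripBlockquotes(text) {
    // The split points are "<blockquote" (attributes stay in the following part),
    // "<blockquote>" and "</blockquote>".
    var parts = text.split(/<\/?blockquote>?/);
    var kept = [];
    // Parts alternate: outside, inside, outside, inside, ...
    for (var i = 0; i < parts.length; i += 2) {
        kept.push(parts[i]);
    }
    return kept.join("");
}

var body = 'A<blockquote class="x"><p>1</p></blockquote>B' +
           '<blockquote class="y">2</blockquote>C';
var withoutQuotes = stripBlockquotes(body);
// withoutQuotes is "ABC"
```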
Posts: 79
Joined: Sun Nov 25, 2012 12:15 pm

Re: RegExp problem, remove iframe

Post by patej »

Padawan wrote: Wed Jul 01, 2020 8:25 am I also couldn't get it to work using regular expressions. I found an alternative way:
Basically I split the text on the blockquote open and close tags and keep the parts which have even indexes in the splitted array.
Thanks @Padawan, that's much cleaner and easier. Somehow I just had my mind fixed on the idea that the solution had to use regexp :roll:
Posts: 8
Joined: Tue May 19, 2020 9:58 am

Re: RegExp problem, remove iframe

Post by guidoolijslager »

Thanks @Padawan and @patej for your assistance.
It works now :)

