Unicode characters, GTFO of my RSS!


cheech151337
03-04-2008, 02:14 PM
You can't use Unicode characters in UTF-8 encoding, only representations of them! This is the second time I've caught this; perhaps it's time to send out a company-wide memo or implement some checks?

Current offenders:
http://revision3.com/lilsuperstar/feed/quicktime-high-definition/
http://revision3.com/internetsuperstar/feed/quicktime-high-definition/

Validator:
http://feedvalidator.org/check.cgi?url=http%3A%2F%2Frevision3.com%2Flilsuperstar%2Ffeed%2Fquicktime-high-definition%2F

line 104, column 49: 'utf8' codec can't decode byte 0x80 in position 8127: unexpected code byte (maybe a high-bit character?)
You know, Iâ??ve been around the Internet block quite a few times, and ...


From the help page:

A common cause of this error is having a high-bit character (such as a curly quote or curly apostrophe) in your RSS feed. This can happen if you copy-and-paste a quote from another page that contains curly quotes. For maximum compatibility with readers, you should remove the invalid character or use a numeric entity equivalent.
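
The advice in that help text can be sketched in Python 3. This is only a rough illustration, not anything Revision3 actually runs: the function names are made up, and the Windows-1252 fallback is an assumption based on the curly-quote explanation (a pasted curly apostrophe is often the single byte 0x92 in that encoding).

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


def decode_feed_text(data: bytes) -> str:
    """Decode as UTF-8, falling back to Windows-1252 (assumed source encoding)."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode("cp1252", errors="replace")


def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


# A curly apostrophe pasted in as a single Windows-1252 byte (0x92).
pasted = b"I\x92ve been around the Internet block"
print(first_invalid_utf8_offset(pasted))              # 1
print(to_numeric_entities(decode_feed_text(pasted)))  # I&#8217;ve been around the Internet block
```

Running a check like this on each text field before it reaches the feed would catch the stray byte before the validator does.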

phatlip12
03-04-2008, 04:40 PM
Go outside.


;)

Thanks for the heads up.

Bani-Banan
03-04-2008, 09:23 PM
All I have to say is: WOW!

You really need to GTFO of my interwebz.
- You're cock blocking me.


I guess they're thankful.

cheech151337
03-31-2008, 05:51 PM
Here we go again:
http://revision3.com/internetsuperstar/feed/quicktime-high-definition/

Reference to undefined entity 'Atilde'.
Line: 70 Character: 47
Fat kids are out to prove something, and theyâ<99>re getting badly hurt in the process.

gimpbully
03-31-2008, 09:48 PM
Sorry Folks, This should now be fixed.

samureye
03-31-2008, 11:55 PM
That's what she said.

cheech151337
05-06-2008, 05:50 PM
This is getting old: http://revision3.com/internetsuperstar/feed/quicktime-high-definition/

"Who Needs a Movie? Fred and Sharon are a husband and wife duo who make videos. But I wouldnâ"

gimpbully
05-06-2008, 06:13 PM
Fixed. We are working on a solution to this problem, but for now please just post here if you notice crappy Unicode (I blame the producers *cough*)

cheech151337
07-19-2008, 12:05 AM
"Oh no not this guy again...." :)

http://revision3.com/winelibrarytv/feed/quicktime-high-definition/

An invalid character was found in text content.
Line: 64 Character: 124
Episode 500 features 3 Riojas and was taped on location before an amazing live audience at Crushpad in San Francisco&acirc

tokenuser
07-19-2008, 01:55 AM
Not sure if it's been fixed over the past 2 hours ... but no issues for me.

AND ... this gets moved to support where it belongs.

cheech151337
07-19-2008, 01:40 PM
Yep, it's fixed. I think this thread pre-dates the support forum.. :P

cheech151337
07-25-2008, 01:35 PM
Hey look, a non-Unicode error!

http://revision3.com/popsiren/feed/quicktime-high-definition?subshow=false


Whitespace is not allowed at this location.
Line: 213 Character: 101
<itunes:keywords>science, burn money, combustion, flammability, twitter, plurk, fringe, apes & androids, electro-pop, history, -gate, gate suffix, andrew keen, the great seduction</itunes:keywords>


I need this thread renamed: Whitespace (in improper locations) AND unicode GTFO of my RSS! :)

tokenuser
07-25-2008, 02:14 PM
Nope - it's reported as a whitespace error, but it is in fact a non delimited ampersand.

cheech151337
07-25-2008, 03:46 PM
I believe the word you are looking for is escaped; delimited just means separated. But in any case you are correct, it looks like the ampersand is the escape character for XML.

Edit: Just tested my hunch, double ampersand fixes the problem. They really should send out a company-wide email to all the producers that bullet points the dos and don'ts when writing in fields that will end up in the RSS feed.

darknessgp
07-25-2008, 04:02 PM
...
Edit: Just tested my hunch, double ampersand fixes the problem. They really should send out a company-wide email to all the producers that bullet points the dos and don'ts when writing in fields that will end up in the RSS feed.

Or just have an automated process that corrects issues like the one above before sticking it into the RSS.

gimpbully
07-25-2008, 07:22 PM
heh..
They really should send out a company-wide email to all the producers that bullet points the dos and don'ts when writing in fields that will end up in the RSS feed.

slonkak
07-25-2008, 09:01 PM
Edit: Just tested my hunch, double ampersand fixes the problem.

Actually, &amp; is the proper way to denote the ampersand. In XML, the "&" character marks the start of an entity reference. "amp" is the predefined entity for the literal "&" character, and ";" ends the reference. Therefore, if you want a literal ampersand in an XML document, you write "&" to start the reference, then "amp" to name the entity, then ";" to close it, and the parser reads the whole thing as "&".
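
To check this rather than take anyone's word for it, here is a small Python 3 sketch using only the standard library. The helper name `is_well_formed` is made up, and a plain `<keywords>` element stands in for `<itunes:keywords>` so the snippet does not need the iTunes namespace declaration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


keywords = "fringe, apes & androids, electro-pop"

raw = f"<keywords>{keywords}</keywords>"
doubled = f"<keywords>{keywords.replace('&', '&&')}</keywords>"
escaped = f"<keywords>{escape(keywords)}</keywords>"

print(is_well_formed(raw))        # False: bare "&" is not a valid reference
print(is_well_formed(doubled))    # False: "&&" is still not a valid reference
print(is_well_formed(escaped))    # True
print(ET.fromstring(escaped).text == keywords)  # True: "&amp;" round-trips to "&"
```

`xml.sax.saxutils.escape` also handles "<" and ">", so it is a reasonable default for any free-text field headed into a feed.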

cheech151337
07-26-2008, 09:45 PM
Attack of the ampersand again, looks like you rolled back the error.

http://revision3.com/popsiren/feed/quicktime-high-definition?subshow=false
Line: 213 Character: 101
<itunes:keywords>science, burn money, combustion, flammability, twitter, plurk, fringe, apes & androids, electro-pop, history, -gate, gate suffix, andrew keen, the great seduction</itunes:keywords>



I'm not sure about this one. IE7 is not giving me an error, which is what I usually use to confirm these are feed errors, but my Python script is puking, so I'll post anyway:

http://revision3.com/diggnation/feed/quicktime-high-definition/
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 132: ordinal not in range(128)

U+2019 is the right single quotation mark, aka ’
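
For anyone trying to reproduce that traceback: the original script was Python 2 (note the `u'...'` literal), so the following Python 3 sketch is only an approximation, and the sample sentence is borrowed from an earlier post in this thread. It assumes the script was pushing decoded text through the ASCII codec somewhere, which is what the error message suggests.

```python
text = "they\u2019re getting badly hurt"

# Encoding to ASCII fails because U+2019 is above code point 127.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    print(exc.reason)  # ordinal not in range(128)

# Encoding to UTF-8 works: the one character becomes three bytes.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes[4:7])  # b'\xe2\x80\x99'

# Reading those UTF-8 bytes back as Windows-1252 produces the familiar "â" junk.
mojibake = utf8_bytes.decode("cp1252")
print(mojibake)  # theyâ€™re getting badly hurt
```

If that assumption holds, the error says more about how the script writes its output than about the feed itself.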

gimpbully
07-26-2008, 09:58 PM
Sorry, should be fixed now. If people in this world would stop using MS-freaking-word, we'd all be a little bit happier, kinder and more functional.

cheech151337
07-26-2008, 10:05 PM
?subshow=false link for popSiren is still broken, and the diggnation one is still the same. Are you sure that ’ is allowed to be in there?

gimpbully
07-26-2008, 10:30 PM
http://revision3.com/popsiren/feed/quicktime-high-definition
you happened to catch a url from a code base we've since rolled back

as for the diggnation one... i have no way to reproduce your problem. Works fine in itunes and all browsers i point at it...


gimpbully
07-26-2008, 10:42 PM
http://feedvalidator.org/check.cgi?url=http%3A%2F%2Frevision3.com%2Fdiggnation%2Ffeed%2Fquicktime-high-definition%2F

perhaps you have a cached version?

cheech151337
07-26-2008, 10:55 PM
Nah, it's not cached (besides, my script has no caching feature).

And to prove it: http://feedvalidator.org/check.cgi?url=http%3A%2F%2Frevision3.com%2Fpopsiren%2Ffeed%2Fquicktime-high-definition%3Fsubshow%3Dfalse (notice the ampersand is still in action there)

As for the Diggnation one, I know it validates and stuff, but I don't know if it's right. I mean, is ’ not outside the 128 range, and thus should it be &rsquo;? I'm not sure where the blame goes: is the validator wrong, or the XML library I use? I think I need to pull the XML spec and look this one up manually.

Edit: Let me sleep on the Diggnation problem. It may be my fault; I will take a fresh look at it tomorrow.

Edit 2: On the popSiren one, the main feed is fixed; it's the non-daily feed, aka the ?subshow=false aka:
http://revision3.com/popsiren/feed/quicktime-high-definition?subshow=false vs
http://revision3.com/popsiren/feed/quicktime-high-definition/
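
On the Diggnation question, a quick experiment with Python 3's standard XML parser may help. XML 1.0 predefines only five named entities (amp, lt, gt, quot, apos), so &rsquo; is an HTML name rather than an XML one, while a raw U+2019 is legal as long as it is correctly encoded. The sketch below is a toy document, not the real feed, and the helper name `parses` is made up.

```python
import xml.etree.ElementTree as ET


def parses(document: bytes) -> bool:
    """Return True if the standard-library parser accepts the document."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


header = b'<?xml version="1.0" encoding="UTF-8"?>'

raw_utf8 = header + "<d>they\u2019re</d>".encode("utf-8")  # real UTF-8 bytes
numeric = header + b"<d>they&#8217;re</d>"                 # numeric reference
named = header + b"<d>they&rsquo;re</d>"                   # HTML entity name
cp1252 = header + b"<d>they\x92re</d>"                     # stray Windows-1252 byte

print(parses(raw_utf8))  # True
print(parses(numeric))   # True
print(parses(named))     # False: undefined entity
print(parses(cp1252))    # False: not valid UTF-8
```

So a feed containing a properly encoded ’ should validate, which would be consistent with the validator accepting the Diggnation feed.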

gimpbully
07-27-2008, 03:54 AM
Oh Oh, sorry, the popsiren weekly feed has been fixed. Dunno how that got put in there..


cheech151337
07-27-2008, 04:19 AM
ok gimp, take a look at this article, I think this goes along with your MS word problem, and it matches up with the unicode character I'm getting in the Diggnation feed: http://weblogs.asp.net/sbehera/archive/2006/02/28/439299.aspx

I don't know why it's validating, but I don't think it should.

BenKing
07-30-2008, 05:03 AM
Also, could you please get rid of the <p> </p> tags in the description fields? It's causing this to happen in iTunes.

http://www.grabup.com/uploads/df2af61df1ade8ad3e181483e67d331f.png

There's no real need for there to be <p> HTML tags surrounding the episode descriptions in the first place.
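
If it helps, stripping the tags before the description goes into the feed could look something like this in Python 3. It is a rough sketch with a made-up function name, it assumes the description is a single paragraph, and it leaves every other tag alone.

```python
import re

# Matches <p>, </p>, and <p ...attributes...>, but not <pre> or <param>.
P_TAG = re.compile(r"</?p\b[^>]*>", re.IGNORECASE)


def strip_paragraph_tags(description: str) -> str:
    """Remove paragraph tags and surrounding whitespace from a description."""
    return P_TAG.sub("", description).strip()


print(strip_paragraph_tags("<p>Episode 500 features 3 Riojas.</p>"))
# Episode 500 features 3 Riojas.
```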

gimpbully
07-30-2008, 08:04 PM
Thanks for finding the field, fixed.
ok gimp, take a look at this article, I think this goes along with your MS word problem, and it matches up with the unicode character I'm getting in the Diggnation feed: http://weblogs.asp.net/sbehera/archive/2006/02/28/439299.aspx

I don't know why it's validating, but I don't think it should.