
Unknown unicode character in EPG title crashes scheduled rec

Posted: Sat Mar 23, 2019 10:50 pm
by Viewfinder
Within the last month, I have begun to notice a strange Unicode character in some BBC programmes; it can appear in either the EPG title or the description. If the character appears in the EPG title of a programme that is scheduled to record, the record task will attempt to start and then abort because of the unknown character. If the character appears only in the description, the recording is fine. It seems the recording fails only if you use the %E EPG task name variable in your record filename.

The character looks very similar to the Unicode box drawing character 251C ├, but thinner and shorter, and it is not 251D ┝ either; it is somewhere in between. I have tried to paste the unknown character into this post but it always appears as a strange square, probably because the browser doesn't know what it is. I will attach samples of the character as a text file.

I have searched around to see if anyone else has seen this but can't find any mentions. I have tried a different DVB app, TBSViewer, which also shows the same character, so I suspect it's something the BBC has introduced. Possibly some kind of protection?

The character seems to appear where an apostrophe should be, but not always. Some programmes contain apostrophes that show fine, so I'm not sure why some apostrophes are fine and some show as an unknown character.

The only workaround so far is to schedule the recording to start much earlier than the programme you want, so that it adopts the previous EPG event name...

Can't seem to upload a txt file attachment as extension txt is not allowed on this forum?

Re: Unknown unicode character in EPG title crashes scheduled rec

Posted: Sun Mar 24, 2019 9:47 am
by SmartDVB
Sounds strange. Are those BBC channels using the Freeview EPG (28.8E)? Perhaps that character could just be sifted out, if it's not some other Unicode conversion bug (I use UTF8 mostly).

Can't seem to upload a txt file attachment as extension txt is not allowed on this forum?


Modified the board settings to allow .txt uploads, so you can try again.

Re: Unknown unicode character in EPG title crashes scheduled rec

Posted: Fri Mar 29, 2019 11:02 pm
by Viewfinder
Thanks. I'm not sure it is 28.8E, as I'm using Free To Air (Freeview) TV and not satellite.

I searched for that odd character in EPG Search and it brought up many matching programmes containing it. It seems it's not just the BBC but other broadcasters too, although at first I thought it was BBC only. It also appeared in EPG titles or descriptions on Sony Movie Channel, PBS America and Sports Channel Network, just to name a few...

Thanks for updating the forum to allow txt files. I have attached a Unicode txt file containing the character in some titles and descriptions. Hope it shows up in the file.

I can't find an exact match in the ASCII / extended code tables so I'm not sure where it is coming from...
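One way to pin down an unknown character like this is to print its code point rather than rely on how a font draws it. The following is only a rough sketch in Python: the function name `describe_chars` and the sample title are made up for illustration, and the sample assumes the stray character is a control character sitting where the apostrophe should be (the real EPG bytes are not shown in this post).

```python
import unicodedata

def describe_chars(text):
    """List every character outside printable ASCII with its code point and Unicode name."""
    rows = []
    for ch in text:
        if not (0x20 <= ord(ch) <= 0x7E):
            # Control characters have no Unicode name, so supply a default
            # instead of letting unicodedata.name() raise ValueError.
            name = unicodedata.name(ch, "<unnamed control or unassigned>")
            rows.append((repr(ch), f"U+{ord(ch):04X}", name))
    return rows

# Hypothetical title for illustration only.
sample = "A Daughter\x19s Conviction"
for row in describe_chars(sample):
    print(row)
```

A character that really was the box drawing glyph would be reported as U+251C, so this kind of dump separates "looks like ├" from "is ├".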

Will try and find an upcoming event with it in and see what the logfile contains when it aborts.

Thanks.

Re: Unknown unicode character in EPG title crashes scheduled rec

Posted: Fri Mar 29, 2019 11:27 pm
by Viewfinder
EPG Search screenshot attached...

I've scheduled that programme on Sony Movie Channel tomorrow, so I will see what the logfile reveals when it aborts due to the event name anomaly.

Re: Unknown unicode character in EPG title crashes scheduled rec

Posted: Sat Mar 30, 2019 2:29 pm
by Viewfinder
Logfile extract from attempted schedule record / fail test of that Sony Movie channel programme A Daughters Conviction... Scheduler has 5 min pre padding so log file is from 9.05.

I thought there might have been a 0k recording file but no file was left behind in recording folder.

Re: Unknown unicode character in EPG title crashes scheduled rec

Posted: Tue Apr 02, 2019 10:16 pm
by SmartDVB
Logfile extract from attempted schedule record / fail test of that Sony Movie channel programme A Daughters Conviction... Scheduler has 5 min pre padding so log file is from 9.05.

I thought there might have been a 0k recording file but no file was left behind in recording folder.


Thanks for this. I managed to find the culprit. It seems to be an EOM (End of Medium) control character, 0x19 in UTF8. What it's doing there I don't know, so for now I'll just sift the character out of the recording name when recording, as the API fopen calls seem to fail with this EOM char (I can't really see it documented in the APIs though).
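As a rough illustration of the sifting idea (not the actual SmartDVB code, which isn't shown here), a filter along these lines removes control characters from a name before it is used as a file name. It is written in Python for brevity, and `sift_control_chars` is a hypothetical name. For what it's worth, the Windows file naming rules reserve characters with values 1 through 31, which would be consistent with fopen failing on 0x19, assuming a Windows file system.

```python
def sift_control_chars(name, replacement=""):
    """Replace C0 control characters (0x00-0x1F) and DEL (0x7F) in a file name component.

    Hypothetical helper for illustration; by default the characters are simply dropped.
    """
    return "".join(
        replacement if (ord(ch) < 0x20 or ord(ch) == 0x7F) else ch
        for ch in name
    )

# Hypothetical event title with 0x19 where the apostrophe should be.
title = "A Daughter\x19s Conviction"
print(sift_control_chars(title))
```

Filtering the whole control range rather than 0x19 alone also covers any other stray control byte that turns up in a title later.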

Re: Unknown unicode character in EPG title crashes scheduled rec

Posted: Wed Apr 03, 2019 11:55 am
by Viewfinder
Great, glad you found the cause, thank you.

Looking at more EPG event listings, it's also a strange coincidence that it only appears where an apostrophe / end quote would be, while in some other events the apostrophe appears fine.

In the attached, where a description would ordinarily use 'quotes' (not sure if it should be single ' or double "), sometimes the open quote is an up arrow character 0x18 and the close quote is that EOM character 0x19.
Then just to confuse us, other quotes appear fine!

For example: The Hangman of Lyon has the up arrow character 0x18 as its open quote and the EOM character 0x19 as its close quote. Then another event quotes 'diamond-horned sea-unicorn' perfectly fine with single apostrophes. Bizarre. It's not an issue if they only appear in the EPG event description and the event title is not affected. So far I've not seen the up arrow character in event titles, but perhaps it is another character to sift out just in case?

As always if you need me to test a fix, let me know. Thanks again.

Edit: According to StackOverflow curly quotes and apostrophes may be the cause. Also Converting curly quotes:

"You can get those values if you copy and paste a Microsoft Word document with smart quotes turned on... These "smart" single quote and "smart" double characters are being stored as hex 18, 19, 1C, 1D"

I wonder if someone behind the scenes at Freeview/FTA has recently started to copy and paste curly quotes into the DVB-T EPG... :?
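The hex values in that quote line up with the low bytes of the Unicode curly quotes: U+2018, U+2019, U+201C and U+201D end in 18, 19, 1C and 1D. So one possible explanation (a guess, not confirmed by anything in this thread) is that somewhere in the EPG chain the curly quotes are being cut down to a single byte. Under that assumption, a small Python sketch can map the stray bytes back to quotes instead of just dropping them. The table and function names are made up for illustration.

```python
# Assumption: each stray control byte is the low byte of a curly quote code point.
LOW_BYTE_TO_QUOTE = {
    0x18: "\u2018",  # left single quotation mark
    0x19: "\u2019",  # right single quotation mark / apostrophe
    0x1C: "\u201C",  # left double quotation mark
    0x1D: "\u201D",  # right double quotation mark
}

# Plain ASCII equivalents, which are safer in file names.
LOW_BYTE_TO_ASCII = {0x18: "'", 0x19: "'", 0x1C: '"', 0x1D: '"'}

def restore_quotes(text, ascii_only=False):
    """Map 0x18/0x19/0x1C/0x1D back to curly quotes, or to ASCII quotes if ascii_only."""
    table = LOW_BYTE_TO_ASCII if ascii_only else LOW_BYTE_TO_QUOTE
    return text.translate(table)

# The low byte of each curly quote matches the byte it is mapped from.
for byte, quote in LOW_BYTE_TO_QUOTE.items():
    print(f"0x{byte:02X} <- U+{ord(quote):04X}", (ord(quote) & 0xFF) == byte)

# Hypothetical title with 0x19 in place of the apostrophe.
print(restore_quotes("A Daughter\x19s Conviction", ascii_only=True))
```

For a recording file name, the ASCII single quote keeps the apostrophe readable. An ASCII double quote is not allowed in Windows file names, so the 1C/1D pair would still need sifting out there.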