Forums: Forum of Decimal BASIC (Thread #40672)

unicode string from a file have extra character (2019-04-22 18:16 by eros #82868)

Hi,
First, thanks for this nice and very useful programming language.
I am using Decimal BASIC 8.0.1.6.
Suppose a text file is saved with Notepad using UTF-8 encoding.
Reading that text with the following program adds an extra character at position 1.
This does not happen if the string value is assigned inside the code.
It happens whatever the language of the text in the file.

REM OPTION CHARACTER MULTIBYTE
OPEN #1: NAME "test.txt"
INPUT #1: words$
REM LET words$ = "ABC DEF"
for i=1 TO LEN(words$)
PRINT i; words$(i:i) ; ORD(words$(i:i))
NEXT i
CLOSE #1
PRINT "string length = "; LEN(words$)
END

The output looks like this:
1  65279
2 A 65
3 B 66
4 C 67
5 32
6 D 68
7 E 69
8 F 70
string length = 8

With BASICAcc2, which uses Lazarus, additional characters are also added at the end.
The output looks like this:
1  65279
2 A 65
3 B 66
4 C 67
5 32
6 D 68
7 E 69
8 F 70
9 0
10 0
string length = 10

Using OPTION CHARACTER MULTIBYTE does not seem to change this behavior.

Best Regards

Re: unicode string from a file have extra character (2019-04-23 12:59 by Shiraishi Kazuo #82874)

Notepad adds an extra character called a BOM (byte order mark) at the beginning of the text.
Decimal BASIC 8.0 and BASICAcc 1.2 cannot handle the BOM, and it interferes with correct execution.

The following program exposes the BOM.

OPTION CHARACTER BYTE
OPEN #1: NAME "test.txt"
INPUT #1: words$
for i=1 TO LEN(words$)
PRINT ORD(words$(i:i))
NEXT i
CLOSE #1
PRINT "string length = "; LEN(words$)
END
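
For reference, a UTF-8 BOM is the three bytes 239, 187, 191 (hex EF BB BF), which decode to the single character U+FEFF, i.e. the 65279 shown in the first post. If the file cannot be saved without a BOM, one possible workaround is to drop that character in the program after reading. This is only an untested sketch. It assumes the default character mode (no OPTION CHARACTER BYTE), in which ORD reports the BOM as 65279 as in the output above, and it reuses the file name and variable from the first post.

```basic
REM Sketch: remove a leading BOM (character code 65279) after reading.
REM Assumes the default character mode, not OPTION CHARACTER BYTE.
OPEN #1: NAME "test.txt"
INPUT #1: words$
CLOSE #1
REM Check the length first so ORD is never applied to an empty string.
IF LEN(words$) > 0 THEN
   IF ORD(words$(1:1)) = 65279 THEN LET words$ = words$(2:LEN(words$))
END IF
PRINT words$
PRINT "string length = "; LEN(words$)
END
```

Only the first line of the file can carry a BOM, so the check is needed once, right after the first INPUT. It does not address the trailing 0 characters reported for BASICAcc2.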

Re: unicode string from a file have extra character (2019-04-23 17:55 by eros #82878)

Reply To Message #82874
> Notepad adds an extra character called a BOM (byte order mark) at the beginning of the text.
> Decimal BASIC 8.0 and BASICAcc 1.2 cannot handle the BOM, and it interferes with correct execution.
>
> The following program exposes the BOM.
>
> OPTION CHARACTER BYTE
> OPEN #1: NAME "test.txt"
> INPUT #1: words$
> for i=1 TO LEN(words$)
> PRINT ORD(words$(i:i))
> NEXT i
> CLOSE #1
> PRINT "string length = "; LEN(words$)
> END

Thank you for the info about the BOM.
I searched for a replacement for the native Windows Notepad and found editors that save plain text files as UTF-8 by default without adding a BOM. Another editor lets you choose whether or not to add a BOM. So the problem is solved.
Best Regards