StreetFerret




9/11/2020 2:55 pm  #1


Character encoding problem for international street names

Hi, thanks for creating this service!

There's a bug that I think may be caused by incorrect character encoding. In my municipality in Norway there are a lot of street names using the three extra vowels (æ, ø, å) in the Norwegian alphabet.

All the street names are correct in OpenStreetMap so I'm guessing that at some point when the names were imported from OSM something went wrong.

For example, instead of Dølihagen it says: D��lihagen
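One way output like this can arise, offered as an illustrative sketch rather than a diagnosis of StreetFerret's code: in UTF-8, ø is stored as two bytes, and a decoder that does not expect UTF-8 may replace each of those bytes with the replacement character U+FFFD (displayed as �). The Python snippet below reproduces the effect with an ASCII decoder; the choice of ASCII is an assumption made purely for the example.

```python
# Illustrative only: the actual decoding step inside StreetFerret is unknown.
name = "Dølihagen"

# UTF-8 stores 'ø' as the two bytes 0xC3 0xB8.
raw = name.encode("utf-8")

# Decoding those bytes as ASCII with errors="replace" turns each
# non-ASCII byte into U+FFFD, the replacement character.
garbled = raw.decode("ascii", errors="replace")

print(garbled)  # D��lihagen
```

Two replacement characters in place of a single letter are consistent with a two-byte UTF-8 sequence being read by something that did not treat it as UTF-8.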

The odd thing is that StreetFerret has registered my activity on some of the streets with this problem but not on others. But maybe that's a separate issue?
 

 

9/17/2020 12:48 pm  #2


Re: Character encoding problem for international street names

Hi - sorry for the delayed response.  Thanks for the report on character encoding.  I'll have to do some research to figure out why those aren't rendering properly.  In any case, that's strictly a display problem which should be solvable.

If there's a specific problem with not registering an activity, please send the details so I can take a look.

Thanks!

 

12/07/2020 10:18 am  #3


Re: Character encoding problem for international street names

Hi,
I found some strange duplicates: two entries for the same street, e.g. Agavägen in Lidingö, where one has the character encoding problem.
(I'm not allowed to post links yet, it seems, so I'll try to mask it.)
www-dot-streetferret-dot-com/sf/street?u=1739334&c=398035&s=20220460
And the second one is spelled correctly
www-dot-streetferret-dot-com/sf/street?u=1739334&c=398035&s=37092406

Looking at these two, the street is partly in a newly built district. So my guess is that the first entry has existed for a long time, with the old encoding problem. Then, when the new part was added to OpenStreetMap and StreetFerret, the encoding problem had been solved, so it registered correctly, but as a "new" street instead of an extension of the old one.
I can see several similar examples in Lidingö. Some of them seem to be streets I have recently updated in OSM, so the same thing probably happened there.
 

 

12/07/2020 6:37 pm  #4


Re: Character encoding problem for international street names

Hi - I did some work some time ago to fix how international characters were stored in the back end, so your theory of stale data may be correct.  Can you test for me -- for that city, press the "report a problem" link and select "report missing streets".  That will force a reload of that city's street data.  Once the city re-appears in your list, it should hopefully come back with the right encoding.  If not -- let me know and I'll dig deeper.

 

12/08/2020 7:44 am  #5


Re: Character encoding problem for international street names

Yes, I tested your fix, and it seems to have solved that problem: no more duplicates. But there's still something wrong with the encoding. It seems that problem was fixed earlier but has now reappeared. From my example:
Yesterday we had two names for this road:
The first bad one: 
Agav��gen
The second correct one
Agavägen
 
But now there is only one, with a new encoding error:
Agavägen
 
So it seems all the Swedish letters ÅÄÖ now get a new bad encoding.
Another example:
BÃ¥gvägen instead of Bågvägen
(Street ID 40558526)
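
The "Ã¥" in the example above is the typical signature of UTF-8 bytes being decoded as Latin-1 (or the similar Windows-1252). The short Python sketch below reproduces that pattern and shows the usual round-trip repair. It is only an illustration; which codecs are actually involved on the StreetFerret side is an assumption.

```python
# Illustrative only: the codecs used by StreetFerret are not known.
name = "Bågvägen"

# Encode correctly as UTF-8, then decode with the wrong codec (Latin-1).
# 'å' is the bytes 0xC3 0xA5, which Latin-1 reads as 'Ã' and '¥'.
mojibake = name.encode("utf-8").decode("latin-1")
print(mojibake)  # BÃ¥gvÃ¤gen

# If the text was garbled exactly once this way, reversing the two
# steps recovers the original string.
repaired = mojibake.encode("latin-1").decode("utf-8")
print(repaired)  # Bågvägen
```

The repair only works when every character survived the wrong decoding step; once bytes have been replaced by �, the original letters cannot be recovered.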
 
This is not a showstopper, but it's a bit annoying, especially since it seems to have worked for a while.

 

1/02/2021 1:04 pm  #6


Re: Character encoding problem for international street names

Hi - Thanks for the reports, I'm pretty sure the problem is now fixed for new data.  The system is currently re-loading about 15,000 cities which should take a few days.  Once that is complete, if you encounter any additional cases where the wrong character encoding is showing, please let me know.

 

1/03/2021 5:30 am  #7


Re: Character encoding problem for international street names

Hi! Yes, it looks really good now for my hometown Lidingö, but I was confused as to why the street count had gone down by one street. I had to do quite a lot of digging until I found the reason:
https://www.streetferret.com/sf/street?u=1739334&c=398035&s=42055425
This is actually two streets that you have combined into one. We have both Stamstigen and Stämstigen here, two streets some km apart, but it seems your code considers a and ä the same letter and thinks the names are identical (stam means tree trunk, stäm means voice, and stigen is a small road).

 

1/03/2021 10:39 am  #8


Re: Character encoding problem for international street names

Hmmmm, I've got specific code to do case-insensitive matching for street names, but apparently it goes a bit too far.

The library documentation says "Before keys are added to the map or compared to other existing keys, they are converted to all lowercase in a locale-independent fashion by using information from the Unicode data file", so clearly something has gone wrong there.  It does look like they're correctly labelled in OSM.
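
For illustration, here is a small Python sketch of the difference between a key that is only case-insensitive and one that also drops accents. It is not the StreetFerret code or the library quoted above; the function names `case_insensitive_key` and `accent_stripping_key` are hypothetical, and accent stripping is just one assumed way two such names could end up merged.

```python
import unicodedata

def case_insensitive_key(name: str) -> str:
    """Hypothetical key: ignores case but keeps 'a' and 'ä' distinct."""
    # NFC composes 'a' + combining diaeresis into the single letter 'ä';
    # casefold() then removes case differences.
    return unicodedata.normalize("NFC", name).casefold()

def accent_stripping_key(name: str) -> str:
    """Hypothetical key that goes too far: it also removes diacritics."""
    # NFKD splits 'ä' into 'a' plus a combining mark, which is then dropped.
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    ).casefold()

names = ["Stamstigen", "Stämstigen", "STÄMSTIGEN"]

strict = {case_insensitive_key(n) for n in names}
loose = {accent_stripping_key(n) for n in names}

print(sorted(strict))  # ['stamstigen', 'stämstigen']
print(sorted(loose))   # ['stamstigen']
```

With the first key the two streets stay separate while different capitalisations of the same name still match; with the second they collapse into one entry, which is the behaviour reported above.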

 

1/03/2021 11:14 am  #9


Re: Character encoding problem for international street names

Yes, I checked OSM, so that's correct...

 

1/03/2021 11:24 am  #10


Re: Character encoding problem for international street names

But actually, this might be a very uncommon problem... I’ve run both these streets, so if it’s very complicated to fix, don’t waste too much work. I can live with it😁

 
