Thảo luận Thành viên:Laurent Bouvier

This page has been reorganised a bit… Thank you for taking that into account.

EditCountOptIn

Pronunciation (Open)

The entry grecize gives an oddly incomplete pronunciation. I wonder if this is a widespread issue? – Nguyễn Xuân Minh (thảo luận, đóng góp) 06:38, ngày 3 tháng 7 năm 2006 (UTC)

It seems to be quite widespread. That's the problem with free sources: they are not 100% reliable ...

Gazetteer (Open)

I've been looking at some databases we could use to turn this Wiktionary into a gazetteer: the US military has a very large database of placenames for the entire world (except for the US and Antarctica, which can be found at another US government agency). [1] This database provides placenames in correctly written Vietnamese, as well as coordinates, elevation, and type of placename (city, province, etc.). It has over 48,000 Vietnamese records and 5.7 million records for the whole world!

Other Wiktionaries, such as the Spanish Wikcionario, are including gazetteer entries already. Obviously including the entire database would ridiculously overshadow the dictionary side of Wiktionary, but I think including at least the Vietnamese placenames would be worthwhile, since I haven't seen any Vietnamese census data on the web so far – if we found that kind of data, we could make Wikipedia articles instead.

Using this database will require a lot of planning, since we have room to do a lot here, including generating dotmaps, extracting translations from Wikipedia interlingual links, and transliterating Chinese and Korean placenames. (Unfortunately, these databases only contain Romanized names for China and Korea, but we might be able to use a different database for the names, since these databases all contain FIPS codes.)

 – Nguyễn Xuân Minh (thảo luận, đóng góp) 08:04, ngày 24 tháng 7 năm 2006 (UTC)

I am not sure what you have in mind. Personally, I think that the names of cities and other places belong on Wikipedia rather than here, unless there is a philological aspect (history of the word, translations, etc.). Laurent Bouvier 23:20, ngày 24 tháng 7 năm 2006 (UTC)
That's what I have in mind. A lot of "substubs" that currently exist at Wikipedia really belong here, because they only give a short definition. Placing such articles here would encourage the addition of pronunciation, etymology, and translation data. Initially, gazetteer entries on Vietnam would contain a short definition, pronunciation (thanks to Trung's work), and (in some cases) translations based on interwiki links at Wikipedia. This could be a solution for the "articles" on non-notable Vietnamese localities that sometimes flood VfD.
As it turns out, at least one mirror of the FVDP has a "Hán-Việt→Việt" dictionary, which contains quite a few placename entries. For example, [2] gives the geographical location of an apparently notable temple in China, its name transliterated into Vietnamese. This entry also gives citations in classical literature. This dictionary also provides biographical entries that would be just long enough for Wikipedia articles, as well as short passages (quotations?) that might go in Wikiquote. The only concern I have is that, unlike the dictionaries we have been using, the Hán-Việt→Việt dictionary provides no copyright info, and I don't know if this is even Hồ Ngọc Đức's work.
 – Nguyễn Xuân Minh (thảo luận, đóng góp) 04:09, ngày 25 tháng 7 năm 2006 (UTC)
Actually, I had purposely removed those articles from my import list, as I considered them proper nouns to be moved to Wikipedia. I'll check it out. Laurent Bouvier 06:20, ngày 25 tháng 7 năm 2006 (UTC)
Proper nouns aren't the sole domain of Wikipedia: in fact, the introductions to many articles at the Vietnamese Wikipedia are so overburdened with translations or etymologies that it takes a while just to get past the first sentence. I think this issue could be resolved by moving to Wiktionary any linguistic information that isn't important for understanding the article's subject. – Nguyễn Xuân Minh (thảo luận, đóng góp) 06:59, ngày 25 tháng 7 năm 2006 (UTC)

Unihan

The Unicode Consortium hosts Unihan, a very large database of Chinese, Japanese, and Korean characters with a variety of information, including in many cases a transliteration into Vietnamese. (Note that these transliterations aren't actually translations, since very many of them map to archaic Vietnamese words used only in poetry and formal writing.) The English Wikipedia has already included this data; see for example. Now that we contain a large number of entries on Vietnamese, French, English, Russian, and Dutch words, it'd be nice to add CJK characters using this database. Unihan comes with a plaintext file that we can use. – Nguyễn Xuân Minh (thảo luận, đóng góp) 04:17, ngày 25 tháng 7 năm 2006 (UTC)
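As a rough illustration of how the plaintext file mentioned above could be read, here is a minimal Python sketch. It assumes the usual Unihan layout of one tab-separated record per line (code point, field name, value) and picks out only the kVietnamese field. The function name and the sample lines are hypothetical and only illustrate the format; this is not anyone's actual import script.

```python
def parse_unihan_vietnamese(lines):
    """Map each character to its list of kVietnamese readings.

    Assumes tab-separated records of the form
    "U+XXXX<TAB>field<TAB>value"; comment lines start with "#".
    """
    readings = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip anything that is not a three-column record
        codepoint, field, value = parts
        if field != "kVietnamese":
            continue  # ignore the technical fields we do not need
        char = chr(int(codepoint[2:], 16))  # "U+4E00" -> the character itself
        readings[char] = value.split()  # several readings are space-separated
    return readings


# Illustrative sample lines, not copied from the real file.
sample = [
    "# comment line",
    "U+4E00\tkMandarin\tyī",
    "U+4E00\tkVietnamese\tnhất",
]
vietnamese = parse_unihan_vietnamese(sample)
```

Filtering on a single field like this keeps the import limited to the transliterations, which is the part of the database the proposal is actually interested in.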

I have noticed this upload on the French and the English Wiktionary. From my point of view, it was more an opportunity to increase the number of entries than to provide real content. Adding these makes sense if the goal is to describe the word in Chinese or Japanese, but not to describe the character itself; otherwise, to be consistent, we would have to describe every letter of every alphabet in the Unicode standard. Laurent Bouvier 16:40, ngày 25 tháng 7 năm 2006 (UTC)
This would probably make more sense for the Vietnamese Wiktionary, however. I'm not so much interested in all the technical information provided as I'm interested in the transliteration into Hán-Việt (classical Vietnamese, if you will). These transliterations are more meaningful to Vietnamese speakers than Romanizations are to English or French speakers, because educated Vietnamese speakers know a great many of these transliterations (not the original characters), and they are still considered part of the Vietnamese language. I'm not advocating the addition of CJK entries as character descriptions but as word descriptions.
Alternatively, Hồ Ngọc Đức once remarked on the FVDP forum that the author of a Hán-Việt dictionary (with very useful transliterations and definitions) gave him the contents of that dictionary, but since it was in HTML form (not machine-readable), he wasn't able to use it for his project. Perhaps we can make something of it.
 – Nguyễn Xuân Minh (thảo luận, đóng góp) 17:36, ngày 25 tháng 7 năm 2006 (UTC)
Can someone contact the owner of this database to verify that we are allowed to use it and that its terms are compatible with the GFDL? Laurent Bouvier 11:57, 16 tháng 8 2006 (UTC)
The author of this dictionary (it isn't actually a database) was one of the people whom Trung contacted earlier this month. Đặng Thế Kiệt has expressed some degree of interest in our project, but would like to wait about six months and see how things progress before working with us. Trung described one of Kiệt's responses here. Apparently that's "the most positive response" we've gotten from these e-mails. – Nguyễn Xuân Minh (thảo luận, đóng góp) 22:29, 17 tháng 8 2006 (UTC)


Japanese→Vietnamese dictionary

The FVDP mirror at Nhatban.net has a Japanese-to-Vietnamese (Nhật-Việt) dictionary that claims to have "more than 175,000 entries with definitions". [3] In fact, the useful entries seem to start at entry 17798, with the earlier entries being Japanese-to-English entries and Gregorian-to-Japanese year conversions. But the actual Japanese-to-Vietnamese entries do seem worthwhile. Note that some of these entries are listed under their Romanizations, though most seem to be listed under their proper Japanese transcriptions. Definitions are separated by slashes (/). To complicate matters a bit, many entries are composed of an English translation, usually followed by the Vietnamese translation, and I've encountered some entries without any Vietnamese definitions. Still, the addition of Japanese entries to this wiki would be nice. Do you think this particular database would be worth looking at? [4] doesn't have any copyright information, so someone would have to contact the author. – Nguyễn Xuân Minh (thảo luận, đóng góp) 04:40, ngày 26 tháng 7 năm 2006 (UTC)
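To make the entry layout described above more concrete, here is a small Python sketch that splits a slash-separated definition string and keeps only the parts that look Vietnamese. The exact file format is not known here, so the input strings are hypothetical, and the Vietnamese check is a crude heuristic (it looks for diacritics or the letter đ), so it would miss Vietnamese words written without any diacritics.

```python
import unicodedata


def split_definitions(raw):
    """Split a definition string on slashes, dropping empty parts."""
    return [part.strip() for part in raw.split("/") if part.strip()]


def looks_vietnamese(text):
    """Heuristic: text with a combining diacritic or 'đ' is treated as Vietnamese."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "đ" in decomposed or any(unicodedata.combining(ch) for ch in decomposed)


def vietnamese_definitions(raw):
    """Return only the definitions that pass the Vietnamese heuristic."""
    return [d for d in split_definitions(raw) if looks_vietnamese(d)]


# Hypothetical entry body: an English gloss followed by a Vietnamese one.
example = "to extend / kéo dài / "
```

An entry for which `vietnamese_definitions` returns an empty list would be a candidate for skipping, which matches the observation that some entries carry no Vietnamese definition at all.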

My concern is also the license of this db ... (Compatibility with GFDL) Laurent Bouvier 05:34, ngày 26 tháng 7 năm 2006 (UTC)

Cool, it looks like this database also includes the Hán-Việt (Chinese-to-Vietnamese) dictionary I mentioned earlier. [5] – Nguyễn Xuân Minh (thảo luận, đóng góp) 04:42, ngày 26 tháng 7 năm 2006 (UTC)

I am sorry to tell you this, but as I speak a little Japanese, I can confirm that this database entry contains a Japanese word. Indeed, like the rest of Asia, the Japanese imported a lot of Chinese words and adapted the pronunciation. If you look at the entries following yours (延々 | 延々たる | 延いては | 延ばし板), they contain kanji as well as kana. Laurent Bouvier 05:34, ngày 26 tháng 7 năm 2006 (UTC)
By the way, Trần Thế Trung sent e-mails to the authors of four different online dictionaries (2 Japanese→Vietnamese, 2 Sino-Vietnamese→Vietnamese) last Friday, asking for permission to use their databases. He CC'd me, but I haven't heard from any of them – or from Trung – since then. Hopefully they're just behind on their e-mail, like I am. :^) – Nguyễn Xuân Minh (thảo luận, đóng góp) 03:46, 8 tháng 8 2006 (UTC)
This may be a good time to send a reminder ... Laurent Bouvier 11:58, 16 tháng 8 2006 (UTC)

Links

There seems to be a problem in Wikifying when a word starts with a capital letter. They're always linked per syllable then. David Da Vit 13:59, 31 tháng 8 2006 (UTC)

Any example? Laurent Bouvier 17:47, 31 tháng 8 2006 (UTC)
Well, I noticed that the words Cơ Đốc (Christ, Christian) and Cơ Đốc giáo (Christianity, Christian) are always linked as Đốc and Đốc giáo, even though we have an entry on Cơ Đốc giáo (don't have one on Cơ Đốc yet). – Nguyễn Xuân Minh (thảo luận, đóng góp) 19:34, 31 tháng 8 2006 (UTC)

For some reason all compound words with "ý" are only half-linked... Like "ý kiến" here. David Da Vit 14:23, 20 tháng 9 2006 (UTC)
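One way to avoid half-linked compounds like the ones reported above is to try the longest run of syllables first and fall back to shorter ones. The Python sketch below shows that idea only; it is not how the bot actually works, and the function name, the `titles` set, and the `max_syllables` limit are all hypothetical. It splits on whitespace, so punctuation handling is left out, and it normalizes everything to NFC as a precaution so that precomposed and decomposed Vietnamese letters compare equal.

```python
import unicodedata


def wikify(text, titles, max_syllables=4):
    """Link the longest known title starting at each syllable (greedy match)."""
    nfc = lambda s: unicodedata.normalize("NFC", s)
    known = {nfc(t) for t in titles}
    words = nfc(text).split()
    out = []
    i = 0
    while i < len(words):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_syllables, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in known:
                out.append("[[" + candidate + "]]")
                i += n
                break
        else:
            # No title starts here: keep the syllable as plain text.
            out.append(words[i])
            i += 1
    return " ".join(out)


# Hypothetical set of existing entry titles.
titles = {"Cơ Đốc giáo", "Đốc", "giáo", "ý kiến", "ý", "kiến"}
```

With this set, `wikify("Cơ Đốc giáo", titles)` links the whole compound instead of stopping at Đốc giáo, because the three-syllable candidate is checked before any shorter one.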


Duplicated interwiki links

Apparently in some cases, the interwiki links are duplicated, when PiedBot adds definitions. Example [6]. Trần Thế Trung 09:35, 29 tháng 9 2006 (UTC)

I have noticed that too; it happens when the bot does two updates for two different languages. In your example, it copied the interwiki links for the French entries as well as those for the English ones.

That's stupid. Laurent Bouvier 12:18, 29 tháng 9 2006 (UTC)
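A cleanup pass for the duplication described above could look roughly like the following Python sketch. It assumes interwiki links sit on their own lines in the form `[[code:title]]` with a two- or three-letter language code, which is a simplification (longer codes are not recognized), and the function name is hypothetical rather than part of PiedBot.

```python
import re

# Simplified pattern: a line consisting only of [[xx:Title]] or [[xxx:Title]].
INTERWIKI = re.compile(r"^\[\[([a-z]{2,3}(?:-[a-z]+)*):([^\]|]+)\]\]$")


def dedupe_interwiki(wikitext):
    """Drop repeated interwiki lines, keeping the first occurrence of each."""
    seen = set()
    kept = []
    for line in wikitext.split("\n"):
        match = INTERWIKI.match(line.strip())
        if match:
            key = (match.group(1), match.group(2))
            if key in seen:
                continue  # same language and title already kept
            seen.add(key)
        kept.append(line)
    return "\n".join(kept)


# Hypothetical page text after two updates copied the same links twice.
page = "==English==\ndefinition\n[[en:cat]]\n[[fr:cat]]\n[[en:cat]]\n[[fr:cat]]"
cleaned = dedupe_interwiki(page)
```

Running the same check before the bot appends links for a second language would avoid creating the duplicates in the first place.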

Improvement on automatic images

In some cases, the automatic addition of images from Commons works excellently. However, merely matching the image name, though simple, can fail completely in other cases. I would like to suggest another method: if a (say, English) word has a corresponding Wikipedia article, and the first image used in that article comes from Commons, then we can use this image for our entry, in the corresponding language section (in this case English). This may have a higher chance of success (but may not be as simple to implement?).

For example, the image I added to platyhelminth is from the corresponding English Wikipedia article. This example is not canonical: the file name comes from the code "image = Haeckel Platodes.jpg" inside the article source (not [[Image:Haeckel Platodes.jpg]]). Trần Thế Trung 14:23, 29 tháng 9 2006 (UTC)
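The two ways an image can appear in article source, as noted above, could be handled with something like this Python sketch. It returns whichever comes first in the text: an `[[Image:...]]` link or an `image = ...` template parameter. The function name and the list of file extensions are assumptions made for illustration, and checking that the file really lives on Commons is left out.

```python
import re

# [[Image:Name.jpg|...]] or [[File:Name.jpg|...]] links.
LINK = re.compile(r"\[\[(?:Image|File):([^\]|]+)", re.IGNORECASE)
# Template parameter lines such as "| image = Name.jpg".
PARAM = re.compile(
    r"^\s*\|?\s*image\s*=\s*(.+?\.(?:jpe?g|png|gif|svg))\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def first_image(source):
    """Return the file name of the first image in the source, or None."""
    candidates = []
    for pattern in (LINK, PARAM):
        match = pattern.search(source)
        if match:
            candidates.append((match.start(), match.group(1).strip()))
    if not candidates:
        return None
    return min(candidates)[1]  # earliest position in the text wins


# Hypothetical article source resembling the platyhelminth case.
article = (
    "{{Taxobox\n| image = Haeckel Platodes.jpg\n| regnum = Animalia\n}}\n"
    "Text [[Image:Other.png|thumb|caption]]"
)
```

For the sample above, `first_image(article)` picks the template parameter because it appears before the inline link.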

That's nearly what I have in place. I first check the Wiktionary in the language of the entry for a picture, and only if there is none do I search Commons. I have not copied all of Wikipedia locally onto my PC; that's why I did not use it. Laurent Bouvier 19:12, 29 tháng 9 2006 (UTC)
Instead of copying Wikipedia onto your computer, couldn't you use query.php? I'm not sure how good the performance on that is, but it lets you perform batch queries, so you can get the source text of multiple articles at a time. They have a command to list all the links or categories used on a page, but still no way to just list the images used – maybe a Bugzilla report is called for. – Nguyễn Xuân Minh (thảo luận, đóng góp) 20:04, 29 tháng 9 2006 (UTC)
I have written a special program for that. This is only a question of laziness. ;-) Laurent Bouvier 20:15, 29 tháng 9 2006 (UTC)

It's really tricky to add images directly from Wikipedia editions, because the captions often end up inaccurate or unhelpful. For example, động vật (animal), has a picture of a jellyfish from the Vietnamese Wikipedia. That's not a problem, except that the caption reads "animal". There are other examples, such as AIDS, which calls the symbolic red ribbon "AIDS". I could imagine a more confusing scenario: the entry on soldier might have a yellow ribbon, for example, and it would be labeled "military".

It would be more helpful if you could not only extract the image used but also the caption that comes with it. Otherwise, if it's too difficult to extract a good caption, you can always omit the thumb modifier and specify a size of 180px instead.

 – Nguyễn Xuân Minh (thảo luận, đóng góp) 04:09, 24 tháng 10 2006 (UTC)

That's tricky. I was checking the pictures, and as I am not a speaker, the only issues I found were the ones that Wikipedia's case insensitivity created during the picture import, such as a picture of England in the entry anh. Laurent Bouvier 14:58, 24 tháng 10 2006 (UTC)

End of automatic loads

I have to announce that the load of the Free Vietnamese Dictionary Project is completed. Therefore I will spend a little time here to fix a couple of issues ... and then go back to the French Wiktionary (to pass over the Vietnamese one again ;-) ). Laurent Bouvier 13:23, 21 tháng 10 2006 (UTC)

Thanks so much (again) for all the work you've done here... in just four months or so! We'll be toiling away here for some time, I suppose. Trung already came up with a list of things for his bot to do, and I'll look into the Unihan database to see if we can use any of the information there. But now that we've finished the big entry imports, we can focus on quality. Good luck! – Nguyễn Xuân Minh (thảo luận, đóng góp) 20:45, 21 tháng 10 2006 (UTC)
You are welcome. I remember that we already discussed importing data from the Unihan DB. I was thinking that a transfer from the English or the French Wiktionary could be sufficient. The French one in particular has a set of two templates, used in a special character section, which could easily be transferred. Now, the question is more one of relevance... Laurent Bouvier 13:00, 22 tháng 10 2006 (UTC)
I'm certainly not planning to import all the miscellaneous data that the English Wiktionary did; instead, I'd like to import the relevant information only. For example, most of the entries contain pronunciations or romanizations of the characters in Chinese, Japanese, or Korean; a lot of them contain character decompositions (radicals and stroke numbers) and lists of compound words; and quite a few offer translations into Vietnamese (under the "Vietnamese Quốc ngữ" section). As for all of the "Dictionary information" and "Technical information", I might just stuff that into an infobox or leave it out. That's similar to what the French Wiktionary has done, but entries like still look more like lists of codes rather than dictionary entries. If we end up importing these entries from the French Wiktionary, would we be using a bot or Đặc biệt:Import? A bot would be preferable, so that we can change the template names and reorganize the entries a bit. – Nguyễn Xuân Minh (thảo luận, đóng góp) 21:32, 22 tháng 10 2006 (UTC)
If you want to prepare a couple of examples based on the French entries or the Unihan DB, I can verify whether this can be automated or not. Laurent Bouvier 09:45, 23 tháng 10 2006 (UTC)

Russian declension

Looks like there's still one Russian declension template left to create: {{rus-noun-f-1d}}. Unfortunately I'm not an expert in declension. – Nguyễn Xuân Minh (thảo luận, đóng góp) 21:44, 25 tháng 11 2006 (UTC)

Norwegian copyright

Hi Laurent, glad to see you again. While I was compiling this page, I discovered that the Norwegian→Vietnamese dictionary is copyrighted, and that Hồ Ngọc Đức just converted it to a DICT file. [7] Did we ever get permission to use this database? If not, I'll talk to Trung about asking permission from Tran Ly San. Since they already incorporated the dictionary into Đức's project, they might be willing to license the work. Hopefully. (Otherwise, we'll have to do a mass deletion...) – Nguyễn Xuân Minh (thảo luận, đóng góp) 04:51, ngày 10 tháng 12 năm 2006 (UTC)

It sounds bad. I have not requested any special permission as it was part of the same package ... sounds sad. Laurent Bouvier 23:05, ngày 12 tháng 12 năm 2006 (UTC)
Okay, I've asked Trung to contact them about the database. His Wikipedia talk page says he's on a wikibreak, but he's been checking back periodically. If it comes to deleting the entries, we should get in touch with the developers, who can mass-delete the records from Wiktionary's database, rather than deleting all the entries ourselves. – Nguyễn Xuân Minh (thảo luận, đóng góp) 06:03, ngày 14 tháng 12 năm 2006 (UTC)
Any news? Laurent Bouvier 09:18, ngày 17 tháng 2 năm 2007 (UTC)
No, unfortunately Trung's been away since December, and I'm not confident enough with my Vietnamese to write to them. Hopefully Trung will come back soon; otherwise I'll have to try... – Nguyễn Xuân Minh (thảo luận, đóng góp) 20:06, ngày 17 tháng 2 năm 2007 (UTC)

Russian grammar mistakes

Although, as a native speaker of Russian, I appreciate the interest in Russian, I noticed quite a few mistakes in the Russian grammar. Sometimes I correct them. In any case, the entries seem to consistently ignore that endings in Russian nouns are not simply appended: sometimes the existing ending should be dropped before a new one is added. For example, the plural of скачка is not "скачкаи" but "скачки". The situation is similar with verbs. Russian grammar is complicated, and if you don't have good knowledge or good templates, perhaps you shouldn't use sample sentences with incorrect grammar or inflected forms? Not trying to be harsh. You're welcome to ask any questions at my account at English Wiktionary (User:Atitarev@English Wiktionary). --Anatoli (thảo luận) 04:11, ngày 25 tháng 6 năm 2013 (UTC)
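The point about dropping the old ending before adding the new one can be sketched in Python as follows. This covers only the regular nominative plural of nouns ending in -а or -я and applies the spelling rule that и, not ы, follows г, к, х, ж, ч, ш, щ. Stress shifts, irregular plurals, and neuter nouns in -мя are deliberately ignored, and the function name is hypothetical; it is not a substitute for proper declension templates.

```python
# Consonants after which the plural ending is spelled "и" instead of "ы".
I_AFTER = set("гкхжчшщ")


def nominative_plural(noun):
    """Regular nominative plural for nouns ending in -а or -я (sketch only)."""
    if noun.endswith("а"):
        stem = noun[:-1]  # drop the singular ending first
        if stem and stem[-1] in I_AFTER:
            return stem + "и"
        return stem + "ы"
    if noun.endswith("я"):
        return noun[:-1] + "и"
    raise ValueError("only nouns ending in -а or -я are handled in this sketch")


plural = nominative_plural("скачка")
```

Simply appending an ending to the dictionary form is what produces forms like "скачкаи"; replacing the ending on the stem gives "скачки".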