Archive for the ‘1. Work’ Category

Upwork encourages obfuscatory verbiage

Monday, July 27th, 2015

Upwork, a major service that arranges contractual work, has just issued some advice for clients communicating with workers across language barriers:

You may be working with a freelancer for whom English is a second (or third!) language. Use clear and straightforward communication by avoiding metaphors and easy-to-understand vocabulary.


Testing Der Mundo

Monday, September 23rd, 2013

This blog and the website of which it is a part are in English, which is a minority language (as are all other languages in the world).

Various services are available that could translate it into other languages.

With this entry, I am testing one of those services. It is named “Der Mundo”.

Here’s how it works. I give you a special address for this website, and you visit the site using that address. Der Mundo intercepts your request and detects your preferred language on the basis of the information provided by your browser. Der Mundo then submits the page content to a translation service, gets the translated output, and sends it to you in lieu of the original page.
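To make the language-detection step concrete, here is a minimal Python sketch. It assumes that the "information provided by your browser" is the HTTP Accept-Language header, which this post does not actually confirm. The function names are hypothetical and are not part of Der Mundo.

```python
# Sketch only: assumes the browser's language preference arrives as an
# HTTP Accept-Language header, e.g. "fr-CH, fr;q=0.9, en;q=0.8".
# The function names are hypothetical, not Der Mundo's.

def preferred_language(accept_language, default="en"):
    """Return the highest-weighted language tag in an Accept-Language value."""
    candidates = []
    for position, item in enumerate(accept_language.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag or tag == "*":
            continue
        weight = 1.0  # a tag without an explicit q value has weight 1
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    weight = float(value)
                except ValueError:
                    weight = 0.0  # treat a malformed weight as "not acceptable"
        if weight > 0:
            # Sort key: highest weight first, then earliest position.
            candidates.append((-weight, position, tag))
    if not candidates:
        return default
    return min(candidates)[2]


def needs_translation(accept_language, page_language="en"):
    """True if the reader's preferred language differs from the page's."""
    preferred = preferred_language(accept_language, default=page_language)
    return preferred.split("-")[0] != page_language
```

Under these assumptions, a header such as "en-US,en;q=0.9" would leave the page untranslated, while "de;q=0.7, eo" would trigger a translation into Esperanto.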

Does it work? Let’s try it. The special address is

One problem, though: If your preferred language is English, then nothing should happen. In that case, to test Der Mundo, you should temporarily tell your browser or your operating system that your preferred language is something else.

Good riddance, Android USB Utilities

Sunday, August 19th, 2012

My switch from an iPhone to an Android smartphone was motivated in part by the fact that Android permits a device to act like a storage volume (e.g., a disk drive). Until a month ago, that worked, but it was awkward, requiring several steps with a setting named “USB Utilities” and the USB cable. As of the current version of the Android operating system, however, just connecting the device to a qualifying computer with the USB cable is enough. The computer recognizes the device, as it would recognize a USB disk or thumb drive. If the computer is a Macintosh, the user must first download and install the Android File Transfer application, but that’s an easy one-time prerequisite.

Another part of the change was the unification of the file systems. So, when you plug your device in, you see more directories to read from or write to than you did before.

I mention all this here because I was initially mystified by the change and looked all over for an answer before ultimately telephoning the helpful service folks at CREDO Mobile. So the information could use more exposure, and this posting is one small attempt to provide that.

Charging money for standards

Sunday, August 5th, 2012

The conflict is obvious: Some organizations want everybody to adopt standards, because this makes those organizations more effective. Nonetheless, after collaborating to define and publish standards, such organizations agree to prohibit anybody from getting a copy without paying money for it.

Sure, if I can get you to comply with my standard and also to pay me money to discover what my standard is, why not? But the world isn’t so docile. If I charge you money to see my standard, you are likely to decline the offer and thus ignore and violate the standard.

Seventeen years ago, experts were complaining about this conflictual practice of computing-standard creators. And they were only echoing an argument that has been made in the courts since the 19th century: It is in the public interest for the law to be known by all, so that it can be obeyed by all; therefore, the authors of texts that are incorporated into laws may not prohibit, or charge money for, the duplication of those texts. See Veeck v. Southern Building Code Congress International, Inc., for a 2002 statement of that argument.

Today the International Organization for Standardization (ISO) continues to finesse this issue by claiming, with no substantiation, that it can (1) limit the distribution of its standards by charging money for copies and enforcing copyrights while (2) “making sure standards are implemented as widely as possible”.

Nonsense. ISO’s conduct fits a different model: It’s a consortium (or perhaps cartel) of members who derive competitive advantage from their compliance with a standard, and are therefore willing to pay the price of access to (and, indeed, participation in the creation of) that standard. You don’t directly comply with the standard; instead, you buy standard-compliant products from consortium members and thereby enjoy the benefits of interoperability among those members’ products. You don’t need or want to know what the standard is or how it works.

Nonetheless, groups without any obvious competitive purposes (such as linguists who promote uniform language-documentation practices) continue to seek ISO sponsorship of their standards. Doesn’t this status, which prohibits the open publication of the standards, do them more harm than good?

Announcing PanLinx

Friday, June 1st, 2012

PanLinx is the latest experimental interface for PanLex. You can try it now at

In case you asked “What is PanLex?”, here’s a quick answer: It’s a database that aims to include every known translation from every word (or dictionary-type phrase) into any language in the world. We’re talking about potentially hundreds of millions of words and trillions of translations.

The PanLex project is sponsored by The Long Now Foundation in San Francisco. The project’s main activity is building the database, but incidentally we have created some interfaces to give people (and machines) access to it. Before PanLinx, the interfaces relied on forms to be completed by users (fill in a text field, click on a button, etc.). This meant that most of the data would be invisible to most search engines, since search engines generally follow links and don’t fill out forms. We decided to create a different, link-only interface that would allow search engines to navigate across the database and reach data about millions of words and their translations. In principle, then, if you entered some obscure word in a search engine, you might be taken to the PanLinx page about that word.

For example, if you entered “bangunan” in a search engine, the hits would include a page showing all of PanLex’s translations of that (Malay) word, because the search engine would have crawled the links from the main PanLinx page to its millions of subsidiary pages and indexed them all.

Millions of pages? Yes, roughly 18 million at present. But PanLinx isn’t really a collection of 18 million pages sitting on a disk drive. As systems go, it’s a very small system, with a home page containing about 260 links, plus a program (about 100 lines of code) that regenerates that home page periodically to incorporate additions to the database, plus another program (less than 200 lines of code) that creates a new momentary page (also containing about 260 links) whenever anybody clicks on any of those links, and so forth.
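Here is one way such a link-only tree could be generated on demand, as a rough Python sketch. Only the figure of about 260 links per page comes from this post. The partitioning scheme (splitting a sorted word list into contiguous ranges), the URL format, and all function names are assumptions for illustration and are not how PanLinx is necessarily built.

```python
# Sketch only: the 260-link fan-out is from the post; the range-splitting
# scheme, the href format, and the function names are hypothetical.
FAN_OUT = 260


def child_ranges(start, stop, fan_out=FAN_OUT):
    """Split the half-open index range [start, stop) into at most fan_out
    contiguous, nearly equal sub-ranges. A single-item range has no children."""
    size = stop - start
    if size <= 1:
        return []
    parts = min(fan_out, size)
    base, extra = divmod(size, parts)
    ranges = []
    lo = start
    for i in range(parts):
        hi = lo + base + (1 if i < extra else 0)
        ranges.append((lo, hi))
        lo = hi
    return ranges


def render_page(words, start, stop, fan_out=FAN_OUT):
    """Build the momentary page for [start, stop) as (label, href) pairs.
    Nothing is stored; each page is computed when its link is followed."""
    links = []
    for lo, hi in child_ranges(start, stop, fan_out):
        if hi - lo == 1:
            label = words[lo]
        else:
            label = f"{words[lo]} to {words[hi - 1]}"
        links.append((label, f"?start={lo}&stop={hi}"))
    return links


def hops_to_leaf(total, fan_out=FAN_OUT):
    """Number of link hops from the root page down to a single item."""
    hops = 0
    size = total
    while size > 1:
        size = -(-size // fan_out)  # ceiling division
        hops += 1
    return hops
```

The point of the sketch is that the whole tree is implied by one small function: the home page corresponds to render_page(words, 0, len(words)), and every other page exists only for the moment that somebody asks for it.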

Will search engines actually fall for this trick? Well, from our perspective, it isn’t a trick. PanLinx delivers real information about translations among millions of words in thousands of languages. The mission of search engines is to get people to the information that they want. We don’t know which search engines will crawl how far from the root to the leaves of the PanLinx tree, but 3 days after PanLinx went live, Google was already showing some hits 2 hops into the tree. Search engines are somewhat secretive about their rules. PanLinx gives us a platform to experiment with methods of making PanLex data findable through search engines. And, even though we built PanLinx primarily with search engines in mind, you are free to explore it yourself. If you have anything to report (such as “I converted PanLinx into a parlor game”), please comment below. Thanks.


PanLex joins Long Now Foundation

Monday, February 27th, 2012

Today’s announcement by The Long Now Foundation, headquartered in San Francisco, makes public the transfer of sponsorship of the PanLex project from Utilika Foundation to Long Now. There, PanLex will be working in partnership with The Rosetta Project, which curates a massive collection of documentation on the languages of the world. PanLex is creating a database that aims to document every known translation of every word in every language in the world. There are about half a billion translations in it so far.

Google Translate hits 64 with Esperanto

Saturday, February 25th, 2012

Google announced two days ago the addition of Esperanto as the 64th language served by Google Translate.

A quick test suggests that Esperanto is in some cases working a bit better than French, German, or Russian. Here’s a sentence from the home page of the PanLex project: “They dread a world in which only English, only Mandarin, or only Hindi has survived.” Here are the translations:

French: “Ils redoutent un monde dans lequel Hindi seulement l’anglais, seulement le mandarin, ou seulement a survécu.”

German: “Sie fürchten eine Welt, in der nur Englisch, nur Mandarin, Hindi oder nur überlebt hat.”

Russian: “Они боятся мира, в котором только на английском языке, только китайском, хинди или только выжила.”

Esperanto: “Ili timis la mondon en kiu nur angla, nur mandarena, aŭ nur hinda postvivis.”

Not perfect, but Esperanto seems to have escaped a weird parsing error that corrupts the others.

PanLex, Copyright, and Licensing

Friday, October 7th, 2011

PanLex is a compilation of lexical data and a set of procedures facilitating the interrogation and modification of the data.

In other locations I have commented on the issues of intellectual property that can arise from a project such as PanLex. These other comments include “PanLex as Intellectual Property”, “Source Citation in PanLex”, and a paragraph in my report to the 1 June 2011 meeting of the Utilika Foundation board of directors, where I wrote:

Intellectual-property claims impose some limits on the expansion of PanLex. The creators of some resources assert rights that, taken literally, would prohibit a person reading a resource from later even making use of what he or she had learned from it. Other resources are in the public domain. Between these extremes, many resources have been published subject to explicit or implicit copyright and various claims and restrictions, including various copyleft-type licenses and prohibitions of commercial use. The above-mentioned metadata that we record for resources used, or to be used, for PanLex include data on intellectual-property claims and permissions. In directing the PanLex project I take such claims into account, insofar as they appear to be understandable and enforceable, but, in most cases, I believe the owners of lexical resources could not prohibit the foundation from recording in PanLex information contained in those resources. This belief is based on the understanding that what we do with a resource is to record some of the facts asserted in it, in a novel (recoded, normalized, structured, interoperable) form. (In other words, PanLex doesn’t copy source X, but instead tells the world that some user of PanLex who has consulted source X claims that source X either states or implies that word Y is a translation of word Z.) In addition, I believe that PanLex typically advances the purposes of a contributing resource’s creator by making the facts contained in the resource more accessible and usable and referring users of those facts to the original resource for more detailed information. Until now, no claimant has asked us to remove facts based on a resource from PanLex. Some (e.g., LINCOM GmbH and SIL International) have expressly approved our use in PanLex of some or all of their data. 
However, some possessors of resources have demanded payment for providing easily processable versions for use in PanLex, and others have refused to provide such versions at all. The inclusion of funds for legal services in the 2012 budget reflects an assumption that intellectual-property issues, as well as contractual issues more generally, will likely become more complex as the PanLex project progresses.

One of the related issues is the protection of databases as compilations. A discussion of this issue by Daniel Tysver describes the competing originality and effort criteria for making a database copyrightable. Some compilers of databases in some jurisdictions have found copyright claims unenforceable because their databases were unoriginal. Telephone directories are the classic example.

Now there is litigation on another type of arguably unoriginal database: a collection of data on time zones that much of the world has come to depend on. There is much discussion on the merits of the claim. The outcome of this lawsuit may further clarify the limits of copyright protection on data like those in PanLex.