[personal profile] nancylebov
I was reading a discussion about C++'s failings and virtues (mostly the failings, but some people like it), and now I'm wondering-- why isn't there translation between computer languages?

I'm not a programmer, but from what I can gather my instincts for what can and can't be done by computers are fairly good.

In theory, all usable computer languages are Turing equivalent.

Afaik, the reason we don't have good machine translation for natural languages is that natural languages are highly and non-obviously contextual. Also, sometimes even authors aren't quite sure what they mean.

If all computer languages are logically equivalent (except, I suppose, for how deep they go into the hardware) and they aren't ambiguous, what's the problem?

Speaking of instinct, I don't know whether not very technical answers to my question are possible. The only thing I'm sure of is that if computer translation between computer languages were remotely feasible, it would have happened by now.

Date: 2008-09-29 03:14 pm (UTC)
From: [personal profile] sethg
Well, for any two sufficiently powerful languages LangA and LangB, it is certainly possible to write a compiler that translates any program in LangA to a program in LangB. (If nothing else, one could simply write an interpreter of LangA in LangB, and execute LangA programs on that interpreter.) And indeed sometimes this is done. The earliest C++ compilers, for example, originally translated C++ to C and then compiled the C.

The problems are

(1) Efficiency: an experienced LangA programmer will expect certain program constructions in LangA to provide certain costs and benefits in terms of execution time, memory usage, etc. LangB (if it's a decent high-level language) has its own constructions with its own complicated tradeoffs. It's unlikely that the author of the LangA-to-LangB translator can make sure that the LangA programmer's expectations are preserved across the translation.

(2) Readability: a program is not just for computers to execute, but also for humans to read. If I'm a LangA programmer in an office full of LangB programmers, and I write a beautiful LangA program and auto-translate it into LangB, it will probably look awful, and the LangB programmer who tries to fix one of my bugs by looking at the LangB version will be cursing me and all my relatives.
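As a rough illustration of both halves of this (translation is mechanically easy, the output is not pleasant to read), here is a minimal sketch. It assumes a made-up "LangA" that is nothing more than postfix arithmetic and uses Python as "LangB"; the function name `translate_rpn` and the sample variable names are invented for the example.

```python
def translate_rpn(source: str) -> str:
    """Translate a toy postfix ("LangA") expression into Python ("LangB") source."""
    stack = []
    for token in source.split():
        if token in {"+", "-", "*", "/"}:
            right = stack.pop()
            left = stack.pop()
            # Parenthesize everything: always correct, rarely pretty.
            stack.append(f"({left} {token} {right})")
        else:
            stack.append(token)  # a number or a variable name
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


translated = translate_rpn("price qty * 1 tax + *")
print(translated)  # ((price * qty) * (1 + tax))
print(eval(translated, {"price": 10, "qty": 3, "tax": 0.5}))  # 45.0
```

The translated text runs correctly, but a human would probably have written `price * qty * (1 + tax)`. Scale that gap up to a whole program and you get the readability problem.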

Date: 2008-09-29 03:17 pm (UTC)
From: [identity profile] nancylebov.livejournal.com
Thanks. I suspected that readability is a Hard Problem.

Date: 2008-10-01 04:49 pm (UTC)
From: [personal profile] mneme
That's basically it -- Programming languages may appear like engineering, but they're not -- they're natural languages with engineering results as a side effect. The value of good code is frequently that it lets you encode concepts, and then take those concepts and do things with them, building a program something like how you build an essay.

If one doesn't care about efficiency, translation language to language isn't that hard (usually), but the natural language meaning is lost -- and that's where the value in most computer languages comes from in the first place; the ability to express concepts and to understand the same concepts from reading the code.

Date: 2008-09-29 03:23 pm (UTC)
From: [identity profile] dr-zrfq.livejournal.com
Some translators actually *do* exist. They've existed for at least 15 years. None of them work exceptionally well.

One point to make from my theoretical CS background: some problems really aren't solvable... and just because a problem *is* solvable doesn't mean it is solvable in a reasonable amount of time. We do know that the translation problem is solvable, but it may be that *good* translations take too long to be worthwhile. (I don't know; I haven't looked into this.)

Part of the problem is that there's more than one way to skin a cat -- at the machine language level, there are many ways to achieve the same result, and those ways are different in both obvious and non-obvious details. Part of it is that computer languages, over time, tend to develop a limited "idiom" just as natural languages do, just not to the same degree... and translating idiomatic stuff is always difficult. With certain language pairs, part of it is that the underlying paradigm is different (LISP vs SNOBOL vs ALGOL) so the approach to handling an issue is necessarily different, even though they are all fundamentally equivalent deep down. There are almost certainly parts of it that I don't understand well enough to really identify, let alone put into words.

I suspect that part of it is just that the Universe is perverse. ;->

Date: 2008-09-29 06:12 pm (UTC)
From: [identity profile] dichroic.livejournal.com
Yeah, this. We used a converter to translate FORTRAN to C when I worked at the Air Force Research Lab in 1006, but it still had to be gone over by a human afterward.

Date: 2008-09-29 04:44 pm (UTC)
From: [personal profile] siderea
Choreographies are programs for human bodies to run; they compile down to something we call "dance". You are asking what amounts to, "if it all runs on human anatomy, why can't one teach a hip-hop class using only ballet terminology? Or a ballet class in terms of Irish step-dancing?"

Date: 2008-09-29 06:43 pm (UTC)
From: [identity profile] lpetrazickis.livejournal.com
1. It's not useful for computers. A compiler is basically a translator from a programming language into machine language. Translating from one language to another language first is inefficient because the second translation will have to deal with awkward, unnatural phrasing. This interferes with optimization.

2. It's not useful for learning. Manual translation is sufficient for sample programs if you want a different language as "pseudocode". Also, when actively writing code, the error messages of the original language are much more useful than the error messages of a different language you might know better, because you can actually act on the error messages of the original language.

3. It's not useful for editing. You can't translate C into Python, edit Python, and translate back into C any more than you can translate Russian into German, edit German, and translate back into Russian.

Date: 2008-09-29 06:59 pm (UTC)
From: [identity profile] nancylebov.livejournal.com
The thing I'm trying to get at is the question of what's hard about making comprehensible machine translation of computer programs.

Perhaps the high level answer is that you can't make a good translation of a computer program without understanding what the programmer was trying to do, so we're back at some of the problems of natural language.

Date: 2008-09-29 07:16 pm (UTC)
From: [identity profile] thaedeus.livejournal.com
There is also the problem of "closeness to the machine level."
Some languages don't have full access to memory and certain
areas of the "machine," while others do.

Java for instance runs in what they call a "sandbox," so it
can't accidentally erase files or do dangerous things on the
machine it is running on.

C and C++ allow direct pointer access to memory.

It would be very difficult to do things in Java that are "easy"
in C++ (like direct manipulation of memory using pointers).
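To make that concrete, here is a rough sketch of what a translator targeting a pointerless language might have to fall back on: emulating memory as one big byte array and treating "pointers" as plain integer offsets into it. The `FlatMemory` class and its method names are hypothetical, and Python stands in for the pointerless target.

```python
import struct


class FlatMemory:
    """Hypothetical stand-in for raw memory: addresses are indexes into a bytearray."""

    def __init__(self, size: int) -> None:
        self.cells = bytearray(size)

    def store_float(self, addr: int, value: float) -> None:
        # Roughly `*(float *)addr = value;` in C (4 bytes, little-endian).
        struct.pack_into("<f", self.cells, addr, value)

    def load_uint32(self, addr: int) -> int:
        # Roughly `*(uint32_t *)addr` in C: the same bytes, read as an integer.
        return struct.unpack_from("<I", self.cells, addr)[0]


mem = FlatMemory(16)
mem.store_float(4, 1.0)
bits = mem.load_uint32(4)
print(hex(bits))  # 0x3f800000, the IEEE 754 single-precision encoding of 1.0
```

It works, but every memory access in the original program turns into a call like this, which is neither fast nor pleasant to maintain.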

Some languages are close enough to each other that translation
is possible and useful. Some aren't.

Generally businesses have a large repository of code in
one language, with no time or resources to port it to
another language. And even if they could translate it
using a program, there are so many interdependencies that
human intervention would be probable.

Just because you have source code in C++ that you run through
a translator, doesn't mean you have source code to all of the
libraries it may be using. So when you are done, you have a
translated program that won't link or run....

And that is just part of the answer.....

Date: 2008-09-30 12:03 am (UTC)
From: [identity profile] lpetrazickis.livejournal.com
Variable names are hard. One language will let you do things using one variable that another requires two for. What will you call the second? var2 isn't helpful.

For example, swapping variable values in Python:
a, b = b, a

Compare C:
int var3 = a;
a = b;
b = var3;

That's a simple example. The name of "var3" doesn't need to be descriptive in this case.

But if you were going from procedural to functional, from logical to procedural, or even from routine-driven procedural to object-oriented procedural, you'd have to create entire data structures out of thin air. What would you call them?
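The naming problem shows up even for the swap. Below is a minimal sketch of how a mechanical translator might lower a Python tuple assignment into C-like statements. It is not how any particular tool does it; the function name `lower_tuple_assign` and the `tmpN` naming scheme are made up for the example, and it needs Python 3.9+ for `ast.unparse`.

```python
import ast
import itertools


def lower_tuple_assign(stmt: str, c_type: str = "int") -> list[str]:
    """Lower a Python tuple assignment like `a, b = b, a` into C-like lines."""
    node = ast.parse(stmt).body[0]
    if not (isinstance(node, ast.Assign) and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Tuple)
            and isinstance(node.value, ast.Tuple)
            and len(node.targets[0].elts) == len(node.value.elts)):
        raise ValueError("expected a tuple assignment like 'a, b = b, a'")

    # Invented names must not collide with names the program already uses.
    taken = {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}
    fresh = (f"tmp{i}" for i in itertools.count(1) if f"tmp{i}" not in taken)

    lines, temps = [], []
    # First copy every right-hand side into a temporary, then assign.
    for value in node.value.elts:
        temp = next(fresh)
        temps.append(temp)
        lines.append(f"{c_type} {temp} = {ast.unparse(value)};")
    for target, temp in zip(node.targets[0].elts, temps):
        lines.append(f"{ast.unparse(target)} = {temp};")
    return lines


for line in lower_tuple_assign("a, b = b, a"):
    print(line)
```

This prints `int tmp1 = b;`, `int tmp2 = a;`, `a = tmp1;`, `b = tmp2;`. It is correct, but it uses two meaningless temporaries where a person would use one, and the translator has no way of knowing a better name.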

Date: 2008-10-01 12:46 am (UTC)
From: [identity profile] henrytroup.livejournal.com
Microsoft has a Java to C# translator, iirc. Googling for "translate Java C#" turns up a fair number of translators.

A former employer implemented a "proprietary language to C" compiler - the output wasn't strikingly readable, but it compiled nicely. We used that to be agile across various flavours of Unix on different processors. Why bother writing a code generator and optimizer? Just use the one that's there.

In a deeper sense, every compiler translates a programming language into another - assembler/machine language. I've on occasion had need to dive into the generated assembler to diagnose compiler bugs - same employer as above, earlier generation of software. (In fact, that might have been part of the inspiration for the "compile to C" trick.)

At present, we don't really program in languages at all; we program in terms of frameworks (either .Net or J2EE in my world), and as other posters have said, that's a touch difficult. In the .Net world, there are an insane number of languages that compile to the same runtime, which is another assault on the same problem (as translation) from a different direction.

Long ago, the only definition of computer science I ever heard make any sense was something like "solving problems by the construction of a set of nested virtual machines." By this definition, translation is just substitution of one layer, and certainly should be possible.

However, anyone who's ever ported a decent sized system from one machine to another has a story or six about the ambiguities and gratuitous incompatibility that they had to deal with. Even porting from C++ (old cfront version) to C++ (ANSI standard) was hard, once upon a time.

My current code base generates a pile of warnings of the general nature of "X is deprecated, use Y instead". Sometimes that's easy; sometimes it's seriously hard. But it's never trivial, so we avoid doing it. When we're forced to, it's a pile of work to verify that nothing broke.

Date: 2008-10-05 06:22 pm (UTC)
From: (Anonymous)
I don't think anyone quite addressed the real point of the original question: if you're forced to use a lousy language, why not write in another language and translate?

It depends on why you're being forced. If you're forced to use C++, it's probably because it's part of a big project and you have coworkers who need to read the code, which won't be possible if it's been translated (or "cross compiled").

Cross compilation is a reasonable option if you can get everyone to agree that the original pre-translated version is the true version, and they know the language. But if each person works in a different language, which are all compiled to C++, everyone may need to learn all the languages. Plus there may be problems at the seams.

But the opposite situation can be solved by cross compiling: if you need many languages, it is often better to work in a single one and cross compile. E.g., code in a web browser must be in JavaScript; but it may need to interact with code on a server, which may be rented from someone who only allows Java. I believe there are many tools that allow you to write in one language, which is compiled to JavaScript for the browser and Java for the server. A similar product is Google Web Toolkit, which compiles from Java to JavaScript, with the purpose of using Java tools to debug the code.

Finally, the language constraint may be an arbitrary obstacle in a programming contest. I recently heard about a contest that announced the language at the last minute. It was consistently won by a team that had one person work on translation and the others on solving the problem in their favorite language.

Date: 2008-10-23 06:43 am (UTC)
From: [identity profile] dglenn.livejournal.com
In the 1980s, I spent a while working on translating a graphics library from Pascal to C (or maybe it was the other way around; I forget). Given the size of the project, I kept looking for parts that I could automate with the tools I had on hand -- which at that time mostly meant Word Perfect macros.[*]

It can be more do-able than some folks here make it sound (*ahem* when both languages are as closely related as C and Pascal -- which are both Algol-family languages), but I do agree that even then it's not as easy as it sounds at first blush.

And I too have encountered language-to-other-language compilers, producing varying degrees of readability in their output code -- those are generally not designed to have humans try to maintain the target code after translation, just to simplify a compiler-writer's life, so they're technically what you describe but not, I think, quite what you meant. (I ran into early C++ compilers that were front-ends to C compilers, and the first Pascal compiler I encountered, a third-party compiler on the HP3000, was actually a Pascal-to-SPL translator, because the compiler author didn't know enough about how to generate HP executables directly and SPL was another Algol-family language that came with the system.)

Many language idioms and special language features can be translated into macros or library calls. Others present special challenges. And when the target language has a very different structure and feature set (e.g., not having pointers, not having functions, not supporting recursion natively, allowing constants to be accidentally redefined, only passing arguments by value ... or only passing arguments by reference, not having an equivalent of 'goto'), things get really thorny, really quickly, even if you're not trying to produce human-maintainable code.
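To give a feel for the "no native recursion" case, here is a small sketch of the kind of rewrite a translator would have to do, with Python standing in for both languages. The function names are invented for the example, and a real translator would need a general scheme rather than this hand-picked one.

```python
def factorial_recursive(n: int) -> int:
    """The source program: short, and it leans on the language's call stack."""
    return 1 if n <= 1 else n * factorial_recursive(n - 1)


def factorial_explicit_stack(n: int) -> int:
    """What might be emitted for a target with no native recursion."""
    stack = []        # stands in for the call stack the target lacks
    while n > 1:      # "call" phase: record one pending multiplication per call
        stack.append(n)
        n -= 1
    result = 1        # the base case
    while stack:      # "return" phase: unwind the pending work
        result *= stack.pop()
    return result


print(factorial_recursive(5), factorial_explicit_stack(5))  # 120 120
```

Both compute the same thing, but the second no longer looks like the definition of factorial, and this is about the friendliest case there is.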

Even when a language's idioms can be easily 'unwound', if there are many that need to be handled as special cases instead of just "translating the vocabulary and grammar", the task of identifying all of them that the translator will need to handle can be daunting.


Translating C++ to readable, human-maintainable C would be, I think, feasible even if somewhat harder than just generating something that a C compiler will compile ... translating from Ada to ForTran, or from Lisp to COBOL, or from BASIC to Prolog, would be, er, ah, might be a project that never gets beyond being able to handle trivial programs on the level of "hello world" or a basic cash register. (Unless one resorts to the "write an interpreter for the source language, in the target language" solution already mentioned.)

Interestingly, abstracting a working program to pseudocode and then reimplementing it in another language, while not being something I can really imagine automating for most pairs of languages, feels rather different from 'translating' a program from one language to another, even though it's the same end result. This may provide a clue as to the difficulty of machine-translation, in that it points to what others here have said about 'meaning' and human abstraction (and readability), versus the mechanical equivalence of two sets of instructions.


[*] This was quite a while before I stumbled upon somebody's oh-so-clever set of #define directives that would make a C compiler compile Pascal programs!

Date: 2008-10-24 06:24 am (UTC)
From: [identity profile] nancylebov.livejournal.com
Thanks-- a lot that's interesting in there.

On the first pass, I thought you were talking about translating a program into inaccessible-to-humans code which would then be used elsewhere, but you were actually talking about something less risky.
