How should I approach reverse engineering this text encoding?
So I'm trying to hack the translation from the PS4 version of a game into the Vita version. The script files were conveniently uncompressed, and I was able to drop them in and have it working without a hitch - great!
However, various other message files and quest summaries and the like are not so convenient.
Here's a comparison of two files in a hex editor:
At first I thought it was a simple byte-pair compression, but there's obviously more to it than that, because if you look at the places which correspond to "Let's go!" and "Let's do this!" they don't start with the same string of characters at all.
I uploaded the PS4/Vita versions of a file with rather more readable text:
https://www.dropbox.com/s/bw0nvexyi9ww2be/hm%20vita?dl=0
https://www.dropbox.com/s/sk74zadvndc8v9t/hm%20ps4?dl=0
Going through it and looking for common and recurring words, I found this:
Goblin Thief
´0Ö0ê0ó0·0ü0Õ0
B430 D630 EA30 F330 B730 FC30 D530
Goblin Thief Archer
´0Ö0ê0ó0·0ü0Õ0¢0ü0Á0ã0ü0
B430 D630 EA30 F330 B730 FC30 D530 A230 FC30 C130 E330 FC30
Ancient Grief
¨0ó0·0§0ó0È0°0ê0ü0Õ0
A830 F330 B730 A730 F330 C830 B030 EA30 FC30 D530
Grief Screamer
°0ê0ü0Õ0¹0¯0ê0ü0Þ0ü0
B030 EA30 FC30 D530 B930 AF30 EA30 FC30 DE30 FC30
Grief
EA30 FC30 D530
Thief
B730 FC30 D530
So you can see that FC30 D530 is "ief".
But then I look for more occurrences of "gr"
Deep Grudge
Ç0£0ü0×0°0é0Ã0¸0
C730 A330 FC30 D730 B030 E930 C330 B830
And you don't see the EA30 that starts off "Grief".
I have a feeling FC30 could be some kind of switch byte, either an upper case indication or possibly marking the use of some kind of lookup table? It's also interesting that all the lines which are just objectives/boss names have the --30 structure, but some of the descriptive passages don't seem to.
The additional problem, of course, is that the uncompressed English text from the PS4 version won't be a 100% perfect match for the text from the Vita version -- that's the whole point, after all! So even when I look at the name "Deep Grudge" and notice that I don't see anything which looks like it would correspond to the "Gr" from "Grief", I can't be certain that they didn't change the name in the PS4 version.
Does anyone have any suggestions on how I should be approaching this? If I'm right and there's actually some kind of compressed lookup table business going on, then it might be effectively impossible to reverse, right?
encodings
1
I can't help but notice Grief Screamer has an extra B030 at the start that isn't present in the Grief line
– corsiKa
Dec 10 '18 at 16:15
asked Dec 10 '18 at 12:10
Celandine Crane
1 Answer
The way you can learn this encoding is to study Japanese. :) From the first line of your text file diff, we can see that this on the left:
1131 3210 3130 3330 1021 0000 0000 0000 .12.1030.!......
Translates to this on the right:
1100 3100 3200 1000 3100 3000 3300 3000 ..1.2....1.0.3.0.
1000 01ff 0000 0000 0000 0000 0000 0000 .................
This is a very strong hint that the file on the left is using an 8-bit encoding and that the one on the right is using a 16-bit encoding. The digits are translated in a very straightforward way, but the exclamation point ("!") is 0x21 in ASCII or UTF-8, while the bytes 01 ff on the right are U+FF01, the Unicode "fullwidth exclamation mark", stored as little-endian UTF-16 (UTF-16LE, which is byte-for-byte the same as little-endian UCS-2 for characters in the Basic Multilingual Plane).
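One way to check this reading of the dump is a small Python sketch. The variable names are my own, and it only uses the ten characters visible in the rows quoted above, so treat it as an illustration rather than a tool for the whole file:

```python
import unicodedata

# First rows of the two hex dumps quoted above, as raw bytes.
left = bytes.fromhex("1131 3210 3130 3330 1021")
right = bytes.fromhex("1100 3100 3200 1000 3100 3000 3300 3000 1000 01ff")

# Read the right-hand bytes as little-endian 16-bit units.
units = [int.from_bytes(right[i:i + 2], "little") for i in range(0, len(right), 2)]

# All but the last unit are just the 8-bit values from the left, widened to 16 bits.
same_prefix = units[:-1] == list(left[:-1])

# The last unit differs: 0x21 on the left, U+FF01 on the right.
last_char = chr(units[-1])
print(same_prefix, hex(left[-1]), f"U+{units[-1]:04X}", unicodedata.name(last_char))
```

If the prefix comparison holds and the last character comes out as a fullwidth form, that supports the "8-bit on the left, UTF-16LE on the right" interpretation for these rows.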
So with the hint about encoding we can see that the hex you've identified as "Goblin Thief"
b430 d630 ea30 f330 b730 fc30 d530
is rendered as ゴブリンシーフ which is the Katakana representation of the English words "Goblin Thief". It's common for Japanese speakers to render foreign words into phonetic equivalents with Katakana. So taking it syllable-by-syllable we have:
ゴ go
ブ bu
リ ri
ン n
シ shi
ー (extended vowel)
フ fu
So if we say it aloud, "go bu ri n shi-i fu" sounds like "Goblin Thief" as pronounced by a native Japanese speaker. You can try Google Translate to experiment a bit more and to see and hear what this sounds like.
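The same decoding can be applied to the other byte strings from the question. The following is a minimal sketch, assuming the hex was copied from the editor in file order (low byte first); `decode_hex_dump` is a helper name I made up for illustration:

```python
import unicodedata

def decode_hex_dump(dump: str) -> str:
    """Decode hex-editor byte pairs such as 'B430 D630' as UTF-16LE text."""
    return bytes.fromhex(dump).decode("utf-16-le")

# Byte strings copied from the question, keyed by the English name they were matched to.
names = {
    "Goblin Thief": "B430 D630 EA30 F330 B730 FC30 D530",
    "Grief Screamer": "B030 EA30 FC30 D530 B930 AF30 EA30 FC30 DE30 FC30",
    "Deep Grudge": "C730 A330 FC30 D730 B030 E930 C330 B830",
}

for english, dump in names.items():
    print(english, "->", decode_hex_dump(dump))

# Look up the Unicode names of the two units the question read as "ief".
for ch in decode_hex_dump("FC30 D530"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

Seen this way, FC30 is not a switch byte but U+30FC, the long-vowel mark. "Grief" and "Grudge" both start with B030 (グ, "gu") and then diverge because the next syllable is リ ("ri") in one and ラ ("ra") in the other, which is why no shared "Gr" sequence shows up.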
Can you switch around your hex words to the correct endianness, 30b4 30d6 ..? That way they form correct Unicode values for Japanese.
– usr2564301
Dec 10 '18 at 14:21
1
As noted in the answer, these are Unicode values that just happen to be encoded as UCS-2, also known as UTF-16LE
– Edward
Dec 10 '18 at 14:27
4
Wait, this is the bloody Japanese file? That's odd. I need to dig through the package and see if I can track down the actual English one then... I know for a fact that some of the files I extracted are the English ones because they have actual plaintext in them, which is why substituting them worked. Plus I'm 99% certain the package I have is from the SCEA version of the game. Hmmm. Thanks ever so much, that explains a lot that was driving me nuts!
– Celandine Crane
Dec 10 '18 at 14:30
3
Follow up note for anyone who was curious: the English text is actually stored in a completely separate file that's unique to the PS4 version of the game and is presumably pulled from to override the Japanese strings which exist in the base files. God only knows if there's any way to convince the Vita build to read from there.
– Celandine Crane
Dec 10 '18 at 20:07
In other words, there is no 'special encoding' to reverse engineer...
– user202729
Dec 13 '18 at 8:01
answered Dec 10 '18 at 14:03
Edward