pisani Posted August 27
I wanted to attach an example of a left outer join that sets the added variables to None (take a look at AMDA004 in particular), but I couldn't because the file is 400 MB. How can I transfer it to you? The behavior is inconsistent, but as you can imagine, in complex flows like the ones I work with, which involve dozens of joins, it can cause significant issues. I desperately need your help to resolve this. Thank you. P.S. I'm using the latest release of Rulex Platform.
pisani (Author) Posted August 27
The title was meant to be "Join with abnormal behavior", but I couldn't prevent the translator from changing it. The same goes for "how can I transfer it to you", which was originally "how can I transfer the test file, which is 400 MB, to you?"
pisani (Author) Posted August 28
I've managed to reduce the size of the flow that highlights the issue: the variables are added, but their values have been set to None. Attached file: errorejoinleftouter.rfl
Davide Intiso Posted August 30
Hi,
If I understand correctly, you are wondering why, for example, there are empty values for the AMDA004 attribute in the results of the Join task. That is due to the contents of the two datasets (left and right) and to how the task has been configured. In particular, we can observe the configuration at these points:
The "Left outer join" Join type enforces the behavior explained in the documentation at https://doc.rulex.ai/docs/v12/factory/tasks/preprocessing/join.html:
"Left outer join: the final dataset includes all the records from the dataset on the left even if the join condition does not find matching records in the right dataset. If a row in the left-hand table does not match any row in the right-hand table the columns relative to the right-hand table will display empty cells."
This means that if IDCENTRO and IDANA do not match IdCentro and IdAna respectively, then "the columns relative to the right-hand table will display empty cells". AMDA004, which comes from the right dataset, will therefore contain empty cells for such rows.
Some rows do in fact fail to match between the two datasets on those attributes. For instance, notice that in the results AMDA004 is empty for IdCentro equal to 151 and IdAna equal to 2. If you inspect the right dataset and look for IDCENTRO equal to 151 and IDANA equal to 2, you will not find any rows.
If you need a different behavior, you can modify the Join type or the other options accordingly. For more information on the Join task and its options, take a look at our documentation here: https://doc.rulex.ai/docs/v12/factory/tasks/preprocessing/join.html
Let us know if this is the problem you were referring to and if you have any other doubts.
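The left outer join behavior described above can be illustrated with a small plain-Python sketch. This is not Rulex code: the function `left_outer_join` and the sample values (including the AMDA004 value "x") are made up for illustration, and only the attribute names and the 151/2 key come from the thread.

```python
def left_outer_join(left, right, left_keys, right_keys):
    """Left outer join of two lists of dicts (illustrative helper, not a Rulex API).

    Every left row is kept; right-hand columns are None when no right row matches.
    """
    # Index the right rows by their key tuple for fast lookup.
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in right_keys), []).append(row)

    # Collect the right-hand columns that are not join keys.
    right_cols = []
    for row in right:
        for col in row:
            if col not in right_keys and col not in right_cols:
                right_cols.append(col)

    result = []
    for lrow in left:
        key = tuple(lrow[k] for k in left_keys)
        matches = index.get(key)
        if matches:
            # One output row per matching right row.
            for rrow in matches:
                merged = dict(lrow)
                merged.update({c: rrow.get(c) for c in right_cols})
                result.append(merged)
        else:
            # No match: keep the left row, leave right-hand columns empty.
            merged = dict(lrow)
            merged.update({c: None for c in right_cols})
            result.append(merged)
    return result


# Made-up sample data: the right dataset has no row for (151, 2).
left = [
    {"IdCentro": 151, "IdAna": 1},
    {"IdCentro": 151, "IdAna": 2},
]
right = [
    {"IDCENTRO": 151, "IDANA": 1, "AMDA004": "x"},
]

joined = left_outer_join(left, right, ["IdCentro", "IdAna"], ["IDCENTRO", "IDANA"])
for row in joined:
    print(row)
```

In this sketch the row with IdCentro 151 and IdAna 2 is kept, but its AMDA004 is None because the right dataset contains no row with that key.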
pisani (Author) Posted August 30
Thanks. Your response made me realize that the real issue is the disappearance of some rows that should be present in the left table. Specifically, 530 out of 1930 rows are missing. The disappearance is due to an Import from task, which, when re-executed now, imports all 1980 rows. However, when it was executed in batch mode, 530 rows were missing... Thanks again! I'm going to investigate this problem further.
pisani (Author) Posted September 5
I still haven't found any explanation for the malfunctioning of that Import from task and of several other Import from tasks, though when I recomputed them they all worked correctly. Going back through the whole calculation process, I've noticed that the left outer join option behaves differently than it did in R4 when dealing with special cases, such as a key on the right table that is absent from the left table. Unfortunately, I haven't been able to pinpoint exactly what the difference is, because I no longer have R4 and can't do a comparative analysis on the individual join, only on the final result of a very complex flow (400 Data Manager tasks, 200 Join tasks, etc.).
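One way to narrow down special cases like the one mentioned above is to list the keys that appear in only one of the two tables before joining. The sketch below is plain Python with made-up data and a hypothetical helper `key_differences`; it does not reproduce Rulex or R4 behavior. Under standard relational semantics, a left outer join drops keys that exist only on the right and keeps keys that exist only on the left, with empty right-hand columns.

```python
def key_differences(left, right, left_keys, right_keys):
    """Return (left_only, right_only) key tuples (illustrative helper, not a Rulex API)."""
    left_seen = {tuple(row[k] for k in left_keys) for row in left}
    right_seen = {tuple(row[k] for k in right_keys) for row in right}
    # Sorted lists make the result stable and easy to compare between runs.
    return sorted(left_seen - right_seen), sorted(right_seen - left_seen)


# Made-up sample data: (151, 2) exists only on the left, (152, 7) only on the right.
left = [
    {"IdCentro": 151, "IdAna": 1},
    {"IdCentro": 151, "IdAna": 2},
]
right = [
    {"IDCENTRO": 151, "IDANA": 1, "AMDA004": "x"},
    {"IDCENTRO": 152, "IDANA": 7, "AMDA004": "y"},
]

left_only, right_only = key_differences(
    left, right, ["IdCentro", "IdAna"], ["IDCENTRO", "IDANA"]
)
print("keys only in left: ", left_only)
print("keys only in right:", right_only)
```

Running a check of this kind on the inputs of a single Join task, for example on exported copies of the two tables, would show which rows are expected to come out with empty cells and which right-hand rows are expected to be absent from the result.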