
Join task with unexpected behavior



I wanted to attach an example of a left outer join that forces the added variables to None—take a look at AMDA004 in particular—but I couldn't do it because the file is 400 MB.

How can I transfer it to you?

The behavior is inconsistent, but as you can imagine, in complex flows like the ones I work with, which involve dozens of joins, it can cause significant issues.

I desperately need your help to resolve this.

Thank you.

P.S. I’m using the latest release of Rulex Platform.


Hi,

If I understand correctly, you are wondering why, for example, there are empty values for the AMDA004 attribute in the results of the Join task.

That is due to the data contents of the two datasets (left and right), and how the task has been configured.

In particular, we can observe the configuration at these points:

[Screenshot: Join task configuration]

The "Left outer join" Join type will enforce the behavior explained in the documentation at https://doc.rulex.ai/docs/v12/factory/tasks/preprocessing/join.html

  • Left outer join: the final dataset includes all the records from the dataset on the left even if the join condition does not find matching records in the right dataset. If a row in the left-hand table does not match any row in the right-hand table the columns relative to the right-hand table will display empty cells.

This means that if IDCENTRO and IDANA in a left-hand row do not match IdCentro and IdAna, respectively, in any right-hand row, then "the columns relative to the right-hand table will display empty cells". AMDA004, which comes from the right dataset, will therefore contain empty cells for such rows.

Some rows do indeed fail to match between the two datasets on those attributes. For instance, notice how in the results AMDA004 is empty for IdCentro equal to 151 and IdAna equal to 2. If you inspect the right dataset and look for IDCENTRO equal to 151 and IDANA equal to 2, you will not find any rows.
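To make the rule above concrete, here is a minimal sketch of left outer join semantics in plain Python. It is not Rulex code and does not use any Rulex API; the function `left_outer_join` is hypothetical, the data is a toy example, and I am assuming the upper-case names (IDCENTRO, IDANA) belong to the left dataset. Only the attribute names and the key (151, 2) come from this thread.

```python
# Toy data: only the column names and the key (151, 2) come from the thread;
# every other value is made up for illustration.
left_rows = [
    {"IDCENTRO": 150, "IDANA": 1},
    {"IDCENTRO": 151, "IDANA": 2},
    {"IDCENTRO": 152, "IDANA": 3},
]
right_rows = [
    {"IdCentro": 150, "IdAna": 1, "AMDA004": "x"},
    {"IdCentro": 152, "IdAna": 3, "AMDA004": "y"},
]


def left_outer_join(left, right, left_keys, right_keys, right_cols):
    """Keep every left row; fill right-hand columns with None when no match exists."""
    # Index the right-hand rows by their key tuple.
    index = {}
    for row in right:
        key = tuple(row[k] for k in right_keys)
        index.setdefault(key, []).append(row)

    result = []
    for row in left:
        key = tuple(row[k] for k in left_keys)
        matches = index.get(key)
        if matches is None:
            # No matching right-hand row: right-hand columns stay empty.
            result.append({**row, **{c: None for c in right_cols}})
        else:
            for match in matches:
                result.append({**row, **{c: match[c] for c in right_cols}})
    return result


joined = left_outer_join(
    left_rows, right_rows,
    left_keys=["IDCENTRO", "IDANA"],
    right_keys=["IdCentro", "IdAna"],
    right_cols=["AMDA004"],
)

# Left-hand keys whose AMDA004 ended up empty because nothing matched.
unmatched = [(r["IDCENTRO"], r["IDANA"]) for r in joined if r["AMDA004"] is None]
print(unmatched)  # [(151, 2)]
```

All three left-hand rows survive the join, and the row with key (151, 2) has an empty AMDA004 simply because the right-hand data contains no such key.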

image.png

If you wish to obtain a different behavior depending on your needs, you could modify the Join type or other options accordingly. For more information on the Join task and its options, take a look at our documentation here: https://doc.rulex.ai/docs/v12/factory/tasks/preprocessing/join.html

Let us know whether this is the problem you were referring to and whether you have any other questions.



Thanks.

Your response made me realize that the issue is the disappearance of some rows that should be present in the left table. Specifically, 530 out of 1930 rows are missing.

The disappearance is due to an Import from task, which, when re-executed now, imports all 1980 rows. However, when executed in batch mode, 530 rows were missing...

Thanks again! I’m going to try to further investigate this problem.


However, I still haven't found any explanation for the malfunction of that Import from task, or of several other Import from tasks. When I recomputed them, they all worked correctly.

Going back through the whole calculation process, I've noticed that the left outer join option behaves differently from how it did in R4 when dealing with special cases, such as a key in the right table that is absent from the left table. Unfortunately, I haven't been able to pinpoint exactly what the difference is: I no longer have R4, so I can't run a comparative analysis on an individual join, only on the final result of a very complex flow (400 Data Manager tasks, 200 Join tasks, etc.).
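One way to narrow this down without R4 might be to check the key columns of each join's two inputs before looking at the join output. The sketch below is plain Python, not Rulex code; `key_diagnostics` is a hypothetical helper and the key values are made up. It reports the three special cases that usually matter in a standard left outer join: keys only on the left (their right-hand columns come out empty), keys only on the right (dropped from the result), and keys duplicated on the right (each matching left row is repeated).

```python
from collections import Counter


def key_diagnostics(left_keys, right_keys):
    """Summarise key mismatches between the two inputs of a join."""
    left_counts = Counter(left_keys)
    right_counts = Counter(right_keys)
    return {
        # Left rows that will get empty right-hand columns.
        "left_only": sorted(set(left_counts) - set(right_counts)),
        # Right rows that a left outer join drops from the result.
        "right_only": sorted(set(right_counts) - set(left_counts)),
        # Right keys that multiply the matching left rows.
        "duplicated_on_right": sorted(k for k, n in right_counts.items() if n > 1),
    }


# Made-up (IDCENTRO, IDANA) key tuples, for illustration only.
left_keys = [(150, 1), (151, 2), (152, 3)]
right_keys = [(150, 1), (150, 1), (152, 3), (999, 9)]

report = key_diagnostics(left_keys, right_keys)
for name, keys in report.items():
    print(name, keys)
```

Running a check like this on exported key columns for a handful of joins would show which special case each join actually hits, which may be easier than comparing only the final result of the whole flow.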


  • Federica Musante changed the title to Join task with unexpected behavior
