Skip to content

Fixing bug setting index when parsing Google Vertex AI results #117287

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Nov 22, 2024

Conversation

ymao1
Copy link
Contributor

@ymao1 ymao1 commented Nov 21, 2024

For the Google Vertex AI rerank API, for an infer request with the following example inputs:

{
  "input": [
    "A canvas stretched across the day,\nWhere sunlight learns to dance and play.\nBlue, a hue of scattered light,\nA gentle whisper, soft and bright.",
    "The sky appears blue due to a phenomenon called Rayleigh scattering. Sunlight is comprised of all the colors of the rainbow. Blue light has shorter wavelengths than other colors, and is thus scattered more easily."
  ]
}

We create a request that looks like this, following the API specs

"records": [
   {
       "id": "1",
       "content": "A canvas stretched across the day,\nWhere sunlight learns to dance and play.\nBlue, a hue of scattered light,\nA gentle whisper, soft and bright."
   },
   {
       "id": "2",
       "content": "The sky appears blue due to a phenomenon called Rayleigh scattering. Sunlight is comprised of all the colors of the rainbow. Blue light has shorter wavelengths than other colors, and is thus scattered more easily."
   }
]

Note that for the ID, we use the index of the entry in the input array

for (int recordId = 0; recordId < inputs.size(); recordId++) {
builder.startObject();
{
builder.field(ID_FIELD, String.valueOf(recordId));
builder.field(CONTENT_FIELD, inputs.get(recordId));
}

The API response is an array of records, sorted in descending order by score, for example

{
    "records": [
        {
            "id": "2",
            "score": 0.98,
            "content": "The sky appears blue due to a phenomenon called Rayleigh scattering. Sunlight is comprised of all the colors of the rainbow. Blue light has shorter wavelengths than other colors, and is thus scattered more easily."
        },
        {
            "id": "1",
            "score": 0.64,
            "content": "A canvas stretched across the day,\nWhere sunlight learns to dance and play.\nBlue, a hue of scattered light,\nA gentle whisper, soft and bright."
        }
    ]
}

In our response parser, we were not using the record ID (index of the entry in the input array) as the index into our result array, resulting in results where the score did not match up to the entry in the input array.

@ymao1 ymao1 changed the title Using record ID as index value when parsing Google Vertex AI rerank r… Fixing bug setting index when parsing Google Vertex AI results Nov 21, 2024
@ymao1 ymao1 force-pushed the google-vertex-rerank-bug branch from 4360eb2 to 194f36d Compare November 21, 2024 21:24
@ymao1 ymao1 self-assigned this Nov 21, 2024
@ymao1 ymao1 added >bug :ml Machine learning Team:ML Meta label for the ML team auto-backport Automatically create backport pull requests when merged v8.18.0 labels Nov 22, 2024
@ymao1 ymao1 marked this pull request as ready for review November 22, 2024 13:03
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine
Copy link
Collaborator

Hi @ymao1, I've created a changelog YAML for you.

throw new IllegalStateException(format(FAILED_TO_FIND_FIELD_TEMPLATE, RankedDoc.ID.getPreferredName()));
}

return new RankedDocsResults.RankedDoc(Integer.parseInt(parsedRankedDoc.id), parsedRankedDoc.score, parsedRankedDoc.content);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we catch the NumberFormatException that could be thrown by parseInt and provide a more friendly message about what went wrong?

Maybe add a test for an invalid integer too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in 68e2721

Copy link
Contributor

@jonathan-buttner jonathan-buttner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the change!

@ymao1 ymao1 merged commit 94c3e82 into elastic:main Nov 22, 2024
16 checks passed
@ymao1 ymao1 deleted the google-vertex-rerank-bug branch November 22, 2024 16:10
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.x

ymao1 added a commit to ymao1/elasticsearch that referenced this pull request Nov 22, 2024
…ic#117287)

* Using record ID as index value when parsing Google Vertex AI rerank results

* Update docs/changelog/117287.yaml

* PR feedback
elasticsearchmachine pushed a commit that referenced this pull request Nov 22, 2024
…) (#117358)

* Using record ID as index value when parsing Google Vertex AI rerank results

* Update docs/changelog/117287.yaml

* PR feedback
smalyshev pushed a commit to smalyshev/elasticsearch that referenced this pull request Nov 22, 2024
…ic#117287)

* Using record ID as index value when parsing Google Vertex AI rerank results

* Update docs/changelog/117287.yaml

* PR feedback
alexey-ivanov-es pushed a commit to alexey-ivanov-es/elasticsearch that referenced this pull request Nov 28, 2024
…ic#117287)

* Using record ID as index value when parsing Google Vertex AI rerank results

* Update docs/changelog/117287.yaml

* PR feedback
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Automatically create backport pull requests when merged >bug :ml Machine learning Team:ML Meta label for the ML team v8.18.0 v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants