AWS Athena Query JSON Array with AND Condition
Introduction
AWS Athena is a serverless query service that allows users to analyze data stored in Amazon S3 using SQL. When working with JSON data, it can be challenging to write efficient queries that extract specific fields or apply conditions. In this article, we will explore how to use AWS Athena’s UNNEST function to flatten an array of objects and then filter the results based on AND conditions.
Background
JSON (JavaScript Object Notation) is a popular data format used to represent structured data in a human-readable format. When working with JSON data, it is often stored as a single string or document in Amazon S3. To analyze this data using AWS Athena, we need to convert the JSON string into a tabular format that can be queried using SQL.
UNNEST Function
The UNNEST function in AWS Athena allows us to flatten an array of objects into separate rows. This is useful when working with nested arrays or objects that contain multiple values. However, it also means that we need to carefully consider how to filter the results based on conditions that apply to individual elements.
Sample Data
Let’s take a look at some sample JSON data stored in S3:
{
"sample_data": {
"people": [
{
"firstName": "Emily",
"address": {
"streetAddress": "101",
"city": "abc",
"state": "",
"phoneNumbers": [
{
"type": "home",
"number": "11"
},
{
"type": "city",
"number": "4"
}
]
}
},
{
"firstName": "Smily",
"address": {
"streetAddress": "102",
"city": "def",
"state": "",
"phoneNumbers": [
{
"type": "home",
"number": "1"
},
{
"type": "city",
"number": "1"
}
]
}
}
]
}
}
This data contains two people with their address and phone numbers. The UNNEST function can be used to flatten the phoneNumbers array into separate rows.
Querying JSON Array
To query this data, we need to use the UNNEST function to flatten the array of objects:
SELECT
idx,
JSON_EXTRACT_SCALAR(x.n, '$.address.streetAddress') as street_address,
JSON_EXTRACT_SCALAR(x.n, '$.address.city') as city,
JSON_EXTRACT_SCALAR(x.m, '$.type') as type,
JSON_EXTRACT_SCALAR(x.m, '$.number') as value
FROM
sample_data1 CROSS JOIN
UNNEST (CAST(JSON_EXTRACT(sample_data,'$.people') AS ARRAY<JSON>) ) AS x(n)
CROSS JOIN
UNNEST (CAST(JSON_EXTRACT(x.n,'$.address.phonenumbers') AS ARRAY<JSON>) ) WITH ordinality AS x(m,idx);
This query uses UNNEST to flatten the people array and then another UNNEST to flatten the phoneNumbers array.
Applying AND Conditions
To apply an AND condition, we can use the WHERE clause with a combination of cardinality and filter. The cardinality function returns the number of elements that match the filter condition. By using this combination, we can check for both conditions (home = 2 and city = 4) in a single query.
Query
Here is an example query that applies the AND conditions:
SELECT
street_address,
city
FROM (
SELECT
JSON_EXTRACT_SCALAR(json_str, '$.address.streetAddress') as street_address,
JSON_EXTRACT_SCALAR(json_str, '$.address.city') as city,
CAST(
JSON_EXTRACT(json_str, '$.address.phoneNumbers') AS ARRAY<JSON>) phones
FROM dataset
)
WHERE
cardinality(
filter(
phones,
js->json_extract_scalar(js, '$.type') = 'home'
AND try_cast(json_extract_scalar(js, '$.number') as integer) > 2
)
) > 0
AND
cardinality(
filter(
phones,
js->json_extract_scalar(js, '$.type') = 'city'
AND json_extract_scalar(js, '$.number') = '4'
)
) > 0;
This query uses UNNEST to flatten the phoneNumbers array and then applies the AND conditions using a combination of cardinality and filter.
Output
The output of this query is:
street_address city
101 abc
Only one row matches both conditions (home = 2 and city = 4).
Conclusion
AWS Athena’s UNNEST function allows us to flatten arrays of objects into separate rows, making it easier to apply conditions. By combining cardinality and filter, we can check for multiple conditions in a single query. This example demonstrates how to use these features to extract specific fields from JSON data stored in S3 using AWS Athena.
Additional Tips
- When working with arrays or objects, consider using
UNNESTto flatten the data into separate rows. - Use
cardinalityandfiltercombinations to apply multiple conditions in a single query. - Be mindful of performance when applying filters to large datasets.
Last modified on 2025-04-07