Definite: The Missing Card Catalog for Your S3 Data Library

4/14/2024

Mike Ritchie

Many companies dump data into S3 thinking, "we'll figure out how to use it later." They often never figure it out. It's too complicated to query S3 data directly (e.g. with Spark) and too expensive to pipe it into a data warehouse (e.g. Snowflake).

In the vast, sprawling library that is your S3 data lake, volumes of information sit gathering dust on the shelves. Each book holds valuable insights, but without a card catalog or a knowledgeable librarian, finding the right nugget of data is like searching for a needle in a haystack. You wander the aisles aimlessly, pulling out books at random, hoping to stumble upon the answers you seek.

You need an AI librarian. One that will organize your library and guide you to insights. Enter Definite: the missing card catalog for your S3 data library.

From Raw Data to Actionable Insights

1. Seamless Connection and Direct Queries:

  • No cumbersome data pipelines or intermediary databases. Definite connects directly to your S3 buckets using your access key and secret.
  • Once connected, querying your data becomes as simple as writing SQL. Let's say you have taxi trip data stored in Parquet format:
SELECT passenger_count, trip_distance, total_amount
FROM read_parquet('s3://your-bucket/taxi_trips/2023-06.parquet')
WHERE company = 'Uber'
ORDER BY trip_distance DESC
LIMIT 10;
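
If you want to see what this filter, sort, and limit returns before pointing it at a real bucket, here is a minimal Python sketch that runs the same logic against an in-memory SQLite table. The sample rows, the taxi_trips table name, and the longest_trips helper are all invented for illustration. SQLite stands in for the Parquet reader here, so this is not how Definite itself executes the query.

```python
import sqlite3

# Invented sample rows standing in for the Parquet file; not real trip data.
SAMPLE_TRIPS = [
    # (company, passenger_count, trip_distance, total_amount)
    ("Uber", 1, 12.4, 41.50),
    ("Lyft", 2, 30.1, 88.00),
    ("Uber", 3, 25.0, 73.25),
    ("Uber", 2, 3.2, 14.75),
]


def longest_trips(company: str, limit: int = 10) -> list[tuple]:
    """Return the longest trips for one company, mirroring the SQL above."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE taxi_trips ("
        "company TEXT, passenger_count INTEGER, "
        "trip_distance REAL, total_amount REAL)"
    )
    conn.executemany("INSERT INTO taxi_trips VALUES (?, ?, ?, ?)", SAMPLE_TRIPS)
    rows = conn.execute(
        "SELECT passenger_count, trip_distance, total_amount "
        "FROM taxi_trips WHERE company = ? "
        "ORDER BY trip_distance DESC LIMIT ?",
        (company, limit),
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in longest_trips("Uber"):
        print(row)
```

Only the Uber rows come back, longest trip first, which is the shape of result the S3 query would give you for the real file.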

2. Building Intuitive Data Models (Cubes):

  • Definite introduces "cubes": data models that encapsulate business logic and metrics. You define your dimensions and measures in a simple YAML format:
cubes:
  taxi_trips:
    measures:
      trip_count:
        type: count
      average_distance:
        type: average
        sql: trip_distance
      total_revenue:
        type: sum
        sql: total_amount
    dimensions:
      company:
        type: string
      day_of_week:
        type: string
      passenger_count:
        type: number

This YAML defines a taxi_trips cube with three measures (trip count, average distance, and total revenue) and three dimensions (company, day of the week, and passenger count). Any question that combines these measures and dimensions can then be answered from the same definition.
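
To make the idea concrete, here is a rough Python sketch of how a cube definition like this could be turned into a GROUP BY query. It is an illustration under assumptions, not Definite's actual implementation: the dict layout, the AGGREGATES mapping, and the cube_to_sql function are hypothetical names made up for this example.

```python
# The cube from the YAML above, written as a plain dict (assumed shape).
TAXI_TRIPS_CUBE = {
    "measures": {
        "trip_count": {"type": "count"},
        "average_distance": {"type": "average", "sql": "trip_distance"},
        "total_revenue": {"type": "sum", "sql": "total_amount"},
    },
    "dimensions": {
        "company": {"type": "string"},
        "day_of_week": {"type": "string"},
        "passenger_count": {"type": "number"},
    },
}

# Hypothetical mapping from a measure type to a SQL aggregate template.
AGGREGATES = {"count": "COUNT(*)", "average": "AVG({col})", "sum": "SUM({col})"}


def cube_to_sql(
    cube: dict, table: str, measures: list[str], dimensions: list[str]
) -> str:
    """Build a GROUP BY query for the requested measures and dimensions."""
    for dim in dimensions:
        if dim not in cube["dimensions"]:
            raise KeyError(f"unknown dimension: {dim}")
    select = list(dimensions)
    for name in measures:
        spec = cube["measures"][name]  # raises KeyError for an undefined measure
        expr = AGGREGATES[spec["type"]].format(col=spec.get("sql", ""))
        select.append(f"{expr} AS {name}")
    sql = f"SELECT {', '.join(select)} FROM {table}"
    if dimensions:
        cols = ", ".join(dimensions)
        sql += f" GROUP BY {cols} ORDER BY {cols}"
    return sql


if __name__ == "__main__":
    print(cube_to_sql(TAXI_TRIPS_CUBE, "taxi_trips", ["trip_count", "total_revenue"], ["company"]))
```

The point of the design is that the business logic (what "total revenue" means) lives in one place, and every query is assembled from it rather than rewritten by hand.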

3. Fi: Your AI-Powered Data Companion:

  • Definite's AI assistant, Fi, breaks down the barrier of technical expertise. Ask questions in plain English:
    • "Show me the average trip distance for Lyft on weekends."
    • "Compare total revenue for Uber and Lyft for each day of the week."
    • "Which day of the week has the highest number of trips with more than 2 passengers?"

Fi interprets your questions, queries the defined cubes, and presents the answers as visualizations and tables.
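
As a purely illustrative sketch, the second question above could be reduced to a structured request (one measure, two dimensions, one filter) and evaluated like this. The CubeQuery class, the run_query function, and the sample rows are all hypothetical; nothing here describes how Fi actually parses questions or talks to cubes.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CubeQuery:
    """Hypothetical structured form of a plain-English question."""
    measure: str                 # column to sum
    dimensions: list[str]        # columns to group by
    filters: dict[str, set] = field(default_factory=dict)  # column -> allowed values


# One possible reading of: "Compare total revenue for Uber and Lyft
# for each day of the week."
QUESTION = CubeQuery(
    measure="total_amount",
    dimensions=["company", "day_of_week"],
    filters={"company": {"Uber", "Lyft"}},
)

# Invented rows for illustration only.
TRIPS = [
    {"company": "Uber", "day_of_week": "Mon", "total_amount": 20.0},
    {"company": "Uber", "day_of_week": "Mon", "total_amount": 15.0},
    {"company": "Lyft", "day_of_week": "Mon", "total_amount": 30.0},
    {"company": "Lyft", "day_of_week": "Sat", "total_amount": 12.5},
    {"company": "Yellow Cab", "day_of_week": "Sat", "total_amount": 99.0},
]


def run_query(query: CubeQuery, rows: list[dict]) -> dict[tuple, float]:
    """Sum the measure per dimension combination, after applying filters."""
    totals: dict[tuple, float] = defaultdict(float)
    for row in rows:
        if all(row[col] in allowed for col, allowed in query.filters.items()):
            key = tuple(row[dim] for dim in query.dimensions)
            totals[key] += row[query.measure]
    return dict(totals)


if __name__ == "__main__":
    for key, total in run_query(QUESTION, TRIPS).items():
        print(key, total)
```

The filter drops the "Yellow Cab" row, and the remaining rows are summed per company and day, which is the table a revenue comparison chart would be drawn from.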

4. Crafting and Sharing Data Stories through Dashboards:

  • Definite's dashboards provide a canvas for weaving your data insights into compelling narratives.
  • Combine charts and tables generated by Fi to present a holistic view of trends and patterns.
  • Filter data, drill down into specifics, and customize the layout to suit your audience and message.
  • Share dashboards with your team, fostering collaboration and data-driven decision making across your organization.

Definite: Democratizing Data Analysis for the Modern Era

By eliminating technical barriers and providing an intuitive, AI-powered platform, Definite empowers everyone to become data-driven. From data analysts to business users, anyone can now explore, analyze, and extract valuable insights from their S3 data lakes.

So if you're ready to transform your S3 data lake from a cluttered, disorganized mess into a well-indexed, easily navigable library, Definite is here for you. With Definite, the answers you seek are always just a question away.