79 tools
Prolific is an online platform that connects researchers with participants worldwide, designed to collect human data efficiently and reliably for academic research, AI model training and evaluation, and market research. Its rigorous participant screening and quality controls help you obtain high-quality multimodal datasets.

Kaggle is a leading global platform for the data science and machine learning community, helping practitioners build their skills, solve real-world problems, and connect with experts worldwide through competitions, datasets, and collaborative environments.
MongoDB is a modern document-oriented database platform. Its flagship cloud offering, MongoDB Atlas, provides a fully managed database service. Atlas includes native vector search capabilities to help developers build generative-AI-powered applications and to support enterprises in modernizing data management and system architecture.

Micro1 AI is an AI platform focused on converting human expertise into high-quality, structured training data. By combining expert recruitment, data production, quality assessment, and reinforcement learning environments, it provides core data infrastructure and services for AI model training, evaluation, and iteration. It serves cutting-edge AI labs and large tech companies.

clickworker is a crowdsourcing-based data services provider that draws on a global network of vetted, certified workers to deliver data labeling, analysis, and data collection services. Clients use these services mainly for AI model training and for optimizing business decisions, turning unstructured data into actionable insights.

Labelbox is a data factory platform for AI teams, offering high-quality data labeling, model evaluation, and hosted labeling services, helping accelerate AI projects from prototype development to scalable production.

Appen is a platform dedicated to delivering high-quality data services for AI, enabling enterprises to accelerate the development and deployment of AI applications through multimodal data annotation, model evaluation, and a global crowdsourcing network.

Firecrawl AI is an open-source web data extraction API designed for AI applications. It converts web page content into structured data that large language models (LLMs) can consume, helping developers build RAG systems and AI data pipelines efficiently.

Bright Data is a leading global platform for web data collection and proxy services, providing scalable, compliant solutions for gathering public web data that help enterprises acquire market intelligence and AI training data efficiently.

Scale AI is a global leader in AI data and model training, offering high-quality data labeling, model evaluation, and end-to-end solutions that help enterprises accelerate the development and deployment of AI applications.

Oxylabs is a premium, enterprise-grade proxy service and data collection platform that provides scalable, intelligent solutions for harvesting public web data, helping users access web data worldwide efficiently.

Thordata Proxy is an enterprise-grade proxy service focused on web data extraction. It offers residential, mobile, static ISP, and datacenter proxies. Backed by a large global IP network, it helps users bypass anti-scraping measures and collect data anonymously and reliably, making it suitable for market research, ad verification, SEO, and other business scenarios.

Raybit is an AI-powered, no-code web scraping tool that lets you extract structured data from a wide range of websites with a single click, helping sales, operations, and research teams carry out market monitoring and information gathering efficiently.

SuperAnnotate AI is an end-to-end AI data platform focused on producing, managing, and governing high-quality training and evaluation data for machine learning models. Through multimodal data annotation, intelligent assistance tools, and comprehensive quality control, the platform speeds up dataset creation, model performance evaluation, and the optimization of AI agent workflows.

Qdrant is an open-source, high-performance vector database and similarity search engine designed for AI applications, enabling efficient storage and retrieval of high-dimensional vectors. It is well suited to building RAG pipelines, recommendation systems, and other intelligent applications.

Airbyte is an open-source data integration platform that helps enterprises build ELT pipelines with 600+ pre-built connectors, enabling efficient data synchronization and activation across applications, databases, and data warehouses.

CVAT is a leading open-source image and video annotation platform designed for machine learning and AI workflows, helping teams worldwide complete data labeling tasks efficiently.

InfluxDB is a leading time-series database built for high-performance ingestion, storage, and real-time analysis of large volumes of time-series data, supporting data-driven decision-making in industrial IoT, IT operations monitoring, and other domains.

Dataiku AI is an enterprise-grade, end-to-end data science and AI platform designed to simplify the full lifecycle from data preparation to AI application deployment. It provides a unified environment for data cleaning, machine learning modeling, generative AI development, and model operations, helping enterprises efficiently handle complex data while fostering cross-team collaboration and data-driven business innovation.

OpenTrain AI is a global talent marketplace focused on AI training and data labeling, connecting enterprise buyers with professional data labeling experts and service providers. The platform delivers end-to-end talent recruitment, project management, and payment solutions, helping teams quickly assemble remote capabilities while giving freelancers and service providers a centralized hub for project opportunities.