How to Build an AI Image Generation App

Author: Steven Lynn, Dify Technical Writer.

With the rise of AI image generation, many excellent products have emerged, such as DALL-E, Flux, and Stable Diffusion.

In this article, you will learn how to develop an AI image generation app using Dify.

You Will Learn

  • Methods for building an Agent using Dify

  • Basic concepts of Agent

  • Fundamentals of prompt engineering

  • Tool usage

  • The concept of LLM hallucinations

1. Set the Stability API Key

Go to the Stability API key management page. If you haven't registered yet, you will be asked to register before entering the API management page.

After entering the management page, click Copy to copy the key.

Next, fill in the key under Dify - Tools - Stability by following these steps:

  • Log in to Dify

  • Enter Tools

  • Select Stability

  • Click Authorize

  • Fill in the key and save
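If you want to confirm that the key works before wiring it into Dify, a quick API call is enough. The snippet below is a minimal sketch; the v1 `user/balance` endpoint and the `STABILITY_API_KEY` environment variable are assumptions, so check them against Stability AI's current API documentation.

```python
import os
import requests

# Hypothetical sanity check: assumes Stability AI's v1 balance endpoint.
API_KEY = os.environ["STABILITY_API_KEY"]  # the key you copied above

resp = requests.get(
    "https://api.stability.ai/v1/user/balance",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
resp.raise_for_status()
print(resp.json())  # remaining credits if the key is valid
```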

2. Configure Model Providers

To optimize the interaction, we need an LLM to turn the user's instructions into concrete prompts for image generation. Next, we will configure a model provider in Dify by following these steps.

The Free version of Dify provides 200 free OpenAI message credits.

If the message credits are insufficient, you can configure another model provider by following the steps below:

Click Your Avatar - Settings - Model Provider

If you can't find a suitable model provider, the groq platform offers free call credits for LLMs such as Llama.

Log in to the groq API management page, click Create API Key, set a name, and copy the API Key.

Go back to Dify - Model Providers, select groqcloud, and click Setup.

Paste the API Key and save.
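If you want to double-check the groq key, you can call groq's OpenAI-compatible chat completions endpoint directly. This is a minimal sketch; the base URL and the `llama-3.1-70b-versatile` model id are assumptions based on groq's public API and may change, so verify them in your groq console.

```python
import os
import requests

# Hypothetical sanity check against groq's OpenAI-compatible API.
API_KEY = os.environ["GROQ_API_KEY"]  # the key you created above

resp = requests.post(
    "https://api.groq.com/openai/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "llama-3.1-70b-versatile",  # assumed model id; use one your account lists
        "messages": [{"role": "user", "content": "Say hello in one word."}],
    },
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```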

3. Build an Agent

Go back to Dify - Studio and select Create from Blank.

In this experiment, we only need to understand the basic usage of Agents.

What is an Agent

An Agent is an AI system that simulates human behavior and capabilities. It interacts with the environment through natural language processing, understands input information, and generates corresponding outputs. The Agent also has "perception" capabilities, can process and analyze various forms of data, and can call and use various external tools and APIs to complete tasks, extending its functional scope. This design allows the Agent to handle complex situations more flexibly and simulate human thinking and behavior patterns to some extent.

Select Agent and fill in a name.

Next, you will enter the Agent orchestration interface as shown below.

Select the LLM. Here we use Llama-3.1-70B provided by groq as an example:

Select Stability in Tools:

Write Prompts

Prompts are the soul of the Agent and directly affect the quality of the output. Generally, the more specific the prompt, the better the output, but overly long prompts can also have negative effects.

The practice of crafting and tuning prompts is called Prompt Engineering.

In this experiment, don't worry if you haven't mastered Prompt Engineering yet; we will learn it step by step later.

Let's start with the simplest prompts:

Draw the specified content according to the user's prompt using stability_text2image.

Each time the user inputs a command, the Agent sees this system-level instruction, so it understands that to execute the user's drawing task it needs to call the stability_text2image tool.

For example: Draw a girl holding an open book.

Don't want to write prompts? Of course you can!

Click Generate in the upper right corner of Instructions.

Enter your requirements and click Generate; the AI-generated prompt will appear on the right.

However, to develop a good understanding of prompts, we should not rely on this feature in the early stages.

Publish

Click the Publish button in the upper right corner. After publishing, select Run App to get a web page where the Agent runs online.

Copy the URL of this web page to share it with friends.
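Besides sharing the web page, a published app also exposes a service API (see Publishing - Developing with APIs). The sketch below shows roughly what a call to an Agent app's chat-messages endpoint looks like; the base URL, the app API key variable, and the streaming requirement are assumptions to verify against your app's API Access page.

```python
import os
import requests

# Rough sketch of calling a published Dify app via its service API.
API_KEY = os.environ["DIFY_APP_API_KEY"]  # app-level API key from the API Access page
BASE_URL = "https://api.dify.ai/v1"       # or your self-hosted API base URL

resp = requests.post(
    f"{BASE_URL}/chat-messages",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "inputs": {},
        "query": "Anime style, draw a girl holding an open book",
        "response_mode": "streaming",  # Agent apps generally require streaming responses
        "user": "demo-user",           # any stable identifier for the end user
    },
    stream=True,
)
for line in resp.iter_lines():
    if line:
        print(line.decode("utf-8"))  # server-sent events carrying the Agent's output
```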

Question 1: How to Specify the Style of Generated Images?

We can add style instructions in the user's input command, for example: Anime style, draw a girl holding an open book.

But if we want to set the default style to anime, we can add it to the system prompt, since, as we learned earlier, the system prompt is seen every time a user command is executed and has higher priority.

Draw the specified content according to the user's prompt using stability_text2image, the picture is in anime style.

Question 2: How to Reject Certain Requests from Some Users?

In many business scenarios, we need to avoid outputting unreasonable content, but LLMs will often follow user instructions without question, even when the output is wrong. This phenomenon, where the model tries hard to answer the user by fabricating false content, is called hallucination. Therefore, we need the model to refuse user requests when necessary.

Additionally, users may ask about things unrelated to the business, and we need the Agent to refuse such requests as well.

We can use Markdown headings to organize the prompt, placing the instructions that teach the Agent to refuse unreasonable requests under a "Constraints" heading. This format is only a convention; you can use your own.

## Task
Draw the specified content according to the user's prompt using stability_text2image, the picture is in anime style.

## Constraints
If the user requests content unrelated to drawing, reply: "Sorry, I don't understand what you're saying."

For example, let's ask: What's for dinner tonight?

In more formal business scenarios, we can use a sensitive-word list to refuse user requests.

Add the keyword "dinner" in Add Feature - Content Moderation. When the user inputs the keyword, the Agent app outputs "Sorry, I don't understand what you're saying."
