AI-Enhanced Real-time Algorithm for Streamlined Rendering in a 3D Modeling workflow

AIR STREaM


An anime-style illustration of Mickey Mouse inspired by “Steamboat Willie,” blending the classic black and white aesthetic with vibrant anime elements. Picture Mickey in his iconic pose, steering the boat, but with exaggerated anime features: large, expressive eyes and a more dynamic, fluid pose. The background should be a lively reinterpretation of the original steamboat setting, maintaining the vintage feel but with added depth and detail characteristic of anime. Use a monochrome palette with subtle hints of color to bridge the old and the new, creating a piece that’s both nostalgic and fresh. The overall mood is playful and energetic, capturing the essence of Mickey’s character in a new, yet familiar way. --aspect 3:2 (Niji Journey)

Samarth Reddy (3210635)
Digital Futures 2024
OCAD University


Introduction

Artificial intelligence (AI) has become a game-changer in many areas of technology, especially in digital design and modeling. My project, “AI-Enhanced Real-time Algorithm for Streamlined Rendering in a 3D Modeling Workflow (AIR STREaM)”, examines how AI is changing the way we do 3D modeling, making it faster and more efficient. I used a Research through Design method, which means I tried out many different AI platforms, models, and APIs.

At first, I divided my research into three parts: text, image, and audio-based AI platforms. For text, I looked at platforms like ChatGPT, OpenAI API, Jan AI, LM Studio, VSCode, Perplexity, and Copilot by Microsoft. For images, I used tools like Automatic 1111, Invoke, Comfy UI, Vlad Diffusion, Discord, Stability AI, Microsoft Designer, Adobe Firefly, Nvidia Omniverse, Fooocus, Blockade Labs, and Stable Matrix. And for audio, I tried out Eleven Labs and Natural Reader.

But as I got deeper into my research, I started to focus more on the models and user interfaces that these platforms use. In the text AI area, I looked at models and APIs like the OpenAI API, GPT-3.5 and GPT-4, Mixtral 8x7B Instruct Q4, Phi-2 3B Q8, GitHub Copilot, Claude 2.1, and Gemini Pro. For image models and web user interfaces, I used tools like Stable Diffusion, Midjourney, Niji Journey, Stability AI, Microsoft Designer, and Adobe Firefly.

Then, I started to think about how these AI models could be used in real-world workflows. In the text area, I tried out applications like the Obsidian note-taking app, VSCode, Cursor IDE, WebStorm IDE, Zotero, Rhino 3D and Grasshopper modeling software. For image applications, I used tools like Chainner and Grasshopper.

At the end of all this research, I did three experiments:

  • I made my own custom GPTs.
  • I trained a custom image generation model using the Stable Diffusion Web UI.
  • I developed a near-real-time rendering engine using Grasshopper in Rhino with Stable Diffusion API integration.

These experiments showed not only how AI can be used in practice, but also how it can make daily tasks easier when integrated into existing workflows. In the third experiment, I explored how AI can improve rendering for 3D modeling workflows, which points to how AI could change the world of digital design and rendering.

Objectives

My research was guided by a few main goals, all focused on showing how AI can help in digital design:

  • Exploring AI Platforms: I sorted AI platforms into types (text, image, audio) like ChatGPT and OpenAI API, and looked at how they can be used in digital design.
  • Investigating AI Models/UIs: I looked at specific AI models and user interfaces, such as GPT and Stable Diffusion, to see how they can be used in design tasks.
  • AI Integration in Workflows: I showed how AI can be added to daily design workflows with tools like VSCode and Grasshopper, making them more efficient and creative.
  • Targeted Experiments: I did experiments to show how AI can be used in tasks like creating custom GPT models and training image models with Stable Diffusion Web UI.
  • AI’s Impact on Design: I showed how AI can improve 3D modeling and rendering, especially in real-time applications.
  • Evaluating AI in Workflows: I looked at the pros and cons of adding AI into existing digital design workflows.

Theme/ Topic

“AI-Driven Design Visualization”

3D Modeling | AI | Rendering

Previous work/ Context

The idea of using artificial intelligence in 3D modeling and rendering is a hot topic right now, and a lot of research is being done in this area. In this section, I’ll talk about two important pieces of work that have helped shape my own project, the AI-Enhanced Real-time Algorithm for Streamlined Rendering in a 3D Modeling Workflow (AIR STREaM).

  • 3DALL-E: Using Text-to-Image AI in 3D Design Workflows

A paper by Vivian Liu, Jo Vermeulen, George Fitzmaurice, and Justin Matejka on 3DALL-E is a big step forward in using AI with computer-aided design (CAD) software. They used large AI models such as DALL-E, GPT-3, and CLIP to help create images in 3D design workflows. Here’s why their work is important:

  • Text and Image Prompt Construction: 3DALL-E lets users make text and image prompts that use GPT-3’s huge design language knowledge to help create parts, styles, and designs. This is a big step forward in using AI to help with creative thinking in 3D modeling.
  • Dynamic Interaction with CAD Viewport: The system lets designers use their current 3D progress in the CAD software as part of their image+text prompts. This makes the design process more connected and aware of the context.
  • Design Process and AI Utilization Patterns: A study with 13 designers showed different ways of using AI—’AI-first’, ‘AI-last’, and ‘AI-throughout’—showing that AI can be used flexibly at different stages of the design process. This flexibility is important for the AIR STREaM project, which aims to use AI tools at various stages of the workflow.
  • Inspiration Complexity and Management: Designers can explore prompts from simple to very complex (with over 10 concepts), showing that AI can handle a wide range of design challenges. Also, the idea of “prompt bibliographies” as a way to track the history of human-AI design could be very useful in understanding the AI-enhanced design process.
  • Ambrosinus Toolkit in Grasshopper: GitHub Repo with Video Demonstration

The Ambrosinus Toolkit is a great example of using AI in Grasshopper, and there’s a video that shows how it works. Here are some key points from the video and the GitHub repo:

  • Practical Use of Local AI Models: The toolkit demonstrates how locally installed AI models can run inside Grasshopper, pointing to their place in 3D modeling workflows.
  • Potential for AI-Enhanced Rendering: Even though it’s in a raw state and uses an older version of Stable Diffusion’s ControlNet, the toolkit shows a lot of promise for using AI in rendering processes in 3D modeling.
  • Relevance to AIR STREaM Project: This real-world example gives valuable insights for the AIR STREaM project, especially in developing AI-enhanced rendering engines within design tools like Grasshopper.
  • This video not only shows what the toolkit can do, but also aligns with the goals of the AIR STREaM project in using AI in 3D modeling and rendering workflows.
  • Link: http://tinyurl.com/3se87dt2

Research through Design approach

My journey with the AIR STREaM project was guided by a “Research through Design” method: I tried out many different AI tools across the text, image, and audio areas, such as ChatGPT, Stable Diffusion, and Natural Reader. Through this method, I not only sorted and evaluated different AI platforms like the OpenAI API and Adobe Firefly, but also used these tools in real design workflows. This led to the creation of custom AI applications, including custom GPTs, image model training with the Stable Diffusion Web UI, and a near-real-time rendering engine, showing how AI can make 3D modeling and rendering processes more efficient and creative.

Text | Image | Audio
Chat GPT | Automatic 1111 | Eleven Labs
Open AI API | Invoke | Natural Reader
Jan AI | Comfy UI | —
LM Studio | Vlad Diffusion | —
VSCode | Discord | —
Perplexity | Stability AI | —
Copilot | Microsoft Designer | —
— | Adobe Firefly | —
— | Nvidia Omniverse | —
— | Fooocus | —
— | Stable Matrix | —
Table 1: Platforms Explored in the Study
Models/Web UI (Text) | Models/Web UI (Image) | API (Text) | API (Image)
OpenAI API | Stable Diffusion | Obsidian | Chainner
Chat GPT | Midjourney | VSCode | Grasshopper
GPT 3.5/4 | Niji Journey | Cursor | —
Mixtral 8x7B q4 | Stability AI | Webstorm | —
Phi-2 3B q8 | Microsoft Designer | Zotero | —
GitHub Copilot | Adobe Firefly | Rhino/ Grasshopper | —
Claude 2.1 | — | — | —
Gemini Pro | — | — | —
Table 2: Overview of Models, Web UIs, and API Integrations Examined
Model | Version | Type | Source | Rating
Anime | 1 | Anime | Hugging Face | ⭐⭐⭐⭐
A.Pastel Dream | 1 | Anime | Civitai | ⭐⭐⭐
Animerge | 26 | Mixed Model | Civitai | ⭐⭐⭐⭐⭐
Aniverse | 16 | Mixed Model | Civitai | ⭐⭐⭐⭐⭐
Architec. Exterior | 40 | Architecture | Civitai | ⭐⭐⭐
Architecture RMix | 1 | Architecture | Civitai | ⭐⭐⭐
Merged Protogen | 1 | Merged Model | Civitai | ⭐⭐⭐⭐
Counterfeit | 30 | Anime | Civitai | ⭐⭐⭐
Cyber Realistic | 41 | Realistic | Civitai | ⭐⭐⭐⭐
Deliberate | 5 | Realistic | Hugging Face | ⭐⭐⭐⭐⭐
Dreamshaper | 8 | Realistic | Civitai | ⭐⭐⭐⭐⭐
Inkpunk | 2 | Ink | Hugging Face | ⭐⭐⭐⭐
Lyriel | 13 EOL | 3D Realistic | Civitai | ⭐⭐⭐⭐
Meina Pastel | 1 | Anime | Civitai | ⭐⭐⭐⭐
Moon Mix Fantasy | 20 | Mixed Model | Civitai | ⭐⭐⭐⭐⭐
Novel Inkpunk | 1 | Ink | Hugging Face | ⭐⭐⭐⭐
Perfect Deliberate | 5 | Realistic | Civitai | ⭐⭐⭐⭐⭐
Product Design | 10 | Realistic | Civitai | ⭐⭐⭐
ProtogenX2 | 2 | Realistic | Civitai | ⭐⭐⭐⭐
ProtogenX3 | 4 | Realistic | Civitai | ⭐⭐⭐⭐
ProtogenX5 | 3 | Realistic | Civitai | ⭐⭐⭐⭐
Real Cartoon 3D | 13 | 3D Cartoon | Civitai | ⭐⭐⭐⭐⭐
Realistic Vision | 5 | Realistic | Civitai | ⭐⭐⭐⭐⭐
Reliberate | 3 | Realistic | Hugging Face | ⭐⭐⭐⭐⭐
RevAnimated | 12 EOL | Mixed Model | Civitai | ⭐⭐⭐⭐⭐
Inkpunk Style | SDXL 1 | Ink | Civitai | ⭐⭐⭐
Seekyou | 1 | Cartoon/ Anime | Civitai | ⭐⭐⭐⭐
Synthwave Punk | 2 | Ink | Civitai | ⭐⭐⭐
Toonyou | 3 | Cartoon/ Anime | Civitai | ⭐⭐⭐⭐
Base Model | 1.5 | Standard | Hugging Face | ⭐⭐
Midjourney | 6 | Mixed Model | Midjourney | ⭐⭐⭐⭐⭐
Niji Journey | 5 | Anime/ Cartoon | Niji Journey | ⭐⭐⭐⭐⭐
MS Designer | — | Mixed Model | MS Designer | ⭐⭐⭐⭐
Table 3: Text to Image AI Models explored and rated based on the image output

I used Stable Diffusion, especially the Web UI (Automatic 1111) version, to create the images (numbered 0 to 30) mentioned above. The configuration details for the creation process are as follows: the standard configuration is kept the same across all generations of the different models, with the changes limited to the seed and the model used. For Midjourney and Niji Journey, only positive prompts are used. Also, the prompt for Microsoft Designer is different because of an unexpected copyright error. Along with the main checkpoint/safetensors model, the above images are the result of two LoRAs, Mickey Mouse and M1cky5, given weights of 0.4 and 0.7, respectively, for the best results.

Stable Diffusion:

Positive Prompt:
(((stylized))), vivid ((upper body)) closeup portrait of ((Mickey Mouse)) in a horror style in an epic horror environment with a Focus on capturing the iconic features: large ears, bright button eyes, and a creepy smile. The image should be high-resolution and ultra-detailed, showcasing Mickey’s classic red shorts and white gloves. Use a clear, plain background to emphasize Mickey’s vibrant character. The overall mood is grunge and creepy, reminiscent of disney, with a focus on sharp, clean lines and a bright color palette.

LoRA: <lora:mickey mouse:0.4> <lora:M1cky5(0.7-1)CIVIT:0.4>

Negative Prompt: CS-OBG ng_deepnegative_v1_75t Unspeakable-Horrors-Composition-4v, Watermark, Text, censored, deformed, bad anatomy, disfigured, poorly drawn face, mutated, extra limb, ugly, poorly drawn hands, missing limb, floating limbs, disconnected limbs, disconnected head, malformed hands, long neck, mutated hands and fingers, bad hands, missing fingers, cropped, worst quality, low quality, mutation, poorly drawn, huge calf, bad hands, fused hand, missing hand, disappearing arms, disappearing thigh, disappearing calf, disappearing legs, missing fingers, fused fingers, abnormal eye proportion, Abnormal hands, abnormal legs, abnormal feet, abnormal fingers

Steps: 20
Sampler: DDIM
CFG scale: 7
Seed: -1 (random)
Size: 600×600
Model: varies per generation (see Table 3)
Denoising strength: 0.5
Hires upscale: 2
Hires upscaler: Latent
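
For reference, the same standard configuration can also be expressed as a request to the Automatic 1111 txt2img API. The sketch below is illustrative only (the Table 3 images were generated through the Web UI itself, not this script); it assumes a local instance launched with the --api flag at the default address, and the field names follow the public API, which may differ between WebUI versions.

```python
import requests

SD_URL = "http://127.0.0.1:7860"  # default local Automatic 1111 address (launched with --api)

payload = {
    # The full positive/negative prompts are the ones listed above; abbreviated here with "...".
    "prompt": "(((stylized))), vivid ((upper body)) closeup portrait of ((Mickey Mouse)) ... "
              "<lora:mickey mouse:0.4> <lora:M1cky5(0.7-1)CIVIT:0.4>",
    "negative_prompt": "CS-OBG ng_deepnegative_v1_75t Unspeakable-Horrors-Composition-4v, Watermark, Text, ...",
    "steps": 20,
    "sampler_name": "DDIM",
    "cfg_scale": 7,
    "seed": -1,                 # random seed; only the seed and the checkpoint change per run
    "width": 600,
    "height": 600,
    "denoising_strength": 0.5,
    "enable_hr": True,          # hires fix
    "hr_scale": 2,
    "hr_upscaler": "Latent",
}

response = requests.post(f"{SD_URL}/sdapi/v1/txt2img", json=payload)
images_b64 = response.json()["images"]  # base64-encoded PNGs, one per generated image
```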

—-

Microsoft Designer

Prompt: Stylized and vivid closeup portrait of Mickey Mouse in epic horror style

Experiments

I carried out a few experiments to show how AI can be used in practice. These experiments covered a range of areas, from creating custom GPT models to advanced AI model training and integration into design tools.

Experiment 1: Text-based AI Experiments for Custom GPTs

In this experiment, I focused on creating custom GPT models for different uses:

Ava: This is a GPT model that’s all about UX research. It’s great at understanding and explaining complex UX papers and ideas. I didn’t upload any specific training data for this one.

Taco: This model is a guide for ethical human research. It interprets and applies TCPS 2 guidelines and gives precise advice. I used publicly available data as custom instructions for this one.

Vada: This model is made for international students. It provides Indian recipes and cultural insights. I didn’t upload any specific training data for this one.

Maple: This is a guide for Canadian English. It specializes in cultural nuances, idioms, and regional language variations. I didn’t upload any specific training data for this one.

Poet: This model is designed to generate imaginative prompts for AI-generated images, specifically tailored for Midjourney. I created custom instructions to train this one.

Experiment 2: Custom Safe Tensor Model Training

In this experiment, I trained a custom safetensors model using the Kohya GUI to understand the alien-language logograms from the movie “Arrival.” Even though the training was successful, the model had a hard time with the complexities of language translation. This shows the challenges in creating AI models that need to interpret and infer language deeply. Link: GitHub – WolframResearch/Arrival-Movie-Live-Coding: Documents from a live coding session by Christopher Wolfram related to content from the 2016 film Arrival

Images used for training

Experiment 3: Making Visuals More Realistic in 3D Modeling

In this experiment, I used the Ambrosinus Toolkit in Rhino Grasshopper together with ControlNet version 1 to perform image-to-image generation, turning the Rhino viewport into realistic or stylized visuals based on positive and negative prompts. This shows a new way of using AI in 3D design visualization. I’ll talk more about this experiment in the next sections.

Workflows

Exploration in Stable Diffusion:

When I was playing around with Stable Diffusion, I was really interested in how it could generate images from other reference images. I tried this out by taking a screenshot of a simple 3D model from Rhino’s viewport and using it in Automatic 1111 to make a new image. The process was promising, but it didn’t give me much control over what the new images looked like. Most of the time the results were hit or miss, and only sometimes did I get an image that looked good. This showed me that we need a way to use AI in design that we can control more. So I started using the ControlNet extension with Stable Diffusion, which was a big improvement, and I also looked into its depth map, canny, and pose models.

Traditional Design and Animation Workflows:

The usual way of doing design and animation takes a lot of steps and time. You have to model in 3D, create meshes, add textures, unwrap, bake, set up lighting, etc. All these steps and more take a lot of work and slow down how quickly I can see and change my designs in 3D spaces like Rhino. It’s hard to get quick feedback on my 3D models because it takes so long to see the final look.

Proposed AI-Assisted Workflows:

To make the workflow seamless, I came up with a new pipeline that uses AI to help see designs better while I’m modeling in 3D. This new way uses Stable Diffusion and Control Net and fits right into the design process:

  • I do the 3D modeling on one screen or viewport.
  • Then, in the Grasshopper interface, I type in what I want to see and what I don’t (positive and negative prompts) and start the process with just a click.
  • Right away, I can see the AI’s take on my 3D model in the Grasshopper space, which lets me quickly try out different looks.

This AI-assisted method speeds up the whole design process and gives me a quick way to see changes, which really helps me be more creative and efficient when I’m working on 3D modeling and coming up with new designs.
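
Conceptually, the proposed pipeline is a simple capture-generate-preview loop. The sketch below uses assumed function names (capture_viewport, generate_image, and show_preview stand in for the Grasshopper components described in the Demo section) and is only meant to show the shape of the loop, not the actual definition.

```python
import time

def air_stream_loop(capture_viewport, generate_image, show_preview, interval=2.0):
    """Re-render the AI preview whenever the captured viewport changes."""
    last_capture = None
    while True:
        screenshot = capture_viewport()            # PNG bytes of the active Rhino view (Step 5)
        if screenshot != last_capture:             # only regenerate when the model has changed
            preview = generate_image(screenshot)   # img2img + ControlNet call (Step 4)
            show_preview(preview)                  # draw the result on the Grasshopper canvas
            last_capture = screenshot
        time.sleep(interval)                       # throttle so the machine stays responsive
```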

Experiment 3 – AI-Enhanced Real-time Algorithm for Streamlined Rendering in a 3D Modeling workflow (AIR STREaM)

This experiment is all about using extensions like ControlNet v1 for Stable Diffusion in the Rhino and Grasshopper workflow, through a local API, for image-to-image generation. The main goal is to turn Rhino’s 3D viewport model view into realistic or stylized visuals based on certain prompts and parameters.

Here’s how it works: I take a screenshot from Rhino’s viewport that shows a 3D model I’ve made. This screenshot is then used as the input image for the Control Net neural network within the Grasshopper environment. Here, I give relevant prompts, both positive and negative, along with parameters.

Once I’ve finalized the prompts and tuned the parameters, I start the AI-based image generation with a single click within Grasshopper. This makes Control Net change the Rhino viewport screenshot into a realistic or stylized image, as specified by the prompts. Below is the full algorithm description with screenshots.

Demo
Step 1:

This algorithm is the first step in connecting a local installation of Stable Diffusion (Automatic 1111) through its API. Users can choose a startup configuration that fits their hardware capabilities and usage scenarios. To start, it’s important to specify the directory of Stable Diffusion using a path file component. After linking all the necessary components and finishing the setup, the next step is to set up the WebUI. Once this is done, starting the program is as simple as pressing the “start” button. When it starts, Stable Diffusion begins operation, which you can see on the terminal/CLI loading screen. The components used in this Grasshopper algorithm include both native Grasshopper elements and several from the Ambrosinus Toolkit.

Initiation of Stable Diffusion using LoadSD Local component
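
As a rough illustration of what this startup step amounts to outside Grasshopper, the snippet below (an assumption, not the LoadSD Local component’s own code) checks whether a local Automatic 1111 instance started with the --api flag is answering requests.

```python
import requests

SD_URL = "http://127.0.0.1:7860"  # default WebUI address; change it if your setup differs

def sd_api_ready(url=SD_URL):
    """Return True once the local Stable Diffusion WebUI API answers requests."""
    try:
        return requests.get(f"{url}/sdapi/v1/sd-models", timeout=5).status_code == 200
    except requests.exceptions.RequestException:
        return False

if __name__ == "__main__":
    print("Stable Diffusion API reachable:", sd_api_ready())
```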

Step 2:

After getting Stable Diffusion started through the API, the next thing I do is pick the model I want to use for making the image. The model I choose here decides whether the image looks realistic or stylized, so I have to pick the right model based on the look I’m going for. In this part of the Grasshopper algorithm, I use a mix of native Grasshopper components and some from the Ambrosinus Toolkit.

Loading Local Models for selection
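
For comparison, this is roughly what the model-selection step does when expressed against the public Automatic 1111 API (a hedged sketch, not the toolkit’s implementation): list the locally installed checkpoints, then ask the WebUI to switch to one of them.

```python
import requests

SD_URL = "http://127.0.0.1:7860"

# Enumerate the checkpoints the local installation can load
models = requests.get(f"{SD_URL}/sdapi/v1/sd-models").json()
for m in models:
    print(m["title"])   # e.g. "dreamshaper_8.safetensors [879db523c3]"

# Activate one of them; the title must match what /sdapi/v1/sd-models reported
requests.post(f"{SD_URL}/sdapi/v1/options",
              json={"sd_model_checkpoint": models[0]["title"]})
```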

Step 3:

With the model picked out, I then start modeling in 3D in Rhino. This is just the normal way I work, where the viewport I’m using for modeling is where I get the image from. I take a screenshot of this viewport with the Grasshopper algorithm, and this screenshot is what I’ll use in the Stable Diffusion process to make a new image.

Rhino 3D viewport for 3D modeling

Step 4:

While I’m working on the 3D model, I move on to step 4, where the main part of the algorithm does its job. This part is all about making a new image from an image. It’s important to know that every time I change something in the algorithm, Stable Diffusion makes a new image; depending on how fast my computer is, it might seem like Grasshopper has stopped responding. Here’s what the algorithm includes:

  • A way to choose the ControlNet version
  • A model selector for ControlNet (like Depth, Canny, OpenPose, Scribble, etc.)
  • Space to write the positive and negative prompts
  • A place to put the directory path for the input images
  • Settings for steps, CFG scale, weight, and guidance scale
  • Options for the image width and height
  • Choices for the sampler method (like DDIM, Euler a, Karras, etc.)
  • A spot for the seed

Stable Diffusion is different from Midjourney, even though they both make images with AI. With Stable Diffusion, I have more control and can really shape the outcome. When I’m writing prompts for Stable Diffusion, especially with LoRA models, I have to be pretty specific to make sure the prompts work right. This need to write prompts in a certain way is what makes Stable Diffusion unique in how it makes images compared to Midjourney.

Main Algorithm for Image to Image generation (Prompt input and Control net)

The “Generated Image Info” part of the algorithm keeps all the details about an image. It saves this info and also shows it in the Grasshopper interface, which makes it easy to see and understand all the details about the images I make.

Here’s an example of how I set it up:

Prompt: Brick home with vintage feeling

Negative Prompt: Watermark, Text, censored, deformed, bad anatomy, disfigured, poorly drawn face, mutated, extra limb, ugly, poorly drawn hands, missing limb, floating limbs, disconnected limbs, disconnected head, malformed hands, long neck, mutated hands and fingers, bad hands, missing fingers, cropped, worst quality, low quality, mutation, poorly drawn, bad hands, fused hand, missing hand, disappearing arms, disappearing legs, missing fingers, fused fingers, abnormal eye proportion, Abnormal hands, abnormal legs, abnormal feet, abnormal fingers (Standard Negative Prompt)

N images: 1,
Process: CN v1.0
Steps: 13
Sampler: DDIM
CFG scale: 7.3
Seed: 1168416200
Face restoration: CodeFormer
Size: 512×457
Model hash: 879db523c3
Model: dreamshaper_8
ControlNet Enabled: True
ControlNet Module: depth
ControlNet Model: control_sd15_depth [fef5e48e]
ControlNet Weight: 0.93
ControlNet Guidance Start: 0.04
ControlNet Guidance End: 1.0
Version: v1.7.0
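
The Generated Image Info above maps fairly directly onto an img2img request with the ControlNet extension. The sketch below is an assumption based on the public Automatic 1111 API rather than the Ambrosinus Toolkit’s internals, and the ControlNet argument names can vary between extension versions.

```python
import base64
import requests

SD_URL = "http://127.0.0.1:7860"

with open("viewport_capture.png", "rb") as f:        # screenshot produced in Step 5
    init_image = base64.b64encode(f.read()).decode()

payload = {
    "init_images": [init_image],
    "prompt": "Brick home with vintage feeling",
    "negative_prompt": "Watermark, Text, censored, deformed, ...",  # standard negative prompt above
    "steps": 13,
    "sampler_name": "DDIM",
    "cfg_scale": 7.3,
    "seed": 1168416200,
    "width": 512,
    "height": 457,
    "restore_faces": True,                            # CodeFormer face restoration
    "alwayson_scripts": {                             # requires the ControlNet extension
        "controlnet": {
            "args": [{
                "input_image": init_image,
                "module": "depth",
                "model": "control_sd15_depth [fef5e48e]",
                "weight": 0.93,
                "guidance_start": 0.04,
                "guidance_end": 1.0,
            }]
        }
    },
}

result = requests.post(f"{SD_URL}/sdapi/v1/img2img", json=payload).json()
with open("render.png", "wb") as f:
    f.write(base64.b64decode(result["images"][0]))    # first returned image is the render
```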

Step 5:

The last step is pretty simple. I just set up a way to take screenshots of the Rhino 3D modeling interface. These screenshots go into the Step 4 algorithm to start making a new image. This step is like the control panel; it decides which view gets captured and sent to the main algorithm for making the image. It also has a part that saves all the screenshots in a certain folder, so I can keep track of all the images I’ve used in the process.

Image/ Screenshot capture algorithm
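
A minimal GhPython-style sketch of this capture step might look like the following. The component inputs and file naming here are assumptions for illustration; the actual definition uses the capture components shown in the screenshot above.

```python
# Runs inside a GhPython component in Grasshopper (IronPython + RhinoCommon).
import os
import datetime
import clr
clr.AddReference("System.Drawing")
import System.Drawing
import Rhino

def capture_active_view(folder):
    """Save the active Rhino viewport to a timestamped PNG and return its path."""
    if not os.path.isdir(folder):
        os.makedirs(folder)
    view = Rhino.RhinoDoc.ActiveDoc.Views.ActiveView
    bitmap = view.CaptureToBitmap()                   # RhinoCommon viewport capture
    stamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    path = os.path.join(folder, "viewport_{}.png".format(stamp))
    bitmap.Save(path, System.Drawing.Imaging.ImageFormat.Png)
    return path

# 'folder' is a text input on the GhPython component; 'a' is its output (the saved path)
a = capture_active_view(folder)
```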

Below are some samples of the 3D model to Render workflow:

Prompt: Create an image of a Voxel Man standing, set against a simple, empty background to emphasize his form. This character, constructed from brightly colored 3D voxel cubes, radiates a pixelated, retro video game charm. He stands upright, showcasing a straightforward pose that highlights his blocky structure and the clean lines of his voxel-based design. The empty background is a solid color, perhaps a soft neutral or pastel shade, to ensure the Voxel Man is the central focus. The lighting is straightforward and evenly distributed, casting subtle shadows that define his cubic form without overwhelming the simplicity of the scene. The mood is playful yet minimalistic, capturing the essence of voxel art with a modern, clean aesthetic. The overall composition is crisp and focused, with an emphasis on the geometric beauty and vibrant colors of the voxel cubes.

Negative Prompt: Watermark, Text, censored, deformed, bad anatomy, disfigured, poorly drawn face, mutated, extra limb, ugly, poorly drawn hands, missing limb, floating limbs, disconnected limbs, disconnected head, malformed hands, long neck, mutated hands and fingers, bad hands, missing fingers, cropped, worst quality, low quality, mutation, poorly drawn, huge calf, bad hands, fused hand, missing hand, disappearing arms, disappearing thigh, disappearing calf, disappearing legs, missing fingers, fused fingers, abnormal eye proportion, Abnormal hands, abnormal legs, abnormal feet, abnormal fingers

N images: 1
Process: CN v1.0
Steps: 13
Sampler: DDIM
CFG scale: 7.3
Seed: 3902787
Face restoration: CodeFormer
Size: 1000×1098
Model hash: 980cb713af
Model: reliberate_v10
ControlNet Enabled: True
ControlNet Module: depth
ControlNet Model: control_sd15_depth [fef5e48e]
ControlNet Weight: 0.93
ControlNet Guidance Start: 0.04
ControlNet Guidance End: 1.0
Version: v1.7.0

Prompt: In a landscape of red Martian soil and jagged rocks, envision a Brutalist-style habitat, characterized by its stark, monolithic appearance. The habitat, massive and geometric, is composed of rugged, unadorned concrete with sharp lines and an imposing, fortress-like demeanor. The structure dominates the barren Martian terrain, with large, angular windows that are few and strategically placed, allowing a glimpse of the harsh, yet mesmerizing Martian environment outside. The habitat?s exterior is weathered, showcasing the effects of Martian dust storms, while the interior glimpses reveal a contrastingly warm and lively human presence. The habitat stands under a deep, rust-colored sky, casting long, dramatic shadows in the low sunlight, embodying a sense of isolation and resilience. The image conveys a mood of austere beauty and the stark reality of extraterrestrial survival, perfectly suited for a highly detailed, realistic rendering.

Negative Prompt: Watermark, Text, censored, deformed, bad anatomy, disfigured, poorly drawn face, mutated, extra limb, ugly, poorly drawn hands, missing limb, floating limbs, disconnected limbs, disconnected head, malformed hands, long neck, mutated hands and fingers, bad hands, missing fingers, cropped, worst quality, low quality, mutation, poorly drawn, huge calf, bad hands, fused hand, missing hand, disappearing arms, disappearing thigh, disappearing calf, disappearing legs, missing fingers, fused fingers, abnormal eye proportion, Abnormal hands, abnormal legs, abnormal feet, abnormal fingers

N images: 1
Process: CN v1.0
Steps: 13
Sampler: DDIM
CFG scale: 7.3
Seed: 1279902862
Face restoration: CodeFormer
Size: 1000×1098
Model hash: ef76aa2332
Model: realisticVisionV51_v51VAE
ControlNet Enabled: True
ControlNet Module: depth
ControlNet Model: control_sd15_depth [fef5e48e]
ControlNet Weight: 0.95
ControlNet Guidance Start: 0.04
ControlNet Guidance End: 1.0
Version: v1.7.0

Prompt: Milk Bottle, masterpiece, HD, unreal engine, 4K, Minimal,Product Design, Innovative Stickers, Futuristic Design

Negative Prompt: Watermark, Text, censored, deformed, bad anatomy, disfigured, poorly drawn face, mutated, extra limb, ugly, poorly drawn hands, missing limb, floating limbs, disconnected limbs, disconnected head, malformed hands, long neck, mutated hands and fingers, bad hands, missing fingers, cropped, worst quality, low quality, mutation, poorly drawn, huge calf, bad hands, fused hand, missing hand, disappearing arms, disappearing thigh, disappearing calf, disappearing legs, missing fingers, fused fingers, abnormal eye proportion, Abnormal hands, abnormal legs, abnormal feet, abnormal fingers

N images: 1
Process: CN v1.0
Steps: 13
Sampler: DDIM
CFG scale: 7.3
Seed: 2004396693
Face restoration: CodeFormer
Size: 512×562
Model hash: 770d3d3185
Model: animerge_v26
ControlNet Enabled: True
ControlNet Module: depth
ControlNet Model: control_sd15_depth [fef5e48e]
ControlNet Weight: 0.85
ControlNet Guidance Start: 0.04
ControlNet Guidance End: 1.0
Version: v1.7.0

Prompt: Create an image of an exquisitely crafted perfume bottle, shimmering with the clarity of fine glass. This bottle, unique in its design, features sharp, star-like edges that catch and refract light, casting a spectrum of dazzling colors. Imagine it set against a softly blurred, monochromatic background that accentuates its clear, angular form. The bottle should be central in the composition, with a focus on its geometric precision and the way its surfaces create a dance of light and shadow. The lighting should be soft yet directional, highlighting the sharp edges and the graceful curves of the bottle, evoking a sense of luxury and elegance. The overall mood is one of sophisticated serenity, where the bottle’s delicate beauty stands out in a harmonious balance of light and form.

Negative Prompt: Watermark, Text, censored, deformed, bad anatomy, disfigured, poorly drawn face, mutated, extra limb, ugly, poorly drawn hands, missing limb, floating limbs, disconnected limbs, disconnected head, malformed hands, long neck, mutated hands and fingers, bad hands, missing fingers, cropped, worst quality, low quality, mutation, poorly drawn, huge calf, bad hands, fused hand, missing hand, disappearing arms, disappearing thigh, disappearing calf, disappearing legs, missing fingers, fused fingers, abnormal eye proportion, Abnormal hands, abnormal legs, abnormal feet, abnormal fingers

N images: 1
Process: CN v1.0
Steps: 13
Sampler: DDIM
CFG scale: 7.3
Seed: 2008545859
Face restoration: CodeFormer
Size: 512×511
Model hash: 925bd947d7
Model: cyberrealistic_v41BackToBasics
ControlNet Enabled: True
ControlNet Module: depth
ControlNet Model: control_sd15_depth [fef5e48e]
ControlNet Weight: 1.0
ControlNet Guidance Start: 0.04
ControlNet Guidance End: 1.0
Version: v1.7.0

Tools and Technologies

For my research on making 3D modeling workflows faster with AI, I used a bunch of different tools and technologies.

Hardware

I needed some pretty powerful hardware to handle the AI image generation with Stable Diffusion and the 3D modeling and design in Rhino3D/Grasshopper:

  • RAM: I used 32 GB of RAM so I could run different processes at the same time.
  • GPU: I had an Nvidia RTX 3080 with 10GB, which is good for making the AI models work faster.
  • Storage: I used 500 GB of disk space to keep all the model assets and the images I created.
Software

Here are the main software tools and platforms I used:

  • Automatic 1111: This is the Stable Diffusion Web UI that I used to make images.
  • Python 3.10.6: Used to run Automatic 1111.
  • Git: Version control, used to pull the latest changes to Automatic 1111.
  • Model Files: I got AI model files from places like HuggingFace and Civitai.
  • CUDA Toolkit: This is for making the GPU work better.
  • Rhinoceros 7: This is the 3D modeling software I used.
  • Grasshopper: It’s a tool for visual scripting in Rhino.
  • Ambrosinus Toolkit: This helped me integrate AI in Grasshopper.
  • Ladybug: I used this for previewing images on the canvas.
  • Moonlight: Dark mode extension.
  • Bifocals: I used this to display the names of the components on the canvas.
  • Discord: I used this for Midjourney and Niji Journey generations.

Reflections on the Outcome

Working on this project, which utilized a Research through Design approach, was particularly engaging, especially given the involvement of cutting-edge technologies. While I relished the opportunity to delve into various AI tools spanning the text, image, and audio domains, including ChatGPT, Stable Diffusion, and Natural Reader, I do wish there had been more time to explore augmented reality (AR) projects.

This endeavor involved a comprehensive exploration of different AI platforms such as OpenAI, Adobe Firefly, Midjourney, and Stable Diffusion. More than just an assessment, it integrated these tools into practical design workflows. This integration led to the creation of custom AI applications, including tailored GPT models, image training with Stable Diffusion’s Web UI, and the development of a near-real-time rendering engine. Such advancements underscore the transformative potential of AI in boosting efficiency and creativity within the realms of 3D modeling and rendering.

This journey not only provided invaluable insights into AI’s capabilities in augmenting design workflows but also highlighted the challenges of seamlessly incorporating such rapidly evolving tools. Key takeaways include:

  • The versatility of AI is immense, and its potential to enhance design workflows is significant. However, selecting the right tools through meticulous testing is crucial for success.
  • Effective integration of AI technologies requires thoughtful user experience (UX) design and appropriate hardware provisions to ensure that existing workflows are not disrupted.
  • Customizing prompts and model development is critical to ensure AI outcomes align with ethical standards and to prevent the propagation of bias.
  • Realizing the full capabilities of AI hinges on careful tool selection, UX considerations, technical infrastructure, and skilled prompt engineering.
  • The requirement for substantial computational resources presents a notable challenge.
  • The installation process and ensuring error-free functioning demand considerable effort and expertise.
  • API integrations, while powerful, can be complex and often lack sufficient guidance.
  • Testing a multitude of models is a time-intensive endeavor.

Evaluation

To assess the efficacy of the AI integration experiments conducted in enhancing real-time rendering for 3D modeling, the following criteria were evaluated:

Sl | Criteria | Description | Yes/Needs Improvement | Justification
1 | Learnability/Ease of use | Easy to learn and use. | Needs Improvement | The integration of Stable Diffusion with Rhino/Grasshopper has a steep learning curve; better tutorials and onboarding processes are needed.
2 | Efficiency | Workflow improvement. | Yes | The integration speeds up design workflows by enabling quick iterative visualization.
3 | Errors & Failures | Difficulty in handling errors. | Yes | The algorithm needs to improve its error-handling capabilities.
4 | User Autonomy | Provides control to the user. | Yes | Users can control AI parameters like prompts and ControlNet, allowing creative control over outcomes.
5 | Accessibility | Designed for diverse users. | Needs Improvement | The design needs to be more inclusive and accessible for diverse user groups.
6 | Workflow integration | Seamless integration into existing workflows. | Needs Improvement | The user experience needs refining to ensure seamless integration with existing design processes.
7 | Rendering Performance | Quality of visualization. | Needs Improvement | The rendering process, while functional, needs improvements for consistency and reliability.
8 | Creative Amplification | Expandable use case. | Yes | The algorithm shows potential for advancement, with many more versions to come.

GitHub Repo with the Code and Screenshot

calluxpore/AI-Enhanced-Real-time-Algorithm-for-Streamlined-Rendering-in-a-3D-Modeling-workflow-AIR-STREaM- (github.com)

Hi-res Image of algorithm

https://calluxpore.github.io/AI-Enhanced-Real-time-Algorithm-for-Streamlined-Rendering-in-a-3D-Modeling-workflow-AIR-STREaM-/Hi-Res%20Algorithm.png

Presentation

https://pitch.com/embed-link/ir92zi

AI Tools Declaration

In this project, I worked with a number of AI models that were already trained to understand text, images, and audio, and to create new images. I got these models and UIs from places like HuggingFace, Civitai, Automatic1111, Midjourney, Niji Journey, and Microsoft Designer. I made sure to use all the models the right way, following their licenses and rules. I also tried making some custom models of my own, especially in the first experiment where I was making custom GPTs. These custom models were trained with data that was either free for anyone to use or data that I owned. For example, I used the Stable Diffusion Web UI (Automatic 1111) to train a custom model with symbols from the movie “Arrival” that I found on a GitHub repo. This model was just for testing and wasn’t shared with others; I deleted it after I made it because it didn’t work the way I hoped it would.

Category | Platforms
Text | Chat GPT, Open AI API, Jan AI, LM Studio, VSCode, Perplexity, Copilot (Bing)
Image | Automatic 1111, Invoke, Comfy UI, Vlad Diffusion, Discord, Stability AI, Microsoft Designer, Adobe Firefly, Nvidia Omniverse, Fooocus, Stable Matrix
Audio | Eleven Labs, Natural Reader
Platforms Explored/Used

Category | Models/UI
Text | Open AI API, Chat GPT, GPT 3.5/4, Mixtral 8x7B Instruct q4, Phi-2 3B q8, GitHub Copilot, Claude 2.1, Gemini Pro
Image | Stable Diffusion, Midjourney, Niji Journey, Stability AI, Microsoft Designer, Adobe Firefly
Models/UI Explored/Used

Category | API
Text | Obsidian, VSCode, Cursor, Webstorm, Zotero, Rhino/Grasshopper
Image | Chainner, Grasshopper
API Explored/Used

References

LM Studio—Discover and run local LLMs. (n.d.). Retrieved January 29, 2024, from https://lmstudio.ai

AI Photo Editor—Adobe Photoshop. (n.d.). Retrieved January 29, 2024, from https://www.adobe.com/products/photoshop/ai.html

AI Voice Generator & Text to Speech | ElevenLabs. (n.d.). Retrieved January 29, 2024, from https://elevenlabs.io/

AI Voices—NaturalReader Home. (n.d.). Retrieved January 29, 2024, from https://www.naturalreaders.com/

Ambrosinus Toolkit. (2022, October 19). [Text]. Food4Rhino. https://www.food4rhino.com/en/app/ambrosinus-toolkit

Anime Pastel Dream—Soft—Baked vae | Stable Diffusion Checkpoint | Civitai. (2023, May 19). https://civitai.com/models/23521/anime-pastel-dream

AniMerge—V2.6 | Stable Diffusion Checkpoint | Civitai. (2024, January 9). https://civitai.com/models/144249/animerge

AniVerse—V2.0 (HD) | Stable Diffusion Checkpoint | Civitai. (2024, January 16). https://civitai.com/models/107842/aniverse

architecture_Exterior_SDlife_Chiasedamme—V4.0 Exterior | Stable Diffusion Checkpoint | Civitai. (2024, January 15). https://civitai.com/models/114612/architectureexteriorsdlifechiasedamme

ArchitectureRealMix—V1repair | Stable Diffusion Checkpoint | Civitai. (2023, June 28). https://civitai.com/models/84958/architecturerealmix

AUTOMATIC1111. (2022). Stable Diffusion Web UI [Python]. https://github.com/AUTOMATIC1111/stable-diffusion-webui (Original work published 2022)

Bifocals. (2016, February 22). [Text]. Food4Rhino. https://www.food4rhino.com/en/app/bifocals

chaiNNer-org/chaiNNer. (2024). [Python]. chaiNNer. https://github.com/chaiNNer-org/chaiNNer (Original work published 2021)

ChatGPT. (n.d.). Retrieved January 29, 2024, from https://openai.com/chatgpt

Claude 2. (n.d.). Retrieved January 29, 2024, from https://www.anthropic.com/news/claude-2

comfyanonymous. (2024). Comfyanonymous/ComfyUI [Python]. https://github.com/comfyanonymous/ComfyUI (Original work published 2023)

Copilot. (n.d.). Retrieved January 29, 2024, from https://www.bing.com/copilot

Counterfeit-V3.0—V3.0 | Stable Diffusion Checkpoint | Civitai. (2023, October 24). https://civitai.com/models/4468/counterfeit-v30

CUDA Toolkit—Free Tools and Training | NVIDIA Developer. (n.d.). Retrieved January 29, 2024, from https://developer.nvidia.com/cuda-toolkit

Cursor—The AI-first Code Editor. (n.d.). Retrieved January 29, 2024, from https://cursor.sh/

CyberRealistic—V4.1 “Back to Basics” | Stable Diffusion Checkpoint | Civitai. (2023, December 11). https://civitai.com/models/15003/cyberrealistic

Discord | Your Place to Talk and Hang Out. (n.d.). Retrieved January 29, 2024, from https://discord.com/

DreamShaper—8 | Stable Diffusion Checkpoint | Civitai. (2024, January 3). https://civitai.com/models/4384/dreamshaper

Envvi/Inkpunk-Diffusion · Hugging Face. (n.d.). Retrieved January 29, 2024, from https://huggingface.co/Envvi/Inkpunk-Diffusion

Gemini—Google DeepMind. (n.d.). Retrieved January 29, 2024, from https://deepmind.google/technologies/gemini/

Git. (n.d.). Retrieved January 29, 2024, from https://git-scm.com/

GitHub Copilot · Your AI pair programmer. (n.d.). GitHub. Retrieved January 29, 2024, from https://github.com/features/copilot

GitHub: Let’s build from here. (n.d.). GitHub. Retrieved January 29, 2024, from https://github.com/

InvokeAI. (n.d.). GitHub. Retrieved January 29, 2024, from https://github.com/invoke-ai

Ladybug Tools. (2016, October 8). [Text]. Food4Rhino. https://www.food4rhino.com/en/app/ladybug-tools

Liu, V., Vermeulen, J., Fitzmaurice, G., & Matejka, J. (2023). 3DALL-E: Integrating Text-to-Image AI in 3D Design Workflows (arXiv:2210.11603). arXiv. https://doi.org/10.48550/arXiv.2210.11603

lllyasviel. (2024). Lllyasviel/Fooocus [Python]. https://github.com/lllyasviel/Fooocus (Original work published 2023)

LykosAI/StabilityMatrix. (2024). [C#]. Lykos AI. https://github.com/LykosAI/StabilityMatrix (Original work published 2023)

Lyriel—V1.6 | Stable Diffusion Checkpoint | Civitai. (2023, August 13). https://civitai.com/models/22922/lyriel

MeinaPastel—V6 (Pastel) | Stable Diffusion Checkpoint | Civitai. (2023, July 2). https://civitai.com/models/11866/meinapastel

Mickey mouse—V1.0 | Stable Diffusion LoRA | Civitai. (2023, November 29). https://civitai.com/models/105071/mickey-mouse

Microsoft Designer—Stunning designs in a flash. (n.d.). Retrieved January 29, 2024, from https://designer.microsoft.com

Midjourney. (n.d.). Midjourney. Retrieved January 29, 2024, from https://www.midjourney.com/home

Moonlight | Food4Rhino. (n.d.). Retrieved January 29, 2024, from https://www.food4rhino.com/en/app/moonlight

MoonMix—V1.8 | Stable Diffusion Checkpoint | Civitai. (2023, March 13). https://civitai.com/models/19129/moonmix

Grasshopper. (n.d.). Retrieved January 29, 2024, from https://www.grasshopper3d.com/

Nijijourney. (n.d.). Niji・journey. Retrieved January 29, 2024, from https://nijijourney.com/en/

Novel Inkpunk F222—V1 | Stable Diffusion Checkpoint | Civitai. (2022, December 23). https://civitai.com/models/1186/novel-inkpunk-f222

NVIDIA Omniverse. (n.d.). NVIDIA. Retrieved January 29, 2024, from https://www.nvidia.com/en-us/omniverse/

Obsidian—Sharpen your thinking. (n.d.). Retrieved January 29, 2024, from https://obsidian.md/

OpenAI API. (n.d.). Retrieved January 29, 2024, from https://openai.com/blog/openai-api

Open-source ChatGPT Alternative | Jan. (n.d.). Retrieved January 29, 2024, from https://jan.ai/

PerfectDeliberate—V4.0 Review | Civitai. (n.d.). Retrieved January 29, 2024, from https://civitai.com/posts/263839

Perplexity. (n.d.). Retrieved January 29, 2024, from https://www.perplexity.ai/

Product Design (minimalism-eddiemauro)—Eddiemauro 2.0 | Stable Diffusion Checkpoint | Civitai. (2023, June 14). https://civitai.com/models/23893/product-design-minimalism-eddiemauro

Protogen v2.2 (Anime) Official Release—Protogen v2.2 | Stable Diffusion Checkpoint | Civitai. (2023, March 27). https://civitai.com/models/3627/protogen-v22-anime-official-release

Protogen x3.4 (Photorealism) Official Release—Protogen x3.4 | Stable Diffusion Checkpoint | Civitai. (2023, August 18). https://civitai.com/models/3666/protogen-x34-photorealism-official-release

Protogen x5.3 (Photorealism) Official Release—Protogen x5.3 | Stable Diffusion Checkpoint | Civitai. (2023, March 14). https://civitai.com/models/3816/protogen-x53-photorealism-official-release

RealCartoon3D—V13 | Stable Diffusion Checkpoint | Civitai. (2023, December 31). https://civitai.com/models/94809/realcartoon3d

Realistic Vision V6.0 B1—V5.1 (VAE) | Stable Diffusion Checkpoint | Civitai. (2024, January 21). https://civitai.com/models/4201/realistic-vision-v60-b1

ReV Animated—V1.2.2-EOL | Stable Diffusion Checkpoint | Civitai. (2023, November 30). https://civitai.com/models/7371/rev-animated

Rhino—Rhinoceros 3D. (n.d.). Retrieved January 29, 2024, from https://www.rhino3d.com/

Runwayml/stable-diffusion-v1-5 · Hugging Face. (n.d.). Retrieved January 29, 2024, from https://huggingface.co/runwayml/stable-diffusion-v1-5

Stability AI. (n.d.). Stability AI. Retrieved January 29, 2024, from https://stability.ai

SynthwavePunk—V2 | Stable Diffusion Checkpoint | Civitai. (2023, June 12). https://civitai.com/models/1102/synthwavepunk

The Stable Diffusion Guide. (n.d.). Retrieved January 29, 2024, from https://huggingface.co/docs/diffusers/v0.14.0/en/stable_diffusion

TheBloke/Mixtral-8x7B-v0.1-GGUF · Hugging Face. (n.d.). Retrieved January 29, 2024, from https://huggingface.co/TheBloke/Mixtral-8x7B-v0.1-GGUF

TheBloke/phi-2-GGUF · Hugging Face. (n.d.). Retrieved January 29, 2024, from https://huggingface.co/TheBloke/phi-2-GGUF

ToonYou—Beta 6 | Stable Diffusion Checkpoint | Civitai. (2023, September 4). https://civitai.com/models/30240/toonyou

Visual Studio Code—Code Editing. Redefined. (n.d.). Retrieved January 29, 2024, from https://code.visualstudio.com/

Mandic, V. (2022). SD.Next [Python]. https://github.com/vladmandic/automatic (Original work published 2022)

WebStorm: The JavaScript and TypeScript IDE, by JetBrains. (n.d.). JetBrains. Retrieved January 29, 2024, from https://www.jetbrains.com/webstorm/

Welcome to Python.org. (n.d.). Retrieved January 29, 2024, from https://www.python.org/

WolframResearch/Arrival-Movie-Live-Coding. (2024). [Mathematica]. Wolfram Research, Inc. https://github.com/WolframResearch/Arrival-Movie-Live-Coding (Original work published 2017)

XpucT (Khachatur). (2024, January 8). https://huggingface.co/XpucT

Yoo, S., Lee, S., Kim, S., Hwang, K. H., Park, J. H., & Kang, N. (2021). Integrating deep learning into CAD/CAE system: Generative design and evaluation of 3D conceptual wheel. Structural and Multidisciplinary Optimization, 64(4), 2725–2747. https://doi.org/10.1007/s00158-021-02953-9

Zotero | Your personal research assistant. (n.d.). Retrieved January 29, 2024, from https://www.zotero.org/

