You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
You can download the latest package, docker image, example, documentation, and platform-tool from [RKLLM_SDK](https://console.zbox.filez.com/l/RJJDmB), fetch code: rkllm
2. API usage demo: [rkllm_api_demo](https://github.com/airockchip/rknn-llm/tree/main/examples/rkllm_api_demo)
84
+
3. API server demo: [rkllm_server_demo](https://github.com/airockchip/rknn-llm/tree/main/examples/rkllm_server_demo)
85
+
78
86
# Note
79
87
80
88
- The modifications in version 1.1 are significant, making it incompatible with older version models. Please use the latest toolchain for model conversion and inference.
@@ -85,7 +93,7 @@ You can download the latest package, docker image, example, documentation, and p
1. This demo demonstrates how to deploy the Qwen2-VL-2B model. The Vision + Projector component is exported as an RKNN model using the `rknn-toolkit2`, while the LLM component is exported as an RKLLM model using the `rkllm-toolkit`.
3
+
2. The open-source model used in this demo is available at: [Qwen2-VL-2B](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct)
["The image depicts an astronaut in a white spacesuit, reclining on a green chair with his feet up. He is holding a green beer bottle in his right hand. The astronaut is on a moon-like surface, with the Earth visible in the background. The scene is set against a backdrop of stars and the moon's surface, creating a surreal and whimsical atmosphere."]
23
+
```
24
+
25
+
## 3. Model Conversion
26
+
-### convert to onnx
27
+
28
+
1. Export the Vision + Projector component of the Qwen2-VL-2B model to an ONNX model using the `export/export_vision.py` script.
29
+
30
+
2. Since RKNN currently supports only `float32`, if the data type is restricted when loading weights, you need to set the `"use_flash_attn"` parameter in `config.json` to `false`.
31
+
32
+
```bash
33
+
python export/export_vision.py
34
+
```
35
+
36
+
-### convert to rknn
37
+
38
+
1. After successfully exporting the ONNX model, you can use the `export/export_vision_rknn.py` script along with the `rknn-toolkit2` tool to convert the ONNX model to an RKNN model.
39
+
40
+
```bash
41
+
python export/export_vision_rknn.py
42
+
```
43
+
44
+
-### convert to rkllm
45
+
46
+
1. We collected 20 image-text examples from the MMBench_DEV_EN dataset, stored in `data/datasets.json` and `data/datasets`. To use these data, you first need to create `input_embeds` for quantizing the RKLLM model. Run the following code to generate `data/inputs.json`.
47
+
48
+
```bash
49
+
#Modify the Qwen2VL ModelPath in data/make_input_embeds_for_quantize.py, and then
50
+
python data/make_input_embeds_for_quantize.py
51
+
```
52
+
53
+
2. Use the following code to export the RKLLM model.
54
+
55
+
```bash
56
+
python export/export_rkllm.py
57
+
```
58
+
59
+
## 4. C++ Demo
60
+
In the `deploy` directory, we provide example code for board-side inference. This code demonstrates the process of "image input to image features," where an input image is processed to output its corresponding image features. These features are then used by the RKLLM model for multimodal content inference.
61
+
62
+
### 1. Compile and Build
63
+
Users can directly compile the example code by running the `deploy/build-linux.sh` or `deploy/build-android.sh` script (replacing the cross-compiler path with the actual path). This will generate an `install/demo_Linux_aarch64` folder in the `deploy` directory, containing the executables `imgenc`, `llm`, `demo`, and the `lib` folder.
The user can view the relevant runtime logs in the terminal and obtain the `img_vec.bin` file in the current directory, which contains the image features corresponding to the input image.
99
+
100
+
Multimodal Example
101
+
102
+
```
103
+
user: <image>What is in the image?
104
+
robot: The image depicts an astronaut on the moon, enjoying a beer. The background shows the Earth and stars, creating a surreal and futuristic scene.
robot: The RK3588 is a new generation of high-end processors with high computational power, low power consumption, strong multimedia capabilities, and rich data interfaces.
{"image_path": "data/datasets", "image": "1.jpg", "input": "Question: What is correct Python code to generate the content of the image?\nOptions:\nA. for x in range(6):\n print(x)\nelse:\n print(\"Finally finished!\")\n\nB. thisdict = {\n\"brand\": \"Ford\",\n\"model\": \"Mustang\",\n\"year\": 1964\n}\n\nprint(len(thisdict))\nC. x = 1\ny = 2.8\nz = 1j\n\nprint(type(x))\nprint(type(y))\nprint(type(z))\n\nD. fruits = [\"apple\", \"banana\", \"cherry\"]\nfor x in fruits:\n print(x)\nPlease select the correct answer from the options above. \n", "target":"D"},
3
+
{"image_path": "data/datasets", "image": "2.jpg", "input": "Question: What is correct Python code to generate the content of the image?\nOptions:\nA. class Person:\n def __init__(self, name, age):\n self.name = name\n self.age = age\n\np1 = Person(\"John\", 36)\n\nprint(p1.name)\nprint(p1.age)\nB. fruits = [\"apple\", \"banana\", \"cherry\"]\nfor x in fruits:\n print(x)\nC. x = min(5, 10, 25)\ny = max(5, 10, 25)\n\nprint(x)\nprint(y)\nD. a = 33\nb = 200\nif b > a:\n print(\"b is greater than a\")\nPlease select the correct answer from the options above. \n", "target":"D"},
4
+
{"image_path": "data/datasets", "image": "21.jpg", "input": "Question: Which one is the correct caption of this image?\nOptions:\nA. A man rides a surfboard on a large wave.\nB. a young boy barefoot holding an umbrella touching the horn of a cow\nC. A giraffe standing by a stall in a field.\nD. A stop sign that has been vandalized with graffiti.\nPlease select the correct answer from the options above. \n", "target":"B"},
5
+
{"image_path": "data/datasets", "image": "22.jpg", "input": "Question: Which one is the correct caption of this image?\nOptions:\nA. A narrow kitchen filled with appliances and cooking utensils.\nB. A person with glasses and a tie in a room.\nC. Tray of vegetables with cucumber, carrots, broccoli and celery.\nD. A pretty young woman riding a surfboard on a wave in the ocean.\nPlease select the correct answer from the options above. \n", "target":"A"},
6
+
{"image_path": "data/datasets", "image": "241.jpg", "input": "Hint: The passage below describes an experiment. Read the passage and then follow the instructions below.\n\nMadelyn applied a thin layer of wax to the underside of her snowboard and rode the board straight down a hill. Then, she removed the wax and rode the snowboard straight down the hill again. She repeated the rides four more times, alternating whether she rode with a thin layer of wax on the board or not. Her friend Tucker timed each ride. Madelyn and Tucker calculated the average time it took to slide straight down the hill on the snowboard with wax compared to the average time on the snowboard without wax.\nFigure: snowboarding down a hill.\nQuestion: Identify the question that Madelyn and Tucker's experiment can best answer.\nOptions:\nA. Does Madelyn's snowboard slide down a hill in less time when it has a thin layer of wax or a thick layer of wax?\nB. Does Madelyn's snowboard slide down a hill in less time when it has a layer of wax or when it does not have a layer of wax?\nPlease select the correct answer from the options above. \n", "target":"B"},
7
+
{"image_path": "data/datasets", "image": "252.jpg", "input": "Hint: People can use the engineering-design process to develop solutions to problems. One step in the process is testing if a potential solution meets the requirements of the design.\nThe passage below describes how the engineering-design process was used to test a solution to a problem. Read the passage. Then answer the question below.\n\nLaura and Isabella were making batches of concrete for a construction project. To make the concrete, they mixed together dry cement powder, gravel, and water. Then, they checked if each batch was firm enough using a test called a slump test.\nThey poured some of the fresh concrete into an upside-down metal cone. They left the concrete in the metal cone for 30 seconds. Then, they lifted the cone to see if the concrete stayed in a cone shape or if it collapsed. If the concrete in a batch collapsed, they would know the batch should not be used.\nFigure: preparing a concrete slump test.\nQuestion: Which of the following could Laura and Isabella's test show?\nOptions:\nA. if the concrete from each batch took the same amount of time to dry\nB. if a new batch of concrete was firm enough to use\nPlease select the correct answer from the options above. \n", "target":"B"},
8
+
{"image_path": "data/datasets", "image": "362.jpg", "input": "Hint: Native copper has the following properties:\nsolid\nnot made by living things\nfound in nature\nfixed crystal structure\nmade of the metal copper\nQuestion: Is native copper a mineral?\nOptions:\nA. no\nB. yes\nPlease select the correct answer from the options above. \n", "target":"B"},
9
+
{"image_path": "data/datasets", "image": "364.jpg", "input": "Hint: Plastic has the following properties:\nsolid\nno fixed crystal structure\nnot a pure substance\nmade in a factory\nQuestion: Is plastic a mineral?\nOptions:\nA. yes\nB. no\nPlease select the correct answer from the options above. \n", "target":"B"},
10
+
{"image_path": "data/datasets", "image": "448.jpg", "input": "Hint: Read the text.\nButterflies and moths are easily mistaken for each other, but one distinction between them often appears during their pupal stage. When most butterfly caterpillars reach full size, they attach themselves to a leaf or other object and shed their skin a final time, forming a chrysalis, a hard, shell-like skin, which protects the pupa inside. The chrysalis may be dull and rough or shiny and smooth, usually blending into its surroundings. Most moth caterpillars, by contrast, create a cocoon to protect the pupa, rather than forming a chrysalis. The cocoons usually resemble hard silk pouches, but some moths also incorporate materials like hairs and twigs.\nQuestion: Which term matches the picture?\nOptions:\nA. cocoon\nB. chrysalis\nPlease select the correct answer from the options above. \n", "target":"B"},
11
+
{"image_path": "data/datasets", "image": "477.jpg", "input": "Hint: Read the text.\nHeat transfer can occur in different ways. Two common ways are through conduction and convection. Conduction occurs when molecules from one object collide with molecules from another object. Burning your hand by touching a hot car door on a sunny summer day is an example of conduction.\nConvection is another form of heat transfer. When a liquid or gas is heated, the heated matter rises upward, away from the heat source. Hot bubbles rising in a pot of water boiling on a stove is an example of convection.\nQuestion: Which term matches the picture?\nOptions:\nA. conduction\nB. convection\nPlease select the correct answer from the options above. \n", "target":"B"},
12
+
{"image_path": "data/datasets", "image": "1231.jpg", "input": "Question: Which image is more brightful?\nOptions:\nA. The first image\nB. The second image\nPlease select the correct answer from the options above. \n", "target":"A"},
13
+
{"image_path": "data/datasets", "image": "1232.jpg", "input": "Question: Which image is more brightful?\nOptions:\nA. The first image\nB. The second image\nPlease select the correct answer from the options above. \n", "target":"A"},
14
+
{"image_path": "data/datasets", "image": "1085.jpg", "input": "Question: is this place crowded?\nOptions:\nA. yes\nB. no\nPlease select the correct answer from the options above. \n", "target":"A"},
15
+
{"image_path": "data/datasets", "image": "1086.jpg", "input": "Question: is this place crowded?\nOptions:\nA. yes\nB. no\nPlease select the correct answer from the options above. \n", "target":"A"},
16
+
{"image_path": "data/datasets", "image": "1128.jpg", "input": "Question: In this picture, are the two dolphins the same size?\nOptions:\nA. same\nB. Not the same\nC. Can't judge\nPlease select the correct answer from the options above. \n", "target":"B"},
17
+
{"image_path": "data/datasets", "image": "1129.jpg", "input": "Question: In this picture, are the two butterfly wings the same shape?\nOptions:\nA. same\nB. Not the same\nC. Can't judge\nPlease select the correct answer from the options above. \n", "target":"B"},
18
+
{"image_path": "data/datasets", "image": "1200.jpg", "input": "Question: What will happen next?\nOptions:\nA. the motorcyle is gonna go forward\nB. the motorcyle is gonna crash\nC. the motorcyle is gonna go backward\nD. both A,B, and C\nPlease select the correct answer from the options above. \n", "target":"B"},
19
+
{"image_path": "data/datasets", "image": "1201.jpg", "input": "Question: What will happen next?\nOptions:\nA. this person is gonna stay still\nB. this person is gonna keep walking\nC. this person is gonna fall into the water\nD. both A,B, and C\nPlease select the correct answer from the options above. \n", "target":"C"},
20
+
{"image_path": "data/datasets", "image": "1554.jpg", "input": "Question: The object shown in this figure:\nOptions:\nA. Is a colorless, flammable liquid that is commonly used as a solvent and fuel\nB. Has a boiling point of 64.7°C\nC. Can be toxic if ingested or absorbed through the skin\nD. None of these options are correct.\nPlease select the correct answer from the options above. \n", "target":"C"},
21
+
{"image_path": "data/datasets", "image": "1555.jpg", "input": "Question: The object shown in this figure:\nOptions:\nA. Is a lustrous, white metal that is highly reflective and ductile\nB. Has the highest electrical and thermal conductivity of all metals\nC. Has a boiling point of 2,162°C\nD. All of these options are correct.\nPlease select the correct answer from the options above. \n", "target":"D"}
0 commit comments